In recent years, object detection tasks have attracted widespread attention. PaddleClas open-sourced the ResNet50_vd_SSLD pretrained model based on ImageNet(Top1 Acc 82.4%). And based on the pretrained model, PaddleDetection provided the PSS-DET (Practical Server-side detection) with the help of the rich operators in PaddleDetection. The inference speed can reach 61FPS on single V100 GPU when COCO mAP is 41.6%, and 20FPS when COCO mAP is 47.8%.
We take the standard Faster RCNN ResNet50_vd FPN
as an example. The following table shows ablation study of PSS-DET.
| Trick | Train scale | Test scale | COCO mAP | Infer speed/FPS |
|- |:-: |:-: | :-: | :-: |
| baseline
| 640x640 | 640x640 | 36.4% | 43.589 |
| +test proposal=pre/post topk 500/300
| 640x640 | 640x640 | 36.2% | 52.512 |
| +fpn channel=64
| 640x640 | 640x640 | 35.1% | 67.450 |
| +ssld pretrain
| 640x640 | 640x640 | 36.3% | 67.450 |
| +ciou loss
| 640x640 | 640x640 | 37.1% | 67.450 |
| +DCNv2
| 640x640 | 640x640 | 39.4% | 60.345 |
| +3x, multi-scale training
| 640x640 | 640x640 | 41.0% | 60.345 |
| +auto augment
| 640x640 | 640x640 | 41.4% | 60.345 |
| +libra sampling
| 640x640 | 640x640 | 41.6% | 60.345 |
And the following figure shows mAP-Speed
curves for some common detectors.
Note
For fair comparison, inference time for PSS-DET models on V100 GPU is transformed to Titan V GPU by multiplying by 1.2 times.
Backbone | Type | Image/gpu | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs |
---|---|---|---|---|---|---|---|---|
ResNet50-vd-FPN-Dcnv2 | Faster | 2 | 3x | 61.425 | 41.6 | - | model | config |
ResNet50-vd-FPN-Dcnv2 | Cascade Faster | 2 | 3x | 20.001 | 47.8 | - | model | config |
ResNet101-vd-FPN-Dcnv2 | Cascade Faster | 2 | 3x | 19.523 | 49.4 | - | model | config |
Attention: Pretrained models whose congigurations are in the directory generic
just support inference but do not support training and evaluation as now.