Models are trained on `coco_2017_train` and tested on `coco_2017_val`.

The backbone models pretrained on ImageNet are available. All backbone models are pretrained on the standard ImageNet-1k dataset and can be downloaded here.
Backbone | Type | Image/gpu | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs |
---|---|---|---|---|---|---|---|---|
ResNet50 | Faster | 1 | 1x | 12.747 | 35.2 | - | model | config |
ResNet50 | Faster | 1 | 2x | 12.686 | 37.1 | - | model | config |
ResNet50 | Mask | 1 | 1x | 11.615 | 36.5 | 32.2 | model | config |
ResNet50 | Mask | 1 | 2x | 11.494 | 38.2 | 33.4 | model | config |
ResNet50-vd | Faster | 1 | 1x | 12.575 | 36.4 | - | model | config |
ResNet34-FPN | Faster | 2 | 1x | - | 36.7 | - | model | config |
ResNet34-vd-FPN | Faster | 2 | 1x | - | 37.4 | - | model | config |
ResNet50-FPN | Faster | 2 | 1x | 22.273 | 37.2 | - | model | config |
ResNet50-FPN | Faster | 2 | 2x | 22.297 | 37.7 | - | model | config |
ResNet50-FPN | Mask | 1 | 1x | 15.184 | 37.9 | 34.2 | model | config |
ResNet50-FPN | Mask | 1 | 2x | 15.881 | 38.7 | 34.7 | model | config |
ResNet50-FPN | Cascade Faster | 2 | 1x | 17.507 | 40.9 | - | model | config |
ResNet50-FPN | Cascade Mask | 1 | 1x | 12.43 | 41.3 | 35.5 | model | config |
ResNet50-vd-FPN | Faster | 2 | 2x | 21.847 | 38.9 | - | model | config |
ResNet50-vd-FPN | Mask | 1 | 2x | 15.825 | 39.8 | 35.4 | model | config |
CBResNet50-vd-FPN | Faster | 2 | 1x | - | 39.7 | - | model | config |
ResNet101 | Faster | 1 | 1x | 9.316 | 38.3 | - | model | config |
ResNet101-FPN | Faster | 1 | 1x | 17.297 | 38.7 | - | model | config |
ResNet101-FPN | Faster | 1 | 2x | 17.246 | 39.1 | - | model | config |
ResNet101-FPN | Mask | 1 | 1x | 12.983 | 39.5 | 35.2 | model | config |
ResNet101-vd-FPN | Faster | 1 | 1x | 17.011 | 40.5 | - | model | config |
ResNet101-vd-FPN | Faster | 1 | 2x | 16.934 | 40.8 | - | model | config |
ResNet101-vd-FPN | Mask | 1 | 1x | 13.105 | 41.4 | 36.8 | model | config |
CBResNet101-vd-FPN | Faster | 2 | 1x | - | 42.7 | - | model | config |
ResNeXt101-vd-64x4d-FPN | Faster | 1 | 1x | 8.815 | 42.2 | - | model | config |
ResNeXt101-vd-64x4d-FPN | Faster | 1 | 2x | 8.809 | 41.7 | - | model | config |
ResNeXt101-vd-64x4d-FPN | Mask | 1 | 1x | 7.689 | 42.9 | 37.9 | model | config |
ResNeXt101-vd-64x4d-FPN | Mask | 1 | 2x | 7.859 | 42.6 | 37.6 | model | config |
SENet154-vd-FPN | Faster | 1 | 1.44x | 3.408 | 42.9 | - | model | config |
SENet154-vd-FPN | Mask | 1 | 1.44x | 3.233 | 44.0 | 38.7 | model | config |
ResNet101-vd-FPN | CascadeClsAware Faster | 2 | 1x | - | 44.7(softnms) | - | model | config |
ResNet101-vd-FPN | CascadeClsAware Faster | 2 | 1x | - | 46.5(multi-scale test) | - | model | config |
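The Inf time (fps) column reports throughput in images per second. This section does not state the benchmarking protocol (hardware, batch size, whether pre- and post-processing are included), so the following is only a hypothetical illustration of how such a number is commonly computed: run a few untimed warm-up iterations, then divide the number of timed images by the elapsed wall-clock time. `measure_fps` and `dummy_infer` are illustrative names, not PaddleDetection APIs.

```python
import time
from typing import Callable, Sequence


def measure_fps(infer: Callable[[object], object],
                images: Sequence[object],
                warmup: int = 5) -> float:
    """Return images per second for `infer`, excluding warm-up runs.

    Hypothetical helper: the actual benchmark protocol behind the tables
    is not specified in this document.
    """
    # Warm-up runs are not timed (excludes one-off initialization cost).
    for img in images[:warmup]:
        infer(img)

    timed = images[warmup:]
    if not timed:
        raise ValueError("need more images than warm-up iterations")

    start = time.perf_counter()
    for img in timed:
        infer(img)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed


if __name__ == "__main__":
    def dummy_infer(img):
        # Stand-in for a detector's forward pass (about 10 ms per image).
        time.sleep(0.01)
        return img

    print(f"{measure_fps(dummy_infer, list(range(25))):.1f} fps")
```

With a 10 ms stand-in model the reported value is at most 100 fps, since 20 timed images take at least 0.2 s.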
Backbone | Type | Conv | Image/gpu | Lr schd | Inf time (fps) | Box AP | Mask AP | Download | Configs |
---|---|---|---|---|---|---|---|---|---|
ResNet50-FPN | Faster | c3-c5 | 2 | 1x | 19.978 | 41.0 | - | model | config |
ResNet50-vd-FPN | Faster | c3-c5 | 2 | 2x | 19.222 | 42.4 | - | model | config |
ResNet101-vd-FPN | Faster | c3-c5 | 2 | 1x | 14.477 | 44.1 | - | model | config |
ResNeXt101-vd-64x4d-FPN | Faster | c3-c5 | 1 | 1x | 7.209 | 45.2 | - | model | config |
ResNet50-FPN | Mask | c3-c5 | 1 | 1x | 14.53 | 41.9 | 37.3 | model | config |
ResNet50-vd-FPN | Mask | c3-c5 | 1 | 2x | 14.832 | 42.9 | 38.0 | model | config |
ResNet101-vd-FPN | Mask | c3-c5 | 1 | 1x | 11.546 | 44.6 | 39.2 | model | config |
ResNeXt101-vd-64x4d-FPN | Mask | c3-c5 | 1 | 1x | 6.45 | 46.2 | 40.4 | model | config |
ResNet50-FPN | Cascade Faster | c3-c5 | 2 | 1x | - | 44.2 | - | model | config |
ResNet101-vd-FPN | Cascade Faster | c3-c5 | 2 | 1x | - | 46.4 | - | model | config |
ResNeXt101-vd-FPN | Cascade Faster | c3-c5 | 2 | 1x | - | 47.3 | - | model | config |
SENet154-vd-FPN | Cascade Mask | c3-c5 | 1 | 1.44x | - | 51.9 | 43.9 | model | config |
ResNet200-vd-FPN-Nonlocal | CascadeClsAware Faster | c3-c5 | 1 | 2.5x | 3.103 | 51.7(softnms) | - | model | config |
CBResNet200-vd-FPN-Nonlocal | Cascade Faster | c3-c5 | 1 | 2.5x | 1.68 | 53.3(softnms) | - | model | config |
Notes: `c3-c5` means adding DCN (deformable convolution) in ResNet stages 3 to 5.

Backbone | Type | Image/gpu | Lr schd | Box AP | Mask AP | Download | Configs |
---|---|---|---|---|---|---|---|
ResNet50-FPN | Faster | 2 | 2x | 39.7 | - | model | config |
ResNet50-FPN | Mask | 1 | 2x | 40.1 | 35.8 | model | config |
Backbone | Pretrain dataset | Size | Deformable Conv | Image/gpu | Lr schd | Inf time (fps) | Box AP | Download | Configs |
---|---|---|---|---|---|---|---|---|---|
DarkNet53 (paper) | ImageNet | 608 | False | 8 | 270e | - | 33.0 | - | - |
DarkNet53 (paper) | ImageNet | 416 | False | 8 | 270e | - | 31.0 | - | - |
DarkNet53 (paper) | ImageNet | 320 | False | 8 | 270e | - | 28.2 | - | - |
DarkNet53 | ImageNet | 608 | False | 8 | 270e | 45.571 | 38.9 | model | config |
DarkNet53 | ImageNet | 416 | False | 8 | 270e | - | 37.5 | model | config |
DarkNet53 | ImageNet | 320 | False | 8 | 270e | - | 34.8 | model | config |
MobileNet-V1 | ImageNet | 608 | False | 8 | 270e | 78.302 | 29.3 | model | config |
MobileNet-V1 | ImageNet | 416 | False | 8 | 270e | - | 29.3 | model | config |
MobileNet-V1 | ImageNet | 320 | False | 8 | 270e | - | 27.1 | model | config |
MobileNet-V3 | ImageNet | 608 | False | 8 | 270e | - | 31.6 | model | config |
MobileNet-V3 | ImageNet | 416 | False | 8 | 270e | - | 29.9 | model | config |
MobileNet-V3 | ImageNet | 320 | False | 8 | 270e | - | 27.1 | model | config |
ResNet34 | ImageNet | 608 | False | 8 | 270e | 63.356 | 36.2 | model | config |
ResNet34 | ImageNet | 416 | False | 8 | 270e | - | 34.3 | model | config |
ResNet34 | ImageNet | 320 | False | 8 | 270e | - | 31.4 | model | config |
ResNet50_vd | ImageNet | 608 | True | 8 | 270e | - | 39.1 | model | config |
ResNet50_vd | Object365 | 608 | True | 8 | 270e | - | 41.4 | model | config |
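The Size column is the square input resolution. All three sizes (608, 416, 320) are multiples of 32, which matters because YOLOv3 predicts on three feature maps with output strides 32, 16 and 8. The minimal sketch below computes the resulting grid sizes; the stride values come from the YOLOv3 design rather than from this table, and `yolo_grid_sizes` is an illustrative helper, not part of the configs.

```python
from typing import List, Sequence

# Output strides of the three YOLOv3 detection heads (coarse to fine).
YOLOV3_STRIDES = (32, 16, 8)


def yolo_grid_sizes(input_size: int,
                    strides: Sequence[int] = YOLOV3_STRIDES) -> List[int]:
    """Return the side length of each square output grid for a square input."""
    if input_size % max(strides) != 0:
        raise ValueError(
            f"input size {input_size} must be a multiple of {max(strides)}")
    return [input_size // s for s in strides]


for size in (608, 416, 320):
    print(size, yolo_grid_sizes(size))
# 608 [19, 38, 76]
# 416 [13, 26, 52]
# 320 [10, 20, 40]
```

Smaller inputs give coarser grids, which is consistent with the lower Box AP at 320 than at 608 for the same backbone.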
Backbone | Size | Image/gpu | Lr schd | Inf time (fps) | Box AP(0.5) | Download | Configs |
---|---|---|---|---|---|---|---|
DarkNet53 | 608 | 8 | 270e | 54.977 | 83.5 | model | config |
DarkNet53 | 416 | 8 | 270e | - | 83.6 | model | config |
DarkNet53 | 320 | 8 | 270e | - | 82.2 | model | config |
DarkNet53 DIoU-Loss | 608 | 8 | 270e | - | 83.5 | model | config |
MobileNet-V1 | 608 | 8 | 270e | 104.291 | 76.2 | model | config |
MobileNet-V1 | 416 | 8 | 270e | - | 76.7 | model | config |
MobileNet-V1 | 320 | 8 | 270e | - | 75.3 | model | config |
ResNet34 | 608 | 8 | 270e | 82.247 | 82.6 | model | config |
ResNet34 | 416 | 8 | 270e | - | 81.9 | model | config |
ResNet34 | 320 | 8 | 270e | - | 80.1 | model | config |
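Box AP(0.5) is average precision where a detection counts as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below shows only that matching criterion, assuming boxes in `[x1, y1, x2, y2]` continuous coordinates (it ignores the +1 pixel convention some VOC tooling uses); it is not the full AP computation performed by the evaluation code.

```python
from typing import Sequence


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, gt, threshold: float = 0.5) -> bool:
    """A prediction matches a ground-truth box when IoU >= threshold."""
    return iou(pred, gt) >= threshold


print(iou([0, 0, 2, 2], [1, 1, 3, 3]))                    # 1/7, about 0.143
print(is_true_positive([0, 0, 10, 10], [1, 0, 10, 10]))   # True (IoU 0.9)
```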
Backbone | Image/gpu | Lr schd | Inf time (fps) | Box AP | Download | Configs |
---|---|---|---|---|---|---|
ResNet50-FPN | 2 | 1x | - | 36.0 | model | config |
ResNet101-FPN | 2 | 1x | - | 37.3 | model | config |
ResNeXt101-vd-FPN | 1 | 1x | - | 40.5 | model | config |
Notes: In RetinaNet, the base LR is changed to 0.01 for minibatch size 16.
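The note gives the base LR for one specific minibatch size. If you train with a different total batch size (for example on fewer GPUs), a common heuristic, the linear scaling rule, keeps the ratio LR / batch size constant. This rule is an assumption here: the document does not say the configs apply it, so treat the result as a starting point. `scale_lr` is a hypothetical helper.

```python
def scale_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling rule: keep LR / batch_size constant (heuristic)."""
    if base_batch <= 0 or new_batch <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch / base_batch


# RetinaNet note: base LR 0.01 at a minibatch size of 16.
# Training with a total minibatch of 8 instead:
print(scale_lr(0.01, 16, 8))  # 0.005
```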
Scale | Image/gpu | Lr schd | Box AP | Download |
---|---|---|---|---|
EfficientDet-D0 | 16 | 300 epochs | 33.8 | model |
Notes: the base LR is 0.16 for a minibatch size of 128 (8 GPUs x 16 images/GPU).
Backbone | Size | Image/gpu | Lr schd | Inf time (fps) | Box AP | Download | Configs |
---|---|---|---|---|---|---|---|
MobileNet_v1 | 300 | 64 | Cosine decay (400k iters) | - | 23.6 | model | config |
MobileNet_v3 small | 320 | 64 | Cosine decay (400k iters) | - | 16.2 | model | config |
MobileNet_v3 large | 320 | 64 | Cosine decay (400k iters) | - | 23.3 | model | config |
MobileNet_v3 small w/ FPN | 320 | 64 | Cosine decay (400k iters) | - | 18.9 | model | config |
MobileNet_v3 large w/ FPN | 320 | 64 | Cosine decay (400k iters) | - | 24.3 | model | config |
GhostNet | 320 | 64 | Cosine decay (400k iters) | - | 23.3 | model | config |
Notes: SSDLite is trained on 8 GPUs with a total batch size of 512, using a cosine decay learning-rate schedule.
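A cosine decay schedule lowers the learning rate from its base value to zero along half a cosine period over the whole run, here 400k iterations. The sketch below shows the standard formula; the base LR of 0.1 is a hypothetical value (the SSDLite base LR is not stated here), and any warm-up phase the actual configs may use is omitted.

```python
import math

TOTAL_ITERS = 400_000  # 400k iterations, as in the SSDLite table


def cosine_decay_lr(step: int, base_lr: float,
                    total_steps: int = TOTAL_ITERS) -> float:
    """LR at `step` under cosine decay from base_lr to 0 over total_steps."""
    step = min(max(step, 0), total_steps)  # clamp to the schedule range
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))


base_lr = 0.1  # hypothetical; not taken from the SSDLite configs
for step in (0, 100_000, 200_000, 300_000, 400_000):
    print(step, round(cosine_decay_lr(step, base_lr), 5))
```

The LR falls slowly at the start and end of training and fastest in the middle, reaching exactly half the base value at 200k iterations.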
Backbone | Size | Image/gpu | Lr schd | Inf time (fps) | Box AP | Download | Configs |
---|---|---|---|---|---|---|---|
VGG16 | 300 | 8 | 400k iters | 81.613 | 25.1 | model | config |
VGG16 | 512 | 8 | 400k iters | 46.007 | 29.1 | model | config |
Notes: VGG-SSD is trained on 4 GPUs with a total batch size of 32 for 400,000 iterations.
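The COCO SSD schedules are given in iterations, while the VOC SSD schedules are given in epochs. To compare them, the number of epochs is iterations x total batch size / training-set size. The sketch assumes `coco_2017_train` contains 118,287 images (the full COCO 2017 train split); the count actually used by the data loader may differ, for example if images without annotations are skipped.

```python
def iters_to_epochs(iters: int, total_batch: int, num_images: int) -> float:
    """Number of passes over the training set covered by `iters` iterations."""
    return iters * total_batch / num_images


COCO_2017_TRAIN_IMAGES = 118_287  # assumption: full train2017 split

# VGG-SSD: 400,000 iterations at a total batch size of 32 (4 GPUs x 8 images/GPU).
print(round(iters_to_epochs(400_000, 32, COCO_2017_TRAIN_IMAGES), 1))  # 108.2
```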
Backbone | Size | Image/gpu | Lr schd | Inf time (fps) | Box AP(0.5) | Download | Configs |
---|---|---|---|---|---|---|---|
MobileNet v1 | 300 | 32 | 120e | 159.543 | 73.2 | model | config |
VGG16 | 300 | 8 | 240e | 117.279 | 77.5 | model | config |
VGG16 | 512 | 8 | 240e | 65.975 | 80.2 | model | config |
Notes: MobileNet-SSD is trained on 2 GPUs with a total batch size of 64 for 120 epochs. VGG-SSD is trained on 4 GPUs with a total batch size of 32 for 240 epochs. SSD training data augmentations: random color distortion, random cropping, random expansion, and random flipping.
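Of the SSD augmentations listed, random expansion and random flipping change box geometry as well as pixels. The minimal sketch below covers only the box bookkeeping, assuming `[x1, y1, x2, y2]` pixel boxes: flipping mirrors x-coordinates about the image width, and expansion pastes the image at a random offset on a larger canvas and shifts boxes by that offset. For simplicity it flips with probability 0.5 and always expands; the function names and the maximum expansion ratio of 4.0 are illustrative assumptions, not values from the SSD configs.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def flip_boxes(boxes: List[Box], width: float) -> List[Box]:
    """Mirror boxes horizontally for an image of the given width."""
    return [(width - x2, y1, width - x1, y2) for x1, y1, x2, y2 in boxes]


def expand_boxes(boxes: List[Box], width: float, height: float,
                 ratio: float, left: float, top: float):
    """Shift boxes after pasting a (width x height) image at (left, top)
    on a canvas `ratio` times larger in each dimension."""
    canvas = (width * ratio, height * ratio)
    shifted = [(x1 + left, y1 + top, x2 + left, y2 + top)
               for x1, y1, x2, y2 in boxes]
    return shifted, canvas


def random_flip_and_expand(boxes, width, height, max_ratio=4.0, rng=random):
    # Flip horizontally with probability 0.5.
    if rng.random() < 0.5:
        boxes = flip_boxes(boxes, width)
    # Expand by a random ratio in [1, max_ratio] at a random offset.
    ratio = rng.uniform(1.0, max_ratio)
    left = rng.uniform(0.0, width * ratio - width)
    top = rng.uniform(0.0, height * ratio - height)
    boxes, (width, height) = expand_boxes(boxes, width, height, ratio, left, top)
    return boxes, width, height


print(flip_boxes([(10, 20, 30, 40)], width=100))  # [(70, 20, 90, 40)]
```

Passing a seeded `random.Random` instance as `rng` makes the transform reproducible for debugging.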
Please refer to the face detection models for details.
Please refer to the Open Images Dataset V5 baseline model for details.
Please refer to the Anchor Free Models for details.