Model Zoo and Benchmark

Environment

Python 2.7.1
PaddlePaddle >=1.5
CUDA 9.0
cuDNN >=7.4
NCCL 2.1.2

Common settings

All models below were trained on coco_2017_train, and tested on coco_2017_val.
Batch Normalization layers in backbones are replaced by Affine Channel layers.
Unless otherwise noted, all ResNet backbones adopt the ResNet-B variant..
For RCNN and RetinaNet models, only horizontal flipping data augmentation was used in the training phase and no augmentations were used in the testing phase.
Inf time (fps): the inference time is measured with fps (image/s) on a single GPU (Tesla V100) with cuDNN 7.5 by running 'tools/eval.py' on all validation set, which including data loadding, network forward and post processing. The batch size is 1.

Training Schedules

We adopt exactly the same training schedules as Detectron.
1x indicates the schedule starts at a LR of 0.02 and is decreased by a factor of 10 after 60k and 80k iterations and eventually terminates at 90k iterations for minibatch size 16. For batch size 8, LR is decreased to 0.01, total training iterations are doubled, and the decay milestones are scaled by 2.
2x schedule is twice as long as 1x, with the LR milestones scaled accordingly.

ImageNet Pretrained Models

The backbone models pretrained on ImageNet are available. All backbone models are pretrained on standard ImageNet-1k dataset and can be downloaded here.

Notes: The ResNet50 model was trained with cosine LR decay schedule and can be downloaded here.

Baselines

Faster & Mask R-CNN

Backbone	Type	Image/gpu	Lr schd	Inf time (fps)	Box AP	Mask AP	Download	Configs
ResNet50	Faster	1	1x	12.747	35.2	-	model	config
ResNet50	Faster	1	2x	12.686	37.1	-	model	config
ResNet50	Mask	1	1x	11.615	36.5	32.2	model	config
ResNet50	Mask	1	2x	11.494	38.2	33.4	model	config
ResNet50-vd	Faster	1	1x	12.575	36.4	-	model	config
ResNet34-FPN	Faster	2	1x	-	36.7	-	model	config
ResNet34-vd-FPN	Faster	2	1x	-	37.4	-	model	config
ResNet50-FPN	Faster	2	1x	22.273	37.2	-	model	config
ResNet50-FPN	Faster	2	2x	22.297	37.7	-	model	config
ResNet50-FPN	Mask	1	1x	15.184	37.9	34.2	model	config
ResNet50-FPN	Mask	1	2x	15.881	38.7	34.7	model	config
ResNet50-FPN	Cascade Faster	2	1x	17.507	40.9	-	model	config
ResNet50-FPN	Cascade Mask	1	1x	12.43	41.3	35.5	model	config
ResNet50-vd-FPN	Faster	2	2x	21.847	38.9	-	model	config
ResNet50-vd-FPN	Mask	1	2x	15.825	39.8	35.4	model	config
CBResNet50-vd-FPN	Faster	2	1x	-	39.7	-	model	config
ResNet101	Faster	1	1x	9.316	38.3	-	model	config
ResNet101-FPN	Faster	1	1x	17.297	38.7	-	model	config
ResNet101-FPN	Faster	1	2x	17.246	39.1	-	model	config
ResNet101-FPN	Mask	1	1x	12.983	39.5	35.2	model	config
ResNet101-vd-FPN	Faster	1	1x	17.011	40.5	-	model	config
ResNet101-vd-FPN	Faster	1	2x	16.934	40.8	-	model	config
ResNet101-vd-FPN	Mask	1	1x	13.105	41.4	36.8	model	config
CBResNet101-vd-FPN	Faster	2	1x	-	42.7	-	model	config
ResNeXt101-vd-64x4d-FPN	Faster	1	1x	8.815	42.2	-	model	config
ResNeXt101-vd-64x4d-FPN	Faster	1	2x	8.809	41.7	-	model	config
ResNeXt101-vd-64x4d-FPN	Mask	1	1x	7.689	42.9	37.9	model	config
ResNeXt101-vd-64x4d-FPN	Mask	1	2x	7.859	42.6	37.6	model	config
SENet154-vd-FPN	Faster	1	1.44x	3.408	42.9	-	model	config
SENet154-vd-FPN	Mask	1	1.44x	3.233	44.0	38.7	model	config
ResNet101-vd-FPN	CascadeClsAware Faster	2	1x	-	44.7(softnms)	-	model	config
ResNet101-vd-FPN	CascadeClsAware Faster	2	1x	-	46.5(multi-scale test)	-	model	config

Deformable ConvNets v2

Backbone	Type	Conv	Image/gpu	Lr schd	Inf time (fps)	Box AP	Mask AP	Download	Configs
ResNet50-FPN	Faster	c3-c5	2	1x	19.978	41.0	-	model	config
ResNet50-vd-FPN	Faster	c3-c5	2	2x	19.222	42.4	-	model	config
ResNet101-vd-FPN	Faster	c3-c5	2	1x	14.477	44.1	-	model	config
ResNeXt101-vd-64x4d-FPN	Faster	c3-c5	1	1x	7.209	45.2	-	model	config
ResNet50-FPN	Mask	c3-c5	1	1x	14.53	41.9	37.3	model	config
ResNet50-vd-FPN	Mask	c3-c5	1	2x	14.832	42.9	38.0	model	config
ResNet101-vd-FPN	Mask	c3-c5	1	1x	11.546	44.6	39.2	model	config
ResNeXt101-vd-64x4d-FPN	Mask	c3-c5	1	1x	6.45	46.2	40.4	model	config
ResNet50-FPN	Cascade Faster	c3-c5	2	1x	-	44.2	-	model	config
ResNet101-vd-FPN	Cascade Faster	c3-c5	2	1x	-	46.4	-	model	config
ResNeXt101-vd-FPN	Cascade Faster	c3-c5	2	1x	-	47.3	-	model	config
SENet154-vd-FPN	Cascade Mask	c3-c5	1	1.44x	-	51.9	43.9	model	config
ResNet200-vd-FPN-Nonlocal	CascadeClsAware Faster	c3-c5	1	2.5x	3.103	51.7%(softnms)	-	model	config
CBResNet200-vd-FPN-Nonlocal	Cascade Faster	c3-c5	1	2.5x	1.68	53.3%(softnms)	-	model	config

Notes:

Deformable ConvNets v2(dcn_v2) reference from Deformable ConvNets v2.
c3-c5 means adding dcn in resnet stage 3 to 5.
Detailed configuration file in configs/dcn

HRNet

See more details in HRNet model zoo.

Res2Net

See more details in Res2Net model zoo.

IOU loss

GIOU loss and DIOU loss are included now. See more details in IOU loss model zoo.

GCNet

See more details in GCNet model zoo.

Libra R-CNN

See more details in Libra R-CNN model zoo.

Auto Augmentation

See more details in Auto Augmentation model zoo.

Group Normalization

Backbone	Type	Image/gpu	Lr schd	Box AP	Mask AP	Download	Configs
ResNet50-FPN	Faster	2	2x	39.7	-	model	config
ResNet50-FPN	Mask	1	2x	40.1	35.8	model	config

Notes:

Group Normalization reference from Group Normalization.
Detailed configuration file in configs/gn

YOLO v3

Backbone	Pretrain dataset	Size	deformable Conv	Image/gpu	Lr schd	Inf time (fps)	Box AP	Download	Configs
DarkNet53 (paper)	ImageNet	608	False	8	270e	-	33.0	-	-
DarkNet53 (paper)	ImageNet	416	False	8	270e	-	31.0	-	-
DarkNet53 (paper)	ImageNet	320	False	8	270e	-	28.2	-	-
DarkNet53	ImageNet	608	False	8	270e	45.571	38.9	model	config
DarkNet53	ImageNet	416	False	8	270e	-	37.5	model	config
DarkNet53	ImageNet	320	False	8	270e	-	34.8	model	config
MobileNet-V1	ImageNet	608	False	8	270e	78.302	29.3	model	config
MobileNet-V1	ImageNet	416	False	8	270e	-	29.3	model	config
MobileNet-V1	ImageNet	320	False	8	270e	-	27.1	model	config
MobileNet-V3	ImageNet	608	False	8	270e	-	31.6	model	config
MobileNet-V3	ImageNet	416	False	8	270e	-	29.9	model	config
MobileNet-V3	ImageNet	320	False	8	270e	-	27.1	model	config
ResNet34	ImageNet	608	False	8	270e	63.356	36.2	model	config
ResNet34	ImageNet	416	False	8	270e	-	34.3	model	config
ResNet34	ImageNet	320	False	8	270e	-	31.4	model	config
ResNet50_vd	ImageNet	608	True	8	270e	-	39.1	model	config
ResNet50_vd	Object365	608	True	8	270e	-	41.4	model	config

YOLO v3 on Pascal VOC

Backbone	Size	Image/gpu	Lr schd	Inf time (fps)	Box AP(0.5)	Download	Configs
DarkNet53	608	8	270e	54.977	83.5	model	config
DarkNet53	416	8	270e	-	83.6	model	config
DarkNet53	320	8	270e	-	82.2	model	config
DarkNet53 Diou-Loss	608	8	270e	-	83.5	model	config
MobileNet-V1	608	8	270e	104.291	76.2	model	config
MobileNet-V1	416	8	270e	-	76.7	model	config
MobileNet-V1	320	8	270e	-	75.3	model	config
ResNet34	608	8	270e	82.247	82.6	model	config
ResNet34	416	8	270e	-	81.9	model	config
ResNet34	320	8	270e	-	80.1	model	config

Notes:

YOLOv3-DarkNet53 performance in paper YOLOv3 is also provided above, our implements improved performance mainly by using L1 loss in bounding box width and height regression, image mixup and label smooth.
YOLO v3 is trained in 8 GPU with total batch size as 64 and trained 270 epoches. YOLO v3 training data augmentations: mixup, randomly color distortion, randomly cropping, randomly expansion, randomly interpolation method, randomly flippling. YOLO v3 used randomly reshaped minibatch in training, inferences can be performed on different image sizes with the same model weights, and we provided evaluation results of image size 608/416/320 above. Deformable conv is added on stage 5 of backbone.
Compared with YOLOv3-DarkNet53, the average AP of YOLOv3-DarkNet53 with Diou-Loss increases about 2% in VOC dataset.
YOLO v3 enhanced model improves the precision to 43.6 involved with deformable conv, dropblock, IoU loss and IoU aware. See more details in YOLOv3_ENHANCEMENT

RetinaNet

Backbone	Image/gpu	Lr schd	Inf time (fps)	Box AP	Download	Configs
ResNet50-FPN	2	1x	-	36.0	model	config
ResNet101-FPN	2	1x	-	37.3	model	config
ResNeXt101-vd-FPN	1	1x	-	40.5	model	config

Notes: In RetinaNet, the base LR is changed to 0.01 for minibatch size 16.

EfficientDet

Scale	Image/gpu	Lr schd	Box AP	Download
EfficientDet-D0	16	300 epochs	33.8	model

Notes: base LR is 0.16 for minibatch size 128 (8x16).

SSDLite

Backbone	Size	Image/gpu	Lr schd	Inf time (fps)	Box AP	Download	Configs
MobileNet_v1	300	64	Cosine decay(40w)	-	23.6	model	config
MobileNet_v3 small	320	64	Cosine decay(40w)	-	16.2	model	config
MobileNet_v3 large	320	64	Cosine decay(40w)	-	23.3	model	config
MobileNet_v3 small w/ FPN	320	64	Cosine decay(40w)	-	18.9	model	config
MobileNet_v3 large w/ FPN	320	64	Cosine decay(40w)	-	24.3	model	config
GhostNet	320	64	Cosine decay(40w)	-	23.3	model	config

Notes: SSDLite is trained in 8 GPU with total batch size as 512 and uses cosine decay strategy to train.

SSD

Backbone	Size	Image/gpu	Lr schd	Inf time (fps)	Box AP	Download	Configs
VGG16	300	8	40w	81.613	25.1	model	config
VGG16	512	8	40w	46.007	29.1	model	config

Notes: VGG-SSD is trained in 4 GPU with total batch size as 32 and trained 400000 iters.

SSD on Pascal VOC

Backbone	Size	Image/gpu	Lr schd	Inf time (fps)	Box AP(0.5)	Download	Configs
MobileNet v1	300	32	120e	159.543	73.2	model	config
VGG16	300	8	240e	117.279	77.5	model	config
VGG16	512	8	240e	65.975	80.2	model	config

NOTE: MobileNet-SSD is trained in 2 GPU with totoal batch size as 64 and trained 120 epoches. VGG-SSD is trained in 4 GPU with total batch size as 32 and trained 240 epoches. SSD training data augmentations: randomly color distortion, randomly cropping, randomly expansion, randomly flipping.

Face Detection

Please refer face detection models for details.

Object Detection in Open Images Dataset V5

Please refer Open Images Dataset V5 Baseline model for details.

Anchor Free Models

Please refer Anchor Free Models for details.

MODEL_ZOO.md 35 KB Permalink History Raw

Model Zoo and Benchmark

Environment

Common settings

Training Schedules

ImageNet Pretrained Models

Baselines

Faster & Mask R-CNN

Deformable ConvNets v2

HRNet

Res2Net

IOU loss

GCNet

Libra R-CNN

Auto Augmentation

Group Normalization

YOLO v3

YOLO v3 on Pascal VOC

RetinaNet

EfficientDet

SSDLite

SSD

SSD on Pascal VOC

Face Detection

Object Detection in Open Images Dataset V5

Anchor Free Models

MODEL_ZOO.md 35 KB

Permalink History Raw