Model Compression

PaddleDetection provides a complete tutorial and benchmarks for model compression based on PaddleSlim. Currently supported methods:

Pruning
Quantization
Distillation
Distillation and pruning combined strategy

It is recommended to combine pruning with distillation training, or pruning with quantization, for model compression. The following uses YOLOv3 as an example for the pruning, distillation, and quantization experiments.

Experimental Environment

Python 3.7+
PaddlePaddle >= 2.1.0
PaddleSlim >= 2.1.0
CUDA 10.1+
cuDNN >= 7.6.5

Version dependencies between PaddleDetection, PaddlePaddle, and PaddleSlim

| PaddleDetection Version | PaddlePaddle Version | PaddleSlim Version | Note |
| :---------------------: | :------------------: | :----------------: | :--: |
| release/2.1 | >= 2.1.0 | 2.1 | Exporting quantized models relies on the latest Paddle develop branch, available in the PaddlePaddle daily version |
| release/2.0 | >= 2.0.1 | 2.0 | Quantization depends on Paddle 2.1 and PaddleSlim 2.1 |
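The minimum versions listed above can be checked from Python before starting an experiment. The following is a minimal sketch, not part of PaddleDetection: it assumes Python 3.8+ (for `importlib.metadata`) and assumes the packages were installed from PyPI under the distribution names `paddlepaddle-gpu` and `paddleslim`. CUDA and cuDNN are not checked here, and the helper names `version_tuple` and `check` are hypothetical.

```python
from importlib import metadata

# Minimum versions taken from the environment list above.
# The distribution names are assumptions; a CPU-only install would be
# published as "paddlepaddle" instead of "paddlepaddle-gpu".
MINIMUMS = {
    "paddlepaddle-gpu": (2, 1, 0),
    "paddleslim": (2, 1, 0),
}


def version_tuple(text):
    """Turn '2.1.0' or '2.1.0.post101' into (2, 1, 0).

    Parsing stops at the first non-numeric component, and the result is
    padded with zeros to three components so that '2.1' compares as 2.1.0.
    """
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check(minimums):
    """Return {distribution name: status string} for each requirement."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = version_tuple(installed) >= minimum
        report[name] = f"{installed} ({'ok' if ok else 'too old'})"
    return report


if __name__ == "__main__":
    for name, status in check(MINIMUMS).items():
        print(f"{name}: {status}")
```

Development builds may report a version that this simple parser treats as too old, so read the result as a hint rather than a hard gate.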

Install PaddleSlim

Method 1: Install directly with pip:

pip install paddleslim -i https://pypi.tuna.tsinghua.edu.cn/simple

Method 2: Build and install from source:

git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
python setup.py install

Quick Start

Train

python tools/train.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml}

-c: Specify the model configuration file.
--slim_config: Specify the compression strategy configuration file.

Evaluation

python tools/eval.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} -o weights=output/{SLIM_CONFIG}/model_final

-c: Specify the model configuration file.
--slim_config: Specify the compression strategy configuration file.
-o weights: Specify the path to the weights of the model trained with the compression algorithm.

Test

python tools/infer.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} \
    -o weights=output/{SLIM_CONFIG}/model_final \
    --infer_img={IMAGE_PATH}

-c: Specify the model configuration file.
--slim_config: Specify the compression strategy configuration file.
-o weights: Specify the path to the weights of the model trained with the compression algorithm.
--infer_img: Specify the path of the test image.

Full Chain Deployment

Export the model from dynamic graph to static graph

python tools/export_model.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} -o weights=output/{SLIM_CONFIG}/model_final

-c: Specify the model configuration file.
--slim_config: Specify the compression strategy configuration file.
-o weights: Specify the path to the weights of the model trained with the compression algorithm.
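The train, evaluation, test, and export commands above share one shape: a model configuration, a compression configuration, and (for every tool except training) weights under `output/{SLIM_CONFIG}/model_final`. The sketch below assembles those argument lists in Python. It is an illustration only: `slim_command` is a hypothetical helper, the two configuration paths are placeholders rather than files shipped with PaddleDetection, and it assumes `{SLIM_CONFIG}` is the compression configuration's file name without its extension, as the commands above suggest.

```python
from pathlib import Path


def slim_command(tool, model_config, slim_config, extra=()):
    """Build the argument list for one of the tools/*.py scripts shown above.

    train.py takes no weights; the other tools load
    output/<slim config file stem>/model_final (an assumption read off the
    command pattern in this document).
    """
    cmd = ["python", f"tools/{tool}.py", "-c", model_config,
           "--slim_config", slim_config]
    if tool != "train":
        weights = Path("output") / Path(slim_config).stem / "model_final"
        cmd += ["-o", f"weights={weights.as_posix()}"]
    return cmd + list(extra)


# Placeholder paths, chosen only for illustration.
MODEL = "configs/my_model.yml"
SLIM = "configs/slim/my_slim.yml"

for tool in ("train", "eval", "export_model"):
    print(" ".join(slim_command(tool, MODEL, SLIM)))
print(" ".join(slim_command("infer", MODEL, SLIM, ["--infer_img=demo.jpg"])))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting concerns.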

Prediction and deployment

Prediction: use Paddle Inference.
Server deployment: use PaddleServing.
Mobile deployment: use Paddle-Lite to deploy on mobile devices.

Benchmark

Pruning

Pascal VOC Benchmark

Model	Compression Strategy	GFLOPs	Model Size (MB)	Input Size	Prediction Latency (SD855)	Box AP	Download	Model Configuration File	Compression Algorithm Configuration File
YOLOv3-MobileNetV1	baseline	24.13	93	608	332.0ms	75.1	link	configuration file	-
YOLOv3-MobileNetV1	Pruning-l1_norm(sensity)	15.78(-34.49%)	66(-29%)	608	-	78.4(+3.3)	link	configuration file	slim configuration file

COCO Benchmark

Model	Compression Strategy	GFLOPs	Model Size (MB)	Input Size	Prediction Latency (SD855)	Box AP	Download	Model Configuration File	Compression Algorithm Configuration File
PP-YOLO-MobileNetV3_large	baseline	--	18.5	608	25.1ms	23.2	link	configuration file	-
PP-YOLO-MobileNetV3_large	Pruning-FPGM	-37%	12.6	608	-	22.3	link	configuration file	slim configuration file
YOLOv3-DarkNet53	baseline	--	238.2	608	-	39.0	link	configuration file	-
YOLOv3-DarkNet53	Pruning-FPGM	-24%	-	608	-	37.6	link	configuration file	slim configuration file
PP-YOLO_R50vd	baseline	--	183.3	608	-	44.8	link	configuration file	-
PP-YOLO_R50vd	Pruning-FPGM	-35%	-	608	-	42.1	link	configuration file	slim configuration file

Description:

Currently, all models except RCNN series models are supported.
SD855 prediction latency is measured with a Paddle Lite deployment on the ARMv8 architecture using 4 threads.

Quantization

COCO Benchmark

Model	Compression Strategy	Input Size	Model Size (MB)	Prediction Latency (V100)	Prediction Latency (SD855)	Box AP	Download	Download of Inference Model	Model Configuration File	Compression Algorithm Configuration File
PP-YOLOv2_R50vd	baseline	640	208.6	19.1ms	--	49.1	link	link	Configuration File	-
PP-YOLOv2_R50vd	PACT online quantization	640	--	17.3ms	--	48.1	link	link	Configuration File	Configuration File
PP-YOLO_R50vd	baseline	608	183.3	17.4ms	--	44.8	link	link	Configuration File	-
PP-YOLO_R50vd	PACT online quantization	608	67.3	13.8ms	--	44.3	link	link	Configuration File	Configuration File
PP-YOLO-MobileNetV3_large	baseline	320	18.5	2.7ms	27.9ms	23.2	link	link	Configuration File	-
PP-YOLO-MobileNetV3_large	Standard online quantization	320	5.6	--	25.1ms	24.3	link	link	Configuration File	Configuration File
YOLOv3-MobileNetV1	baseline	608	94.2	8.9ms	332ms	29.4	link	link	Configuration File	-
YOLOv3-MobileNetV1	Standard online quantization	608	25.4	6.6ms	248ms	30.5	link	link	Configuration File	slim Configuration File
YOLOv3-MobileNetV3	baseline	608	90.3	9.4ms	367.2ms	31.4	link	link	Configuration File	-
YOLOv3-MobileNetV3	PACT online quantization	608	24.4	8.0ms	280.0ms	31.1	link	link	Configuration File	slim Configuration File
YOLOv3-DarkNet53	baseline	608	238.2	16.0ms	--	39.0	link	link	Configuration File	-
YOLOv3-DarkNet53	Standard online quantization	608	78.8	12.4ms	--	38.8	link	link	Configuration File	slim Configuration File
SSD-MobileNet_v1	baseline	300	22.5	4.4ms	26.6ms	73.8	link	link	Configuration File	-
SSD-MobileNet_v1	Standard online quantization	300	7.1	--	21.5ms	72.9	link	link	Configuration File	slim Configuration File
Mask-ResNet50-FPN	baseline	(800, 1333)	174.1	359.5ms	--	39.2/35.6	link	link	Configuration File	-
Mask-ResNet50-FPN	Standard online quantization	(800, 1333)	--	--	--	39.7(+0.5)/35.9(+0.3)	link	link	Configuration File	slim Configuration File

Description:

For the V100 prediction latency above, non-quantized models are tested with TensorRT FP32 and quantized models with TensorRT INT8. Both measurements include NMS time.
SD855 prediction latency is measured with a Paddle Lite deployment on the ARMv8 architecture using 4 threads.
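To read the quantization table as ratios rather than raw numbers, the size and V100 latency columns can be divided pairwise. The sketch below does this for the rows that report a V100 latency for both the baseline and the quantized model. The figures are copied from the table, `None` stands for a `--` cell, and the helper names `ratio` and `fmt` are hypothetical.

```python
# (model, baseline MB, quantized MB, baseline V100 ms, quantized V100 ms)
# Values copied from the quantization table above; None marks a "--" cell.
ROWS = [
    ("PP-YOLOv2_R50vd", 208.6, None, 19.1, 17.3),
    ("PP-YOLO_R50vd", 183.3, 67.3, 17.4, 13.8),
    ("YOLOv3-MobileNetV1", 94.2, 25.4, 8.9, 6.6),
    ("YOLOv3-MobileNetV3", 90.3, 24.4, 9.4, 8.0),
    ("YOLOv3-DarkNet53", 238.2, 78.8, 16.0, 12.4),
]


def ratio(baseline, quantized):
    """Return baseline / quantized, or None when either cell is missing."""
    if baseline is None or quantized is None:
        return None
    return baseline / quantized


def fmt(value):
    """Format a ratio as '2.72x', keeping '--' for missing values."""
    return "--" if value is None else f"{value:.2f}x"


for name, mb0, mb1, ms0, ms1 in ROWS:
    print(f"{name}: size {fmt(ratio(mb0, mb1))}, "
          f"V100 speedup {fmt(ratio(ms0, ms1))}")
```

A ratio above 1 means the quantized model is smaller or faster than its baseline. The FP32 and INT8 measurements come from different TensorRT precisions, as noted above.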

Distillation

COCO Benchmark

Model	Compression Strategy	Input Size	Box AP	Download	Model Configuration File	Compression Strategy Configuration File
YOLOv3-MobileNetV1	baseline	608	29.4	link	Configuration File	-
YOLOv3-MobileNetV1	Distillation	608	31.0(+1.6)	link	Configuration File	slim Configuration File

For details of the distillation method, please refer to the Distillation Policy Document.

Distillation and Pruning Combined Strategy

COCO Benchmark

Model	Compression Strategy	Input Size	GFLOPs	Model Size (MB)	Prediction Latency (SD855)	Box AP	Download	Model Configuration File	Compression Algorithm Configuration File
YOLOv3-MobileNetV1	baseline	608	24.65	94.2	332.0ms	29.4	link	Configuration File	-
YOLOv3-MobileNetV1	Distillation + Pruning	608	7.54(-69.4%)	30.9(-67.2%)	166.1ms	28.4(-1.0)	link	Configuration File	slim Configuration File
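The parenthesized changes in the table above can be reproduced from the raw figures. The sketch below recomputes them, plus the SD855 latency change that the table does not spell out. `percent_change` is a hypothetical helper, and the numbers are copied from the two YOLOv3-MobileNetV1 rows.

```python
def percent_change(baseline, compressed):
    """Signed change of `compressed` relative to `baseline`, in percent."""
    return (compressed - baseline) / baseline * 100.0


# Figures copied from the YOLOv3-MobileNetV1 rows above.
gflops = percent_change(24.65, 7.54)    # GFLOPs column
size_mb = percent_change(94.2, 30.9)    # model size column
latency = percent_change(332.0, 166.1)  # SD855 latency column
ap_delta = 28.4 - 29.4                  # Box AP is reported as an absolute difference

print(f"GFLOPs:  {gflops:+.1f}%")
print(f"Size:    {size_mb:+.1f}%")
print(f"Latency: {latency:+.1f}%")
print(f"Box AP:  {ap_delta:+.1f}")
```

Note that GFLOPs and size are relative changes, while Box AP is a difference in AP points, so the two kinds of figures should not be compared directly.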
