
Model Compression

In PaddleDetection, a complete tutorial and benchmarks for model compression based on PaddleSlim are provided. Currently supported methods:

  • Pruning
  • Quantization
  • Distillation
  • Combined strategies (e.g. distillation + pruning)

It is recommended to combine pruning with distillation training, or pruning with quantization, to compress detection models. The following takes YOLOv3 as an example to run pruning, distillation, and quantization experiments.

Experimental Environment

  • Python 3.7+
  • PaddlePaddle >= 2.1.0
  • PaddleSlim >= 2.1.0
  • CUDA 10.1+
  • cuDNN >=7.6.5

Version Dependency between PaddleDetection, Paddle, and PaddleSlim

| PaddleDetection Version | PaddlePaddle Version | PaddleSlim Version | Note |
| :---------------------: | :------------------: | :----------------: | :--: |
| release/2.1 | >= 2.1.0 | 2.1 | Exporting quantized models relies on the latest Paddle develop branch, available in the PaddlePaddle daily build |
| release/2.0 | >= 2.0.1 | 2.0 | Quantization depends on Paddle 2.1 and PaddleSlim 2.1 |

Install PaddleSlim

  • Method 1: Install it directly:

    pip install paddleslim -i https://pypi.tuna.tsinghua.edu.cn/simple
    
  • Method 2: Compile and install:

    git clone https://github.com/PaddlePaddle/PaddleSlim.git
    cd PaddleSlim
    python setup.py install
    

Quick Start

Train

python tools/train.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml}
  • -c: Specify the model configuration file.
  • --slim_config: Specify the compression policy profile.

Evaluation

python tools/eval.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} -o weights=output/{SLIM_CONFIG}/model_final
  • -c: Specify the model configuration file.
  • --slim_config: Specify the compression policy profile.
  • -o weights: Specifies the path of the model trained by the compression algorithm.
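As a concrete illustration, the template above could be instantiated as follows. Both config paths are assumptions for illustration and should be checked against your PaddleDetection checkout; note that the weights directory under `output/` takes its name from the slim config file's basename:

```shell
# Hypothetical example: evaluate a YOLOv3-MobileNetV1 model trained with
# L1-norm pruning. Both config paths are illustrative assumptions.
MODEL_CONFIG=configs/yolov3/yolov3_mobilenet_v1_270e_voc.yml
SLIM_CONFIG=configs/slim/prune/yolov3_prune_l1_norm.yml

# The output directory is named after the slim config's basename.
SLIM_NAME=$(basename "$SLIM_CONFIG" .yml)

python tools/eval.py -c "$MODEL_CONFIG" --slim_config "$SLIM_CONFIG" \
    -o weights=output/"$SLIM_NAME"/model_final
```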

Test

python tools/infer.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} \
    -o weights=output/{SLIM_CONFIG}/model_final \
    --infer_img={IMAGE_PATH}
  • -c: Specify the model configuration file.
  • --slim_config: Specify the compression policy profile.
  • -o weights: Specifies the path of the model trained by the compression algorithm.
  • --infer_img: Specifies the test image path.

Full Chain Deployment

Export the model (dynamic graph to static graph)

python tools/export_model.py -c configs/{MODEL.yml} --slim_config configs/slim/{SLIM_CONFIG.yml} -o weights=output/{SLIM_CONFIG}/model_final
  • -c: Specify the model configuration file.
  • --slim_config: Specify the compression policy profile.
  • -o weights: Specifies the path of the model trained by the compression algorithm.
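A concrete instantiation might look like the following sketch. The config paths are illustrative assumptions, and the default output location for the exported inference model is assumed to be `output_inference/`; verify both against your checkout:

```shell
# Hypothetical example: export a quantized YOLOv3-MobileNetV3 model for
# deployment. Config paths are illustrative assumptions.
MODEL_CONFIG=configs/yolov3/yolov3_mobilenet_v3_large_270e_coco.yml
SLIM_CONFIG=configs/slim/quant/yolov3_mobilenet_v3_qat.yml

python tools/export_model.py -c "$MODEL_CONFIG" --slim_config "$SLIM_CONFIG" \
    -o weights=output/yolov3_mobilenet_v3_qat/model_final

# The exported inference model is assumed to land in a directory named
# after the model config's basename, under output_inference/.
EXPORT_DIR=output_inference/$(basename "$MODEL_CONFIG" .yml)
echo "$EXPORT_DIR"
```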

Prediction and Deployment

Benchmark

Pruning

Pascal VOC Benchmark

| Model | Compression Strategy | GFLOPs | Model Size (MB) | Input Size | Prediction Latency (SD855) | Box AP | Download | Model Configuration File | Compression Algorithm Configuration File |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| YOLOv3-MobileNetV1 | baseline | 24.13 | 93 | 608 | 332.0ms | 75.1 | link | configuration file | - |
| YOLOv3-MobileNetV1 | Pruning-l1_norm (sensitivity) | 15.78 (-34.49%) | 66 (-29%) | 608 | - | 78.4 (+3.3) | link | configuration file | slim configuration file |

COCO Benchmark

| Model | Compression Strategy | GFLOPs | Model Size (MB) | Input Size | Prediction Latency (SD855) | Box AP | Download | Model Configuration File | Compression Algorithm Configuration File |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| PP-YOLO-MobileNetV3_large | baseline | -- | 18.5 | 608 | 25.1ms | 23.2 | link | configuration file | - |
| PP-YOLO-MobileNetV3_large | Pruning-FPGM | -37% | 12.6 | 608 | - | 22.3 | link | configuration file | slim configuration file |
| YOLOv3-DarkNet53 | baseline | -- | 238.2 | 608 | - | 39.0 | link | configuration file | - |
| YOLOv3-DarkNet53 | Pruning-FPGM | -24% | - | 608 | - | 37.6 | link | configuration file | slim configuration file |
| PP-YOLO_R50vd | baseline | -- | 183.3 | 608 | - | 44.8 | link | configuration file | - |
| PP-YOLO_R50vd | Pruning-FPGM | -35% | - | 608 | - | 42.1 | link | configuration file | slim configuration file |

Description:

  • Currently, all models except the RCNN series are supported.
  • SD855 prediction latency is measured by deploying with Paddle Lite on the ARMv8 architecture with 4 threads.

Quantization

COCO Benchmark

| Model | Compression Strategy | Input Size | Model Size (MB) | Prediction Latency (V100) | Prediction Latency (SD855) | Box AP | Download | Inference Model Download | Model Configuration File | Compression Algorithm Configuration File |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| PP-YOLOv2_R50vd | baseline | 640 | 208.6 | 19.1ms | -- | 49.1 | link | link | configuration file | - |
| PP-YOLOv2_R50vd | PACT online quantization | 640 | -- | 17.3ms | -- | 48.1 | link | link | configuration file | configuration file |
| PP-YOLO_R50vd | baseline | 608 | 183.3 | 17.4ms | -- | 44.8 | link | link | configuration file | - |
| PP-YOLO_R50vd | PACT online quantization | 608 | 67.3 | 13.8ms | -- | 44.3 | link | link | configuration file | configuration file |
| PP-YOLO-MobileNetV3_large | baseline | 320 | 18.5 | 2.7ms | 27.9ms | 23.2 | link | link | configuration file | - |
| PP-YOLO-MobileNetV3_large | ordinary online quantization | 320 | 5.6 | -- | 25.1ms | 24.3 | link | link | configuration file | configuration file |
| YOLOv3-MobileNetV1 | baseline | 608 | 94.2 | 8.9ms | 332ms | 29.4 | link | link | configuration file | - |
| YOLOv3-MobileNetV1 | ordinary online quantization | 608 | 25.4 | 6.6ms | 248ms | 30.5 | link | link | configuration file | slim configuration file |
| YOLOv3-MobileNetV3 | baseline | 608 | 90.3 | 9.4ms | 367.2ms | 31.4 | link | link | configuration file | - |
| YOLOv3-MobileNetV3 | PACT online quantization | 608 | 24.4 | 8.0ms | 280.0ms | 31.1 | link | link | configuration file | slim configuration file |
| YOLOv3-DarkNet53 | baseline | 608 | 238.2 | 16.0ms | -- | 39.0 | link | link | configuration file | - |
| YOLOv3-DarkNet53 | ordinary online quantization | 608 | 78.8 | 12.4ms | -- | 38.8 | link | link | configuration file | slim configuration file |
| SSD-MobileNet_v1 | baseline | 300 | 22.5 | 4.4ms | 26.6ms | 73.8 | link | link | configuration file | - |
| SSD-MobileNet_v1 | ordinary online quantization | 300 | 7.1 | -- | 21.5ms | 72.9 | link | link | configuration file | slim configuration file |
| Mask-ResNet50-FPN | baseline | (800, 1333) | 174.1 | 359.5ms | -- | 39.2/35.6 | link | link | configuration file | - |
| Mask-ResNet50-FPN | ordinary online quantization | (800, 1333) | -- | -- | -- | 39.7 (+0.5)/35.9 (+0.3) | link | link | configuration file | slim configuration file |

Description:

  • For the V100 prediction latency above, non-quantized models are tested with TensorRT FP32 and quantized models with TensorRT INT8; both include NMS time.
  • SD855 prediction latency is measured by deploying with Paddle Lite on the ARMv8 architecture with 4 threads.

Distillation

COCO Benchmark

| Model | Compression Strategy | Input Size | Box AP | Download | Model Configuration File | Compression Strategy Configuration File |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| YOLOv3-MobileNetV1 | baseline | 608 | 29.4 | link | configuration file | - |
| YOLOv3-MobileNetV1 | distillation | 608 | 31.0 (+1.6) | link | configuration file | slim configuration file |

Distillation and Pruning Combined Strategy

COCO Benchmark

| Model | Compression Strategy | Input Size | GFLOPs | Model Size (MB) | Prediction Latency (SD855) | Box AP | Download | Model Configuration File | Compression Algorithm Configuration File |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| YOLOv3-MobileNetV1 | baseline | 608 | 24.65 | 94.2 | 332.0ms | 29.4 | link | configuration file | - |
| YOLOv3-MobileNetV1 | distillation + pruning | 608 | 7.54 (-69.4%) | 30.9 (-67.2%) | 166.1ms | 28.4 (-1.0) | link | configuration file | slim configuration file |