Keypoint Inference Benchmark

Benchmark on Server

We ran the benchmarks in several runtime environments. See the tables below for details.

| Model | CPU + MKLDNN (thread=1) | CPU + MKLDNN (thread=4) | GPU | TensorRT (FP32) | TensorRT (FP16) |
| :--- | ---: | ---: | ---: | ---: | ---: |
| LiteHRNet-18-256x192 | 88.8 ms | 40.7 ms | 4.4 ms | 2.0 ms | 1.8 ms |
| LiteHRNet-18-384x288 | 188.0 ms | 79.3 ms | 4.8 ms | 3.6 ms | 3.2 ms |
| LiteHRNet-30-256x192 | 148.4 ms | 69.0 ms | 7.1 ms | 3.1 ms | 2.8 ms |
| LiteHRNet-30-384x288 | 309.8 ms | 133.5 ms | 8.2 ms | 6.0 ms | 5.3 ms |
| PP-TinyPose-128x96 | 25.2 ms | 14.1 ms | 2.7 ms | 0.9 ms | 0.8 ms |
| PP-TinyPose-256x192 | 82.4 ms | 36.1 ms | 3.0 ms | 1.5 ms | 1.1 ms |

Notes:

  • The tests above are based on Python deployment.
  • The environment is NVIDIA T4 / PaddlePaddle (commit: 7df301f2fc) / CUDA10.1 / CUDNN7 / Python3.7 / TensorRT6.
  • The test is based on deploy/python/det_keypoint_unite_infer.py with image demo/000000014439.jpg. The input batch size for the keypoint model is set to 8.
  • The time only includes inference time.

| Model | CPU + MKLDNN (thread=1) | CPU + MKLDNN (thread=4) | GPU | TensorRT (FP32) | TensorRT (FP16) |
| :--- | ---: | ---: | ---: | ---: | ---: |
| DARK_HRNet_w32-256x192 | 363.93 ms | 97.38 ms | 4.13 ms | 3.74 ms | 1.75 ms |
| DARK_HRNet_w32-384x288 | 823.71 ms | 218.55 ms | 9.44 ms | 8.91 ms | 2.96 ms |
| HRNet_w32-256x192 | 363.67 ms | 97.64 ms | 4.11 ms | 3.71 ms | 1.72 ms |
| HRNet_w32-256x256_mpii | 485.56 ms | 131.48 ms | 4.81 ms | 4.26 ms | 2.00 ms |
| HRNet_w32-384x288 | 822.73 ms | 215.48 ms | 9.40 ms | 8.81 ms | 2.97 ms |
| PP-TinyPose-128x96 | 24.06 ms | 13.05 ms | 2.43 ms | 0.75 ms | 0.72 ms |
| PP-TinyPose-256x192 | 82.73 ms | 36.25 ms | 2.57 ms | 1.38 ms | 1.15 ms |

Notes:

  • The tests above are based on C++ deployment.
  • The environment is NVIDIA T4 / PaddlePaddle (commit: 7df301f2fc) / CUDA10.1 / CUDNN7 / Python3.7 / TensorRT6.
  • The test is based on deploy/python/det_keypoint_unite_infer.py with image demo/000000014439.jpg. The input batch size for the keypoint model is set to 8.
  • The time only includes inference time.
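
The notes above state that only inference time is counted. As a rough illustration of what that can mean in practice, the sketch below times a single callable after a few discarded warm-up runs. It is not the measurement code used for these tables: `measure_latency_ms`, `run_inference`, and the `warmup` and `repeats` defaults are hypothetical names and values chosen for this example. Preprocessing and postprocessing are assumed to happen outside the timed callable, and on a GPU the callable is assumed to block until the results are ready.

```python
import time
from statistics import mean


def measure_latency_ms(run_inference, warmup=10, repeats=100):
    """Return the mean wall-clock time of run_inference() in milliseconds.

    run_inference is a hypothetical zero-argument callable that performs
    only the model forward pass; preprocessing and postprocessing should
    happen outside it so they are not counted.
    """
    # Warm-up runs are discarded so one-off costs (memory allocation,
    # kernel selection, engine building) do not skew the average.
    for _ in range(warmup):
        run_inference()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


if __name__ == "__main__":
    # Stand-in workload; replace with a call into the real predictor.
    def fake_inference():
        sum(i * i for i in range(10_000))

    print(f"mean latency: {measure_latency_ms(fake_inference):.2f} ms")
```

Reporting the mean keeps the sketch short; a median or a percentile may be more robust when the machine is shared with other workloads.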

Benchmark on Mobile

We ran the benchmarks on Kirin and Qualcomm Snapdragon devices. See the table below for details.

| Model | Kirin 980 (1-thread) | Kirin 980 (4-threads) | Qualcomm Snapdragon 845 (1-thread) | Qualcomm Snapdragon 845 (4-threads) | Qualcomm Snapdragon 660 (1-thread) | Qualcomm Snapdragon 660 (4-threads) |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: |
| PicoDet-s-192x192 (det) | 14.85 ms | 5.45 ms | 17.50 ms | 7.56 ms | 80.08 ms | 27.36 ms |
| PicoDet-s-320x320 (det) | 38.09 ms | 12.00 ms | 45.26 ms | 17.07 ms | 232.81 ms | 58.68 ms |
| PP-TinyPose-128x96 (pose) | 12.03 ms | 5.09 ms | 13.14 ms | 6.73 ms | 71.87 ms | 20.04 ms |

Notes:

  • The tests above are based on Paddle Lite deployment, version v2.10-rc.
  • The time only includes inference time.
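
To compare how well each device scales from one thread to four, the latencies can be turned into a speedup ratio. The sketch below does this for the PP-TinyPose-128x96 row of the mobile table; the helper `thread_speedup` and the dictionary layout are illustrative names for this example only.

```python
def thread_speedup(single_ms, multi_ms):
    """Ratio of 1-thread latency to 4-thread latency."""
    return single_ms / multi_ms


# (1-thread ms, 4-thread ms) copied from the PP-TinyPose-128x96 row above.
pp_tinypose = {
    "Kirin 980": (12.03, 5.09),
    "Snapdragon 845": (13.14, 6.73),
    "Snapdragon 660": (71.87, 20.04),
}

for device, (t1, t4) in pp_tinypose.items():
    s = thread_speedup(t1, t4)
    # Parallel efficiency: speedup divided by the thread count.
    print(f"{device}: {s:.2f}x speedup, {s / 4:.0%} efficiency")
```

A ratio close to 4 would indicate near-linear scaling across the four threads.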