YOLOv8 Pose

This document describes how to convert, build, and run the YOLOv8 Pose example on the NPU of the Allwinner A733 and T527 platforms.

info

To obtain the example, refer to Model Zoo Download.

YOLOv8 Pose Example Directory Structure:

$ tree ./
./
├── CMakeLists.txt
├── convert_model
│   ├── config_yml.py
│   ├── convert_model_env.sh
│   ├── python
│   │   ├── onnx_extract.py
│   │   └── yolov8s-pose_640.txt
│   └── yolov8s-pose_9.txt
├── figures
│   ├── diff_img.png
│   └── out_yolov8_pose_pcq.png
├── main.cpp
├── model
│   └── COCO_train2014_000000500390.jpg
├── model_config.h
├── README.md
├── yolov8_pose_9_post.cpp
└── yolov8_pose_9_pre.cpp

Model Conversion

Configure Virtual Environment

X86 Linux PC

python -m venv .venv && source .venv/bin/activate
pip install ultralytics==8.1.0 onnxsim

Export ONNX Model

X86 Linux PC

cd convert_model/python/
yolo export model=yolov8s-pose.pt format=onnx dynamic=True opset=11

Fix Input Shape

X86 Linux PC

python3 -m onnxsim yolov8s-pose.onnx yolov8s-pose_640.onnx --input-shape=1,3,640,640

Prune Model

X86 Linux PC

python3 onnx_extract.py
cd ..
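
The contents of onnx_extract.py are not shown in this document, so the following is only a sketch of what such a pruning step could look like. It assumes the goal is to cut the graph so that it ends at the nine raw head tensors (box, class, and keypoint branches at three scales) that appear later in the demo log, and it uses `onnx.utils.extract_model` for the cut. The tensor names, the `images` input name, and the output file name are placeholders or assumptions; look up the real names in your own ONNX file (for example with Netron).

```python
from typing import Dict, List, Sequence, Tuple


def ordered_head_outputs(names_by_stride: Dict[int, Tuple[str, str, str]]) -> List[str]:
    """Flatten {stride: (box, cls, kpt)} into one list, smallest stride first."""
    outputs: List[str] = []
    for stride in sorted(names_by_stride):
        branch_names = names_by_stride[stride]
        if len(branch_names) != 3:
            raise ValueError(f"stride {stride}: expected (box, cls, kpt) names")
        outputs.extend(branch_names)
    return outputs


def prune_model(src: str, dst: str,
                input_names: Sequence[str], output_names: Sequence[str]) -> None:
    """Save the sub-graph of `src` that runs from input_names to output_names."""
    import onnx.utils  # lazy import: only needed when a model is actually pruned
    onnx.utils.extract_model(src, dst, list(input_names), list(output_names))


# Hypothetical placeholder names, NOT the real tensor names of the model.
PLACEHOLDER_NAMES = {
    8: ("box_s8", "cls_s8", "kpt_s8"),
    16: ("box_s16", "cls_s16", "kpt_s16"),
    32: ("box_s32", "cls_s32", "kpt_s32"),
}

if __name__ == "__main__":
    # Prints the nine names in the order box, cls, kpt for strides 8, 16, 32.
    print(ordered_head_outputs(PLACEHOLDER_NAMES))
    # With real tensor names, the cut itself would look like this
    # (file names and the "images" input name are assumptions):
    # prune_model("yolov8s-pose_640.onnx", "yolov8s-pose_9.onnx",
    #             ["images"], ordered_head_outputs(real_names))
```

Keeping the name ordering in a separate helper makes it easy to check that the pruned model exposes its outputs in the same order the post-processing code expects.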

Create Symlink for Conversion Script

X86 Linux PC

./convert_model_env.sh

Model Import/Quantization/Conversion

Enter the container development environment first. To create the container, refer to the Create Container section in Model Zoo Download.

info

Each platform uses its own Docker image:

A733: ubuntu-npu:v2.0.10.1
T527: ubuntu-npu:v1.8.11

X86 Linux PC

docker exec -it model-zoo /bin/bash

After entering the container, navigate to the corresponding directory and run the script.

X86 Linux PC

cd /workspace/examples/yolov8_pose/convert_model/

X86 Linux PC

./pegasus_import.sh yolov8s-pose_9
./pegasus_quantize.sh yolov8s-pose_9 uint8 12

A733

X86 Linux PC

./pegasus_export_ovx_nbg.sh yolov8s-pose_9 uint8 a733

T527

X86 Linux PC

./pegasus_export_ovx_nbg.sh yolov8s-pose_9 uint8 t527

The exported model files are stored in the ../model directory.

Compile Example

Exit the container, then compile the example on the host PC.

Before compiling, configure the third-party libraries and the cross-compilation toolchain.

info

Skip this step if you have already configured the third-party libraries and the cross-compilation toolchain for another example.

X86 Linux PC

cd ../../../3rdparty/opencv/
unzip opencv-4.9.0-aarch64-linux-sunxi-glibc.zip
cd ../../0-toolchains/

Download the toolchain archive manually via this link and place it in 0-toolchains/ before running the following command:

X86 Linux PC

tar -xvf gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu.tar.xz

X86 Linux PC

cd ../examples/yolov8_pose/

A733

X86 Linux PC

../build_linux.sh -t a733 -s debian11

T527

X86 Linux PC

../build_linux.sh -t t527 -s debian11

Model Deployment

After compilation, the example will be installed in the install directory. You can use scp to transfer it to the board.

Configure NPU Driver

info

Skip this step if you have already configured the NPU driver for another example.

Transfer the driver library for your platform to the lib directory in the home directory on the board ($HOME/lib) via scp.

A733 corresponds to the common/lib_linux_aarch64/A733 directory
T527 corresponds to the common/lib_linux_aarch64/T527 directory

Then run the following command to add that directory to the LD_LIBRARY_PATH environment variable.

Radxa SBC

echo 'export LD_LIBRARY_PATH=$HOME/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

Run Example

After configuring the driver, you can run the example.

tip

On the T527 platform, first enable the NPU by following the A5E "Enable NPU on Board" documentation, then run the following command to make /dev/vipcore accessible to the current user.

Radxa SBC

sudo chmod 777 /dev/vipcore

A733

Radxa SBC

cd yolov8_pose_demo_linux_a733/

Radxa SBC

chmod +x ./yolov8_pose_demo_a733
./yolov8_pose_demo_a733 -nb model/yolov8s-pose_9_uint8_a733.nb -i model/COCO_train2014_000000500390.jpg

The output is as follows:

$ ./yolov8_pose_demo_a733 -nb model/yolov8s-pose_9_uint8_a733.nb -i model/COCO_train2014_000000500390.jpg
model_file=model/yolov8s-pose_9_uint8_a733.nb, input=model/COCO_train2014_000000500390.jpg, loop_count=1, malloc_mbyte=10
VIPLite driver software version 2.0.3.2-AW-2024-08-30
input  0 dim 3 640 640 1, data_format=2, quant_format=0, name=input/output[0], none-quant
output 0 dim 80 80 64 1, data_format=0, name=uid_17_out_0b_uid_1_out_0, none-quant
output 1 dim 80 80 1 1, data_format=0, name=uid_16_out_0b_uid_1_out_0, none-quant
output 2 dim 80 80 51 1, data_format=0, name=uid_15_out_0b_uid_1_out_0, none-quant
output 3 dim 40 40 64 1, data_format=0, name=uid_14_out_0b_uid_1_out_0, none-quant
output 4 dim 40 40 1 1, data_format=0, name=uid_13_out_0b_uid_1_out_0, none-quant
output 5 dim 40 40 51 1, data_format=0, name=uid_12_out_0b_uid_1_out_0, none-quant
output 6 dim 20 20 64 1, data_format=0, name=uid_11_out_0b_uid_1_out_0, none-quant
output 7 dim 20 20 1 1, data_format=0, name=uid_10_out_0b_uid_1_out_0, none-quant
output 8 dim 20 20 51 1, data_format=0, name=uid_9_out_0ub_uid_1_out_0, none-quant
nbg name=model/yolov8s-pose_9_uint8_a733.nb, size: 7768344.
create network 0: 18985 us.
prepare network: 5711 us.
buffer ptr: 0x28344380, buffer size: 1228800
network: 0, loop count: 1
run time for this network 0: 32958 us.
output 0, ptr 0x28470480, size 409600.
output 1, ptr 0x28600500, size 6400.
output 2, ptr 0x28606980, size 326400.
output 3, ptr 0x28745640, size 102400.
output 4, ptr 0x287a96c0, size 1600.
output 5, ptr 0x287ab040, size 81600.
output 6, ptr 0x287fabc0, size 25600.
output 7, ptr 0x28813c80, size 400.
output 8, ptr 0x28814340, size 20400.
post process time : 4 ms
detection num: 3
 0:  93%, [ 373,    1,  587,  346], person
411.58 41.32 = 0.98922
419.64 35.78 = 0.98396
416.36 37.19 = 0.76423
440.57 37.33 = 0.97060
422.52 38.08 = 0.12822
450.85 69.58 = 0.99924
422.50 75.59 = 0.99804
473.26 121.09 = 0.99354
405.71 108.13 = 0.95213
449.46 97.35 = 0.98640
389.93 81.26 = 0.92587
461.08 161.27 = 0.99969
461.53 162.40 = 0.99945
405.04 226.47 = 0.99954
489.77 240.76 = 0.99892
415.75 320.01 = 0.99481
555.85 276.33 = 0.99307
 0:  93%, [  86,   28,  288,  390], person
155.68 76.87 = 0.99271
162.21 68.34 = 0.98739
145.10 65.45 = 0.95864
175.03 64.92 = 0.91619
141.19 64.45 = 0.68796
199.98 93.83 = 0.99730
160.28 98.94 = 0.99395
214.27 138.31 = 0.99138
164.10 156.53 = 0.98026
175.57 174.47 = 0.98414
136.65 193.82 = 0.97464
216.19 199.03 = 0.99952
180.07 198.95 = 0.99935
240.79 270.84 = 0.99790
150.18 279.74 = 0.99727
293.96 281.26 = 0.98766
128.72 359.77 = 0.98534
 0:  91%, [ 227,   36,  398,  405], person
281.36 106.56 = 0.99230
287.73 97.59 = 0.98680
279.19 104.41 = 0.88281
308.31 83.30 = 0.95046
275.74 96.24 = 0.19450
328.64 102.08 = 0.99900
275.67 126.89 = 0.99741
373.27 126.41 = 0.99145
278.76 161.70 = 0.95360
382.76 163.57 = 0.98179
249.65 205.81 = 0.91721
332.66 214.56 = 0.99969
309.45 218.78 = 0.99948
293.53 304.20 = 0.99923
310.05 306.35 = 0.99817
279.65 380.53 = 0.99397
363.50 304.91 = 0.99248
destroy npu finished.
~NpuUint.
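
The nine outputs in the log are consistent with the usual YOLOv8 pose head layout: three feature-map scales (strides 8, 16, and 32 on a 640×640 input), each with a box branch, a class branch, and a keypoint branch. The sketch below reproduces the logged dimensions and checks that each logged `size` equals the product of its dimensions. It assumes the Ultralytics defaults of 16 DFL bins per box side (4 × 16 = 64 channels), a single person class, and 17 COCO keypoints with three values each (51 channels); none of these constants are stated in this document.

```python
from math import prod

INPUT_SIZE = 640
STRIDES = (8, 16, 32)
REG_MAX = 16        # assumption: default number of DFL bins per box side
NUM_CLASSES = 1     # assumption: single "person" class
NUM_KEYPOINTS = 17  # assumption: COCO keypoints, each stored as (x, y, confidence)


def expected_outputs(input_size=INPUT_SIZE):
    """Return [(dims, element_count)] in the order printed by the demo."""
    channels = (4 * REG_MAX, NUM_CLASSES, 3 * NUM_KEYPOINTS)  # 64, 1, 51
    result = []
    for stride in STRIDES:
        grid = input_size // stride  # 80, 40, 20
        for ch in channels:
            dims = (grid, grid, ch, 1)
            result.append((dims, prod(dims)))
    return result


# "size" values copied from the A733 log above (outputs 0 to 8).
LOGGED_SIZES = [409600, 6400, 326400, 102400, 1600, 81600, 25600, 400, 20400]

if __name__ == "__main__":
    for index, (dims, count) in enumerate(expected_outputs()):
        print(index, dims, count, count == LOGGED_SIZES[index])
```

A mismatch between these expected shapes and the log of a freshly converted model is a quick hint that the pruning or export step produced a different set of outputs.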

The performance figures below measure model inference only. Unless otherwise specified, they exclude pre-processing and post-processing time.

SoC	NPU	Model	Input Resolution	Network Creation Time	Network Preparation Time	Single Frame Inference Time	Post-processing Time	Total Time	Frame Rate
Allwinner A733	Vivante VIP9000	yolov8s-pose	640×640	19.0 ms	5.7 ms	33.0 ms	4.0 ms	61.7 ms	30.3 FPS
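
The table values can be traced back to four lines of the A733 log. The sketch below extracts them and recomputes the total time and the frame rate. It assumes, based on the arithmetic, that the frame rate is derived from the inference time alone (1000 / 33.0 ms ≈ 30.3 FPS) rather than from the total time; the function and key names are illustrative only.

```python
import re

# Timing lines copied from the A733 log above.
LOG = """\
create network 0: 18985 us.
prepare network: 5711 us.
run time for this network 0: 32958 us.
post process time : 4 ms
"""

# key -> (regular expression, factor that converts the captured value to ms)
PATTERNS = {
    "create_ms": (r"create network \d+: (\d+) us", 1e-3),
    "prepare_ms": (r"prepare network: (\d+) us", 1e-3),
    "inference_ms": (r"run time for this network \d+: (\d+) us", 1e-3),
    "post_ms": (r"post process time : (\d+) ms", 1.0),
}


def summarize(log: str) -> dict:
    """Extract the four timings in ms, then derive total time and frame rate."""
    stats = {}
    for key, (pattern, scale) in PATTERNS.items():
        match = re.search(pattern, log)
        if match is None:
            raise ValueError(f"missing log line for {key}")
        stats[key] = int(match.group(1)) * scale
    stats["total_ms"] = sum(stats.values())       # sum of the four timings
    stats["fps"] = 1000.0 / stats["inference_ms"]  # inference time only
    return stats


if __name__ == "__main__":
    for key, value in summarize(LOG).items():
        print(f"{key}: {value:.1f}")
```

Note that network creation and preparation are one-time costs, so for a continuous stream the per-frame cost is closer to inference plus pre-processing and post-processing than to the total shown in the table.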

T527

Radxa SBC

cd yolov8_pose_demo_linux_t527/

Radxa SBC

chmod +x ./yolov8_pose_demo_t527
./yolov8_pose_demo_t527 -nb model/yolov8s-pose_9_uint8_t527.nb -i model/COCO_train2014_000000500390.jpg

The output is as follows:

$ ./yolov8_pose_demo_t527 -nb model/yolov8s-pose_9_uint8_t527.nb -i model/COCO_train2014_000000500390.jpg
model_file=model/yolov8s-pose_9_uint8_t527.nb, input=model/COCO_train2014_000000500390.jpg, loop_count=1, malloc_mbyte=10
VIPLite driver software version 1.13.0.0-AW-2023-10-19
input  0 dim 3 640 640 1, data_format=2, quant_format=0, name=input[0], none-quant
output 0 dim 80 80 64 1, data_format=0, name=uid_20000_sub_uid_1_out_0, none-quant
output 1 dim 80 80 1 1, data_format=0, name=uid_20001_sub_uid_1_out_0, none-quant
output 2 dim 80 80 51 1, data_format=0, name=uid_20002_sub_uid_1_out_0, none-quant
output 3 dim 40 40 64 1, data_format=0, name=uid_20003_sub_uid_1_out_0, none-quant
output 4 dim 40 40 1 1, data_format=0, name=uid_20004_sub_uid_1_out_0, none-quant
output 5 dim 40 40 51 1, data_format=0, name=uid_20005_sub_uid_1_out_0, none-quant
output 6 dim 20 20 64 1, data_format=0, name=uid_20006_sub_uid_1_out_0, none-quant
output 7 dim 20 20 1 1, data_format=0, name=uid_20007_sub_uid_1_out_0, none-quant
output 8 dim 20 20 51 1, data_format=0, name=uid_20008_sub_uid_1_out_0, none-quant
nbg name=model/yolov8s-pose_9_uint8_t527.nb, size: 10369024.
create network 0: 25460 us.
prepare network: 20276 us.
buffer ptr: 0x2a796380, buffer size: 1228800
network: 0, loop count: 1
run time for this network 0: 71053 us.
output 0, ptr 0x2a8c2440, size 409600.
output 1, ptr 0x2aa52500, size 6400.
output 2, ptr 0x2aa58980, size 326400.
output 3, ptr 0x2ab97600, size 102400.
output 4, ptr 0x2abfb680, size 1600.
output 5, ptr 0x2abfd040, size 81600.
output 6, ptr 0x2ac4cbc0, size 25600.
output 7, ptr 0x2ac65c40, size 400.
output 8, ptr 0x2ac66300, size 20400.
post process time : 11 ms
detection num: 3
 0:  93%, [ 373,    1,  587,  347], person
411.64 37.38 = 0.98651
419.72 33.34 = 0.98042
415.68 33.34 = 0.72047
435.88 33.34 = 0.97493
423.76 37.38 = 0.13819
452.04 65.66 = 0.99938
423.76 73.73 = 0.99807
472.24 118.17 = 0.99438
407.60 102.01 = 0.94482
452.04 93.93 = 0.98564
391.45 81.81 = 0.90655
460.12 158.56 = 0.99969
460.12 158.56 = 0.99945
403.56 227.24 = 0.99963
492.44 243.40 = 0.99904
411.64 320.15 = 0.99614
557.07 283.79 = 0.99438
 0:  93%, [  86,   28,  288,  389], person
155.96 77.46 = 0.99278
164.04 69.38 = 0.98809
143.84 65.34 = 0.95914
176.16 65.34 = 0.92141
139.80 65.34 = 0.68079
200.40 93.62 = 0.99736
160.00 97.66 = 0.99363
212.51 138.05 = 0.99128
164.04 158.25 = 0.97784
176.16 174.41 = 0.98374
135.76 194.60 = 0.97166
216.55 198.64 = 0.99949
180.20 198.64 = 0.99930
240.79 271.36 = 0.99781
151.92 279.44 = 0.99700
293.30 283.47 = 0.98732
127.68 356.19 = 0.98472
 0:  92%, [ 227,   36,  398,  406], person
279.60 105.73 = 0.99278
287.68 97.66 = 0.98809
279.60 105.73 = 0.88286
307.88 81.50 = 0.95390
275.56 97.66 = 0.18974
328.08 101.70 = 0.99904
275.56 125.93 = 0.99736
372.51 125.93 = 0.99181
279.60 162.29 = 0.94802
384.63 162.29 = 0.98270
251.33 206.72 = 0.91177
332.12 214.80 = 0.99969
307.88 218.84 = 0.99945
291.72 303.67 = 0.99925
307.88 307.71 = 0.99807
279.60 380.42 = 0.99438
364.44 303.67 = 0.99231
destroy npu finished.
~NpuUint.
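
Each detection in the log is followed by 17 lines of the form `x y = confidence`. The sketch below pairs those lines with keypoint names and drops low-confidence points. It assumes the demo prints the keypoints in the standard COCO order used by YOLOv8 pose models, which this document does not state explicitly; the 0.5 threshold and the function name are illustrative choices.

```python
import re

# Standard COCO keypoint order (assumed to match the demo's print order).
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

KEYPOINT_LINE = re.compile(r"^\s*([\d.]+)\s+([\d.]+)\s*=\s*([\d.]+)\s*$")


def label_keypoints(lines, min_conf=0.5):
    """Map 'x y = conf' lines to keypoint names, keeping conf >= min_conf."""
    rows = [m for m in (KEYPOINT_LINE.match(line) for line in lines) if m]
    if len(rows) != len(COCO_KEYPOINTS):
        raise ValueError(f"expected {len(COCO_KEYPOINTS)} keypoint lines, got {len(rows)}")
    labeled = {}
    for name, match in zip(COCO_KEYPOINTS, rows):
        x, y, conf = (float(value) for value in match.groups())
        if conf >= min_conf:
            labeled[name] = (x, y, conf)
    return labeled


# Keypoint lines of the first detection in the T527 log above.
FIRST_PERSON = """\
411.64 37.38 = 0.98651
419.72 33.34 = 0.98042
415.68 33.34 = 0.72047
435.88 33.34 = 0.97493
423.76 37.38 = 0.13819
452.04 65.66 = 0.99938
423.76 73.73 = 0.99807
472.24 118.17 = 0.99438
407.60 102.01 = 0.94482
452.04 93.93 = 0.98564
391.45 81.81 = 0.90655
460.12 158.56 = 0.99969
460.12 158.56 = 0.99945
403.56 227.24 = 0.99963
492.44 243.40 = 0.99904
411.64 320.15 = 0.99614
557.07 283.79 = 0.99438
""".splitlines()

if __name__ == "__main__":
    for name, (x, y, conf) in label_keypoints(FIRST_PERSON).items():
        print(f"{name:15s} ({x:7.2f}, {y:7.2f})  conf={conf:.3f}")
```

Under the assumed ordering, the fifth point (confidence 0.13819) would be the right ear, which is plausibly occluded in the test image and is filtered out by the threshold.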

The performance figures below measure model inference only. Unless otherwise specified, they exclude pre-processing and post-processing time.

SoC	NPU	Model	Input Resolution	Network Creation Time	Network Preparation Time	Single Frame Inference Time	Post-processing Time	Total Time	Frame Rate
Allwinner T527	Vivante VIP9000	yolov8s-pose	640×640	25.5 ms	20.3 ms	71.1 ms	11.0 ms	127.9 ms	14.1 FPS
