RetinaFace

This document describes how to run RetinaFace on the NPU.

Info

Directory structure of the RetinaFace example:

$ tree ./
./
├── CMakeLists.txt
├── convert_model
│   ├── config_yml.py
│   ├── convert_model_env.sh
│   └── Retinaface_resnet50_320.txt
├── figures
│   └── out_retinaface.png
├── main.cpp
├── model
│   └── test.jpg
├── model_config.h
├── README.md
├── retinaface_post.cpp
└── retinaface_pre.cpp

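Model conversion

Export the ONNX model

Click to download Resnet50_Final.pth.

Download the ONNX model

Alternatively, you can download the already-modified ONNX models.

Click to download Retinaface_resnet50_320.onnx.

Click to download Retinaface_mobilenet0.25_320.onnx.

Then move the downloaded model to the convert_model/ directory.

Create symbolic links to the conversion scripts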

X86 Linux PC

./convert_model_env.sh

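Model import / quantization / conversion

First enter the container development environment. For details, see the container-creation section of the Model Zoo download guide.

Info

Use the Docker image that matches your platform:

A733: ubuntu-npu:v2.0.10.1
T527: ubuntu-npu:v1.8.11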

X86 Linux PC

docker exec -it model-zoo /bin/bash

Inside the container, change to the example's directory and run the scripts.

X86 Linux PC

cd /workspace/examples/retinaface/convert_model/

X86 Linux PC

./pegasus_import.sh Retinaface_resnet50_320
./pegasus_quantize.sh Retinaface_resnet50_320 uint8 10
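The quantize step converts the floating-point model to uint8. As a rough illustration of what uint8 quantization means, the C++ sketch below implements generic asymmetric (affine) quantization, q = clamp(round(x / scale) + zero_point, 0, 255), with the scale and zero point derived from a calibrated [min, max] range. It is a sketch of the general technique under that assumption, not the Pegasus tool's implementation, and the helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper type: uint8 affine quantization parameters.
struct QuantParams {
    float scale;         // real-valued step between adjacent uint8 codes
    int32_t zero_point;  // uint8 code that represents 0.0f
};

// Maps a calibrated float range onto [0, 255]. The range is widened to
// include 0 so that 0.0f is exactly representable.
QuantParams compute_uint8_params(float min_val, float max_val) {
    min_val = std::min(min_val, 0.0f);
    max_val = std::max(max_val, 0.0f);
    assert(max_val > min_val);
    const float scale = (max_val - min_val) / 255.0f;
    const long zp = std::lround(-min_val / scale);
    return {scale, static_cast<int32_t>(std::clamp(zp, 0L, 255L))};
}

// q = clamp(round(x / scale) + zero_point, 0, 255)
uint8_t quantize(float x, const QuantParams& p) {
    const long q = std::lround(x / p.scale) + p.zero_point;
    return static_cast<uint8_t>(std::clamp(q, 0L, 255L));
}

// x ~= (q - zero_point) * scale
float dequantize(uint8_t q, const QuantParams& p) {
    return static_cast<float>(static_cast<int32_t>(q) - p.zero_point) * p.scale;
}
```

For example, a tensor calibrated to [-1, 3] gets a zero point of 64, so 0.0f is stored exactly as the code 64, while values outside the range saturate at 0 or 255.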

A733:

X86 Linux PC

./pegasus_export_ovx_nbg.sh Retinaface_resnet50_320 uint8 a733

T527:

X86 Linux PC

./pegasus_export_ovx_nbg.sh Retinaface_resnet50_320 uint8 t527

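The exported model files are saved to the ../model directory.

Build the example

Next, build the example. First run exit to leave the container, then run the following commands on the host.

Start by setting up the third-party libraries and the cross-compilation toolchain.

Info

If you have already set up the third-party libraries and the cross-compilation toolchain for another example, you can skip this step.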

X86 Linux PC

cd ../../../3rdparty/opencv/
unzip opencv-4.9.0-aarch64-linux-sunxi-glibc.zip
cd ../../0-toolchains/

Download the toolchain manually from the link, place it in 0-toolchains/, and then run:

X86 Linux PC

tar -xvf gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu.tar.xz

X86 Linux PC

cd ../examples/retinaface/

A733:

X86 Linux PC

../build_linux.sh -t a733 -s debian11

T527:

X86 Linux PC

../build_linux.sh -t t527 -s debian11

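Model deployment

After the build completes, the example is installed to the install directory. Copy it to the board with scp.

Configure the NPU driver

Info

If you have already configured the NPU driver for another example, you can skip this step.

Copy the driver libraries to the lib directory in your home directory on the board (~/lib) with scp:

A733: common/lib_linux_aarch64/A733
T527: common/lib_linux_aarch64/T527

Then run the following command to add that directory to LD_LIBRARY_PATH. Because the line is appended to ~/.bashrc, it takes effect in new shells; run source ~/.bashrc to apply it to the current shell.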

Radxa SBC

echo 'export LD_LIBRARY_PATH=$HOME/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
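To confirm that the setting is active, you can check whether the driver directory appears in LD_LIBRARY_PATH. A minimal C++ sketch of such a check follows; the helper names are hypothetical, and it assumes the libraries were copied to $HOME/lib as in the command above.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `dir` is one of the ':'-separated
// entries in `path_list` (for example the value of LD_LIBRARY_PATH).
bool path_list_contains(const std::string& path_list, const std::string& dir) {
    assert(!dir.empty());
    std::stringstream ss(path_list);
    std::string entry;
    while (std::getline(ss, entry, ':')) {
        // Treat "/home/user/lib/" and "/home/user/lib" as the same entry.
        while (entry.size() > 1 && entry.back() == '/') entry.pop_back();
        if (entry == dir) return true;
    }
    return false;
}

// Reads LD_LIBRARY_PATH and HOME from the environment and checks for $HOME/lib.
bool driver_lib_dir_on_path() {
    const char* ld = std::getenv("LD_LIBRARY_PATH");
    const char* home = std::getenv("HOME");
    if (ld == nullptr || home == nullptr) return false;
    return path_list_contains(ld, std::string(home) + "/lib");
}
```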

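Run the example

Once the driver is configured, you can run the example.

Tip

On T527, first enable the NPU by following the A5E guide on enabling the NPU on the board, then run the command below so that the current user can access /dev/vipcore. Note that chmod 777 grants read and write access to all users, not only the current one.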

Radxa SBC

sudo chmod 777 /dev/vipcore
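If the demo cannot open the NPU, testing the device node directly helps tell a permission problem from a missing device. The sketch below uses the POSIX access() call; the helper names are hypothetical, and the path /dev/vipcore is taken from the command above.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>
#include <unistd.h>

// Returns true if the process may read and write `path`. access() checks
// permissions using the real user and group IDs of the calling process.
bool device_accessible(const char* path) {
    assert(path != nullptr);
    return access(path, R_OK | W_OK) == 0;
}

// Builds a short diagnostic message for a device node such as /dev/vipcore.
std::string describe_device(const char* path) {
    assert(path != nullptr);
    if (access(path, R_OK | W_OK) == 0) return std::string(path) + ": read/write OK";
    const int err = errno;  // capture before any other library call can change it
    return std::string(path) + ": " + std::strerror(err);
}
```

Calling describe_device("/dev/vipcore") on the board prints either "read/write OK" or the system error text (for example a permission or missing-file error).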

The commands and output below are shown for A733 first, followed by T527.

Radxa SBC

cd retinaface_demo_linux_a733/

Radxa SBC

chmod +x ./retinaface_demo_a733
./retinaface_demo_a733 -nb model/Retinaface_resnet50_320_uint8_a733.nb -i model/test.jpg

The output is as follows:

$ ./retinaface_demo_a733 -nb model/Retinaface_resnet50_320_uint8_a733.nb -i model/test.jpg
model_file=model/Retinaface_resnet50_320_uint8_a733.nb, input=model/test.jpg, loop_count=1, malloc_mbyte=10
VIPLite driver software version 2.0.3.2-AW-2024-08-30
input  0 dim 3 320 320 1, data_format=2, quant_format=0, name=input/output[0], none-quant
output 0 dim 4 4200 1 0, data_format=0, name=uid_20000_sub_uid_1_out_0, none-quant
output 1 dim 2 4200 1 0, data_format=0, name=uid_20001_sub_uid_1_out_0, none-quant
output 2 dim 10 4200 1 0, data_format=0, name=uid_20002_sub_uid_1_out_0, none-quant
nbg name=model/Retinaface_resnet50_320_uint8_a733.nb, size: 19056048.
create network 0: 20781 us.
prepare network: 2285 us.
buffer ptr: 0x25971380, buffer size: 307200
network: 0, loop count: 1
run time for this network 0: 15703 us.
output 0, ptr 0x259bc480, size 16800.
output 1, ptr 0x259ccb80, size 8400.
output 2, ptr 0x259d4f40, size 42000.
post process time : 0 ms
detection num: 1
100%, [ 244,   46,  363,  209], face
275.10 113.49
328.03 112.26
300.95 147.94
277.57 165.17
326.80 165.17
destroy npu finished.
~NpuUint.
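All three output tensors in the log share a dimension of 4200, which presumably corresponds to one entry per prior (anchor) box: 4 box offsets, 2 class scores, and 10 landmark values (5 points × 2). Assuming the anchor configuration of the widely used Pytorch_Retinaface reference implementation (strides 8/16/32, anchor sizes {16, 32}, {64, 128}, {256, 512}, variances 0.1 and 0.2), a 320×320 input yields 40×40 + 20×20 + 10×10 = 2100 grid cells with two anchors each, i.e. 4200 priors. The C++ sketch below generates those priors and decodes one box and its landmarks; whether retinaface_post.cpp uses exactly these values and this memory layout is an assumption, and all names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One prior (anchor) box in normalized [0, 1] coordinates: center and size.
struct Prior {
    float cx, cy, w, h;
};

// Assumed anchor configuration (Pytorch_Retinaface defaults, not read from this demo).
std::vector<Prior> make_priors(int img_w, int img_h) {
    const int steps[3] = {8, 16, 32};
    const int min_sizes[3][2] = {{16, 32}, {64, 128}, {256, 512}};
    std::vector<Prior> priors;
    for (int k = 0; k < 3; ++k) {
        const int fm_h = (img_h + steps[k] - 1) / steps[k];  // ceil(img_h / step)
        const int fm_w = (img_w + steps[k] - 1) / steps[k];
        for (int i = 0; i < fm_h; ++i)
            for (int j = 0; j < fm_w; ++j)
                for (int s : min_sizes[k])
                    priors.push_back({(j + 0.5f) * steps[k] / img_w,
                                      (i + 0.5f) * steps[k] / img_h,
                                      static_cast<float>(s) / img_w,
                                      static_cast<float>(s) / img_h});
    }
    return priors;
}

constexpr float kVariance0 = 0.1f;  // assumed center variance
constexpr float kVariance1 = 0.2f;  // assumed size variance

// Decodes 4 box offsets into [x1, y1, x2, y2] in pixels of the network input.
void decode_box(const float loc[4], const Prior& p, int img_w, int img_h, float out[4]) {
    const float cx = p.cx + loc[0] * kVariance0 * p.w;
    const float cy = p.cy + loc[1] * kVariance0 * p.h;
    const float w = p.w * std::exp(loc[2] * kVariance1);
    const float h = p.h * std::exp(loc[3] * kVariance1);
    out[0] = (cx - w / 2) * img_w;
    out[1] = (cy - h / 2) * img_h;
    out[2] = (cx + w / 2) * img_w;
    out[3] = (cy + h / 2) * img_h;
}

// Decodes 10 landmark offsets (5 points x (x, y)) into pixel coordinates.
void decode_landmarks(const float pre[10], const Prior& p, int img_w, int img_h, float out[10]) {
    for (int n = 0; n < 5; ++n) {
        out[2 * n] = (p.cx + pre[2 * n] * kVariance0 * p.w) * img_w;
        out[2 * n + 1] = (p.cy + pre[2 * n + 1] * kVariance0 * p.h) * img_h;
    }
}
```

With these settings make_priors(320, 320).size() is 4200, matching the second dimension reported for outputs 0, 1, and 2.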

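The frame rate is calculated from model inference time only. Unless otherwise noted, it does not include preprocessing or post-processing time.

SoC	NPU	Model	Input resolution	Network creation	Network preparation	Inference per frame	Post-processing	Total	Frame rate
Allwinner A733	Vivante VIP9000	Retinaface_resnet50	320×320	20.8 ms	2.3 ms	15.7 ms	0.0 ms	38.8 ms	63.7 FPS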

Radxa SBC

cd retinaface_demo_linux_t527/

Radxa SBC

chmod +x ./retinaface_demo_t527
./retinaface_demo_t527 -nb model/Retinaface_resnet50_320_uint8_t527.nb -i model/test.jpg

The output is as follows:

$ ./retinaface_demo_t527 -nb model/Retinaface_resnet50_320_uint8_t527.nb -i model/test.jpg
model_file=model/Retinaface_resnet50_320_uint8_t527.nb, input=model/test.jpg, loop_count=1, malloc_mbyte=10
VIPLite driver software version 1.13.0.0-AW-2023-10-19
input  0 dim 3 320 320 1, data_format=2, quant_format=0, name=input[0], none-quant
output 0 dim 4 4200 1 0, data_format=0, name=uid_20000_sub_uid_1_out_0, none-quant
output 1 dim 2 4200 1 0, data_format=0, name=uid_20001_sub_uid_1_out_0, none-quant
output 2 dim 10 4200 1 0, data_format=0, name=uid_20002_sub_uid_1_out_0, none-quant
nbg name=model/Retinaface_resnet50_320_uint8_t527.nb, size: 18714688.
create network 0: 27602 us.
prepare network: 5276 us.
buffer ptr: 0x23c57380, buffer size: 307200
network: 0, loop count: 1
run time for this network 0: 30483 us.
output 0, ptr 0x23ca2440, size 16800.
output 1, ptr 0x23cb2b40, size 8400.
output 2, ptr 0x23cbaf40, size 42000.
post process time : 1 ms
detection num: 1
100%, [ 244,   45,  363,  208], face
275.10 113.49
328.03 112.26
300.95 147.94
277.57 166.40
326.80 165.17
destroy npu finished.
~NpuUint.
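RetinaFace post-processing usually keeps the priors whose face score exceeds a confidence threshold and then removes overlapping boxes with non-maximum suppression (NMS) before printing results such as the single detection above. A minimal greedy NMS sketch in C++ follows; the struct, function names, and threshold values are hypothetical and not taken from retinaface_post.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Detection {
    float x1, y1, x2, y2;  // box corners in pixels
    float score;           // face confidence
};

// Intersection-over-union of two axis-aligned boxes.
float iou(const Detection& a, const Detection& b) {
    const float ix1 = std::max(a.x1, b.x1);
    const float iy1 = std::max(a.y1, b.y1);
    const float ix2 = std::min(a.x2, b.x2);
    const float iy2 = std::min(a.y2, b.y2);
    const float inter = std::max(0.0f, ix2 - ix1) * std::max(0.0f, iy2 - iy1);
    const float area_a = (a.x2 - a.x1) * (a.y2 - a.y1);
    const float area_b = (b.x2 - b.x1) * (b.y2 - b.y1);
    const float uni = area_a + area_b - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy NMS: drop boxes below score_thresh, then keep the highest-scoring
// remaining box and discard later boxes overlapping it by more than iou_thresh.
std::vector<Detection> nms(std::vector<Detection> dets, float score_thresh, float iou_thresh) {
    assert(iou_thresh >= 0.0f && iou_thresh <= 1.0f);
    dets.erase(std::remove_if(dets.begin(), dets.end(),
                              [&](const Detection& d) { return d.score < score_thresh; }),
               dets.end());
    std::sort(dets.begin(), dets.end(),
              [](const Detection& a, const Detection& b) { return a.score > b.score; });
    std::vector<Detection> kept;
    std::vector<bool> suppressed(dets.size(), false);
    for (std::size_t i = 0; i < dets.size(); ++i) {
        if (suppressed[i]) continue;
        kept.push_back(dets[i]);
        for (std::size_t j = i + 1; j < dets.size(); ++j)
            if (!suppressed[j] && iou(dets[i], dets[j]) > iou_thresh) suppressed[j] = true;
    }
    return kept;
}
```

Sorting by score first means each kept box is the strongest among those it suppresses, which is the usual greedy formulation.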

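The frame rate is calculated from model inference time only. Unless otherwise noted, it does not include preprocessing or post-processing time.

SoC	NPU	Model	Input resolution	Network creation	Network preparation	Inference per frame	Post-processing	Total	Frame rate
Allwinner T527	Vivante VIP9000	Retinaface_resnet50	320×320	27.6 ms	5.3 ms	30.5 ms	1.0 ms	64.4 ms	32.8 FPS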
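The derived columns in the A733 and T527 tables follow from the logged timings: Total is the sum of network creation, preparation, inference, and post-processing time, and the frame rate is 1000 divided by the single-frame inference time in milliseconds. A small C++ sketch that reproduces them (the struct and function names are hypothetical):

```cpp
#include <cassert>

// Timings in milliseconds. The demo log prints creation, preparation, and
// inference time in microseconds and post-processing time in milliseconds.
struct NpuTimings {
    double create_ms;
    double prepare_ms;
    double inference_ms;
    double postprocess_ms;
};

// "Total" column: sum of all four stages.
double total_ms(const NpuTimings& t) {
    return t.create_ms + t.prepare_ms + t.inference_ms + t.postprocess_ms;
}

// "Frame rate" column: based on single-frame inference time only.
double inference_fps(const NpuTimings& t) {
    assert(t.inference_ms > 0.0);
    return 1000.0 / t.inference_ms;
}
```

With the A733 log values (20781 us, 2285 us, 15703 us, 0 ms), total_ms returns about 38.77 ms and inference_fps about 63.68, which round to the 38.8 ms and 63.7 FPS in the table; the T527 values give about 64.36 ms and 32.81 FPS.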
