ACUITY Toolkit Usage Example
This example takes a MobileNetV2 image classification model in Keras format, uses the ACUITY Toolkit to parse, quantize, and compile the model into project code, verifies the result by simulation in Vivante IDE, and finally runs inference on the VIP9000 series NPU.
NPU Version Table
Product | NPU Performance | SoC | NPU Version | NPU Software Version |
---|---|---|---|---|
Radxa A5E | 2 Tops | T527 | v2 | v1.13 |
Radxa A7A | 3 Tops | A733 | v3 | v2.0 |
Download example repository
Download the example repository inside the ACUITY Docker container.
Download repository
git clone https://github.com/ZIFENG278/ai-sdk.git
Configure model compilation script
cd ai-sdk/models
source env.sh v3 # NPU_VERSION
cp ../scripts/* .
Specify NPU_VERSION: specify v3 for A733 and v1.13 for T527. For NPU version selection, please refer to the NPU version comparison table above.
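For example, the env.sh invocation for each board (following the table and the note above) looks like:
# A733
source env.sh v3
# T527
source env.sh v1.13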
Model parsing
In the Docker container, enter the MobileNetV2_Imagenet model example directory of the example repository.
cd ai-sdk/models/MobileNetV2_Imagenet
This directory contains the following files:
- MobileNetV2_Imagenet.h5: the original model file (required).
- channel_mean_value.txt: model input mean and scale values.
- dataset.txt: quantization calibration dataset list.
- space_shuttle_224x224.jpg: test input image, also used as the calibration image in the calibration dataset.
- inputs_outputs.txt: model input and output node file.
.
|-- MobileNetV2_Imagenet.h5
|-- channel_mean_value.txt
|-- dataset.txt
|-- inputs_outputs.txt
`-- space_shuttle_224x224.jpg
0 directories, 5 files
Import Model
pegasus_import.sh is the model import script. It can parse the model structure and weights of various AI frameworks and outputs the parsed model files:
- Model architecture will be saved in MODEL_DIR.json
- Model weights will be saved in MODEL_DIR.data
- Automatically generate model input file template MODEL_DIR_inputmeta.yml
- Automatically generate model postprocessing file template MODEL_DIR_postprocess_file.yml
# pegasus_import.sh MODEL_DIR
./pegasus_import.sh MobileNetV2_Imagenet/
Parameters:
- MODEL_DIR: directory containing the source model files
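After the import finishes, the generated files described above appear next to the source model. A minimal sanity check (file names follow the MODEL_DIR naming pattern, shown here for the MobileNetV2_Imagenet example):
ls MobileNetV2_Imagenet/
# Expect, in addition to the source files:
# MobileNetV2_Imagenet.json                  parsed model architecture
# MobileNetV2_Imagenet.data                  parsed model weights
# MobileNetV2_Imagenet_inputmeta.yml         input preprocessing template
# MobileNetV2_Imagenet_postprocess_file.yml  postprocessing template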
Manually Modify Model Input File
Here, we need to manually set the mean and scale in MobileNetV2_Imagenet_inputmeta.yml
based on the model input preprocessing mean and scale.
Taking MobileNetV2_Imagenet as an example: the model input shape is (1, 224, 224, 3), i.e. three RGB channels, and the model preprocessing formula is:
x1 = (x - mean) / std
x1 = (x - mean) * scale
scale = 1 / std
Because the training dataset is ImageNet, the normalization mean is [0.485, 0.456, 0.406] and the std is [0.229, 0.224, 0.225] (see the PyTorch documentation for these values). Since this normalization is applied to pixel values scaled to [0, 1], the mean and scale must be converted back to the 0-255 pixel range:
# mean
0.485 * 255 = 123.675
0.456 * 255 = 116.28
0.406 * 255 = 103.53
# scale
1 / (0.229 * 255) = 0.01712
1 / (0.224 * 255) = 0.01751
1 / (0.225 * 255) = 0.01743
Here, we manually set the mean and scale in MobileNetV2_Imagenet_inputmeta.yml
based on the calculated mean and scale:
mean:
- 123.675
- 116.28
- 103.53
scale:
- 0.01712
- 0.01751
- 0.01743
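If the model was trained with different normalization values, the same conversion can be scripted. A minimal sketch, assuming python3 is available in the container, using the ImageNet mean/std from above:
python3 -c "
mean = [0.485, 0.456, 0.406]  # normalization mean (0-1 range)
std  = [0.229, 0.224, 0.225]  # normalization std (0-1 range)
print('mean:', [round(m * 255, 3) for m in mean])
print('scale:', [round(1 / (s * 255), 5) for s in std])
"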
Quantize Model
Before conversion, the model can be quantized to different types. ACUITY supports the uint8 / int16 / bf16 / pcq (int8 per-channel quantization) quantization types; float means no quantization.
Use the pegasus_quantize.sh script to quantize the model to the specified type.
If the source model is already quantized, this step is not needed; quantizing it again will report an error.
# pegasus_quantize.sh MODEL_DIR QUANTIZED ITERATION
pegasus_quantize.sh MobileNetV2_Imagenet int16 10
Quantization generates the quantization file MODEL_DIR_QUANTIZED.quantize corresponding to the chosen quantization type.
QUANTIZED | TYPE | QUANTIZER |
---|---|---|
uint8 | uint8 | asymmetric_affine |
int16 | int16 | dynamic_fixed_point |
pcq | int8 | perchannel_symmetric_affine |
bf16 | bf16 | qbfloat16 |
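The same script and usage pattern cover the other quantization types in the table; for example:
# pegasus_quantize.sh MODEL_DIR QUANTIZED ITERATION
pegasus_quantize.sh MobileNetV2_Imagenet uint8 10
pegasus_quantize.sh MobileNetV2_Imagenet pcq 10
pegasus_quantize.sh MobileNetV2_Imagenet bf16 10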
Inference quantized model
Quantization improves performance to varying degrees, but slightly reduces precision. The quantized model can be run with pegasus_inference.sh to verify whether it still meets the precision requirements.
The test inference input is the first picture in dataset.txt.
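dataset.txt lists the calibration image paths; in this example it should reference the space shuttle test image. A quick check (the content shown below is illustrative, not verbatim output):
cat MobileNetV2_Imagenet/dataset.txt
# space_shuttle_224x224.jpg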
Inference float model
Infer the unquantized float model first; its result serves as the reference for the quantized models.
# pegasus_inference.sh MODEL_DIR QUANTIZED ITERATION
pegasus_inference.sh MobileNetV2_Imagenet/ float
The inference result output is:
I 07:01:06 Iter(0), top(5), tensor(@attach_Logits/Softmax/out0_0:out0) :
I 07:01:06 812: 0.9990391731262207
I 07:01:06 814: 0.0001562383840791881
I 07:01:06 627: 8.89502334757708e-05
I 07:01:06 864: 6.59249781165272e-05
I 07:01:06 536: 2.808812860166654e-05
The class with the highest confidence in the top-5 output is 812, corresponding to the label space shuttle, which matches the actual input image, indicating that the model input preprocessing mean and scale are set correctly.
The inference tensors are also saved in the MODEL_DIR/inf/MODEL_DIR_QUANTIZED folder:
- iter_0_input_1_158_out0_1_224_224_3.qnt.tensor is the original image tensor
- iter_0_input_1_158_out0_1_224_224_3.tensor is the model input tensor after preprocessing
- iter_0_attach_Logits_Softmax_out0_0_out0_1_1000.tensor is the model output tensor
.
|-- iter_0_attach_Logits_Softmax_out0_0_out0_1_1000.tensor
|-- iter_0_input_1_158_out0_1_224_224_3.qnt.tensor
`-- iter_0_input_1_158_out0_1_224_224_3.tensor
0 directories, 3 files
Inference uint8 quantized model
# pegasus_inference.sh MODEL_DIR QUANTIZED ITERATION
pegasus_inference.sh MobileNetV2_Imagenet/ uint8
The inference result output is:
I 07:02:20 Iter(0), top(5), tensor(@attach_Logits/Softmax/out0_0:out0) :
I 07:02:20 904: 0.8729746341705322
I 07:02:20 530: 0.012925799004733562
I 07:02:20 905: 0.01022859662771225
I 07:02:20 468: 0.006405209191143513
I 07:02:20 466: 0.005068646278232336
The class with the highest confidence in the top-5 output is 904, which corresponds to the label wig. This matches neither the actual input image nor the float inference result, which means the uint8 quantization has caused a precision loss. In this case, a higher-precision quantization type such as pcq or int16 can be used. For methods to improve model precision, please refer to Hybrid Quantization.
Inference pcq quantized model
# pegasus_inference.sh MODEL_DIR QUANTIZED ITERATION
pegasus_inference.sh MobileNetV2_Imagenet/ pcq
The inference result output is
I 03:36:41 Iter(0), top(5), tensor(@attach_Logits/Softmax/out0_0:out0) :
I 03:36:41 812: 0.9973124265670776
I 03:36:41 814: 0.00034916045842692256
I 03:36:41 627: 0.00010834729619091377
I 03:36:41 833: 9.26952125155367e-05
I 03:36:41 576: 6.784773722756654e-05
The class with the highest confidence in the top-5 output is 812, corresponding to the label space shuttle, which matches both the actual input image and the float inference result, indicating that pcq quantization preserves the model's precision.
Inference int16 quantized model
# pegasus_inference.sh MODEL_DIR QUANTIZED ITERATION
pegasus_inference.sh MobileNetV2_Imagenet/ int16
The inference result output is
I 06:54:23 Iter(0), top(5), tensor(@attach_Logits/Softmax/out0_0:out0) :
I 06:54:23 812: 0.9989829659461975
I 06:54:23 814: 0.0001675251842243597
I 06:54:23 627: 9.466391202295199e-05
I 06:54:23 864: 6.788487371522933e-05
I 06:54:23 536: 3.0241633794503286e-05
The class with the highest confidence in the top-5 output is 812, corresponding to the label space shuttle, which matches both the actual input image and the float inference result, indicating that int16 quantization preserves the model's precision.
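Beyond checking the top-5 labels, the quantization error can be roughly quantified by diffing the float and quantized output tensors saved during inference. A minimal sketch, assuming the dumped .tensor files are plain text with one value per line; the paths below are hypothetical and follow the MODEL_DIR/inf/MODEL_DIR_QUANTIZED pattern described earlier:
# Maximum absolute difference between the float and int16 output tensors
paste MobileNetV2_Imagenet/inf/MobileNetV2_Imagenet_float/iter_0_attach_Logits_Softmax_out0_0_out0_1_1000.tensor \
      MobileNetV2_Imagenet/inf/MobileNetV2_Imagenet_int16/iter_0_attach_Logits_Softmax_out0_0_out0_1_1000.tensor \
  | awk '{d = $1 - $2; if (d < 0) d = -d; if (d > max) max = d} END {printf "max abs diff: %g\n", max}'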
Model compilation and export
pegasus_export_ovx.sh exports the model files and project code required for NPU inference.
Here we take the int16 quantized model as an example.
# pegasus_export_ovx.sh MODEL_DIR QUANTIZED
pegasus_export_ovx.sh MobileNetV2_Imagenet int16
Generated OpenVX project and NBG project paths:
- MODEL_DIR/wksp/MODEL_DIR_QUANTIZED: cross-platform OpenVX project; the model is just-in-time (JIT) compiled on the target hardware during initialization.
- MODEL_DIR/wksp/MODEL_DIR_QUANTIZED_nbg_unify: NBG project; a pre-compiled machine code format with low overhead and fast initialization.
(.venv) root@focal-v4l2:~/work/Acuity/acuity_examples/models/MobileNetV2_Imagenet/wksp$ ls
MobileNetV2_Imagenet_int16 MobileNetV2_Imagenet_int16_nbg_unify
The NBG project contains the network_binary.nb model file. The compiled model can be copied to the board and run with vpm_run or the awnn API.
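For example, the compiled NBG model can be copied to the board over the network (the user name, board IP address, and destination path below are placeholders):
# Copy the compiled NBG model to the board
scp MobileNetV2_Imagenet/wksp/MobileNetV2_Imagenet_int16_nbg_unify/network_binary.nb radxa@<board-ip>:/home/radxa/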
Use Vivante IDE to simulate running inference
Use Vivante IDE in the ACUITY Docker container to verify the generated target model and OpenVX project on an x86 PC.
Export the environment variables required by Vivante IDE:
export USE_IDE_LIB=1
export VIVANTE_SDK_DIR=~/Vivante_IDE/VivanteIDE5.11.0/cmdtools/vsimulator
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/Vivante_IDE/VivanteIDE5.11.0/cmdtools/common/lib:~/Vivante_IDE/VivanteIDE5.11.0/cmdtools/vsimulator/lib
unset VSI_USE_IMAGE_PROCESS
Simulate running cross-platform OpenVX project
Compile executable file
cd MobileNetV2_Imagenet/wksp/MobileNetV2_Imagenet_int16
make -f makefile.linux
The build produces a binary executable named after MODEL_DIR_QUANTIZED (here mobilenetv2imagenetint16).
Run executable file
# Usage: ./mobilenetv2imagenetint16 data_file inputs...
./mobilenetv2imagenetint16 MobileNetV2_Imagenet_int16.export.data ../../space_shuttle_224x224.jpg
The inference result output is:
Create Neural Network: 11ms or 11426us
Verify...
Verify Graph: 2430ms or 2430049us
Start run graph [1] times...
Run the 1 time: 229309.52ms or 229309520.00us
vxProcessGraph execution time:
Total 229309.53ms or 229309536.00us
Average 229309.53ms or 229309536.00us
--- Top5 ---
812: 0.999023
814: 0.000146
627: 0.000084
864: 0.000067
0: 0.000000
Simulate running NBG project
Note that running NBG projects in the Vivante IDE simulator takes additional time.
Compile executable file
cd MobileNetV2_Imagenet/wksp/MobileNetV2_Imagenet_int16_nbg_unify
make -f makefile.linux
The build produces a binary executable named after MODEL_DIR_QUANTIZED (here mobilenetv2imagenetint16).
Run executable file
# Usage: ./mobilenetv2imagenetint16 data_file inputs...
./mobilenetv2imagenetint16 network_binary.nb ../../space_shuttle_224x224.jpg
The inference result output is:
Create Neural Network: 4ms or 4368us
Verify...
Verify Graph: 2ms or 2482us
Start run graph [1] times...
Run the 1 time: 229388.50ms or 229388496.00us
vxProcessGraph execution time:
Total 229388.52ms or 229388512.00us
Average 229388.52ms or 229388512.00us
--- Top5 ---
812: 0.999023
814: 0.000146
627: 0.000084
864: 0.000067
0: 0.000000
Board-side NPU inference
NBG format models can be tested on the board with the vpm_run tool.
For vpm_run installation and usage, please refer to vpm_run model testing tool.