NPU Usage

The Teflon TFLite delegate is an open-source TensorFlow Lite delegate from the Mesa project, used for hardware-accelerated inference on the NPU of the Amlogic A311D SoC.

To use the Teflon delegate for NPU hardware-accelerated neural network inference, users need the Armbian 6.8.7_xfce system provided by Radxa. Please follow the install OS guide to install this system.

Install Teflon TFLite delegate

Download precompiled delegate file

The Teflon TFLite delegate has been precompiled; users can download and use it directly.

wget https://github.com/zifeng-radxa/zero2pro_NPU_example/releases/download/v1.0/libteflon.so

Manual compilation (optional)

  • Clone repository
git clone https://gitlab.freedesktop.org/tomeu/mesa.git -b teflon-staging --single-branch --depth=1
cd mesa
  • Set up compilation environment
sudo apt install -y python3-pip python3.11-venv libdrm-dev libwayland-dev libwayland-egl-backend-dev libx11-dev libxext-dev libxfixes-dev libxcb-glx0-dev libxcb-shm0-dev libx11-xcb-dev libxcb-dri2-0-dev libxcb-dri3-dev libxcb-present-dev libxshmfence-dev libxxf86vm-dev libxrandr-dev
python3 -m venv .venv
source .venv/bin/activate
pip3 install meson ninja mako pycparser
  • Compile Teflon
meson setup build -Dgallium-drivers=etnaviv -Dvulkan-drivers= -Dteflon=true
meson compile -C build
  • Path to the successfully compiled libteflon.so
mesa/build/src/gallium/targets/teflon/libteflon.so
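
Before using a freshly built delegate, you can check that it loads at all. The one-liner below is a quick sanity check, not part of the official build steps (run it as root, since the delegate needs access to the NPU device, and adjust the path if your build directory differs):

python3 -c "import tflite_runtime.interpreter as tflite; tflite.load_delegate('mesa/build/src/gallium/targets/teflon/libteflon.so'); print('Teflon delegate loaded')"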

Using Teflon TFLite delegate

Users can refer to the TensorFlow Lite delegate documentation and the delegate usage documentation to understand how delegates work and how to use them. NPU acceleration requires running inference scripts as the root user.
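
As a quick illustration of the mechanism, the snippet below shows how an external delegate such as Teflon is passed to the TFLite interpreter. It is a minimal sketch, assuming tflite-runtime is installed and that libteflon.so and the model file sit in the current directory:

import tflite_runtime.interpreter as tflite

# Load Teflon as an external TFLite delegate (run as root for NPU access)
teflon = tflite.load_delegate("./libteflon.so")

# Ops supported by the delegate are offloaded to the NPU;
# unsupported ops fall back to the CPU.
interpreter = tflite.Interpreter(
    model_path="./mobilenet_v1_1.0_224_quant.tflite",
    experimental_delegates=[teflon],
)
interpreter.allocate_tensors()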

MobileNet V1 Object Recognition Example

Here is an example of using the Teflon delegate to run the MobileNet V1 object recognition model on the NPU and recognize the contents of the Grace Hopper test image (grace_hopper.bmp).

  • Get example code and model files
git clone https://github.com/zifeng-radxa/zero2pro_NPU_example.git
cd zero2pro_NPU_example
wget http://download.tensorflow.org/models/mobilenet_v1_2018_08_02/mobilenet_v1_1.0_224_quant.tgz
tar -xvf mobilenet_v1_1.0_224_quant.tgz
  • Set up environment
python3 -m venv .venv
source .venv/bin/activate
pip3 install numpy pillow tflite-runtime
  • Run example code

Replace the -e argument with the path to your libteflon.so. (A sketch of what classification.py does is shown at the end of this section.)

python3 classification.py -i ./grace_hopper.bmp -m ./mobilenet_v1_1.0_224_quant.tflite -l labels_mobilenet_quant_v1_224.txt -e ./libteflon.so
(.venv) root@radxa-zero2:~/zero2pro_npu_example# python3 classification.py -i ./grace_hopper.bmp -m ./mobilenet_v1_1.0_224_quant.tflite -l labels_mobilenet_quant_v1_224.txt -e ./libteflon.so
Loading external delegate from ./libteflon.so with args: {}
0.909804: military uniform
0.019608: Windsor tie
0.007843: bulletproof vest
0.007843: mortarboard
0.003922: cornet
time: 6.256ms
  • Compare the inference speed of the CPU with that of the NPU. Running the same model on the CPU (by omitting the -e option) takes about 16 times longer:
(.venv) root@radxa-zero2:~/zero2pro_npu_example# python3 classification.py -i ./grace_hopper.bmp -m ./mobilenet_v1_1.0_224_quant.tflite -l labels_mobilenet_quant_v1_224.txt
0.917647: military uniform
0.015686: Windsor tie
0.007843: mortarboard
0.007843: bulletproof vest
0.003922: bow tie
time: 101.621ms
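
For reference, the sketch below shows roughly what a script like classification.py does: load the image, run the quantized model (optionally through the Teflon delegate), and print the top-5 labels with the inference time. It is an illustration written against the tflite-runtime API, not the exact code from the example repository:

import argparse
import time

import numpy as np
from PIL import Image
import tflite_runtime.interpreter as tflite

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--image", required=True)
    parser.add_argument("-m", "--model", required=True)
    parser.add_argument("-l", "--labels", required=True)
    parser.add_argument("-e", "--ext_delegate", default=None)
    args = parser.parse_args()

    # Load the external delegate if one was given; otherwise run on the CPU.
    delegates = []
    if args.ext_delegate:
        print(f"Loading external delegate from {args.ext_delegate} with args: {{}}")
        delegates.append(tflite.load_delegate(args.ext_delegate))

    interpreter = tflite.Interpreter(model_path=args.model,
                                     experimental_delegates=delegates)
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # MobileNet V1 quant expects a 1x224x224x3 uint8 RGB tensor.
    _, height, width, _ = input_details[0]["shape"]
    img = Image.open(args.image).convert("RGB").resize((width, height))
    interpreter.set_tensor(input_details[0]["index"],
                           np.expand_dims(np.asarray(img, dtype=np.uint8), 0))

    start = time.perf_counter()
    interpreter.invoke()
    elapsed_ms = (time.perf_counter() - start) * 1000

    # Quantized scores are uint8 in [0, 255]; normalize to [0, 1].
    scores = np.squeeze(interpreter.get_tensor(output_details[0]["index"])) / 255.0
    with open(args.labels) as f:
        labels = [line.strip() for line in f]

    # Print the five highest-scoring labels, then the inference time.
    for idx in scores.argsort()[-5:][::-1]:
        print(f"{scores[idx]:.6f}: {labels[idx]}")
    print(f"time: {elapsed_ms:.3f}ms")

if __name__ == "__main__":
    main()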