NPU Development Guide

Radxa Dragon series SoCs include the Qualcomm® Hexagon™ Processor (NPU), a hardware accelerator designed specifically for AI inference. To run model inference on the NPU, pre-trained models must first be ported with the QAIRT (Qualcomm® AI Runtime) SDK. Qualcomm® provides the following SDKs and libraries to help developers port AI models to the NPU:

Model Quantization Library: AIMET
Model Porting SDK: QAIRT
Model Application Library: QAI-APPBUILDER
Online Model Conversion Library: QAI-Hub

Qualcomm® NPU Software Stack

QAIRT

The QAIRT (Qualcomm® AI Runtime) SDK is a software package that bundles Qualcomm® AI software products, including Qualcomm® AI Engine Direct, Qualcomm® Neural Processing SDK, and Qualcomm® Genie. It provides the tools needed to port and deploy AI models on Qualcomm® hardware accelerators, as well as the runtime for running models on the CPU, GPU, and NPU.

Supported Inference Backends

[Figure: QAIRT SDK architecture]

QAIRT Model Formats

QAIRT supports three model file formats. They differ in the inference backends they can run on and in whether they are portable across operating systems and chips:

Format	Backend	Cross-OS	Cross-Chip
Library	CPU / GPU / NPU	No	Yes
DLC	CPU / GPU / NPU	Yes	Yes
Context Binary	NPU	Yes	No
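The table above can be read as a small filter over three properties. The following is a minimal sketch of that reading in Python. The names `ModelFormat`, `FORMATS`, and `candidate_formats` are hypothetical and exist only for this illustration; they are not part of the QAIRT SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelFormat:
    """One row of the format table (illustrative type, not a QAIRT API)."""
    name: str
    backends: frozenset
    cross_os: bool
    cross_chip: bool


# Values copied from the table above.
FORMATS = (
    ModelFormat("Library", frozenset({"CPU", "GPU", "NPU"}), False, True),
    ModelFormat("DLC", frozenset({"CPU", "GPU", "NPU"}), True, True),
    ModelFormat("Context Binary", frozenset({"NPU"}), True, False),
)


def candidate_formats(backend, need_cross_os=False, need_cross_chip=False):
    """Return the names of formats that run on `backend` and satisfy
    the requested portability requirements, in table order."""
    backend = backend.upper()
    return [
        f.name
        for f in FORMATS
        if backend in f.backends
        and (f.cross_os or not need_cross_os)
        and (f.cross_chip or not need_cross_chip)
    ]


if __name__ == "__main__":
    print(candidate_formats("NPU"))
    print(candidate_formats("GPU", need_cross_os=True))
```

The filter only reflects portability and backend support. It does not capture the memory and performance considerations that may make one format preferable on the NPU.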

tip

This document focuses on porting and deploying models on the NPU, and specifically covers the Context Binary format, which offers optimal memory usage and performance. For information on converting to the other model formats and on inference with other backends, please refer to the QAIRT SDK Documentation.

SoC Architecture Reference Table

SoC	dsp_arch	soc_id
QCS6490	v68	35
SC8280XP	v68	37
QCS9075	v73	77
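When scripting a porting workflow, it can help to keep the reference table above in one place and look values up by SoC name. The sketch below is one way to do that. `SOC_TABLE` and `lookup_soc` are hypothetical names for this illustration, and how `dsp_arch` and `soc_id` are passed to the QAIRT tools is not covered here.

```python
# Values copied from the SoC Architecture Reference Table above.
SOC_TABLE = {
    "QCS6490": ("v68", 35),
    "SC8280XP": ("v68", 37),
    "QCS9075": ("v73", 77),
}


def lookup_soc(name):
    """Return the dsp_arch and soc_id for a SoC name (case-insensitive).

    Raises ValueError for SoCs that are not in the table.
    """
    key = name.strip().upper()
    if key not in SOC_TABLE:
        known = ", ".join(sorted(SOC_TABLE))
        raise ValueError(f"unknown SoC {name!r}; known SoCs: {known}")
    dsp_arch, soc_id = SOC_TABLE[key]
    return {"dsp_arch": dsp_arch, "soc_id": soc_id}


if __name__ == "__main__":
    print(lookup_soc("qcs6490"))
```

Raising an error for unknown names, rather than falling back to a default, avoids silently building a model for the wrong chip.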

Documentation

AIMET

AIMET (AI Model Efficiency Toolkit) is a quantization tool for deep learning models, such as PyTorch and ONNX models. AIMET improves model performance by reducing computational load and memory usage. With AIMET, developers can iterate quickly to find a quantization configuration that best balances accuracy and latency. Quantized models exported from AIMET can be compiled and deployed on Qualcomm® NPUs using QAIRT, or run directly with ONNX-Runtime.

[Figure: AIMET overview]

Documentation

QAI-APPBUILDER

Quick AI Application Builder (QAI AppBuilder) helps developers easily use the Qualcomm® AI Runtime SDK to deploy AI models and design AI applications on Qualcomm® SoC platforms equipped with the Qualcomm® Hexagon™ Processor (NPU). It encapsulates the model deployment APIs into a set of simplified interfaces for loading models onto the NPU and performing inference. QAI AppBuilder significantly reduces the complexity of model deployment for developers and provides multiple demos as references for designing their own AI applications.

[Figure: QAI-APPBUILDER architecture]

Documentation

QAI-Hub

Qualcomm® AI Hub (QAI-Hub) is a one-stop cloud platform for model conversion, offering online model compilation, quantization, performance analysis, inference, and download services. It automates the conversion from pre-trained models to device runtimes and automatically provisions devices in the cloud for performance analysis and inference. The Qualcomm® AI Hub Models (QAI-Hub-Models) project builds on the QAI-Hub cloud services: from the command line, the models in its model list can be quantized, compiled, run, analyzed on cloud devices, and downloaded.

[Figure: QAI-Hub workflow]