NPU Development Guide
The SoCs in the Radxa Dragon series are equipped with the Qualcomm® Hexagon™ Processor (NPU), a hardware accelerator designed for AI inference. To run model inference on the NPU, pre-trained models must first be ported with the QAIRT (Qualcomm® AI Runtime) SDK. Qualcomm® provides the following SDKs and tools to help developers port AI models to the NPU:
- Model Quantization Library: AIMET
- Model Porting SDK: QAIRT
- Model Application Library: QAI-APPBUILDER
- Online Model Conversion Library: QAI-Hub
Qualcomm® NPU Software Stack
QAIRT
QAIRT (Qualcomm® AI Runtime) SDK is a software package that integrates Qualcomm® AI software products, including Qualcomm® AI Engine Direct, Qualcomm® Neural Processing SDK, and Qualcomm® Genie. QAIRT provides developers with all the necessary tools for porting and deploying AI models on Qualcomm® hardware accelerators, as well as the runtime for running models on CPU, GPU, and NPU.
Supported Inference Backends
- CPU
- GPU
- NPU

QAIRT SDK Architecture
QAIRT Model Formats
QAIRT supports the following three model file formats, which differ in the inference backends they can target and in whether they are portable across operating systems and chips:
| Format | Backend | Cross-OS | Cross-Chip |
|---|---|---|---|
| Library | CPU / GPU / NPU | No | Yes |
| DLC | CPU / GPU / NPU | Yes | Yes |
| Context Binary | NPU | Yes | No |
This document focuses on porting and deploying models on the NPU using the Context Binary format, which offers optimal memory usage and performance. For converting to the other model formats and for inference on other backends, refer to the QAIRT SDK documentation.
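The trade-offs in the format table can be expressed as a small lookup. The following is a minimal sketch that only encodes the table above; the names `ModelFormat`, `FORMATS`, and `candidate_formats` are hypothetical helpers for illustration and are not part of the QAIRT SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelFormat:
    """One row of the QAIRT model format table (illustrative only)."""
    name: str
    backends: frozenset
    cross_os: bool
    cross_chip: bool


# Values copied from the table above; nothing here queries the SDK.
FORMATS = (
    ModelFormat("Library", frozenset({"CPU", "GPU", "NPU"}), cross_os=False, cross_chip=True),
    ModelFormat("DLC", frozenset({"CPU", "GPU", "NPU"}), cross_os=True, cross_chip=True),
    ModelFormat("Context Binary", frozenset({"NPU"}), cross_os=True, cross_chip=False),
)


def candidate_formats(backend, need_cross_os=False, need_cross_chip=False):
    """Return the names of formats that support `backend` and meet the portability needs."""
    return [
        fmt.name
        for fmt in FORMATS
        if backend in fmt.backends
        and (fmt.cross_os or not need_cross_os)
        and (fmt.cross_chip or not need_cross_chip)
    ]


if __name__ == "__main__":
    # All three formats can target the NPU when portability is not required.
    print(candidate_formats("NPU"))
    # Requiring both kinds of portability leaves only DLC.
    print(candidate_formats("NPU", need_cross_os=True, need_cross_chip=True))
```

Note that the table describes portability only; it does not capture the memory and performance advantage that motivates choosing Context Binary for NPU deployment.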
SoC Architecture Reference Table
| SoC | dsp_arch | soc_id |
|---|---|---|
| QCS6490 | v68 | 35 |
| SC8280XP | v68 | 37 |
| QCS9075 | v73 | 77 |
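Because a Context Binary is not portable across chips, the `dsp_arch` and `soc_id` values must match the target SoC. A minimal sketch of looking them up from the table above follows; `SOC_TABLE` and `soc_params` are hypothetical names, and how these values are passed to the QAIRT tools is not shown here.

```python
# Values copied from the SoC Architecture Reference Table above.
SOC_TABLE = {
    "QCS6490": {"dsp_arch": "v68", "soc_id": 35},
    "SC8280XP": {"dsp_arch": "v68", "soc_id": 37},
    "QCS9075": {"dsp_arch": "v73", "soc_id": 77},
}


def soc_params(soc_name):
    """Return a copy of the dsp_arch / soc_id pair for a SoC name (case-insensitive)."""
    key = soc_name.strip().upper()
    try:
        return dict(SOC_TABLE[key])
    except KeyError:
        known = ", ".join(sorted(SOC_TABLE))
        raise ValueError(f"unknown SoC {soc_name!r}; known SoCs: {known}") from None


if __name__ == "__main__":
    params = soc_params("qcs6490")
    print(f"dsp_arch={params['dsp_arch']} soc_id={params['soc_id']}")
```

Failing loudly on an unknown SoC is deliberate: silently falling back to a default architecture would produce a binary built for the wrong chip.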
Documentation
AIMET
AIMET (AI Model Efficiency Toolkit) is a quantization toolkit for deep learning models built with frameworks such as PyTorch and ONNX. AIMET improves the runtime performance of deep learning models by reducing their computational load and memory usage. With AIMET, developers can iterate quickly to find a quantization configuration that gives the best balance between accuracy and latency. Quantized models exported from AIMET can be compiled and deployed on Qualcomm NPUs using QAIRT, or run directly with ONNX-Runtime.
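To make the accuracy trade-off concrete, the following sketch shows generic 8-bit affine quantization in plain Python. It is a conceptual illustration only: it does not use the AIMET API and is not AIMET's algorithm, and the function names are hypothetical.

```python
def quantize_params(values, num_bits=8):
    """Derive a scale and zero point mapping the value range onto unsigned integers."""
    qmax = 2 ** num_bits - 1
    lo = min(min(values), 0.0)  # include 0.0 so that zero is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / qmax or 1.0  # avoid a zero scale when all values are 0
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map floats to integers in [0, 2**num_bits - 1], clamping out-of-range values."""
    qmax = 2 ** num_bits - 1
    return [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]


def dequantize(quantized, scale, zero_point):
    """Map integers back to approximate float values."""
    return [(q - zero_point) * scale for q in quantized]


if __name__ == "__main__":
    weights = [-1.0, 0.0, 0.5, 2.0]
    scale, zero_point = quantize_params(weights)
    q = quantize(weights, scale, zero_point)
    restored = dequantize(q, scale, zero_point)
    # Each restored value differs from the original by at most half a step.
    for w, r in zip(weights, restored):
        print(f"{w:+.4f} -> {r:+.4f} (error {abs(w - r):.5f})")
```

Each 32-bit float is stored as one 8-bit integer, which is where the memory saving comes from; the rounding error of at most half a quantization step is the accuracy cost that a quantization toolkit tries to keep small.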

AIMET OVERVIEW
Documentation
QAI-APPBUILDER
Quick AI Application Builder (QAI AppBuilder) helps developers easily use the Qualcomm® AI Runtime SDK to deploy AI models and design AI applications on Qualcomm® SoC platforms equipped with the Qualcomm® Hexagon™ Processor (NPU). It encapsulates the model deployment APIs into a set of simplified interfaces for loading models onto the NPU and performing inference. QAI AppBuilder significantly reduces the complexity of model deployment for developers and provides multiple demos as references for designing their own AI applications.

QAI-APPBUILDER Architecture
Documentation
QAI-Hub
Qualcomm® AI Hub (QAI-Hub) is a one-stop cloud platform for model conversion, offering online model compilation, quantization, performance analysis, inference, and download services. Qualcomm® AI Hub automates the conversion of pre-trained models into device runtimes and automatically provisions devices in the cloud for performance analysis and inference. The Qualcomm® AI Hub Models (QAI-Hub-Models) project builds on the cloud services provided by QAI-Hub: for each model in its model list, it supports command-line quantization, compilation, inference, analysis, and download, with the work carried out on cloud-hosted devices.

QAI-Hub WORKFLOW