
NPU Development Guide

The SoCs used in the Radxa Dragon series are equipped with the Qualcomm® Hexagon™ Processor (NPU), a hardware accelerator designed specifically for AI inference. To run model inference on the NPU, pre-trained models must first be ported with the QAIRT (Qualcomm® AI Runtime) SDK. Qualcomm® also provides several other SDKs and tools, described below, that help developers port their AI models to the NPU.

Qualcomm® NPU Software Stack

QAIRT

QAIRT (Qualcomm® AI Runtime) SDK is a software package that integrates Qualcomm® AI software products, including Qualcomm® AI Engine Direct, Qualcomm® Neural Processing SDK, and Qualcomm® Genie. QAIRT provides developers with all the necessary tools for porting and deploying AI models on Qualcomm® hardware accelerators, as well as the runtime for running models on CPU, GPU, and NPU.

Supported Inference Backends

  • CPU

  • GPU

  • NPU

QAIRT SDK Architecture

QAIRT Model Formats

QAIRT supports three model file formats. They differ in which inference backends they can run on and in whether they are portable across operating systems and across chips:

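| Format         | Backend         | Cross-OS | Cross-Chip |
|----------------|-----------------|----------|------------|
| Library        | CPU / GPU / NPU | No       | Yes        |
| DLC            | CPU / GPU / NPU | Yes      | Yes        |
| Context Binary | NPU             | Yes      | No         |
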
Tip: This document focuses on porting and deploying models on the NPU, and specifically covers the Context Binary format, which offers optimal memory usage and performance. For converting to the other model formats and for inference on the other backends, please refer to the QAIRT SDK Documentation.
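The trade-offs in the format table can be written down as a small lookup. The following is a minimal sketch that only encodes the table rows; the `ModelFormat` class and the `candidate_formats` helper are hypothetical names introduced for illustration and are not part of the QAIRT SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelFormat:
    name: str
    backends: frozenset
    cross_os: bool
    cross_chip: bool


# Rows copied from the format table in this document.
FORMATS = (
    ModelFormat("Library", frozenset({"CPU", "GPU", "NPU"}), cross_os=False, cross_chip=True),
    ModelFormat("DLC", frozenset({"CPU", "GPU", "NPU"}), cross_os=True, cross_chip=True),
    ModelFormat("Context Binary", frozenset({"NPU"}), cross_os=True, cross_chip=False),
)


def candidate_formats(backend, need_cross_os=False, need_cross_chip=False):
    """Return the names of the formats that satisfy the given constraints."""
    return [
        f.name
        for f in FORMATS
        if backend in f.backends
        and (f.cross_os or not need_cross_os)
        and (f.cross_chip or not need_cross_chip)
    ]


if __name__ == "__main__":
    # NPU deployment where the same file must work on several chips.
    print(candidate_formats("NPU", need_cross_chip=True))
```

Under these assumptions, a Context Binary is only a candidate when the target is the NPU and the file does not need to be reused on a different chip, which matches the scope of this document.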

SoC Architecture Reference Table

| SoC      | dsp_arch | soc_id |
|----------|----------|--------|
| QCS6490  | v68      | 35     |
| SC8280XP | v68      | 37     |
| QCS9075  | v73      | 77     |
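When scripting a model-porting workflow, it can help to keep the SoC table in one place and look the values up by SoC name. The sketch below does only that. The helper names are hypothetical, and the `"devices"` JSON wrapper is an assumption about how a tool might consume `soc_id` and `dsp_arch`; check the QAIRT SDK documentation for the exact configuration layout your tool expects.

```python
import json

# Values copied from the SoC architecture reference table in this document.
SOC_TABLE = {
    "QCS6490": {"dsp_arch": "v68", "soc_id": 35},
    "SC8280XP": {"dsp_arch": "v68", "soc_id": 37},
    "QCS9075": {"dsp_arch": "v73", "soc_id": 77},
}


def device_entry(soc_name):
    """Return the dsp_arch / soc_id pair for a SoC name (case-insensitive)."""
    try:
        row = SOC_TABLE[soc_name.upper()]
    except KeyError:
        known = ", ".join(sorted(SOC_TABLE))
        raise ValueError(f"unknown SoC {soc_name!r}; known SoCs: {known}") from None
    return {"soc_id": row["soc_id"], "dsp_arch": row["dsp_arch"]}


def device_config_json(soc_name):
    """Serialize one device entry. The "devices" wrapper is an assumed layout."""
    return json.dumps({"devices": [device_entry(soc_name)]}, indent=2)


if __name__ == "__main__":
    print(device_config_json("QCS6490"))
```

Keeping the table as data means an unsupported SoC name fails early with a clear error instead of producing a configuration with the wrong architecture.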

Documentation

AIMET

AIMET (AI Model Efficiency Toolkit) is a quantization tool for deep learning models, such as PyTorch and ONNX models. Quantization reduces a model's computational load and memory usage, which improves its runtime performance. With AIMET, developers can iterate quickly to find a quantization configuration that gives the best balance between accuracy and latency. Quantized models exported from AIMET can be compiled and deployed on Qualcomm NPUs using QAIRT, or run directly with ONNX-Runtime.
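To make the accuracy versus memory trade-off concrete, here is a minimal sketch of plain 8-bit affine quantization written in standard Python. It does not use the AIMET API and is not how AIMET is implemented; the function names are hypothetical and the scheme (per-tensor, min/max range, unsigned integers) is an assumption chosen for simplicity.

```python
def quant_params(values, num_bits=8):
    """Compute a scale and zero point that map the value range to unsigned ints."""
    qmax = 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map floats to integers in [0, 2**num_bits - 1], clamping at the ends."""
    qmax = 2 ** num_bits - 1
    return [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]


def dequantize(quantized, scale, zero_point):
    """Map integers back to approximate float values."""
    return [(q - zero_point) * scale for q in quantized]


if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
    scale, zero_point = quant_params(weights)
    q = quantize(weights, scale, zero_point)
    restored = dequantize(q, scale, zero_point)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(q, worst)
```

Each value is stored in 8 bits instead of the 32 bits of a float32, at the cost of a rounding error of at most about one quantization step (`scale`) per value. Tools such as AIMET exist to choose these parameters so that the accumulated error has little effect on model accuracy.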

AIMET OVERVIEW

Documentation

QAI-APPBUILDER

Quick AI Application Builder (QAI AppBuilder) helps developers easily use the Qualcomm® AI Runtime SDK to deploy AI models and design AI applications on Qualcomm® SoC platforms equipped with the Qualcomm® Hexagon™ Processor (NPU). It encapsulates the model deployment APIs into a set of simplified interfaces for loading models onto the NPU and performing inference. QAI AppBuilder significantly reduces the complexity of model deployment for developers and provides multiple demos as references for designing their own AI applications.

QAI-APPBUILDER Architecture

Documentation

QAI-Hub

Qualcomm® AI Hub (QAI-Hub) is a one-stop cloud platform for model conversion, offering online model compilation, quantization, performance analysis, inference, and download services. It automates the conversion from a pre-trained model to a device runtime, and automatically provisions devices in the cloud for performance analysis and inference. The Qualcomm® AI Hub Models (QAI-Hub-Models) project builds on these cloud services: for the models in its model list, it supports quantization, compilation, inference, and performance analysis on cloud-hosted devices, as well as download, all from the command line.

QAI-Hub WORKFLOW

Documentation


    Radxa-docs © 2026 by Radxa Computer (Shenzhen) Co.,Ltd. is licensed under CC BY 4.0