MiniCPM-1B-sft

This document describes how to perform NPU hardware-accelerated inference of the MiniCPM-1B-sft model on Qualcomm platforms using Qualcomm® Genie.

Model Details

Model            Quantization   Context Length
MiniCPM-1B-sft   W4A16          1024

Supported Devices

Tip: Refer to the SoC Architecture Reference to find the DSP architecture of your device's SoC.

  • This example supports Qualcomm platform SoCs with v73 DSP architecture.

    dsp_arch
    v73
  • Supported devices

    Device                 SoC       dsp_arch
    Fogwise® AIRbox Q900   QCS9075   v73

Download qcom-qairt Dependencies

Device
sudo apt install qcom-qnn-sdk-v73 qcom-genie-sdk-v73

Import Environment Variables

Device
export ADSP_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu
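
Before moving on, it can help to confirm that the two previous steps took effect. The following is a minimal sketch, not part of the official setup: check_genie_env is a hypothetical helper name, and it assumes that the installed packages place genie-t2t-run on the PATH (the later commands on this page invoke it without a path).

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of the Genie SDK).
# Reports what is missing and returns non-zero if anything is.
check_genie_env() {
    status=0
    # The variable exported in the step above must be set in this shell.
    if [ -z "${ADSP_LIBRARY_PATH:-}" ]; then
        echo "ADSP_LIBRARY_PATH is not set"
        status=1
    fi
    # Assumption: the SDK packages install genie-t2t-run somewhere on PATH.
    if ! command -v genie-t2t-run >/dev/null 2>&1; then
        echo "genie-t2t-run not found in PATH"
        status=1
    fi
    return "$status"
}

check_genie_env || echo "environment incomplete, see messages above"
```

If the check prints nothing, both conditions hold in the current shell. Note that the export only applies to the shell session in which it was run.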

Download Model

Tip: Install the modelscope Python package in a Python virtual environment. For virtual environment usage, refer to Python Virtual Environment Usage.

Device
pip3 install modelscope
modelscope download --model radxa/MiniCPM-1B-sft-w4a16-1024-v73 --local_dir ./MiniCPM-1B-sft-w4a16-1024-v73

Run Inference

Device
cd MiniCPM-1B-sft-w4a16-1024-v73

Build Prompt

Prompts can be passed either as a file (--prompt_file) or inline as a parameter (-p). The example prompt used on this page is:

<s><user>What is the most popular cookie in the world?</user><assistant>
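
To avoid retyping the tags for every question, the prompt can be assembled by a small shell function. This is a sketch only: build_prompt is a hypothetical helper, it simply reproduces the tag layout of the example prompt above, and storing the result in chat.txt assumes that this file is meant to hold the same prompt text.

```shell
#!/bin/sh
# Hypothetical helper: wrap a user question in the tag layout
# used by the example prompt above. No trailing newline is added.
build_prompt() {
    printf '<s><user>%s</user><assistant>' "$1"
}

# Option 1: write the prompt to a file for use with --prompt_file.
# Assumption: chat.txt contains only the prompt text.
build_prompt 'What is the most popular cookie in the world?' > chat.txt

# Option 2: pass the prompt inline with -p (left commented out so the
# sketch also runs on a machine without the Genie SDK):
# genie-t2t-run -c minicpm-1b-htp-228.json -p "$(build_prompt 'What is the most popular cookie in the world?')"
```

The question is passed as a printf argument rather than embedded in the format string, so a literal % in the question is not interpreted as a format specifier.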

Run Inference

Device
genie-t2t-run -c minicpm-1b-htp-228.json -p '<s><user>What is the most popular cookie in the world?</user><assistant>'
(.venv) rock@radxa-airbox-q900:/mnt/ssd/qualcomm/MiniCPM-1B-sft$ genie-t2t-run -c minicpm-1b-htp-228.json -p '<s><user>What is the most popular cookie in the world?</user><assistant>'
Using libGenie.so version 1.14.0

/prj/qct/webtech_scratch20/mlg_user_admin/qaisw_source_repo/rel/qairt-2.42.0/release/snpe_src/avante-tools/prebuilt/dsp/hexagon-sdk-5.5.5/ipc/fastrpc/rpcmem/src/rpcmem_android.c:38:dummy call to rpcmem_init, rpcmem APIs will be used from libxdsprpc
[INFO] "Using create From Binary"
[INFO] "Allocated total size = 207163392 across 1 buffers"
[PROMPT]: <s><user>What is the most popular cookie in the world?</user><assistant>

[BEGIN]: Themostpopularcookieintheworldislikelytobechocolatechipcookies,whichoriginatedintheUnitedStatesinthe1950s.[END]
/prj/qct/webtech_scratch20/mlg_user_admin/qaisw_source_repo/rel/qairt-2.42.0/release/snpe_src/avante-tools/prebuilt/dsp/hexagon-sdk-5.5.5/ipc/fastrpc/rpcmem/src/rpcmem_android.c:42:dummy call to rpcmem_deinit, rpcmem APIs will be used from libxdsprpc

Performance Reference

You can enable performance profiling with the --profile option, which takes the name of the file that the profiling results are written to.

genie-t2t-run -c minicpm-1b-htp-228.json --prompt_file chat.txt --profile profile.txt
Metric                   Fogwise® AIRbox Q900
GenieDialog_create       977,912 us
num-prompt-tokens        17
prompt-processing-rate   47.047752380371094 toks/sec
time-to-first-token      361,345 us
num-generated-tokens     14
token-generation-rate    41.98681640625 toks/sec
token-generation-time    333,439 us
GenieDialog_free         90,825 us

Metric Definitions

Metric                   Definition
GenieDialog_create       Time to initialize a dialog object, including model loading, context preparation, and memory allocation.
num-prompt-tokens        Number of tokens in the prompt sent to the model (i.e., the smallest unit the input text is split into).
prompt-processing-rate   Speed at which the model processes the prompt, in tokens per second (toks/sec), reflecting the efficiency of prompt analysis and output preparation.
time-to-first-token      Time elapsed from the start of processing to the generation of the first output token, reflecting the model's response latency.
num-generated-tokens     Number of tokens actually output by the model in this generation, representing the length of the generated text in tokens.
token-generation-rate    Speed at which the model generates tokens, in tokens per second (toks/sec), reflecting generation efficiency.
token-generation-time    Total time spent generating all output tokens, in microseconds (us).
GenieDialog_free         Time to free the dialog object, including memory release and resource cleanup.
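
The rate metrics can be cross-checked against the token counts and times in the reference table. The sketch below recomputes them with awk; rate is a hypothetical helper, and pairing num-prompt-tokens with time-to-first-token is an assumption based on how closely the numbers agree, not something the profiler output states.

```shell
#!/bin/sh
# Hypothetical helper: tokens per second from a token count and a
# duration in microseconds, i.e. rate = tokens / (us / 1,000,000).
rate() {
    awk -v n="$1" -v us="$2" 'BEGIN { printf "%.2f\n", n / (us / 1000000) }'
}

# Generation: num-generated-tokens / token-generation-time
rate 14 333439   # prints 41.99 (reported: 41.98681640625 toks/sec)

# Prompt: num-prompt-tokens / time-to-first-token (assumed pairing)
rate 17 361345   # prints 47.05 (reported: 47.047752380371094 toks/sec)
```

Both recomputed values agree with the reported rates to two decimal places, which suggests that on this run the time to first token is dominated by prompt processing.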

Official Genie Documentation

For more details on Qualcomm® Genie usage and API, refer to:

    Radxa-docs © 2026 by Radxa Computer (Shenzhen) Co.,Ltd. is licensed under CC BY 4.0