
Qwen2.5-0.5B Large Model

This document describes how to run Qwen2.5-0.5B large language model inference on the NPU of the Radxa Dragon Q8B.

Download Example

Device
pip3 install modelscope
modelscope download --model radxa/Qwen2.5-0.5B-v68 --local_dir ./Qwen2.5-0.5B-v68

Model Inference

Construct Prompt

A Qwen2.5 prompt must follow the format below, with a system turn, a user turn, and an opening assistant turn for the model to complete:

"<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nPlease give a brief introduction to relativity.<|im_end|>\n<|im_start|>assistant\n"

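The prompt format above can also be assembled programmatically. The following is a minimal sketch, not part of the downloaded example: the helper name `build_qwen_prompt` is hypothetical, and it assumes the `\n` sequences should stay as a literal backslash followed by `n`, as in the string shown above.

```python
# Hypothetical helper (not shipped with the example package).
# Raw strings keep "\n" as two literal characters, matching the
# prompt string shown in this document.
PROMPT_TEMPLATE = (
    r"<|im_start|>system\n{system}<|im_end|>\n"
    r"<|im_start|>user\n{user}<|im_end|>\n"
    r"<|im_start|>assistant\n"
)


def build_qwen_prompt(user, system="You are a helpful assistant."):
    """Return a single-turn prompt with a system and a user message."""
    return PROMPT_TEMPLATE.format(system=system, user=user)


if __name__ == "__main__":
    print(build_qwen_prompt("Please give a brief introduction to relativity."))
```

The printed string can then be passed as the value of the -p option.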
LLM Inference

Configure Environment

Device
cd ./Qwen2.5-0.5B-v68
export LD_LIBRARY_PATH=$(pwd)
chmod +x genie-t2t-run

Execute Inference

Device
./genie-t2t-run -c qwen2.5-0.5B-1k-htp.json  -p "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nPlease give a brief introduction to relativity.<|im_end|>\n<|im_start|>assistant\n"
rock@radxa-dragon-q6a:~/ssd/qualcomm/701/Qwen2.5-0.5B-v68$ ./genie-t2t-run -c qwen2.5-0.5B-1k-htp.json  -p "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nPlease give a brief introduction to relativity.<|im_end|>\n<|im_start|>assistant\n"
Using libGenie.so version 1.13.0

/prj/qct/webtech_scratch20/mlg_user_admin/qaisw_source_repo/rel/qairt-2.40.1/point_release/SNPE_SRC/avante-tools/prebuilt/dsp/hexagon-sdk-5.5.5/ipc/fastrpc/rpcmem/src/rpcmem_android.c:38:dummy call to rpcmem_init, rpcmem APIs will be used from libxdsprpc
[INFO] "Using create From Binary"
[INFO] "Allocated total size = 45466112 across 1 buffers"
[PROMPT]: <|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nPlease give a brief introduction to relativity.<|im_end|>\n<|im_start|>assistant\n

[BEGIN]: Relativity is a branch of physics that deals with the concepts of time dilation, length contraction, and the speed of light. These effects are important in many areas of science and engineering, including GPS navigation, the study of black holes, and the understanding of the structure of the universe. The idea behind relativity is that the laws of physics are the same for all observers, regardless of their relative motion or frame of reference. This means that the same physical laws apply to objects in the same state of motion, even if they are observed from different points of reference.[END]
/prj/qct/webtech_scratch20/mlg_user_admin/qaisw_source_repo/rel/qairt-2.40.1/point_release/SNPE_SRC/avante-tools/prebuilt/dsp/hexagon-sdk-5.5.5/ipc/fastrpc/rpcmem/src/rpcmem_android.c:42:dummy call to rpcmem_deinit, rpcmem APIs will be used from libxdsprpc
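
In the log above, the generated text sits between the [BEGIN]: and [END] markers, surrounded by runtime messages. If you capture the output as a string (for example by redirecting it to a file), the reply can be pulled out as sketched below. The function name `extract_reply` is hypothetical, and the sketch assumes the markers appear exactly as in the log above.

```python
import re


def extract_reply(output):
    """Return the text between "[BEGIN]:" and "[END]", or None if absent."""
    # DOTALL lets the reply span several lines; the non-greedy group
    # stops at the first "[END]" marker.
    match = re.search(r"\[BEGIN\]:\s*(.*?)\[END\]", output, flags=re.DOTALL)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    sample = (
        'Using libGenie.so version 1.13.0\n'
        '[INFO] "Using create From Binary"\n'
        '[BEGIN]: Relativity is a branch of physics.[END]\n'
    )
    print(extract_reply(sample))
```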

Performance Analysis

Pass the --profile option followed by an output file name to enable performance analysis:

./genie-t2t-run -c qwen2.5-0.5B-1k-htp.json  -p "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nPlease give a brief introduction to relativity.<|im_end|>\n<|im_start|>assistant\n" --profile profile.txt
Dragon Q8B                 CTX_LENGTH 1024
duration                   2,629,992 us
num-prompt-tokens          29
prompt-processing-rate     543.478271484375 toks/sec
time-to-first-token        53,388 us
num-generated-tokens       114
token-generation-rate      44.251705169677734 toks/sec
token-generation-time      2,576,247 us
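
As a sanity check, the reported rates can be compared against the token counts and times from the same run. The sketch below assumes that each rate is roughly the token count divided by the corresponding elapsed time; this is an assumption about how the tool derives the figures, and the reported values differ slightly from the recomputed ones.

```python
def tokens_per_second(num_tokens, elapsed_us):
    """Convert a token count and an elapsed time in microseconds to tokens/sec."""
    return num_tokens / (elapsed_us / 1_000_000)


# Figures taken from the profile results above.
prompt_rate = tokens_per_second(29, 53_388)            # prompt tokens / time-to-first-token
generation_rate = tokens_per_second(114, 2_576_247)    # generated tokens / token-generation-time

if __name__ == "__main__":
    print(f"prompt processing: {prompt_rate:.2f} toks/sec")
    print(f"token generation:  {generation_rate:.2f} toks/sec")
```

Both recomputed values land within about 0.1% of the reported prompt-processing-rate and token-generation-rate.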


    Radxa-docs © 2026 by Radxa Computer (Shenzhen) Co.,Ltd. is licensed under CC BY 4.0