DeepSeek-R1-Distill-Qwen-7B

This document describes how to perform NPU hardware-accelerated inference of the DeepSeek-R1-Distill-Qwen-7B model on Qualcomm platforms using Qualcomm® Genie.

Model Details

Model                        Quantization  Context Length
DeepSeek-R1-Distill-Qwen-7B  W4A16         4096

Supported Devices

tip

Refer to the SoC Architecture Reference to find the DSP architecture of your device's SoC.

  • This example supports Qualcomm platform SoCs with v73 DSP architecture.

    dsp_arch
    v73
  • Supported devices

    Device                SoC      dsp_arch
    Fogwise® AIRbox Q900  QCS9075  v73

Download qcom-qairt Dependencies

Device
sudo apt install qcom-qnn-sdk-v73 qcom-genie-sdk-v73

Import Environment Variables

Device
export ADSP_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu
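
Before moving on, it can help to confirm that the variable points at a real directory and that the Genie command-line tool is reachable. The following is a minimal sketch and not part of the Genie SDK: check_genie_env is a hypothetical helper name, and it assumes ADSP_LIBRARY_PATH holds a single directory, as in the export above.

```shell
# Hypothetical pre-flight check (not provided by the Genie SDK).
# Assumes ADSP_LIBRARY_PATH holds one directory, as set in this guide.
check_genie_env() {
  rc=0
  if [ -z "${ADSP_LIBRARY_PATH:-}" ]; then
    echo "ADSP_LIBRARY_PATH is not set" >&2
    rc=1
  elif [ ! -d "$ADSP_LIBRARY_PATH" ]; then
    echo "ADSP_LIBRARY_PATH is not a directory: $ADSP_LIBRARY_PATH" >&2
    rc=1
  fi
  # genie-t2t-run is the CLI used later in this guide.
  if ! command -v genie-t2t-run >/dev/null 2>&1; then
    echo "genie-t2t-run not found in PATH" >&2
    rc=1
  fi
  return "$rc"
}

check_genie_env || echo "Genie environment is incomplete"
```

The function only reports problems; it does not change the environment.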

Download Model

tip

Install the modelscope Python package in a Python virtual environment. For virtual environment usage, refer to Python Virtual Environment Usage.

Device
pip3 install modelscope
modelscope download --model radxa/DeepSeek-R1-Distill-Qwen-7B-w4a16-4096-v73 --local_dir ./DeepSeek-R1-Distill-Qwen-7B-w4a16-4096-v73

Run Inference

Device
cd DeepSeek-R1-Distill-Qwen-7B-w4a16-4096-v73

Build Prompt

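A prompt can be passed either inline with the -p option or from a file with the --prompt_file option. The prompt must follow the model's chat template, for example: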

<|begin▁of▁sentence|>You are Deepseek-R1, an AI assistant created exclusively by the Chinese Company DeepSeek. You'll provide helpful, harmless, and detailed responses to all user inquiries.<|User|>Introduce Qualcomm in 100 words<|Assistant|>
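
The template above can be assembled from a system message and a user message and saved to a file for use with --prompt_file. This is a minimal sketch: build_prompt is a hypothetical helper name, and chat.txt is simply the file name that appears in the sample log later in this guide.

```shell
# Hypothetical helper: wrap a system message and a user message in the
# chat template shown above.
build_prompt() {
  system_msg="$1"
  user_msg="$2"
  # printf '%s' emits the arguments literally and adds no trailing newline.
  printf '%s%s%s%s%s' \
    '<|begin▁of▁sentence|>' "$system_msg" \
    '<|User|>' "$user_msg" \
    '<|Assistant|>'
}

build_prompt \
  "You are Deepseek-R1, an AI assistant created exclusively by the Chinese Company DeepSeek. You'll provide helpful, harmless, and detailed responses to all user inquiries." \
  "Introduce Qualcomm in 100 words" > chat.txt
```

Keeping the template in one function makes it easy to change only the user message between runs.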

Run Inference

Device
genie-t2t-run -c DeepSeek-R1-Distill-Qwen-7B-htp.json -p "<|begin▁of▁sentence|>You are Deepseek-R1, an AI assistant created exclusively by the Chinese Company DeepSeek. You'll provide helpful, harmless, and detailed responses to all user inquiries.<|User|>Introduce Qualcomm in 100 words<|Assistant|>"
(.venv) rock@radxa-airbox-q900:/mnt/ssd/qualcomm/DeepSeek-R1-Distill-Qwen-7B$ genie-t2t-run -c DeepSeek-R1-Distill-Qwen-7B-htp.json --prompt_file chat.txt
Using libGenie.so version 1.14.0

/prj/qct/webtech_scratch20/mlg_user_admin/qaisw_source_repo/rel/qairt-2.42.0/release/snpe_src/avante-tools/prebuilt/dsp/hexagon-sdk-5.5.5/ipc/fastrpc/rpcmem/src/rpcmem_android.c:38:dummy call to rpcmem_init, rpcmem APIs will be used from libxdsprpc
[INFO] "Using create From Binary"
[INFO] "Allocated total size = 161120768 across 10 buffers"
[PROMPT]: <|begin▁of▁sentence|>You are Deepseek-R1, an AI assistant created exclusively by the Chinese Company DeepSeek. You'll provide helpful, harmless, and detailed responses to all user inquiries.<|User|>Introduce Qualcomm in 100 words<|Assistant|>

[BEGIN]: <think>
Okay, I need to introduce Qualcomm in 100 words. Let me start by recalling what I know about Qualcomm. They're a big company, known for technology, especially in mobile communications. They've been around for a long time, maybe since the 1980s or something. I think they work with a lot of big companies like Apple and Google.

Wait, Qualcomm is a subsidiary of Intel, right? So they're part of Intel's portfolio. They focus on semiconductors, which are crucial for chips in everything from phones to cars. Their main products are modems, processors, and other components that help devices communicate.

Qualcomm was founded in 1985, I believe. They've been a key player in the development of 4G and 5G technologies. Their chips are used in smartphones, laptops, and even in automotive applications for things like connected cars.

I should mention their commitment to innovation and their role in advancing wireless communication standards. Maybe also touch on their global presence and partnerships with major companies. Oh, and they're involved in various initiatives to make wireless networks faster and more reliable.

Putting it all together, I need to make sure it's concise and hits all the main points without exceeding 100 words. I'll structure it to start with their introduction, then their history, key technologies, and their impact on society and industry.
</think>

Qualcomm is a leading global technology company founded in 1985, renowned for its contributions to wireless communication technologies. As a subsidiary of Intel, Qualcomm specializes in designing advanced semiconductor solutions, including high-speed modems, processors, and wireless components. Its innovations have significantly enhanced communication systems, powering everything from smartphones to connected cars. With a steadfast commitment to innovation, Qualcomm has played a pivotal role in advancing 4G and 5G technologies, shaping the future of wireless networks worldwide. Its extensive portfolio and global partnerships underscore its influence in transforming communication standards and enhancing user experiences across industries.[END]
/prj/qct/webtech_scratch20/mlg_user_admin/qaisw_source_repo/rel/qairt-2.42.0/release/snpe_src/avante-tools/prebuilt/dsp/hexagon-sdk-5.5.5/ipc/fastrpc/rpcmem/src/rpcmem_android.c:42:dummy call to rpcmem_deinit, rpcmem APIs will be used from libxdsprpc
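
The log above mixes the model's reasoning (between <think> and </think>) with the final answer, which ends at the [END] marker. A minimal sketch for keeping only the final answer follows. extract_answer is a hypothetical helper name, and the filter assumes the output layout shown in this log and that genie-t2t-run writes it to standard output.

```shell
# Hypothetical filter: print only the text after </think> and before [END].
extract_answer() {
  LC_ALL=C awk '
    /<\/think>/ { in_answer = 1; next }   # the answer starts after the reasoning block
    in_answer {
      end = index($0, "[END]")
      if (end > 0) {                      # drop the marker and stop reading
        line = substr($0, 1, end - 1)
        if (line != "") print line
        exit
      }
      if ($0 != "") print                 # skip blank separator lines
    }
  '
}

# Example with a short stand-in for the real log:
printf '[BEGIN]: <think>\nsome reasoning\n</think>\n\nFinal answer.[END]\n' | extract_answer
```

On the device, the same filter could be applied as: genie-t2t-run -c DeepSeek-R1-Distill-Qwen-7B-htp.json --prompt_file chat.txt | extract_answer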

Performance Reference

You can enable performance profiling with the --profile option.

genie-t2t-run -c DeepSeek-R1-Distill-Qwen-7B-htp.json --prompt_file chat.txt --profile profile.txt
Metric                  Fogwise® AIRbox Q900
GenieDialog_create      1,952,585 us
num-prompt-tokens       46
prompt-processing-rate  302.6634521484375 toks/sec
time-to-first-token     152,027 us
num-generated-tokens    409
token-generation-rate   11.3246431350708 toks/sec
token-generation-time   36,116,025 us
GenieDialog_free        207,812 us

Metric Definitions

Metric                  Definition
GenieDialog_create      Time to initialize a dialog object, including model loading, context preparation, and memory allocation.
num-prompt-tokens       Number of tokens in the prompt sent to the model (i.e., the smallest unit the input text is split into).
prompt-processing-rate  Speed at which the model processes the prompt, in tokens per second (toks/sec), reflecting the efficiency of prompt analysis and output preparation.
time-to-first-token     Time elapsed from the start of processing to the generation of the first output token, reflecting the model's response latency.
num-generated-tokens    Number of tokens actually output by the model in this generation, representing the length of the generated text in tokens.
token-generation-rate   Speed at which the model generates tokens, in tokens per second (toks/sec), reflecting generation efficiency.
token-generation-time   Total time spent generating all output tokens, in microseconds (us).
GenieDialog_free        Time to free the dialog object, including memory release and resource cleanup.
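
The reported rates can be cross-checked against the token counts and times in the Performance Reference section. The sketch below assumes that token-generation-rate equals num-generated-tokens divided by token-generation-time, and it uses time-to-first-token as an approximation of the prompt processing time. Both are assumptions about how Genie derives these figures, and rate is a hypothetical helper name.

```shell
# Values copied from the Fogwise AIRbox Q900 profile in this guide.
num_prompt_tokens=46
time_to_first_token_us=152027
num_generated_tokens=409
token_generation_time_us=36116025

# tokens per second = tokens / (microseconds / 1,000,000)
rate() {
  LC_ALL=C awk -v n="$1" -v us="$2" 'BEGIN { printf "%.2f\n", n * 1000000 / us }'
}

gen_rate=$(rate "$num_generated_tokens" "$token_generation_time_us")
prompt_rate=$(rate "$num_prompt_tokens" "$time_to_first_token_us")

echo "token generation: $gen_rate toks/sec"
echo "prompt processing (approx.): $prompt_rate toks/sec"
```

The generation figure comes out at about 11.32 toks/sec, in line with the reported value. The prompt figure is about 302.58 toks/sec, slightly below the reported 302.66, which suggests that time-to-first-token covers a little more than prompt processing alone.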

Official Genie Documentation

For more details on Qualcomm® Genie usage and API, refer to:

    Radxa-docs © 2026 by Radxa Computer (Shenzhen) Co.,Ltd. is licensed under CC BY 4.0