DeepSeek-R1-Distill-Qwen-7B
This document describes how to run inference on the DeepSeek-R1-Distill-Qwen-7B model with NPU hardware acceleration on Qualcomm platforms using Qualcomm® Genie.
- Source model license: MIT
Model details
| Model | Quantization | Context length |
|---|---|---|
| DeepSeek-R1-Distill-Qwen-7B | W4A16 | 4096 |
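W4A16 denotes 4-bit weights with 16-bit activations. As a rough, back-of-the-envelope sketch, the size of the quantized weights can be estimated from the parameter count. The 7e9 figure below is a nominal assumption taken from the "7B" in the model name, not a value stated in this document, and the estimate ignores quantization scales, any layers kept at higher precision, and the KV cache.

```python
def weight_footprint_gib(num_params: float, weight_bits: int) -> float:
    """Approximate size of the model weights in GiB for a given bit width."""
    total_bytes = num_params * weight_bits / 8
    return total_bytes / 2**30


# Assumption: nominal parameter count inferred from the "7B" in the model name.
NOMINAL_PARAMS = 7e9

if __name__ == "__main__":
    w4 = weight_footprint_gib(NOMINAL_PARAMS, 4)
    w16 = weight_footprint_gib(NOMINAL_PARAMS, 16)
    print(f"4-bit weights:  ~{w4:.2f} GiB")
    print(f"16-bit weights: ~{w16:.2f} GiB")
```

The actual files downloaded later may differ in size, since they also contain graph metadata and tokenizer data.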
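Supported devices
Tip
Refer to the SoC architecture reference table to look up the DSP architecture of your device's SoC.
- This example supports Qualcomm platform SoCs with the v73 DSP architecture.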
dsp_arch v73 -
Tested device
| Device | SoC | dsp_arch |
|---|---|---|
| Fogwise® AIRbox Q900 | QCS9075 | v73 |
Install the qcom-qairt dependencies
Choose the command that matches your SoC.
- QCS6490
Device
sudo apt install qcom-qnn-sdk-v68 qcom-genie-sdk-v68
- QCS9075
Device
sudo apt install qcom-qnn-sdk-v73 qcom-genie-sdk-v73
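The two install commands differ only in the DSP architecture suffix. A minimal sketch of that mapping is shown below; the dictionary and the `install_command` helper are hypothetical names introduced for illustration, and the mapping covers only the two SoCs listed in this section.

```python
# SoC -> DSP architecture, as implied by the install commands in this section.
SOC_TO_DSP_ARCH = {
    "QCS6490": "v68",
    "QCS9075": "v73",
}


def install_command(soc: str) -> str:
    """Return the apt command for the given SoC (hypothetical helper)."""
    key = soc.upper()
    if key not in SOC_TO_DSP_ARCH:
        raise ValueError(f"no known DSP architecture for SoC {soc!r}")
    arch = SOC_TO_DSP_ARCH[key]
    return f"sudo apt install qcom-qnn-sdk-{arch} qcom-genie-sdk-{arch}"


if __name__ == "__main__":
    print(install_command("QCS9075"))
```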
Set the environment variable
Device
export ADSP_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu
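Download the model
Tip
Install the modelscope Python package inside a Python virtual environment. For details on virtual environments, see Python virtual environment usage.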
Device
pip3 install modelscope
modelscope download --model radxa/DeepSeek-R1-Distill-Qwen-7B-w4a16-4096-v73 --local_dir ./DeepSeek-R1-Distill-Qwen-7B-w4a16-4096-v73
Model inference
Device
cd DeepSeek-R1-Distill-Qwen-7B-w4a16-4096-v73
Build the prompt
The prompt can be passed either as a command-line argument or from a file.
- prompt: pass the following string directly on the command line.
<|begin▁of▁sentence|>You are Deepseek-R1, an AI assistant created exclusively by the Chinese Company DeepSeek. You'll provide helpful, harmless, and detailed responses to all user inquiries.<|User|>Introduce Qualcomm in 100 words<|Assistant|>
- prompt_file: save the same string to a file, chat.txt in this example.
Device
vim chat.txt
<|begin▁of▁sentence|>You are Deepseek-R1, an AI assistant created exclusively by the Chinese Company DeepSeek. You'll provide helpful, harmless, and detailed responses to all user inquiries.<|User|>Introduce Qualcomm in 100 words<|Assistant|>
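The prompt above follows a fixed template: a begin-of-sentence marker, a system message, the user turn, and an assistant marker. A minimal sketch of assembling it programmatically is shown below; `build_prompt` and `write_prompt_file` are hypothetical helper names, and the special tokens are copied verbatim from the example prompt rather than from any tokenizer specification.

```python
from pathlib import Path

# System message copied from the example prompt in this section.
SYSTEM_PROMPT = (
    "You are Deepseek-R1, an AI assistant created exclusively by the Chinese "
    "Company DeepSeek. You'll provide helpful, harmless, and detailed "
    "responses to all user inquiries."
)


def build_prompt(user_message: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Wrap a user message in the single-turn template used in this section."""
    return (
        "<|begin▁of▁sentence|>"
        f"{system_prompt}"
        f"<|User|>{user_message}"
        "<|Assistant|>"
    )


def write_prompt_file(path, user_message: str) -> None:
    """Write the formatted prompt to a file suitable for --prompt_file."""
    Path(path).write_text(build_prompt(user_message), encoding="utf-8")


if __name__ == "__main__":
    print(build_prompt("Introduce Qualcomm in 100 words"))
```

Calling `write_prompt_file("chat.txt", "Introduce Qualcomm in 100 words")` would produce the same file content as the manual `vim` step.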
Run inference
- prompt
Device
genie-t2t-run -c DeepSeek-R1-Distill-Qwen-7B-htp.json -p "<|begin▁of▁sentence|>You are Deepseek-R1, an AI assistant created exclusively by the Chinese Company DeepSeek. You'll provide helpful, harmless, and detailed responses to all user inquiries.<|User|>Introduce Qualcomm in 100 words<|Assistant|>"
- prompt_file
Device
genie-t2t-run -c DeepSeek-R1-Distill-Qwen-7B-htp.json --prompt_file chat.txt
(.venv) rock@radxa-airbox-q900:/mnt/ssd/qualcomm/DeepSeek-R1-Distill-Qwen-7B$ genie-t2t-run -c DeepSeek-R1-Distill-Qwen-7B-htp.json --prompt_file chat.txt
Using libGenie.so version 1.14.0
/prj/qct/webtech_scratch20/mlg_user_admin/qaisw_source_repo/rel/qairt-2.42.0/release/snpe_src/avante-tools/prebuilt/dsp/hexagon-sdk-5.5.5/ipc/fastrpc/rpcmem/src/rpcmem_android.c:38:dummy call to rpcmem_init, rpcmem APIs will be used from libxdsprpc
[INFO] "Using create From Binary"
[INFO] "Allocated total size = 161120768 across 10 buffers"
[PROMPT]: <|begin▁of▁sentence|>You are Deepseek-R1, an AI assistant created exclusively by the Chinese Company DeepSeek. You'll provide helpful, harmless, and detailed responses to all user inquiries.<|User|>Introduce Qualcomm in 100 words<|Assistant|>
[BEGIN]: <think>
Okay, I need to introduce Qualcomm in 100 words. Let me start by recalling what I know about Qualcomm. They're a big company, known for technology, especially in mobile communications. They've been around for a long time, maybe since the 1980s or something. I think they work with a lot of big companies like Apple and Google.
Wait, Qualcomm is a subsidiary of Intel, right? So they're part of Intel's portfolio. They focus on semiconductors, which are crucial for chips in everything from phones to cars. Their main products are modems, processors, and other components that help devices communicate.
Qualcomm was founded in 1985, I believe. They've been a key player in the development of 4G and 5G technologies. Their chips are used in smartphones, laptops, and even in automotive applications for things like connected cars.
I should mention their commitment to innovation and their role in advancing wireless communication standards. Maybe also touch on their global presence and partnerships with major companies. Oh, and they're involved in various initiatives to make wireless networks faster and more reliable.
Putting it all together, I need to make sure it's concise and hits all the main points without exceeding 100 words. I'll structure it to start with their introduction, then their history, key technologies, and their impact on society and industry.
</think>
Qualcomm is a leading global technology company founded in 1985, renowned for its contributions to wireless communication technologies. As a subsidiary of Intel, Qualcomm specializes in designing advanced semiconductor solutions, including high-speed modems, processors, and wireless components. Its innovations have significantly enhanced communication systems, powering everything from smartphones to connected cars. With a steadfast commitment to innovation, Qualcomm has played a pivotal role in advancing 4G and 5G technologies, shaping the future of wireless networks worldwide. Its extensive portfolio and global partnerships underscore its influence in transforming communication standards and enhancing user experiences across industries.[END]
/prj/qct/webtech_scratch20/mlg_user_admin/qaisw_source_repo/rel/qairt-2.42.0/release/snpe_src/avante-tools/prebuilt/dsp/hexagon-sdk-5.5.5/ipc/fastrpc/rpcmem/src/rpcmem_android.c:42:dummy call to rpcmem_deinit, rpcmem APIs will be used from libxdsprpc
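In the log above, the model's reply sits between the `[BEGIN]:` and `[END]` markers, with the reasoning wrapped in `<think>` tags. The following is a minimal sketch of calling the tool from Python and separating the reasoning from the final answer. It assumes the markers appear on standard output exactly as in the sample log; `run_genie` and `parse_response` are hypothetical helper names, and only the `-c` and `--prompt_file` flags shown in this section are used.

```python
import subprocess
from typing import Tuple


def run_genie(config: str, prompt_file: str) -> str:
    """Run genie-t2t-run with the flags shown in this section; return stdout."""
    result = subprocess.run(
        ["genie-t2t-run", "-c", config, "--prompt_file", prompt_file],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def parse_response(stdout: str) -> Tuple[str, str]:
    """Split the text between [BEGIN]: and [END] into (reasoning, answer)."""
    start = stdout.index("[BEGIN]:") + len("[BEGIN]:")
    end = stdout.rindex("[END]")  # last marker, in case the reply mentions it
    body = stdout[start:end]
    reasoning, sep, answer = body.partition("</think>")
    if not sep:
        # No reasoning block was emitted; treat everything as the answer.
        return "", body.strip()
    return reasoning.replace("<think>", "", 1).strip(), answer.strip()
```

On the device, this could be used as `reasoning, answer = parse_response(run_genie("DeepSeek-R1-Distill-Qwen-7B-htp.json", "chat.txt"))`.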
Performance reference
Pass the --profile option to enable performance profiling.
genie-t2t-run -c DeepSeek-R1-Distill-Qwen-7B-htp.json --prompt_file chat.txt --profile profile.txt
| Metric | Fogwise® AIRbox Q900 |
|---|---|
| GenieDialog_create | 1,952,585 us |
| num-prompt-tokens | 46 |
| prompt-processing-rate | 302.6634521484375 toks/sec |
| time-to-first-token | 152,027 us |
| num-generated-tokens | 409 |
| token-generation-rate | 11.3246431350708 toks/sec |
| token-generation-time | 36,116,025 us |
| GenieDialog_free | 207,812 us |
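As a sanity check on the table, the reported token-generation-rate appears to match num-generated-tokens divided by token-generation-time. The sketch below reproduces that arithmetic. The relationship is inferred from the numbers in the table and is not a documented definition, and `parse_us` is a hypothetical helper.

```python
def parse_us(value: str) -> float:
    """Convert a string such as '36,116,025 us' to seconds."""
    number = value.replace(",", "").replace("us", "").strip()
    return int(number) / 1_000_000


# Values copied from the performance table.
num_generated_tokens = 409
token_generation_time_s = parse_us("36,116,025 us")
reported_rate = 11.3246431350708  # toks/sec

# Assumed relationship: rate = generated tokens / generation time.
derived_rate = num_generated_tokens / token_generation_time_s

if __name__ == "__main__":
    print(f"derived:  {derived_rate:.4f} toks/sec")
    print(f"reported: {reported_rate:.4f} toks/sec")
```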
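Metric definitions
| Metric | Meaning |
|---|---|
| GenieDialog_create | Time to initialize a dialog object, including model loading, context preparation, and memory allocation. |
| num-prompt-tokens | Number of tokens in the input prompt, that is, the number of smallest units the input text is split into. |
| prompt-processing-rate | Speed at which the model processes the input prompt, in tokens per second (toks/sec). It indicates how efficiently the model analyzes the prompt before generating output. |
| time-to-first-token | Time from the start of processing until the first output token is generated. It reflects the model's response latency. |
| num-generated-tokens | Number of tokens the model actually produced in this run, that is, the length of the generated text in tokens. |
| token-generation-rate | Speed at which the model generates tokens, in tokens per second (toks/sec). It reflects generation throughput. |
| token-generation-time | Total time spent generating all output tokens, usually in microseconds (us). |
| GenieDialog_free | Time to release the dialog object, including freeing memory and cleaning up resources. |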
Genie official documentation
For a deeper look at how to use Qualcomm® Genie and its detailed API, refer to the official Genie documentation.