
Phi-2

This document describes how to run Phi-2 inference with NPU hardware acceleration on Qualcomm platforms using Qualcomm® Genie.

Model details

Model    Quantization    Context length
Phi-2    W4A16           1024

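Supported devices

Tip

Refer to the SoC architecture reference table to find the DSP architecture of your device's SoC.

  • This example supports Qualcomm platform SoCs with the v73 DSP architecture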

    dsp_arch
    v73
  • Supported device

    Device                  SoC        dsp_arch
    Fogwise® AIRbox Q900    QCS9075    v73

Install the qcom-qairt dependencies

Device
sudo apt install qcom-qnn-sdk-v73 qcom-genie-sdk-v73

Set the environment variable

Device
export ADSP_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu
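
A quick pre-flight check can catch a missing export before inference is launched. The following is a minimal sketch: the helper name check_adsp_library_path is hypothetical, and only the variable name and the path are taken from the step above. It inspects the environment only and does not load any Qualcomm library.

```python
import os

# Path exported in the step above.
EXPECTED_ADSP_PATH = "/usr/lib/aarch64-linux-gnu"


def check_adsp_library_path(env=None):
    """Return (ok, message) describing the state of ADSP_LIBRARY_PATH.

    Hypothetical helper: it only looks at the environment mapping.
    """
    env = os.environ if env is None else env
    value = env.get("ADSP_LIBRARY_PATH")
    if not value:
        return False, "ADSP_LIBRARY_PATH is not set"
    if value != EXPECTED_ADSP_PATH:
        return False, f"ADSP_LIBRARY_PATH is {value!r}, expected {EXPECTED_ADSP_PATH!r}"
    return True, "ADSP_LIBRARY_PATH is set as documented"


if __name__ == "__main__":
    ok, message = check_adsp_library_path()
    print(("OK: " if ok else "WARNING: ") + message)
```

Run it in the same shell session that will later launch inference, since the export only applies to that session.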

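Download the model

Tip

Install the modelscope Python package inside a Python virtual environment. For details on using virtual environments, see Python virtual environment usage.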

Device
pip3 install modelscope
modelscope download --model radxa/Phi-2-w4a16-1024-v73 --local_dir ./Phi-2-w4a16-1024-v73

Model inference

Device
cd Phi-2-w4a16-1024-v73

Build the prompt

The prompt can be passed either as a file or as a command-line argument.

System: You are a helpful assistant\nUser: Introduce Qualcomm in 100 words\nAssistant:
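
The prompt layout above can be generated from separate system and user messages. This is a minimal sketch: the function names are hypothetical, and it assumes the separator stays a literal backslash followed by n, exactly as it appears in the example, rather than being converted to a real newline. The file name chat.txt matches the one used with --prompt_file later on this page.

```python
from pathlib import Path


def build_prompt(system, user):
    """Join the turns with a literal backslash-n, as in the example prompt."""
    # "\\n" is the two characters '\' and 'n' (assumption: the separator is
    # kept literal, matching the single-quoted shell argument on this page).
    return f"System: {system}\\nUser: {user}\\nAssistant:"


def write_prompt_file(path, prompt):
    """Write the prompt to a text file for file-based input."""
    target = Path(path)
    target.write_text(prompt, encoding="utf-8")
    return target


if __name__ == "__main__":
    prompt = build_prompt("You are a helpful assistant",
                          "Introduce Qualcomm in 100 words")
    print(prompt)
    write_prompt_file("chat.txt", prompt)
```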

Run inference

Device
genie-t2t-run -c phi-2-htp.json -p 'System: You are a helpful assistant\nUser: Introduce Qualcomm in 100 words\nAssistant:'

Example output. Note that the log below was captured from a Mistral-7B-Instruct-v0.3 run rather than from Phi-2; the [PROMPT], [BEGIN], and [END] markers are printed by genie-t2t-run itself.

(.venv) rock@radxa-airbox-q900:/mnt/ssd/qualcomm/Mistral-7B-Instruct-v0.3$ genie-t2t-run -c mistral-7b-instruct-v0_3-htp.json -p '<s>[INST] What is the most popular cookie in the world? [/INST]'
Using libGenie.so version 1.14.0

/prj/qct/webtech_scratch20/mlg_user_admin/qaisw_source_repo/rel/qairt-2.42.0/release/snpe_src/avante-tools/prebuilt/dsp/hexagon-sdk-5.5.5/ipc/fastrpc/rpcmem/src/rpcmem_android.c:38:dummy call to rpcmem_init, rpcmem APIs will be used from libxdsprpc
[INFO] "Using create From Binary"
[INFO] "Allocated total size = 281051648 across 8 buffers"
[PROMPT]: <s>[INST] What is the most popular cookie in the world? [/INST]

[BEGIN]: The most popular cookie in the world is the chocolate chip cookie, which is a type of cookie that originates from the United States. It is a small, round-shaped, and semisweet chocolate-flavored cookie that is often enjoyed for its rich, creamy, and indulgent taste. The cookie is often served as a sweet, crunchy, and luscious treat, and is often adored for its delightful, delectable, and scrumptious texture.

The chocolate chip cookie is a favorite dessert and confectionery delight that is often relished, Savoried, and Adored for its Melt-in-the-Mouth, Melt-in-the-Mind, and Melt-in-the-Moment. It is a Creamy, Creamy, and Dreamy Delight, and is often Savorized, Savorized, and Savored for its Dreamy, Delighting, and Delightful.
, its Delicious, Delectable, and Delightful.

The chocolate chip cookie is a popular, Beloved, and Adored Confectionery Delight, and is often Adored, Adored, and Adored for its Delectable, and Delightful.[END]
/prj/qct/webtech_scratch20/mlg_user_admin/qaisw_source_repo/rel/qairt-2.42.0/release/snpe_src/avante-tools/prebuilt/dsp/hexagon-sdk-5.5.5/ipc/fastrpc/rpcmem/src/rpcmem_android.c:42:dummy call to rpcmem_deinit, rpcmem APIs will be used from libxdsprpc
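
When inference is driven from a script, the command line can be assembled as an argument list instead of a shell string, which avoids quoting problems with the prompt. The sketch below is an illustration only: build_genie_command is a hypothetical helper, and it uses only the genie-t2t-run options that appear on this page (-c, -p, --prompt_file, --profile).

```python
import shlex


def build_genie_command(config, prompt=None, prompt_file=None, profile=None):
    """Return the argv list for genie-t2t-run.

    Exactly one of prompt or prompt_file must be given.
    """
    if (prompt is None) == (prompt_file is None):
        raise ValueError("pass exactly one of prompt or prompt_file")
    argv = ["genie-t2t-run", "-c", config]
    if prompt is not None:
        argv += ["-p", prompt]
    else:
        argv += ["--prompt_file", prompt_file]
    if profile is not None:
        argv += ["--profile", profile]
    return argv


if __name__ == "__main__":
    argv = build_genie_command(
        "phi-2-htp.json",
        prompt="System: You are a helpful assistant\\nUser: Introduce Qualcomm in 100 words\\nAssistant:",
    )
    # Print the shell-quoted form; nothing is executed here.
    print(shlex.join(argv))
```

On the device, the returned list can be passed to subprocess.run(argv, check=True) from the model directory.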

Performance reference

Use the --profile option to enable performance profiling.

genie-t2t-run -c phi-2-htp.json --prompt_file chat.txt --profile profile.txt
Metric                    Fogwise® AIRbox Q900
GenieDialog_create        871,031 us
num-prompt-tokens         19
prompt-processing-rate    208.463623046875 toks/sec
time-to-first-token       91,145 us
num-generated-tokens      135
token-generation-rate     20.067829132080078 toks/sec
token-generation-time     6,727,238 us
GenieDialog_free          99,989 us

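Metric definitions

Metric                    Description
GenieDialog_create        Time to initialize a dialog object, including model loading, context preparation, and memory allocation.
num-prompt-tokens         Number of tokens in the input prompt, that is, the number of smallest units the input text is split into.
prompt-processing-rate    Speed at which the model processes the input prompt, in tokens per second (toks/sec).
time-to-first-token       Time from the start of processing until the first output token is generated; reflects response latency.
num-generated-tokens      Number of tokens the model actually produced in this run, that is, the length of the generated text in tokens.
token-generation-rate     Speed at which the model generates tokens, in tokens per second (toks/sec).
token-generation-time     Total time spent generating all output tokens, typically in microseconds (us).
GenieDialog_free          Time to release the dialog object, including freeing memory and cleaning up resources.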
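
As a sanity check on how the rate metrics relate to the counts and times, the sketch below recomputes them from the Fogwise® AIRbox Q900 figures in the performance table. It assumes, based only on the numbers themselves and not on any documented definition, that each rate is a token count divided by the corresponding elapsed time. Under that assumption the results agree with the reported rates to within a small rounding difference.

```python
US_PER_SECOND = 1_000_000


def tokens_per_second(num_tokens, elapsed_us):
    """Throughput in tokens per second from a token count and a time in microseconds."""
    return num_tokens / (elapsed_us / US_PER_SECOND)


# Values copied from the Fogwise AIRbox Q900 performance table.
prompt_rate = tokens_per_second(19, 91_145)          # prompt tokens / time-to-first-token
generation_rate = tokens_per_second(135, 6_727_238)  # generated tokens / generation time
ms_per_token = 6_727_238 / 135 / 1000                # average latency per generated token

print(f"prompt processing: {prompt_rate:.2f} toks/sec")
print(f"token generation:  {generation_rate:.2f} toks/sec")
print(f"per-token latency: {ms_per_token:.2f} ms")
```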

Genie official documentation

For more on how to use Qualcomm® Genie and its detailed API, refer to the official Qualcomm® Genie documentation.


    Radxa-docs © 2026 by Radxa Computer (Shenzhen) Co.,Ltd. is licensed under CC BY 4.0