Phi-3.5-mini-instruct
This document describes how to perform NPU hardware-accelerated inference of the Phi-3.5-mini-instruct model on Qualcomm platforms using Qualcomm® Genie.
- Source model: microsoft/Phi-3.5-mini-instruct
- Source model license: MIT
Model Details
| Model | Quantization | Context Length |
|---|---|---|
| Phi-3.5-mini-instruct | W4A16 | 4096 |
Supported Devices
- Refer to the SoC Architecture Reference to find the DSP architecture of your device's SoC.
- This example supports Qualcomm platform SoCs with the v73 DSP architecture.

| Device | SoC | dsp_arch |
|---|---|---|
| Fogwise® AIRbox Q900 | QCS9075 | v73 |
Download qcom-qairt Dependencies
QCS6490:
sudo apt install qcom-qnn-sdk-v68 qcom-genie-sdk-v68

QCS9075:
sudo apt install qcom-qnn-sdk-v73 qcom-genie-sdk-v73
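If you set up several boards, the SoC-to-package mapping above can be scripted. The sketch below is a hypothetical bash helper: `soc_to_dsp_arch` and `print_install_cmd` are names introduced here for illustration and are not part of any Qualcomm tooling. It only knows the two SoCs listed in this guide, and it prints the apt command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a SoC name to the DSP architecture suffix used by
# the qcom-qnn-sdk-<arch> / qcom-genie-sdk-<arch> packages in this guide.
# Only the two SoCs named in the guide are covered.
soc_to_dsp_arch() {
  case "$1" in
    QCS6490) echo "v68" ;;
    QCS9075) echo "v73" ;;
    *) echo "unknown SoC: $1" >&2; return 1 ;;
  esac
}

# Print (not execute) the matching apt command for the given SoC.
print_install_cmd() {
  local arch
  arch="$(soc_to_dsp_arch "$1")" || return 1
  echo "sudo apt install qcom-qnn-sdk-${arch} qcom-genie-sdk-${arch}"
}
```

For example, print_install_cmd QCS9075 prints the v73 command. Keep in mind that this particular model build targets v73 only.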
Import Environment Variables
export ADSP_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu
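The export above only lasts for the current shell session. One common way to make it persistent is to append it to a shell startup file such as ~/.bashrc. The sketch below does this idempotently, so running it twice does not add a duplicate line. The function name persist_adsp_path is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: append the ADSP_LIBRARY_PATH export to a startup file
# (for example ~/.bashrc) only if that exact line is not already present.
persist_adsp_path() {
  local rc_file="$1"
  local line='export ADSP_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu'
  touch "$rc_file"
  # -x matches the whole line, -F treats the pattern as a fixed string
  if ! grep -qxF "$line" "$rc_file"; then
    # Make sure the file ends with a newline before appending
    if [ -s "$rc_file" ] && [ -n "$(tail -c 1 "$rc_file")" ]; then
      printf '\n' >> "$rc_file"
    fi
    printf '%s\n' "$line" >> "$rc_file"
  fi
}
```

Usage: run persist_adsp_path ~/.bashrc, then open a new shell or run source ~/.bashrc.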
Download Model
Install the modelscope Python package in a Python virtual environment. For details on using virtual environments, refer to Python Virtual Environment Usage.
pip3 install modelscope
modelscope download --model radxa/Phi-3.5-mini-instruct-w4a16-4096-v73 --local_dir ./Phi-3.5-mini-instruct-w4a16-4096-v73
Run Inference
cd Phi-3.5-mini-instruct-w4a16-4096-v73
Build Prompt
The prompt can be passed either directly as a command-line parameter or from a file.

prompt (passed directly as a parameter):
<|system|>You are a helpful assistant.<|end|><|user|>How to explain Internet for a medieval knight?<|end|><|assistant|>

prompt_file (saved to a file, here chat.txt):
vim chat.txt
Enter the same prompt in chat.txt:
<|system|>You are a helpful assistant.<|end|><|user|>How to explain Internet for a medieval knight?<|end|><|assistant|>
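Instead of editing chat.txt by hand, you can generate the prompt from a plain system message and user message. The sketch below assumes the single-turn layout used in the example prompt above (<|system|>, <|user|> and <|assistant|> markers, with each turn closed by <|end|>). build_phi35_prompt is a hypothetical helper and is not part of Genie.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wrap a system message and a user message in the
# single-turn Phi-3.5 prompt layout shown in the example above.
build_phi35_prompt() {
  local system_msg="$1"
  local user_msg="$2"
  # printf '%s' keeps backslashes and % characters in the messages literal
  printf '%s' "<|system|>${system_msg}<|end|><|user|>${user_msg}<|end|><|assistant|>"
}
```

Usage, writing the example prompt to chat.txt:
build_phi35_prompt "You are a helpful assistant." "How to explain Internet for a medieval knight?" > chat.txt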
Run Inference
prompt:
genie-t2t-run -c Phi-3.5-mini-instruct-htp.json -p '<|system|>You are a helpful assistant.<|end|><|user|>How to explain Internet for a medieval knight?<|end|><|assistant|>'

prompt_file:
genie-t2t-run -c Phi-3.5-mini-instruct-htp.json --prompt_file chat.txt

Example output (prompt passed as a parameter):
(.venv) rock@radxa-airbox-q900:/mnt/ssd/qualcomm/Phi-3.5-mini-instruct$ genie-t2t-run -c Phi-3.5-mini-instruct-htp.json -p '<|system|>You are a helpful assistant.<|end|><|user|>How to explain Internet for a medieval knight?<|end|><|assistant|>'
Using libGenie.so version 1.14.0
/prj/qct/webtech_scratch20/mlg_user_admin/qaisw_source_repo/rel/qairt-2.42.0/release/snpe_src/avante-tools/prebuilt/dsp/hexagon-sdk-5.5.5/ipc/fastrpc/rpcmem/src/rpcmem_android.c:38:dummy call to rpcmem_init, rpcmem APIs will be used from libxdsprpc
[INFO] "Using create From Binary"
[INFO] "Allocated total size = 816947712 across 8 buffers"
[PROMPT]: <|system|>You are a helpful assistant.<|end|><|user|>How to explain Internet for a medieval knight?<|end|><|assistant|>
[BEGIN]: To explain the Internet to a medieval knight, you would need to break down the concept into fundamental ideas and relate them to familiar medieval scenarios. Here's a step-by터 approach:
1. **Understanding the Concept of a "Network":**
- **Metaphor:** Explain the Internet as a vast network of roads, similar to the intricate web of trade routes and alliances that knights might travel on. Just as knights can journey from one castle to another, the Internet allows people to travel from one place to another, not on horseback, but through a system of interconnected pathways.
2. **The Role of the "Knights of the Round Table" (Internet Service Providers):**
- **Explanation:** Describe ISPs as the local lords or guild masters who provide access to the roads. They maintain the infrastructure (like roads) and ensure that travelers (users) can move from one place to another.
3. **The "Code of Chivalry" (Internet Protocols/Standards):**
- **Illustration:** Just as knights follow a code of conduct, the Internet has its own set of rules that ensure communication and data exchange are orderly and efficient. These rules are known as protocols, which are agreed-upon methods for knights (devices and users) to interact safely.
4. **The "Tournament of the Field" (Data Exchange):**
- **Analogy:** When a knight competes in a tournament, he aims to win or achieve a goal. Similarly, the Internet allows individuals to send and receive information (letters, messages, scrolls) to achieve their objectives.
5. **The "Four Postern Door" (Firewall):**
- **Security Measure:** Explain that just as a castle has a gatekeeper to protect its inhabitants from invaders, the Internet has security measures (firewalls) to protect against malicious entities.
6. **The "Tale of Two Cities" (Internet Speed and Connectivity):**
- **Variation:** Some castles (computers) have faster horses (faster Internet speeds) and more direct routes (better connectivity) than others. This difference can affect how quickly one can send messages or travel to distant lands.
7. **The "Crossbow" (Data Transmission):**
- **Tool:** Describe how data is sent across the Internet using a metaphor such as a crossbow. The Internet is like a vast battlefield where crossbows (data packets) are launched from one knight's (user's) position to another, carrying messages or information.
8. **The "Alchemist's Potion" (Data Encryption):**
- **Secrecy:** Just as a potion can be concocted to remain hidden from prying eyes, data on the Internet is often encrypted, ensuring that only those with the right key (password or decryption key) can read the information.
9. **The "Dragon's Lair" (Server Farms):**
- **Central Hub:** Explain that there are central hubs, like a dragon's lair, where vast amounts of scrolls (data) are stored. These are called servers, and they hold the knowledge and resources that knights (users) can access when they travel the Internet.
10. **The "Mercantile Guilds" (Social Networks and Online Communities):**
- **Social Interaction:** The Internet also serves as a marketplace and a gathering place for knights to exchange news, share tales of adventure, and forge alliances, much like the social networks and online communities of today.
By using these medieval metaphors and scenarios, you can help a medieval knight grasp the abstract and complex nature of the Internet in a context they can understand. Remember, the goal is to make the explanation relatable while maintaining the essence of how the Internet functions.[END]
/prj/qct/webtech_scratch20/mlg_user_admin/qaisw_source_repo/rel/qairt-2.42.0/release/snpe_src/avante-tools/prebuilt/dsp/hexagon-sdk-5.5.5/ipc/fastrpc/rpcmem/src/rpcmem_android.c:42:dummy call to rpcmem_deinit, rpcmem APIs will be used from libxdsprpc
Performance Reference
You can enable performance profiling with the --profile option.
genie-t2t-run -c Phi-3.5-mini-instruct-htp.json --prompt_file chat.txt --profile profile.txt
| Metric | Fogwise® AIRbox Q900 |
|---|---|
| GenieDialog_create | 2,143,046 us |
| num-prompt-tokens | 21 |
| prompt-processing-rate | 122.04051971435547 toks/sec |
| time-to-first-token | 172,091 us |
| num-generated-tokens | 901 |
| token-generation-rate | 9.163215637207031 toks/sec |
| token-generation-time | 98,328,012 us |
| GenieDialog_free | 122,259 us |
Metric Definitions
| Metric | Definition |
|---|---|
| GenieDialog_create | Time to initialize a dialog object, including model loading, context preparation, and memory allocation. |
| num-prompt-tokens | Number of tokens in the prompt sent to the model. A token is the smallest unit the input text is split into. |
| prompt-processing-rate | Speed at which the model processes the prompt, in tokens per second (toks/sec), reflecting the efficiency of prompt analysis and output preparation. |
| time-to-first-token | Time elapsed from the start of processing to the generation of the first output token, reflecting the model's response latency. |
| num-generated-tokens | Number of tokens actually output by the model in this generation, representing the length of the generated text in tokens. |
| token-generation-rate | Speed at which the model generates tokens, in tokens per second (toks/sec), reflecting generation efficiency. |
| token-generation-time | Total time spent generating all output tokens, in microseconds (us). |
| GenieDialog_free | Time to free the dialog object, including memory release and resource cleanup. |
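These definitions let you cross-check the numbers in the Performance Reference table. Dividing num-generated-tokens by token-generation-time (in seconds) gives token-generation-rate. Dividing num-prompt-tokens by prompt-processing-rate gives roughly the prompt-processing time, which should land close to time-to-first-token. The awk-based sketch below performs this check. The helper names are hypothetical, and this is only a consistency check, not Genie's internal formula.

```shell
#!/usr/bin/env bash
# Sketch: recompute profile metrics from the raw counts and times.

# Tokens per second from a token count and an elapsed time in microseconds.
rate_toks_per_sec() {
  awk -v n="$1" -v us="$2" 'BEGIN { printf "%.2f\n", n / (us / 1e6) }'
}

# Approximate processing time in microseconds from a token count and a
# rate in tokens per second.
prompt_time_us() {
  awk -v n="$1" -v r="$2" 'BEGIN { printf "%.0f\n", n / r * 1e6 }'
}
```

With the values from the table, rate_toks_per_sec 901 98328012 prints 9.16, matching the reported token-generation-rate. prompt_time_us 21 122.04051971435547 prints 172074, within about 20 us of the reported time-to-first-token of 172,091 us.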
Official Genie Documentation
For more details on Qualcomm® Genie usage and API, refer to: