RKLLM Usage and LLM Deployment

This document describes how to deploy Huggingface-format large language models onto RK3588 using RKLLM for hardware-accelerated inference on the NPU.

Currently Supported Models

We will use Qwen2.5-1.5B-Instruct as an example and follow the sample scripts provided in the RKLLM repository to fully demonstrate how to deploy a large language model from scratch onto a development board equipped with the RK3588 chip, utilizing the NPU for hardware-accelerated inference.

tip

If you have not installed or configured the RKLLM environment yet, please refer to RKLLM Installation.

Model Conversion

tip

For RK358X users, please specify rk3588 as the TARGET_PLATFORM.

We will use Qwen2.5-1.5B-Instruct as an example, but you may choose any model from the list of currently supported models.

  • Download the weights of Qwen2.5-1.5B-Instruct on your x86 PC workstation. If you haven't installed git-lfs, please install it first.

    X86 Linux PC
    git lfs install
    git clone https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct
  • Activate the rkllm conda environment. You can refer to RKLLM Conda Installation.

    X86 Linux PC
    conda activate rkllm
  • Generate the LLM model quantization calibration file.

    tip

    For LLM models, we use the conversion script provided in rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/export.

    For VLM models, use the conversion script in rknn-llm/examples/Qwen2-VL_Demo/export. For multimodal VLM models, please refer to RKLLM Qwen2-VL.

    X86 Linux PC
    cd rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/export
    python3 generate_data_quant.py -m /path/to/Qwen2.5-1.5B-Instruct
    Parameter | Required | Description | Options
    path | Required | Path to the Huggingface model folder. | N

    The generate_data_quant.py script generates the calibration file data_quant.json, which is used during model quantization (the complete conversion flow is recapped at the end of these steps).

  • Update the modelpath variable in rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/export/export_rkllm.py to point to your model path.

    Python Code
    modelpath = '/path/to/Qwen2.5-1.5B-Instruct'
  • Adjust the maximum context length max_context

    If you need a specific max_context length, modify the value of the max_context parameter in the llm.build function within rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/export/export_rkllm.py. The default is 4096; larger values consume more memory. It must not exceed 16,384 and must be a multiple of 32 (e.g., 32, 64, 96, ..., 16,384).

  • Run the model conversion script.

    X86 Linux PC
    python3 export_rkllm.py

    After successful conversion, you will get an .rkllm model file — in this case, Qwen2.5-1.5B-Instruct_W8A8_RK3588.rkllm. From the filename, you can see that this model has been quantized using W8A8 and is compatible with the RK3588 platform.
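
For reference, the complete conversion flow is summarized below. This is only a recap of the steps above; the paths are placeholders, and export_rkllm.py must already contain your modelpath (and, if changed, max_context) values.

    X86 Linux PC
    # Recap of the conversion flow (paths are placeholders).
    conda activate rkllm
    cd rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/export
    # Generate the quantization calibration file data_quant.json from the downloaded model.
    python3 generate_data_quant.py -m /path/to/Qwen2.5-1.5B-Instruct
    # Export the quantized model; export_rkllm.py reads the modelpath and max_context set earlier.
    python3 export_rkllm.py
    # The output is a W8A8 model targeting RK3588.
    ls -lh Qwen2.5-1.5B-Instruct_W8A8_RK3588.rkllm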

Compiling the Executable

  • Download the cross-compilation toolchain gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu (a download and extraction sketch follows at the end of these steps).

  • Modify the main program code in rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/deploy/src/llm_demo.cpp

    You need to comment out line 165: RKLLM automatically parses the chat_template field from tokenizer_config.json during model conversion, so there is no need to set it manually.

    CPP Code
    165 // rkllm_set_chat_template(llmHandle, "", "<|User|>", "<|Assistant|>");
  • Update the GCC_COMPILER_PATH in the build script rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/deploy/build-linux.sh

    BASH
    GCC_COMPILER_PATH=/path/to/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu
  • Run the build script.

    X86 Linux PC
    cd rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/deploy/
    bash build-linux.sh

    The compiled executable will be located in install/demo_Linux_aarch64.
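
If the cross-compilation toolchain from the first bullet above is not already installed on your PC, the sketch below downloads and unpacks it to /opt. The download URL and the /opt install location are assumptions (Arm's download layout may change), so verify the link and point GCC_COMPILER_PATH in build-linux.sh to wherever you actually extract it.

    X86 Linux PC
    # Hypothetical example: fetch and unpack the aarch64 cross-compiler.
    # Verify the URL on Arm's developer site before use; it may have moved.
    wget https://developer.arm.com/-/media/Files/downloads/gnu-a/10.2-2020.11/binrel/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu.tar.xz
    sudo tar -xJf gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu.tar.xz -C /opt
    # GCC_COMPILER_PATH in build-linux.sh would then be:
    # /opt/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu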

Deployment on Device

Local Terminal Mode

  • Copy the converted .rkllm model and the compiled demo_Linux_aarch64 folder to the device (an example scp command is shown after these steps).

  • Set up environment variables

    Radxa OS
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/demo_Linux_aarch64/lib
  • Run llm_demo; type exit to quit

    Radxa OS
    export RKLLM_LOG_LEVEL=1
    ## Usage: ./llm_demo model_path max_new_tokens max_context_len
    ./llm_demo /path/to/Qwen2.5-1.5B-Instruct_W8A8_RK3588.rkllm 2048 4096
    Parameter | Required | Description | Options
    path | Required | Path to the RKLLM model file. | N
    max_new_tokens | Required | Maximum number of tokens to generate per round. | Must be less than or equal to max_context_len
    max_context_len | Required | Maximum context size for the model. | Must be less than or equal to the max_context used during model conversion

    (Screenshot: rkllm_2.webp)
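
As an example of the copy step in the first bullet above, the model and the demo folder can be pushed to the board with scp. The username, IP address, and destination directory are placeholders; substitute your own.

    X86 Linux PC
    # Hypothetical example: copy the model and the compiled demo to the board over SSH.
    scp Qwen2.5-1.5B-Instruct_W8A8_RK3588.rkllm radxa@192.168.1.100:/home/radxa/
    scp -r install/demo_Linux_aarch64 radxa@192.168.1.100:/home/radxa/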

Performance Comparison for Selected Models

Model | Parameter Size | Chip | Chip Count | Inference Speed
TinyLlama | 1.1B | RK3588 | 1 | 15.03 token/s
Qwen | 1.8B | RK3588 | 1 | 14.18 token/s
Phi3 | 3.8B | RK3588 | 1 | 6.46 token/s
ChatGLM3 | 6B | RK3588 | 1 | 3.67 token/s
Qwen2.5 | 1.5B | RK3588 | 1 | 15.44 token/s