RKLLM Usage: Deploying an LLM

This document explains how to deploy large language models in Hugging Face format to a board with the RK3588 chip and run hardware-accelerated inference on its NPU using RKLLM.

Currently Supported Models

This guide uses Qwen2.5-1.5B-Instruct as an example to show how to deploy a large language model from scratch on a development board equipped with the RK3588 chip and use the NPU for hardware-accelerated inference.

Tip: If the RKLLM environment is not yet installed and configured, please refer to RKLLM Installation.

Model Conversion

This walkthrough uses Qwen2.5-1.5B-Instruct; you can substitute any model linked in the currently supported models list.

  • Download all files of Qwen2.5-1.5B-Instruct on an x86 PC workstation. If git-lfs is not installed, please install it.
    git clone https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct
  • Activate the rkllm conda environment. Refer to RKLLM conda Installation if needed.
    conda activate rkllm
  • Edit rknn-llm/rkllm-toolkit/examples/test.py and set the model path, the quantization dataset path, and the RKLLM export path (the leading numbers are line positions in test.py; a sketch of the full conversion flow follows this list).
    15 modelpath = 'Your Huggingface LLM model'
    29 dataset = None # defaults to "./data_quant.json"; set to None if you have no calibration data
    83 ret = llm.export_rkllm("./Your_Huggingface_LLM_model.rkllm")
  • Run the model conversion script.
    cd rknn-llm/rkllm-toolkit/examples/
    python3 test.py
    After a successful conversion, you will get an .rkllm model file.
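
For reference, the conversion flow inside test.py follows roughly the sketch below. This is a minimal illustration based on the rkllm.api interface from the rknn-llm repository; parameter names and defaults (quantized_dtype, target_platform, dataset, and so on) vary between toolkit versions, so treat it as a guide rather than a drop-in script.

    from rkllm.api import RKLLM

    modelpath = './Qwen2.5-1.5B-Instruct'   # local path to the downloaded model
    dataset = None                          # optional quantization calibration data

    llm = RKLLM()

    # Load the Hugging Face model from disk
    if llm.load_huggingface(model=modelpath) != 0:
        raise SystemExit('load model failed!')

    # Quantize and build the model for the RK3588 NPU
    if llm.build(do_quantization=True, optimization_level=1,
                 quantized_dtype='w8a8', target_platform='rk3588',
                 dataset=dataset) != 0:
        raise SystemExit('build model failed!')

    # Export the deployable .rkllm artifact
    if llm.export_rkllm('./Qwen2.5-1.5B-Instruct.rkllm') != 0:
        raise SystemExit('export model failed!')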

Compile the Executable

  • Download the cross-compilation toolchain gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu.
  • Modify the main program rknn-llm/examples/rkllm_api_demo/src/llm_demo.cpp, changing two lines so that raw user input is wrapped in the model's chat template before inference (the leading numbers are line positions in llm_demo.cpp; see the sketch after this list).
    184 text = PROMPT_TEXT_PREFIX + input_str + PROMPT_TEXT_POSTFIX;
    185 // text = input_str;
  • Modify the GCC_COMPILER_PATH in the rknn-llm/examples/rkllm_api_demo/build-linux.sh compilation script.
    GCC_COMPILER_PATH=gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu
  • Run the build script.
    cd rknn-llm/examples/rkllm_api_demo/
    bash build-linux.sh
    The generated executable file is located in build/build_linux_aarch64_Release/llm_demo.
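
The two-line change above makes the demo wrap raw user input in the model's chat template; Qwen models use ChatML-style markers. The sketch below illustrates what that wrapping produces. The exact PROMPT_TEXT_PREFIX and PROMPT_TEXT_POSTFIX strings are defined in llm_demo.cpp and differ per model and rknn-llm version, so the marker strings here are representative, not authoritative.

    #include <iostream>
    #include <string>

    // Representative ChatML markers for Qwen; check llm_demo.cpp for the
    // exact strings used by your rknn-llm version.
    const std::string PROMPT_TEXT_PREFIX =
        "<|im_start|>system You are a helpful assistant. <|im_end|> <|im_start|>user ";
    const std::string PROMPT_TEXT_POSTFIX = " <|im_end|><|im_start|>assistant ";

    int main() {
        std::string input_str = "What is RKLLM?";
        // Mirrors the enabled line 184: build the template-wrapped prompt
        std::string text = PROMPT_TEXT_PREFIX + input_str + PROMPT_TEXT_POSTFIX;
        std::cout << text << std::endl;  // the string actually sent to the model
        return 0;
    }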

Board Deployment

Terminal Mode

  • Copy the converted .rkllm model and the compiled llm_demo binary to the board.
  • Set the library search path so llm_demo can find the RKLLM runtime library (librkllmrt.so).
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:rknn-llm/rkllm-runtime/Linux/librkllm_api/aarch64
  • Run llm_demo and enter exit to quit. The two numeric arguments set the maximum number of new tokens to generate and the maximum context length, and RKLLM_LOG_LEVEL=1 enables performance logging.
    export RKLLM_LOG_LEVEL=1
    ./llm_demo your_rkllm_path 10000 10000
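
Putting the terminal-mode steps together, a minimal end-to-end session might look like the following. The board address, user name, and directory layout are placeholders to adapt to your setup; librkllmrt.so is the runtime library shipped under rkllm-runtime in the rknn-llm repository.

    # On the PC: copy the model, the demo binary, and the RKLLM runtime library to the board
    scp Qwen2.5-1.5B-Instruct.rkllm llm_demo radxa@192.168.1.100:/home/radxa/
    scp rknn-llm/rkllm-runtime/Linux/librkllm_api/aarch64/librkllmrt.so radxa@192.168.1.100:/home/radxa/

    # On the board: point the dynamic loader at the runtime library, then run the demo
    cd /home/radxa
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/radxa
    export RKLLM_LOG_LEVEL=1
    ./llm_demo Qwen2.5-1.5B-Instruct.rkllm 10000 10000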

Performance Comparison of Models

| Model     | Parameter Size | Chip   | Chip Count | Inference Speed |
| --------- | -------------- | ------ | ---------- | --------------- |
| TinyLlama | 1.1B           | RK3588 | 1          | 15.03 token/s   |
| Qwen      | 1.8B           | RK3588 | 1          | 14.18 token/s   |
| Phi3      | 3.8B           | RK3588 | 1          | 6.46 token/s    |
| ChatGLM3  | 6B             | RK3588 | 1          | 3.67 token/s    |