
RKLLM DeepSeek-R1

DeepSeek-R1 is a state-of-the-art reasoning model developed by DeepSeek. DeepSeek has open-sourced the training approach and model weights, and the model's performance is competitive with closed-source reasoning models. Using knowledge distillation, DeepSeek also released several lightweight open-source variants based on the Qwen2.5 and Llama3.1 model families. This document shows how to deploy the distilled DeepSeek-R1-Distill-Qwen-1.5B model to an RK3588 device with the RKLLM toolchain and run hardware-accelerated inference on its built-in NPU.


Quick Start

Download the demo

Download the complete demo from ModelScope.

For virtual environment setup, refer to Virtual Environment Usage.

Device
python3 -m venv .venv && source .venv/bin/activate
pip install -U modelscope
modelscope download --model radxa/DeepSeek-R1-Distill-Qwen-1.5B_RKLLM --local_dir ./DeepSeek-R1-Distill-Qwen-1.5B_RKLLM

Run the Example

Device
cd DeepSeek-R1-Distill-Qwen-1.5B_RKLLM/demo_Linux_aarch64/
export LD_LIBRARY_PATH=./lib
chmod +x ./llm_demo
./llm_demo ../DeepSeek-R1-Distill-Qwen-1.5B_W8A8_RK3588.rkllm 2048 4096

Full Conversion Workflow

Prerequisites

Set up the development environment by following RKLLM Installation.

Version note

Running this example with RKLLM 1.2.3 can severely degrade output quality (the model produces repetitive output). Use RKLLM 1.2.2 for this demo. See the related GitHub issue.
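You can confirm which runtime version the device actually loads by reading the version from the startup line that llm_demo prints. That line is the "I rkllm: rkllm-runtime version" line in the sample session on this page. The sketch below assumes that line format. runtime_version and check_runtime_version are hypothetical helper names and are not part of RKLLM.

```shell
#!/usr/bin/env bash
# Sketch: read the rkllm-runtime version from the line llm_demo prints at
# startup and warn when it is 1.2.3. The line format is assumed to match the
# sample session on this page; adjust the sed pattern if yours differs.

# Print the version from a line such as:
#   I rkllm: rkllm-runtime version: 1.2.2, rknpu driver version: 0.9.8, platform: RK3588
runtime_version() {
  printf '%s\n' "$1" | sed -n 's/.*rkllm-runtime version: \([0-9][0-9.]*\),.*/\1/p'
}

# Succeed unless the line reports 1.2.3; in that case print a warning and fail.
check_runtime_version() {
  local v
  v="$(runtime_version "$1")"
  if [ "$v" = "1.2.3" ]; then
    echo "warning: rkllm-runtime $v detected; RKLLM 1.2.2 is recommended for this demo" >&2
    return 1
  fi
  return 0
}
```

Pass it the version line copied from the llm_demo startup output. A non-zero exit status means the device is running runtime 1.2.3.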

Activate the virtual environment

For virtual environment setup, refer to Create Virtual Environment.

X64 Linux PC
conda activate rkllm
pip install -U huggingface_hub

Download the Model

X64 Linux PC
cd RK-SDK/rknn-llm/examples/rkllm_api_demo/
hf download deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --local-dir ./DeepSeek-R1-Distill-Qwen-1.5B

Model Conversion

Generate a quantization calibration file and export the model to the RKLLM format.

tip

If you need a different max_context length, adjust the max_context parameter in the llm.build call in export_rkllm.py. The default is 4096. Larger values use more memory. The value must be ≤ 16384 and a multiple of 32 (e.g., 32, 64, 96, …, 16384).
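The sketch below checks a candidate value against these limits before you edit export_rkllm.py. It assumes a POSIX-compatible shell. valid_max_context is a hypothetical helper and is not part of the RKLLM toolkit.

```shell
#!/usr/bin/env bash
# Sketch: check a candidate max_context value against the limits in the tip
# above (a multiple of 32, at most 16384) before editing export_rkllm.py.
valid_max_context() {
  local n="$1"
  # Accept only plain decimal integers without leading zeros.
  case "$n" in
    ''|0*|*[!0-9]*) return 1 ;;
  esac
  # Must lie in 32..16384 and be divisible by 32.
  [ "$n" -ge 32 ] && [ "$n" -le 16384 ] && [ $((n % 32)) -eq 0 ]
}
```

For example, valid_max_context 8192 succeeds. It rejects 4100 (not a multiple of 32) and 16416 (above 16384).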

X64 Linux PC
cd export/
python generate_data_quant.py -m ../DeepSeek-R1-Distill-Qwen-1.5B -o ../DeepSeek-R1-Distill-Qwen-1.5B/data_quant.json
# Before running, set the model path and calibration file path in export_rkllm.py as needed.
python export_rkllm.py

Build the executable

For cross-compiler setup, refer to Compiler Tools.

X64 Linux PC
cd ../deploy/
# Export the cross-compiler path.
export GCC_COMPILER=/path/to/your/gcc/bin/aarch64-linux-gnu
bash build-linux.sh

The generated binaries are located at install/demo_Linux_aarch64.

Deploy to the device

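Copy the converted .rkllm model and the built demo_Linux_aarch64 directory to the device. The commands below expect the model file to be in the parent directory of demo_Linux_aarch64.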

Device
cd demo_Linux_aarch64/
# Print runtime performance statistics (prefill and generate timings).
export RKLLM_LOG_LEVEL=1
export LD_LIBRARY_PATH=./lib
./llm_demo ../DeepSeek-R1-Distill-Qwen-1.5B_W8A8_RK3588.rkllm 2048 4096

Run the demo and type exit at the prompt to quit. A sample session:

Device
$ ./llm_demo ../DeepSeek-R1-Distill-Qwen-1.5B_W8A8_RK3588.rkllm 2048 4096
rkllm init start
I rkllm: rkllm-runtime version: 1.2.2, rknpu driver version: 0.9.8, platform: RK3588
...
rkllm init success

user: Solve x+y=14 and 2x+4y=38.
assistant: x=9, y=5
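
The llm_demo command takes three positional arguments:

| Parameter | Required | Description | Notes |
| --- | --- | --- | --- |
| path | Yes | Path to the .rkllm model file | N/A |
| max_new_tokens | Yes | Maximum number of tokens generated per turn | Must be ≤ max_context_len |
| max_context_len | Yes | Maximum context length | Must be ≤ the max_context set at export time |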
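You can check the constraints in this table before launching the demo. The sketch below shows one way to do it. check_llm_demo_args and EXPORT_MAX_CONTEXT are hypothetical names. You must set EXPORT_MAX_CONTEXT by hand to the max_context used at export time, because the sketch has no way to read that value from the .rkllm file.

```shell
#!/usr/bin/env bash
# Sketch: validate llm_demo's positional arguments against the constraints in
# the table above before starting the demo.
# EXPORT_MAX_CONTEXT is a hypothetical variable: set it to the max_context you
# passed to llm.build in export_rkllm.py (the tip gives 4096 as the default).
EXPORT_MAX_CONTEXT="${EXPORT_MAX_CONTEXT:-4096}"

check_llm_demo_args() {
  if [ "$#" -ne 3 ]; then
    echo "usage: check_llm_demo_args <model.rkllm> <max_new_tokens> <max_context_len>" >&2
    return 2
  fi
  local model="$1" max_new_tokens="$2" max_context_len="$3" n
  # path: the converted model file must exist.
  if [ ! -f "$model" ]; then
    echo "model not found: $model" >&2
    return 1
  fi
  # Both limits must be positive decimal integers.
  for n in "$max_new_tokens" "$max_context_len"; do
    case "$n" in
      ''|0*|*[!0-9]*) echo "not a positive integer: $n" >&2; return 1 ;;
    esac
  done
  # max_new_tokens must not exceed max_context_len.
  if [ "$max_new_tokens" -gt "$max_context_len" ]; then
    echo "max_new_tokens ($max_new_tokens) > max_context_len ($max_context_len)" >&2
    return 1
  fi
  # max_context_len must not exceed the max_context used at export time.
  if [ "$max_context_len" -gt "$EXPORT_MAX_CONTEXT" ]; then
    echo "max_context_len ($max_context_len) > export max_context ($EXPORT_MAX_CONTEXT)" >&2
    return 1
  fi
  return 0
}

# Example, run from demo_Linux_aarch64/:
#   check_llm_demo_args ../DeepSeek-R1-Distill-Qwen-1.5B_W8A8_RK3588.rkllm 2048 4096 &&
#     ./llm_demo ../DeepSeek-R1-Distill-Qwen-1.5B_W8A8_RK3588.rkllm 2048 4096
```

If any argument breaks a constraint, the check fails with a message on stderr. The && in the example then keeps llm_demo from starting.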

Performance

Measured with the math prompt "Solve x+y=12 and 2x+4y=34. Find x and y."

RK3588 reaches 15.36 tokens/s in the generate stage:

| Stage | Total Time (ms) | Tokens | Time per Token (ms) | Tokens per Second |
| --- | ---: | ---: | ---: | ---: |
| Prefill | 122.70 | 29 | 4.23 | 236.35 |
| Generate | 27539.16 | 423 | 65.10 | 15.36 |

RK3582 reaches 10.61 tokens/s in the generate stage:

| Stage | Total Time (ms) | Tokens | Time per Token (ms) | Tokens per Second |
| --- | ---: | ---: | ---: | ---: |
| Prefill | 599.71 | 81 | 7.40 | 135.07 |
| Generate | 76866.41 | 851 | 94.25 | 10.61 |
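In the RK3588 table, the last two columns follow from the first two. Time per token is the total time divided by the token count, and tokens per second is 1000 divided by the time per token. The awk-based sketch below reproduces those rows. perf_row is a hypothetical helper and is not part of RKLLM.

```shell
#!/usr/bin/env bash
# Sketch: recompute the derived columns of a performance table row from the
# stage's total time (ms) and token count:
#   time_per_token_ms = total_ms / tokens
#   tokens_per_second = 1000 / time_per_token_ms
perf_row() {
  awk -v total_ms="$1" -v tokens="$2" 'BEGIN {
    per_token = total_ms / tokens
    printf "%.2f %.2f\n", per_token, 1000 / per_token
  }'
}

# RK3588 rows from the first table:
perf_row 122.70 29      # Prefill  -> 4.23 236.35
perf_row 27539.16 423   # Generate -> 65.10 15.36
```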


    Radxa-docs © 2026 by Radxa Computer (Shenzhen) Co.,Ltd. is licensed under CC BY 4.0