
RKLLM SmolVLM2

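SmolVLM2 is a compact yet capable vision-language model family developed by Hugging Face. It is designed to bring advanced vision-language processing to resource-constrained devices such as smartphones and embedded systems. Its small size lets it run on compact hardware, narrowing the performance gap between large models and small devices. This document explains how to use RKLLM to deploy SmolVLM2 256M / 500M / 2.2B on the RK3588 and run inference with hardware acceleration on its NPU.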

Tip

Attribution

This model was contributed by Radxa community user @Rients Politiek.

Radxa community forum thread: SmolVLM2 for RK3588 NPU

Model deployment

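SmolVLM2 is available in three sizes (256M, 500M and 2.2B). Set the parameters for the size you need; the example below uses 256M.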

Parameter selection

Device
export MODEL_SIZE=256m REPO_SIZE=256M
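In this example MODEL_SIZE is the lower-case size used in the Hugging Face model repository and file names, and REPO_SIZE is the same value in upper case used in the GitHub repository name. A minimal sketch of a helper that sets both from one argument, assuming the other sizes follow the same convention (the forms 500m/500M and 2.2b/2.2B are assumptions not confirmed by this page; check the actual repository names). The function name set_smolvlm2_size is hypothetical.

```shell
# Hypothetical helper: set MODEL_SIZE and derive REPO_SIZE from it.
# Assumes REPO_SIZE is always MODEL_SIZE in upper case, as in 256m -> 256M.
set_smolvlm2_size() {
  case "$1" in
    256m|500m|2.2b) ;;  # 500m and 2.2b are assumed to follow the 256m pattern
    *)
      echo "unsupported size: $1 (expected 256m, 500m or 2.2b)" >&2
      return 1
      ;;
  esac
  export MODEL_SIZE="$1"
  # Upper-case the size string for the GitHub repository name.
  export REPO_SIZE="$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
}
```

For example, running set_smolvlm2_size 256m sets the same values as the export line above.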

Download the code

Device
git clone https://github.com/Qengineering/SmolVLM2-${REPO_SIZE}-NPU.git && cd SmolVLM2-${REPO_SIZE}-NPU

Build the project

Install dependencies

Device
sudo apt update
sudo apt install cmake gcc g++ make libopencv-dev

Build with CMake

Device
cmake -B build -DRK_LIB_PATH=${PWD}/aarch64/library -DCMAKE_CXX_FLAGS="-I${PWD}/aarch64/include"
cmake --build build -j4

Download the model

Install the Hugging Face CLI (hf)

Device
curl -LsSf https://hf.co/cli/install.sh | bash

Download the model files

Device
hf download Qengineering/SmolVLM2-${MODEL_SIZE}-rk3588 --local-dir ./SmolVLM2-${MODEL_SIZE}-rk3588
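Before running the example it can help to confirm that the download produced the two files the run command expects. A minimal sketch, assuming the file names used in the example command below apply to every MODEL_SIZE (only the 256m names are confirmed by the log on this page). The function name check_smolvlm2_models is hypothetical.

```shell
# Hypothetical helper: verify that the vision (.rknn) and language (.rkllm)
# model files referenced by the example run command are present.
check_smolvlm2_models() {
  local dir="./SmolVLM2-${MODEL_SIZE}-rk3588"
  local vision="${dir}/smolvlm2_${MODEL_SIZE}_vision_fp16_rk3588.rknn"
  local llm="${dir}/smolvlm2-${MODEL_SIZE}-instruct_w8a8_rk3588.rkllm"
  local missing=0
  for f in "$vision" "$llm"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  # Returns 0 only when both files exist.
  return "$missing"
}
```

Run it from the repository directory after hf download; a non-zero exit status means at least one file is missing.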

Run the example

Device
export RKLLM_LOG_LEVEL=1
# VLM_NPU Picture RKNN_model RKLLM_model NewTokens ContextLength
./VLM_NPU ./Moon.jpg ./SmolVLM2-${MODEL_SIZE}-rk3588/smolvlm2_${MODEL_SIZE}_vision_fp16_rk3588.rknn ./SmolVLM2-${MODEL_SIZE}-rk3588/smolvlm2-${MODEL_SIZE}-instruct_w8a8_rk3588.rkllm 2048 4096

Input image: Moon.jpg

prompt: <image>Describe the image.
rock@rock-5b-plus:~/SmolVLM2-256M-NPU$ ./VLM_NPU ./Moon.jpg ./SmolVLM2-${MODEL_SIZE}-rk3588/smolvlm2_${MODEL_SIZE}_vision_fp16_rk3588.rknn ./SmolVLM2-${MODEL_SIZE}-rk3588/smolvlm2-${MODEL_SIZE}-instruct_w8a8_rk3588.rkllm 2048 4096
I rkllm: rkllm-runtime version: 1.2.3, rknpu driver version: 0.9.8, platform: RK3588
I rkllm: loading rkllm model from ./SmolVLM2-256m-rk3588/smolvlm2-256m-instruct_w8a8_rk3588.rkllm
I rkllm: rkllm-toolkit version: 1.2.2, max_context_limit: 4096, npu_core_num: 3, target_platform: RK3588, model_dtype: W8A8
I rkllm: Enabled cpus: [4, 5, 6, 7]
I rkllm: Enabled cpus num: 4
rkllm init success
I rkllm: reset chat template:
I rkllm: system_prompt: <|im_start|>system\nYou are a helpful assistant.<|im_end|>\n
I rkllm: prompt_prefix: <|im_start|>user\n
I rkllm: prompt_postfix: <|im_end|>\n<|im_start|>assistant\n
W rkllm: Calling rkllm_set_chat_template will disable the internal automatic chat template parsing, including enable_thinking. Make sure your custom prompt is complete and valid.

used NPU cores 3

model input num: 1, output num: 1

Input tensors:
index=0, name=pixel_values, n_dims=4, dims=[1, 384, 384, 3], n_elems=442368, size=884736, fmt=NHWC, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000

Output tensors:
index=0, name=output, n_dims=3, dims=[1, 36, 576, 0], n_elems=20736, size=41472, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000

Model input height=384, width=384, channel=3


User: <image>Describe the image.
Answer: The image depicts a scene from space, specifically looking at the moon's surface. The moon is in the process of being tidied up and has been cleaned to remove any debris or stains that might have accumulated over time. The overall atmosphere appears to be clear and bright, with no visible signs of pollution or other human activity.

The image also includes a large number of small objects scattered across the surface of the moon, which appear to be rocks or boulders. These objects are scattered randomly around the moon's surface, creating a sense of randomness and disorder. The overall atmosphere is calm and serene, with no signs of any movement or activity in the scene.

Overall, this image gives a sense of the beauty and cleanliness of the lunar environment, as well as the ongoing process of tidying up the moon's surface.
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Model init time (ms) 227.84
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Stage Total Time (ms) Tokens Time per Token (ms) Tokens per Second
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Prefill 97.59 78 1.25 799.24
I rkllm: Generate 2643.09 166 15.92 62.81
I rkllm: --------------------------------------------------------------------------------------
I rkllm: Peak Memory Usage (GB)
I rkllm: 0.59
I rkllm: --------------------------------------------------------------------------------------
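To compare runs, you can save the program output to a file (for example with ./VLM_NPU ... 2>&1 | tee run.log) and extract the Tokens per Second column from the RKLLM summary. A minimal awk sketch, assuming the log lines keep the layout shown above, with the stage name in the third field and tokens per second in the last field. The function name rkllm_tps and the file name run.log are hypothetical.

```shell
# Hypothetical helper: print "<stage> <tokens/s>" for the Prefill and Generate
# rows of an RKLLM performance summary, read from the given files or stdin.
rkllm_tps() {
  awk '$1 == "I" && $2 == "rkllm:" && ($3 == "Prefill" || $3 == "Generate") { print $3, $NF }' "$@"
}
```

With the log above, rkllm_tps run.log would print one line for Prefill (799.24) and one for Generate (62.81).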

Performance analysis

On the ROCK5B+, the 256M model generates 62.81 tokens/s:

Stage      Total Time (ms)   Tokens   Time per Token (ms)   Tokens per Second
Prefill    97.59             78       1.25                  799.24
Generate   2643.09           166      15.92                 62.81
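The last two columns follow from the first two: time per token = total time / tokens, and tokens per second = tokens / (total time / 1000). A small awk sketch that recomputes them; the function name rkllm_rate is hypothetical. For the Generate row, 2643.09 ms / 166 ≈ 15.92 ms and 166 / 2.64309 s ≈ 62.81 tokens/s. Recomputing from the rounded totals can differ in the last digit: the Prefill row gives 799.26 instead of the logged 799.24, presumably because the log rounds the total time before printing it.

```shell
# Hypothetical helper: given total time in ms and a token count, print
# time per token (ms) and tokens per second, both rounded to two decimals.
rkllm_rate() {
  awk -v ms="$1" -v n="$2" 'BEGIN { printf "%.2f %.2f\n", ms / n, n / (ms / 1000) }'
}
```

Usage: rkllm_rate 2643.09 166 prints 15.92 62.81.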

Memory usage

Model                    256M   500M   2.2B
Peak Memory Usage (GB)   0.59   0.88   3.39
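The peak memory figures can guide the choice of model size on boards with less RAM. A minimal sketch that suggests the largest size whose peak usage from the table fits in a given amount of free memory; the thresholds are only the measured peaks above and leave no headroom for other processes, and the function name pick_smolvlm2_size and the lower-case size strings (500m, 2.2b) are assumptions.

```shell
# Hypothetical helper: suggest the largest SmolVLM2 size whose peak memory
# usage (0.59 / 0.88 / 3.39 GB, from the table above) fits in the given GB.
# Prints "none" and returns 1 if even the 256M model does not fit.
pick_smolvlm2_size() {
  awk -v free="$1" 'BEGIN {
    if (free >= 3.39)      print "2.2b"
    else if (free >= 0.88) print "500m"
    else if (free >= 0.59) print "256m"
    else { print "none"; exit 1 }
  }'
}
```

For example, pick_smolvlm2_size "$(awk '/^MemAvailable:/ { print $2 / 1048576 }' /proc/meminfo)" converts the kernel's MemAvailable value from kB to GiB and passes it in.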


    Radxa-docs © 2026 by Radxa Computer (Shenzhen) Co.,Ltd. is licensed under CC BY 4.0