YOLO World

Environment Setup

Info

Set up the required environment as described in RKNN Installation.

Download the example files as described in RKNN Model Zoo.

Model Download

Download the ONNX model files.

X64 Linux PC

cd rknn_model_zoo/examples/yolo_world/model/
bash download_model.sh

Model Conversion

Select the target platform by running the export command that matches your chip.

rk3588

X64 Linux PC

export TARGET_PLATFORM=rk3588

rk356x

X64 Linux PC

export TARGET_PLATFORM=rk356x

rk3576

X64 Linux PC

export TARGET_PLATFORM=rk3576

Convert the ONNX models to RKNN models.

X64 Linux PC

cd ../python/
python convert.py ../model/clip_text.onnx ${TARGET_PLATFORM}
python convert.py ../model/yolo_world_v2s.onnx ${TARGET_PLATFORM}

C API

Build the Example

Switch to the rknn_model_zoo directory and run the build-linux.sh build script.

X64 Linux PC

cd ../../..
bash build-linux.sh -t ${TARGET_PLATFORM} -a aarch64 -d yolo_world

File Sync

Then push the demo directory from the generated install directory to the board.

X64 Linux PC

cd install/${TARGET_PLATFORM}_linux_aarch64/
scp -r rknn_yolo_world_demo/ user@your_device_ip:target_directory

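Run the Example

Add the runtime libraries to the library search path through the LD_LIBRARY_PATH environment variable.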

Device

cd rknn_yolo_world_demo/
export LD_LIBRARY_PATH=./lib
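
Before launching the demo, it can help to confirm that the push copied everything the run command expects. The following is a minimal sketch, not part of the RKNN Model Zoo: the helper name missing_files is hypothetical, and the file list is simply read off the commands in this guide (the binary, the lib directory, and the four files under model/).

```python
from pathlib import Path

# Paths taken from the commands in this guide; the helper itself is hypothetical.
REQUIRED = [
    "rknn_yolo_world_demo",
    "lib",
    "model/clip_text.rknn",
    "model/detect_classes.txt",
    "model/yolo_world_v2s.rknn",
    "model/bus.jpg",
]


def missing_files(demo_dir="."):
    """Return the entries of REQUIRED that do not exist under demo_dir."""
    root = Path(demo_dir)
    return [name for name in REQUIRED if not (root / name).exists()]
```

Calling missing_files(".") from inside rknn_yolo_world_demo/ should return an empty list when the directory is complete; any names it returns point to files that were not copied.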

Run the example.

Device

./rknn_yolo_world_demo ./model/clip_text.rknn ./model/detect_classes.txt ./model/yolo_world_v2s.rknn ./model/bus.jpg

$ ./rknn_yolo_world_demo ./model/clip_text.rknn ./model/detect_classes.txt ./model/yolo_world_v2s.rknn ./model/bus.jpg
--> init clip text model
model input num: 1, output num: 1
input tensors:
  index=0, name=input_ids, n_dims=2, dims=[1, 20], n_elems=20, size=160, fmt=UNDEFINED, type=INT64, qnt_type=AFFINE, zp=0, scale=1.000000
output tensors:
  index=0, name=text_embeds, n_dims=2, dims=[1, 512], n_elems=512, size=1024, fmt=UNDEFINED, type=FP16, qnt_type=AFFINE, zp=0, scale=1.000000
load label ./model/detect_classes.txt
--> init yolo world model
model input num: 2, output num: 6
input tensors:
  index=0, name=images, n_dims=4, dims=[1, 640, 640, 3], n_elems=1228800, size=1228800, fmt=NHWC, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003922
  index=1, name=texts, n_dims=3, dims=[1, 80, 512], n_elems=40960, size=40960, fmt=UNDEFINED, type=INT8, qnt_type=AFFINE, zp=-52, scale=0.003410
output tensors:
  index=0, name=1168, n_dims=4, dims=[1, 80, 80, 80], n_elems=512000, size=512000, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003214
  index=1, name=1076, n_dims=4, dims=[1, 4, 80, 80], n_elems=25600, size=25600, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.054310
  index=2, name=1170, n_dims=4, dims=[1, 80, 40, 40], n_elems=128000, size=128000, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003697
  index=3, name=1121, n_dims=4, dims=[1, 4, 40, 40], n_elems=6400, size=6400, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.057563
  index=4, name=1172, n_dims=4, dims=[1, 80, 20, 20], n_elems=32000, size=32000, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.003884
  index=5, name=1166, n_dims=4, dims=[1, 4, 20, 20], n_elems=1600, size=1600, fmt=NCHW, type=INT8, qnt_type=AFFINE, zp=-128, scale=0.058563
model is NHWC input fmt
model input height=640, width=640, channel=3
num_lines=80
origin size=640x640 crop size=640x640
input image: 640 x 640, subsampling: 4:2:0, colorspace: YCbCr, orientation: 1
--> inference clip text model
rknn_run_1
rknn_run_2
rknn_run_3
rknn_run_4
rknn_run_5
rknn_run_6
rknn_run_7
rknn_run_8
rknn_run_9
rknn_run_10
rknn_run_11
rknn_run_12
rknn_run_13
rknn_run_14
rknn_run_15
rknn_run_16
rknn_run_17
rknn_run_18
rknn_run_19
rknn_run_20
rknn_run_21
rknn_run_22
rknn_run_23
rknn_run_24
rknn_run_25
rknn_run_26
rknn_run_27
rknn_run_28
rknn_run_29
rknn_run_30
rknn_run_31
rknn_run_32
rknn_run_33
rknn_run_34
rknn_run_35
rknn_run_36
rknn_run_37
rknn_run_38
rknn_run_39
rknn_run_40
rknn_run_41
rknn_run_42
rknn_run_43
rknn_run_44
rknn_run_45
rknn_run_46
rknn_run_47
rknn_run_48
rknn_run_49
rknn_run_50
rknn_run_51
rknn_run_52
rknn_run_53
rknn_run_54
rknn_run_55
rknn_run_56
rknn_run_57
rknn_run_58
rknn_run_59
rknn_run_60
rknn_run_61
rknn_run_62
rknn_run_63
rknn_run_64
rknn_run_65
rknn_run_66
rknn_run_67
rknn_run_68
rknn_run_69
rknn_run_70
rknn_run_71
rknn_run_72
rknn_run_73
rknn_run_74
rknn_run_75
rknn_run_76
rknn_run_77
rknn_run_78
rknn_run_79
rknn_run_80
--> inference yolo world model
scale=1.000000 dst_box=(0 0 639 639) allow_slight_change=1 _left_offset=0 _top_offset=0 padding_w=0 padding_h=0
rga_api version 1.10.1_[0]
rknn_run
person @ (475 234 559 519) 0.948
person @ (110 237 226 535) 0.948
bus @ (96 135 551 436) 0.932
person @ (212 240 283 510) 0.917
person @ (80 326 125 514) 0.665
write_image path: out.png width=640 height=640 channel=3 data=0xffff8189b010
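
The tensor attributes in the log above can be cross-checked by hand: size is the product of dims times the element width, and the zp and scale fields describe affine quantization, where real = (q - zp) * scale. The sketch below only illustrates that arithmetic with values copied from the log. The function names are hypothetical, and the rounding and clamping in quantize are an assumption about common INT8 practice, not a statement about how the RKNN runtime implements it.

```python
from math import prod

# Bytes per element for the tensor types that appear in the log.
BYTES_PER_ELEM = {"INT8": 1, "FP16": 2, "INT64": 8}


def tensor_bytes(dims, dtype):
    """Size in bytes of a dense tensor: product of dims times element width."""
    return prod(dims) * BYTES_PER_ELEM[dtype]


def dequantize(q, zp, scale):
    """Affine dequantization: map a quantized integer back to a real value."""
    return (q - zp) * scale


def quantize(x, zp, scale, lo=-128, hi=127):
    """Affine quantization to INT8, clamped to [lo, hi] (assumed behaviour)."""
    q = round(x / scale) + zp
    return max(lo, min(hi, q))


print(tensor_bytes([1, 20], "INT64"))          # input_ids   -> 160
print(tensor_bytes([1, 512], "FP16"))          # text_embeds -> 1024
print(tensor_bytes([1, 640, 640, 3], "INT8"))  # images      -> 1228800
print(dequantize(-128, zp=-128, scale=0.003214))  # lowest INT8 code -> 0.0
```

With the images input (zp=-128, scale=0.003922, roughly 1/255), a normalized pixel value of 0.0 maps to -128 and 1.0 maps to 127, so the full INT8 range covers the [0, 1] interval.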

Results
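
Besides the annotated out.png, the detections are also available as text in the demo's console output, in lines such as `person @ (475 234 559 519) 0.948`. A minimal sketch for turning those lines into structured records follows. The Detection type and parse_detection function are hypothetical names, and reading the four integers as left, top, right, bottom is an assumption based on the values in the log.

```python
import re
from typing import NamedTuple, Optional


class Detection(NamedTuple):
    label: str
    left: int
    top: int
    right: int
    bottom: int
    score: float


# Matches lines of the form: "<label> @ (<l> <t> <r> <b>) <score>"
_LINE = re.compile(
    r"^(?P<label>.+?) @ \((?P<l>-?\d+) (?P<t>-?\d+) (?P<r>-?\d+) (?P<b>-?\d+)\) "
    r"(?P<s>\d*\.?\d+)$"
)


def parse_detection(line: str) -> Optional[Detection]:
    """Parse one console line; return None for lines that are not detections."""
    m = _LINE.match(line.strip())
    if m is None:
        return None
    return Detection(
        m["label"],
        int(m["l"]), int(m["t"]), int(m["r"]), int(m["b"]),
        float(m["s"]),
    )
```

Feeding every line of the captured output through parse_detection and discarding the None results leaves only the detections, which is convenient for logging or for comparing runs.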

Python API

Activate the Virtual Environment

Device

conda activate rknn

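Run the Example

Push the relevant files to the board, then run the following command from the directory that contains yolo_world.py. TARGET_PLATFORM must also be set in the board's shell, because the value exported on the PC does not carry over to the device.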

Device

python yolo_world.py --text_model ../model/clip_text.rknn --yolo_world ../model/yolo_world_v2s.rknn --target ${TARGET_PLATFORM}

$ python yolo_world.py --text_model ../model/clip_text.rknn --yolo_world ../model/yolo_world_v2s.rknn --target rk3588
/home/radxa/miniforge3/envs/rknn/lib/python3.12/site-packages/rknn/api/rknn.py:51: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  self.rknn_base = RKNNBase(cur_path, verbose)
I rknn-toolkit2 version: 2.3.2
I target set by user is: rk3588
W inference: The 'data_format' is not set, and its default value is 'nhwc'!
W inference: The 'data_format' is not set, and its default value is 'nhwc'!
W inference: The 'data_format' is not set, and its default value is 'nhwc'!
I rknn-toolkit2 version: 2.3.2
I target set by user is: rk3588
W inference: The 'data_format' is not set, and its default value is 'nhwc'!
   class        score      xmin, ymin, xmax, ymax
--------------------------------------------------
   person       0.948     [ 477,  232,  559,  521]
   person       0.932     [ 110,  236,  226,  536]
   person       0.917     [ 212,  240,  283,  510]
   person       0.595     [  80,  327,  126,  514]
    bus         0.917     [  98,  135,  553,  435]
Save results to result.jpg!
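
The C API and Python API runs report nearly the same boxes for bus.jpg, with small differences in coordinates and scores (for example, 0.665 versus 0.595 for the lowest-scoring person). One way to quantify how closely two boxes agree is intersection over union (IoU). The sketch below is illustrative only: the iou helper is not part of either demo, and the boxes are copied from the two logs in this guide.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# (C API box, Python API box) pairs copied from the logs in this guide.
PAIRS = {
    "person (right)": ((475, 234, 559, 519), (477, 232, 559, 521)),
    "person (left)": ((110, 237, 226, 535), (110, 236, 226, 536)),
    "bus": ((96, 135, 551, 436), (98, 135, 553, 435)),
}

for name, (c_box, py_box) in PAIRS.items():
    print(f"{name}: IoU = {iou(c_box, py_box):.3f}")
```

Values close to 1.0 indicate that both code paths localize the object almost identically; a value of 0.0 means the boxes do not overlap at all.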
