Whisper

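Whisper is an open-source, general-purpose speech recognition model released by OpenAI. It was pre-trained on 680,000 hours of multilingual audio, which makes it robust to complex background noise and a wide range of accents.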

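  • Key features: high-accuracy multilingual speech-to-text, automatic language detection, and speech translation.
  • Version note: this example uses the Whisper Medium Multilingual model. As the mid-sized member of the family (about 769M parameters), it keeps recognition accuracy high for Chinese and other languages while remaining efficient at inference time, which makes it a common choice for balancing accuracy and speed.

Environment setup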

Set up the required environment before you begin.

Quick start

Download the model

O6 / O6N
cd ai_model_hub_25_Q3/models/Audio/Speech_Recognotion/onnx_whisper_medium_multilingual
wget -O whisper_medium_multilingual_decoder.cix https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/models/Audio/Speech_Recognotion/onnx_whisper_medium_multilingual/whisper_medium_multilingual_decoder.cix
wget -O whisper_medium_multilingual_encoder.cix https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/models/Audio/Speech_Recognotion/onnx_whisper_medium_multilingual/whisper_medium_multilingual_encoder.cix
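
The two wget commands above differ only in the file name. As a minimal sketch, assuming you would rather script the download from Python, the URLs can be assembled from the same base path. The helper names `model_url` and `download_all` are hypothetical and not part of the model hub; the directory name is copied verbatim from the commands above, including its spelling.

```python
from urllib.request import urlretrieve

# Base URL and directory copied verbatim from the wget commands above.
BASE_URL = "https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master"
MODEL_DIR = "models/Audio/Speech_Recognotion/onnx_whisper_medium_multilingual"
CIX_FILES = [
    "whisper_medium_multilingual_decoder.cix",
    "whisper_medium_multilingual_encoder.cix",
]


def model_url(filename):
    """Build the download URL for one model file (hypothetical helper)."""
    return f"{BASE_URL}/{MODEL_DIR}/{filename}"


def download_all(filenames=CIX_FILES):
    """Download each file into the current directory, like `wget -O <name> <url>`."""
    for name in filenames:
        urlretrieve(model_url(name), name)


# Print the URLs only; call download_all() explicitly to fetch the files.
for name in CIX_FILES:
    print(model_url(name))
```

Calling `download_all()` from the model directory should have the same effect as the two wget commands, although the files are large and wget gives better progress reporting.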

Install dependencies

O6 / O6N
sudo apt update
sudo apt install ffmpeg
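
To confirm that the installation worked before running inference, a short check such as the following may help. It is a sketch that only tests whether an `ffmpeg` executable is on `PATH`; the helper name `find_tool` is hypothetical, and the check does not verify that the inference scripts can actually decode audio with it.

```python
import shutil


def find_tool(name):
    """Return the absolute path of an executable on PATH, or None if absent."""
    return shutil.which(name)


path = find_tool("ffmpeg")
if path is None:
    print("ffmpeg not found; install it with: sudo apt install ffmpeg")
else:
    print(f"ffmpeg found at {path}")
```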

Model test

Info

Activate the virtual environment before running the script!

O6 / O6N
python3 inference_npu.py
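
Forgetting to activate the virtual environment is easy to miss, because the script may simply fail on a missing import. The sketch below shows one way to detect the situation from Python. It assumes the environment was created with `venv` or a recent `virtualenv`; Conda environments are not detected by this check. The function name `in_virtualenv` is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv/virtualenv.

    Inside a virtual environment sys.prefix points at the environment,
    while sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if not in_virtualenv():
    print("Warning: no virtual environment is active.")
else:
    print(f"Using virtual environment at {sys.prefix}")
```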

Complete conversion workflow

Download the model files

Linux PC
cd ai_model_hub_25_Q3/models/Audio/Speech_Recognotion/onnx_whisper_medium_multilingual/model
wget -O whisper_medium_multilingual_decoder.onnx https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/models/Audio/Speech_Recognotion/onnx_whisper_medium_multilingual/model/whisper_medium_multilingual_decoder.onnx
wget -O whisper_medium_multilingual_encoder.onnx https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/models/Audio/Speech_Recognotion/onnx_whisper_medium_multilingual/model/whisper_medium_multilingual_encoder.onnx

Project structure

.
├── cfg
├── datasets
├── inference_npu.py
├── inference_onnx.py
├── model
├── ReadMe.md
├── test_data
├── whisper
├── whisper-medium
├── whisper_medium_multilingual_decoder.cix
└── whisper_medium_multilingual_encoder.cix
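
Before starting the conversion, it can be useful to verify that the working directory matches the tree above. The following is a minimal sketch: the list of expected entries is taken from the tree (without the two .cix files, which may not exist yet before conversion), and the helper name `missing_entries` is hypothetical.

```python
from pathlib import Path

# Entries from the project tree above; the .cix files are left out because
# they may not be present until the models are downloaded or converted.
EXPECTED = [
    "cfg",
    "datasets",
    "inference_npu.py",
    "inference_onnx.py",
    "model",
    "test_data",
    "whisper",
    "whisper-medium",
]


def missing_entries(root, expected=EXPECTED):
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


missing = missing_entries(".")
if missing:
    print("Missing entries:", ", ".join(missing))
else:
    print("Project layout looks complete.")
```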

Quantize and convert the model

Convert the encoder

Linux PC
cd ..
cixbuild cfg/whisper_medium_multilingual_encoder/whisper_medium_multilingual_encoder_build.cfg

Convert the decoder

Linux PC
cixbuild cfg/whisper_medium_multilingual_decoder/whisper_medium_multilingual_decoder_build.cfg
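
The encoder and decoder are converted with the same `cixbuild <config>` invocation, so the two steps can be driven from one script. This is a sketch under the assumption that `cixbuild` is on `PATH` and that the script runs from the project root; the helper names `build_commands` and `run_all` are hypothetical, and no extra `cixbuild` options are assumed beyond the config path shown above.

```python
import shlex
import subprocess

# Config paths copied from the two cixbuild commands above.
CONFIGS = [
    "cfg/whisper_medium_multilingual_encoder/whisper_medium_multilingual_encoder_build.cfg",
    "cfg/whisper_medium_multilingual_decoder/whisper_medium_multilingual_decoder_build.cfg",
]


def build_commands(configs=CONFIGS):
    """Return one `cixbuild <config>` argument list per config file."""
    return [["cixbuild", cfg] for cfg in configs]


def run_all(commands):
    """Run the commands in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Print the commands only; call run_all(build_commands()) to execute them.
for cmd in build_commands():
    print(shlex.join(cmd))
```

Using `check=True` means a failed encoder build raises `CalledProcessError` before the decoder build starts, which avoids pushing a mismatched pair of models to the board.
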

Push to the board

After the conversion completes, push the .cix model files to the board.
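
The page does not say which transfer method to use. One common option is `scp` over SSH, sketched below. Everything about the target is an assumption: the user name, host address, and remote directory are placeholders that must be replaced with the values for your board, and the helper name `scp_command` is hypothetical.

```python
import shlex

CIX_FILES = [
    "whisper_medium_multilingual_encoder.cix",
    "whisper_medium_multilingual_decoder.cix",
]


def scp_command(files, user, host, remote_dir):
    """Build an `scp file... user@host:remote_dir` argument list."""
    return ["scp", *files, f"{user}@{host}:{remote_dir}"]


# Placeholder target: replace user, host, and directory with your own values.
cmd = scp_command(CIX_FILES, "radxa", "192.168.1.100", "/home/radxa/whisper/")
print(shlex.join(cmd))
```

The script only prints the command so that the target can be reviewed first; it can then be run in a shell or passed to `subprocess.run(cmd, check=True)`.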

Test inference on the host

Install ffmpeg

Linux PC
sudo apt update
sudo apt install ffmpeg

Run the inference script

Linux PC
python3 inference_onnx.py

Model inference result

A test_audio_npu.txt file is generated in the output directory.

They regain their apartment, apparently without disturbing the household of Gainwell.

Deploy on the NPU

Install ffmpeg

O6 / O6N
sudo apt update
sudo apt install ffmpeg

Run the inference script

O6 / O6N
python3 inference_npu.py --backend npu --encoder_model_path whisper_medium_multilingual_encoder.cix --decoder_model_path whisper_medium_multilingual_decoder.cix

Model inference result

O6 / O6N
$ python3 inference_npu.py --backend npu --encoder_model_path whisper_medium_multilingual_encoder.cix --decoder_model_path whisper_medium_multilingual_decoder.cix
2025-12-29 10:55:26.758036920 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card3/device/vendor"
npu: noe_init_context success
npu: noe_load_graph success
Input tensor count is 1.
Output tensor count is 1.
npu: noe_create_job success
npu: noe_clean_job success
npu: noe_unload_graph success
npu: noe_deinit_context success
npu: noe_init_context success
npu: noe_load_graph success
Input tensor count is 5.
Output tensor count is 2.
npu: noe_create_job success
npu: noe_clean_job success
npu: noe_unload_graph success
npu: noe_deinit_context success
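
The log above contains two load/run/unload cycles, one with 1 input and 1 output tensor and one with 5 inputs and 2 outputs. They presumably correspond to the encoder and the decoder, although the log itself does not label them. As a sketch, a log in this format can be checked automatically; the parser below relies only on the line patterns visible in the log, and the names `parse_npu_log` and `EXPECTED_STAGES` are hypothetical.

```python
import re

SUCCESS_RE = re.compile(r"^npu: (noe_\w+) success$")
COUNT_RE = re.compile(r"^(Input|Output) tensor count is (\d+)\.$")

# Stage order as it appears in each cycle of the log above.
EXPECTED_STAGES = [
    "noe_init_context", "noe_load_graph", "noe_create_job",
    "noe_clean_job", "noe_unload_graph", "noe_deinit_context",
]


def parse_npu_log(text):
    """Split the log into sessions, each starting at noe_init_context."""
    sessions = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = SUCCESS_RE.match(line)
        if match:
            stage = match.group(1)
            if stage == "noe_init_context":
                current = {"stages": [], "inputs": None, "outputs": None}
                sessions.append(current)
            if current is not None:
                current["stages"].append(stage)
            continue
        match = COUNT_RE.match(line)
        if match and current is not None:
            key = "inputs" if match.group(1) == "Input" else "outputs"
            current[key] = int(match.group(2))
    return sessions


SAMPLE_LOG = """\
npu: noe_init_context success
npu: noe_load_graph success
Input tensor count is 1.
Output tensor count is 1.
npu: noe_create_job success
npu: noe_clean_job success
npu: noe_unload_graph success
npu: noe_deinit_context success
npu: noe_init_context success
npu: noe_load_graph success
Input tensor count is 5.
Output tensor count is 2.
npu: noe_create_job success
npu: noe_clean_job success
npu: noe_unload_graph success
npu: noe_deinit_context success
"""

for index, session in enumerate(parse_npu_log(SAMPLE_LOG)):
    complete = session["stages"] == EXPECTED_STAGES
    print(index, session["inputs"], session["outputs"], "complete" if complete else "incomplete")
```

A session whose stage list differs from `EXPECTED_STAGES` indicates that one of the NPU calls did not report success, which is usually the first thing to look for when no transcript is produced.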

A test_audio_npu.txt file is generated in the output directory.

They regain their apartment, apparently without disturbing the household of Gainwell.

    Radxa-docs © 2026 by Radxa Computer (Shenzhen) Co.,Ltd. is licensed under CC BY 4.0