Whisper
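Whisper is an open-source, general-purpose speech recognition model released by OpenAI. Pre-trained on 680,000 hours of large-scale multilingual data, it is highly robust to complex background noise and a wide range of accents.
- Key features: high-accuracy multilingual speech-to-text, automatic language detection, and speech translation.
- Version: this example uses the Whisper Medium Multilingual model. As the mid-sized member of the family, it maintains recognition accuracy for Chinese and other languages while keeping inference efficient, which makes it a common choice when balancing accuracy against speed.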
Environment Setup
The required environment must be configured in advance.
Quick Start
Download the Model
O6 / O6N
cd ai_model_hub_25_Q3/models/Audio/Speech_Recognotion/onnx_whisper_medium_multilingual
wget -O whisper_medium_multilingual_decoder.cix https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/models/Audio/Speech_Recognotion/onnx_whisper_medium_multilingual/whisper_medium_multilingual_decoder.cix
wget -O whisper_medium_multilingual_encoder.cix https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/models/Audio/Speech_Recognotion/onnx_whisper_medium_multilingual/whisper_medium_multilingual_encoder.cix
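An interrupted wget can leave a missing or zero-byte file behind. The following is a minimal sketch of a check that can be run in the download directory; the file names come from the wget commands above, while the helper name `missing_or_empty` is only illustrative.

```python
from pathlib import Path

# File names taken from the wget commands above.
EXPECTED_MODELS = [
    "whisper_medium_multilingual_encoder.cix",
    "whisper_medium_multilingual_decoder.cix",
]


def missing_or_empty(model_dir, names=EXPECTED_MODELS):
    """Return the names that are absent or zero bytes long in model_dir."""
    base = Path(model_dir)
    problems = []
    for name in names:
        path = base / name
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(name)
    return problems


if __name__ == "__main__":
    bad = missing_or_empty(".")
    if bad:
        print("Missing or empty:", ", ".join(bad))
    else:
        print("Both .cix files are present.")
```

This only checks presence and size; it does not validate the contents of the files.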
Install Dependencies
O6 / O6N
sudo apt update
sudo apt install ffmpeg
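To confirm that the install succeeded before running inference, a small check such as the one below can help. It is a sketch that only looks for an `ffmpeg` executable on `PATH` using the standard library; the helper name `find_tool` is illustrative.

```python
import shutil
import subprocess


def find_tool(name="ffmpeg"):
    """Return the path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        print("ffmpeg not found on PATH; run the apt commands above first.")
    else:
        # `ffmpeg -version` prints the build banner and exits.
        result = subprocess.run([path, "-version"], capture_output=True, text=True)
        print(path, result.stdout.splitlines()[:1])
```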
Test the Model
Info
Activate the virtual environment before running!
O6 / O6N
python3 inference_npu.py
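If it is unclear whether the virtual environment is active, the interpreter itself can report it. The sketch below relies on the standard behaviour that `sys.prefix` differs from `sys.base_prefix` inside a venv or virtualenv; it would not detect a conda environment, so treat it as a hint rather than a guarantee.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv/virtualenv.

    Inside a virtual environment sys.prefix points at the environment,
    while sys.base_prefix still points at the base installation.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    if in_virtualenv():
        print("Virtual environment active:", sys.prefix)
    else:
        print("No virtual environment detected; activate it first.")
```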
Full Conversion Workflow
Download the Model Files
Linux PC
cd ai_model_hub_25_Q3/models/Audio/Speech_Recognotion/onnx_whisper_medium_multilingual/model
wget -O whisper_medium_multilingual_decoder.onnx https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/models/Audio/Speech_Recognotion/onnx_whisper_medium_multilingual/model/whisper_medium_multilingual_decoder.onnx
wget -O whisper_medium_multilingual_encoder.onnx https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/models/Audio/Speech_Recognotion/onnx_whisper_medium_multilingual/model/whisper_medium_multilingual_encoder.onnx
Project Structure
.
├── cfg
├── datasets
├── inference_npu.py
├── inference_onnx.py
├── model
├── ReadMe.md
├── test_data
├── whisper
├── whisper-medium
├── whisper_medium_multilingual_decoder.cix
└── whisper_medium_multilingual_encoder.cix
Quantize and Convert the Model
Convert the Encoder
Linux PC
cd ..
cixbuild cfg/whisper_medium_multilingual_encoder/whisper_medium_multilingual_encoder_build.cfg
Convert the Decoder
Linux PC
cixbuild cfg/whisper_medium_multilingual_decoder/whisper_medium_multilingual_decoder_build.cfg
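The two conversions can also be driven from one script. The sketch below is not part of the model hub: it simply wraps the two `cixbuild` commands shown above, and it assumes that `cixbuild` is on `PATH` and exits with a non-zero status on failure. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Config paths copied from the cixbuild commands above.
BUILD_CONFIGS = [
    "cfg/whisper_medium_multilingual_encoder/whisper_medium_multilingual_encoder_build.cfg",
    "cfg/whisper_medium_multilingual_decoder/whisper_medium_multilingual_decoder_build.cfg",
]


def build_commands(configs=BUILD_CONFIGS):
    """Return one `cixbuild <cfg>` argument list per config file."""
    return [["cixbuild", cfg] for cfg in configs]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # Assumption: cixbuild signals failure through its exit code.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(build_commands(), dry_run=True)
```

Run it from the project root (the directory that contains `cfg`), and pass `dry_run=False` to actually start the builds.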
Push to the Board
After the conversion finishes, the .cix model files need to be pushed to the board.
Test Inference on the Host
Install ffmpeg
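Any file-transfer method works for the push step. As one possible approach, the sketch below assembles an `scp` command for the two model files. The user name, host name, and remote directory are hypothetical placeholders, and the files are assumed to sit in the current directory as in the project tree above.

```python
import shlex

# Model file names as they appear in the project structure.
MODEL_FILES = [
    "whisper_medium_multilingual_encoder.cix",
    "whisper_medium_multilingual_decoder.cix",
]


def scp_command(files, user, host, remote_dir):
    """Build the argument list for copying files to user@host:remote_dir."""
    return ["scp", *files, f"{user}@{host}:{remote_dir}"]


if __name__ == "__main__":
    # Hypothetical values: replace with the board's real login and path.
    cmd = scp_command(MODEL_FILES, "user", "board.local", "/home/user/whisper")
    print(shlex.join(cmd))
```

The script only prints the command so it can be reviewed before being run in a shell.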
Linux PC
sudo apt update
sudo apt install ffmpeg
Run the Inference Script
Linux PC
python3 inference_onnx.py
Inference Results
A test_audio_npu.txt file is generated in the output directory.
They regain their apartment, apparently without disturbing the household of Gainwell.
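To compare a generated transcript against the sentence shown above without tripping over case or punctuation, a loose comparison along the following lines can be used. This is a sketch: it assumes the output file is plain UTF-8 text, and the helper names are illustrative.

```python
import re
from pathlib import Path

# Reference sentence copied from the result shown above.
REFERENCE = (
    "They regain their apartment, apparently without disturbing "
    "the household of Gainwell."
)


def normalize(text):
    """Lower-case, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def same_transcript(a, b):
    """True when two transcripts match after normalization."""
    return normalize(a) == normalize(b)


def check_file(path, reference=REFERENCE):
    """Compare a transcript file (assumed UTF-8 text) with the reference."""
    return same_transcript(Path(path).read_text(encoding="utf-8"), reference)
```

For example, `check_file("output/test_audio_npu.txt")` returns `True` when the file content matches the reference sentence apart from case, punctuation, and spacing.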
Deploy on the NPU
Install ffmpeg
O6 / O6N
sudo apt update
sudo apt install ffmpeg
Run the Inference Script
O6 / O6N
python3 inference_npu.py --backend npu --encoder_model_path whisper_medium_multilingual_encoder.cix --decoder_model_path whisper_medium_multilingual_decoder.cix
Inference Results
O6 / O6N
$ python3 inference_npu.py --backend npu --encoder_model_path whisper_medium_multilingual_encoder.cix --decoder_model_path whisper_medium_multilingual_decoder.cix
2025-12-29 10:55:26.758036920 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card3/device/vendor"
npu: noe_init_context success
npu: noe_load_graph success
Input tensor count is 1.
Output tensor count is 1.
npu: noe_create_job success
npu: noe_clean_job success
npu: noe_unload_graph success
npu: noe_deinit_context success
npu: noe_init_context success
npu: noe_load_graph success
Input tensor count is 5.
Output tensor count is 2.
npu: noe_create_job success
npu: noe_clean_job success
npu: noe_unload_graph success
npu: noe_deinit_context success
A test_audio_npu.txt file is generated in the output directory.
They regain their apartment, apparently without disturbing the household of Gainwell.
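The log above shows the same six `noe_*` stages reported twice, once per loaded graph. A captured log can be checked for that pattern with a short script such as the sketch below. The stage names are taken from the sample log; the default of two runs per stage is an assumption based on that sample, not a documented guarantee.

```python
import re
from collections import Counter

# Stage names as they appear in the sample log above.
STAGES = [
    "noe_init_context",
    "noe_load_graph",
    "noe_create_job",
    "noe_clean_job",
    "noe_unload_graph",
    "noe_deinit_context",
]

_LINE = re.compile(r"^npu: (noe_\w+) success$")


def count_stages(log_text):
    """Count how many times each 'npu: <stage> success' line occurs."""
    counts = Counter()
    for line in log_text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


def incomplete_stages(log_text, expected_runs=2):
    """Return the stages whose success count differs from expected_runs."""
    counts = count_stages(log_text)
    return [stage for stage in STAGES if counts[stage] != expected_runs]
```

An empty list from `incomplete_stages` means every stage reported success the expected number of times.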