Whisper

Whisper is an open-source, general-purpose speech recognition model released by OpenAI. Trained on 680,000 hours of multilingual and multitask supervised data, it is robust to background noise and to a wide range of accents.

Key features: Supports high-accuracy multilingual speech-to-text, automatic language detection, and speech translation.
Version notes: This example uses the Whisper Medium Multilingual model. As a mid-sized member of the family, it balances recognition accuracy (for Chinese and other languages) against inference speed, which makes it a common choice.

Environment setup

You need to set up the environment in advance.

Quick start

Download the model

O6 / O6N

cd ai_model_hub_25_Q3/models/Audio/Speech_Recognotion/onnx_whisper_medium_multilingual
wget https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/models/Audio/Speech_Recognotion/onnx_whisper_medium_multilingual/whisper_medium_multilingual_decoder.cix
wget https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/models/Audio/Speech_Recognotion/onnx_whisper_medium_multilingual/whisper_medium_multilingual_encoder.cix
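Interrupted downloads can leave a missing or zero-byte file behind, so a quick check before running inference may save time. The following is a minimal sketch, not part of the model hub: the helper name `missing_model_files` is hypothetical, and only the two file names are taken from the commands above.

```python
from pathlib import Path

# File names taken from the wget commands above.
MODEL_FILES = (
    "whisper_medium_multilingual_encoder.cix",
    "whisper_medium_multilingual_decoder.cix",
)


def missing_model_files(model_dir, names=MODEL_FILES):
    """Return the names that are absent or empty in model_dir (hypothetical helper)."""
    base = Path(model_dir)
    missing = []
    for name in names:
        path = base / name
        # A zero-byte file usually means the download was interrupted.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_model_files(".")
    if missing:
        print("Missing or empty model files:", ", ".join(missing))
    else:
        print("Both model files are present.")
```

Run it from the directory that holds the downloaded files. It checks only presence and size, not file integrity.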

Install dependencies

O6 / O6N

sudo apt update
sudo apt install ffmpeg
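To confirm that the installation above put ffmpeg on the PATH, a small check like the one below can help. This is an illustrative sketch: `find_tool` is a hypothetical helper, and it uses only `shutil.which` from the Python standard library.

```python
import shutil


def find_tool(name="ffmpeg"):
    """Return the full path of an executable on PATH, or None if it is not found."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool("ffmpeg")
    if path is None:
        print("ffmpeg was not found on PATH; install it before running inference.")
    else:
        print("ffmpeg found at", path)
```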

Test the model

Note: Activate the virtual environment before running.

O6 / O6N

python3 inference_npu.py

Full conversion workflow

Download model files

Linux PC

cd ai_model_hub_25_Q3/models/Audio/Speech_Recognotion/onnx_whisper_medium_multilingual/model
wget https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/models/Audio/Speech_Recognotion/onnx_whisper_medium_multilingual/model/whisper_medium_multilingual_decoder.onnx
wget https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/models/Audio/Speech_Recognotion/onnx_whisper_medium_multilingual/model/whisper_medium_multilingual_encoder.onnx

Project structure

.
├── cfg
├── datasets
├── inference_npu.py
├── inference_onnx.py
├── model
├── ReadMe.md
├── test_data
├── whisper
├── whisper-medium
├── whisper_medium_multilingual_decoder.cix
└── whisper_medium_multilingual_encoder.cix

Quantize and convert the model

Convert the encoder

Linux PC

cd ..
cixbuild cfg/whisper_medium_multilingual_encoder/whisper_medium_multilingual_encoder_build.cfg

Convert the decoder

Linux PC

cixbuild cfg/whisper_medium_multilingual_decoder/whisper_medium_multilingual_decoder_build.cfg
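The two conversion commands above differ only in the component name, so they can be generated from one pattern. The sketch below only builds the command lines and does not run `cixbuild`. The helper names are hypothetical, and the path layout is inferred from the two commands shown.

```python
def build_cfg_path(part):
    """Return the build config path for 'encoder' or 'decoder' (layout inferred from the commands above)."""
    if part not in ("encoder", "decoder"):
        raise ValueError(f"unknown model part: {part!r}")
    stem = f"whisper_medium_multilingual_{part}"
    return f"cfg/{stem}/{stem}_build.cfg"


def cixbuild_commands(parts=("encoder", "decoder")):
    """Return one argv list per part, suitable for subprocess.run."""
    return [["cixbuild", build_cfg_path(part)] for part in parts]


if __name__ == "__main__":
    for argv in cixbuild_commands():
        print(" ".join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` from the project root if you want the script to run the conversions in sequence.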

Copy to device

After conversion, copy the .cix model files to the device.
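One possible way to copy the files is `scp`, assuming the device is reachable over SSH. The sketch below only assembles the command. The user name, host address, and remote directory are placeholders, not values from this guide.

```python
import shlex

# File names produced by the conversion steps above.
CIX_FILES = [
    "whisper_medium_multilingual_encoder.cix",
    "whisper_medium_multilingual_decoder.cix",
]


def build_scp_command(files, user, host, remote_dir):
    """Return an scp argv list that copies files to user@host:remote_dir."""
    if not files:
        raise ValueError("no files to copy")
    return ["scp", *files, f"{user}@{host}:{remote_dir}"]


if __name__ == "__main__":
    # Placeholder login, address, and target directory; replace with your device's values.
    argv = build_scp_command(CIX_FILES, "user", "192.168.1.100", "/home/user/whisper")
    print(shlex.join(argv))
```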

Test inference on the host

Install ffmpeg

Linux PC

sudo apt update
sudo apt install ffmpeg

Run the inference script

Linux PC

python3 inference_onnx.py

Inference output

A file named test_audio_npu.txt will be generated under the output directory.

They regain their apartment, apparently without disturbing the household of Gainwell.

Deploy on NPU

Install ffmpeg

O6 / O6N

sudo apt update
sudo apt install ffmpeg

Run the inference script

O6 / O6N

python3 inference_npu.py --backend npu --encoder_model_path whisper_medium_multilingual_encoder.cix --decoder_model_path whisper_medium_multilingual_decoder.cix
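When the script is called from another program, it can be convenient to assemble the command as an argument list. The sketch below uses only the three flags shown in the command above. The helper name is hypothetical, and `npu` is the only `--backend` value this guide shows.

```python
import shlex


def build_inference_command(encoder_path, decoder_path, backend="npu",
                            script="inference_npu.py"):
    """Return the argv list for the inference script, using the flags shown above."""
    return [
        "python3", script,
        "--backend", backend,
        "--encoder_model_path", encoder_path,
        "--decoder_model_path", decoder_path,
    ]


if __name__ == "__main__":
    argv = build_inference_command(
        "whisper_medium_multilingual_encoder.cix",
        "whisper_medium_multilingual_decoder.cix",
    )
    # Print the command; pass argv to subprocess.run(argv, check=True) to execute it.
    print(shlex.join(argv))
```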

Inference output

O6 / O6N

$ python3 inference_npu.py --backend npu --encoder_model_path whisper_medium_multilingual_encoder.cix --decoder_model_path whisper_medium_multilingual_decoder.cix
2025-12-29 10:55:26.758036920 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card3/device/vendor"
npu: noe_init_context success
npu: noe_load_graph success
Input tensor count is 1.
Output tensor count is 1.
npu: noe_create_job success
npu: noe_clean_job success
npu: noe_unload_graph success
npu: noe_deinit_context success
npu: noe_init_context success
npu: noe_load_graph success
Input tensor count is 5.
Output tensor count is 2.
npu: noe_create_job success
npu: noe_clean_job success
npu: noe_unload_graph success
npu: noe_deinit_context success
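The log above shows the same six NPU steps twice, once for the graph with 1 input tensor and once for the graph with 5. A script can confirm that every cycle completed, as in the sketch below. It assumes each step is logged as `npu: <step> <status>`, as in the sample. Only `success` lines appear in this guide, so the format of failure lines is an assumption, and the function names are hypothetical.

```python
import re

# Matches lines such as "npu: noe_load_graph success".
STEP_RE = re.compile(r"^npu: (noe_\w+) (\w+)$")

# Step order observed in the sample log above.
EXPECTED_CYCLE = [
    "noe_init_context",
    "noe_load_graph",
    "noe_create_job",
    "noe_clean_job",
    "noe_unload_graph",
    "noe_deinit_context",
]


def parse_npu_steps(log_text):
    """Return (step, status) pairs for every 'npu:' line in the log."""
    steps = []
    for line in log_text.splitlines():
        match = STEP_RE.match(line.strip())
        if match:
            steps.append((match.group(1), match.group(2)))
    return steps


def all_cycles_ok(log_text):
    """True if the log holds one or more complete cycles and every step reports success."""
    steps = parse_npu_steps(log_text)
    if not steps or len(steps) % len(EXPECTED_CYCLE) != 0:
        return False
    for index, (name, status) in enumerate(steps):
        expected = EXPECTED_CYCLE[index % len(EXPECTED_CYCLE)]
        if name != expected or status != "success":
            return False
    return True
```

Lines that do not start with `npu:`, such as the ONNX Runtime warning and the tensor counts, are ignored.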

A file named test_audio_npu.txt will be generated under the output directory.

They regain their apartment, apparently without disturbing the household of Gainwell.
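Because the NPU model is quantized, it can be useful to compare its transcript with the one produced on the host. In this guide the two transcripts are identical. One common measure is word error rate (WER), the word-level edit distance divided by the number of reference words. The sketch below is a plain implementation for illustration. It is not part of the model hub scripts, and it normalizes text only by lowercasing and splitting on whitespace.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two texts, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    host = "They regain their apartment, apparently without disturbing the household of Gainwell."
    npu = "They regain their apartment, apparently without disturbing the household of Gainwell."
    print(f"WER: {word_error_rate(host, npu):.3f}")
```

A value of 0 means the two transcripts match word for word. Punctuation stays attached to words here, so a differing comma counts as a substitution.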
