
Whisper-Base Example

This document describes how to use the QAI AppBuilder Python API to run inference with the Whisper-Base speech recognition model on the Qualcomm® Hexagon™ Processor (NPU).

Supported devices

Device                  SoC
Fogwise® AIRbox Q900    QCS9075

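Install QAI AppBuilder

Tip
  1. Install QAI AppBuilder as described in QAI AppBuilder Installation Method.

  2. Configure the ADSP environment variables as described in Creating ADSP Environment Variables.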
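Before running the example, it can help to confirm that the ADSP environment variable is visible to the Python process. The sketch below is not part of the sample. It assumes the variable is named `ADSP_LIBRARY_PATH` and that its entries are separated by `;`, as in typical FastRPC setups; the linked setup page remains authoritative. The helper name `check_adsp_env` is hypothetical.

```python
import os

# Assumption: the variable configured in the ADSP setup step is ADSP_LIBRARY_PATH.
ADSP_VAR = "ADSP_LIBRARY_PATH"


def check_adsp_env(environ=None):
    """Return (is_set, directories) for the ADSP library search path."""
    env = os.environ if environ is None else environ
    value = env.get(ADSP_VAR, "")
    # Assumption: entries are separated by ';' as in typical FastRPC setups.
    dirs = [d for d in value.split(";") if d]
    return bool(dirs), dirs


if __name__ == "__main__":
    is_set, dirs = check_adsp_env()
    if not is_set:
        print(f"{ADSP_VAR} is not set; configure it before running the example.")
    for d in dirs:
        state = "ok" if os.path.isdir(d) else "missing"
        print(f"{d}: {state}")
```

Running the script in the same shell that will launch the example shows whether each configured directory exists on the device.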

Run the example

Install dependencies

Device
pip3 install requests tqdm qai-hub py3_wget opencv-python torch torchvision matplotlib openai-whisper audio2numpy samplerate transformers qai_hub_models==0.30.2
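After installation, a quick import check can reveal packages that failed to install. The following is a minimal sketch, not part of the sample: the mapping from pip package names to import names (for example `opencv-python` to `cv2`, `openai-whisper` to `whisper`, `qai-hub` to `qai_hub`) is an assumption, and `find_missing` is a hypothetical helper.

```python
import importlib.util

# Assumption: import names corresponding to the pip packages listed above.
REQUIRED_MODULES = [
    "requests", "tqdm", "qai_hub", "py3_wget", "cv2", "torch", "torchvision",
    "matplotlib", "whisper", "audio2numpy", "samplerate", "transformers",
    "qai_hub_models",
]


def find_missing(modules):
    """Return the module names that the import system cannot locate."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```

The check only locates modules without importing them, so it stays fast even for large packages such as `torch`.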

Run the script

  • Enter the example directory

    Device
    cd ai-engine-direct-helper/samples/python
  • Prepare the input audio. This example uses the following audio clip as input:

    input audio

  • Run inference

    Device
    python3 whisper_base_en/whisper_base_en.py
    $ python3 whisper_base_en/whisper_base_en.py
    0.0ms [WARNING] <W> Initializing HtpProvider

    /prj/qct/webtech_scratch20/mlg_user_admin/qaisw_source_repo/rel/qairt-2.37.1/point_release/SNPE_SRC/avante-tools/prebuilt/dsp/hexagon-sdk-5.5.5/ipc/fastrpc/rpcmem/src/rpcmem_android.c:38:dummy call to rpcmem_init, rpcmem APIs will be used from libxdsprpc
    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    184.8ms [WARNING] Time: Read model file to memory. 71.89

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    242.3ms [WARNING] Time: contextCreateFromBinary. 57.39

    242.3ms [WARNING] Time: UnmapViewOfFile. 0.00

    244.9ms [WARNING] Time: model_initialize whisper_decoder 244.80

    282.6ms [WARNING] Time: Read model file to memory. 37.32

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    636.8ms [WARNING] Time: contextCreateFromBinary. 354.14

    636.9ms [WARNING] Time: UnmapViewOfFile. 0.00

    638.6ms [WARNING] Time: model_initialize whisper_encoder 393.56

    1135.2ms [WARNING] Time: model_inference whisper_encoder 440.09

    time consumes for encoder 0.4419412612915039(s)
    Decoder Inference k_cache_cross type <class 'numpy.ndarray'> shape (6, 8, 64, 1500) type float32
    Decoder Inference v_cache_cross type <class 'numpy.ndarray'> shape (6, 8, 1500, 64) type float32
    start decode sample_len 224
    1470.7ms [WARNING] Time: model_inference whisper_decoder 323.97

    time consumes for decoder 0.3242673873901367(s)
    1796.0ms [WARNING] Time: model_inference whisper_decoder 322.57

    time consumes for decoder 0.3229062557220459(s)
    2111.1ms [WARNING] Time: model_inference whisper_decoder 313.66

    time consumes for decoder 0.31383657455444336(s)
    2425.6ms [WARNING] Time: model_inference whisper_decoder 313.01

    time consumes for decoder 0.3132438659667969(s)
    2740.2ms [WARNING] Time: model_inference whisper_decoder 313.24

    time consumes for decoder 0.3134174346923828(s)
    3055.0ms [WARNING] Time: model_inference whisper_decoder 313.45

    time consumes for decoder 0.3136253356933594(s)
    3369.6ms [WARNING] Time: model_inference whisper_decoder 313.31

    time consumes for decoder 0.31349658966064453(s)
    3684.4ms [WARNING] Time: model_inference whisper_decoder 313.44

    time consumes for decoder 0.3136255741119385(s)
    3999.4ms [WARNING] Time: model_inference whisper_decoder 313.63

    time consumes for decoder 0.3138108253479004(s)
    4314.0ms [WARNING] Time: model_inference whisper_decoder 313.20

    time consumes for decoder 0.31337976455688477(s)
    4628.7ms [WARNING] Time: model_inference whisper_decoder 313.41

    time consumes for decoder 0.313596248626709(s)
    4943.4ms [WARNING] Time: model_inference whisper_decoder 313.37

    time consumes for decoder 0.3135509490966797(s)
    5258.1ms [WARNING] Time: model_inference whisper_decoder 313.29

    time consumes for decoder 0.31348109245300293(s)
    5572.8ms [WARNING] Time: model_inference whisper_decoder 313.35

    time consumes for decoder 0.31352949142456055(s)
    5887.5ms [WARNING] Time: model_inference whisper_decoder 313.35

    time consumes for decoder 0.3135380744934082(s)
    6201.4ms [WARNING] Time: model_inference whisper_decoder 312.50

    time consumes for decoder 0.31267857551574707(s)
    6515.8ms [WARNING] Time: model_inference whisper_decoder 313.02

    time consumes for decoder 0.31319570541381836(s)
    6830.7ms [WARNING] Time: model_inference whisper_decoder 313.52

    time consumes for decoder 0.31371450424194336(s)
    7145.6ms [WARNING] Time: model_inference whisper_decoder 313.57

    time consumes for decoder 0.31376171112060547(s)
    7459.9ms [WARNING] Time: model_inference whisper_decoder 313.03

    time consumes for decoder 0.3132154941558838(s)
    7774.9ms [WARNING] Time: model_inference whisper_decoder 313.60

    time consumes for decoder 0.3137829303741455(s)
    8089.4ms [WARNING] Time: model_inference whisper_decoder 313.16

    time consumes for decoder 0.31334543228149414(s)
    8404.7ms [WARNING] Time: model_inference whisper_decoder 313.92

    time consumes for decoder 0.31411004066467285(s)
    8719.6ms [WARNING] Time: model_inference whisper_decoder 313.62

    time consumes for decoder 0.31380605697631836(s)
    9034.5ms [WARNING] Time: model_inference whisper_decoder 313.51

    time consumes for decoder 0.31369900703430176(s)
    9349.9ms [WARNING] Time: model_inference whisper_decoder 314.01

    time consumes for decoder 0.31418848037719727(s)
    9664.7ms [WARNING] Time: model_inference whisper_decoder 313.53

    time consumes for decoder 0.3137087821960449(s)
    9979.7ms [WARNING] Time: model_inference whisper_decoder 313.63

    time consumes for decoder 0.3138093948364258(s)
    Transcription: And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    10095.0ms [WARNING] Time: model_destroy whisper_decoder 13.25

    <W> Logs will be sent to the system's default channel
    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    /prj/qct/webtech_scratch20/mlg_user_admin/qaisw_source_repo/rel/qairt-2.37.1/point_release/SNPE_SRC/avante-tools/prebuilt/dsp/hexagon-sdk-5.5.5/ipc/fastrpc/rpcmem/src/rpcmem_android.c:42:dummy call to rpcmem_deinit, rpcmem APIs will be used from libxdsprpc
    10198.2ms [WARNING] Time: model_destroy whisper_encoder 103.03
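The log above records one `whisper_encoder` inference (440.09) and 28 `whisper_decoder` inferences of roughly 312 to 324 each. To summarize such a log automatically, the timing lines can be parsed with a regular expression. This is a minimal sketch, assuming the trailing number on each `Time: model_inference` line is a duration in milliseconds (consistent with the `time consumes ... (s)` lines); `summarize_inference` is a hypothetical helper, not part of QAI AppBuilder.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   "1135.2ms [WARNING] Time: model_inference whisper_encoder 440.09"
INFERENCE_RE = re.compile(r"Time:\s+model_inference\s+(\S+)\s+([0-9.]+)")


def summarize_inference(log_text):
    """Return {model: (call_count, total_ms, mean_ms)} from a captured log."""
    times = defaultdict(list)
    for line in log_text.splitlines():
        match = INFERENCE_RE.search(line)
        if match:
            times[match.group(1)].append(float(match.group(2)))
    return {
        name: (len(vals), sum(vals), sum(vals) / len(vals))
        for name, vals in times.items()
    }


if __name__ == "__main__":
    sample = (
        "1135.2ms [WARNING] Time: model_inference whisper_encoder 440.09\n"
        "1470.7ms [WARNING] Time: model_inference whisper_decoder 323.97\n"
        "1796.0ms [WARNING] Time: model_inference whisper_decoder 322.57\n"
    )
    for name, (count, total, mean) in summarize_inference(sample).items():
        print(f"{name}: {count} call(s), total {total:.2f} ms, mean {mean:.2f} ms")
```

To analyze a full run, redirect the script output to a file and pass its contents to `summarize_inference`.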

    Recognition result

    Transcription: And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
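The cross-attention cache shapes and the `sample_len` value printed in the log can be related to the model architecture. The sketch below assumes the published Whisper base hyperparameters (6 layers, 8 attention heads, model width 512, 1500 encoder positions per 30-second window, text context 448); the sample script itself may derive these values differently, and all names here are illustrative.

```python
# Assumption: published hyperparameters of the Whisper base model.
N_LAYERS = 6        # decoder layers, each with its own cross-attention cache
N_HEADS = 8         # attention heads per layer
D_MODEL = 512       # model width
N_AUDIO_CTX = 1500  # encoder output positions for one 30-second window
N_TEXT_CTX = 448    # maximum decoder context length


def cross_cache_shapes():
    """Return the expected (key, value) cross-attention cache shapes."""
    head_dim = D_MODEL // N_HEADS
    # In the log, the key cache places head_dim before the audio positions,
    # while the value cache places the audio positions first.
    k_shape = (N_LAYERS, N_HEADS, head_dim, N_AUDIO_CTX)
    v_shape = (N_LAYERS, N_HEADS, N_AUDIO_CTX, head_dim)
    return k_shape, v_shape


def default_sample_len():
    """Half of the text context, matching the sample_len printed in the log."""
    return N_TEXT_CTX // 2


if __name__ == "__main__":
    print(cross_cache_shapes())
    print(default_sample_len())
```

Under these assumptions the computed shapes match the `k_cache_cross` and `v_cache_cross` lines in the log, and the sample length matches `start decode sample_len 224`.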


    Radxa-docs © 2026 by Radxa Computer (Shenzhen) Co.,Ltd. is licensed under CC BY 4.0