Whisper-Base Example

This document explains how to use the QAI AppBuilder Python API to perform inference with the Whisper-Base speech recognition model using the Qualcomm® Hexagon™ Processor (NPU).

Supported Devices

Device                   SoC
Fogwise® AIRbox Q900     QCS9075

Install QAI AppBuilder

Tip
  1. Install QAI AppBuilder by following the QAI AppBuilder Installation Guide.

  2. Configure the ADSP environment variables as described in Configuring ADSP Environment Variables.
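
Before running the example, it can help to confirm that the ADSP search path is actually visible to the Python process. The sketch below is only an illustration: it assumes the variable set by the linked guide is `ADSP_LIBRARY_PATH` (this page does not name it), and the helper `adsp_search_dirs` is a hypothetical name, not part of QAI AppBuilder.

```python
import os

# Assumption: the linked guide configures ADSP_LIBRARY_PATH. Change this
# constant if your setup uses a different variable name.
ADSP_VAR = "ADSP_LIBRARY_PATH"


def adsp_search_dirs(environ=None):
    """Return [(directory, exists)] for each entry in the ADSP search path."""
    environ = os.environ if environ is None else environ
    raw = environ.get(ADSP_VAR, "")
    # Entries may be separated by ';' or by the platform path separator.
    entries = raw.replace(";", os.pathsep).split(os.pathsep)
    return [(d, os.path.isdir(d)) for d in entries if d]


if __name__ == "__main__":
    dirs = adsp_search_dirs()
    if not dirs:
        print(f"{ADSP_VAR} is not set or is empty")
    for directory, exists in dirs:
        print(f"{'ok     ' if exists else 'missing'} {directory}")
```

An empty result or a `missing` entry suggests revisiting the environment configuration step before looking at the model or the script.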

Run Example

Install Dependencies

Device
pip3 install requests tqdm qai-hub py3_wget opencv-python torch torchvision matplotlib openai-whisper audio2numpy samplerate transformers qai_hub_models==0.30.2
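
After the install finishes, a quick check that every package resolved, and that the pinned `qai_hub_models` version is the one in use, can save a failed run later. This is a minimal sketch using only the standard library; the package list is copied from the command above, and `check_packages` is a hypothetical helper name.

```python
from importlib import metadata

# Distribution names copied from the pip3 command above.
REQUIRED = (
    "requests", "tqdm", "qai-hub", "py3_wget", "opencv-python", "torch",
    "torchvision", "matplotlib", "openai-whisper", "audio2numpy",
    "samplerate", "transformers", "qai_hub_models",
)
# Only qai_hub_models is pinned in the install command.
PINNED = {"qai_hub_models": "0.30.2"}


def check_packages(names=REQUIRED, pinned=PINNED):
    """Return (missing, mismatched) for the given distribution names."""
    missing, mismatched = [], []
    for name in names:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
            continue
        wanted = pinned.get(name)
        if wanted is not None and version != wanted:
            mismatched.append((name, version, wanted))
    return missing, mismatched


if __name__ == "__main__":
    missing, mismatched = check_packages()
    for name in missing:
        print(f"not installed: {name}")
    for name, found, wanted in mismatched:
        print(f"version mismatch: {name} is {found}, expected {wanted}")
    if not missing and not mismatched:
        print("all dependencies found")
```

How distribution names are normalized (for example `qai_hub_models` versus `qai-hub-models`) can vary between Python versions, so a "not installed" report for a package that imports fine is worth double-checking with `pip3 show`.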

Run the Script

  • Navigate to the example directory

    Device
    cd ai-engine-direct-helper/samples/python
  • Prepare the input audio. The following audio is used as an example:

    input audio

  • Execute inference

    Device
    python3 whisper_base_en/whisper_base_en.py
    $ python3 whisper_base_en/whisper_base_en.py
    0.0ms [WARNING] <W> Initializing HtpProvider

    /prj/qct/webtech_scratch20/mlg_user_admin/qaisw_source_repo/rel/qairt-2.37.1/point_release/SNPE_SRC/avante-tools/prebuilt/dsp/hexagon-sdk-5.5.5/ipc/fastrpc/rpcmem/src/rpcmem_android.c:38:dummy call to rpcmem_init, rpcmem APIs will be used from libxdsprpc
    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    184.8ms [WARNING] Time: Read model file to memory. 71.89

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    242.3ms [WARNING] Time: contextCreateFromBinary. 57.39

    242.3ms [WARNING] Time: UnmapViewOfFile. 0.00

    244.9ms [WARNING] Time: model_initialize whisper_decoder 244.80

    282.6ms [WARNING] Time: Read model file to memory. 37.32

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    636.8ms [WARNING] Time: contextCreateFromBinary. 354.14

    636.9ms [WARNING] Time: UnmapViewOfFile. 0.00

    638.6ms [WARNING] Time: model_initialize whisper_encoder 393.56

    1135.2ms [WARNING] Time: model_inference whisper_encoder 440.09

    time consumes for encoder 0.4419412612915039(s)
    Decoder Inference k_cache_cross type <class 'numpy.ndarray'> shape (6, 8, 64, 1500) type float32
    Decoder Inference v_cache_cross type <class 'numpy.ndarray'> shape (6, 8, 1500, 64) type float32
    start decode sample_len 224
    1470.7ms [WARNING] Time: model_inference whisper_decoder 323.97

    time consumes for decoder 0.3242673873901367(s)
    1796.0ms [WARNING] Time: model_inference whisper_decoder 322.57

    time consumes for decoder 0.3229062557220459(s)
    2111.1ms [WARNING] Time: model_inference whisper_decoder 313.66

    time consumes for decoder 0.31383657455444336(s)
    2425.6ms [WARNING] Time: model_inference whisper_decoder 313.01

    time consumes for decoder 0.3132438659667969(s)
    2740.2ms [WARNING] Time: model_inference whisper_decoder 313.24

    time consumes for decoder 0.3134174346923828(s)
    3055.0ms [WARNING] Time: model_inference whisper_decoder 313.45

    time consumes for decoder 0.3136253356933594(s)
    3369.6ms [WARNING] Time: model_inference whisper_decoder 313.31

    time consumes for decoder 0.31349658966064453(s)
    3684.4ms [WARNING] Time: model_inference whisper_decoder 313.44

    time consumes for decoder 0.3136255741119385(s)
    3999.4ms [WARNING] Time: model_inference whisper_decoder 313.63

    time consumes for decoder 0.3138108253479004(s)
    4314.0ms [WARNING] Time: model_inference whisper_decoder 313.20

    time consumes for decoder 0.31337976455688477(s)
    4628.7ms [WARNING] Time: model_inference whisper_decoder 313.41

    time consumes for decoder 0.313596248626709(s)
    4943.4ms [WARNING] Time: model_inference whisper_decoder 313.37

    time consumes for decoder 0.3135509490966797(s)
    5258.1ms [WARNING] Time: model_inference whisper_decoder 313.29

    time consumes for decoder 0.31348109245300293(s)
    5572.8ms [WARNING] Time: model_inference whisper_decoder 313.35

    time consumes for decoder 0.31352949142456055(s)
    5887.5ms [WARNING] Time: model_inference whisper_decoder 313.35

    time consumes for decoder 0.3135380744934082(s)
    6201.4ms [WARNING] Time: model_inference whisper_decoder 312.50

    time consumes for decoder 0.31267857551574707(s)
    6515.8ms [WARNING] Time: model_inference whisper_decoder 313.02

    time consumes for decoder 0.31319570541381836(s)
    6830.7ms [WARNING] Time: model_inference whisper_decoder 313.52

    time consumes for decoder 0.31371450424194336(s)
    7145.6ms [WARNING] Time: model_inference whisper_decoder 313.57

    time consumes for decoder 0.31376171112060547(s)
    7459.9ms [WARNING] Time: model_inference whisper_decoder 313.03

    time consumes for decoder 0.3132154941558838(s)
    7774.9ms [WARNING] Time: model_inference whisper_decoder 313.60

    time consumes for decoder 0.3137829303741455(s)
    8089.4ms [WARNING] Time: model_inference whisper_decoder 313.16

    time consumes for decoder 0.31334543228149414(s)
    8404.7ms [WARNING] Time: model_inference whisper_decoder 313.92

    time consumes for decoder 0.31411004066467285(s)
    8719.6ms [WARNING] Time: model_inference whisper_decoder 313.62

    time consumes for decoder 0.31380605697631836(s)
    9034.5ms [WARNING] Time: model_inference whisper_decoder 313.51

    time consumes for decoder 0.31369900703430176(s)
    9349.9ms [WARNING] Time: model_inference whisper_decoder 314.01

    time consumes for decoder 0.31418848037719727(s)
    9664.7ms [WARNING] Time: model_inference whisper_decoder 313.53

    time consumes for decoder 0.3137087821960449(s)
    9979.7ms [WARNING] Time: model_inference whisper_decoder 313.63

    time consumes for decoder 0.3138093948364258(s)
    Transcription: And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    10095.0ms [WARNING] Time: model_destroy whisper_decoder 13.25

    <W> Logs will be sent to the system's default channel
    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    0.0ms [WARNING] <W> This META does not have Alloc2 Support

    /prj/qct/webtech_scratch20/mlg_user_admin/qaisw_source_repo/rel/qairt-2.37.1/point_release/SNPE_SRC/avante-tools/prebuilt/dsp/hexagon-sdk-5.5.5/ipc/fastrpc/rpcmem/src/rpcmem_android.c:42:dummy call to rpcmem_deinit, rpcmem APIs will be used from libxdsprpc
    10198.2ms [WARNING] Time: model_destroy whisper_encoder 103.03
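
The `Time: model_...` lines in the log above carry the per-call durations in milliseconds: one encoder inference of about 440 ms followed by a series of decoder inferences of roughly 313 ms each. The sketch below shows one way to summarize them from a saved log. It assumes the line format stays exactly as printed above; the regular expression and the `summarize` helper are illustrative and not part of QAI AppBuilder.

```python
import re
from statistics import mean

# Matches lines such as:
#   "1470.7ms [WARNING] Time: model_inference whisper_decoder 323.97"
# Lines like "Time: Read model file to memory. 71.89" are skipped on purpose.
TIMING = re.compile(r"Time:\s+(model_\w+)\s+(\w+)\s+([0-9.]+)\s*$")


def summarize(log_text):
    """Group per-call durations (ms) by (stage, model name)."""
    stats = {}
    for line in log_text.splitlines():
        match = TIMING.search(line)
        if match:
            stage, model, ms = match.group(1), match.group(2), float(match.group(3))
            stats.setdefault((stage, model), []).append(ms)
    return stats


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
638.6ms [WARNING] Time: model_initialize whisper_encoder 393.56
1135.2ms [WARNING] Time: model_inference whisper_encoder 440.09
1470.7ms [WARNING] Time: model_inference whisper_decoder 323.97
1796.0ms [WARNING] Time: model_inference whisper_decoder 322.57
"""

if __name__ == "__main__":
    for (stage, model), values in summarize(SAMPLE).items():
        print(f"{stage:17s} {model:16s} calls={len(values)} "
              f"mean={mean(values):.2f} ms total={sum(values):.2f} ms")
```

To run it on a full log, redirect the script output to a file and pass the file contents to `summarize` in place of `SAMPLE`.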

    Recognition Results

    Transcription: And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
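
The log reports the cross-attention caches passed from the encoder to the decoder as `k_cache_cross` with shape (6, 8, 64, 1500) and `v_cache_cross` with shape (6, 8, 1500, 64), both float32. These dimensions match the Whisper-Base architecture: 6 decoder layers, 8 attention heads, 64 channels per head, and 1500 encoder positions for a 30-second window. The short sketch below works out the size of each tensor; the axis names in the comments are an interpretation of the shapes, not something the script prints.

```python
from math import prod

# Shapes copied from the "Decoder Inference" log lines.
# Interpreted axis order (an assumption): layers, heads, then either
# (head_dim, audio_positions) for K or (audio_positions, head_dim) for V.
K_CACHE_CROSS = (6, 8, 64, 1500)
V_CACHE_CROSS = (6, 8, 1500, 64)
FLOAT32_BYTES = 4


def tensor_bytes(shape, itemsize=FLOAT32_BYTES):
    """Number of bytes needed to store a dense tensor of the given shape."""
    return prod(shape) * itemsize


if __name__ == "__main__":
    for name, shape in (("k_cache_cross", K_CACHE_CROSS),
                        ("v_cache_cross", V_CACHE_CROSS)):
        size = tensor_bytes(shape)
        print(f"{name}: {prod(shape):,} elements, {size / 2**20:.2f} MiB")
```

Each cache holds 4,608,000 values, or about 17.6 MiB in float32. The key cache has its last two axes swapped relative to the value cache, which is consistent with keys being stored pre-transposed for the attention product, although the script itself does not state this.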

    Radxa-docs © 2026 by Radxa Computer (Shenzhen) Co.,Ltd. is licensed under CC BY 4.0