Whisper-Base Example

This document explains how to use the QAI AppBuilder Python API to perform inference with the Whisper-Base speech recognition model using the Qualcomm® Hexagon™ Processor (NPU).

Supported Devices

Device	SoC
Fogwise® AIRbox Q900	QCS9075

Install QAI AppBuilder

Tip:

Install QAI AppBuilder by following the QAI AppBuilder Installation Guide.
Configure the ADSP environment variables as described in Configuring ADSP Environment Variables.
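Once the variables are configured, a quick check can confirm that Python sees them. This is a minimal sketch that assumes the guide sets ADSP_LIBRARY_PATH, the variable Hexagon FastRPC reads to locate DSP-side libraries, as a semicolon-separated list of directories. Adjust the name if the guide uses different variables. check_adsp_library_path is a hypothetical helper, not part of QAI AppBuilder.

```python
import os
from pathlib import Path


def check_adsp_library_path(var_name: str = "ADSP_LIBRARY_PATH") -> list[str]:
    """Return a list of problems found with the DSP library search path.

    Assumption: the variable holds a semicolon-separated list of directories.
    """
    value = os.environ.get(var_name)
    if not value:
        return [f"{var_name} is not set"]
    problems = []
    for entry in value.split(";"):
        entry = entry.strip()
        # Empty entries are skipped; every other entry should be a directory.
        if entry and not Path(entry).is_dir():
            problems.append(f"{var_name} entry does not exist: {entry}")
    return problems


if __name__ == "__main__":
    issues = check_adsp_library_path()
    print("\n".join(issues) if issues else "ADSP_LIBRARY_PATH looks OK")
```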

Run Example

Install Dependencies

Run the following command on the device:

pip3 install requests tqdm qai-hub py3_wget opencv-python torch torchvision matplotlib openai-whisper audio2numpy samplerate transformers qai_hub_models==0.30.2
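To confirm the installation, the sketch below checks that each package can be located without actually importing it. The import names are an assumption worth double-checking: opencv-python imports as cv2, openai-whisper as whisper, and qai-hub as qai_hub. missing_modules is a hypothetical helper.

```python
import importlib.util

# Import names (not pip names) for the packages installed above.
REQUIRED_MODULES = [
    "requests", "tqdm", "qai_hub", "py3_wget", "cv2", "torch", "torchvision",
    "matplotlib", "whisper", "audio2numpy", "samplerate", "transformers",
    "qai_hub_models",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print(("Missing: " + ", ".join(missing)) if missing else "All dependencies found")
```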

Run the Script

Navigate to the example directory on the device (QCS9075):

cd ai-engine-direct-helper/samples/python
Prepare the input audio. This example uses a short English speech clip; its expected transcription is listed under Recognition Results.
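The sample script's dependencies include audio2numpy and samplerate, which suggests it decodes and resamples the input itself. If you substitute your own clip, the sketch below illustrates the format Whisper models consume: 16 kHz mono samples in a fixed 30-second window (480,000 samples), with shorter clips zero-padded and longer ones trimmed. It assumes a 16-bit PCM WAV file and uses only the standard library; load_whisper_window is a hypothetical helper, not the script's actual loader.

```python
import struct
import wave

SAMPLE_RATE = 16_000                     # Whisper models operate on 16 kHz audio
CHUNK_SECONDS = 30                       # the encoder processes 30-second windows
N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples


def load_whisper_window(path: str) -> list[float]:
    """Read a 16 kHz mono 16-bit PCM WAV file and pad/trim it to 30 seconds.

    Samples are scaled from int16 to floats in [-1.0, 1.0).
    """
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != SAMPLE_RATE:
            raise ValueError(f"expected {SAMPLE_RATE} Hz, got {wav.getframerate()} Hz")
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())
    count = len(frames) // 2
    samples = [s / 32768.0 for s in struct.unpack(f"<{count}h", frames)]
    # Trim long clips and zero-pad short ones to exactly one window.
    samples = samples[:N_SAMPLES]
    samples.extend([0.0] * (N_SAMPLES - len(samples)))
    return samples
```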

Execute inference on the device:

python3 whisper_base_en/whisper_base_en.py

Example output:

$ python3 whisper_base_en/whisper_base_en.py
     0.0ms [WARNING]  <W> Initializing HtpProvider

/prj/qct/webtech_scratch20/mlg_user_admin/qaisw_source_repo/rel/qairt-2.37.1/point_release/SNPE_SRC/avante-tools/prebuilt/dsp/hexagon-sdk-5.5.5/ipc/fastrpc/rpcmem/src/rpcmem_android.c:38:dummy call to rpcmem_init, rpcmem APIs will be used from libxdsprpc
     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

   184.8ms [WARNING] Time: Read model file to memory. 71.89

     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

   242.3ms [WARNING] Time: contextCreateFromBinary. 57.39

   242.3ms [WARNING] Time: UnmapViewOfFile. 0.00

   244.9ms [WARNING] Time: model_initialize whisper_decoder 244.80

   282.6ms [WARNING] Time: Read model file to memory. 37.32

     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

   636.8ms [WARNING] Time: contextCreateFromBinary. 354.14

   636.9ms [WARNING] Time: UnmapViewOfFile. 0.00

   638.6ms [WARNING] Time: model_initialize whisper_encoder 393.56

  1135.2ms [WARNING] Time: model_inference whisper_encoder 440.09

time consumes for encoder 0.4419412612915039(s)
Decoder Inference k_cache_cross type <class 'numpy.ndarray'> shape  (6, 8, 64, 1500) type  float32
Decoder Inference v_cache_cross type <class 'numpy.ndarray'> shape  (6, 8, 1500, 64) type  float32
start decode sample_len  224
  1470.7ms [WARNING] Time: model_inference whisper_decoder 323.97

time consumes for decoder 0.3242673873901367(s)
  1796.0ms [WARNING] Time: model_inference whisper_decoder 322.57

time consumes for decoder 0.3229062557220459(s)
  2111.1ms [WARNING] Time: model_inference whisper_decoder 313.66

time consumes for decoder 0.31383657455444336(s)
  2425.6ms [WARNING] Time: model_inference whisper_decoder 313.01

time consumes for decoder 0.3132438659667969(s)
  2740.2ms [WARNING] Time: model_inference whisper_decoder 313.24

time consumes for decoder 0.3134174346923828(s)
  3055.0ms [WARNING] Time: model_inference whisper_decoder 313.45

time consumes for decoder 0.3136253356933594(s)
  3369.6ms [WARNING] Time: model_inference whisper_decoder 313.31

time consumes for decoder 0.31349658966064453(s)
  3684.4ms [WARNING] Time: model_inference whisper_decoder 313.44

time consumes for decoder 0.3136255741119385(s)
  3999.4ms [WARNING] Time: model_inference whisper_decoder 313.63

time consumes for decoder 0.3138108253479004(s)
  4314.0ms [WARNING] Time: model_inference whisper_decoder 313.20

time consumes for decoder 0.31337976455688477(s)
  4628.7ms [WARNING] Time: model_inference whisper_decoder 313.41

time consumes for decoder 0.313596248626709(s)
  4943.4ms [WARNING] Time: model_inference whisper_decoder 313.37

time consumes for decoder 0.3135509490966797(s)
  5258.1ms [WARNING] Time: model_inference whisper_decoder 313.29

time consumes for decoder 0.31348109245300293(s)
  5572.8ms [WARNING] Time: model_inference whisper_decoder 313.35

time consumes for decoder 0.31352949142456055(s)
  5887.5ms [WARNING] Time: model_inference whisper_decoder 313.35

time consumes for decoder 0.3135380744934082(s)
  6201.4ms [WARNING] Time: model_inference whisper_decoder 312.50

time consumes for decoder 0.31267857551574707(s)
  6515.8ms [WARNING] Time: model_inference whisper_decoder 313.02

time consumes for decoder 0.31319570541381836(s)
  6830.7ms [WARNING] Time: model_inference whisper_decoder 313.52

time consumes for decoder 0.31371450424194336(s)
  7145.6ms [WARNING] Time: model_inference whisper_decoder 313.57

time consumes for decoder 0.31376171112060547(s)
  7459.9ms [WARNING] Time: model_inference whisper_decoder 313.03

time consumes for decoder 0.3132154941558838(s)
  7774.9ms [WARNING] Time: model_inference whisper_decoder 313.60

time consumes for decoder 0.3137829303741455(s)
  8089.4ms [WARNING] Time: model_inference whisper_decoder 313.16

time consumes for decoder 0.31334543228149414(s)
  8404.7ms [WARNING] Time: model_inference whisper_decoder 313.92

time consumes for decoder 0.31411004066467285(s)
  8719.6ms [WARNING] Time: model_inference whisper_decoder 313.62

time consumes for decoder 0.31380605697631836(s)
  9034.5ms [WARNING] Time: model_inference whisper_decoder 313.51

time consumes for decoder 0.31369900703430176(s)
  9349.9ms [WARNING] Time: model_inference whisper_decoder 314.01

time consumes for decoder 0.31418848037719727(s)
  9664.7ms [WARNING] Time: model_inference whisper_decoder 313.53

time consumes for decoder 0.3137087821960449(s)
  9979.7ms [WARNING] Time: model_inference whisper_decoder 313.63

time consumes for decoder 0.3138093948364258(s)
Transcription: And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

 10095.0ms [WARNING] Time: model_destroy whisper_decoder 13.25

 <W> Logs will be sent to the system's default channel
     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

     0.0ms [WARNING]  <W> This META does not have Alloc2 Support

/prj/qct/webtech_scratch20/mlg_user_admin/qaisw_source_repo/rel/qairt-2.37.1/point_release/SNPE_SRC/avante-tools/prebuilt/dsp/hexagon-sdk-5.5.5/ipc/fastrpc/rpcmem/src/rpcmem_android.c:42:dummy call to rpcmem_deinit, rpcmem APIs will be used from libxdsprpc
 10198.2ms [WARNING] Time: model_destroy whisper_encoder 103.03
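The k_cache_cross and v_cache_cross shapes and the sample_len value printed in the log follow from Whisper-Base's dimensions. The sketch below recomputes them, assuming the standard openai-whisper "base" configuration (6 decoder layers, 8 heads, 512-dim decoder state, 448-token text context) and its audio front end (16 kHz input, 160-sample hop, stride-2 encoder convolution). The key cache is laid out with head_dim before the frame axis and the value cache the other way round, matching the log.

```python
import math

# Whisper-Base dimensions from the openai-whisper "base" model configuration.
N_TEXT_LAYER = 6      # decoder layers
N_TEXT_HEAD = 8       # attention heads per decoder layer
N_TEXT_STATE = 512    # decoder hidden size
N_TEXT_CTX = 448      # maximum decoder context in tokens
SAMPLE_RATE = 16_000  # input audio sample rate (Hz)
HOP_LENGTH = 160      # STFT hop: 100 mel frames per second
CHUNK_SECONDS = 30    # length of one encoder window

head_dim = N_TEXT_STATE // N_TEXT_HEAD                  # 512 // 8 = 64
mel_frames = CHUNK_SECONDS * SAMPLE_RATE // HOP_LENGTH  # 3000
audio_ctx = mel_frames // 2                             # stride-2 conv in the encoder -> 1500

k_cache_cross_shape = (N_TEXT_LAYER, N_TEXT_HEAD, head_dim, audio_ctx)
v_cache_cross_shape = (N_TEXT_LAYER, N_TEXT_HEAD, audio_ctx, head_dim)
sample_len = N_TEXT_CTX // 2  # openai-whisper's default per-segment token budget

cache_bytes = math.prod(k_cache_cross_shape) * 4  # float32 = 4 bytes per element

print("k_cache_cross", k_cache_cross_shape)        # (6, 8, 64, 1500)
print("v_cache_cross", v_cache_cross_shape)        # (6, 8, 1500, 64)
print("sample_len", sample_len)                    # 224
print(f"{cache_bytes / 2**20:.1f} MiB per cache")  # 17.6 MiB per cache
```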

Recognition Results

Transcription: And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
