Whisper-Tiny Example

This document explains how to use the QAI AppBuilder Python API to run the Whisper-Tiny speech recognition model on the Qualcomm® Hexagon™ Processor (NPU).

Supported Devices

Device	SoC
Fogwise® AIRbox Q900	QCS9075

Install QAI AppBuilder

Tip

- Install QAI AppBuilder as described in the QAI AppBuilder Installation Guide.
- Configure the ADSP environment variables as described in Configuring ADSP Environment Variables.

Run Example

Install Dependencies

Run the following command on the device:

pip3 install requests tqdm qai-hub py3_wget opencv-python torch torchvision matplotlib openai-whisper audio2numpy samplerate transformers qai_hub_models==0.30.2
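
Before running the sample, it can help to confirm that the packages installed above are importable. The following is a minimal sketch using only the standard library. The import names in `REQUIRED` are assumptions based on each package's usual import name (for example, `opencv-python` is normally imported as `cv2` and `openai-whisper` as `whisper`), and `missing_modules` is a hypothetical helper, not part of QAI AppBuilder.

```python
import importlib.util

# Assumed top-level import names for the pip packages listed above.
REQUIRED = [
    "requests", "tqdm", "qai_hub", "py3_wget", "cv2", "torch",
    "torchvision", "matplotlib", "whisper", "audio2numpy",
    "samplerate", "transformers", "qai_hub_models",
]


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

This only checks that each module can be found; it does not import it or verify its version.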

Run the Script

Navigate to the example directory on the device (QCS9075):

cd ai-engine-direct-helper/samples/python

Prepare the input audio. The following audio is used as an example:

input audio
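
If you want to try your own recording, a quick check of its properties can save a debugging round. Whisper models operate on 16 kHz mono audio in 30-second windows. The `samplerate` and `audio2numpy` dependencies suggest the sample script may decode and resample audio itself, but that is an assumption and is not confirmed here. The sketch below uses only the standard-library `wave` module, so it handles PCM WAV files only; `describe_wav` is a hypothetical helper name.

```python
import wave

TARGET_RATE_HZ = 16000   # sample rate Whisper models operate on
WINDOW_SECONDS = 30      # Whisper processes audio in 30-second windows


def describe_wav(path):
    """Return basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    duration = frames / rate
    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": duration,
        # True when no resampling or downmixing would be needed.
        "is_16k_mono": rate == TARGET_RATE_HZ and channels == 1,
        # True when the clip fits in a single 30-second window.
        "fits_one_window": duration <= WINDOW_SECONDS,
    }
```

Call `describe_wav` with the path to your WAV file and inspect the returned dictionary before running the sample.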

Execute inference on the device:

python3 whisper_tiny_en/whisper_tiny_en.py

Example output:

$ python3 whisper_tiny_en/whisper_tiny_en.py
0ms [WARNING]  <W> Initializing HtpProvider

/prj/qct/webtech_scratch20/mlg_user_admin/qaisw_source_repo/rel/qairt-2.37.1/point_release/SNPE_SRC/avante-tools/prebuilt/dsp/hexagon-sdk-5.5.5/ipc/fastrpc/rpcmem/src/rpcmem_android.c:38:dummy call to rpcmem_init, rpcmem APIs will be used from libxdsprpc
0ms [WARNING]  <W> This META does not have Alloc2 Support

0ms [WARNING]  <W> This META does not have Alloc2 Support

0ms [WARNING]  <W> This META does not have Alloc2 Support

0ms [WARNING]  <W> This META does not have Alloc2 Support

0ms [WARNING] Time: Read model file to memory. 51.68

0ms [WARNING]  <W> This META does not have Alloc2 Support

0ms [WARNING]  <W> This META does not have Alloc2 Support

0ms [WARNING]  <W> This META does not have Alloc2 Support

0ms [WARNING]  <W> This META does not have Alloc2 Support

0ms [WARNING]  <W> This META does not have Alloc2 Support

0ms [WARNING]  <W> This META does not have Alloc2 Support

0ms [WARNING]  <W> This META does not have Alloc2 Support

3ms [WARNING] Time: contextCreateFromBinary. 49.16

3ms [WARNING] Time: UnmapViewOfFile. 0.00

7ms [WARNING] Time: model_initialize whisper_decoder 226.65

7ms [WARNING] Time: Read model file to memory. 18.37

0ms [WARNING]  <W> This META does not have Alloc2 Support

0ms [WARNING]  <W> This META does not have Alloc2 Support

0ms [WARNING]  <W> This META does not have Alloc2 Support

0ms [WARNING]  <W> This META does not have Alloc2 Support

0ms [WARNING]  <W> This META does not have Alloc2 Support

0ms [WARNING]  <W> This META does not have Alloc2 Support

0ms [WARNING]  <W> This META does not have Alloc2 Support

9ms [WARNING] Time: contextCreateFromBinary. 66.16

0ms [WARNING] Time: UnmapViewOfFile. 0.00

2ms [WARNING] Time: model_initialize whisper_encoder 86.17

2ms [WARNING] Time: model_inference whisper_encoder 199.78

Decoder Inference k_cache_cross type <class 'numpy.ndarray'> shape  (4, 6, 64, 1500) type  float32
Decoder Inference v_cache_cross type <class 'numpy.ndarray'> shape  (4, 6, 1500, 64) type  float32
start decode sample_len  224
7ms [WARNING] Time: model_inference whisper_decoder 159.73

8ms [WARNING] Time: model_inference whisper_decoder 157.66

9ms [WARNING] Time: model_inference whisper_decoder 157.76

3ms [WARNING] Time: model_inference whisper_decoder 157.02

0ms [WARNING] Time: model_inference whisper_decoder 158.39

8ms [WARNING] Time: model_inference whisper_decoder 157.56

7ms [WARNING] Time: model_inference whisper_decoder 157.58

0ms [WARNING] Time: model_inference whisper_decoder 157.04

0ms [WARNING] Time: model_inference whisper_decoder 157.82

6ms [WARNING] Time: model_inference whisper_decoder 157.36

7ms [WARNING] Time: model_inference whisper_decoder 157.76

8ms [WARNING] Time: model_inference whisper_decoder 157.78

7ms [WARNING] Time: model_inference whisper_decoder 157.58

4ms [WARNING] Time: model_inference whisper_decoder 157.39

2ms [WARNING] Time: model_inference whisper_decoder 157.47

9ms [WARNING] Time: model_inference whisper_decoder 157.46

2ms [WARNING] Time: model_inference whisper_decoder 157.10

0ms [WARNING] Time: model_inference whisper_decoder 157.66

0ms [WARNING] Time: model_inference whisper_decoder 157.72

3ms [WARNING] Time: model_inference whisper_decoder 157.92

3ms [WARNING] Time: model_inference whisper_decoder 157.68

3ms [WARNING] Time: model_inference whisper_decoder 157.71

5ms [WARNING] Time: model_inference whisper_decoder 157.89

8ms [WARNING] Time: model_inference whisper_decoder 158.02

9ms [WARNING] Time: model_inference whisper_decoder 157.74

4ms [WARNING] Time: model_inference whisper_decoder 158.21

6ms [WARNING] Time: model_inference whisper_decoder 157.92

1ms [WARNING] Time: model_inference whisper_decoder 158.13

Transcription: And so my fellow Americans ask not what your country can do for you ask what you can do for your country.
0ms [WARNING]  <W> This META does not have Alloc2 Support

0ms [WARNING]  <W> This META does not have Alloc2 Support

7ms [WARNING] Time: model_destroy whisper_decoder 14.94

 <W> Logs will be sent to the system's default channel
0ms [WARNING]  <W> This META does not have Alloc2 Support

0ms [WARNING]  <W> This META does not have Alloc2 Support

0ms [WARNING]  <W> This META does not have Alloc2 Support

0ms [WARNING]  <W> This META does not have Alloc2 Support

/prj/qct/webtech_scratch20/mlg_user_admin/qaisw_source_repo/rel/qairt-2.37.1/point_release/SNPE_SRC/avante-tools/prebuilt/dsp/hexagon-sdk-5.5.5/ipc/fastrpc/rpcmem/src/rpcmem_android.c:42:dummy call to rpcmem_deinit, rpcmem APIs will be used from libxdsprpc
4ms [WARNING] Time: model_destroy whisper_encoder 73.58
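
The log above reports one encoder inference followed by 28 decoder inferences, each taking roughly 157 to 160 (the log does not state the unit; milliseconds is an assumption). To summarize such timings from a saved log, a small parser along these lines may be useful. It is a sketch that assumes the `Time: model_<phase> <model> <value>` line format seen in this log; `summarize_timings` is a hypothetical helper, and the sample lines are copied from the output above.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches lines such as "Time: model_inference whisper_decoder 159.73".
TIMING_RE = re.compile(r"Time: (model_\w+) (\w+) ([0-9.]+)")


def summarize_timings(log_text):
    """Group timing values by (phase, model); return (count, mean) per group."""
    grouped = defaultdict(list)
    for phase, model, value in TIMING_RE.findall(log_text):
        grouped[(phase, model)].append(float(value))
    return {key: (len(vals), mean(vals)) for key, vals in grouped.items()}


SAMPLE_LOG = """\
2ms [WARNING] Time: model_inference whisper_encoder 199.78
7ms [WARNING] Time: model_inference whisper_decoder 159.73
8ms [WARNING] Time: model_inference whisper_decoder 157.66
9ms [WARNING] Time: model_inference whisper_decoder 157.76
"""

for (phase, model), (count, avg) in summarize_timings(SAMPLE_LOG).items():
    print(f"{phase} {model}: n={count}, mean={avg:.2f}")
```

Lines such as `Time: Read model file to memory.` are intentionally skipped because they do not follow the `model_<phase> <model>` pattern.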

Recognition Results

Transcription: And so my fellow Americans ask not what your country can do for you ask what you can do for your country.
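
When the script is driven from another program, the transcription can be pulled out of the captured output (for example, text captured with `subprocess.run(..., capture_output=True, text=True)`). The sketch below assumes the result is printed on a line starting with `Transcription:`, as in the output above; `extract_transcription` is a hypothetical helper, and whether the script writes this line to stdout or stderr is not stated here.

```python
def extract_transcription(output):
    """Return the text after the first 'Transcription:' line, or None."""
    prefix = "Transcription:"
    for line in output.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


# Sample lines copied from the output above.
SAMPLE_OUTPUT = (
    "1ms [WARNING] Time: model_inference whisper_decoder 158.13\n"
    "Transcription: And so my fellow Americans ask not what your country "
    "can do for you ask what you can do for your country.\n"
)

print(extract_transcription(SAMPLE_OUTPUT))
```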
