
Audio Classification

gst-ai-audio-classification performs audio event classification on an audio stream, identifying sound types (e.g., speech, music, ambient noise).

It uses the YAMNet model, and the default configuration runs inference on the CPU.

Prerequisites

Steps

1. Verify Model and Labels

radxa@airbox$
ls -l /etc/models/yamnet.tflite
ls -l /etc/labels/yamnet.json
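
The two ls commands above can also be wrapped in a small check that fails loudly when a file is absent or empty. This is only a sketch: check_files is a hypothetical helper name, not part of the sample application.

```shell
# Report whether each given file exists and is non-empty.
# Returns 0 only if every file passes the check.
check_files() {
    status=0
    for f in "$@"; do
        if [ -s "$f" ]; then
            echo "ok: $f"
        else
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Paths are the ones listed in this step.
check_files /etc/models/yamnet.tflite /etc/labels/yamnet.json \
    || echo "Fix the paths above before running the sample." >&2
```

A non-zero return value makes the helper easy to use as a guard at the top of a launch script.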

2. View Configuration

radxa@airbox$
cat /etc/configs/config-audio-classification.json

Key fields:

Field      Default                    Description
file-path  /etc/media/video-mp3.mp4   Input audio/video file (MP3 encoded)
model      /etc/models/yamnet.tflite  Model file
labels     /etc/labels/yamnet.json    Label file
threshold  10                         Confidence threshold
codec      mp3                        Audio encoding format
runtime    cpu                        Inference hardware

The default configuration uses CPU inference. To run inference on the DSP instead, change runtime to dsp and add the field ml-framework with the value "tflite".
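
One way to apply the DSP change without editing the shipped file is to generate a modified copy. The sketch below assumes the config contains the pair "runtime": "cpu", has no existing ml-framework key, and accepts ml-framework as a sibling of runtime; the actual layout of the file may differ, so inspect the result before using it. make_dsp_config and the output path are hypothetical names.

```shell
# Write a DSP variant of a config file to stdout.
# Assumption: the file contains "runtime": "cpu" and no "ml-framework" key.
# The new key is emitted immediately after "runtime", in the same JSON object.
make_dsp_config() {
    sed -E 's/("runtime"[[:space:]]*:[[:space:]]*)"cpu"/\1"dsp", "ml-framework": "tflite"/' "$1"
}

# Example (the output path is hypothetical):
# make_dsp_config /etc/configs/config-audio-classification.json \
#     > /tmp/config-audio-classification-dsp.json
```

The generated copy can then be passed to the application through --config-file in place of the default file.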

3. Run

radxa@airbox$
gst-ai-audio-classification --config-file=/etc/configs/config-audio-classification.json

Press Ctrl + C to stop.

Expected Output

Terminal output:

Running app with model: /etc/models/yamnet.tflite and labels: /etc/labels/yamnet.json
Pipeline state changed from PAUSED to PLAYING

The display shows the test video with audio classification results overlaid.

Validation

  • Pipeline reaches PLAYING state
  • Terminal continuously outputs audio classification results
  • Display shows classification labels
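
The first validation point can be checked from a captured log rather than by eye. This sketch assumes the terminal output was saved to a file (for example with tee) and that the state-change line appears as shown under Expected Output; reached_playing and the log path are hypothetical names.

```shell
# Return 0 if the captured log shows the pipeline reaching the PLAYING state.
reached_playing() {
    grep -q 'state changed from PAUSED to PLAYING' "$1"
}

# Example (the log path is hypothetical):
# gst-ai-audio-classification --config-file=/etc/configs/config-audio-classification.json 2>&1 \
#     | tee /tmp/audio-classification.log
# reached_playing /tmp/audio-classification.log && echo "pipeline is PLAYING"
```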

How It Works

YAMNet is an audio event classification model trained on the AudioSet dataset that distinguishes 521 audio categories. Pipeline flow:

filesrc → qtdemux → (audio decode) → qtimlaudioconverter → qtimltflite (inference) → qtimlaclassification → (classification label overlay)
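
If the pipeline fails to start, it can help to confirm that the elements named in the flow are installed. The sketch below relies on the --exists option of gst-inspect-1.0 and assumes that tool is on the PATH; missing_elements is a hypothetical helper name, and the GST_INSPECT override exists only so the helper can be pointed at a different binary.

```shell
# Print the name of every GStreamer element that is not installed.
# GST_INSPECT may be set to another command; it defaults to gst-inspect-1.0.
missing_elements() {
    for element in "$@"; do
        "${GST_INSPECT:-gst-inspect-1.0}" --exists "$element" >/dev/null 2>&1 \
            || echo "$element"
    done
}

# Element names as listed in the pipeline flow:
# missing_elements filesrc qtdemux qtimlaudioconverter qtimltflite qtimlaclassification
```

An empty result means every listed element was found.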

    Radxa-docs © 2026 by Radxa Computer (Shenzhen) Co.,Ltd. is licensed under CC BY 4.0