Audio Classification
gst-ai-audio-classification performs audio event classification on an audio stream, identifying sound types (e.g., speech, music, ambient noise).
It uses the YAMNet model; the default configuration runs inference on the CPU.
Prerequisites
- Completed QIM SDK Installation and Model Download
Steps
1. Verify Model and Labels
ls -l /etc/models/yamnet.tflite
ls -l /etc/labels/yamnet.json
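The two checks above can also be scripted so that a missing or empty file is reported explicitly. This is a minimal sketch: the helper name check_file is hypothetical, and the paths are the ones listed in this step.

```shell
# Hypothetical helper: succeed only if the path is a regular, non-empty file.
check_file() {
    if [ -f "$1" ] && [ -s "$1" ]; then
        echo "OK       $1"
    else
        echo "MISSING  $1"
        return 1
    fi
}

# Check the model and label files named in this step.
for f in /etc/models/yamnet.tflite /etc/labels/yamnet.json; do
    check_file "$f" || echo "Repeat the model download before continuing." >&2
done
```

An empty file is reported as MISSING because a zero-byte model or label file usually points to an interrupted download.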
2. View Configuration
cat /etc/configs/config-audio-classification.json
Key fields:
| Field | Default | Description |
|---|---|---|
| file-path | /etc/media/video-mp3.mp4 | Input audio/video file (MP3 encoded) |
| model | /etc/models/yamnet.tflite | Model file |
| labels | /etc/labels/yamnet.json | Label file |
| threshold | 10 | Confidence threshold |
| codec | mp3 | Audio encoding format |
| runtime | cpu | Inference hardware |
The default uses CPU inference. For DSP inference, change runtime to dsp and add ml-framework: "tflite".
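One way to try the DSP setting without touching the shipped file is to write a modified copy. The sketch below is an assumption-heavy illustration: the helper name make_dsp_config is hypothetical, and it assumes runtime appears in the JSON as a plain string field set to "cpu" and that ml-framework belongs next to it at the same level. Check the actual layout with the cat command above before relying on it.

```shell
# Hypothetical helper: write a DSP variant of a config file to a new path.
# Assumes the source contains a field written as "runtime": "cpu" and that
# "ml-framework" sits at the same level (not confirmed by this guide).
make_dsp_config() {
    src="$1"
    dst="$2"
    sed 's/"runtime"[[:space:]]*:[[:space:]]*"cpu"/"runtime": "dsp", "ml-framework": "tflite"/' \
        "$src" > "$dst"
}

# Example (paths other than the shipped config are illustrative):
#   make_dsp_config /etc/configs/config-audio-classification.json /tmp/config-audio-classification-dsp.json
#   gst-ai-audio-classification --config-file=/tmp/config-audio-classification-dsp.json
```

Working on a copy keeps the default CPU configuration available as a known-good fallback.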
3. Run
gst-ai-audio-classification --config-file=/etc/configs/config-audio-classification.json
Press Ctrl + C to stop.
Expected Output
Terminal output:
Running app with model: /etc/models/yamnet.tflite and labels: /etc/labels/yamnet.json
Pipeline state changed from PAUSED to PLAYING
The display shows the test video with audio classification results overlaid.
Validation
- Pipeline reaches PLAYING state
- Terminal continuously outputs audio classification results
- Display shows classification labels
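The first validation item can be checked from a captured log instead of by eye. This is a minimal sketch: the helper name reached_playing and the log path are hypothetical, and the matched text is the state-change line shown under Expected Output.

```shell
# Hypothetical helper: succeed if a captured log contains the state-change
# line shown under Expected Output.
reached_playing() {
    grep -q "Pipeline state changed from PAUSED to PLAYING" "$1"
}

# Example (log path is illustrative):
#   gst-ai-audio-classification --config-file=/etc/configs/config-audio-classification.json 2>&1 | tee /tmp/audio-classification.log
#   reached_playing /tmp/audio-classification.log && echo "pipeline reached PLAYING"
```

The check only covers the pipeline state; the classification results on the terminal and the labels on the display still need to be confirmed separately.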
How It Works
YAMNet is an audio event classification model trained on the AudioSet dataset; it predicts 521 audio event classes. Pipeline flow:
filesrc → qtdemux → (audio decode) → qtimlaudioconverter
↓
qtimltflite (inference)
↓
qtimlaclassification
↓
(classification label overlay)