
Monocular Depth Estimation

gst-ai-monodepth performs monocular depth estimation on each frame of a video stream, generating a depth map rendered as a heatmap overlay. Warm colors (red/orange) indicate closer distances; cool colors (blue) indicate farther distances.

The application uses the MiDaS V2 model.

Prerequisites

Steps

1. Verify Model and Labels

radxa@airbox$
ls -l /etc/models/midas_quantized.tflite
ls -l /etc/labels/monodepth.json
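
The same check can be scripted, for example when the demo is launched from a larger script. The following is a minimal sketch in Python; the helper name `missing_files` is hypothetical, and only the two paths come from this page.

```python
from pathlib import Path

# Paths taken from the step above.
REQUIRED_FILES = [
    "/etc/models/midas_quantized.tflite",
    "/etc/labels/monodepth.json",
]


def missing_files(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [p for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    missing = missing_files(REQUIRED_FILES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Model and labels files are both present.")
```

An empty result means both files are in place and the next step can proceed.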

2. View Configuration

radxa@airbox$
cat /etc/configs/config_monodepth.json

Key fields:

Field          Default                               Description
file-path      /etc/media/video.mp4                  Input video path
ml-framework   tflite                                Inference framework
model          /etc/models/midas_quantized.tflite    Model file
labels         /etc/labels/monodepth.json            Color mapping file
runtime        dsp                                   Inference hardware
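
To spot fields that were changed from their defaults, the config can be compared against the values listed above. This is a sketch only: it assumes the config file is a flat JSON object whose keys match the field names, which this page does not confirm, and the function names `load_config` and `diff_from_defaults` are hypothetical.

```python
import json

# Defaults as listed in the key fields above.
EXPECTED_DEFAULTS = {
    "file-path": "/etc/media/video.mp4",
    "ml-framework": "tflite",
    "model": "/etc/models/midas_quantized.tflite",
    "labels": "/etc/labels/monodepth.json",
    "runtime": "dsp",
}


def diff_from_defaults(config):
    """Return {field: (expected, actual)} for fields that differ or are missing."""
    return {
        key: (expected, config.get(key))
        for key, expected in EXPECTED_DEFAULTS.items()
        if config.get(key) != expected
    }


def load_config(path="/etc/configs/config_monodepth.json"):
    """Load the JSON config; assumes a flat object at the top level."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Calling `diff_from_defaults(load_config())` on the device returns an empty dict when every listed field still has its default value. A missing field is reported with `None` as its actual value.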

3. Run

radxa@airbox$
gst-ai-monodepth --config-file=/etc/configs/config_monodepth.json

Press Ctrl + C to stop.

Expected Output

Terminal output:

Running app with model: /etc/models/midas_quantized.tflite and labels: /etc/labels/monodepth.json
Using DSP Delegate
VERBOSE: Replacing 140 out of 140 node(s) with delegate (TfLiteQnnDelegate) node, yielding 1 partitions for the whole graph.
Pipeline state changed from PAUSED to PLAYING

The display shows the test video overlaid with a depth heatmap. Warm colors indicate nearby objects; cool colors indicate distant background.

Validation

  • Using DSP Delegate: Inference is running on the DSP (the NPU)
  • Replacing 140 out of 140 node(s): All 140 operators delegated to DSP
  • Pipeline reaches PLAYING state
  • Display correctly shows depth heatmap
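
The three log-based checks above can be automated if the terminal output is captured to a string. This is a minimal sketch; the function name `check_log` is hypothetical, and the patterns are taken from the sample output on this page, so they may need adjusting if the log wording differs between releases. The display check still has to be done by eye.

```python
import re

# Matches e.g. "Replacing 140 out of 140 node(s) with delegate".
NODE_RE = re.compile(r"Replacing (\d+) out of (\d+) node\(s\) with delegate")


def check_log(log_text):
    """Return a dict with the result of each log-based validation check."""
    match = NODE_RE.search(log_text)
    # Fully delegated means the two node counts are identical.
    fully_delegated = bool(match) and match.group(1) == match.group(2)
    return {
        "dsp_delegate": "Using DSP Delegate" in log_text,
        "fully_delegated": fully_delegated,
        "playing": "PAUSED to PLAYING" in log_text,
    }


SAMPLE_LOG = """\
Running app with model: /etc/models/midas_quantized.tflite and labels: /etc/labels/monodepth.json
Using DSP Delegate
VERBOSE: Replacing 140 out of 140 node(s) with delegate (TfLiteQnnDelegate) node, yielding 1 partitions for the whole graph.
Pipeline state changed from PAUSED to PLAYING
"""

if __name__ == "__main__":
    for name, ok in check_log(SAMPLE_LOG).items():
        print(f"{name}: {'OK' if ok else 'FAILED'}")
```

Comparing the two node counts, rather than matching the literal number 140, keeps the check valid if the operator count changes with a different model build.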

How It Works

MiDaS (Monocular Depth Estimation) takes a single RGB image as input and outputs relative depth values for each pixel. The GStreamer pipeline:

filesrc → qtdemux → h264parse → v4l2h264dec → tee (split)
  ├─ video branch → qtivcomposer
  └─ inference branch:
       qtimlvconverter (preprocess)
       ↓
       qtimltflite (DSP inference)
       ↓
       post-process (depth → heatmap)
       ↓
       qtivcomposer
       ↓
       waylandsink
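
The depth-to-heatmap step can be illustrated in a few lines of Python. This is a sketch under stated assumptions, not the plugin's implementation: it assumes that larger model output values mean closer pixels, it normalizes each frame independently, and it uses a simple blue-to-red ramp instead of the color mapping defined in /etc/labels/monodepth.json. The function name `depth_to_heatmap` is hypothetical.

```python
def depth_to_heatmap(depth):
    """Map a 2-D grid of relative depth values to (R, G, B) tuples.

    Assumption: larger values mean closer. Values are min-max normalized
    per frame, so colors are only comparable within a single frame.
    """
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a uniform frame
    heatmap = []
    for row in depth:
        out_row = []
        for v in row:
            t = (v - lo) / span  # 0.0 = farthest, 1.0 = closest
            # Closest pixels become red, farthest pixels become blue.
            out_row.append((round(255 * t), 0, round(255 * (1 - t))))
        heatmap.append(out_row)
    return heatmap


if __name__ == "__main__":
    print(depth_to_heatmap([[0.0, 1.0]]))  # [[(0, 0, 255), (255, 0, 0)]]
```

Because the depth values are relative, the same object can change color from frame to frame as the nearest and farthest points in the scene change.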


    Radxa-docs © 2026 by Radxa Computer (Shenzhen) Co.,Ltd. is licensed under CC BY 4.0