Fast-SCNN

Fast-SCNN is a lightweight convolutional neural network for real-time semantic segmentation of high-resolution images. It uses a two-branch architecture in which both branches share the early feature-extraction layers. Combined with a lightweight design, this reduces the heavy compute cost that traditional segmentation models incur on large images.

Key features: Performs real-time, pixel-level semantic segmentation, assigning a class label to every pixel of a complex scene with low latency. It suits applications with strict responsiveness requirements, such as autonomous driving, mobile AR, and robot obstacle avoidance.
Version notes: This example uses Fast-SCNN. Its "learning to downsample" module, combined with global feature extraction, improves inference efficiency while preserving key spatial details. This reduces reliance on high-end GPUs and makes the model a common lightweight choice for real-time, high-resolution image understanding on embedded devices.
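To make the efficiency argument concrete, the sketch below traces feature-map sizes through the main stages. The stride values are taken from the layer table in the Fast-SCNN paper, not from this repository's code, and the function names are hypothetical.

```python
# Hypothetical walkthrough of Fast-SCNN feature-map sizes.
# Stride values follow the Fast-SCNN paper, not the code in this repository.

def downsample(size, stride):
    """Spatial size after a layer with the given stride (ceiling division)."""
    h, w = size
    return (-(-h // stride), -(-w // stride))


def fast_scnn_shapes(input_size):
    """Return the spatial size at each major stage for a given input size."""
    shapes = {"input": input_size}

    # Learning to downsample: three stride-2 convolutions, giving 1/8 resolution.
    shallow = input_size
    for _ in range(3):
        shallow = downsample(shallow, 2)
    shapes["learning_to_downsample"] = shallow

    # Global feature extractor: bottleneck stages with strides 2, 2, 1 (1/32).
    deep = shallow
    for stride in (2, 2, 1):
        deep = downsample(deep, stride)
    shapes["global_feature_extractor"] = deep

    # Feature fusion upsamples the deep features to 1/8 and adds the shallow ones.
    shapes["feature_fusion"] = shallow

    # The classifier output is upsampled back to the input resolution.
    shapes["output"] = input_size
    return shapes


if __name__ == "__main__":
    for name, (h, w) in fast_scnn_shapes((1024, 2048)).items():
        print(f"{name:26s} {h} x {w}")
```

Most of the computation runs at 1/8 resolution or lower, which is why the network stays cheap on large inputs while the shallow features retain spatial detail.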

Environment setup

You need to set up the environment in advance.

Quick start

Download model files

O6 / O6N

cd ai_model_hub_25_Q3/models/ComputeVision/Semantic_Segmentation/torch_fast_scnn
wget https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/models/ComputeVision/Semantic_Segmentation/torch_fast_scnn/fast_scnn.cix

Test the model

Note: Activate the virtual environment before running.

O6 / O6N

python3 inference_npu.py

Full conversion workflow

Project structure

├── cfg
├── datasets
├── fast_scnn.cix
├── inference_npu.py
├── inference_pt.py
├── model
├── ReadMe.md
└── test_data
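Before converting or running the model, it can help to confirm that the working directory matches this layout. The following is a minimal sketch: the entry names are copied from the tree above, the split into directories and files is an assumption based on those names, and `missing_entries` is a hypothetical helper rather than part of the project.

```python
from pathlib import Path

# Entry names come from the project tree; which ones are directories
# is an assumption based on their names.
EXPECTED_DIRS = ("cfg", "datasets", "model", "test_data")
EXPECTED_FILES = ("fast_scnn.cix", "inference_npu.py", "inference_pt.py", "ReadMe.md")


def missing_entries(root):
    """Return the expected directories and files that are absent under root."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    gaps = missing_entries(".")
    print("layout OK" if not gaps else "missing: " + ", ".join(gaps))
```

If `fast_scnn.cix` is the only entry reported missing, the model has not yet been downloaded or converted.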

Quantize and convert the model

Linux PC

cd ..
cixbuild cfg/fast_scnnbuild.cfg

Copy to device

After conversion, copy the .cix model files to the device.

Test inference on the host

Run the inference script

Linux PC

python3 inference_pt.py
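The contents of inference_pt.py are not shown here. As background, semantic segmentation models typically emit one score map per class, and the predicted label of each pixel is the class with the highest score. The sketch below illustrates that step in plain Python, assuming a `scores[class][row][column]` layout; `argmax_labels` is a hypothetical helper, and real pipelines would normally use an array library instead.

```python
def argmax_labels(scores):
    """Convert scores[c][y][x] into labels[y][x] by picking the best class."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    # Toy example: 3 classes on a 2 x 2 image (values are made up).
    scores = [
        [[0.1, 0.7], [0.2, 0.1]],  # class 0
        [[0.8, 0.2], [0.3, 0.1]],  # class 1
        [[0.1, 0.1], [0.5, 0.8]],  # class 2
    ]
    print(argmax_labels(scores))
```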

Inference output

Deploy on NPU

Run the inference script

O6 / O6N

python3 inference_npu.py
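Because the .cix model is quantized, its predictions may differ slightly from those of the host run. One way to quantify the difference is the fraction of pixels on which the two label maps agree. The sketch below assumes both results are already available as nested lists of class indices; how the scripts actually store their results is not specified here, and `pixel_agreement` is a hypothetical helper.

```python
def pixel_agreement(labels_a, labels_b):
    """Fraction of pixels on which two equally sized label maps agree."""
    same_shape = len(labels_a) == len(labels_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(labels_a, labels_b)
    )
    if not same_shape:
        raise ValueError("label maps must have the same shape")
    total = sum(len(row) for row in labels_a)
    if total == 0:
        raise ValueError("label maps are empty")
    same = sum(
        a == b
        for row_a, row_b in zip(labels_a, labels_b)
        for a, b in zip(row_a, row_b)
    )
    return same / total


if __name__ == "__main__":
    host = [[1, 0], [2, 2]]  # made-up label map from the host run
    npu = [[1, 0], [2, 1]]   # made-up label map from the NPU run
    print(f"agreement: {pixel_agreement(host, npu):.2%}")
```

What counts as an acceptable agreement level depends on the application and is not defined by this guide.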

Inference output

O6 / O6N

$ python3 inference_npu.py
npu: noe_init_context success
npu: noe_load_graph success
Input tensor count is 1.
Output tensor count is 1.
npu: noe_create_job success
npu: noe_clean_job success
npu: noe_unload_graph success
npu: noe_deinit_context success
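When the script runs unattended, the log above can be checked automatically. The following sketch scans captured output for the six stage messages shown in the log and reports any that did not succeed. The stage names come from that log; `failed_stages` is a hypothetical helper, and the sketch assumes the messages keep the exact `npu: <stage> success` form.

```python
# Stage names copied from the log above, in the order they appear.
EXPECTED_STAGES = (
    "noe_init_context",
    "noe_load_graph",
    "noe_create_job",
    "noe_clean_job",
    "noe_unload_graph",
    "noe_deinit_context",
)


def failed_stages(log_text):
    """Return the expected stages that never reported success, in order."""
    succeeded = set()
    for line in log_text.splitlines():
        parts = line.split()
        # Match lines of the form: "npu: <stage> success"
        if len(parts) == 3 and parts[0] == "npu:" and parts[2] == "success":
            succeeded.add(parts[1])
    return [stage for stage in EXPECTED_STAGES if stage not in succeeded]


if __name__ == "__main__":
    sample = "npu: noe_init_context success\nnpu: noe_load_graph success\n"
    print("stages without a success message:", failed_stages(sample))
```

The log text could be captured with `subprocess.run(["python3", "inference_npu.py"], capture_output=True, text=True)`. Whether the messages go to stdout or stderr is not stated here, so checking both streams is the safer choice.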
