Stable Diffusion

Stable Diffusion is a text-to-image model based on latent diffusion. Because denoising runs in a compressed, low-dimensional latent space rather than directly on pixels, it needs far less compute than pixel-space diffusion and can generate high-quality, artistic images on consumer-grade GPUs.

Key features: Supports text-to-image generation, image-to-image generation (re-rendering an existing image under the guidance of a prompt), and inpainting, all driven by natural-language prompts.
Version notes: This example uses Stable Diffusion v1.4, the first widely adopted release in the series. It was trained on hundreds of millions of image-text pairs, follows prompts closely, and produces aesthetically strong output. It balances output quality against VRAM usage and remains one of the most widely supported models in the generative AI ecosystem.
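
To make the latent-space saving concrete, the sketch below compares the number of values in an image with the number in its latent. The figures are assumptions based on the usual Stable Diffusion v1.x defaults (512x512 RGB input, a VAE that downsamples by a factor of 8, and 4 latent channels); the models converted in this example may use different sizes.

```python
# Assumed Stable Diffusion v1.x defaults, not values read from this project:
# 512x512 RGB images, VAE downsampling factor 8, 4 latent channels.
IMAGE_H, IMAGE_W, IMAGE_C = 512, 512, 3
VAE_FACTOR = 8
LATENT_C = 4


def latent_shape(height, width, factor=VAE_FACTOR, channels=LATENT_C):
    """Return the (channels, height, width) shape of the latent tensor."""
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by the VAE factor")
    return (channels, height // factor, width // factor)


def element_count(shape):
    """Multiply all dimensions of a shape together."""
    total = 1
    for dim in shape:
        total *= dim
    return total


pixels = element_count((IMAGE_C, IMAGE_H, IMAGE_W))
latents = element_count(latent_shape(IMAGE_H, IMAGE_W))
print("latent shape:", latent_shape(IMAGE_H, IMAGE_W))
print("values per image:", pixels, "values per latent:", latents)
print("reduction factor:", pixels / latents)
```

Under these assumptions the U-Net processes 48 times fewer values per denoising step than a pixel-space model would.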

Environment setup

Set up the environment before you begin.

Quick start

Download model files

O6 / O6N

cd ai_model_hub_25_Q3/models/Generative_AI/Text_to_Image/onnx_stable_diffusion_v1_4
wget https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/models/Generative_AI/Text_to_Image/onnx_stable_diffusion_v1_4/decoder.cix
wget https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/models/Generative_AI/Text_to_Image/onnx_stable_diffusion_v1_4/default_seed.npy
wget https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/models/Generative_AI/Text_to_Image/onnx_stable_diffusion_v1_4/encoder.cix
wget https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/models/Generative_AI/Text_to_Image/onnx_stable_diffusion_v1_4/uncondition.npy
wget https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/models/Generative_AI/Text_to_Image/onnx_stable_diffusion_v1_4/unet.cix
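
If you rerun the download step often, a small helper can skip files that are already present. The following is a minimal sketch rather than part of the model hub: the function names `build_url` and `fetch_missing` are hypothetical, and only the base URL and file names come from the commands above.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL and file names are taken from the wget commands above.
BASE_URL = (
    "https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/"
    "models/Generative_AI/Text_to_Image/onnx_stable_diffusion_v1_4"
)
RUNTIME_FILES = [
    "decoder.cix",
    "default_seed.npy",
    "encoder.cix",
    "uncondition.npy",
    "unet.cix",
]


def build_url(filename):
    """Return the download URL for one file (hypothetical helper)."""
    return f"{BASE_URL}/{filename}"


def fetch_missing(target_dir, filenames=RUNTIME_FILES, download=urlretrieve):
    """Download files that are not yet in target_dir; return the names fetched."""
    target = Path(target_dir)
    fetched = []
    for name in filenames:
        dest = target / name
        if dest.exists():
            continue  # already downloaded, skip it
        download(build_url(name), str(dest))
        fetched.append(name)
    return fetched


# Listing the URLs does not touch the network.
for name in RUNTIME_FILES:
    print(build_url(name))
```

Calling `fetch_missing(".")` from the model directory would then download only the files that are still missing.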

Test the model

Note: Activate the virtual environment before running the script.

O6 / O6N

python3 inference_npu.py
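
The script prompts for text interactively. For scripted runs, the prompt can probably be supplied on standard input. The sketch below assumes the script reads its prompt from stdin (as the `please input prompt text:` line in the logs suggests); `run_with_prompt` is a hypothetical helper, not part of the project.

```python
import subprocess
import sys


def run_with_prompt(script, prompt, python=sys.executable):
    """Run a script and feed it one prompt line on stdin; return its stdout.

    Assumes the script reads the prompt from standard input.
    """
    result = subprocess.run(
        [python, script],
        input=prompt + "\n",   # one line, as if typed at the prompt
        text=True,
        capture_output=True,
        check=True,            # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


# Example (run from the model directory on the device):
# print(run_with_prompt("inference_npu.py", "a lighthouse at dusk, oil painting"))
```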

Full conversion workflow

Download model files

Linux PC

cd ai_model_hub_25_Q3/models/Generative_AI/Text_to_Image/onnx_stable_diffusion_v1_4/model/decoder
wget https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/models/Generative_AI/Text_to_Image/onnx_stable_diffusion_v1_4/model/decoder/decoder.onnx
cd ../encoder
wget https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/models/Generative_AI/Text_to_Image/onnx_stable_diffusion_v1_4/model/encoder/encoder.onnx
cd ../unet
wget https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/models/Generative_AI/Text_to_Image/onnx_stable_diffusion_v1_4/model/unet/unet.onnx
wget https://www.modelscope.cn/models/cix/ai_model_hub_25_Q3/resolve/master/models/Generative_AI/Text_to_Image/onnx_stable_diffusion_v1_4/model/unet/weights.pb
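
Before converting, it is worth confirming that every ONNX file landed in the expected subdirectory. The sketch below uses the paths from the commands above; it assumes that `weights.pb` holds the U-Net weights stored outside `unet.onnx`, which is why it should stay in the same directory. `missing_files` is a hypothetical helper.

```python
from pathlib import Path

# Paths relative to the project root, taken from the download commands above.
ONNX_FILES = [
    "model/decoder/decoder.onnx",
    "model/encoder/encoder.onnx",
    "model/unet/unet.onnx",
    "model/unet/weights.pb",  # assumed to be external weights for unet.onnx
]


def missing_files(project_root, relative_paths=ONNX_FILES):
    """Return the relative paths that do not exist under project_root."""
    root = Path(project_root)
    return [p for p in relative_paths if not (root / p).is_file()]


missing = missing_files(".")
if missing:
    print("missing files:", ", ".join(missing))
else:
    print("all ONNX files are present")
```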

Project structure

├── cfg
├── datasets
├── decoder.cix
├── default_seed.npy
├── encoder.cix
├── inference_npu.py
├── inference_onnx.py
├── model
├── ReadMe.md
├── tokenizer
├── uncondition.npy
└── unet.cix

Quantize and convert the model

Convert the text encoder

Linux PC

cd ../..
cixbuild cfg/encoder/encoderbuild.cfg

Convert the U-Net network

Linux PC

cixbuild cfg/unet/unetbuild.cfg

Convert the VAE decoder

Linux PC

cixbuild cfg/decoder/decoderbuild.cfg
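
The three conversions can also be run in sequence from a script that stops at the first failure. This is a minimal sketch that wraps the `cixbuild` commands shown above; `build_commands` and `run_all` are hypothetical names, and the script assumes it is started from the project root with `cixbuild` on the PATH.

```python
import subprocess

# Config paths are taken from the cixbuild commands above.
BUILD_CONFIGS = [
    "cfg/encoder/encoderbuild.cfg",
    "cfg/unet/unetbuild.cfg",
    "cfg/decoder/decoderbuild.cfg",
]


def build_commands(configs=BUILD_CONFIGS, tool="cixbuild"):
    """Return one argv list per config, in conversion order."""
    return [[tool, cfg] for cfg in configs]


def run_all(commands, runner=subprocess.run):
    """Run each command in order; check=True aborts on the first failure."""
    for cmd in commands:
        print("running:", " ".join(cmd))
        runner(cmd, check=True)


# Example (from the project root on the Linux PC):
# run_all(build_commands())
```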

Copy to device

After conversion, copy the .cix model files to the device.
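
One way to do the copy is with `scp`. The sketch below only builds the command line so you can inspect it first. It assumes the converted `.cix` files sit in the project root (as in the project structure above); the remote user, host, and directory in the example are placeholders.

```python
from pathlib import Path


def scp_command(project_root, remote, pattern="*.cix"):
    """Build an scp argv that copies all matching files to `remote`.

    `remote` uses the usual scp form, e.g. "user@host:/path"; the values
    used in the example below are placeholders, not real device settings.
    """
    files = sorted(str(p) for p in Path(project_root).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern} in {project_root}")
    return ["scp", *files, remote]


# Example (placeholder user, host, and target directory):
# import subprocess
# subprocess.run(scp_command(".", "user@device:/home/user/sd_v1_4/"), check=True)
```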

Test inference on the host

Run the inference script

Linux PC

python3 inference_onnx.py

Inference output

Linux PC

$ python3 inference_onnx.py
please input prompt text: majestic crystal mountains under aurora borealis, fantasy landscape, trending on artstation
using unified predictor-corrector with order 1 (solver type: B(h))
using corrector
using unified predictor-corrector with order 2 (solver type: B(h))
using corrector
using unified predictor-corrector with order 2 (solver type: B(h))
using corrector
using unified predictor-corrector with order 2 (solver type: B(h))
using corrector
using unified predictor-corrector with order 2 (solver type: B(h))
using corrector
using unified predictor-corrector with order 2 (solver type: B(h))
using corrector
using unified predictor-corrector with order 2 (solver type: B(h))
using corrector
do not run corrector at the last step
using unified predictor-corrector with order 1 (solver type: B(h))
Decoder:
SD time : 56.92895817756653
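
The log above shows eight solver steps (order 1 for the first and last, order 2 in between) followed by the total time in seconds. If you want to track these numbers across runs, a small parser is enough. The sketch below relies only on the line formats visible in the log; `summarize_log` is a hypothetical helper and the sample is an abridged excerpt.

```python
import re

# Abridged excerpt of the log above, used as sample input.
SAMPLE_LOG = """\
using unified predictor-corrector with order 1 (solver type: B(h))
using corrector
using unified predictor-corrector with order 2 (solver type: B(h))
using corrector
do not run corrector at the last step
using unified predictor-corrector with order 1 (solver type: B(h))
Decoder:
SD time : 56.92895817756653
"""


def summarize_log(text):
    """Extract the solver order of each step and the reported total time."""
    orders = [int(n) for n in re.findall(r"predictor-corrector with order (\d+)", text)]
    match = re.search(r"SD time\s*:\s*([0-9.]+)", text)
    return {
        "steps": len(orders),
        "orders": orders,
        "seconds": float(match.group(1)) if match else None,
    }


print(summarize_log(SAMPLE_LOG))
```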

Generated image

Deploy on NPU

Run the inference script

O6 / O6N

python3 inference_npu.py

Runtime output

O6 / O6N

$ python3 inference_npu.py
please input prompt text: a single wilting rose on a marble table, cinematic lighting, moody atmosphere
npu: noe_init_context success
npu: noe_load_graph success
Input tensor count is 1.
Output tensor count is 1.
npu: noe_create_job success
npu: noe_clean_job success
npu: noe_unload_graph success
npu: noe_deinit_context success
npu: noe_init_context success
npu: noe_load_graph success
Input tensor count is 3.
Output tensor count is 1.
npu: noe_create_job success
npu: noe_clean_job success
using unified predictor-corrector with order 1 (solver type: B(h))
using corrector
npu: noe_create_job success
npu: noe_clean_job success
using unified predictor-corrector with order 2 (solver type: B(h))
using corrector
npu: noe_create_job success
npu: noe_clean_job success
using unified predictor-corrector with order 2 (solver type: B(h))
using corrector
npu: noe_create_job success
npu: noe_clean_job success
using unified predictor-corrector with order 2 (solver type: B(h))
using corrector
npu: noe_create_job success
npu: noe_clean_job success
using unified predictor-corrector with order 2 (solver type: B(h))
using corrector
npu: noe_create_job success
npu: noe_clean_job success
using unified predictor-corrector with order 2 (solver type: B(h))
using corrector
npu: noe_create_job success
npu: noe_clean_job success
using unified predictor-corrector with order 2 (solver type: B(h))
using corrector
npu: noe_create_job success
npu: noe_clean_job success
npu: noe_unload_graph success
npu: noe_deinit_context success
do not run corrector at the last step
using unified predictor-corrector with order 1 (solver type: B(h))
Decoder:
npu: noe_init_context success
npu: noe_load_graph success
Input tensor count is 1.
Output tensor count is 1.
npu: noe_create_job success
npu: noe_clean_job success
npu: noe_unload_graph success
npu: noe_deinit_context success
SD time : 20.26415753364563
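
For a rough comparison, the two `SD time` values reported in this document can be divided. Treat the result as indicative only: the host and NPU runs used different prompts and different hardware, and each figure comes from a single run.

```python
# "SD time" values copied from the two logs in this document (seconds).
HOST_SECONDS = 56.92895817756653   # inference_onnx.py on the Linux PC
NPU_SECONDS = 20.26415753364563    # inference_npu.py on O6 / O6N


def speedup(baseline, candidate):
    """Return how many times faster `candidate` is than `baseline`."""
    if candidate <= 0:
        raise ValueError("candidate time must be positive")
    return baseline / candidate


print(f"NPU run is about {speedup(HOST_SECONDS, NPU_SECONDS):.2f}x faster")
```

With these two measurements the NPU run finishes in roughly a third of the host time.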
