AIMET Quantization Tool

AIMET (AI Model Efficiency Toolkit) is a quantization tool for deep learning models built with frameworks such as PyTorch and ONNX. AIMET improves the runtime performance of deep learning models by reducing their computational load and memory usage.

With AIMET, developers can quickly iterate to find the optimal quantization configuration, achieving the best balance between accuracy and latency. Developers can compile and deploy quantized models exported from AIMET on Qualcomm NPUs using QAIRT, or run them directly with ONNX-Runtime.

AIMET helps developers with:

  • Quantization simulation
  • Model quantization using Post-Training Quantization (PTQ) techniques
  • Quantization-Aware Training (QAT) on PyTorch models using AIMET-Torch
  • Visualizing and experimenting with how quantizing activations and weights at different precisions affects model accuracy
  • Creating mixed-precision models
  • Exporting quantized models to deployable ONNX format

(Figure: AIMET Overview)

AIMET System Requirements

  • 64-bit x86 processor
  • Ubuntu 22.04
  • Python 3.10
  • Nvidia GPU
  • Nvidia driver version 455 or higher
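
You can check the installed driver version with nvidia-smi, which prints it in the header of its output:

X86 Linux PC
nvidia-smi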

AIMET Installation

Create Python Environment

AIMET requires a Python 3.10 environment, which can be created using Anaconda.

tip

After installing Anaconda, create and activate a Python 3.10 environment using the terminal:

X86 Linux PC
conda create -n aimet python=3.10
conda activate aimet

Install AIMET

AIMET provides two Python packages:

  • AIMET-ONNX: Quantizes ONNX models using PTQ techniques

    X86 Linux PC
    pip3 install aimet-onnx
  • AIMET-Torch: Performs QAT on PyTorch models

    X86 Linux PC
    pip3 install aimet-torch
  • Install jupyter-notebook

    AIMET examples are provided as Jupyter notebooks, so you also need to register a Jupyter kernel for the aimet Python environment.

    X86 Linux PC
    pip3 install jupyter ipykernel
    python3 -m ipykernel install --user --name=aimet
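
If you installed both packages, a quick import check from the aimet environment confirms they are usable (the import names are aimet_onnx and aimet_torch):

X86 Linux PC
python3 -c "import aimet_onnx; print('aimet-onnx OK')"
python3 -c "import aimet_torch; print('aimet-torch OK')"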

AIMET Usage Example

This example demonstrates PTQ (Post-Training Quantization) on the PyTorch ResNet50 image classification model, which is first converted to ONNX format and then quantized using AIMET-ONNX. For implementation details, please refer to the ResNet50 example notebook.

tip

The model exported in this example can be used in the QAIRT SDK Usage Example to port an AIMET-quantized model to the NPU.
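
For orientation, below is a condensed sketch of the flow the notebook implements: export the PyTorch model to ONNX, wrap it in a quantization simulator, calibrate, and export. The exact QuantizationSimModel and compute_encodings signatures vary across aimet-onnx releases, and the random calibration tensors here are stand-ins for real ImageNet-Mini batches, so treat this as a sketch rather than the notebook's exact code:

import os
import onnx
import torch
from torchvision.models import resnet50, ResNet50_Weights
from aimet_onnx.quantsim import QuantizationSimModel

# 1. Export the pretrained FP32 model to ONNX
model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "resnet50_fp32.onnx",
                  input_names=["input"], output_names=["output"])

# 2. Wrap the ONNX model in a quantization simulator (8-bit by default)
sim = QuantizationSimModel(onnx.load("resnet50_fp32.onnx"))

# 3. Calibrate: run representative data through the simulator's
#    onnxruntime session so AIMET can compute encodings per tensor.
#    Replace the random tensors with real ImageNet-Mini batches.
calibration_batches = [torch.randn(8, 3, 224, 224) for _ in range(4)]

def pass_calibration_data(session, _args):
    for images in calibration_batches:
        session.run(None, {"input": images.numpy()})

sim.compute_encodings(pass_calibration_data, None)

# 4. Export resnet50.onnx + resnet50.encodings for QAIRT / ONNX-Runtime
os.makedirs("./aimet_quant", exist_ok=True)
sim.export("./aimet_quant", "resnet50")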

Prepare the Example Notebook

Clone the AIMET Repository

X86 Linux PC
git clone https://github.com/quic/aimet.git && cd aimet

Configure PYTHONPATH

X86 Linux PC
export PYTHONPATH=$PYTHONPATH:$(pwd)

Download the Example Notebook

X86 Linux PC
cd Examples/onnx/quantization
wget https://github.com/ZIFENG278/resnet50_qairt_example/raw/refs/heads/main/notebook/quantsim-resnet50.ipynb

Download the Dataset

Prepare a calibration dataset. To reduce download time, we'll use ImageNet-Mini as a substitute for the full ImageNet dataset.

  • Download the ImageNet-Mini dataset from Kaggle
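
A minimal calibration-loader sketch, assuming the Kaggle archive extracts to ImageFolder-style <split>/<class>/<image> directories (point DATASET_DIR at one split, e.g. train); the normalization constants are the standard ImageNet values used by torchvision's pretrained ResNet50:

import torch
from torchvision import datasets, transforms

DATASET_DIR = '<ImageNet-Mini Path>'  # Please replace this with a real directory

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

calib_set = datasets.ImageFolder(DATASET_DIR, transform=preprocess)
calib_loader = torch.utils.data.DataLoader(calib_set, batch_size=32, shuffle=True)

A few hundred images are typically enough for PTQ calibration, which is why the mini dataset is sufficient here.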

Run the Example Notebook

Start jupyter-notebook

X86 Linux PC
cd aimet
jupyter-notebook
tip

After jupyter-notebook starts, it should open automatically in your default browser. If it doesn't, click the URL printed in the terminal.

Change the Kernel

On the jupyter-notebook homepage, open Examples/onnx/quantization/quantsim-resnet50.ipynb

In the notebook's menu bar at the top left, select Kernel -> Change Kernel -> Select Kernel and choose the aimet kernel created during the AIMET installation.

(Figure: Change Notebook Kernel)

Update Dataset Path

Modify the DATASET_DIR path in the Dataset section to point to your downloaded ImageNet-Mini dataset folder.

DATASET_DIR = '<ImageNet-Mini Path>'  # Please replace this with a real directory

Run the Entire Notebook

In the notebook's menu bar at the top left, select Run -> Run All Cells to execute the entire notebook.

(Figure: Run All Cells)

The exported ResNet50 model files are saved in the aimet_quant folder: resnet50.onnx and resnet50.encodings.
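
Since resnet50.onnx remains a standard ONNX graph (the quantization parameters live in resnet50.encodings), a quick onnxruntime sanity check confirms the exported model loads and runs; the random input below is illustrative only:

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("aimet_quant/resnet50.onnx")
input_name = sess.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
logits = sess.run(None, {input_name: dummy})[0]
print(logits.shape)  # expect (1, 1000) ImageNet class scores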

Deploying AIMET Models

AIMET exports models from different frameworks into the specified file formats as shown in the table below:

Framework     Exported Format
PyTorch       .onnx
ONNX          .onnx
TensorFlow    .h5 or .pb

You can use the QAIRT tool to deploy the quantized output files from AIMET to edge devices. For the deployment process, please refer to the QAIRT SDK Usage Example.

Complete Documentation

For more detailed documentation about AIMET, please refer to the official AIMET documentation.

More Examples

For more AIMET examples, please refer to the Examples directory in the AIMET repository.