ERNIE 4.5-21B-A3B

This document explains how to run two Baidu ERNIE models, ERNIE-4.5-21B-A3B and ERNIE-4.5-21B-A3B-Base, on the Radxa Orion O6 / O6N using llama.cpp with KleidiAI acceleration.

Model links:

Download the model

Radxa provides pre-built GGUF files: ERNIE-4.5-21B-A3B-PT-Q4_0.gguf and ERNIE-4.5-21B-A3B-Base-PT-Q4_0.gguf. You can download them with modelscope:

Device
pip3 install modelscope
modelscope download --model radxa/ERNIE-4.5-GGUF ERNIE-4.5-21B-A3B-PT-Q4_0.gguf --local_dir ./ERNIE-4.5-21B-A3B-PT-Q4_0.gguf
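
Before moving on, it can help to confirm that the download is really a GGUF file. The sketch below relies on the fact that GGUF files begin with the four ASCII bytes `GGUF`. The `is_gguf` helper is hypothetical (it is not part of modelscope or llama.cpp), and the `MODEL` path assumes that modelscope places the file inside the `--local_dir` directory used above; adjust it if your layout differs.

```shell
#!/bin/sh
# Sanity-check a downloaded file by looking at its first four bytes.
# GGUF files start with the ASCII magic "GGUF"; anything else suggests
# a truncated or wrong download.
is_gguf() {
    # $1: path to the file to check
    [ -f "$1" ] || return 1
    magic=$(head -c 4 "$1")
    [ "$magic" = "GGUF" ]
}

# Assumed location: the file sits inside the --local_dir given to modelscope.
MODEL="./ERNIE-4.5-21B-A3B-PT-Q4_0.gguf/ERNIE-4.5-21B-A3B-PT-Q4_0.gguf"
if is_gguf "$MODEL"; then
    echo "looks like a GGUF file: $MODEL"
else
    echo "not a GGUF file (or missing): $MODEL"
fi
```

This only checks the header, so it will not catch a file that was cut off partway through; comparing the file size against the one listed on the model page is a reasonable second check.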

Convert the model (optional)

tip

If you want to convert the model to GGUF yourself, follow this section on an x86 host.

Otherwise, download the pre-built GGUF from Radxa and skip to Inference.

Build llama.cpp

Build llama.cpp on an x86 host.

tip

Follow the llama.cpp build documentation to build llama.cpp on an x86 host.

Build command:

X86 PC
sudo apt install cmake gcc g++
git clone https://github.com/ggml-org/llama.cpp.git && cd llama.cpp
cmake -B build
cmake --build build --config Release

Download the source model

Use modelscope to download the original model:

X86 PC
pip3 install modelscope
modelscope download --model PaddlePaddle/ERNIE-4.5-21B-A3B-PT --local_dir ./ERNIE-4.5-21B-A3B-PT

Convert to a float (F16) GGUF

X86 PC
cd llama.cpp
python3 convert_hf_to_gguf.py ./ERNIE-4.5-21B-A3B-PT

Running convert_hf_to_gguf.py generates an F16 (float) GGUF file in the model directory.

Quantize the GGUF

Use llama-quantize to quantize the float GGUF to Q4_0:

X86 PC
cd llama.cpp
./build/bin/llama-quantize ERNIE-4.5-21B-A3B-PT/ERNIE-4.5-21B-A3B-PT-F16.gguf ERNIE-4.5-21B-A3B-PT/ERNIE-4.5-21B-A3B-PT-Q4_0.gguf Q4_0

Running llama-quantize generates a GGUF file with the selected quantization format in the target path.
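
As a rough cross-check on the quantized file, its size can be estimated from the parameter count. Q4_0 stores weights in blocks of 32, each holding 16 bytes of 4-bit values plus a 2-byte fp16 scale, which works out to 4.5 bits per weight. The sketch below applies that to the 21.83 B parameter count that llama-bench reports for this model; `q4_0_size_gib` is a hypothetical helper name, and the estimate assumes every tensor is stored as Q4_0.

```shell
#!/bin/sh
# Estimate the size of a Q4_0 GGUF from the parameter count.
# Q4_0 block: 32 weights -> 16 bytes of 4-bit values + 2-byte fp16 scale
# = 18 bytes per 32 weights (4.5 bits per weight).
q4_0_size_gib() {
    # $1: parameter count in billions (e.g. 21.83)
    awk -v p="$1" 'BEGIN { printf "%.2f\n", p * 1e9 * 18 / 32 / 1073741824 }'
}

q4_0_size_gib 21.83   # prints 11.44
```

The estimate of about 11.44 GiB is close to the 11.51 GiB that llama-bench reports. The small gap is plausibly due to some tensors being kept at higher precision and to file metadata, though the source does not break this down.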

Inference

Build llama.cpp

tip

Follow the llama.cpp build documentation to build llama.cpp with KleidiAI enabled on the Radxa Orion O6 / O6N.

Build command:

Device
sudo apt install cmake gcc g++
git clone https://github.com/ggml-org/llama.cpp.git && cd llama.cpp
cmake -B build -DGGML_NATIVE=OFF -DGGML_CPU_ARM_ARCH=armv9-a+i8mm+dotprod -DGGML_CPU_KLEIDIAI=ON
cmake --build build --config Release
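
The `-DGGML_CPU_ARM_ARCH=armv9-a+i8mm+dotprod` setting assumes the CPU supports the int8 matrix multiply and dot product extensions. A minimal sketch for confirming this on the device is shown below. It relies on arm64 Linux listing these in `/proc/cpuinfo` as `i8mm` and `asimddp`, and it reads only the first `Features` line, on the assumption that all cores report the same set. `has_feature` is a hypothetical helper.

```shell
#!/bin/sh
# Check that a CPU feature list contains the extensions named in
# -DGGML_CPU_ARM_ARCH=armv9-a+i8mm+dotprod.
# On arm64 Linux, /proc/cpuinfo reports dotprod as "asimddp" and
# int8 matrix multiply as "i8mm".
has_feature() {
    # $1: space-separated feature list, $2: feature name (whole-word match)
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# First "Features" line from /proc/cpuinfo, whitespace normalised.
# This is empty on hosts that do not expose such a line (e.g. x86).
features=$(grep -m1 '^Features' /proc/cpuinfo 2>/dev/null | cut -d: -f2 | tr -s '[:space:]' ' ')

for f in asimddp i8mm; do
    if has_feature "$features" "$f"; then
        echo "$f: present"
    else
        echo "$f: missing"
    fi
done
```

If either feature is reported as missing, the architecture string in the cmake command probably needs adjusting for that CPU.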

Run inference

Use llama-cli to chat with the model:

Device
cd llama.cpp
taskset -c 0,5,6,7,8,9,10,11 ./build/bin/llama-cli -m ERNIE-4.5-21B-A3B-PT-Q4_0.gguf -c 4096 -t 8 --conversation --jinja
(base) rock@orion-o6:~/baidu/llama.cpp/build/bin$ taskset -c 0,5,6,7,8,9,10,11 ./llama-cli -m ../../../gguf/ERNIE-4.5-21B-A3B-PT-Q4_0.gguf -c 4096 -t 8 --conversation --jinja

Loading model...


▄▄ ▄▄
██ ██
██ ██ ▀▀█▄ ███▄███▄ ▀▀█▄ ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██ ██ ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
██ ██
▀▀ ▀▀

build : b7406-4aced7a63
model : ERNIE-4.5-21B-A3B-PT-Q4_0.gguf
modalities : text

available commands:
/exit or Ctrl+C stop or exit
/regen regenerate the last response
/clear clear the chat history
/read add a text file


> What is relativity?

**Relativity** is a foundational theory in physics developed by Albert Einstein, primarily consisting of two parts: **special relativity** (1905) and **general relativity** (1915). It revolutionized our understanding of space, time, and gravity, challenging classical Newtonian physics.

### **1. Special Relativity**
- **Key Idea**: Physics laws are the same for all non-accelerating observers, regardless of their motion.
- **Postulates**:
1. **Principle of Relativity**: Physical laws are identical in all inertial frames.
2. **Speed of Light**: The speed of light in a vacuum (*c* ≈ 299,792 km/s) is constant and does not depend on the motion of the light source or observer.
- **Consequences**:
- **Time Dilation**: Time slows down for objects moving at relativistic speeds (close to *c*). For example, a clock on a fast-moving train ticks slower than one on Earth.
- **Length Contraction**: Objects appear shorter along the direction of motion when moving at high speeds.
- **Mass-Energy Equivalence**: *E = mc²*—energy (*E*) and mass (*m*) are interchangeable, explaining nuclear reactions.
- **Relativistic Momentum**: Momentum depends on velocity, not just speed.

### **2. General Relativity**
- **Key Idea**: Gravity is not a force but the curvature of spacetime caused by mass and energy.
- **Postulates**:
- **Equivalence Principle**: A local inertial frame (free-falling) is indistinguishable from one without gravity.
- **Spacetime Curvature**: Massive objects like planets warp spacetime, causing objects to follow curved paths (e.g., orbits).
- **Consequences**:
- **Gravitational Time Dilation**: Clocks run slower in stronger gravitational fields (e.g., near Earth’s surface vs. orbit).
- **Light Bending**: Light curves around massive objects due to spacetime curvature (confirmed by Eddington’s 1919 eclipse experiment).
- **Black Holes**: Extreme curvature traps light and matter, creating regions where nothing escapes.
- **Expanding Universe**: General relativity explains the universe’s expansion, leading to the Big Bang theory.

### **Applications and Impact**
- **Technology**: GPS systems rely on corrections for both special relativity (time dilation) and general relativity (gravity’s effect on time).
- **Cosmology**: Predicts black holes, neutron stars, and the universe’s evolution.
- **Fundamental Physics**: Unifies with quantum mechanics in attempts to explain the universe’s origin (e.g., string theory, loop quantum gravity).

### **Why It Matters**
Relativity reshaped modern physics by showing that space, time, and gravity are interconnected. It replaced Newton’s absolute space and time with a dynamic, relative framework, providing a more accurate description of the cosmos at both microscopic and cosmic scales.

In short, relativity is the science of how space, time, and energy influence each other, reshaping our understanding of reality.

[ Prompt: 18.6 t/s | Generation: 7.0 t/s ]
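
The `taskset -c 0,5,6,7,8,9,10,11` core list used above is given without explanation; presumably it pins the eight threads to the faster cores, but that is an assumption. To see how cores on your own board compare, the sketch below lists each core's maximum frequency from the standard Linux cpufreq sysfs files, fastest first. `list_cores_by_freq` is a hypothetical helper, and maximum frequency is only a rough proxy for core type.

```shell
#!/bin/sh
# Print "cpu_index max_freq_khz" for every core that exposes cpufreq data,
# fastest first, to help choose a core list for taskset.
list_cores_by_freq() {
    # $1: sysfs CPU root (defaults to the standard Linux location)
    root=${1:-/sys/devices/system/cpu}
    for d in "$root"/cpu[0-9]*; do
        f="$d/cpufreq/cpuinfo_max_freq"
        [ -r "$f" ] || continue
        # ${d##*cpu} strips everything up to the last "cpu", leaving the index
        printf '%s %s\n' "${d##*cpu}" "$(cat "$f")"
    done | sort -k2,2nr -k1,1n
}

list_cores_by_freq
```

The indices at the top of the output are candidates for `taskset -c`; keep the `-t` thread count equal to the number of cores you select.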

Performance benchmarking

You can use llama-bench to benchmark the model.

Device
taskset -c 0,5,6,7,8,9,10,11 ./llama-bench -m ERNIE-4.5-21B-A3B-PT-Q4_0.gguf -p 128 -n 128 -pg 128,128 -t 8
Model: ernie4_5-moe 21B.A3B Q4_0
Size: 11.51 GiB
Params: 21.83 B
Backend: CPU
Threads: 8

n-prompt  n-gen  prefill t/s   generation t/s  prefill+generation t/s
128       128    33.96 ± 0.35  9.75 ± 0.01     15.15 ± 0.04
512       512    36.30 ± 0.11  9.67 ± 0.02     14.69 ± 0.01
1024      1024   35.25 ± 0.04  9.38 ± 0.01     13.76 ± 0.01
2048      2048   33.59 ± 0.06  8.89 ± 0.01     12.28 ± 0.01
4096      4096   30.79 ± 0.02  8.15 ± 0.02     10.21 ± 0.02
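
The command shown above covers only the 128-token case, while the results include sizes up to 4096. One way to run a comparable sweep is to repeat the command once per size, as sketched below. This is an assumption about how such a sweep could be run, not necessarily how the results above were produced. `bench_cmd` is a hypothetical helper; the script prints the commands by default and executes them only when `RUN=1` is set. Model paths containing spaces are not handled.

```shell
#!/bin/sh
# Build one llama-bench command line per prompt/generation size.
# Prints the commands by default; set RUN=1 to execute them.
bench_cmd() {
    # $1: model path, $2: token count used for both prompt and generation
    printf 'taskset -c 0,5,6,7,8,9,10,11 ./llama-bench -m %s -p %s -n %s -pg %s,%s -t 8\n' \
        "$1" "$2" "$2" "$2" "$2"
}

MODEL=${MODEL:-ERNIE-4.5-21B-A3B-PT-Q4_0.gguf}
for n in 128 512 1024 2048 4096; do
    cmd=$(bench_cmd "$MODEL" "$n")
    if [ "${RUN:-0}" = "1" ]; then
        sh -c "$cmd"
    else
        echo "$cmd"
    fi
done
```

Run it from the directory that contains `llama-bench`, for example with `RUN=1 MODEL=/path/to/model.gguf sh sweep.sh`, where `sweep.sh` is whatever name you saved the script under.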

    Radxa-docs © 2026 by Radxa Computer (Shenzhen) Co.,Ltd. is licensed under CC BY 4.0