ERNIE 4.5-0.3B
This document describes how to use llama.cpp with KleidiAI acceleration on the Radxa Orion O6 / O6N to run inference with Baidu's ERNIE models ERNIE-4.5-0.3B and ERNIE-4.5-0.3B-Base.
Model links:
Download the model
Radxa provides pre-quantized ERNIE-4.5-0.3B-PT-Q4_0.gguf
and ERNIE-4.5-0.3B-Base-PT-Q4_0.gguf models, which you can download with modelscope:
- ERNIE-4.5-0.3B-PT
- ERNIE-4.5-0.3B-Base-PT
Device
pip3 install modelscope
modelscope download --model radxa/ERNIE-4.5-GGUF ERNIE-4.5-0.3B-PT-Q4_0.gguf --local_dir .
Device
pip3 install modelscope
modelscope download --model radxa/ERNIE-4.5-GGUF ERNIE-4.5-0.3B-Base-PT-Q4_0.gguf --local_dir .
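To fetch both files in one go, here is a minimal bash sketch. It assumes, as the commands above imply, that both files sit at the top level of the radxa/ERNIE-4.5-GGUF repository. It downloads them into the current directory and checks that each one begins with the 4-byte `GGUF` magic. That check catches a download that left a directory or an error page where the model file should be. `is_gguf` and `download_radxa_ggufs` are hypothetical helper names, not part of modelscope or llama.cpp.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of modelscope or llama.cpp).

# Succeeds only if $1 is a regular file whose first 4 bytes are "GGUF".
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Downloads both radxa Q4_0 GGUF files into the current directory,
# then verifies each one with is_gguf. Returns non-zero on the first failure.
download_radxa_ggufs() {
  local f
  for f in ERNIE-4.5-0.3B-PT-Q4_0.gguf ERNIE-4.5-0.3B-Base-PT-Q4_0.gguf; do
    modelscope download --model radxa/ERNIE-4.5-GGUF "$f" --local_dir . || return 1
    if ! is_gguf "$f"; then
      echo "error: $f is missing or is not a GGUF file" >&2
      return 1
    fi
    echo "ok: $f"
  done
}
```

Save the functions to a file, `source` it, and run `download_radxa_ggufs`.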
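Model conversion
Tip
If you want to convert the model to GGUF yourself, follow this section on an x86 host.
If you do not want to convert the model, download the GGUF models provided by Radxa and skip to Model inference.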
Build llama.cpp
Build llama.cpp on the x86 host
Tip
Follow the llama.cpp build guide to compile llama.cpp on the x86 host.
The build commands are:
X86 PC
sudo apt install cmake gcc g++ git
git clone https://github.com/ggml-org/llama.cpp.git && cd llama.cpp
cmake -B build
cmake --build build --config Release
Download the model
Use modelscope to download the source models:
- ERNIE-4.5-0.3B-PT
- ERNIE-4.5-0.3B-Base-PT
X86 PC
pip3 install modelscope
modelscope download --model PaddlePaddle/ERNIE-4.5-0.3B-PT --local_dir ./ERNIE-4.5-0.3B-PT
X86 PC
pip3 install modelscope
modelscope download --model PaddlePaddle/ERNIE-4.5-0.3B-Base-PT --local_dir ./ERNIE-4.5-0.3B-Base-PT
Convert to a floating-point GGUF model
- ERNIE-4.5-0.3B-PT
- ERNIE-4.5-0.3B-Base-PT
X86 PC
cd llama.cpp
python3 convert_hf_to_gguf.py ./ERNIE-4.5-0.3B-PT
X86 PC
cd llama.cpp
python3 convert_hf_to_gguf.py ./ERNIE-4.5-0.3B-Base-PT
Running convert_hf_to_gguf.py generates an F16 floating-point GGUF model in the source model directory.
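`convert_hf_to_gguf.py` depends on Python packages listed in `requirements.txt` at the root of the llama.cpp repository, so you usually need to install them first. The bash sketch below assumes, like the commands above, that the source model directories sit inside the llama.cpp checkout. It converts each model and passes `--outtype f16` and `--outfile` explicitly. That way the output name matches the path used in the quantization step, instead of depending on the script's default naming. `f16_path` and `convert_to_f16` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; run from inside the llama.cpp checkout.

# Prints the F16 output path for a model directory, e.g.
#   ./ERNIE-4.5-0.3B-PT -> ./ERNIE-4.5-0.3B-PT/ERNIE-4.5-0.3B-PT-F16.gguf
f16_path() {
  local dir=${1%/}   # drop a trailing slash, if any
  echo "$dir/$(basename "$dir")-F16.gguf"
}

# Installs the converter's Python dependencies, then converts each
# Hugging Face model directory passed as an argument to an F16 GGUF file.
convert_to_f16() {
  pip3 install -r requirements.txt || return 1
  local dir
  for dir in "$@"; do
    python3 convert_hf_to_gguf.py "$dir" --outtype f16 --outfile "$(f16_path "$dir")" || return 1
  done
}
```

Example: `convert_to_f16 ./ERNIE-4.5-0.3B-PT ./ERNIE-4.5-0.3B-Base-PT`.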
Quantize the GGUF model
Use the llama-quantize tool to apply Q4_0 quantization to the floating-point GGUF model:
- ERNIE-4.5-0.3B-PT
- ERNIE-4.5-0.3B-Base-PT
X86 PC
cd llama.cpp
./build/bin/llama-quantize ERNIE-4.5-0.3B-PT/ERNIE-4.5-0.3B-PT-F16.gguf ERNIE-4.5-0.3B-PT/ERNIE-4.5-0.3B-PT-Q4_0.gguf Q4_0
X86 PC
cd llama.cpp
./build/bin/llama-quantize ERNIE-4.5-0.3B-Base-PT/ERNIE-4.5-0.3B-Base-PT-F16.gguf ERNIE-4.5-0.3B-Base-PT/ERNIE-4.5-0.3B-Base-PT-Q4_0.gguf Q4_0
Running llama-quantize writes a GGUF model with the specified quantization type (Q4_0 here) to the given output path.
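The two quantization commands differ only in the model name. Here is a minimal bash sketch for quantizing both at once. It assumes you run it from the llama.cpp directory and that the F16 files follow the `<dir>/<dir>-F16.gguf` naming used above. It derives each Q4_0 output path from its input and calls `llama-quantize` with the same positional arguments as above (input, output, type). `q4_0_path` and `quantize_q4_0` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; run from inside the llama.cpp checkout.

# Maps an F16 GGUF path to its Q4_0 counterpart in the same directory, e.g.
#   X/X-F16.gguf -> X/X-Q4_0.gguf
q4_0_path() {
  case $1 in
    *-F16.gguf) echo "${1%-F16.gguf}-Q4_0.gguf" ;;
    *) echo "expected a *-F16.gguf path, got: $1" >&2; return 1 ;;
  esac
}

# Quantizes every F16 GGUF path passed as an argument to Q4_0.
quantize_q4_0() {
  local src dst
  for src in "$@"; do
    dst=$(q4_0_path "$src") || return 1
    ./build/bin/llama-quantize "$src" "$dst" Q4_0 || return 1
  done
}
```

Example: `quantize_q4_0 ERNIE-4.5-0.3B-PT/ERNIE-4.5-0.3B-PT-F16.gguf ERNIE-4.5-0.3B-Base-PT/ERNIE-4.5-0.3B-Base-PT-F16.gguf`.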
Model inference
Build llama.cpp
Tip
Follow the llama.cpp build guide to compile llama.cpp with KleidiAI enabled on the Radxa Orion O6/O6N.
The build commands are:
Device
sudo apt install cmake gcc g++ git
git clone https://github.com/ggml-org/llama.cpp.git && cd llama.cpp
cmake -B build -DGGML_NATIVE=OFF -DGGML_CPU_ARM_ARCH=armv9-a+i8mm+dotprod -DGGML_CPU_KLEIDIAI=ON
cmake --build build --config Release
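The `-DGGML_CPU_ARM_ARCH=armv9-a+i8mm+dotprod` setting produces code that assumes the CPU supports the I8MM and dot-product extensions. On arm64 Linux these appear in the `Features` line of `/proc/cpuinfo` as `i8mm` and `asimddp`. The hypothetical helper below is a minimal bash sketch that checks for both flags. It takes an optional path so you can also run it against a saved copy of cpuinfo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks the arm64 Linux CPU feature flags that the
# armv9-a+i8mm+dotprod build relies on. dotprod is reported as "asimddp".
# Usage: has_kleidiai_features [cpuinfo_path]   (defaults to /proc/cpuinfo)
has_kleidiai_features() {
  local cpuinfo=${1:-/proc/cpuinfo} flag missing=0
  for flag in i8mm asimddp; do
    # -w matches the flag only as a whole word in the space-separated list,
    # so "svei8mm" does not count as "i8mm".
    if ! grep -i '^features' "$cpuinfo" | grep -qw "$flag"; then
      echo "missing CPU feature: $flag" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

On the board, run `has_kleidiai_features && echo ok`.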
Run the model
Here llama-cli is used to chat with the model:
- ERNIE-4.5-0.3B-PT
- ERNIE-4.5-0.3B-Base-PT
Device
cd llama.cpp
taskset -c 0,5,6,7,8,9,10,11 ./build/bin/llama-cli -m ERNIE-4.5-0.3B-PT-Q4_0.gguf -c 4096 -t 8 --conversation --jinja
(base) rock@orion-o6:~/baidu/llama.cpp/build/bin$ taskset -c 0,5,6,7,8,9,10,11 ./llama-cli -m ../../../gguf/ERNIE-4.5-0.3B-PT-Q4_0.gguf -c 4096 -t 8 --conversation --jinja
Loading model...
▄▄ ▄▄
██ ██
██ ██ ▀▀█▄ ███▄███▄ ▀▀█▄ ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██ ██ ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
██ ██
▀▀ ▀▀
build : b7406-4aced7a63
model : ERNIE-4.5-0.3B-PT-Q4_0.gguf
modalities : text
available commands:
/exit or Ctrl+C stop or exit
/regen regenerate the last response
/clear clear the chat history
/read add a text file
> What is relativity?
Relativity is a philosophical and scientific theory that describes how the laws of physics are relative to different reference frames. It's a way of thinking and studying phenomena that treats the motion of objects as a coordinate in a three-dimensional space of spacetime, and it explains how frames of reference can be relative to each other.
[ Prompt: 224.0 t/s | Generation: 45.9 t/s ]
Device
cd llama.cpp
taskset -c 0,5,6,7,8,9,10,11 ./build/bin/llama-cli -m ERNIE-4.5-0.3B-Base-PT-Q4_0.gguf -c 4096 -t 8 --conversation --jinja
(base) rock@orion-o6:~/baidu/llama.cpp/build/bin$ taskset -c 0,5,6,7,8,9,10,11 ./llama-cli -m ../../../gguf/ERNIE-4.5-0.3B-Base-PT-Q4_0.gguf -c 4096 -t 8 --conversation --jinja
Loading model...
▄▄ ▄▄
██ ██
██ ██ ▀▀█▄ ███▄███▄ ▀▀█▄ ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██ ██ ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
██ ██
▀▀ ▀▀
build : b7406-4aced7a63
model : ERNIE-4.5-0.3B-Base-PT-Q4_0.gguf
modalities : text
available commands:
/exit or Ctrl+C stop or exit
/regen regenerate the last response
/clear clear the chat history
/read add a text file
> What is relativity?
Relativity is the scientific theory that explains the laws of physics that govern the behavior of matter and energy in the universe. It is a theory that explains the nature of space and time, which has implications for our understanding of the physical world and the laws of nature. Relativity is a fundamental concept in physics that describes the relationship between the speed of light in a vacuum and the speed of light in a medium. It also explains the behavior of objects in general relativity, which deals with the force of gravity and the curvature of space and time in general.
[ Prompt: 365.2 t/s | Generation: 43.3 t/s ]
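The commands above pin llama-cli to CPUs 0 and 5 to 11 with `taskset` and use a matching `-t 8`. The O6/O6N SoC mixes core types, and this list presumably selects the eight faster cores. That is an assumption here, so it is worth checking on your own board. The hypothetical helper below reads each CPU's maximum frequency from the Linux cpufreq sysfs files (`cpuinfo_max_freq`). It then prints the N fastest CPUs as a `taskset -c` list, breaking ties by lower CPU id.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the N CPUs with the highest cpuinfo_max_freq
# as a comma-separated list suitable for `taskset -c`.
# Usage: fastest_cpus N [sysfs_cpu_dir]   (default /sys/devices/system/cpu)
fastest_cpus() {
  local n=$1 root=${2:-/sys/devices/system/cpu} f id
  for f in "$root"/cpu[0-9]*/cpufreq/cpuinfo_max_freq; do
    [ -r "$f" ] || continue
    id=${f#"$root"/cpu}   # leaves "<id>/cpufreq/cpuinfo_max_freq"
    id=${id%%/*}          # keep only the numeric CPU id
    echo "$(cat "$f") $id"
  done | sort -k1,1nr -k2,2n | head -n "$n" | awk '{print $2}' | sort -n | paste -sd, -
}
```

Example: `taskset -c "$(fastest_cpus 8)" ./build/bin/llama-cli -m ERNIE-4.5-0.3B-PT-Q4_0.gguf -c 4096 -t 8 --conversation --jinja`.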
Performance analysis
You can benchmark the model with the llama-bench tool:
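The tables in this section report five sizes from 128 to 4096 tokens, but each command shown passes only `-p 128 -n 128 -pg 128,128`. llama-bench accepts comma-separated lists for `-p` and `-n`, and `-pg` can be given multiple times. A sweep like the one in the tables can therefore presumably be run along the lines below. The exact command behind the published numbers is not given, and `bench_sweep_args` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints llama-bench arguments (one per line) for a sweep
# in which each size is used as the prompt length, the generation length and
# one combined prompt+generation (-pg) test.
# Usage: bench_sweep_args SIZE...
bench_sweep_args() {
  local -a sizes=("$@")
  local joined s
  joined=$(IFS=,; echo "${sizes[*]}")   # e.g. "128,512,1024"
  printf '%s\n' -p "$joined" -n "$joined"
  for s in "${sizes[@]}"; do
    printf '%s\n' -pg "$s,$s"
  done
}

# Example (assumes the current directory is llama.cpp and holds the model):
#   mapfile -t args < <(bench_sweep_args 128 512 1024 2048 4096)
#   taskset -c 0,5,6,7,8,9,10,11 ./build/bin/llama-bench \
#     -m ERNIE-4.5-0.3B-PT-Q4_0.gguf "${args[@]}" -t 8
```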
- ERNIE-4.5-0.3B-PT
- ERNIE-4.5-0.3B-Base-PT
Device
cd llama.cpp
taskset -c 0,5,6,7,8,9,10,11 ./build/bin/llama-bench -m ERNIE-4.5-0.3B-PT-Q4_0.gguf -p 128 -n 128 -pg 128,128 -t 8
| Model | ernie4_5 0.3B Q4_0 |
|---|---|
| Size | 219.68 MiB |
| params | 360.75 M |
| backend | CPU |
| threads | 8 |
| n-prompt | n-gen | prefill t/s | generation t/s | prefill+generation t/s |
|---|---|---|---|---|
| 128 | 128 | 393.12 ± 3.11 | 78.56 ± 0.89 | 130.87 ± 1.04 |
| 512 | 512 | 439.33 ± 7.26 | 77.05 ± 0.23 | 116.79 ± 0.43 |
| 1024 | 1024 | 374.82 ± 2.67 | 70.65 ± 0.22 | 90.95 ± 0.35 |
| 2048 | 2048 | 293.03 ± 1.38 | 58.21 ± 0.09 | 66.94 ± 0.10 |
| 4096 | 4096 | 206.78 ± 0.28 | 45.48 ± 0.11 | 44.76 ± 0.03 |
Device
cd llama.cpp
taskset -c 0,5,6,7,8,9,10,11 ./build/bin/llama-bench -m ERNIE-4.5-0.3B-Base-PT-Q4_0.gguf -p 128 -n 128 -pg 128,128 -t 8
| Model | ernie4_5 0.3B Base Q4_0 |
|---|---|
| Size | 219.68 MiB |
| params | 360.75 M |
| backend | CPU |
| threads | 8 |
| n-prompt | n-gen | prefill t/s | generation t/s | prefill+generation t/s |
|---|---|---|---|---|
| 128 | 128 | 405.01 ± 5.66 | 75.12 ± 0.74 | 126.65 ± 0.96 |
| 512 | 512 | 445.61 ± 6.44 | 73.82 ± 0.22 | 114.13 ± 0.14 |
| 1024 | 1024 | 384.32 ± 1.54 | 68.78 ± 0.27 | 90.95 ± 0.07 |
| 2048 | 2048 | 300.07 ± 1.51 | 57.33 ± 0.06 | 67.82 ± 0.08 |
| 4096 | 4096 | 207.03 ± 0.70 | 44.82 ± 0.13 | 44.59 ± 0.02 |
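The `prefill+generation t/s` column comes from llama-bench's `-pg` test. That test processes $n_p$ prompt tokens, then generates $n_g$ tokens, and reports the total token count divided by the total time. If you estimate the prefill and generation times from the two separate columns, the 128/128 row of the ERNIE-4.5-0.3B-PT table agrees closely with the measured 130.87 t/s:

$$
\text{pg t/s} = \frac{n_p + n_g}{t_p + t_g}, \qquad t_p \approx \frac{n_p}{\text{prefill t/s}}, \quad t_g \approx \frac{n_g}{\text{generation t/s}}
$$

$$
\frac{128 + 128}{128/393.12 + 128/78.56} \approx \frac{256}{0.326 + 1.629} \approx 131\ \text{t/s}
$$

For the larger sizes, the measured column falls well below this estimate: about 44.8 t/s measured versus roughly 75 t/s estimated for the 4096/4096 row. The likely reason is that in the `-pg` test, generation starts with the whole prompt already in context, whereas the generation column is measured from an empty context.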