ChatGLM2 Chatbot-TPU
ChatBot-TPU is an application that utilizes the Sophon SDK to port the open-source ChatGLM2 model from Tsinghua University's KEG Lab to the SG2300X chip series products. This enables hardware-accelerated inference using local TPU. The application is designed as a chatbot using Gradio, allowing users to ask real-life questions.
-
Clone the repository:
git clone https://github.com/zifeng-radxa/chatbot
-
Download the chatglm2 model. This example provides three chatglm2 models: int8-2048, int8-1024, and int4-512.
Assuming we are using the int4-512 model (quantized to int4 with a maximum token length of 512):
# chatglm-int4-512
wget https://github.com/radxa-edge/TPU-Edge-AI/releases/download/chatglm-int4-512/tar_downloader.sh
bash tar_downloader.sh
tar -xvf chatglm-int4-512.tar.gz
# chatglem-int8-1024
# wget https://github.com/radxa-edge/TPU-Edge-AI/releases/download/chatglm-int8-1024/tar_downloader.sh
# bash tar_downloader.sh
# tar -xvf chatglm-int8-1024.tar.gz
# chatglm-int8-2048
# wget https://github.com/radxa-edge/TPU-Edge-AI/releases/download/chatglm-int8-2048/tar_downloader.sh
# bash tar_downloader.sh
# tar -xvf chatglm-int8-2048.tar.gzThe resulting file structure will be as follows:
.
├── chatbot
└── chatglm-int4-512 -
Modify the
config.ini
configuration file according to the selected model:cd chatbot
vim config.ini[llm_model]
libtpuchat_path = ../chatglm-int4-512/libtpuchat.so
bmodel_path = ../chatglm-int4-512/chatglm2-6b_512_int4.bmodel
token_path = ../chatglm-int4-512/tokenizer.modelThe
config.ini
file needs to have the correct model files configured. If you want to switch to other model files, please modify the paths in the configuration file accordingly. -
Set up the environment:
It is necessary to create a virtual environment to avoid potential interference with other applications. For virtual environment usage, please refer to this guide.
python3 -m virtualenv .venv
source .venv/bin/activate -
Install dependencies:
pip3 install --upgrade pip
pip3 install -r requirements.txt -
Set environment variables:
export LD_LIBRARY_PATH=/opt/sophon/libsophon-current/lib:$LD_LIBRARY_PATH
-
Start the web service:
python3 web_demo.py
-
Access the 7860 port of the Airbox IP address in the browser.