ChatGLM2 ChatDoc-TPU
ChatDoc-TPU ports ChatGLM2, the open-source model from Tsinghua University's KEG Lab, to SG2300X-series chip products using the Sophon SDK, so that inference is hardware-accelerated on the local TPU. It is an easy-to-use document chatbot with a Streamlit user interface.
- Installation Requirements
Before installing ChatDoc-TPU, use the memory_edit tool to change the memory allocation of the current device. ChatDoc-TPU needs 12 GB of TPU memory in total. Refer to Memory Allocation Modification Tool for usage.
Recommended: NPU 7168 MB, VPU 2048 MB, VPP 3072 MB (12288 MB in total)
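As a quick sanity check on the recommended values, the sketch below adds up the three regions and converts the sum to GB. It assumes that "12GB" here means 12 x 1024 MB; the names RECOMMENDED_MB and total_gb are illustrative and not part of memory_edit.

```python
# Recommended allocation from this README, in MB.
RECOMMENDED_MB = {"NPU": 7168, "VPU": 2048, "VPP": 3072}


def total_gb(allocation_mb):
    """Return the summed allocation in GB, assuming 1 GB = 1024 MB."""
    return sum(allocation_mb.values()) / 1024


print(total_gb(RECOMMENDED_MB))  # 12.0
```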
- Clone the repository:
git clone https://github.com/zifeng-radxa/chatdoc
- Download the ChatDoc embedding files and the chatglm2-int8-2048 bmodel:
wget https://github.com/radxa-edge/TPU-Edge-AI/releases/download/chatglm-int8-2048/tar_downloader.sh
bash tar_downloader.sh
tar -xvf chatglm-int8-2048.tar.gz
cd chatdoc
# TPU version
wget https://github.com/radxa-edge/TPU-Edge-AI/releases/download/embedding/embedding_tpu.zip
# CPU version
# wget https://github.com/radxa-edge/TPU-Edge-AI/releases/download/embedding/embedding.zip
unzip embedding_tpu.zip
# CPU version
# unzip embedding.zip
- The resulting file structure will be as follows:
.
├── chatdoc
│   ├── data
│   │   ├── db
│   │   └── uploaded
│   ├── embedding
│   │   └── 1_Pooling
│   ├── embedding_tpu
│   │   ├── __pycache__
│   │   └── text2vec
│   │       ├── __pycache__
│   │       ├── model_file
│   │       ├── tokenizer_cache
│   │       └── utils
│   │           └── __pycache__
│   └── static
└── chatglm-int8-2048
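To confirm that the downloads were unpacked into the layout shown above, a small check such as the following may help. It is only a sketch: the list covers the TPU embedding directories from the tree (the CPU-only embedding directory and the __pycache__ entries are left out), and the names EXPECTED_DIRS and missing_dirs are illustrative rather than part of ChatDoc-TPU.

```python
from pathlib import Path

# Directories taken from the tree in this README (TPU embedding variant).
# __pycache__ entries are omitted because Python regenerates them.
EXPECTED_DIRS = [
    "chatdoc/data/db",
    "chatdoc/data/uploaded",
    "chatdoc/embedding_tpu/text2vec/model_file",
    "chatdoc/embedding_tpu/text2vec/tokenizer_cache",
    "chatdoc/embedding_tpu/text2vec/utils",
    "chatdoc/static",
    "chatglm-int8-2048",
]


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [d for d in expected if not (root / d).is_dir()]
```

Calling missing_dirs("..") from inside the chatdoc directory should return an empty list when everything is in place; any names it returns point to a download or extraction step that needs repeating.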
- Create a virtual environment:
Create a virtual environment so that ChatDoc-TPU's dependencies do not interfere with other applications. For virtual environment usage, refer to this guide.
python3 -m virtualenv .venv
source .venv/bin/activate
- Install dependencies:
pip3 install --upgrade pip
pip3 install -r requirements.txt
pip3 install https://github.com/radxa-edge/TPU-Edge-AI/releases/download/v0.1.0/tpu_perf-1.2.31-py3-none-manylinux2014_aarch64.whl
- Start the web service:
  - (Recommended) Start in TPU embedding mode (uses more TPU memory):
bash run_emb_tpu.sh
  - Start in CPU embedding mode (uses more system memory):
bash run.sh
- Open port 8501 at the Airbox's IP address in a browser, for example http://<Airbox IP>:8501.
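If the page does not load, it can help to check from another machine whether the port is reachable at all. The sketch below tries a plain TCP connection using Python's standard socket module. The function name is_port_open is illustrative, and any IP address passed to it is a placeholder for the actual Airbox address.

```python
import socket


def is_port_open(host, port=8501, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

For example, is_port_open("192.168.1.100") (a hypothetical address) returns False if the web service is not running or if a firewall blocks the port.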