ChatGLM2 ChatDoc-TPU

ChatDoc-TPU ports the open-source ChatGLM2 model from Tsinghua University's KEG Lab to SG2300X-series chips using the Sophon SDK, so that inference runs hardware-accelerated on the local TPU. It is an easy-to-use chatbot for your files, with a Streamlit interface for user interaction.

Installation Requirements

Before installing ChatDoc-TPU, use the memory_edit tool to adjust the memory allocation of the current device. ChatDoc-TPU requires 12 GB of TPU memory. Refer to Memory Allocation Modification Tool for usage.

Recommended: NPU 7168 MB, VPU 2048 MB, VPP 3072 MB
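As a quick sanity check, the recommended sizes add up to the 12 GB requirement. The snippet below is only an arithmetic sketch; the variable names are illustrative and it does not call memory_edit.

```shell
# Recommended sizes from the line above, in MB.
npu_mb=7168
vpu_mb=2048
vpp_mb=3072

# Sum the three regions and convert MB to GB (1 GB = 1024 MB).
total_mb=$((npu_mb + vpu_mb + vpp_mb))
total_gb=$((total_mb / 1024))

echo "Total TPU memory: ${total_mb} MB (${total_gb} GB)"
# prints: Total TPU memory: 12288 MB (12 GB)
```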

Clone the repository:

```
git clone https://github.com/zifeng-radxa/chatdoc
```

Download the ChatDoc embedding file and chatglm2-int8-2048 bmodel:

```
wget https://github.com/radxa-edge/TPU-Edge-AI/releases/download/chatglm-int8-2048/tar_downloader.sh
bash tar_downloader.sh
tar -xvf chatglm-int8-2048.tar.gz

cd chatdoc
# TPU version
wget https://github.com/radxa-edge/TPU-Edge-AI/releases/download/embedding/embedding_tpu.zip
# CPU version
# wget https://github.com/radxa-edge/TPU-Edge-AI/releases/download/embedding/embedding.zip

unzip embedding_tpu.zip
# CPU version
# unzip embedding.zip
```

The resulting file structure will be as follows:

```
.
├── chatdoc
│   ├── data
│   │   ├── db
│   │   └── uploaded
│   ├── embedding
│   │   └── 1_Pooling
│   ├── embedding_tpu
│   │   ├── __pycache__
│   │   └── text2vec
│   │       ├── __pycache__
│   │       ├── model_file
│   │       ├── tokenizer_cache
│   │       └── utils
│   │           └── __pycache__
│   └── static
└── chatglm-int8-2048
```
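
To confirm that the downloads landed in the right place, a small check along these lines may help. It is a sketch: `check_layout` is a hypothetical helper, the directory list is copied from the tree above, and whether each directory is strictly required at runtime is an assumption.

```shell
# check_layout ROOT: report which expected directories are missing under ROOT.
# Returns 0 when all are present, 1 otherwise.
check_layout() {
    local root="${1:-.}"
    local missing=0
    local dir
    for dir in \
        chatdoc/data/db \
        chatdoc/data/uploaded \
        chatdoc/embedding_tpu/text2vec/model_file \
        chatdoc/static \
        chatglm-int8-2048
    do
        if [ ! -d "${root}/${dir}" ]; then
            echo "missing: ${dir}"
            missing=1
        fi
    done
    return "${missing}"
}

# Run from the directory that contains both chatdoc and chatglm-int8-2048.
check_layout . && echo "layout OK"
```

If you use the CPU embedding instead, replace the `embedding_tpu` entry with the `embedding` directory from the tree.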

Create a virtual environment:

Create a virtual environment so that this project's Python packages do not interfere with other applications. For virtual environment usage, please refer to this guide.
```
python3 -m virtualenv .venv
source .venv/bin/activate
```
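
Before installing dependencies, it can be worth confirming that the environment is actually active in the current shell. The sketch below relies on the `VIRTUAL_ENV` variable that the `activate` script exports; `venv_is_active` is a hypothetical helper name.

```shell
# venv_is_active: succeed if a virtual environment is active in this shell.
# The activate script exports VIRTUAL_ENV; deactivate unsets it.
venv_is_active() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if venv_is_active; then
    echo "virtual environment active: ${VIRTUAL_ENV}"
else
    echo "no virtual environment active; run: source .venv/bin/activate"
fi
```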

Install dependencies:

```
pip3 install --upgrade pip
pip3 install -r requirements.txt
pip3 install https://github.com/radxa-edge/TPU-Edge-AI/releases/download/v0.1.0/tpu_perf-1.2.31-py3-none-manylinux2014_aarch64.whl
```

Start the web service:
- (Recommended) Start in TPU embedding mode (occupies more TPU memory):
```
bash run_emb_tpu.sh
```
- Start in CPU embedding mode (occupies more system memory):
```
bash run.sh
```
Once the service is running, open port 8501 at the Airbox's IP address in a browser, for example http://<Airbox-IP>:8501.
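
If you are unsure of the address, a sketch like the following prints the URL to open when run on the device. `service_url` is a hypothetical helper; it assumes a Linux system where `hostname -I` is available and that the service listens on port 8501 as stated above.

```shell
# service_url IP [PORT]: print the URL of the web service (port defaults to 8501).
service_url() {
    echo "http://${1}:${2:-8501}"
}

# `hostname -I` lists the device's IP addresses; take the first one.
# If the command is unavailable, fall back to a placeholder.
ip="$(hostname -I 2>/dev/null | awk '{print $1}')"
service_url "${ip:-<Airbox-IP>}"
```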

Application Display
