A hands-on guide to building LLM applications with a full set of open source tools: calling the open source Baichuan model in Dify

Background

In the current wave of popularity of open source large language models, a large number of developers hope to deploy an open source LLM locally, either to study LLMs themselves or to build their own applications on top of one. The author is also trying to build his own LLM application through a series of excellent projects in the open source community and locally deployed services. So what preparations are needed to deploy an open source LLM locally and build a chat application?

Preparing the local environment:

Because we need to deploy a large open source model locally, a fairly powerful local environment is required. The hardware needs an NVIDIA graphics card with high performance and large video memory, large-capacity high-speed RAM, and a large-capacity solid-state drive; the software needs the graphics card driver, CUDA, and a Python environment. This time I chose to run the Baichuan-chat-13B model as an example. My basic configuration is an i9-13900K CPU, dual RTX 3090 24GB cards, 64GB of RAM, and a 2TB solid-state drive.

A large language model (LLM):

This is the foundation on which we build LLM applications. Different LLMs have different model structures and have learned different knowledge from their pre-training data and target tasks, so AI applications built on different models will also perform differently. You can find the open source LLMs you are interested in on the popular AI community Hugging Face to try them out and compare their capabilities.

An inference service that deploys the LLM locally: the inference service loads the pre-trained LLM onto a local server and provides a model prediction interface, so that the model can be used locally for various NLP tasks without relying on cloud services. Several excellent open source projects on GitHub provide one-click deployment of inference services for popular open source LLMs; well-known, highly starred ones include LocalAI, OpenLLM, etc.

A simple and easy-to-use "LLM operating system", Dify.AI: to build a chat application on top of LLM capabilities, you might otherwise need to study the full LLM technology stack, such as the API calls of different models, vector database selection, embedding techniques, and so on. With the open source project Dify.AI, you can skip this research and quickly create AI applications based on different LLM capabilities through a visual interface. The latest version of Dify adds support for open source LLMs: all models hosted on Hugging Face and Replicate can be quickly called and switched, and it also supports local deployment, so AI applications can be built on OpenLLM and Xorbits Inference inference services.

The author will use the open source LLMOps platform Dify.AI, the open source inference service Xinference, and the open source model baichuan-chat-13B as examples to walk you step by step through building an LLM chat application in a Windows environment. Without further ado, let's get straight to work.

Environmental preparation

Conda and Python are usually already available, but this article will walk through the environment configuration from scratch!

Configure the Python environment

It is generally recommended to use conda for Python version management. First install conda following the official conda documentation [1], then use conda to create and activate a Python 3.11 environment:

conda create --name python-3-11 python=3.11
conda activate python-3-11

Install CUDA

It is recommended to install CUDA directly from the official website [2]. For Windows 11, select the matching installer version.

After following the installation wizard, open "NVIDIA Control Panel -> System Information" to confirm that CUDA is installed.
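
You can also verify the installation from the command line (assuming the installer added the CUDA toolkit to your PATH):

nvcc --version
nvidia-smi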

WSL2 preparation

It is recommended to run Dify's Docker deployment in a WSL2 environment, so install WSL2 first. Refer to the official Microsoft guidelines [3].

Step 1: Run CMD as administrator.

Step 2: Run the installation command in CMD:

wsl --install

The output lists the supported distributions:

The Windows Subsystem for Linux has been installed.

The following is a list of valid distributions that can be installed.
Please use "wsl --install -d <Distro>" to install.

NAME                                   FRIENDLY NAME
Ubuntu                                 Ubuntu
Debian                                 Debian GNU/Linux
kali-linux                             Kali Linux Rolling
Ubuntu-18.04                           Ubuntu 18.04 LTS
Ubuntu-20.04                           Ubuntu 20.04 LTS
Ubuntu-22.04                           Ubuntu 22.04 LTS
OracleLinux_7_9                        Oracle Linux 7.9
OracleLinux_8_7                        Oracle Linux 8.7
OracleLinux_9_1                        Oracle Linux 9.1
openSUSE-Leap-15.5                     openSUSE Leap 15.5
SUSE-Linux-Enterprise-Server-15-SP4    SUSE Linux Enterprise Server 15 SP4
SUSE-Linux-Enterprise-15-SP5           SUSE Linux Enterprise 15 SP5
openSUSE-Tumbleweed                    openSUSE Tumbleweed

I chose to install the default Ubuntu version:

wsl --install -d Ubuntu

After that, you can use the "wsl" command in CMD to enter Ubuntu.

Step 3: Install Docker Desktop

Go to the official Docker documentation [4] to download "Docker Desktop". When installing, be sure to check the "Use WSL 2 instead of Hyper-V" option. After the installation is complete, restart the computer, then verify the installation in CMD:

wsl -l --verbose

 NAME                   STATE           VERSION
* Ubuntu                 Running         2
  docker-desktop         Running         2
  docker-desktop-data    Running         2

You can see that Ubuntu and Docker are running in WSL, and they are confirmed to be WSL2 versions.

Step 4: Configure the proxy for WSL

Since the WSL IP address changes after each restart, we can handle the proxy configuration with a script. Change the port on line 4 to your own proxy port.

#!/bin/sh
hostip=$(cat /etc/resolv.conf | grep nameserver | awk '{ print $2 }')
wslip=$(hostname -I | awk '{print $1}')
port=7890

PROXY_HTTP="http://${hostip}:${port}"
PROXY_SOCKS5="socks5://${hostip}:${port}"

set_proxy(){
  export http_proxy="${PROXY_HTTP}"
  export HTTP_PROXY="${PROXY_HTTP}"

  export https_proxy="${PROXY_HTTP}"
  export HTTPS_PROXY="${PROXY_HTTP}"

  export ALL_PROXY="${PROXY_SOCKS5}"
  export all_proxy=${PROXY_SOCKS5}

  git config --global http.https://github.com.proxy ${PROXY_HTTP}
  git config --global https.https://github.com.proxy ${PROXY_HTTP}

  echo "Proxy has been opened."
}

unset_proxy(){
  unset http_proxy
  unset HTTP_PROXY
  unset https_proxy
  unset HTTPS_PROXY
  unset ALL_PROXY
  unset all_proxy
  git config --global --unset http.https://github.com.proxy
  git config --global --unset https.https://github.com.proxy

  echo "Proxy has been closed."
}

test_setting(){
  echo "Host IP:" ${hostip}
  echo "WSL IP:" ${wslip}
  echo "Try to connect to Google..."
  resp=$(curl -I -s --connect-timeout 5 -m 5 -w "%{http_code}" -o /dev/null www.google.com)
  if [ "${resp}" = "200" ]; then
    echo "Proxy setup succeeded!"
  else
    echo "Proxy setup failed!"
  fi
}

if [ "$1" = "set" ]
then
  set_proxy

elif [ "$1" = "unset" ]
then
  unset_proxy

elif [ "$1" = "test" ]
then
  test_setting
else
  echo "Unsupported arguments."
fi
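
Save the script as, for example, proxy.sh (the file name is just an illustration) and source it in bash so that the exported variables take effect in the current shell:

source ./proxy.sh set    # enable the proxy for this shell
source ./proxy.sh test   # check connectivity through the proxy
source ./proxy.sh unset  # disable the proxy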

Step 5: Enter Ubuntu, install conda, and configure Python

As with the earlier environment preparation, follow the official documentation to install conda and configure Python, but this time install the Linux version.
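
For example, a minimal sketch using the Miniconda installer for Linux (the download URL and file name follow Anaconda's usual naming for the latest x86_64 installer; check the official documentation for the current link):

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
conda create --name python-3-11 python=3.11
conda activate python-3-11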

Step 6: Install CUDA for WSL

Go to the official website, select the WSL-Ubuntu version, and follow the instructions to install using the command line.
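
After installation, you can confirm that the GPU is visible from inside WSL; nvidia-smi is exposed to WSL2 through the Windows driver, so no separate Linux driver should be needed:

$ nvidia-smi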

Step 7: Install PyTorch

Go to the PyTorch official website [5] and install PyTorch according to your environment.
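
For example, at the time of writing, a CUDA 11.8 build could be installed with a command along the following lines (the exact index URL depends on the CUDA version you chose, so copy the command generated by the official website), and a one-liner can confirm that PyTorch sees the GPU:

$ pip install torch --index-url https://download.pytorch.org/whl/cu118
$ python -c "import torch; print(torch.cuda.is_available())"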

This completes the environment preparation.

Deploy the inference service Xinference

According to Dify's deployment documentation [6], Xinference supports quite a few models. This time, let's choose Xinference and try baichuan-chat-13B.

Xorbits Inference (Xinference) is a powerful and versatile distributed inference framework designed to serve large language models, speech recognition models, and multimodal models, even on a laptop. It supports a variety of GGML-compatible models, such as ChatGLM, Baichuan, Whisper, Vicuna, Orca, etc. Dify can connect to a locally deployed Xinference instance to access its large language model inference and embedding capabilities.

Install Xinference

Execute the following command in WSL:

$ pip install "xinference"

The above command installs Xinference's basic dependencies for inference. Xinference also supports "ggml inference" and "PyTorch inference"; to use them, install the corresponding extras:

$ pip install "xinference[ggml]"
$ pip install "xinference[pytorch]"
$ pip install "xinference[all]"

Start Xinference and download and deploy the baichuan-chat-13B model

Execute the following command in WSL:

$ xinference -H 0.0.0.0

Xinference starts a worker locally by default, with the endpoint http://127.0.0.1:9997 and the default port 9997. By default it is only accessible from the local machine; configuring "-H 0.0.0.0" allows non-local clients to access it as well. To further modify the host or port, check the help information: "xinference --help".

2023-08-25 18:08:31,204 xinference   27505 INFO     Xinference successfully started. Endpoint: http://0.0.0.0:9997
2023-08-25 18:08:31,204 xinference.core.supervisor 27505 INFO     Worker 0.0.0.0:53860 has been added successfully
2023-08-25 18:08:31,205 xinference.deploy.worker 27505 INFO     Xinference worker successfully started.

Open http://localhost:9997 in the browser, select baichuan-chat, pytorch, 13B, 4bit, and click create to deploy.

Or deploy using CLI:

xinference launch --model-name baichuan-chat --model-format pytorch --size-in-billions 13 --quantization 4

Since different models have different compatibility across hardware platforms, please check the Xinference built-in models [7] to confirm that the model you created supports your current hardware platform.

Use Xinference to manage models

To view all deployed models, execute the following command on the command line:

$ xinference list

Information similar to the following will be displayed:

UID                                   Type    Name           Format      Size (in billions)  Quantization
------------------------------------  ------  -------------  --------  --------------------  --------------
0db1e250-4330-11ee-b9ef-00155da30d2d  LLM     baichuan-chat  pytorch                     13  4-bit

"0db1e250-4330-11ee-b9ef-00155da30d2d" is the uid of the model just deployed.

Deploy Dify.AI

For the main process, please refer to the official website deployment document [8] .

Clone Dify

Clone the Dify source code to your local machine:

git clone https://github.com/langgenius/dify.git

Start Dify

Enter the docker directory of the Dify source code and run the one-click startup command:

cd dify/docker
docker compose up -d

Deployment results:

[+] Running 7/7
 ✔ Container docker-weaviate-1  Running                                                0.0s 
 ✔ Container docker-web-1       Running                                                0.0s 
 ✔ Container docker-redis-1     Running                                                0.0s 
 ✔ Container docker-db-1        Running                                                0.0s 
 ✔ Container docker-worker-1    Running                                                0.0s 
 ✔ Container docker-api-1       Running                                                0.0s 
 ✔ Container docker-nginx-1     Started                                                0.9s

Finally check if all containers are running properly:

docker compose ps

Operating status:

NAME                IMAGE                              COMMAND                  SERVICE             CREATED             STATUS              PORTS
docker-api-1        langgenius/dify-api:0.3.16         "/bin/bash /entrypoi…"   api                 24 hours ago        Up 3 hours          5001/tcp
docker-db-1         postgres:15-alpine                 "docker-entrypoint.s…"   db                  33 hours ago        Up 3 hours          0.0.0.0:5432->5432/tcp
docker-nginx-1      nginx:latest                       "/docker-entrypoint.…"   nginx               24 hours ago        Up 4 minutes        0.0.0.0:80->80/tcp
docker-redis-1      redis:6-alpine                     "docker-entrypoint.s…"   redis               33 hours ago        Up 3 hours          6379/tcp
docker-weaviate-1   semitechnologies/weaviate:1.18.4   "/bin/weaviate --hos…"   weaviate            33 hours ago        Up 3 hours          
docker-web-1        langgenius/dify-web:0.3.16         "/bin/sh ./entrypoin…"   web                 33 hours ago        Up 3 hours          3000/tcp
docker-worker-1     langgenius/dify-api:0.3.16         "/bin/bash /entrypoi…"   worker              33 hours ago        Up 3 hours          5001/tcp

Includes 3 business services "api/worker/web" and 4 basic components "weaviate/db/redis/nginx".

After Docker starts successfully, visit: http://127.0.0.1/ in the browser. After setting a password and logging in, you will enter the application list page.

At this point, Dify Community Edition has been successfully deployed using Docker.

Connect to Xinference in Dify

Configure model provider

Fill in the model information in "Settings > Model Provider > Xinference":

  • Model Name is the name you give your model deployment.
  • Server URL is the endpoint address of Xinference.
  • Model UID is the UID of the deployed model, obtained through xinference list.

Note that the Server URL cannot use localhost: filling in localhost would point to localhost inside the Docker container, causing the access to fail. The solution is to use the LAN IP as the Server URL; in a WSL environment, use the WSL IP address.

Get it with the following command in WSL:

hostname -I
172.31.157.121
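
Before filling in the form, it is worth checking that Xinference is reachable on that address (the IP below is just the example obtained above; the /v1/models route is an assumption based on Xinference's RESTful API):

curl http://172.31.157.121:9997/v1/models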

Use baichuan-chat

After creating an application, you can use the baichuan-chat-13B model configured in the previous step in that application. In Dify's prompt orchestration interface, select the baichuan-chat model, design your application prompt, and then publish an accessible AI application.

The above is the whole process of locally deploying Dify and connecting to baichuan-chat deployed by Xinference. At this point, our chat application based on baichuan-chat-13B is basically completed.
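
Once published, the application can also be called through Dify's service API. Here is a rough sketch with curl (the route and fields follow Dify's chat application API; YOUR_APP_API_KEY is a placeholder for the API secret key shown on the application's API page):

curl -X POST 'http://127.0.0.1/v1/chat-messages' \
  -H 'Authorization: Bearer YOUR_APP_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "inputs": {},
    "query": "Hello, please introduce yourself",
    "response_mode": "blocking",
    "user": "demo-user"
  }'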

Postscript

Of course, for a production-grade LLM application, it is not enough to just complete model access, inference, and chat interaction. We also need to tune the LLM's prompts, add private data as context, or fine-tune the LLM itself, which requires long-term iteration and optimization to make the application perform better and better. As a middleware tool platform, Dify.AI provides a visual operating system for the complete LLM App technology stack. After the basic service deployment above is completed, subsequent application iteration and improvement can be done on Dify, making the construction and management of LLM applications simpler and easier: business data can be uploaded directly and cleaned automatically, data annotation and improvement services will also be provided in the future, and even your business team can participate in the collaboration.

At present, the development and application of LLMs are still at a very early stage. I believe that in the near future, as LLM capabilities are further released and the tools built on top of LLMs keep improving, the threshold for developers to explore LLM capabilities will continue to be lowered, allowing more AI applications with rich scenarios to emerge.


If you like Dify, welcome:

  • Contribute code on GitHub and build a better Dify with us;
  • Share Dify and your experience with your friends through online and offline activities and social media;
  • Give us a star on GitHub ⭐️

You can also contact the Dify assistant and join our community group chat to share experiences with each other.
