Deploying an HTTP server for the Firefly Chinese conversational large language model (developed by yangjianxin), from scratch

Project Introduction:

Firefly is an open-source Chinese language model project developed by yangjianxin. This article shows how to deploy the model behind an HTTP server, implemented in Python. The code is part of the back end of a mass entrepreneurship and innovation project (I modified it based on the Firefly training code; the fine-tuned model is not open source for the time being), so the firefly-1b4 model is used as the sample model instead.

Project environment:

1. pytorch: 2.0.1+cpu

2. transformers: 4.29.1

3. http.server (Python standard library)

Optional: the requests library (not needed if you do not call other APIs)

Model download: YeungNLP (huggingface.co)

After downloading, create a new model folder and put all the downloaded files into it, as shown in the figure below.

Open config.json and change the value of torch_dtype to int8; this can noticeably reduce lag (especially with the CPU version).
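If you prefer not to edit config.json by hand, the change above can be scripted. This is a minimal sketch, assuming the downloaded files sit in a model/ folder; the helper name set_torch_dtype is mine, not part of the project code.

```python
import json

def set_torch_dtype(path, dtype):
    """Rewrite the torch_dtype field of a Hugging Face config.json in place."""
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)
    cfg["torch_dtype"] = dtype
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, ensure_ascii=False, indent=2)
    return cfg

# set_torch_dtype("model/config.json", "int8")
```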

Hardware environment:

Apart from inference, the model does not consume much CPU/GPU; loading it mainly consumes memory. Testing shows that 8 GB of RAM can barely run the model, but the whole machine will very likely freeze, so at least 12 GB of memory is recommended.

Project development environment: CPU: i5-8400, RAM: 16 GB (this configuration is enough to run the model alongside Android Studio plus an emulator other than Android Studio's built-in one).


Code part:

1. Import packages:

print("Importing requests...")
import requests
print("Importing http.server...")
import http.server
print("Importing json...")
import json
print("Importing os...")
import os
print("Importing time...")
import time
print("Importing urllib...")
import urllib
import random
from urllib import parse
print("Importing transformers...")
from transformers import BloomTokenizerFast, BloomForCausalLM
print("All imports complete =====================")

2. RequestHandlerImpl class (http.server):

class RequestHandlerImpl(http.server.BaseHTTPRequestHandler):

    

    def do_GET(self):
        # Strip "GET /" from the front and " HTTP/1.1" from the end of the request line.
        get_cmd = self.requestline[5:self.requestline.find("HTTP/1.1")]
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.end_headers()
        get_str = checkget(get_cmd, self.headers)
        if not get_str:  # unknown routes return None/"" from checkget
            get_str = "Hello World\n"
        self.wfile.write(get_str.encode("utf-8"))
        
                         

        

    def do_POST(self):
        # Read the request body using the declared Content-Length.
        req_body = self.rfile.read(int(self.headers["Content-Length"])).decode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.end_headers()
        get_str = checkpost(self.path, req_body)
        if not get_str:  # unknown paths return None from checkpost
            get_str = ""
        self.wfile.write(get_str.encode("utf-8"))

3. Project functions (the app back end also calls other external APIs):

def get_answer(text):
    print("New question received:", text)
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    input_ids = input_ids.to(device)
    outputs = model.generate(input_ids, max_new_tokens=200, do_sample=True, top_p=0.85,
                             temperature=0.35, repetition_penalty=1.2,
                             eos_token_id=tokenizer.eos_token_id)
    rets = tokenizer.batch_decode(outputs)
    # Strip the echoed prompt and the </s> end-of-sequence markers.
    output = rets[0].strip().replace(text, "").replace('</s>', "")
    return output


def get_list(parm):  # News endpoint; publicly usable
    parm = parm[1:]
    get_tx = parm.split("&")
    name = "福州"  # default city
    page = "0"

    for i in range(0, len(get_tx)):
        if get_tx[i][0:5] == "name=":
            name = get_tx[i][5:]
        if get_tx[i][0:5] == "page=":
            page = get_tx[i][5:].replace(' ', '')

    url = "https://v.api.aa1.cn/api/api-tplist/go.php/api/News/local_news?name=" + name + "&page=" + page
    print(url)
    response = requests.get(url)
    content = response.text
    return content
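The manual split("&") parsing above (used again in login and register below) can also be done with the standard library's urllib.parse.parse_qs. A sketch of one possible helper; the name parse_params is mine:

```python
from urllib.parse import parse_qs

def parse_params(query, defaults):
    """Parse a query string like 'name=福州&page=0', falling back to defaults."""
    parsed = parse_qs(query.lstrip("?"))
    # parse_qs maps each key to a list of values; take the first one.
    return {k: parsed.get(k, [v])[0].strip() for k, v in defaults.items()}
```

For example, parse_params("name=Beijing&page=1", {"name": "福州", "page": "0"}) yields both values from the query, while missing keys keep their defaults.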


def get_top():  # Baidu trending-searches endpoint
    url = 'https://v.api.aa1.cn/api/topbaidu/index.php'
    response = requests.get(url)
    content = response.text
    return content


def get_weather():  # Weather endpoint (paid)
    url = 'http://apis.juhe.cn/simpleWeather/query?city=%E7%A6%8F%E5%B7%9E&key=add-your-own-key'
    response = requests.get(url)
    content = response.text
    return content
  
def login(up):  # Login endpoint
    get_tx = up.split("&")
    un = ""
    pw = ""
    dic = {'code': 201, 'msg': "用户名或密码错误"}  # default when the file has no match

    for i in range(0, len(get_tx)):
        if get_tx[i][0:5] == "user=":
            un = get_tx[i][5:]
        if get_tx[i][0:5] == "pass=":
            pw = get_tx[i][5:].replace(' ', '')

    print(un)
    print(pw)

    f = open('libaray/uw', encoding='gbk')  # load the user/password file
    for line in f:
        get_tx = line.split(",")
        if un == get_tx[0] and pw == get_tx[1].replace('\n', ''):
            dic = {'code': 200, 'msg': "登录成功", "token": token}
            break
        else:
            dic = {'code': 201, 'msg': "用户名或密码错误"}
    f.close()

    print(dic)
    return json.dumps(dic, sort_keys=True, ensure_ascii=False, indent=4, separators=(',', ':'))


def register(up):  # Registration endpoint
    get_tx = up.split("&")
    un = ""
    pw = ""

    for i in range(0, len(get_tx)):  # Same parsing as in login; could be factored into a helper
        if get_tx[i][0:5] == "user=":
            un = get_tx[i][5:]
        if get_tx[i][0:5] == "pass=":
            pw = get_tx[i][5:].replace(' ', '')

    print(un)
    print(pw)

    # Load the uw credential file; could be moved into a load function run at startup
    f = open('libaray/uw', encoding='gbk')
    for line in f:
        get_tx = line.split(",")
        if un == get_tx[0]:
            f.close()
            dic = {'code': 201, 'msg': "用户已存在"}
            return json.dumps(dic, sort_keys=True, ensure_ascii=False, indent=4, separators=(',', ':'))
    f.close()

    f = open('libaray/uw', 'a+')
    f.write(un + "," + pw + "\n")
    f.close()
    dic = {'code': 200, 'msg': "注册成功"}
    return json.dumps(dic, sort_keys=True, ensure_ascii=False, indent=4, separators=(',', ':'))
    
       
def checkpost(path, get_cmd):  # Dispatch POSTed data
    if path == "/login":
        return login(get_cmd)
    if path == "/register":
        return register(get_cmd)
    return ""


def checkhead(head):  # Verify the Authorization header on protected endpoints
    print(token == head.get("Authorization"))
    return token == head.get("Authorization")


def checkget(get_cmd="", head=""):  # Dispatch GET requests
    if get_cmd[0:9] == "question=":
        if checkhead(head):
            dic = {'code': 200, 'msg': get_answer(parse.unquote(get_cmd[9:])),
                   "prompt": parse.unquote(get_cmd[9:])}
        else:
            dic = {'code': 401, 'msg': "没有权限"}
        return json.dumps(dic, sort_keys=True, ensure_ascii=False, indent=4, separators=(',', ':'))

    if get_cmd[0:4] == "list":
        if checkhead(head):
            return get_list(get_cmd[4:])
        dic = {'code': 401, 'msg': "没有权限"}
        return json.dumps(dic, sort_keys=True, ensure_ascii=False, indent=4, separators=(',', ':'))

    if get_cmd[0:6] == "gettop":
        if checkhead(head):
            return get_top()
        dic = {'code': 401, 'msg': "没有权限"}
        return json.dumps(dic, sort_keys=True, ensure_ascii=False, indent=4, separators=(',', ':'))

    if get_cmd[0:7] == "weather":  # Free API endpoint, no token required
        return get_weather()

    if get_cmd[0:6] == "login?":
        return login(get_cmd[6:])

    return ""

4. Main part:
 

print("Loading tokenizer...")
tokenizer = BloomTokenizerFast.from_pretrained('model/')  # path assumes a model/ folder in the working directory
print("Loading model...")
model = BloomForCausalLM.from_pretrained('model/')
model.eval()
device = "cpu"  # set to "cuda" to use the GPU
model = model.to(device)
print("tlc bot started")
# Temporary token, regenerated on every server start
token = ''.join(random.sample('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%&', 39))
print("token=" + token)  # printed only to make testing easier
local_ip = "10.1.136.73"  # the server's own IP address
server_address = (local_ip, 19999)
httpd = http.server.HTTPServer(server_address, RequestHandlerImpl)
httpd.serve_forever()

Interface test:

1. Run the code:

After running the code, if the output looks like the figure below, everything is fine. You will see a token=xxxx parameter: a randomly generated temporary token, currently regenerated every time the server starts. It is printed here only to make the demonstration easier; in practice the client obtains it through the login endpoint, and the print can be commented out later.

2. Test whether the interface is available:

Enter http://10.1.136.73:19999/question=<s>Hello</s></s> in Postman, as shown below.

Since input to the model is formatted with <s></s> markers, and to make it easy for the client to pass conversation history as the prompt, I format the string on the client side rather than in Python.

The 401 response that comes back means the request failed header verification, but it does prove that the HTTP server is running normally.


 

3. Add header to continue verification:

Add a parameter named Authorization to the header, set its value to the temporarily generated token, and send the request again. As shown in the figure below, the client now receives the JSON response normally.

Response fields: code is 200 on success and another value otherwise; prompt echoes the incoming question; msg carries the answer.
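The same test can be run from Python instead of Postman. A sketch assuming the server address from the main section; the TOKEN placeholder must be replaced with the token printed at startup, and the helper names are mine:

```python
import requests
from urllib.parse import quote

BASE = "http://10.1.136.73:19999"       # server address from the main section
TOKEN = "paste-the-printed-token-here"  # placeholder: copy the startup token

def build_question_url(base, question):
    # The server reads the raw request line, so the prompt is URL-encoded into the path.
    return base + "/question=" + quote("<s>" + question + "</s></s>")

def ask(question):
    resp = requests.get(build_question_url(BASE, question),
                        headers={"Authorization": TOKEN})
    return resp.json()

# print(ask("你好"))
```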

Multiple rounds of dialogue:

In actual testing, we found that the firefly-1b4 version can also support multi-turn dialogue, although the quality does degrade. We only need to format the data externally into the following form:

<s>Question 1</s></s>Answer 1</s></s>Question 2</s></s>Answer 2</s></s>Question 3</s></s>Answer 3</s></s>

The following are effect examples:

The incoming prompt is <s>Do you know Beijing?</s></s>Beijing is the capital of China, located in northern China.</s></s>What delicacies are there?</s></s>Roast duck, noodles with soybean paste, mung bean juice, mutton hotpot, tofu pudding, etc.</s></s>What entertainment venues are there?</s></s>

The output is: the Great Wall, the Palace Museum, the Summer Palace, the Temple of Heaven, the Old Summer Palace, etc.
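Building that history string by hand is error-prone, so it can be generated from a list of past turns. A sketch following the pattern shown above; the helper name build_prompt is mine:

```python
def build_prompt(history, question):
    """history: list of (question, answer) pairs; returns the <s>...</s></s> prompt string."""
    turns = []
    for q, a in history:
        turns += [q, a]
    turns.append(question)
    return "<s>" + "</s></s>".join(turns) + "</s></s>"

# build_prompt([("你知道北京吗", "北京是中国的首都")], "有什么美食")
# → "<s>你知道北京吗</s></s>北京是中国的首都</s></s>有什么美食</s></s>"
```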

 



Developed by Fuzhou Mechanical and Electrical Engineering Vocational and Technical School wh

Email contact information: [email protected]

QQ contact information: 2151335401, 3135144152

Origin blog.csdn.net/m0_60277871/article/details/131437846