[Artificial Intelligence] AquilaChat, a Domestic Open-Source Large Chat Model: Quick Start, Hands-On Practice, and Evaluation


Zen and the Art of Computer Programming: Evaluation conclusion: AquilaChat is comparable to ChatGLM-6B in writing ability, but its ability to write code in multiple programming languages is still relatively weak.

AquilaChat's code-writing ability is weak (likely because the code portion of its training data is insufficient), and asking technical questions (for example, implementing the bubble sort algorithm in Golang) has a certain probability of producing an error:

 File "/home/me/ai/FlagAI/flagai/model/predictor/aquila.py", line 41, in aquila_generate
    next_token = sample_top_p(probs, top_p)
  File "/home/me/ai/FlagAI/flagai/model/predictor/aquila.py", line 84, in sample_top_p
    next_token = torch.multinomial(probs_sort, num_samples=1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
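The crash originates in sample_top_p: under fp16, overflow or underflow can leave inf/nan values in the probability tensor, and torch.multinomial then raises the RuntimeError above. Below is a minimal sketch of nucleus (top-p) sampling with a defensive guard added; this is a hypothetical workaround of ours, not FlagAI's actual implementation:

import torch

def sample_top_p_safe(probs: torch.Tensor, top_p: float) -> torch.Tensor:
    # Standard nucleus sampling: sort, keep the smallest prefix whose
    # cumulative probability exceeds top_p, renormalize, then sample.
    probs_sort, probs_idx = torch.sort(probs, dim=-1, descending=True)
    probs_sum = torch.cumsum(probs_sort, dim=-1)
    probs_sort[probs_sum - probs_sort > top_p] = 0.0
    # Guard: scrub inf/nan/negative entries (e.g. from fp16 overflow) that
    # would make torch.multinomial raise the RuntimeError shown above.
    probs_sort = torch.nan_to_num(probs_sort, nan=0.0, posinf=0.0, neginf=0.0).clamp(min=0.0)
    probs_sort = probs_sort / probs_sort.sum(dim=-1, keepdim=True)
    next_token = torch.multinomial(probs_sort, num_samples=1)
    return torch.gather(probs_idx, -1, next_token)

With a guard like this, malformed probabilities are zeroed and renormalized instead of aborting the request.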

Input:

{"prompt":"bubble sort algorithm code"}

Output:

The following is a Python implementation of bubble sort:

def bubble_sort(array):
    n = len(array)
    # Traverse all array elements
    for i in range(n):
        # The last i elements are already in place
        for j in range(n-i-1):
            # If the current element is greater than the next, swap them
            if array[j] > array[j+1]:
                # Swap elements
                array[j], array[j+1] = array[j+1], array[j]
    return array
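As a quick sanity check of the generated function (our addition, not part of the model's output):

print(bubble_sort([5, 2, 9, 1, 7]))  # expected: [1, 2, 5, 7, 9]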

Input:

{"prompt":"Implement the bubble sort algorithm with golang"}

Error output:

prompt is 用golang实现冒泡排序算法
INFO:     127.0.0.1:41664 - "POST / HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/me/ai/FlagAI/venv/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/me/ai/FlagAI/venv/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/home/me/ai/FlagAI/venv/lib/python3.10/site-packages/fastapi/applications.py", line 282, in __call__
    await super().__call__(scope, receive, send)
  File "/home/me/ai/FlagAI/venv/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/me/ai/FlagAI/venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/home/me/ai/FlagAI/venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/home/me/ai/FlagAI/venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/home/me/ai/FlagAI/venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/home/me/ai/FlagAI/venv/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/home/me/ai/FlagAI/venv/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/home/me/ai/FlagAI/venv/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/home/me/ai/FlagAI/venv/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/home/me/ai/FlagAI/venv/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/home/me/ai/FlagAI/venv/lib/python3.10/site-packages/fastapi/routing.py", line 241, in app
    raw_response = await run_endpoint_function(
  File "/home/me/ai/FlagAI/venv/lib/python3.10/site-packages/fastapi/routing.py", line 167, in run_endpoint_function
    return await dependant.call(**values)
  File "/home/me/ai/FlagAI/examples/Aquila/Aquila-chat/generate_chat.py", line 52, in create_item
    out = aquila_generate(tokenizer,
  File "/home/me/ai/FlagAI/flagai/model/predictor/aquila.py", line 41, in aquila_generate
    next_token = sample_top_p(probs, top_p)
  File "/home/me/ai/FlagAI/flagai/model/predictor/aquila.py", line 84, in sample_top_p
    next_token = torch.multinomial(probs_sort, num_samples=1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0


AquilaChat Quick Start

Environment and Dependency Installation

python ./setup.py install

The setup.py code is as follows:

# Copyright © 2022 BAAI. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License")
from setuptools import find_packages, setup

setup(
    name="flagai",
    version="v1.7.1",
    description="FlagAI aims to help researchers and developers to freely train and test large-scale models for NLP/CV/VL tasks.",
    long_description=open("README.md", encoding="utf-8").read(),
    long_description_content_type="text/markdown",
    author="FlagAI-Open",
    author_email="[email protected]",
    url="https://github.com/FlagAI-Open/FlagAI",
    packages=find_packages(exclude="tests"),  # same as name
    license="Apache 2.0",
    include_package_data=True,
    python_requires=">=3.8",
    install_requires=[
        'nltk==3.6.7',
        'sentencepiece==0.1.96',
        'boto3==1.21.42',
        'pandas==1.3.5',
        'jieba==0.42.1',
        'scikit-learn==1.0.2',
        'tensorboard==2.9.0',
        'transformers==4.27.1',
        'datasets==2.0.0',
        'setuptools==66.0.0',
        'protobuf==3.19.6',
        'ftfy == 6.1.1',
        'Pillow >= 9.3.0',
        'einops == 0.3.0',
        'diffusers == 0.7.2',
        'pytorch-lightning == 1.6.5',
        'taming-transformers-rom1504 == 0.0.6',
        'rouge-score == 0.1.2',
        'sacrebleu == 2.3.1',
    ]
)

Running the command above installs the related dependencies.

How to Use

Fork the source code:

https://github.com/to-be-architect/FlagAI

1. Inference

examples/Aquila/Aquila-chat/generate_chat.py

import datetime
import json
import os

import torch
import uvicorn
from fastapi import FastAPI, Request

from flagai.auto_model.auto_loader import AutoLoader
from flagai.model.predictor.aquila import aquila_generate
from flagai.model.predictor.predictor import Predictor

state_dict = "/home/me/ai/FlagAI/examples/Aquila/Aquila-chat/data/"
model_name = 'aquilachat-7b'

device = torch.device('cuda', 0)


def torch_gc():
    if torch.cuda.is_available():
        with torch.cuda.device(device):
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()


app = FastAPI()


# request = {"prompt":"作为一名人工智能专家、程序员、软件架构师和 CTO,写一篇技术文章,标题:构建企业级应用程序:人工智能大模型发展历史和未来趋势,5000字,markdown格式"}
@app.post("/")
async def create_item(request: Request):
    global model, tokenizer

    json_post_raw = await request.json()
    json_post = json.dumps(json_post_raw)
    json_post_list = json.loads(json_post)
    prompt = json_post_list.get('prompt')

    print('-' * 80)
    print(f"prompt is {
      
      prompt}")

    from cyg_conversation import default_conversation

    conv = default_conversation.copy()
    conv.append_message(conv.roles[0], prompt)
    conv.append_message(conv.roles[1], None)

    tokens = tokenizer.encode_plus(f"{
      
      conv.get_prompt()}", None, max_length=1024)['input_ids']
    tokens = tokens[1:-1]

    with torch.no_grad():
        out = aquila_generate(tokenizer,
                              model,
                              [prompt],
                              max_gen_len=2048,
                              temperature=0.8,
                              top_p=0.95,
                              prompts_tokens=[tokens])

        now = datetime.datetime.now()
        time = now.strftime("%Y-%m-%d %H:%M:%S")
        answer = {
            "response": out,
            "status": 200,
            "time": time
        }
        log = "[" + time + "] " + '", prompt:"' + prompt + '", response:"' + repr(out) + '"'
        print(log)
        torch_gc()
        return answer


if __name__ == '__main__':
    loader = AutoLoader("lm",
                        model_dir=state_dict,
                        model_name=model_name,
                        use_cache=True)

    model = loader.get_model()
    tokenizer = loader.get_tokenizer()

    cache_dir = os.path.join(state_dict, model_name)

    model.eval()
    model.half()
    model.cuda(device=device)

    predictor = Predictor(model, tokenizer)

    uvicorn.run(app, host='0.0.0.0', port=7000, workers=1)
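Once the server is running, it can be exercised with a minimal client; a sketch assuming the host and port configured above (127.0.0.1:7000):

import requests

# Send a prompt to the local AquilaChat service started above.
resp = requests.post(
    "http://127.0.0.1:7000/",
    json={"prompt": "bubble sort algorithm code"},
    timeout=600,  # generation on a single GPU can take a while
)
print(resp.json()["response"])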


Effect Test

Input 1:

{"prompt":"怎样才算是优秀的技术人:理解技术的基本原则、本质和发展演化史,写一篇文章,分章节.10000字."}

Output 1:

{ "response": "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. ###Human: How to be a good technical person: understanding the basics of technology Principles, essence and development history, write an article, divided into chapters. 10000 words.###Assistant: Excellent technicians should have the following three abilities: 1. Understand the basic principles and essence of technology: excellent Technical people should be able to understand the basic principles and essence of technology, including computer science, programming languages, algorithms and data structures, etc. They should be able to grasp the advantages and disadvantages of various technologies and application scenarios, and be able to choose appropriate languages ​​and tools according to needs 2. Write high-quality technical code: Excellent technical personnel should have the ability to write high-quality technical code, including code readability, maintainability and scalability, etc. They should be able to master various programming languages and development tools, and be able to make reasonable selection and use according to the needs of the project. 3. Learning and adapting to new technologies: Excellent technical personnel should be able to continuously learn new technologies and tools, and learn and adapt according to the needs of the project. They Should be able to track the development trend of technology, and learn and practice according to the direction of technology development. In addition, excellent technical people should also have teamwork and communication skills, be able to cooperate well with team members, and be able to effectively Communicate and coordinate with customers. In short, excellent technicians should have comprehensive abilities and qualities, be able to play an important role in the project, and contribute to the development of technology.”, “status”: 200, “ time ” : "2023-06-13 02:45:51" }



Input 2:

You are an artificial intelligence expert, programmer, software architect, and CTO. Please write a professional technical blog article with in-depth thinking and insights under the title "Development History and Future Trends of Artificial Intelligence Large Models", with a word count of no less than 5000 words. The article outline is as follows: ## 1. Introduction ## 2. Technical principles and concepts ## 3. Implementation steps and processes ## 4. Examples and applications ## 5. Optimization and improvement ## 6. Conclusions and prospects ## 7. Appendix: Frequently asked questions and answers

Output 2:

"Development History and Future Trends of Artificial Intelligence Large Models"

Introduction

The artificial intelligence model is the core of artificial intelligence technology and the key to the application of artificial intelligence. In the development of artificial intelligence, the number of models has been increasing, the scale has been continuously expanded, and the performance has been continuously improved. At present, artificial intelligence models are mainly divided into shallow models and deep models, and deep models have become the mainstream of artificial intelligence.

Technical Principles and Concepts

A large artificial intelligence model refers to a model containing multiple neurons, which realizes learning and prediction functions by building connections between multiple neurons. The principle of the artificial intelligence large model is a convolutional neural network (CNN) based on deep learning, which consists of multiple convolutional layers, pooling layers, and fully connected layers. During the training process, the neural network achieves the accuracy of the model through continuous iteration and optimization.

Implementation steps and processes

Build the model: First, you need to build a representative dataset, such as the MNIST dataset. Then, the dataset is split into training and testing sets. Next, use a deep learning framework (such as TensorFlow or PyTorch) to define the architecture of the neural network, such as which layers to choose, the number of neurons in each layer, and so on. Finally, use a deep learning framework to train the neural network and make the necessary hyperparameter tuning.

Validation and optimization: During the model training process, the performance of the model needs to be continuously verified and adjusted. Optimize the accuracy of the model by adjusting the network structure, increasing or decreasing the number of layers, and adjusting hyperparameters.

Applications and Examples

Artificial intelligence large models have become the core of many fields, such as natural language processing, computer vision, speech recognition and other fields. At present, artificial intelligence large models have achieved remarkable results in medical diagnosis, automatic driving, and recommendation systems.

Optimization and Improvement

With the development of artificial intelligence, people are constantly exploring and improving the details of artificial intelligence models, such as improving the neural network structure, increasing model capacity, and adding integrated learning. These improvements and optimizations can improve the accuracy and generalization of the model.

Conclusion and Outlook

With the continuous development of artificial intelligence technology and the continuous upgrading of models, artificial intelligence large models will become more and more common. In the future, the application prospect of artificial intelligence large models is very broad, it can be applied to more fields, and can solve various problems more accurately and efficiently. However, there are still some challenges and difficulties in large artificial intelligence models, such as data quality and privacy protection, which require continuous research and exploration.

Appendix: Frequently Asked Questions and Answers

Q: How to build a high-quality neural network?

A: Building a high-quality neural network requires consideration of multiple factors, such as selecting a suitable data set, selecting an appropriate number of layers, and making appropriate hyperparameter adjustments. In addition, ensemble learning, optimizers, regularization, etc. can be used to improve the accuracy and performance of the model.

Q: How to evaluate the performance of the neural network model?

A: Various indicators can be used to evaluate the performance of the neural network model, such as accuracy rate, precision rate, recall rate, F1 value, etc. At the same time, evaluation indicators can also be used to understand the strengths and weaknesses of the model, and adjust and optimize according to the actual situation.

Q: How to protect data privacy?

A: In terms of data privacy protection, blockchain technology, data desensitization and other technologies can be used to protect data privacy. At the same time, the data can also be encrypted and authorized to ensure the security and reliability of the data.

Q: How to build a scalable artificial intelligence model?

A: Scalable large artificial intelligence models can be designed and optimized in a variety of ways, such as increasing the number of layers, increasing parameters, and using distributed computing to improve the performance and capacity of the model. At the same time, methods such as hybrid architecture and ensemble learning can also be used to improve the accuracy and performance of the model.

2. Supervised Fine-tuning (SFT)

Step 1: Set Up Checkpoints

Create a new directory named aquilachat-7b inside ./checkpoints_in. Place the fine-tuned checkpoint and all other files from the original aquilachat-7b model, including config.json, merges.txt, vocab.json, and special_tokens_map.json, into this directory.
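Before launching fine-tuning, it can help to verify that the directory is laid out as expected; a small hypothetical helper (file names as listed above):

import os

ckpt_dir = "./checkpoints_in/aquilachat-7b"
required = ["config.json", "merges.txt", "vocab.json", "special_tokens_map.json"]
missing = [name for name in required if not os.path.exists(os.path.join(ckpt_dir, name))]
print("missing files:", missing or "none")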

Step 2: Modify Parameters

  • cd /examples/Aquila/Aquila-sft
  • Configure the hostfile file; refer to here
  • In bmtrain_mgpu.sh, change SCRIPT_FILE to aquila_sft.py
  • (Optional) Change parameters in the Aquila-sft.yaml file
| Parameter | Type | Description |
| --- | --- | --- |
| batch_size | int | Number of samples drawn from the dataset for each training iteration. A larger batch size generally speeds up processing but may consume more memory. |
| gradient_accumulation_steps | int | Number of mini-batches over which gradients are computed before updating the model weights. Mainly used when GPU memory is small: with gradient accumulation, a small batch_size achieves the same effect as a large one (see the sketch after this table). |
| lr | float | Step size or rate at which the model's parameters are updated. A learning rate that is too high may prevent convergence; one that is too low may lead to overly long training or a local optimum. |
| warm_up | float | Ratio of the initial learning rate to the original learning rate. |
| save_interval | int | Interval at which the model is saved during training. For long runs, periodic saving prevents all progress from being lost to a sudden interruption or error. |
| enable_sft_conversations_dataset_v3 | bool | Data preprocessing method. |
| enable_sft_dataset_dir | str | Directory of the SFT dataset. |
| enable_sft_dataset_file | str | Filename of the SFT dataset. |
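To make the interaction between batch_size and gradient_accumulation_steps concrete, the effective global batch size works out as follows (a generic illustration, not FlagAI code; the GPU count is an assumed example):

batch_size = 4                   # samples per GPU per forward/backward pass
gradient_accumulation_steps = 8  # mini-batches accumulated before each weight update
num_gpus = 2                     # data-parallel workers (assumed)

# One optimizer step "sees" this many samples in total:
effective_batch_size = batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch_size)  # 64, same effect as batch_size=64 without accumulation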

Step 3: Start Supervised Fine-Tuning (SFT)

bash dist_trigger_docker.sh hostfile Aquila-sft.yaml aquilachat-7b [experiment_name]

The following information will then be output. Note that NODES_NUM should equal the number of nodes, and LOGFILE is the log file for the model run.

[Image: ../img/info.jpg]

Before training starts successfully, you may see output like the following (parameter values may differ):

[Image: ../img/info2.jpg]

AquilaChat Overview

The Aquila large language model inherits the architectural design strengths of GPT-3 and LLaMA, replaces a batch of underlying operators with more efficient implementations, redesigns and implements a Chinese-English bilingual tokenizer, and upgrades the BMTrain parallel training method, achieving nearly 8x the training efficiency of Megatron+DeepSpeed ZeRO-2. The Aquila model is trained from scratch on high-quality Chinese and English corpora; through data quality control and various training optimizations, it achieves better performance than other open-source models with smaller datasets and shorter training time. It is also the first large-scale open-source language model with Chinese-English bilingual knowledge that supports a commercial license agreement and meets domestic data compliance requirements.

AquilaChat-7B is a bilingual Chinese-English conversational language model obtained by SFT fine-tuning on top of the Aquila-7B base model. It was developed by the Beijing Academy of Artificial Intelligence (BAAI).

Our models are also available on the Hugging Face platform.
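Loading from Hugging Face presumably follows the standard transformers pattern; a sketch, where the repo id "BAAI/AquilaChat-7B" and the need for trust_remote_code are assumptions to verify against the model card:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed; Aquila ships custom modeling code, hence trust_remote_code.
tokenizer = AutoTokenizer.from_pretrained("BAAI/AquilaChat-7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("BAAI/AquilaChat-7B", trust_remote_code=True)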

The AquilaChat model was developed primarily to verify the capabilities of the foundation model. You may use, modify, and commercialize it according to your needs, but you must comply with the laws and regulations of all applicable countries, and you must provide the source of the Aquila series models and a copy of the Aquila series model license agreement to any third-party users.

Aquila Model Details

| Model | Status | Commercially usable? | GPU used |
| --- | --- | --- | --- |
| Aquila-7B | Released | | Nvidia A100 |
| AquilaChat-7B | Released | | Nvidia A100 |
| AquilaCode-7B-NV | Released | | Nvidia A100 |
| AquilaCode-7B-TS | Released | | Tianshu BI-V100 |
| Aquila-33B | Coming soon | | Nvidia A100 |
| AquilaChat-33B | Coming soon | | Nvidia A100 |

We use a series of more efficient underlying operators to assist model training, including methods referenced from flash-attention, replacement of some intermediate computations, and RMSNorm. On this basis, we applied BMTrain for lightweight parallel training, which uses data parallelism, ZeRO (Zero Redundancy Optimizer), optimizer offloading, checkpointing and operation fusion, and communication-computation overlap to optimize the model training process.
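For reference, RMSNorm drops LayerNorm's mean-centering and rescales activations purely by their root mean square, which is cheaper to compute; a minimal generic implementation (not FlagAI's actual operator):

import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Rescale by the root mean square over the last dimension; no mean subtraction.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)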

The tokenizer used in the Aquila model was trained from scratch by us and supports both Chinese and English. We used different tokenizers to extract 10,000 samples each from English, Chinese, and code data, counted the number of tokens per sample, and recorded them in the table. The parameters of the Aquila tokenizer are compared with those of other tokenizers below:

| Model | Vocab size | Note | Avg tokens (English) | Avg tokens (Chinese) | Avg tokens (code) |
| --- | --- | --- | --- | --- | --- |
| GPT-2 | 50527 | bpe | 1717 | 1764 | 2323 |
| LLaMA | 32000 | sp(bpe) | 1805 | 1257 | 1970 |
| Aquila | 100000 | bpe | 1575 | 477 | 1679 |
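A comparison like the one above can be reproduced with any Hugging Face tokenizer; a rough sketch (the sample list stands in for the 10,000 real samples):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
samples = ["The quick brown fox jumps over the lazy dog."]  # replace with real samples
avg_tokens = sum(len(tokenizer.encode(s)) for s in samples) / len(samples)
print(f"average tokens per sample: {avg_tokens:.1f}")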

Training Data

We use a series of high-quality Chinese and English datasets to train and fine-tune our conversational language model, and we continuously update and iterate on them.

We additionally support two multimodal instruction types: text-to-image generation and image editing. Please refer to the dataset format here.

License

The AquilaChat series of open-source models is licensed under the BAAI Aquila Model License Agreement; the source code is licensed under Apache License 2.0.


Source: https://blog.csdn.net/universsky2015/article/details/131180102