[AI in Practice] A Summary of Open-Source Large Language Models (LLMs)

Large Language Models

A large language model (LLM) is a deep learning model trained on large amounts of text data that can generate natural language text or understand the meaning of language. Large language models can handle a variety of natural language tasks, such as text classification, question answering, and dialogue, and are an important path toward artificial intelligence. (From Baidu Baike)

  • Development History

    In September 2020, OpenAI licensed the GPT-3 model to Microsoft, making Microsoft the first company in the world to gain access to GPT-3's capabilities. In 2022, OpenAI released ChatGPT, a model for generating natural language text. On March 15, 2023, OpenAI released GPT-4, a multi-modal pre-trained large model.

    In February 2023, Google announced its chatbot Bard, powered by Google's large language model LaMDA. On March 22, 2023, Google opened the public beta of Bard, launching first in the United States and the United Kingdom, with a gradual rollout to other regions to follow.

    On February 7, 2023, Baidu officially announced Wenxin Yiyan (ERNIE Bot), which officially launched on March 16. Wenxin Yiyan is built on the Wenxin large model, and the underlying business logic is to provide services through Baidu Smart Cloud: attracting enterprise and institutional customers to use its APIs and infrastructure, jointly build AI models, and develop applications, making AI broadly accessible across industries.

Open-Source Large Language Models

This article lists large language models that have been open-sourced as of June 8, 2023.

1、LLaMA

  • Introduction
    Meta's open-source LLaMA.
    LLaMA is trained entirely on publicly available pre-training data, and it achieves strong results: LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best large models, Chinchilla-70B and PaLM-540B.
    Meta has announced that LLaMA is open source.

  • Paper and Code
    Paper: https://arxiv.org/abs/2302.13971v1
    Code: https://github.com/facebookresearch/llama

2、ChatGLM-6B

  • Introduction
    ChatGLM-6B is an open-source, Chinese-English bilingual conversational language model based on the General Language Model (GLM) architecture, with 6.2 billion parameters. Combined with model quantization, it can be deployed locally on consumer-grade graphics cards (as little as 6GB of VRAM at the INT4 quantization level). ChatGLM-6B uses technology similar to ChatGPT and is optimized for Chinese Q&A and dialogue. After training on about 1T tokens of Chinese-English bilingual data, supplemented by supervised fine-tuning, feedback bootstrapping, reinforcement learning from human feedback (RLHF), and other techniques, the 6.2-billion-parameter ChatGLM-6B can generate answers that align well with human preferences.
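
    A minimal usage sketch, following the quick-start pattern in the project README (the INT4 quantization call assumes a CUDA GPU):

      from transformers import AutoModel, AutoTokenizer

      # trust_remote_code is required because ChatGLM-6B ships its own modeling code.
      tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
      model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)

      # INT4 quantization so the model fits in roughly 6GB of VRAM.
      model = model.half().quantize(4).cuda().eval()

      # Multi-turn chat: `history` carries the previous turns between calls.
      response, history = model.chat(tokenizer, "你好", history=[])
      print(response)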

  • Paper and Code
    Paper:
    Code: https://github.com/THUDM/ChatGLM-6B
    Official Website: https://chatglm.cn/blog

  • Hardware Requirements
    (Image: hardware requirements table)

  • Open-Source License
    The code in this repository is open-sourced under the Apache-2.0 license; use of the ChatGLM-6B model weights must comply with the Model License.

[Personal opinion] ChatGLM-6B is currently the leading open-source Chinese language model.

3、Alpaca

  • Introduction
    Alpaca is Stanford's instruction-following model, fine-tuned from LLaMA-7B on 52K instruction-following demonstrations generated with OpenAI's text-davinci-003.

  • Paper and Code
    Code: https://github.com/tatsu-lab/stanford_alpaca

4、PandaLLM

  • Introduction

    Panda: an open-source Chinese large language model developed overseas

    The Panda series of language models is currently based on Llama-7B, -13B, -33B, and -65B, with continued pre-training on Chinese-domain data using nearly 15M training examples, and its reasoning ability is evaluated on Chinese benchmarks. The goal is to provide a general-purpose foundational tool for Chinese natural language processing.

    The Panda models and the Chinese datasets used in training will be released as open source; anyone can use them and contribute to development for free. We welcome developers from around the world to join this project and jointly advance Chinese natural language processing. Going forward, we will further improve the evaluation of the models' basic Chinese capabilities and release larger models.

  • Paper and Code
    Paper: https://arxiv.org/pdf/2305.03025v1.pdf
    Code: https://github.com/dandelionsllm/pandallm

  • Model Versions
    (Image: table of model versions)

  • Model Evaluation
    (Image: model evaluation results)

5、GPT4All

  • Introduction
    Open-source assistant-style large language models that run locally on your CPU.

GPT4All is made possible by our compute partner Paperspace.

GPT4All is an ecosystem to train and deploy powerful and customized large language models that run locally on consumer grade CPUs.

A GPT4All model is a 3GB - 8GB file that you can download and plug into the GPT4All open-source ecosystem software. Nomic AI supports and maintains this software ecosystem to enforce quality and security alongside spearheading the effort to allow any person or enterprise to easily train and deploy their own on-edge large language models.
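
A minimal local-inference sketch, assuming the `gpt4all` Python bindings (the model filename is an example and must match a model available in the GPT4All ecosystem):

      from gpt4all import GPT4All

      # Downloads the model file (3GB-8GB) on first use, then runs entirely on CPU.
      model = GPT4All("ggml-gpt4all-j-v1.3-groovy")

      # Plain single-prompt generation; the bindings also offer a chat-session API.
      output = model.generate("Explain in one sentence what a large language model is.")
      print(output)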

6、DoctorGLM (MedicalGPT-zh v2)

7、MedicalGPT-zh v1

  • Introduction
    This project is an open-source Chinese medical general-purpose model built by 16-bit LoRA instruction fine-tuning of ChatGLM-6B. From Chinese medical consensus statements and clinical guideline texts covering 28 medical departments, the project generates a high-quality instruction dataset with broader coverage of medical knowledge and more accurate answers, thereby improving the model's medical knowledge and dialogue ability.
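
    A generic sketch of attaching a LoRA adapter to ChatGLM-6B with the `peft` library (the adapter path is a placeholder, not this project's released weights):

      from peft import PeftModel
      from transformers import AutoModel, AutoTokenizer

      # Base model: ChatGLM-6B with its custom modeling code.
      tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
      base_model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

      # Attach the fine-tuned LoRA adapter on top of the frozen base weights.
      model = PeftModel.from_pretrained(base_model, "path/to/medical-lora-adapter").eval()

      response, history = model.chat(tokenizer, "高血压患者日常需要注意什么?", history=[])
      print(response)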

  • Paper and Code
    Paper: https://arxiv.org/pdf/2304.01097.pdf
    Code: https://github.com/MediaBrain-SJTU/MedicalGPT-zh

  • Dataset Construction
    (Image: dataset construction pipeline)

8、Cornucopia-LLaMA-Fin-Chinese

  • Introduction
    Cornucopia: a LLaMA model fine-tuned on Chinese financial knowledge
    This project releases a LLaMA-7B model instruction-tuned on Chinese financial knowledge. The instruction dataset is constructed from public Chinese financial data plus crawled financial data, and LLaMA is fine-tuned on it to improve its question-answering performance in the financial domain.

    Based on the same data, the GPT-3.5 API will later be used to construct a higher-quality dataset, and the high-quality instruction dataset will be further expanded with a Chinese financial knowledge graph.

  • Paper and Code

    Code: https://github.com/jerry1993-tech/Cornucopia-LLaMA-Fin-Chinese/tree/main

  • Model Download
    (Image: model download links)

  • Dataset Construction
    The project currently uses public and crawled Chinese financial question-and-answer data, covering insurance, wealth management, stocks, funds, loans, credit cards, social security, and more.

    An example of training set data for instruction fine-tuning is as follows:

      Question: What principles and regulations must be followed when handling commercial bills of exchange?

      Answer: The following principles and regulations must be followed when handling commercial bills of exchange: 1. The entity using a commercial bill must be a legal person with an account at a bank; 2. Commercial bills may be used both within the same city and between cities; 3. A commercial bill must be issued on the basis of a lawful commodity transaction; 4. An accepted commercial bill may be discounted at a bank; 5. Commercial bills are registered instruments and may be transferred by endorsement; 6. The payment term of a commercial bill is agreed by the two parties to the transaction and may not exceed 6 months; 7. Once a commercial bill is accepted, the acceptor (i.e., the payer) is unconditionally obligated to pay the bill at maturity; 8. Commercial bills are printed and sold by banks.


    Given the inaccuracies and gaps in the existing data, the project will use the GPT-3.5 API to further construct and expand question-and-answer data around the Chinese financial knowledge base, designing a variety of prompt formats to make full use of that knowledge and iteratively update the dataset.
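
    A sketch of that kind of data generation, assuming the `openai` Python client of mid-2023 (the topic and prompt are illustrative):

      import openai  # assumes OPENAI_API_KEY is set in the environment

      seed_topic = "商业汇票"  # a seed topic drawn from the financial knowledge base

      # Ask GPT-3.5 to produce one question-answer pair about the topic.
      response = openai.ChatCompletion.create(
          model="gpt-3.5-turbo",
          messages=[
              {"role": "system", "content": "你是一名中文金融领域专家。"},
              {"role": "user", "content": f"围绕“{seed_topic}”生成一个问答对,要求答案准确、完整。"},
          ],
          temperature=0.7,
      )

      qa_pair = response["choices"][0]["message"]["content"]
      print(qa_pair)  # reviewed, then appended to the instruction-tuning dataset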

9、minGPT

  • Introduction
    A PyTorch re-implementation of GPT, covering both training and inference. minGPT tries to be small, clean, interpretable, and educational, as most of the currently available GPT implementations can be a bit sprawling. GPT is not a complicated model, and this implementation is appropriately about 300 lines of code (see mingpt/model.py). All that happens is that a sequence of indices feeds into a Transformer, and a probability distribution over the next index in the sequence comes out. The majority of the complexity is just being clever with batching (both across examples and over sequence length) for efficiency.
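
    A minimal instantiation sketch adapted from the repository README (the config values mirror GPT-2):

      import torch
      from mingpt.model import GPT

      # Build a GPT-2-sized model from minGPT's default config.
      model_config = GPT.get_default_config()
      model_config.model_type = 'gpt2'
      model_config.vocab_size = 50257  # OpenAI GPT-2 vocabulary size
      model_config.block_size = 1024   # maximum sequence length
      model = GPT(model_config)

      # Feed a (batch, sequence) tensor of token indices; greedily sample 10 more.
      idx = torch.zeros((1, 1), dtype=torch.long)
      out = model.generate(idx, max_new_tokens=10)
      print(out)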

  • Paper and Code

    Code: https://github.com/karpathy/minGPT

10、InstructGLM

11、FastChat

  • Introduction
    FastChat is an open platform for training, serving, and evaluating chatbots based on large language models. The core features include:

    • The weights, training code, and evaluation code for state-of-the-art models (e.g., Vicuna, FastChat-T5).
    • A distributed multi-model serving system with Web UI and OpenAI-compatible RESTful APIs.
  • Paper and Code
    Code: https://github.com/lm-sys/FastChat

  • Model Weights
    Vicuna Weights
    We release Vicuna weights as delta weights to comply with the LLaMA model license. You can add our delta to the original LLaMA weights to obtain the Vicuna weights. Instructions:

    Get the original LLaMA weights in the Hugging Face format by following the instructions here.
    Use the following scripts to get Vicuna weights by applying our delta. They will automatically download delta weights from our Hugging Face account.

    (Image: commands for applying the Vicuna delta weights)
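
    A sketch of that step; the `fastchat.model.apply_delta` entry point and flag names follow the FastChat README of that period, and the paths are placeholders:

      import subprocess

      # Merge the released delta weights into the original LLaMA weights to
      # produce usable Vicuna weights.
      subprocess.run(
          [
              "python3", "-m", "fastchat.model.apply_delta",
              "--base-model-path", "/path/to/llama-13b",      # original LLaMA (HF format)
              "--target-model-path", "/path/to/vicuna-13b",   # output dir for merged weights
              "--delta-path", "lmsys/vicuna-13b-delta-v1.1",  # delta weights on Hugging Face
          ],
          check=True,
      )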

12、Luotuo-Chinese-LLM

  • Introduction
    Luotuo: an open-source Chinese large language model
    The Luotuo project is an open-source Chinese large language model project initiated by Leng Ziang (SenseTime), Chen Qiyuan (Central China Normal University), and Li Lulu (SenseTime). It includes a series of language models.

  • Paper and Code

    Code: https://github.com/LC1332/Luotuo-Chinese-LLM

13、CamelBell-Chinese-LoRA

Other open-source projects to be added...

References

https://github.com/mymusise/ChatGLM-Tuning
https://huggingface.co/BelleGroup/BELLE-7B-2M
https://github.com/LianjiaTech/BELLE
https://huggingface.co/datasets/BelleGroup/generated_train_0.5M_CN
https://huggingface.co/datasets/JosephusCheung/GuanacoDataset
https://guanaco-model.github.io/
https://github.com/carbonz0/alpaca-chinese-dataset
https://github.com/THUDM/ChatGLM-6B
https://huggingface.co/THUDM/chatglm-6b
https://github.com/lich99/ChatGLM-finetune-LoRA
