Approaching GPT-4! BLOOMChat: an open-source, commercially usable multilingual large language model

Background

On May 19, 2023, two companies, SambaNova and Together, open-sourced BLOOMChat, a commercially usable multilingual fine-tuned chat model.

SambaNova focuses on providing a generative AI platform for enterprises and governments, while Together focuses on building open-source foundation models to empower various industries.

OpenAI's GPT-4 and Google's PaLM 2 both handle multiple languages well, but both are closed source, while open-source large language models suffer from two main unresolved pain points:

  • First, most are not commercially usable. For example, Meta's open-source LLaMA, and Vicuna, which is built on LLaMA, are licensed for academic research only. The model weights of ChatGLM, open-sourced by Tsinghua University and Zhipu AI, are likewise unavailable for commercial use.
  • Second, weak support for non-English languages. The training corpora of most open-source models are predominantly English, so dialogue quality in other languages is mediocre. Yet more than 80% of the world's population does not speak English, and serving those users matters too.

Many companies are also investigating how to fine-tune an open-source model into a Chinese-capable large language model and apply it to their own business scenarios.

The BLOOM base model open-sourced by BigScience is the first choice of many Internet companies: it is commercially usable, supports 46 languages including Chinese, and is large enough, at 176 billion parameters.

Some companies instead start from BLOOMZ, a model already fine-tuned from BLOOM, and fine-tune it further to create a vertical LLM.

SambaNova and Together jointly open-sourced BLOOMChat to create an open-source, commercially usable chat LLM with multilingual support. Experiments show that BLOOMChat's multilingual support is significantly better than that of other open-source models.

BLOOMChat

BLOOMChat is trained on RDUs (Reconfigurable Dataflow Units), the AI computing platform provided by SambaNova.

Native speakers of each language were recruited to evaluate the model's answers.

For answers in English, Chinese, French, Arabic, Spanish, and Hindi, BLOOMChat achieved a 45.25% win rate against GPT-4's 54.75%, so it remains weaker than GPT-4.

However, it outperformed other mainstream open-source chat LLMs 66% of the time.

It also performs well on WMT translation tasks, ahead of other BLOOM-based fine-tuned models and other mainstream open-source chat models.

The idea behind BLOOMChat is inspired by prior work showing that instruction fine-tuning in one language can improve a multilingual model's performance in other languages. BLOOMChat fine-tunes BLOOM (176B) on English-centric dialogue datasets, including OpenChatKit's OIG, Dolly 2.0, and OASST1.

Despite fine-tuning only on English data, the authors observed that BLOOMChat's chat quality also improves significantly in non-English scenarios.
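Dialogue data like this is usually serialized into a single training string with role tags before fine-tuning. A minimal sketch of that step (the `<human>:`/`<bot>:` tags follow a common chat-LLM convention; treat the exact tags BLOOMChat expects as an assumption, not a confirmed detail):

```python
def format_chat_prompt(turns):
    """Flatten a multi-turn conversation into one training string,
    marking each turn with a role tag."""
    parts = []
    for role, text in turns:
        tag = "<human>:" if role == "user" else "<bot>:"
        parts.append(f"{tag} {text}")
    return "\n".join(parts)

example = format_chat_prompt([
    ("user", "What is BLOOM?"),
    ("bot", "BLOOM is a 176B-parameter multilingual language model."),
])
```

Because the tags are language-neutral, the same serialization works unchanged for non-English conversations.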

Data collection

BLOOMChat is fine-tuned on two types of instruction data.

  • The first is OpenChatKit, a programmatically synthesized dialogue dataset with a large volume of data. The OpenChatKit training set was open-sourced by Together, together with LAION and Ontocord.
  • The second is Dolly 2.0 and OASST1, high-quality human-written question-and-answer datasets with a much smaller volume of data.
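When a huge synthetic corpus is mixed with small human-written sets, a common trick is to cap how much each source contributes so the synthetic data does not drown out the rest. A minimal sketch (the source names and sizes below are illustrative, not taken from the BLOOMChat repo):

```python
import random

def sample_per_source(sources, n_per_source, seed=0):
    """Draw at most n_per_source examples from each instruction
    source, then shuffle the combined mixture."""
    rng = random.Random(seed)
    mixed = []
    for examples in sources.values():
        k = min(n_per_source, len(examples))
        mixed.extend(rng.sample(examples, k))
    rng.shuffle(mixed)
    return mixed

sources = {
    "oig_synthetic": [f"oig-{i}" for i in range(1000)],  # large, synthetic
    "dolly_2.0": [f"dolly-{i}" for i in range(20)],      # small, human-written
}
mix = sample_per_source(sources, n_per_source=100)
```

Here the large source contributes only 100 examples while the small source contributes all 20, so the ratio between them is far more balanced than in the raw data.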

Instruction fine-tuning

The entire fine-tuning run is carried out on SambaNova's RDU (Reconfigurable Dataflow Unit) AI platform, with BLOOM-176B as the base model.

Fine-tuning proceeds in two steps:

  • In the first step, 100k examples are sampled from each OpenChatKit data source, and one epoch of training is performed on the result. OpenChatKit contains many data sources with a large total volume, so each source is first subsampled into a sub-dataset, and the model is then fine-tuned on all the sub-datasets combined.
  • In the second step, three epochs of fine-tuning are done on the combined Dolly 2.0 and OASST1 dataset.
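The two-step schedule above reduces to a simple loop. A sketch under stated assumptions (`train_epoch` is a hypothetical stand-in for one real training pass; only the epoch counts come from the post):

```python
def run_schedule(openchatkit_mix, dolly_oasst, train_epoch):
    """Step 1: one epoch over the per-source OpenChatKit sample.
    Step 2: three epochs over Dolly 2.0 + OASST1 combined."""
    train_epoch(openchatkit_mix)   # step 1: 1 epoch
    for _ in range(3):             # step 2: 3 epochs
        train_epoch(dolly_oasst)

# Dummy run that just records how many examples each epoch saw.
epoch_sizes = []
run_schedule(["ock"] * 4, ["dolly", "oasst"],
             lambda data: epoch_sizes.append(len(data)))
```

The dummy run makes the shape of the schedule visible: one pass over the large sampled mixture, then three passes over the small high-quality set.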

All dataset processing code, fine-tuning scripts, and inference scripts are freely open-sourced on GitHub; see the links at the end of the article.

Experimental results

The BLOOMChat team ran evaluations in three different scenarios, covering English, Chinese, Arabic, French, Spanish, and Hindi.

Experiment 1: Human Evaluation

Using 22 English questions from OpenAssistant Conversations as a benchmark, native speakers translated the 22 questions into their own languages, and a second native speaker then rated the answers each model gave.

Three other open-source chat models were evaluated for comparison.

As the figure above shows, BLOOMChat is significantly better than the other open-source models.

Compared with GPT-4, it is still slightly behind: in 55% of the pairwise judgments, GPT-4's answer was preferred over BLOOMChat's.
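Win-rate figures like the 54.75% / 45.25% split are typically derived from such pairwise judgments. A minimal sketch (the vote counts below are made up for illustration, not the actual evaluation records):

```python
from collections import Counter

def win_rates(preferences):
    """Share of non-tie pairwise judgments won by each side
    ('A' and 'B'); ties are excluded from the denominator."""
    counts = Counter(preferences)
    decided = counts["A"] + counts["B"]
    return counts["A"] / decided, counts["B"] / decided

# Illustrative judgments: A wins 6, B wins 5, 2 ties.
votes = ["A"] * 6 + ["B"] * 5 + ["tie"] * 2
a_rate, b_rate = win_rates(votes)
```

Excluding ties is one common convention; including them in the denominator would lower both rates, so how ties are handled matters when comparing reported numbers.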

Experiment 2: Model Quality Assessment

Native speakers rated BLOOMChat's answers directly.

As the figure above shows, despite fine-tuning only on English data, more than 70% of the answers in every language were rated correct or acceptable.
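The per-language numbers reduce to a simple ratio over the rater labels. A sketch with made-up labels (the label names are an assumption about how the raters categorized answers):

```python
def acceptable_rate(labels):
    """Fraction of answers a native-speaker rater marked
    'correct' or 'acceptable'; anything else counts as a miss."""
    ok = sum(label in ("correct", "acceptable") for label in labels)
    return ok / len(labels)

# 5 correct + 3 acceptable + 2 wrong -> 80% acceptable-or-better
labels = ["correct"] * 5 + ["acceptable"] * 3 + ["wrong"] * 2
rate = acceptable_rate(labels)
```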

Experiment 3: WMT translation task

Comparing several open-source models on WMT translation tasks, BLOOMChat generally beats the other open-source models but remains significantly weaker than GPT-4.

Limitations of BLOOMChat

Like most chat LLMs, BLOOMChat has some limitations:

  • BLOOMChat may occasionally generate reply messages that sound reasonable but are in fact incorrect or off topic.

  • BLOOMChat may inadvertently switch languages within a single reply, affecting the coherence and intelligibility of the output.

  • BLOOMChat may produce repetitive phrases or sentences, making replies less engaging and informative.

  • BLOOMChat is relatively weak at generating code and solving complex math problems.

  • BLOOMChat may inadvertently generate replies containing inappropriate or harmful content.

Summary

BLOOMChat is the first fully open-source chat LLM with over 100 billion parameters and dedicated to multilingual support.

Articles and sample code are open-sourced on GitHub: GPT Practical Tutorial, where you can find all the mainstream open-source LLMs.

WeChat official account: Advanced Coding. Follow it for the latest hands-on GPT content.

Personal website: Jincheng's Blog .

Zhihu: Wuji .

References

  • https://sambanova.ai/blog/introducing-bloomchat-176b-the-multilingual-chat-based-llm/
  • https://huggingface.co/spaces/sambanovasystems/BLOOMChat
  • https://github.com/sambanova/bloomchat

Origin: blog.csdn.net/perfumekristy/article/details/130792733