Commercially usable multilingual chat LLM BLOOMChat is open-sourced, with performance approaching GPT-4

ChatGPT has been soaring for 160 days, and the world is no longer what it used to be.

A new Chinese-language AI website, https://ai.weoknow.com,
posts daily updates on ChatGPT resources available in China


SambaNova and Together have jointly open-sourced BLOOMChat, a commercially usable large language model (LLM) for multilingual chat with 176 billion parameters. It was instruction-tuned from BLOOM (176B) on assistant-style dialogue datasets and supports dialogue, question answering, and generative answers in multiple languages.

According to the announcement, BLOOMChat is a new, open, multilingual chat LLM. SambaNova and Together trained BLOOMChat on SambaNova DataScale systems, which use SambaNova's reconfigurable dataflow architecture. The model is built on the BigScience group's BLOOM and fine-tuned on OIG from OpenChatKit, Dolly 2.0, and OASST1. BLOOM is currently the largest open multilingual model, trained on 46 languages.
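Assistant-style chat models of this kind are usually prompted with tagged dialogue turns. The following is a minimal sketch of such a prompt builder. The `<human>` and `<bot>` tags and the `build_prompt` helper are assumptions made for illustration, not something stated in this article, so check the model card for the exact format before relying on it.

```python
def build_prompt(turns, human_tag="<human>", bot_tag="<bot>"):
    """Format (speaker, text) turns into one prompt string.

    The tag names are an assumption for illustration; the released
    checkpoint's model card is the authority on the real format.
    """
    lines = []
    for speaker, text in turns:
        tag = human_tag if speaker == "human" else bot_tag
        lines.append(f"{tag}: {text.strip()}")
    # End with an open bot tag so the model continues as the assistant.
    lines.append(f"{bot_tag}:")
    return "\n".join(lines)


prompt = build_prompt([("human", "Translate 'good morning' into French.")])
print(prompt)
```

The resulting string would then be passed to whatever generation API is used to serve the checkpoint.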

In an evaluation across six languages (English, Chinese, French, Arabic, Spanish, and Hindi), GPT-4 had a win rate of 54.75% against BLOOMChat's 45.25%, so BLOOMChat is slightly weaker than GPT-4. Compared with other mainstream open-source chat LLMs, however, BLOOMChat was preferred 65.92% of the time. In a preliminary study of cross-lingual NLP tasks, BLOOMChat also outperformed other BLOOM variants and mainstream open-source chat models on the WMT translation benchmark.

"We do want to point out that some of the models we compared with were not designed for multilingual settings. But because there are no alternatives in the open-source community, this is the only comparison possible for now. Our results show that, with the right technology, it is possible to build on top of open-source LLMs for powerful multilingual chat. We hope that our findings and the release of the BLOOMChat checkpoint will contribute to ongoing discussions in the open-source community and inspire further development in the field of LLMs."

The project team evaluated BLOOMChat's multilingual chat capabilities and cross-lingual task capabilities using qualitative and quantitative measures. Three different scenarios were tested, covering English, Chinese, Arabic, French, Spanish, and Hindi.

Experiment 1: Human Preference Ranking

The aim is to compare the chat capabilities of BLOOMChat with existing open-source models, as well as selected closed-source models, in multiple languages. The 22 English questions from Appendix E of "OpenAssistant Conversations" were used as the baseline. First, human volunteers manually translated the 22 English questions into their respective native languages; then a different group of volunteers rated the answers given by each model without knowing which model produced them.

BLOOMChat was compared with three open-source models, OpenAssistant-30B, LLaMA-Adapter-V2-65B, and BLOOMZ (176B):

A total of 1,158 comparisons were submitted by 51 volunteers across all models and 6 languages. As shown in the figure above, BLOOMChat was preferred 65.92% of the time over the other open-source models.
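A preference rate like the one above is the share of pairwise comparisons a model wins. Here is a minimal sketch of that arithmetic. The `win_rates` helper, the record layout, and the toy data (including the `model_x` and `model_y` names) are hypothetical and are not the study's actual data or code.

```python
from collections import Counter


def win_rates(comparisons):
    """Return {model: wins / comparisons it took part in}.

    comparisons: iterable of (model_a, model_b, winner) tuples.
    """
    wins = Counter()
    appearances = Counter()
    for model_a, model_b, winner in comparisons:
        if winner not in (model_a, model_b):
            raise ValueError(f"winner {winner!r} is not one of the compared models")
        appearances[model_a] += 1
        appearances[model_b] += 1
        wins[winner] += 1
    return {model: wins[model] / appearances[model] for model in appearances}


# Toy data for illustration only (not the volunteers' real votes).
toy = [
    ("BLOOMChat", "model_x", "BLOOMChat"),
    ("BLOOMChat", "model_x", "model_x"),
    ("BLOOMChat", "model_y", "BLOOMChat"),
    ("BLOOMChat", "model_y", "BLOOMChat"),
]
rates = win_rates(toy)
print(rates["BLOOMChat"])
```

When only two models are compared head to head, the two rates sum to 100%, which is why the GPT-4 and BLOOMChat figures reported in this article (54.75% and 45.25%) are complementary.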

Compared to GPT-4, BLOOMChat won 45.25% of the comparisons against GPT-4's 54.75%.

Experiment 2: Model Quality Evaluation

This experiment aims to verify the quality of multilingual texts generated by BLOOMChat.

Overall, 81.8% of the responses were classified as "Correct" or "Acceptable with minor flaws". Despite being fine-tuned only on English datasets, BLOOMChat achieved more than 70% "Correct" or "Acceptable" ratings in each language.
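The per-language and overall shares can be computed by counting how many ratings fall into the two accepted labels. The sketch below assumes ratings arrive as (language, label) pairs; the `acceptable_share` helper, the "Incorrect" label, and the toy ratings are hypothetical and only illustrate the calculation.

```python
from collections import defaultdict

# The two labels the article counts as passing.
ACCEPTABLE = {"Correct", "Acceptable with minor flaws"}


def acceptable_share(ratings):
    """Return {language: share of acceptable ratings}, plus an "overall" key.

    ratings: iterable of (language, label) pairs.
    """
    totals = defaultdict(int)
    accepted = defaultdict(int)
    for language, label in ratings:
        for key in (language, "overall"):
            totals[key] += 1
            if label in ACCEPTABLE:
                accepted[key] += 1
    return {key: accepted[key] / totals[key] for key in totals}


# Toy ratings for illustration; "Incorrect" is a made-up third label.
toy_ratings = [
    ("English", "Correct"),
    ("English", "Incorrect"),
    ("French", "Acceptable with minor flaws"),
    ("French", "Correct"),
]
shares = acceptable_share(toy_ratings)
print(shares)
```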

Experiment 3: WMT translation task

To gain a preliminary understanding of the model's ability to solve cross-lingual NLP tasks, the team evaluated its translation ability on the WMT translation task.

Overall, BLOOMChat performs significantly better than other BLOOM variants and open-source chat models on translation tasks, but it still lags behind GPT-4.
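This article does not say which metric was used for the translation comparison. WMT evaluations commonly report BLEU, so the sketch below shows a simplified, BLEU-style score (clipped n-gram precision with a brevity penalty) for one candidate against one reference. It is an illustration under that assumption: the `simple_bleu` helper uses whitespace tokenization and only unigrams and bigrams by default, so its numbers are not comparable to published BLEU scores.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified single-reference BLEU-style score in [0, 1]."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(clipped / total))
    # Penalize candidates that are shorter than the reference.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)


print(simple_bleu("the cat sat", "the cat sat"))
print(simple_bleu("the cat", "the cat sat"))
```

For real evaluations, an established implementation with standardized tokenization is the safer choice, since small tokenization differences change the score.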

The BLOOMChat team also acknowledged some limitations of the model:

  • BLOOMChat may occasionally generate reply messages that sound reasonable but are incorrect or off topic.

  • BLOOMChat may inadvertently switch languages within a single reply, affecting the coherence and intelligibility of the output.

  • BLOOMChat may produce repetitive phrases or sentences, making responses less engaging and less informative.

  • BLOOMChat may have limited performance in generating code or solving complex math problems.

  • BLOOMChat may inadvertently generate replies with inappropriate or harmful content.

More information can be found in the full announcement: https://sambanova.ai/blog/introducing-bloomchat-176b-the-multilingual-chat-based-llm/


Origin blog.csdn.net/zyqytsoft/article/details/131075796