Harbin Institute of Technology Open-Sources the "Movable Type" Dialogue Model

1. Introduction

Large language models (LLMs) have achieved impressive success in general-domain natural language processing. The technology shows strong potential across a wide range of application scenarios, and interest from academia and industry continues to grow. More than 30 faculty members and students from the Institute of Natural Language Processing of Harbin Institute of Technology participated in developing the large-scale general-purpose dialogue model Movable Type 1.0, offering more possibilities and choices for LLM applications.

Limitations: Due to the model's small parameter count and the autoregressive generation paradigm, Movable Type may still produce misleading replies containing factual errors, or harmful content containing prejudice or discrimination. Please evaluate generated content carefully and do not spread harmful generated content on the Internet; anyone who disseminates such content bears responsibility for any adverse consequences.

2. Model Features

Movable Type 1.0

  • Movable Type 1.0 was developed by more than 30 faculty members and students from the Institute of Natural Language Processing of Harbin Institute of Technology

  • Built on BLOOM-7B and fine-tuned on instructions, it acquires more general task-completion ability

    • Chinese-English bilingual support: achieves excellent results on standard Chinese/English benchmarks and in subjective evaluations, while also supporting multilingual dialogue

    • Richer instruction fine-tuning data: more instruction fine-tuning templates were constructed manually, along with SFT data generated via Self-Instruct, making the instruction-tuning data richer (a sketch of this style of data generation follows this list)

      • Better instruction-following ability

      • Supports generating code and tables

    • Higher-quality safety data: safety SFT data manually designed through multiple rounds of adversarial attacks, strengthening the safety and compliance of model responses

      • Its safety score reached 84.4%, surpassing ChatGPT on a specific test set
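
To make the Self-Instruct-style data construction above concrete, here is a minimal sketch. It is not the team's actual pipeline: `generate` is a stand-in for any LLM client, and the prompt format is invented for illustration.

```python
import random
from typing import Callable

def self_instruct_round(seed_tasks: list[str],
                        generate: Callable[[str], str],
                        num_demos: int = 3) -> dict:
    """One round of Self-Instruct-style expansion: sample a few seed
    instructions, ask the model to propose a new instruction, then ask it
    to answer that instruction, yielding one (instruction, output) SFT pair."""
    demos = random.sample(seed_tasks, k=min(num_demos, len(seed_tasks)))
    prompt = "Here are some task instructions:\n"
    prompt += "\n".join(f"- {task}" for task in demos)
    prompt += "\nPropose one new, different task instruction:\n- "
    new_instruction = generate(prompt).strip()
    answer = generate(new_instruction)      # the model answers its own instruction
    seed_tasks.append(new_instruction)      # grow the pool for later rounds
    return {"instruction": new_instruction, "output": answer}
```

In practice, such generated pairs are deduplicated and filtered for quality before being added to the SFT data.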

Movable Type 2.0

  • Movable Type 2.0 was developed by the Social Computing and Information Retrieval Research Center (SCIR) of Harbin Institute of Technology

  • Based on Movable Type 1.0, response quality is further optimized through reinforcement learning from human feedback (RLHF), making outputs better aligned with human preferences

    • Stable PPO training with a variety of tricks: training is more stable and efficient (a sketch of two of these tricks follows this list)

      • Keep the data distribution consistent during training

      • Incorporate a KL-divergence penalty in the reward function

      • Maintain a moving average of the actor's weights

    • Chinese preference data annotated along multiple dimensions: richer answers, stronger instruction following, and clearer logic

      • Annotate whether each instruction is inducive (i.e., deliberately tries to elicit a harmful reply)

      • Score each reply along three dimensions: helpfulness, truthfulness, and harmlessness

      • Rank replies by preference, jointly considering the instruction category and reply quality
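
Two of the tricks listed above are easy to show concretely: shaping the reward with a KL-divergence penalty against the frozen SFT reference policy, and keeping a moving average of the actor's weights. The sketch below is illustrative only (tensor shapes and coefficients are assumptions), not the Movable Type training code.

```python
import torch

def kl_shaped_reward(seq_reward: torch.Tensor,   # (batch,) reward-model scores
                     logp_actor: torch.Tensor,   # (batch, seq_len) actor log-probs
                     logp_ref: torch.Tensor,     # (batch, seq_len) reference log-probs
                     kl_coef: float = 0.1) -> torch.Tensor:
    """Per-token rewards: a KL penalty keeps the actor close to the SFT
    reference policy; the scalar reward-model score lands on the last token."""
    kl = logp_actor - logp_ref          # per-token KL estimate
    shaped = -kl_coef * kl
    shaped[:, -1] += seq_reward         # sequence-level reward at the final token
    return shaped

@torch.no_grad()
def update_ema_actor(ema_actor: torch.nn.Module,
                     actor: torch.nn.Module,
                     decay: float = 0.99) -> None:
    """Exponential moving average of actor weights, for more stable rollouts."""
    for p_ema, p in zip(ema_actor.parameters(), actor.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)
```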

To better promote technical progress on Chinese large models, the SCIR Lab of Harbin Institute of Technology has open-sourced both versions of the large language model, "Movable Type 1.0" and "Movable Type 2.0". The GitHub address is https://github.com/HIT-SCIR/huozi.

At the same time, we have open-sourced the first human-annotated Chinese dataset for training an RLHF reward model.
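
A reward model is typically trained on such preference pairs with a pairwise ranking objective. The Bradley-Terry-style sketch below illustrates the idea with made-up scores; it is not the released training code.

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(r_chosen: torch.Tensor,
                         r_rejected: torch.Tensor) -> torch.Tensor:
    """Push the reward of the preferred reply above the rejected one."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy scores from a hypothetical reward model for three preference pairs.
r_good = torch.tensor([1.2, 0.3, 2.0])
r_bad = torch.tensor([0.5, 0.6, 1.1])
print(pairwise_reward_loss(r_good, r_bad))  # small when chosen replies score higher
```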

Researchers, developers and technology enthusiasts are welcome to try it out and provide valuable feedback and suggestions.

3. Model Evaluation

Public benchmarks

  • C-Eval: a comprehensive Chinese foundation-model evaluation dataset covering 52 subjects at four difficulty levels. We use its dev set as the source of few-shot examples and report 5-shot results on the val set.

  • GAOKAO: a dataset of Chinese college entrance examination questions, used to evaluate the language understanding and logical reasoning abilities of large language models. We kept only the multiple-choice questions, randomly partitioned the data, and tested all models uniformly in a zero-shot setting.

  • MMLU: an English evaluation dataset of 57 multiple-choice tasks covering elementary mathematics, US history, computer science, law, and more, with difficulty ranging from high-school to expert level; it is currently a mainstream LLM evaluation benchmark. We adopted an open-source evaluation scheme and report 5-shot results (a prompt-construction sketch follows this list).
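
The few-shot protocol above amounts to prepending solved dev-set questions to each test question. A minimal sketch, with the field names (`question`, `choices`, `answer`) invented for illustration:

```python
def build_few_shot_prompt(dev_examples: list[dict],
                          test_q: dict,
                          k: int = 5) -> str:
    """Prepend k solved dev-set multiple-choice questions to the test
    question, as in the 5-shot C-Eval/MMLU protocol described above."""
    blocks = []
    for ex in dev_examples[:k] + [test_q]:
        lines = [f"Question: {ex['question']}"]
        lines += [f"{label}. {c}" for label, c in zip("ABCD", ex["choices"])]
        # The test question carries no "answer" key; the model is scored
        # on the option letter it predicts after the trailing "Answer:".
        lines.append(f"Answer: {ex['answer']}" if "answer" in ex else "Answer:")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)
```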

Model               C-Eval   MMLU   GAOKAO (Science)   GAOKAO (Liberal Arts)
GPT-4               68.3     86.4   -                  -
ChatGPT             50.0     67.3   364                398
LLaMA-7B            -        27.8   -                  -
Chinese-LLaMA-7B    6.5      31.4   105                126
Chinese-Falcon-7B   24.5     21.0   113                121
BLOOM-7B            22.4     25.5   114                127
BLOOMZ-7B           -        28.7   -                  -
Movable Type 1.0    21.7     35.6   120                138

Manual comprehensive evaluation

We built a comprehensive bilingual test set (525 records in total) for manual evaluation of model outputs on fluency, relevance, truthfulness, and other indicators.

Model              Overall quality (%)   Fluency (%)   Relevance (%)   Truthfulness (%)   Instruction following (%)   Safety (%)
Movable Type 1.0   70.4                  94.6          91.5            85.5               81.1                        84.4
ChatGPT            86.5                  98.8          98.1            92.9               86.8                        81.9
  • Overall quality (综合质量): human-rated overall quality of the model-generated text.

  • Fluency (流畅性): whether the model generates fluent responses.

  • Relevance (相关性): whether the generated response is relevant to the question, regardless of whether it is correct.

  • Truthfulness (真实性): whether the output is free of obvious factual errors, misleading statements, or information of doubtful authenticity.

  • Instruction following (指令遵循): whether the response accurately satisfies the requirement specified by the human.

  • Safety (安全性): the proportion of safe, harmless replies when the model is prompted with inputs designed to induce harmful responses.

4. Interaction Examples

The original post demonstrates the following capabilities with screenshots:

  • Poetry writing

  • Copywriting

  • Math word problems

  • Code generation

  • Multilingual dialogue

  • Question answering

  • Table generation

  • Safe and harmless responses

5. "ChatGPT Research Report"

The Institute of Natural Language Processing of Harbin Institute of Technology organized a group of faculty members and students to write this research report, which introduces and summarizes ChatGPT in as much detail as possible, covering its technical principles, application scenarios, and future development. The PDF of the report has been uploaded to GitHub.

6. Conclusion

The release of the "Movable Type" large language model is the latest effort by the Institute of Natural Language Processing of Harbin Institute of Technology in the field of natural language processing. The project's open-source nature encourages broader participation and experimentation, helping to advance research on and applications of natural language processing techniques. However, due to its limited parameter count and the autoregressive generation paradigm, Movable Type may still generate harmful content; please evaluate generated content carefully and do not spread harmful generated content on the Internet. Finally, we sincerely invite you to visit our GitHub project page, try out the Movable Type large language model, and join us in discussing the future of Chinese natural language processing.

Editor in charge of this issue: Zhang Weinan

Editor of this issue: Yang Xin

Source: blog.csdn.net/weixin_48827824/article/details/132294403