GPT-4: Paper reading notes

  • Input and output of GPT-4: The input can be text or images, and the output is text. GPT-4 is therefore multi-modal on the input side.
  • Capability of GPT-4: It is still less capable than humans in many real-world scenarios, but on many professional and academic tasks it reaches or exceeds human level. For example, GPT-4 passes a simulated bar exam with a score in the top 10% of test takers (GPT-3.5 scores around the bottom 10%).
  • Image input: At release, image upload is not publicly available; it is a closed-beta feature that OpenAI is currently testing with a single partner company.
  • GPT-4’s alignment process: GPT-4 went through six months of alignment. "Alignment" here means both getting the model to follow human instructions and getting it to produce output that is consistent with human values, safe, and useful. The alignment approach includes learning from bad examples surfaced in real user interactions. OpenAI considers GPT-4 its best model to date, with large gains in safety and controllability.
  • GPT-4’s deep learning stack: OpenAI rebuilt its deep learning stack and, together with Microsoft Azure, designed a new supercomputing cluster for training GPT-4. A year earlier they used this cluster to train GPT-3.5; the bugs found and fixed during that run made GPT-4’s training process very stable.
  • GPT-4 training objective: GPT-4 is pre-trained with the standard language-modeling objective of predicting the next token.
  • The role of RLHF: To make the model’s answers match human intent and to keep the model safe and controllable. (In other words, RLHF steers the model so that it better understands the questioner’s intent and answers in the way users prefer.)
  • Findings about pre-training: The model’s capabilities appear to come from the pre-training stage. RLHF does not improve exam scores, and sometimes even lowers them.
  • Predictable training loss: OpenAI could predict GPT-4’s final loss before training finished. The prediction was extrapolated from smaller runs trained with the same method using roughly 1/10,000 of the compute. Because training stability matters so much for large models, this predictability is very practical.
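The extrapolation described above can be sketched as a power-law fit: measure the loss of several small-scale runs, fit log-loss against log-compute, and evaluate the fitted line at the full-scale compute budget. The compute budgets and loss values below are made up for illustration; OpenAI's actual fitting procedure is more involved.

```python
import math

# Hypothetical (compute, final loss) pairs from small-scale runs,
# trained with the same method as the full model. Made-up numbers.
small_runs = [(1e18, 3.20), (3e18, 2.95), (1e19, 2.72), (3e19, 2.51)]

# Assume a pure power law, loss = a * C**(-b), which is linear in
# log-log space: log(loss) = log(a) - b * log(C).
xs = [math.log(c) for c, _ in small_runs]
ys = [math.log(l) for _, l in small_runs]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

def predict_loss(compute: float) -> float:
    """Extrapolate the fitted line to a (possibly much larger) budget."""
    return math.exp(intercept + slope * math.log(compute))

# Predict the loss of a run with 10,000x the largest small-scale budget.
full_scale_loss = predict_loss(3e19 * 1e4)
```

The key property is that the fit uses only cheap runs, so the expensive run's outcome is known in advance; a flat or positive slope would signal that scaling further is not paying off.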
  • GPT-4 is more rational: On some tasks, previous large models got worse as they got larger (so-called inverse scaling). GPT-4 overcomes this shortcoming.
  • Comparison of GPT-4 and GPT-3.5: For everyday conversation there is little difference between the two. But as task difficulty increases, the gap becomes apparent: GPT-4 is more reliable and more creative.
  • GPT-4’s math and literature are weak: GPT-4 is still weak at math. It also underperforms on linguistics and literature exams, where much of what it generates is empty verbiage.
  • Comparison with other large NLP models: GPT-4’s performance on multiple benchmark datasets is significantly higher than previous language models, often by a wide margin.
  • Comparison with large CV models: GPT-4’s performance with image input is also good, though not as strong as its performance on text.
  • Multi-language performance: GPT-4 performs best in English and also does well in Chinese. Its performance in a given language is not simply correlated with how many people speak that language.
  • GPT-4’s System Message function: Lets the user assign GPT-4 a designated role, which fixes the persona and tone it uses when talking to the user.
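In the chat API, a system message is supplied as the first entry of the conversation. The sketch below only builds the request payload without making a network call; the persona text and model name are illustrative, not taken from the paper.

```python
import json

messages = [
    # The system message fixes the role and tone before the chat begins.
    {"role": "system",
     "content": "You are a Socratic tutor. Never give answers directly; "
                "guide the student with questions."},
    # Subsequent user turns are interpreted through that persona.
    {"role": "user",
     "content": "What is the derivative of x^2?"},
]

payload = json.dumps({"model": "gpt-4", "messages": messages})
```

Sending this payload to the chat endpoint would make the model respond as the tutor persona rather than in its default assistant voice.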
  • Safety of GPT-4: GPT-4’s safety is significantly improved, about 40% better than GPT-3.5 on OpenAI’s evaluations. GPT-4 also helps improve its own safety: an additional reward signal in the RLHF process uses a classifier, built from the pre-trained model itself, to judge whether a request is sensitive or dangerous and should not be answered.
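As a rough illustration of turning "should this be refused?" into a reward signal, the toy function below scores refusal decisions. The real classifier is derived from the pre-trained model itself; the keyword blocklist here is purely a stand-in for it.

```python
# Toy stand-in for the safety classifier: in reality it is a model-based
# classifier, not a keyword list. Phrases below are illustrative only.
BLOCKLIST = {"build a bomb", "synthesize a pathogen"}

def is_sensitive(prompt: str) -> bool:
    """Stand-in classifier: flag prompts containing blocked phrases."""
    lowered = prompt.lower()
    return any(phrase in lowered for phrase in BLOCKLIST)

def safety_reward(prompt: str, refused: bool) -> float:
    """RLHF-style reward: reward refusing sensitive prompts and
    answering benign ones; penalize the opposite behavior."""
    if is_sensitive(prompt):
        return 1.0 if refused else -1.0
    # Mildly penalize over-refusal of harmless requests.
    return 1.0 if not refused else -0.5
```

During RLHF this kind of signal is added on top of the human-preference reward, pushing the policy toward refusing dangerous requests without refusing everything.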
  • Limitations of GPT-4: GPT-4’s training data ends in September 2021 (though the model may later be updated with newer data). GPT-4 also remains susceptible to adversarial "jailbreak" prompts from users.
  • Calibration of GPT-4: Before RLHF, GPT-4’s confidence in an answer closely tracked the actual probability of that answer being correct. After RLHF, the model’s calibration dropped significantly.
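Calibration of this kind is commonly measured with expected calibration error (ECE): bucket predictions by stated confidence and compare each bucket's average confidence with its empirical accuracy. The (confidence, correct) pairs below are fabricated for illustration.

```python
# Fabricated (stated confidence, answer was correct) pairs.
preds = [(0.95, True), (0.90, True), (0.80, True), (0.75, False),
         (0.60, True), (0.55, False), (0.40, False), (0.30, False)]

def ece(preds, n_bins=5):
    """Expected calibration error: weighted average, over confidence
    bins, of |average confidence - empirical accuracy|."""
    bins = [[] for _ in range(n_bins)]
    for conf, correct in preds:
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
        bins[idx].append((conf, correct))
    total = len(preds)
    err = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        err += len(b) / total * abs(avg_conf - accuracy)
    return err
```

A perfectly calibrated model (confidence equals accuracy in every bin) has ECE 0; the drop in calibration after RLHF corresponds to a larger ECE.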
  • GPT-4 context length: GPT-4’s context window is 8,192 tokens, already very long compared with previous models. There is also a version with a 32,768-token context window.
  • An image generation method for GPT-4: First ask GPT-4 to generate drawing code from a given description, then run the code to obtain the image. GPT-4 can produce images this way, but they are relatively rudimentary.
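A minimal sketch of this code-as-image workflow: the "model output" below is a hand-written stand-in for code GPT-4 might return when asked to draw a filled circle, and executing it produces a tiny plain-PPM image file. No real model call is made.

```python
# Stand-in for text returned by the model; in the real workflow this
# string would come from a GPT-4 response to a drawing prompt.
model_generated_code = '''
size = 64
rows = []
for y in range(size):
    row = []
    for x in range(size):
        inside = (x - 32) ** 2 + (y - 32) ** 2 <= 20 ** 2
        row.append("255 215 0" if inside else "255 255 255")
    rows.append(" ".join(row))
with open("circle.ppm", "w") as f:
    f.write("P3\\n64 64\\n255\\n" + "\\n".join(rows) + "\\n")
'''

# Step 2 of the workflow: run the generated drawing code.
# (exec of untrusted model output is unsafe in practice; sandbox it.)
exec(model_generated_code)
```

The result is a 64x64 image of a gold circle on white, crude but viewable in any PPM-capable viewer, which matches the note that images produced this way are rudimentary.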

Origin blog.csdn.net/hanmo22357/article/details/134490372