Anthropic LLM paper reading notes

  • Research timeline: This work was carried out around the same time as InstructGPT; although it was released after ChatGPT, it was actually completed before ChatGPT.
  • Application difference from ChatGPT: this model answers "I don't know" with higher probability than ChatGPT does.
  • Reinforcement Learning from Human Feedback (RLHF): this approach was found to improve performance on almost all NLP tasks, and the gains grow as the number of parameters increases. Training on helpfulness data makes the model more helpful; training on harmlessness data improves harmlessness but reduces the model's helpfulness.
  • The model is continually updated: every week a new reward model is trained and reinforcement learning is re-run against it, which amounts to a form of online learning.
  • The model balances helpfulness and harmlessness: it learns from two datasets representing helpfulness and harmlessness respectively. These two objectives are actually in tension; training on the two datasets combined works reasonably well, but further improvements are still needed.
  • Data annotation: at the annotation stage the model generates two answers at a time, and annotators choose the answer they consider better.
  • The effect of RLHF on models of different sizes: for small models, zero-shot accuracy drops after RLHF (an "alignment tax"); this problem disappears once the model is large enough.
  • Data type: multi-turn dialogue data is used instead of conventional single-turn QA data, so it is similar to ChatGPT in this respect.
  • Comparing different models: models are compared via Elo scores, from which the win rate between any two models is computed; the higher the win rate, the better the model performs.
  • Relationship between model accuracy and data volume: as the amount of data grows exponentially, model accuracy improves roughly linearly (a log-linear relationship).
  • Relationship between model accuracy and the number of dialogue turns: the general trend is that accuracy decreases as the number of dialogue turns increases.
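The pairwise annotation described in the notes (two answers, annotator picks the better one) is typically turned into a reward-model training signal with a Bradley-Terry style loss. A minimal sketch, assuming scalar rewards stand in for a real reward model's outputs:

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry style loss used in RLHF reward-model training:
    P(chosen beats rejected) = sigmoid(r_chosen - r_rejected);
    we minimise its negative log-likelihood."""
    p_chosen = 1.0 / (1.0 + math.exp(-(r_chosen - r_rejected)))
    return -math.log(p_chosen)

# The loss shrinks as the reward gap favours the chosen answer,
# pushing the reward model to score the preferred answer higher.
assert preference_loss(2.0, 0.0) < preference_loss(0.5, 0.0)
```

When the two rewards are equal the loss is log 2, i.e. the model is maximally uncertain about which answer the annotator preferred.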
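The Elo comparison mentioned above maps a rating difference to an expected head-to-head win rate via the standard logistic formula; a small sketch (the ratings below are illustrative, not values from the paper):

```python
def elo_win_rate(elo_a: float, elo_b: float) -> float:
    """Expected win rate of model A over model B under the Elo model:
    P(A beats B) = 1 / (1 + 10 ** ((elo_b - elo_a) / 400))."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400))

# Equal ratings give a 50% win rate; a higher-rated model wins more often.
elo_win_rate(1200, 1200)
elo_win_rate(1300, 1200)
```

A 100-point Elo advantage corresponds to roughly a 64% win rate, which is why small Elo gaps between models translate into modest but measurable preference differences.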
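The log-linear relationship between data volume and accuracy noted above can be illustrated as follows; the slope and intercept are made-up values for illustration, not numbers from the paper:

```python
import math

def fitted_accuracy(n_examples: float,
                    slope: float = 0.05,
                    intercept: float = 0.2) -> float:
    """Hypothetical log-linear trend: accuracy grows linearly in
    log10 of the dataset size (slope/intercept are illustrative)."""
    return intercept + slope * math.log10(n_examples)

# Each 10x increase in data adds the same constant `slope` to accuracy:
gain_1 = fitted_accuracy(1e4) - fitted_accuracy(1e3)
gain_2 = fitted_accuracy(1e5) - fitted_accuracy(1e4)
```

This is why such scaling plots are usually drawn with a logarithmic x-axis: the curve then appears as a straight line.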

Origin blog.csdn.net/hanmo22357/article/details/134564785