Imagine how cool it would be to automatically label training data.

Deep learning has three bottlenecks:

  • Computing power: earlier articles have touched on this more than once, and it is relatively easy to understand. I will write about it in more detail later.

  • Algorithm: I will keep writing about this in my column, Deep Learning: From Getting Started to Not Wanting to Give Up.

  • Data: today’s topic.

DeepMind published a paper a few days ago finding that if you scale a CNN up to the same size as a Transformer, the training results are actually not much different. There has long been an argument that in deep learning, data and computing power are the decisive factors. I am not entirely convinced by that view (at the very least, because of its sequential nature, an LSTM is hard to make as efficient as a Transformer or a CNN). But put it another way: the algorithm determines a neural network’s lower limit, while computing power and data determine its upper limit (in some cases it has to be said the other way around). Stated like that, I agree wholeheartedly.

Let’s read a paper first:

  Training_language_models_to_follow_instructions_with_human_feedback.pdf (openai.com)

This paper walks through the entire process behind OpenAI’s RLHF. You may already have seen some of its figures reproduced in slide decks, such as the one below:

      

[Figure from the paper: the three-step pipeline of (1) supervised fine-tuning on human-written demonstrations, (2) reward-model training on human rankings of model outputs, and (3) PPO optimization of the policy against the reward model]
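The technical heart of that pipeline is step 2: a reward model trained on the human comparisons with a pairwise ranking loss, so that the labeler-preferred response scores higher than the rejected one. Below is a minimal PyTorch sketch of that loss; the hand-made reward values are stand-ins for what a real reward model would output.

```python
import torch
import torch.nn.functional as F

def reward_ranking_loss(r_chosen: torch.Tensor,
                        r_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise ranking loss used to train the reward model:
    -log(sigmoid(r(x, y_chosen) - r(x, y_rejected))), averaged over pairs."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy stand-ins for the scalar rewards a reward model would assign to the
# labeler-preferred and labeler-rejected responses for three prompts.
r_chosen = torch.tensor([1.2, 0.3, 0.8])
r_rejected = torch.tensor([0.4, 0.9, -0.1])
print(reward_ranking_loss(r_chosen, r_rejected))  # scalar loss to backprop
```

Minimizing this pushes the preferred response’s reward above the rejected one’s, and step 3 then uses PPO to optimize the policy against exactly this reward model.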

But if you read the paper carefully, you will find a lot of useful things in it, including a few Easter eggs; it is worth reading in full if you can. RLHF was actually first proposed by OpenAI and DeepMind back in 2017. The idea is useful and simple: have humans label the model’s outputs, then apply reinforcement learning on top of those labels. So why, in the six years from 2017 to 2023, did only OpenAI manage to pull it off with ChatGPT? Did nobody else understand the paper?

No. Let’s look at a very interesting paragraph from the paper:

    To produce our demonstration and comparison data, and to conduct our main evaluations, we hired a team of about 40 contractors on Upwork and through ScaleAI. Compared to earlier work that collects human preference data on the task of summarization (Ziegler et al., 2019; Stiennon et al., 2020; Wu et al., 2021), our inputs span a much broader range of tasks, and can occasionally include controversial and sensitive topics. Our aim was to select a group of labelers who were sensitive to the preferences of different demographic groups, and who were good at identifying outputs that were potentially harmful. Thus, we conducted a screening test designed to measure labeler performance on these axes. We selected labelers who performed well on this test; for more information about our selection procedure and labeler demographics, see Appendix B.1. 

Yes, you read that right: to do RLHF and PPO, OpenAI hired a team of about 40 contractors through Upwork and ScaleAI to do the data annotation and alignment work. Imagine how much human effort went into creating that dataset.

In fact, the real difficulty of RLHF is not the technology but the money. After all, labor is the most expensive cost!

So what if I don’t have the money and can’t afford to hire that many people? Can I still train a large model, and can I still get a dataset? Sure you can.
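One direction, and the one the opening line of this post is pointing at, is to let an existing strong model do the labeling and have humans merely spot-check the results. Here is a rough sketch of that idea using the openai Python client; the model name, prompt, and label set are illustrative assumptions, not anything taken from the paper.

```python
# Sketch of model-assisted labeling: a capable instruction-tuned model
# writes the labels and humans only review them. The model choice,
# prompt wording, and label set are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

LABELS = ["positive", "negative", "neutral"]

def auto_label(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable instruction-tuned model
        temperature=0,        # keep the labels as deterministic as possible
        messages=[{
            "role": "user",
            "content": (
                f"Classify the sentiment of the text below as one of "
                f"{LABELS}. Reply with the label only.\n\n{text}"
            ),
        }],
    )
    return response.choices[0].message.content.strip()

print(auto_label("Training finally converged after two weeks of pain."))
```

The machine-written labels will not be perfect, but having people review them is far cheaper than hiring forty contractors to write every label from scratch.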

Source: blog.csdn.net/kingsoftcloud/article/details/134946336