Finally, someone has clarified the state of GPT

Table of contents

Quote:

How to train the GPT assistant?

How can we use the model more effectively?

Karpathy also mentioned AutoGPT:

About Andrej Karpathy

Conclusion:


Quote:

Hello everyone, we are Quanzhi Planet. Start your unique interstellar journey of knowledge.

Following the release of Windows Copilot, another talk set the Microsoft Build conference alight.

Former Tesla AI director Andrej Karpathy pointed out in his talk that the Tree of Thoughts approach and AlphaGo's Monte Carlo tree search share similar characteristics.

Netizens exclaimed: this is the most detailed and interesting guide yet to using large language models and GPT-4!

Karpathy noted that LLaMA 65B is significantly more powerful than GPT-3 175B because it was trained for longer on more data. He also introduced Chatbot Arena, the large anonymous chatbot arena.

Claude's score falls between ChatGPT-3.5 and GPT-4.

Netizens said: Karpathy's talks are always wonderful, and this one did not disappoint.

Alongside the talk, a set of notes compiled by a Twitter user also went viral: 31 notes in total, with more than 3,000 reposts.

So what exactly was in this much-watched talk?

How to train the GPT assistant?

Karpathy's presentation consisted of two main parts.

In the first part, he explained how to train a "GPT assistant".

Karpathy focuses on the four training phases of AI assistants: pre-training, supervised fine-tuning, reward modeling, and reinforcement learning.

Each stage relies on its own dataset.

Karpathy added examples to illustrate each stage:

Next comes the fine-tuning stage.

Fine-tuning the base model with supervised learning on a smaller, high-quality dataset produces an assistant model that can answer questions.
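To make this concrete, here is a minimal sketch of what supervised fine-tuning can look like in code. It assumes the Hugging Face transformers library; the "gpt2" model name and the tiny toy dataset are illustrative placeholders, not the setup Karpathy describes.

```python
# Minimal supervised fine-tuning sketch: train a small base model on
# hand-written (prompt, response) pairs with next-token prediction.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")     # stand-in base model
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Small, high-quality (prompt, ideal response) pairs written by labelers.
pairs = [
    ("Q: What is the capital of France?\nA:", " Paris."),
    ("Q: Summarize: The cat sat on the mat.\nA:", " A cat sat on a mat."),
]

def encode(prompt, response):
    text = prompt + response + tokenizer.eos_token
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    batch["labels"] = batch["input_ids"].clone()      # next-token targets
    return batch

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for epoch in range(1):
    for prompt, response in pairs:
        batch = encode(prompt, response)
        loss = model(**batch).loss                    # cross-entropy loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

In practice the same recipe is run on tens of thousands of labeler-written demonstrations rather than two toy pairs, but the training objective is unchanged from pre-training.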

He showed how some of these models have evolved; many people will have seen the "evolution tree" diagram before.

Karpathy believes that the best open source model is Meta's LLaMA series, because OpenAI has not disclosed anything about GPT-4.

To be clear, the base model is not the assistant model.

Although a base model can answer questions, the answers it gives are not reliable enough; this is where the assistant model comes in. Trained from the base model through supervised fine-tuning, the assistant model outperforms it, generating more reliable answers and understanding text structure better.

Reinforcement learning is another crucial stage in language model training.

A reward model is first trained on high-quality, manually labeled comparisons of model outputs, and its scores are used to build a loss function. Reinforcement learning then increases the probability of completions the reward model rates highly and decreases the probability of those it rates poorly.
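As a minimal sketch of the reward-modeling step, the standard choice is a pairwise ranking loss: the reward model scores each completion with a single scalar, and the loss pushes the human-preferred completion's score above the rejected one's. The toy scores below are illustrative, not real model outputs.

```python
# Pairwise reward-model loss: -log sigmoid(r_chosen - r_rejected),
# minimized when the human-preferred completion gets the higher reward.
import torch
import torch.nn.functional as F

def reward_model_loss(score_chosen: torch.Tensor,
                      score_rejected: torch.Tensor) -> torch.Tensor:
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Toy scalar scores for a batch of three labeled comparisons.
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.5, 0.9, 1.1])
print(reward_model_loss(chosen, rejected))
```

The trained reward model is then frozen and used as the scoring function during the reinforcement learning stage.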

Human judgment is especially important for improving AI models on creative tasks; introducing human feedback trains the model more efficiently.

Adding human feedback to reinforcement learning in this way yields an RLHF model.

Now that the models have been trained, it's time to think about how to best use these models to solve problems.

How can we use the model more effectively?

In the second part, Karpathy discussed prompting strategies, fine-tuning, the rapidly developing tool ecosystem, and future extensions.

Karpathy provides concrete examples to illustrate:

When we write an article, a great deal of mental work goes into deciding how to express things accurately. To a GPT model, by contrast, the text is just a sequence of tokens.

Prompts can alleviate this cognitive difference.

Karpathy further elaborates on how chain-of-thought prompting works.

For a Transformer to do better on reasoning problems, it has to be allowed to work through the information step by step, rather than being handed an overly complex problem all at once.

If you give it several worked examples, it will imitate the pattern and produce better results.
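Here is an illustrative few-shot chain-of-thought prompt of the kind this describes; the wording of the prompt and the completion call are assumptions, not taken from the talk.

```python
# A few-shot chain-of-thought prompt: one worked example shows the model the
# step-by-step reasoning pattern it should imitate for the new question.
prompt = """Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each.
How many tennis balls does he have now?
A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11.
The answer is 11.

Q: The cafeteria had 23 apples. It used 20 and bought 6 more.
How many apples are there now?
A: Let's think step by step."""

# `prompt` would then be sent to whatever completion API is in use.
print(prompt)
```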

If the model generates something wrong, you can prompt it to try again. But remember that when the model answers, it samples tokens strictly in sequence and cannot go back to revise what it has already produced.

If you don't ask it to check its work, it won't check automatically.

This touches on System 1 and System 2 thinking.

Daniel Kahneman, winner of the Nobel Prize in Economics, proposed in his book "Thinking, Fast and Slow" that human cognition has two subsystems: System 1 relies mainly on intuition, while System 2 is responsible for logical analysis.

Simply put, System 1 is the automatic, fast process, while System 2 is the deliberate one.

A recent popular paper, "Tree of Thoughts", also addresses this issue.

Tree of Thoughts is not about simply asking for an answer; it is more like a prompting scheme where Python glue code strings multiple prompts together. The model maintains several candidate lines of thought, and a tree search algorithm decides which of them should be expanded.
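A hedged sketch of that glue code idea is below: keep several candidate thought chains alive and expand only the most promising ones. The helpers `propose_thoughts` and `score_thought` are hypothetical stand-ins for prompts sent to the model; this is the general shape of the approach, not the paper's implementation.

```python
# Tree-of-thoughts-style glue code: maintain a frontier of partial thought
# chains, expand each one, and keep only the best-scoring candidates.
from typing import List, Tuple

def propose_thoughts(state: str, k: int = 3) -> List[str]:
    # Placeholder: in practice, prompt the LLM for k candidate next steps.
    return [f"{state} -> step {i}" for i in range(k)]

def score_thought(state: str) -> float:
    # Placeholder: in practice, prompt the LLM to rate how promising a state is.
    return float(len(state))

def tree_of_thoughts(question: str, depth: int = 3, beam: int = 2) -> str:
    frontier: List[Tuple[float, str]] = [(0.0, question)]
    for _ in range(depth):
        candidates = []
        for _, state in frontier:
            for nxt in propose_thoughts(state):
                candidates.append((score_thought(nxt), nxt))
        # Prune: keep only the `beam` best partial chains, like a tree search.
        frontier = sorted(candidates, reverse=True)[:beam]
    return frontier[0][1]

print(tree_of_thoughts("How many primes are below 20?"))
```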

Karpathy thinks this idea is very similar to AlphaGo:

When AlphaGo plays Go, it needs to think about where the next piece should be placed. At first, it learned by imitating humans.

Besides that, it also applies Monte Carlo tree search to explore many possible strategies: it evaluates a variety of potential moves and then keeps only the relatively better ones. In a way, I think Tree of Thoughts works like the AlphaGo algorithm.

Karpathy also mentioned AutoGPT:

I don't think it currently works as well as I would like, so I don't recommend it for practical use. There is no doubt that we can learn from its ideas, but I think that will take time.

Next, there is another technique: retrieval-augmented generation and effective prompting.

While a transformer is running, the contents of its context window are its working memory. If you can load task-related information into the context, the model will excel, because it has immediate access to that information.

In short, you can make the model's data access more efficient by building an index of related data.

A transformer performs even better when it has a relevant primary document to refer to.
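A minimal retrieval-augmented-generation sketch is shown below, assuming a small set of reference documents. TF-IDF stands in for the embedding model here purely for simplicity; in practice a neural embedding index would be used, and the documents and question are made-up examples.

```python
# Retrieve the most relevant chunk for a question and stuff it into the
# prompt, so the answer sits in the model's working memory (the context).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "The context window acts as the transformer's working memory.",
    "Fine-tuning changes model weights; prompting does not.",
    "Reward models score completions using human preference data.",
]
question = "Why does putting relevant text in the prompt help the model?"

vectorizer = TfidfVectorizer().fit(documents + [question])
doc_vecs = vectorizer.transform(documents)
q_vec = vectorizer.transform([question])

best = cosine_similarity(q_vec, doc_vecs).argmax()   # index of closest chunk
prompt = f"Use this context:\n{documents[best]}\n\nQuestion: {question}\nAnswer:"
print(prompt)
```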

Finally, Karpathy gave a brief introduction to constrained prompting and fine-tuning as ways to improve a large language model's performance: constrained prompting forces the model to output text that conforms to a template, while fine-tuning adjusts the model's weights.
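As a simple illustration of constrained prompting (this is a hand-rolled sketch, not a specific library's API): the prompt pins the model's reply to a fixed JSON shape, and the caller validates the reply before using it.

```python
# Constrained prompting sketch: ask for a fixed JSON template and reject any
# reply that does not parse into exactly that shape.
import json

prompt = (
    "Extract the person's name and age from the text below.\n"
    'Reply with ONLY this JSON shape: {"name": "<string>", "age": <integer>}\n\n'
    "Text: Ada Lovelace was 36 when she died.\nJSON:"
)

def parse_reply(reply: str) -> dict:
    data = json.loads(reply)                 # reject anything that is not JSON
    assert set(data) == {"name", "age"}      # enforce the expected keys
    assert isinstance(data["age"], int)      # enforce the expected types
    return data

# What a conforming model reply would look like:
print(parse_reply('{"name": "Ada Lovelace", "age": 36}'))
```

Dedicated constrained-decoding tools go further by restricting the tokens the model is even allowed to sample, but the validate-and-retry pattern above captures the basic idea.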

My recommendation is to use large language models for low-stakes applications, always combined with human supervision. Think of them as sources of inspiration and advice, and consider them partners rather than tools for fully autonomous agency. Specifically, consider using copilots.

About Andrej Karpathy

Dr. Andrej Karpathy's first position after graduation was working in computer vision research at OpenAI.

Later, Musk, one of OpenAI's co-founders, took an interest in Karpathy and brought him to Tesla. This move caused serious friction between Musk and OpenAI, and Karpathy ultimately left the lab. At Tesla, he led projects such as Autopilot and FSD.

Karpathy returned to OpenAI in February, seven months after leaving Tesla.

He recently tweeted about his interest in the developing open source large language model ecosystem, comparing it to the early stages of a Cambrian explosion.

Conclusion:

Quanzhi Planet, start your unique interstellar journey of knowledge! Light up your creativity, shine under the starlight of knowledge, and become a pioneer in the new era of knowledge dissemination! Let's explore the wonderful world of AI together, and let creativity and wisdom bloom here!


Source: blog.csdn.net/GPT56788/article/details/131022578