How ChatGPT Works: A Deep Dive

This article was first published on the WeChat public account Daqian World (WeChat: qq449245884), where I share front-end industry trends, learning paths, and more. For more open-source work, see GitHub: github.com/qq449245884… , which includes complete test sites, materials, and my series of articles on interviews with first-tier companies.


This article discusses how ChatGPT works. ChatGPT is a large language model developed by OpenAI, based on the GPT-4 architecture. First, the article introduces the basic concept of GPT: a Generative Pre-trained Transformer. The GPT model is trained on a large amount of text data and learns to generate coherent text in a variety of contexts.

Next, the article elaborates on the training process, which is divided into two stages: pre-training and fine-tuning. In the pre-training phase, the model learns to understand text data, including vocabulary, grammar, facts, and more; in the fine-tuning phase, the model is tuned on task-specific datasets to produce more accurate output. The article also covers the source of the training data, emphasizing the importance of acquiring knowledge from large amounts of web text.

When explaining output generation, the article covers a key technique: beam search, a heuristic search strategy for selecting the most likely text sequences. It also highlights strategies for addressing problems with generated content, including filters and tuning the temperature parameter.
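To make beam search concrete, here is a toy sketch. The per-step probability tables and the beam width are invented for the example; a real decoder scores continuations with the model itself rather than fixed tables:

```python
import heapq
import math

def beam_search(step_probs, beam_width=2):
    """Pick the highest-probability token sequence from per-step
    probability tables using beam search.

    step_probs: list of dicts mapping token -> probability at each step.
    Returns (sequence, log_probability) of the best beam.
    """
    # Each beam is (log_prob, sequence); start with one empty sequence.
    beams = [(0.0, [])]
    for probs in step_probs:
        candidates = []
        for log_p, seq in beams:
            for token, p in probs.items():
                candidates.append((log_p + math.log(p), seq + [token]))
        # Keep only the top `beam_width` partial sequences.
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    best = max(beams, key=lambda b: b[0])
    return best[1], best[0]

# Toy example: three steps, two candidate tokens per step.
steps = [
    {"the": 0.6, "a": 0.4},
    {"cat": 0.55, "dog": 0.45},
    {"sat": 0.7, "ran": 0.3},
]
seq, log_p = beam_search(steps, beam_width=2)
print(seq)  # ['the', 'cat', 'sat']
```

The point of keeping several beams instead of one is that a greedy choice at one step can lock the decoder out of a better overall sequence later.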

Finally, the article discusses the limitations of ChatGPT, such as possible bias when processing input data or an inability to answer some questions. Nonetheless, it points out that ChatGPT is a powerful tool capable of providing valuable assistance across a variety of tasks.


How do large language models like ChatGPT actually work? Well, they are both very simple and extremely complex.


You can think of a model as a tool that computes the probability of an output given some input. In language models, this means that given a sequence of words, they calculate the probability of the next word in the sequence, just like advanced autocomplete.
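A crude way to see "advanced autocomplete" in action is a bigram model that simply counts which word follows which. This is a deliberately tiny stand-in for what an LLM does with vastly more context; the corpus here is made up:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

# Count how often each word follows each preceding word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_probs(word):
    """Probability of each possible next word, given the previous word."""
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("the"))  # {'cat': 0.666..., 'mat': 0.333...}
```

An LLM plays the same game, except the "previous word" is thousands of tokens of context and the probability table is computed by a neural network rather than looked up.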


To understand where these probabilities come from, we need to talk about something called a neural network: a network-like structure where numbers go in one side and probabilities come out the other. Neural networks are simpler than you might think.


Imagine we want to train a computer to solve a simple problem: recognizing symbols on a pixelated 3x3 display. We'd need a neural network like this:

  • an input layer
  • two hidden layers
  • an output layer.


Our input layer consists of 9 nodes called neurons, one for each pixel. Each neuron holds a number from 1 (white) to -1 (black). Our output layer consists of 4 neurons, one for each possible symbol. Their values will end up being probabilities between 0 and 1.


Between these, we have arrangements of neurons called **"hidden" layers**. For our simple use case, we only need two. Each neuron is connected to neurons in adjacent layers by a weight, which can have a value between -1 and 1.


When a value is passed from an input neuron to the next layer, it is multiplied by a weight. Each neuron then simply adds up all the values it receives, squashes the sum to between -1 and 1, and passes the result on to every neuron in the next layer.


The neurons in the last hidden layer do the same, but squash their values to between 0 and 1 before passing them to the output layer. Each neuron in the output layer then holds a probability, and the highest one is the most likely outcome.
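The forward pass described above might be sketched like this in NumPy. The hidden-layer width and the random weights are arbitrary choices for the illustration (an untrained network), not anything specified in the article:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes from the text: 9 input pixels, two hidden layers, 4 output
# symbols. The hidden-layer width of 4 is an arbitrary choice for the sketch.
sizes = [9, 4, 4, 4]
weights = [rng.uniform(-1, 1, (m, n)) for m, n in zip(sizes, sizes[1:])]

def forward(pixels):
    x = np.asarray(pixels, dtype=float)  # 9 values in [-1, 1]
    # Hidden layers: weighted sum, then squash into [-1, 1] with tanh.
    for w in weights[:-1]:
        x = np.tanh(x @ w)
    # Output layer: squash into [0, 1] (sigmoid), then normalize so the
    # four values can be read as probabilities.
    out = 1 / (1 + np.exp(-(x @ weights[-1])))
    return out / out.sum()

probs = forward([1, -1, 1, -1, 1, -1, 1, -1, 1])
print(probs)  # four probabilities summing to 1
```

With random weights the output is meaningless; training is what shapes these numbers into useful probabilities, which is what the next section covers.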


When we train this network, we feed it an image for which we know the answer, and calculate the difference between the answer and the probability calculated by the network. We then adjust the weights to approximate the desired result. But how do we know how to adjust the weights?


We use clever mathematical techniques called backpropagation and gradient descent to determine what value for each weight would give us the lowest error. We keep repeating this process until we're satisfied with the model's accuracy.
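To make gradient descent concrete, here is the idea shrunk down to a single weight. The toy function, learning rate, and target value are all made up for the sketch; real training does the same thing simultaneously across millions of weights, with backpropagation supplying each gradient:

```python
# Minimal gradient descent: fit y = w * x to one example (x=2, target y=6).
# The loss is squared error; its derivative w.r.t. w is 2 * x * (w*x - y).
x, y = 2.0, 6.0
w = 0.0            # start with an arbitrary weight
lr = 0.05          # learning rate: how big a step we take each time

for _ in range(200):
    error = w * x - y      # how far off the prediction is
    grad = 2 * x * error   # d(loss)/dw: the slope of the error
    w -= lr * grad         # step downhill, against the slope

print(round(w, 4))  # ≈ 3.0, since 3 * 2 = 6
```

Each step nudges the weight in the direction that shrinks the error, which is exactly the "adjust the weights to approximate the desired result" loop described above.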


This is called a feedforward neural network, but this simple structure isn't enough to tackle natural language processing. Instead, LLMs tend to use a structure called the Transformer, which has a few key concepts that unlock a lot of potential.


First, let's talk about words. Rather than using each word as an input, we can break words down into tokens, which may be words, subwords, characters, or symbols. Note that tokens can even include spaces.
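Real LLM tokenizers (such as byte-pair encoding) learn their subword pieces from data; as a toy stand-in, a regex split that keeps leading spaces shows what a token sequence can look like:

```python
import re

# A toy tokenizer: split into words (keeping any leading space as part of
# the token, as real tokenizers often do) and single punctuation marks.
def tokenize(text):
    return re.findall(r" ?[A-Za-z]+|[^A-Za-z ]", text)

print(tokenize("Tokens include spaces, too!"))
# ['Tokens', ' include', ' spaces', ',', ' too', '!']
```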


Just as pixel values in our model are represented as numbers between -1 and 1, these tokens also need to be represented as numbers. We could assign each token a unique number and call it a day, but there's another way to represent them that adds more context.


We can store each token in a multi-dimensional vector that indicates its relationship to other tokens. For simplicity, imagine plotting word positions on a two-dimensional plane. We want words with similar meanings to be close to each other. This is called an embedding.


Embeddings help create relationships between similar words, but they also capture analogies. For example, the distance between the words "dog" and "puppy" should be the same as the distance between "cat" and "kitten". We can also create embeddings for entire sentences.
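The dog/puppy vs. cat/kitten analogy can be demonstrated with made-up 2-D vectors chosen so that the offsets match exactly; real embeddings have hundreds of dimensions and only approximate this:

```python
import numpy as np

# Hypothetical 2-D embeddings, invented so the analogy holds exactly:
# the offset from an animal to its young is the same for both pairs.
emb = {
    "dog":    np.array([2.0, 1.0]),
    "puppy":  np.array([2.5, 0.2]),
    "cat":    np.array([4.0, 1.0]),
    "kitten": np.array([4.5, 0.2]),
}

# dog -> puppy and cat -> kitten are the same vector here.
print(emb["puppy"] - emb["dog"])    # [ 0.5 -0.8]
print(emb["kitten"] - emb["cat"])   # [ 0.5 -0.8]

# So we can "solve" the analogy: puppy - dog + cat ≈ kitten.
guess = emb["puppy"] - emb["dog"] + emb["cat"]
closest = min(emb, key=lambda w: np.linalg.norm(emb[w] - guess))
print(closest)  # kitten
```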


The first part of a transformer encodes our input words into these embeddings. Those embeddings are then fed into the next process, called attention, which adds more context to them. Attention is hugely important in natural language processing.


Embeddings struggle to capture words with multiple meanings. Consider the two meanings of the word "bank". Humans infer the correct meaning from the context of the sentence: in each sentence, "money" or "river" is the important context for interpreting "bank".


The attention process looks back over the entire sentence for words that provide context for the word in question. It then re-weights the embedding so that "bank" ends up semantically closer to "river" or "money".


This attention process happens many times, capturing the context of the sentence across multiple dimensions. After all of these steps, the contextual embeddings are finally passed into a neural network, like the simple one we saw earlier, which produces probabilities.
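A single, heavily simplified attention step for the "bank" example might look like this. The sentence, the 3-D embeddings, and the use of the raw embedding as the query are all invented for the sketch; real transformers use learned query/key/value projections and many attention heads:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Score every word in the sentence against "bank", then mix their
# embeddings. The 3-D embeddings below are made up for illustration.
words = ["deposit", "money", "in", "the", "bank"]
emb = np.array([
    [0.9, 0.1, 0.0],   # deposit
    [1.0, 0.0, 0.0],   # money
    [0.0, 0.0, 0.5],   # in
    [0.0, 0.0, 0.4],   # the
    [0.5, 0.5, 0.0],   # bank
])

query = emb[words.index("bank")]
scores = emb @ query               # dot-product similarity to "bank"
attn = softmax(scores)             # attention weights, sum to 1
contextual_bank = attn @ emb       # weighted mix of all embeddings

# "money" gets a high weight, pulling "bank" toward its financial sense.
print(dict(zip(words, attn.round(2))))
```

The resulting `contextual_bank` vector is the embedding of "bank" *in this sentence*: shifted toward "money" and "deposit", away from anything river-related.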

This is a greatly simplified picture of how LLMs (language models like ChatGPT) work. A lot has been omitted or glossed over for brevity.





Origin: juejin.im/post/7234146188508168252