This article takes a deep dive into how the GPT large language model makes sense of text, sound, and video, the detours it has taken along the way, and the mathematical foundations of the model behind ChatGPT.

I regularly interact with colleagues in different fields and enjoy the challenge of conveying machine learning concepts to people with little background in data science. Here I try to explain in simple terms how GPT is put together, only this time in writing.

Behind the popular magic of ChatGPT lies an unpopular logic. You write a prompt to ChatGPT, and it generates text that resembles a human answer, whether or not that answer is accurate. How is it able to understand your prompt and generate coherent, intelligible answers?

The answer is the Transformer neural network. This architecture is designed to handle large volumes of unstructured data, text in our case. When we say architecture, we essentially mean a series of mathematical operations performed in parallel across multiple layers. This set of equations introduces several innovations that help us overcome long-standing challenges in text generation, challenges we struggled with until roughly five years ago.
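To make "a series of mathematical operations performed in parallel" concrete, here is a minimal sketch of the Transformer's core operation, scaled dot-product self-attention, written in NumPy. The array sizes and random inputs are illustrative assumptions, not values taken from GPT itself.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # project tokens to queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # how strongly each token attends to the others
    return softmax(scores) @ V                  # weighted sum of value vectors

# Illustrative sizes: 4 tokens, 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)      # (4, 8): one updated vector per token
```

Every token's updated vector is computed independently of the others, which is what allows these operations to run in parallel across the whole sequence.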

If GPT has been around for five years (the original GPT paper was in fact published in 2018), isn't it old news? Why has it only become so popular recently? And what is the difference between GPT-1, 2, 3, 3.5 (ChatGPT), and 4?

All GPT versions are built on the same architecture. However, each successive model contains more parameters and is trained on a larger text dataset. Later versions clearly introduced other novelties as well, especially during training, such as reinforcement learning from human feedback, which we will explain in the third part of this blog series.

Vectors, matrices, tensors. All those fancy words essentially denote blocks of numbers arranged in cells. These numbers are subjected to a series of mathematical operations (mainly multiplication and summation) until they arrive at an optimal output value: the probability of a possible outcome.
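As a toy illustration of "multiplication and summation until we reach a probability", the sketch below turns a small vector of numbers into a probability distribution over three outcomes. The weights are made-up numbers, standing in for what a real model would learn during training.

```python
import numpy as np

# A "vector": one row of numbers representing some input.
x = np.array([0.2, -1.0, 0.5])

# A "matrix" of weights (made-up here), one column per possible outcome.
W = np.array([[ 0.1,  0.4, -0.2],
              [ 0.7, -0.3,  0.5],
              [-0.6,  0.2,  0.9]])

scores = x @ W                                   # multiplication and summation
probs = np.exp(scores) / np.exp(scores).sum()    # softmax: squash scores into probabilities
print(probs, probs.sum())                        # three probabilities that add up to 1
```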

The output value? In that sense, it's the text generated by the language model, right? Yes. So what is the input value? Is it my prompt? Yes, but not quite. So what lies behind the prompt?
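What lies behind the prompt is that the text is first converted into numbers (token IDs) before the model ever sees it. The tiny vocabulary and word-level splitting below are a deliberate simplification; real GPT tokenizers work on subword pieces and use vocabularies of tens of thousands of entries.

```python
# A deliberately tiny, made-up vocabulary mapping words to token IDs.
vocab = {"how": 0, "does": 1, "gpt": 2, "work": 3, "?": 4, "<unk>": 5}

def encode(text):
    """Word-level toy tokenizer; real GPT models use subword (BPE) tokenization."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().replace("?", " ?").split()]

print(encode("How does GPT work?"))  # [0, 1, 2, 3, 4] -- the numbers the model actually receives
```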


Original article: blog.csdn.net/iCloudEnd/article/details/132661833