ChatGPT and AI Principles for Beginners

Foreword

With the rise of generative AI such as ChatGPT, many developers have become interested in AI. I am an application-layer development engineer, and I suspect many developers like me do not know much about AI, so I have written this beginner-oriented introduction from my own understanding, in the hope that it helps everyone.

Here is GPT's own evaluation of this article, so you can read on with confidence:

A very good write-up that explains the basic concepts of artificial intelligence and the principles behind ChatGPT very thoroughly. Along the way, you introduce the concepts of large language models and neural networks and explain how they are applied in ChatGPT. You also mention other important AI fields, such as natural language processing, computer vision, reinforcement learning, and autonomous driving, so that readers get a more comprehensive picture of the field of artificial intelligence as a whole.

Basic Concepts

First, let's introduce the basic concepts commonly used in the artificial intelligence industry:

  • Neural Networks: a machine learning approach that mimics the human nervous system, used for tasks such as recognizing images, speech, and natural language.

  • Natural Language Processing (NLP): technology for processing human language with computers, including speech recognition, text processing, machine translation, and more.

  • Machine Learning: an artificial intelligence technique that lets computers learn from datasets in order to make predictions or decisions on new data.

  • Deep Learning: a branch of machine learning that uses multi-layer neural networks for learning and inference, applied in image recognition, speech recognition, natural language processing, and other fields.

  • Reinforcement Learning: a machine learning technique in which a computer learns behavior and decision making by interacting with its environment, as in the game of Go or Atari games.

  • Large Language Model (LLM): a large model is a machine learning model with a huge number of parameters and a complex structure, which usually requires massive computing resources and data to train and optimize. Such models are used for tasks like natural language processing, computer vision, and speech recognition. An example is GPT-3, the model family behind ChatGPT, which has about 175 billion parameters.

  • Computer Vision: artificial intelligence technology that enables computers to understand and interpret image and video content, such as face recognition, object tracking, and scene segmentation.

  • Data Mining: techniques for automatically discovering patterns and knowledge in large datasets, used in fields such as business, medicine, and science.

  • Human-Computer Interaction (HCI): the study of interaction between humans and computers, aimed at designing smarter and more human-friendly user interfaces.

  • Autonomous Driving: self-driving vehicles based on artificial intelligence and sensors, capable of driving and navigating without human intervention.

  • Speech Recognition: a machine learning technique that enables computers to recognize and interpret human speech, for voice interaction and control.

Starting the Analysis from ChatGPT

We will start from ChatGPT at the application layer and analyze it in reverse, which may make it easier to understand.

The Principle of ChatGPT

All it does, essentially, is repeatedly ask "given the text so far, what should the next word be?" and add one word at a time. At each step it obtains a list of candidate words with probabilities and then assembles them with varying degrees of randomness. In more technical terms, it uses a deep learning model, such as a long short-term memory network (LSTM) or a Transformer, to model the context and predict the probability distribution of the next word or word sequence.
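To make this word-by-word process concrete, here is a minimal sketch in JavaScript (not ChatGPT's actual code); nextWordDistribution is a hypothetical stand-in for the model, and the sketch only shows the loop of "predict a distribution, sample one word, append, repeat":

// A minimal sketch (not ChatGPT's real code) of the "add one word at a time" loop.
// nextWordDistribution is a hypothetical stand-in for the model: given the text so far,
// it returns a list like [{ word, probability }, ...] whose probabilities sum to 1.
function sampleWord(distribution) {
  // Pick a word at random, weighted by its probability.
  let r = Math.random();
  for (const { word, probability } of distribution) {
    r -= probability;
    if (r <= 0) return word;
  }
  return distribution[distribution.length - 1].word; // fallback for rounding error
}

function generate(prompt, nextWordDistribution, maxWords) {
  let text = prompt;
  for (let i = 0; i < maxWords; i++) {
    const distribution = nextWordDistribution(text); // "given the text so far, what comes next?"
    text += ' ' + sampleWord(distribution);          // append the sampled word and repeat
  }
  return text;
}

// Usage with a toy stand-in model
const toy = () => [{ word: 'meow', probability: 0.7 }, { word: 'woof', probability: 0.3 }];
console.log(generate('the cat says', toy, 3)); // e.g. "the cat says meow meow woof"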

Where Do the Probabilities Come From?

Imagine a scenario: starting from "the cat is" ("猫是"), we want to keep splicing words onto it until we get a sentence. Here is a JavaScript example using an n-gram model (note that ChatGPT does not use this algorithm):

// Parameters of the n-gram model
const n = 2; // the "n" in n-gram
// Corpus: "猫是小动物之一。狗也是小动物之一。喵喵是猫发出的声音。汪汪是狗发出的声音。"
// ("Cats are small animals. Dogs are also small animals. Meow is the sound a cat makes. Woof is the sound a dog makes.")
const data = ['猫', '是', '小', '动', '物', '之', '一', '。', '狗', '也', '是', '小', '动', '物', '之', '一', '。', '喵', '喵', '是', '猫', '发', '出', '的', '声', '音', '。', '汪', '汪', '是', '狗', '发', '出', '的', '声', '音', '。'];

// Generate the next word for a given prefix
function generateNextWord(prefix, model) {
  const candidates = model[prefix];
  if (!candidates) {
    return null;
  }
  const total = candidates.reduce((acc, cur) => acc + cur.count, 0);
  let r = Math.random() * total;
  for (let i = 0; i < candidates.length; i++) {
    r -= candidates[i].count;
    if (r <= 0) {
      return candidates[i].word;
    }
  }
  return null;
}

// Generate a sentence starting from a prefix
function generateSentence(prefix, model, maxLength) {
  let sentence = prefix;
  while (true) {
    const next = generateNextWord(prefix, model);
    if (!next || sentence.length >= maxLength) {
      break;
    }
    sentence += next;
    prefix = sentence.slice(-n);
  }
  return sentence;
}

// Train the n-gram model: count which word follows each n-character prefix
const model = {};
for (let i = 0; i < data.length - n; i++) {
  const prefix = data.slice(i, i + n).join('');
  const suffix = data[i + n];
  if (!model[prefix]) {
    model[prefix] = [];
  }
  const candidates = model[prefix];
  const existing = candidates.find(candidate => candidate.word === suffix);
  if (existing) {
    existing.count++;
  } else {
    candidates.push({ word: suffix, count: 1 });
  }
}

// Usage example
const prefix = '猫是';
const maxLength = 10;
const sentence = generateSentence(prefix, model, maxLength);
console.log(sentence); // e.g. "猫是小动物之一。狗也" (the continuation after "。" is picked at random)

In this example the tiny corpus leaves almost no room for variation, so the result is essentially the same every time; only with a long enough n-gram table does the output gain a meaningful degree of randomness. This is where a large language model comes in: it is trained on a corpus large enough to provide rich probability estimates.

Large Language Models

A large language model such as GPT-3 can be considered a machine learning model built on deep learning neural networks, so it can be regarded as a kind of neural network model. Specifically, large language models are trained with unsupervised learning methods on very large text datasets, which is what allows them to perform well on natural language processing tasks.
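To give a feel for what "learning from a large text dataset" means, here is a small, hypothetical sketch of how next-word training pairs can be sliced out of raw text; real LLM training works on tokens and optimizes neural network weights over billions of such pairs:

// A hypothetical sketch: turning raw text into (context -> next word) training pairs.
// Real LLM training tokenizes the text and learns model weights from such pairs at huge scale.
function buildTrainingPairs(text, contextSize) {
  const words = text.split(' ');
  const pairs = [];
  for (let i = contextSize; i < words.length; i++) {
    pairs.push({
      context: words.slice(i - contextSize, i).join(' '), // the text so far
      next: words[i]                                      // the word the model should predict
    });
  }
  return pairs;
}

// Usage example
console.log(buildTrainingPairs('the cat is a small animal', 3));
// [ { context: 'the cat is', next: 'a' },
//   { context: 'cat is a', next: 'small' },
//   { context: 'is a small', next: 'animal' } ]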

Machine learning models generally fall into the following categories:

1. Linear regression model: used to predict the value of a continuous variable, such as predicting house prices (see the sketch after this list).

2. Logistic regression model: used for classification problems, such as spam filtering.

3. Decision tree model: used for classification and regression problems; it can automatically discover decision rules in the data.

4. Random forest model: an ensemble learning model built from multiple decision trees, for classification and regression problems.

5. Support vector machine model: used for classification and regression problems; it looks for the optimal separating hyperplane in a high-dimensional space.

6. Neural network model: used for complex tasks such as image recognition, natural language processing, and speech recognition.

7. Clustering model: used to divide data into different groups, such as K-means clustering.

8. Reinforcement learning model: used for intelligent decision-making and control problems, such as controlling autonomous vehicles.
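As a small illustration of the simplest category above, here is a sketch of linear regression fitted with ordinary least squares; the house-price data points are made up for demonstration:

// A minimal sketch of category 1, linear regression, fit with ordinary least squares.
// The data points (house size in m^2 -> price) are invented for illustration.
function fitLine(xs, ys) {
  const count = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / count;
  const meanY = ys.reduce((a, b) => a + b, 0) / count;
  let num = 0, den = 0;
  for (let i = 0; i < count; i++) {
    num += (xs[i] - meanX) * (ys[i] - meanY);
    den += (xs[i] - meanX) ** 2;
  }
  const slope = num / den;
  const intercept = meanY - slope * meanX;
  return { slope, intercept };
}

// Usage example: predict the price of a 100 m^2 house
const sizes = [50, 70, 90, 110];      // m^2
const prices = [150, 210, 270, 330];  // made-up prices
const { slope, intercept } = fitLine(sizes, prices);
console.log(slope * 100 + intercept); // ≈ 300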

So where do large models come from?

Large models are usually trained with neural network algorithms, because neural networks can cope well with a huge number of parameters and complex structures. Neural networks also perform particularly well on tasks such as natural language processing, computer vision, and speech recognition, so large models in these fields are usually based on neural networks. Besides the plain neural network model, other techniques are involved as well, such as autoregressive models and autoencoder models, which will not be covered in detail here.

What is a neural network again?

As mentioned at the beginning: a neural network is a machine learning algorithm that mimics the human nervous system.

The structure of a neural network can be compared to the data structure of a graph. In a neural network, each node (neuron) can be regarded as a node in the graph, each connection (weight) can be regarded as an edge in the graph, and the entire network can be regarded as a directed graph.

Analogous to the graph structure, the optimization of neural networks is the process of adjusting connection weights so that the entire network can better fit the training data, thereby improving the performance of the model. At the same time, the prediction process of the neural network can be regarded as a process of information transfer in the graph, and the transfer process from the input layer to the output layer is equivalent to a traversal in the graph.

Here is an example with JavaScript:

// Define the network structure
const inputSize = 3;
const hiddenSize = 4;
const outputSize = 2;

// Define the network parameters (hand-picked for illustration, not trained)
const weights1 = [
  [1, 2, 3, 4],
  [5, 6, 7, 8],
  [9, 10, 11, 12]
];
const bias1 = [1, 2, 3, 4];
const weights2 = [
  [1, 2],
  [3, 4],
  [5, 6],
  [7, 8]
];
const bias2 = [1, 2];

// Activation function
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Feed-forward pass through the network
function feedForward(input) {
  // Compute the hidden-layer output
  const hidden = [];
  for (let i = 0; i < hiddenSize; i++) {
    let sum = 0;
    for (let j = 0; j < inputSize; j++) {
      sum += input[j] * weights1[j][i];
    }
    hidden.push(sigmoid(sum + bias1[i]));
  }

  // Compute the output-layer output
  const output = [];
  for (let i = 0; i < outputSize; i++) {
    let sum = 0;
    for (let j = 0; j < hiddenSize; j++) {
      sum += hidden[j] * weights2[j][i];
    }
    output.push(sigmoid(sum + bias2[i]));
  }

  return output;
}

// Usage example
const input = [1, 2, 3];
const output = feedForward(input);
console.log(output); // both values come out very close to 1 because the hand-picked weights are large

The output of the neural network can be interpreted as probability estimates for different classes. In this example, the output is a vector with two elements, representing estimates of the probability that the input belongs to each of two classes. This feed-forward neural network can therefore be used for binary classification tasks.
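As a side note, classifiers often turn raw output scores into a single probability distribution with the softmax function rather than with independent sigmoids; this is an illustrative addition, not part of the network above:

// Softmax: turn raw scores (logits) into a probability distribution that sums to 1.
function softmax(scores) {
  const maxScore = Math.max(...scores);                 // subtract the max for numerical stability
  const exps = scores.map(s => Math.exp(s - maxScore));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / total);
}

// Usage example
console.log(softmax([2.0, 1.0])); // ≈ [0.731, 0.269] — class probabilities that sum to 1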

Similarly, the output of a language generation model like ChatGPT can be interpreted as probability estimates over different words or word sequences. When we input a piece of text into ChatGPT, the model predicts the probability distribution of the next word or word sequence based on the existing context, and then selects a word, typically one of the most probable, as the output. So the output of ChatGPT can also be interpreted as probability estimates over different words or word sequences.
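For example, greedy selection, always taking the single most probable word, can be sketched like this; in practice ChatGPT usually samples from the distribution instead of always taking the maximum, which is why its answers vary:

// Greedy selection: always pick the single most probable next word.
function pickMostProbable(distribution) {
  return distribution.reduce((best, cur) =>
    cur.probability > best.probability ? cur : best
  ).word;
}

// Usage example
console.log(pickMostProbable([
  { word: 'cat', probability: 0.6 },
  { word: 'dog', probability: 0.3 },
  { word: 'fish', probability: 0.1 }
])); // "cat"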

Once you have followed this process, you can see how ChatGPT produces the text of its answers.

How does ChatGPT know what you are asking?

As described above, the answer is generated word by word, so how does ChatGPT know what you are asking in the first place? ChatGPT uses natural language processing techniques and deep learning algorithms to perform semantic analysis and intent recognition on the user's input, so that it can better understand the user's intent and needs. It then takes factors such as the conversation history and contextual information into account, randomly selects a word as the next word according to the predicted probability distribution, and appends that word to the answer being generated.

Because the words are assembled probabilistically, the answers generated by ChatGPT may contain grammatical or semantic errors. To improve answer quality, techniques such as beam search and penalty terms on the language model (for example a length penalty or a repetition penalty) can be used. These techniques effectively reduce errors in the generated answers and improve their quality.
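As a rough illustration of beam search (a simplified sketch, not how ChatGPT actually implements it): instead of committing to one word at a time, we keep the k highest-scoring partial sequences at every step. nextWordDistribution is again a hypothetical stand-in for the model:

// A simplified beam-search sketch. nextWordDistribution is a hypothetical stand-in
// for the model: given the text so far, it returns [{ word, probability }, ...].
function beamSearch(prompt, nextWordDistribution, beamWidth, steps) {
  // Each beam is a candidate sequence with its cumulative log-probability.
  let beams = [{ text: prompt, logProb: 0 }];
  for (let step = 0; step < steps; step++) {
    const expanded = [];
    for (const beam of beams) {
      for (const { word, probability } of nextWordDistribution(beam.text)) {
        expanded.push({
          text: beam.text + ' ' + word,
          logProb: beam.logProb + Math.log(probability)
        });
      }
    }
    // Keep only the beamWidth best partial sequences.
    expanded.sort((a, b) => b.logProb - a.logProb);
    beams = expanded.slice(0, beamWidth);
  }
  return beams[0].text; // the highest-scoring sequence found
}

// Usage with a toy model that always proposes the same two continuations
const toyModel = () => [{ word: 'cat', probability: 0.6 }, { word: 'dog', probability: 0.4 }];
console.log(beamSearch('the', toyModel, 2, 3)); // "the cat cat cat"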

Summary

The above is the basic working principle of generative AI. It processes huge amounts of text data with deep learning algorithms to learn the grammatical and semantic rules of the language, and can then automatically generate text that conforms to those rules. When generating text, a generative AI model predicts a distribution based on contextual information and then uses random sampling or greedy search to produce the text sequence.
