An overview of AI large model knowledge points

What is an AI large model

AI large models are deep learning models with a huge number of parameters, usually billions or even trillions. These models improve their predictive capabilities by learning from large amounts of data, and have driven important breakthroughs in natural language processing, computer vision, autonomous driving, and other fields.
AI models can also be defined by parameter scale. According to OpenAI's classification method, AI models can be divided into the following categories:
Small models: ≤ 1 million parameters
Medium models: 1 million – 100 million parameters
Large models: 100 million – 1 billion parameters
Extremely large models: ≥ 1 billion parameters
Among them, large models and extremely large models can be regarded as AI large models. In general, a "large model" is a super-large-scale model with an enormous parameter count, which requires large amounts of computing resources, stronger computing power, and better algorithm optimization methods for training and optimization. A small sketch of counting a model's parameters and bucketing it by the ranges above follows.
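As a rough, hedged illustration of the classification above, the sketch below counts the trainable parameters of a PyTorch model and assigns it to one of the listed buckets; the helper functions and thresholds simply mirror the list above and are not any official API.

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Total number of trainable parameters in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def size_category(num_params: int) -> str:
    """Bucket a parameter count using the scale ranges listed above."""
    if num_params <= 1_000_000:
        return "small model"
    if num_params <= 100_000_000:
        return "medium model"
    if num_params <= 1_000_000_000:
        return "large model"
    return "extremely large model"

# Example with a toy multi-layer perceptron (far below "large model" scale).
mlp = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 10))
n = count_parameters(mlp)
print(n, size_category(n))
```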

The development history of AI large models

On November 30, 2022, San Francisco-based OpenAI launched ChatGPT, built on GPT-3.5.
In February 2023, Google launched Bard, a conversational AI service similar to ChatGPT, based on its dialogue language model LaMDA. At launch it had many limitations; for example, it only supported American English.
On March 14, 2023, OpenAI released the multimodal model GPT-4, with image input planned to roll out later.
Also in February 2023, Baidu confirmed that its ChatGPT-like chatbot project would be named "Wen Xin Yi Yan", with the English name ERNIE Bot.
In February 2023, the team of Professor Qiu Xipeng at Fudan University's Natural Language Processing Laboratory released MOSS, a large-scale conversational language model.
On March 14, 2023, Zhipu AI, a company spun out of Tsinghua University research achievements, began an invitation-only internal test of ChatGLM, which is built on the 100-billion-parameter-scale GLM-130B base model and can run inference on consumer-grade graphics cards.
On April 7, 2023, the language model "Tongyi Qianwen" developed by Alibaba Cloud began inviting users to test it. At this stage the test mainly targeted enterprise users; users who obtained an invitation code could take part through the official website.
On May 6, 2023, iFLYTEK released the cognitive large model "Spark" (Xinghuo). Liu Qingfeng, chairman of iFLYTEK, said that the Spark cognitive large model had already surpassed ChatGPT in three capabilities: text generation, knowledge question answering, and mathematical ability, and would catch up with ChatGPT overall by the end of October.
In March 2023, Anthropic, a startup co-founded by former OpenAI employees, launched the large language model Claude. It can be instructed to perform a range of tasks, including searching documents, summarizing, writing, coding, and answering questions on specific topics.
In March 2023, Huawei announced that it would soon launch its Pangu large model.

The underlying principles of AI large models

AI large models (deep learning models) are based on neural networks trained on large amounts of data. By simulating the neuron structure of the human brain, these models perform multi-layer abstraction and processing of the input data in order to learn and make predictions on complex tasks.
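To make the idea of layers of simulated neurons concrete, here is a minimal NumPy sketch of a single artificial neuron and of one fully connected layer; the inputs, weights, and sizes are arbitrary toy values used only for illustration.

```python
import numpy as np

def relu(x):
    """ReLU activation: keep positive values, zero out the rest."""
    return np.maximum(0, x)

# A single artificial neuron: weighted sum of the inputs plus a bias,
# passed through a nonlinear activation function.
x = np.array([0.5, -1.2, 3.0])     # toy input vector
w = np.array([0.8, 0.1, -0.4])     # toy weights (learned during training)
b = 0.2                            # toy bias
neuron_output = relu(np.dot(w, x) + b)

# A fully connected layer is just several neurons sharing the same input;
# deep models stack many such layers to abstract the data step by step.
W = np.random.randn(4, 3)          # 4 neurons, each with 3 input weights
bias = np.zeros(4)
layer_output = relu(W @ x + bias)
print(neuron_output, layer_output)
```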
At a high level, the workflow of an AI large model covers data preprocessing, model building, model training, and model evaluation. In more detail, it consists of the following steps (a minimal runnable sketch of steps 2 through 7 follows the list):

1. Data preprocessing: First, the original data needs to be cleaned, sorted and labeled in order to provide suitable input for the model. This stage may include operations such as removing noise, filling missing values, normalizing, etc.
2. Building a neural network: Next, design and build a neural network according to the task requirements. A neural network usually consists of multiple layers, each containing several neurons. Neurons are connected by weights to represent the relationship between input data and output data.
3. Forward propagation: Input the preprocessed data into the neural network, and calculate the output of each layer of neurons according to the weight. This process is called forward propagation.
4. Activation function: After each layer of the neural network, an activation function (such as ReLU, Sigmoid, or Tanh, etc.) is usually used to perform nonlinear transformation on the output to increase the expressiveness of the model.
5. Loss function: In order to measure the gap between the model prediction results and the real target, a loss function needs to be defined. The loss function calculates the prediction error and uses it as the optimization objective. Common loss functions include mean square error (MSE), cross-entropy loss (Cross-Entropy Loss), etc.
6. Optimization algorithm: According to the loss function, select an appropriate optimization algorithm (such as gradient descent, stochastic gradient descent, Adam, etc.) to update the weights and biases in the neural network to reduce the value of the loss function. This process is called backpropagation.
7. Training and verification: Repeat the above steps until the model achieves satisfactory performance on the training set. In order to prevent overfitting, it is also necessary to evaluate the generalization ability of the model on the validation set. If you find that the model performs poorly on the validation set, you can adjust the network structure, hyperparameters or training strategies, etc.
8. Deployment and use: When the model performs well on both the training set and the validation set, it can be deployed and used.
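Below is a minimal, self-contained PyTorch sketch of steps 2 through 7: building a small network, forward propagation with ReLU activations, a cross-entropy loss, backpropagation with the Adam optimizer, and a simple train/validation loop. The layer sizes, synthetic data, and hyperparameters are arbitrary placeholders for illustration, not a recipe for training a real large model.

```python
import torch
import torch.nn as nn

# Step 2: build a small feed-forward network (the ReLU layers are step 4's activation).
model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),              # two output classes
)

# Step 5: loss function; step 6: optimization algorithm.
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in for step 1: synthetic, already-normalized data split into train/validation.
X_train, y_train = torch.randn(256, 20), torch.randint(0, 2, (256,))
X_val, y_val = torch.randn(64, 20), torch.randint(0, 2, (64,))

# Step 7: repeat forward propagation, loss computation, and backpropagation.
for epoch in range(10):
    model.train()
    optimizer.zero_grad()
    logits = model(X_train)            # step 3: forward propagation
    loss = loss_fn(logits, y_train)    # step 5: measure the prediction error
    loss.backward()                    # step 6: backpropagation
    optimizer.step()                   # step 6: update weights and biases

    # Validation pass to watch for overfitting.
    model.eval()
    with torch.no_grad():
        val_acc = (model(X_val).argmax(dim=1) == y_val).float().mean().item()
    print(f"epoch {epoch}: train loss {loss.item():.4f}, val acc {val_acc:.2f}")
```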

Problems Solved by AI Large Models

1. Natural language processing: AI large models such as GPT-3 and BERT have greatly improved the performance of natural language processing tasks such as translation, question answering, word segmentation, and text generation. By learning from massive corpora and their context, AI large models allow computers to understand and process natural language more accurately. (A brief usage sketch follows this list.)
2. Computer vision: AI large models such as ResNet and EfficientNet have advanced computer vision tasks including object detection, image classification, and semantic segmentation. By learning from large amounts of image data and building deeper, more complex neural networks, AI large models enable computers to identify and analyze images more accurately.
3. Face recognition: AI large models such as FaceNet and DeepFace have improved the accuracy and robustness of face recognition, greatly expanding its applications in security, finance, medical and other fields.
4. Speech recognition: AI large models such as Wav2Vec and Transformer-based models have enabled speech recognition to reach higher accuracy, greatly expanding its use in interactive applications and the smart home field.
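As a hedged illustration of how pretrained large models are commonly consumed in practice, the sketch below uses the Hugging Face transformers pipeline API for two of the NLP tasks mentioned above; it assumes the transformers library is installed and that the default or named checkpoints (such as gpt2) can be downloaded.

```python
from transformers import pipeline

# Question answering with a pretrained model (a default checkpoint is downloaded).
qa = pipeline("question-answering")
print(qa(question="Who released GPT-4?",
         context="GPT-4 is a multimodal model released by OpenAI in 2023."))

# Text generation with a small pretrained language model.
generator = pipeline("text-generation", model="gpt2")
print(generator("AI large models are", max_length=30, num_return_sequences=1))
```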

Advantages and disadvantages of large models

Advantages:
1. More accurate: AI large models have more parameters and can handle more complex information and deeper context, improving precision and accuracy.
2. Smarter: AI large models can simulate human thinking and learning patterns, and improve the intelligence of artificial intelligence through a large amount of training data.
3. More versatile: AI large models can adapt to different tasks and environments, and can handle various natural language, visual, and audio data.
4. More efficient: AI large models greatly improve computing efficiency through parallel computing and distributed training, and can process large amounts of data in a short time (a minimal data-parallel sketch follows this list).
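As a minimal sketch of the parallel-computing point in item 4, the snippet below wraps a model in torch.nn.DataParallel so that each batch is split across the available GPUs; real large-model training usually relies on more elaborate setups such as DistributedDataParallel or model/tensor parallelism, which are not shown here.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))

if torch.cuda.device_count() > 1:
    # Replicates the model on each GPU and splits every input batch among them.
    model = nn.DataParallel(model)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

batch = torch.randn(512, 1024, device=device)
output = model(batch)   # the forward pass runs on all visible GPUs in parallel
```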
Disadvantages:
1. Computing resource requirements: AI large models need substantial computing resources, such as multiple GPUs and distributed computing, and the high cost hinders their popularization and application.
2. Dataset problems: AI large models require large amounts of labeled data for training and optimization, but data in real-world scenarios are often incomplete, inconsistent, and unlabeled.
3. Interpretability issues: it is usually difficult to explain the predictions of AI large models or the basis for their judgments, which introduces risks of misjudgment when they are applied in practice.
4. Environment dependence: AI large models are highly dependent on the language and environment they were trained for, and need to be customized for specific scenarios.
5. Hallucination: OpenAI admits that ChatGPT "occasionally writes answers that sound plausible but are incorrect or absurd". This behavior is common in large language models and is known as AI hallucination. In addition, because its reward model is designed around human supervision, it can be over-optimized, which hurts performance (an instance of Goodhart's law).

Influence

AI large models have extremely high performance and accuracy, and will bring positive impacts in many aspects, such as natural language processing, computer vision, medical diagnosis, traffic control and other fields. But at the same time, AI large models may also bring the following social impacts:
1. Economic impact: AI large models require huge investment, high computing resources, and excellent talent teams. This could further widen the digital divide, lead to monopolies by giant tech companies, and adversely affect small businesses and developers. At the same time, AI large models can improve production efficiency and reduce labor costs through automation and intelligence, and can help people better understand complex problems and discover new solutions and business models.
2. Employment impact: AI large models can enable human-machine cooperation or automation in some fields, reducing the demand for human labor. This may affect existing industries and jobs, requiring reskilling or a shift in career direction. AI large models may change the social structure, causing some occupations to disappear and new ones to emerge.
3. Privacy protection: The data used to train large models often contain a large amount of personal privacy data, such as medical data, bank accounts, etc., and it is particularly important to protect the security and privacy of these data. Therefore, appropriate data privacy and security protection mechanisms are required.
4. Bias problem: The decision-making process of large AI models is often very complicated, which makes it difficult to explain and prone to prediction bias. This can lead to bias and discrimination, so appropriate norms and standards need to be developed to regulate the development and application of AI.
5. Ethical issues: AI large models may have an impact on human values and morals, raising ethical concerns. For example, when faced with an ethical dilemma in autonomous driving (such as whether to give way to a pedestrian), a large AI model might give an answer that proves controversial.

Personal opinion

The era in which AI large models bloom and contend like a hundred schools of thought is already a reality. Whether you want to admit it or not, the AI era has arrived. Instead of agonizing over the risk of AI taking away jobs, it is better to embrace this new technology quickly, bring AI into your work, and use it to improve your productivity and creativity. If you can't beat it, join it; there is no shame in that. Moving with the times leaves a way forward, while stubbornness and complacency will only be crushed by the torrent of the times.

Reprinted from: blog.csdn.net/yang1fei2/article/details/131177338