Two scenarios of deep learning engineering: building and applying search, recommendation, and advertising models

Source of the original article: https://developer.aliyun.com/article/1084327?
The original author writes very well; you can tell at a glance that this is a senior algorithm engineer who has worked close to the business for many years.

Introduction

1. The three elements of the AI boom

2. Deep Learning

 

1. The three elements of the AI boom

This course introduces, from the perspective of an AI architect, how to build AI applications and what skills an AI architect needs, covering the two dimensions of deep learning and machine learning. AI engineering is still in a dynamic, changing state, and many people in the industry are exploring it. The AI boom is determined by the following three elements:

[Figure: the three elements driving the AI boom]

(1) Algorithm capability built on data

(2) Big data capability

(3) Engineering capability

Massive data and large-scale computing power have driven innovation in all kinds of algorithms. From the perspective of an AI engineering architect, the question is how to iterate on algorithms quickly: the algorithm engineer should be able to focus on the algorithm itself, while the engineering team, the AI architect, or the engineering platform removes the obstacles around AI and delivers a highly available, high-performance, low-cost AI platform. To achieve this, the architect first needs the following capabilities:

(1) A foundation in algorithms, because the purpose of engineering is to accelerate AI algorithms. Unlike ordinary data processing, machine learning and deep learning are often not deterministic computations but probabilistic ones, because the ultimate goal is to converge to a model with good accuracy, and no one can define in advance which model is best. With an understanding of the algorithm, you can jointly optimize the model and the system instead of being limited to accelerating the computation. If a change at the model level converges to the same result in probability, or a mathematical transformation keeps the probabilities equivalent, the resulting optimization will be far greater than anything gained by digging into the low-level computation alone. It is therefore necessary to understand the algorithm and its principles in order to choose optimization points more skillfully;

(2) Big data capability, because the AI boom rests on the accumulation of big data. Before training a model, many algorithm engineers must first clean and organize the data, and to support the algorithm a platform has to be built, for which big data capability is indispensable: how to do large-scale computation, how to clean data, how to bring many data sources together, and how to do preprocessing are all very important skills;

(3) Engineering capability. As models become larger and deeper, there are more and more requirements for distributed execution on heterogeneous machines. The execution of a model, whether the forward pass alone or the forward and backward passes together, is described as a DAG graph, and mapping that graph well onto the underlying resources, hardware, storage, IO, and compute, requires strong engineering capability.
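To make the DAG view concrete, here is a minimal sketch in plain Python (the node names and toy ops are hypothetical and not tied to any real framework): the computation is described as a graph of dependencies, topologically ordered, and then executed node by node. A real engine would additionally decide which device, memory, and communication channel each node maps to.

```python
from graphlib import TopologicalSorter

# A toy forward pass described as a DAG: each node lists its predecessors.
graph = {
    "embed":  [],                   # look up sparse features
    "dense":  [],                   # read dense features
    "concat": ["embed", "dense"],
    "mlp":    ["concat"],
    "loss":   ["mlp"],
}

ops = {
    "embed":  lambda inputs: [0.1, 0.3],
    "dense":  lambda inputs: [1.0, 2.0],
    "concat": lambda inputs: inputs[0] + inputs[1],   # list concatenation
    "mlp":    lambda inputs: sum(inputs[0]),          # stand-in for a dense layer
    "loss":   lambda inputs: (inputs[0] - 1.0) ** 2,  # stand-in for a loss
}

results = {}
for node in TopologicalSorter(graph).static_order():  # predecessors run first
    deps = [results[d] for d in graph[node]]
    results[node] = ops[node](deps)

print(results["loss"])
```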

 

2. Deep learning

[Figure: the two deep learning scenarios]

Deep learning workloads fall into two scenarios. The first is search, recommendation, and advertising; the other is generally called perception, such as image, speech, and natural language processing. The two categories place some common requirements on AI architects, but they also have different characteristics.

Search, recommendation, and advertising is first of all a big data problem, usually described as a big data model, because the data is mostly semi-structured or structured: clicks, logs, and many sparse features. These cases often have extremely large sparse feature spaces, and not every feature is present for every sample. For example, if you are interested in a certain kind of movie, you will have features on those movies, while someone with no interest in that category will have no such features at all. Sparsity is therefore a common characteristic of search, recommendation, and advertising.
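To make the sparsity concrete, here is a minimal sketch in plain Python (the field names, sample values, and bucket count are hypothetical): each high-cardinality categorical field is hashed into a huge index space, and any single sample activates only a handful of slots out of millions.

```python
# Hypothetical click-log sample: only a few categorical fields are present,
# but each field can take millions of values, so the one-hot space is huge.
sample = {"user_id": "u_90211", "item_id": "i_553", "category": "movies"}

HASH_BUCKETS = 2 ** 22  # ~4M slots; the feature vector is almost entirely zeros

def active_indices(sample):
    """Map each (field, value) pair to one active index in a sparse vector."""
    # Python's built-in hash is process-salted; a real system uses a stable hash.
    return sorted(hash(f"{field}={value}") % HASH_BUCKETS
                  for field, value in sample.items())

print(active_indices(sample))  # 3 active indices out of ~4 million slots
```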

There is also a complex data-processing pipeline, because search, recommendation, and advertising estimate clicks and make recommendations, which relies on event tracking ("burying points") in the information-flow service, complex processing of the data, and joining the related data together. The pipeline is further complicated because search, recommendation, and advertising are highly dynamic and demand strong real-time performance. Perception does not change quickly: when a person sees a picture of a cat, it is a cat; a picture of a dog is a dog. But people's interests do change. Search, recommendation, and advertising therefore commonly need real-time, personalized ("a thousand faces for a thousand people"), incremental learning. Precisely because of real-time, personalized, incremental learning, the engineering system becomes more complicated: the model is always in a learning state, and newly updated models must be continuously pushed online. In this scenario, the ground truth is also hard to determine.
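The real-time, incremental nature described above can be pictured as a streaming update loop. The sketch below (plain Python with NumPy; the feature dimensionality, learning rate, simulated click pattern, and push interval are all hypothetical, not the production logic) applies online logistic-regression updates sample by sample and periodically "pushes" the latest weights to serving.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, lr = 16, 0.05
w = np.zeros(dim)                              # model weights being learned online

def serve(weights):
    """Stand-in for pushing a fresh model to the online serving system."""
    print("pushed model, ||w|| =", round(float(np.linalg.norm(weights)), 3))

# Simulated click stream: (features, clicked) pairs arriving over time.
for step in range(1, 1001):
    x = rng.normal(size=dim)
    clicked = float(x[:4].sum() > 0)           # hidden "interest" pattern
    p = 1.0 / (1.0 + np.exp(-w @ x))           # predicted click probability
    w += lr * (clicked - p) * x                # incremental SGD update
    if step % 250 == 0:                        # periodic push, not per-sample
        serve(w)
```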

With speech recognition it is easy to know how far a result deviates from the correct answer; in recommendation, an offline metric only represents itself, and the real effect of a model on the business has to be verified on the online system. Verifying on the online system involves many complicated engineering processes, including how to do A/B testing and gray-scale release and how to collect results to evaluate a model update, which places higher demands on the engineering system. At the same time, training with ultra-large-scale sparse features means, first, that the data volume is huge and a good distributed training framework is required. Because the model is large and contains very large sparse features, a parameter-server style training framework is used: the model itself cannot fit on one node, especially a huge embedding, so the embedding layer is stored and managed by a distributed ParameterServer, while worker nodes pull the parameters they need for training. The architecture is therefore generally a ParameterServer architecture, and because the data is very sparse, a distributed asynchronous training method is used instead of a synchronous one to improve the training speedup ratio.
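A minimal sketch of the ParameterServer idea described above (plain Python with NumPy; the shard count, key space, and update rule are hypothetical, and a real system such as PAI-TensorFlow adds networking, asynchrony, and fault tolerance): the embedding table is partitioned across several server shards by key, and each worker pulls only the rows needed for its current batch and pushes gradients back to the owning shard.

```python
import numpy as np

EMB_DIM, NUM_SHARDS = 8, 4
rng = np.random.default_rng(0)

# Each "parameter server" shard owns a slice of the huge embedding table.
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_of(key):
    return key % NUM_SHARDS

def pull(keys):
    """Worker fetches only the embedding rows needed for this batch."""
    out = {}
    for k in keys:
        table = shards[shard_of(k)]
        if k not in table:                        # lazily create new sparse features
            table[k] = rng.normal(scale=0.01, size=EMB_DIM)
        out[k] = table[k]
    return out

def push(grads, lr=0.1):
    """Worker sends gradients back; each shard updates only its own rows."""
    for k, g in grads.items():
        shards[shard_of(k)][k] -= lr * g

batch_keys = [10, 37, 4_000_000_123]              # sparse IDs seen in one batch
params = pull(batch_keys)
push({k: np.ones(EMB_DIM) for k in batch_keys})   # dummy gradients for illustration
```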

By contrast, perception workloads such as speech and natural language processing usually involve dense rather than sparse models, so they demand high computing power. These models contain many very dense operators, such as convolution, and place very high requirements on training speedup ratio and inference optimization, because the final model is very deep and compute-intensive. The questions become how to improve inference performance for deployment, and how to improve training efficiency when training on large GPU clusters. For this category, data labeling is very important; the current trend is to use semi-supervised or self-learning methods to reduce labeling, but labeling quality still matters.
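To see why such dense models are compute-bound, a quick back-of-the-envelope calculation helps; the layer shape below is hypothetical but typical of a CNN, and the script simply counts multiply-accumulate operations for a single convolution layer.

```python
# MACs for one Conv2D layer: each of the H_out x W_out x C_out output elements
# performs K * K * C_in multiply-adds.
h_out, w_out = 56, 56
c_in, c_out, k = 64, 128, 3

macs = h_out * w_out * c_out * (k * k * c_in)
print(f"{macs / 1e9:.2f} GMACs for a single layer")   # ~0.23 GMACs
```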

In natural language processing, most teams now do not train a BASE MODEL from scratch themselves, although it is better if you have the ability, the computing power, or the budget to train a larger base model.

Usually, transfer learning is done in one's own vertical scenario on top of someone else's BASE MODEL. These are the issues that current AI architects need to consider. On the training side there are a large number of benchmarks and leaderboards, for example showing that one's own model achieves better accuracy on a standard dataset than other models.
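That transfer-learning pattern can be sketched as follows (NumPy only; the frozen random projection stands in for a pretrained base model, whereas a real setup would load published checkpoint weights such as a BERT or ResNet model, and the data and labels are hypothetical): the base is kept frozen and only a small task-specific head is trained on the vertical-domain data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained base model: a frozen random projection.
# In practice these weights would come from someone else's published checkpoint.
W_base = rng.normal(size=(128, 32))
def base_features(x):
    return np.tanh(x @ W_base)          # frozen, never updated

# Small task head trained on our own vertical-domain data.
w_head, lr = np.zeros(32), 0.1
X = rng.normal(size=(256, 128))
y = (X[:, 0] > 0).astype(float)         # hypothetical labels

for _ in range(200):
    feats = base_features(X)
    p = 1 / (1 + np.exp(-feats @ w_head))
    w_head -= lr * feats.T @ (p - y) / len(y)   # update only the head
```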

These competitions all focus on pushing model accuracy as high as possible, so a large number of algorithm engineers use a synchronous training framework, because accuracy is easier to guarantee; training is generally based on the AllReduce framework (a minimal sketch follows the framework list below). Both scenarios are supported by deep learning frameworks, which mainly include the following two:

(1) TensorFlow

(2) PyTorch
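As referenced above, the synchronous AllReduce pattern can be illustrated with a small simulation (plain NumPy, in-process "workers" rather than real GPUs or a communication library such as NCCL; the data and least-squares loss are hypothetical): every worker computes a gradient on its own shard of data, the gradients are averaged, and all workers apply the same averaged update, so the replicas stay identical step after step.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_WORKERS, DIM, LR = 4, 8, 0.1

w = np.zeros(DIM)                                   # replicated on every worker
data = [rng.normal(size=(64, DIM)) for _ in range(NUM_WORKERS)]
targets = [x @ np.arange(DIM) for x in data]        # hypothetical regression target

for step in range(50):
    # Each worker computes a gradient on its local data shard.
    grads = [x.T @ (x @ w - t) / len(t) for x, t in zip(data, targets)]
    # "AllReduce": average gradients so every replica sees the same update.
    avg_grad = np.mean(grads, axis=0)
    w -= LR * avg_grad                              # identical step everywhere
```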

In search, recommendation, and advertising, the platform needs to be integrated, because the data is complex and a good platform is needed to clean it. The data sometimes contains dirty records, because these platforms tend to be real-time and highly dynamic: data keeps flowing in and services keep being updated, and sometimes a service bug produces dirty data. A convenient big data platform is therefore necessary for data processing. The second trend is real-time models, because people's interests change.

The work is also highly business-specific. Every industry and every team understands its own business differently, so the business side often needs to master feature selection and model tuning, while engineers provide the tools, the platform, and the usability that enable fast model tuning.

Finally, the engineering system becomes more complicated. On this foundation, Alibaba built an online machine learning platform based on PAI, moving from personalized ("a thousand faces for a thousand people") recommendation to more real-time learning.

[Figure: the PAI-based online machine learning platform]

The platform includes big data engines: a real-time computing engine and an offline computing engine. On the real-time side there is Blink, whose usability improvements have been contributed back to the Flink community; on the offline side there is an ultra-large-scale offline computing engine. Features from these two sources are combined to generate a sample library, which feeds a large-scale sparse training platform with a deeply optimized training engine that keeps training online. During training, delta snapshots are generated continuously, followed by model validation, final deployment, online observation, feedback, and so on.
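The "delta snapshot" idea can be sketched as follows (plain Python with NumPy; the model structure, update rule, and export format are hypothetical, and the real PAI pipeline also covers validation and rollback): during online training only the parameters that actually changed since the last snapshot are exported, which keeps model pushes small even when the full sparse model is huge.

```python
import numpy as np

rng = np.random.default_rng(0)
model = {i: rng.normal(size=4) for i in range(1000)}   # full sparse model
last_pushed = {k: v.copy() for k, v in model.items()}

def train_step(model, touched_keys):
    """Stand-in for online training: only a few sparse rows change per step."""
    for k in touched_keys:
        model[k] -= 0.01 * rng.normal(size=4)

def delta_snapshot(model, last_pushed):
    """Export only the rows that changed since the previous push."""
    delta = {k: v for k, v in model.items()
             if not np.array_equal(v, last_pushed[k])}
    last_pushed.update({k: v.copy() for k, v in delta.items()})
    return delta

train_step(model, touched_keys=[3, 42, 977])
print(len(delta_snapshot(model, last_pushed)), "rows in the delta")   # 3
```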

[Figure: expanded view of the platform architecture]

Any search, recommendation, or advertising workload needs the framework above. This engineering system is replicable, but business understanding differs from industry to industry.

Alibaba's own business is e-commerce, but the goals of search, recommendation, and advertising differ by business: it may be new retail or entertainment, and the industry needs both algorithms and shared experience, yet there are gaps between industries. In live video streaming, for example, the software, the audience, and the target groups are all different, and each business has its own operating strategy, so the business side needs to master algorithm tuning while architects assist with the engineering. This is also the reason for recommending solutions: because the engineering in search, recommendation, and advertising is very complicated, the architect strings the engineering process together and hands the intermediate algorithms and tuning methods over to users, so that users can basically take responsibility for their own business.

The picture above simply expands the architecture: for example, how is user data stored? How is material (item) data stored? How is behavior data stored? How are they combined with the big data engine? How is training done? How are trained models published to the recall and ranking modules? How are services provided in real time? And so on. Because it serves Alibaba's search and recommendation, the platform faces a huge user base.
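The recall-then-rank flow referred to above can be sketched as follows (plain Python with NumPy; the user vector, item embedding store, and scoring functions are hypothetical stand-ins): a cheap recall stage narrows a huge item pool down to a candidate set, and a heavier ranking model scores only those candidates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stores: a user profile vector and an item ("material") embedding store.
user_vec = rng.normal(size=16)
item_vecs = rng.normal(size=(100_000, 16))

def recall(user_vec, item_vecs, k=500):
    """Cheap first stage: rough interest match over the whole item pool."""
    scores = item_vecs @ user_vec
    return np.argpartition(-scores, k)[:k]          # top-k candidate IDs, unordered

def rank(user_vec, candidate_ids, top_n=10):
    """Heavier second stage: re-score only the recalled candidates."""
    cand = item_vecs[candidate_ids]
    scores = np.tanh(cand @ user_vec)               # stand-in for a deep ranking model
    order = np.argsort(-scores)[:top_n]
    return candidate_ids[order]

print(rank(user_vec, recall(user_vec, item_vecs)))
```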

[Figure]

At the Double 11 peak there are many users, many materials, commodities, merchants, and people, and the model is very large. Search, recommendation, and advertising are gradually moving from the original logistic regression to deep learning. Existing frameworks hit many bottlenecks at this scale: distributed training either cannot scale out to thousands of nodes, or it can, but training efficiency is poor. This demand forces enterprises to keep optimizing.

It is precisely this demand that produced the deeply optimized version of TensorFlow, called PAI-TensorFlow, which integrates optimized communication operators and pushes computation down along them. In the runtime framework, thousands of workers train at the same time and put great pressure on the parameter-server (PS) side, so the work focused on improving PS scalability, exploiting sparsity in communication, and optimizing the runtime library, so that training can scale to thousands of nodes with a higher speedup than the open-source version. The figures above are the strong results presented at last year's Yunqi (Apsara) Conference.

[Figure]

Because of the real-time requirement, the model is updated continuously. Since training never stops, the feature set is not fixed, and the engine must support dynamic feature changes in order to keep training in real time, so features such as dynamic embedding have also been launched in the engine. In this area we work closely with the Google team and actively push the capabilities for large-scale sparsity and real-time dynamic embedding back to the community to empower more users.
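Dynamic embedding can be pictured as an embedding table that grows as previously unseen feature IDs arrive online, rather than a matrix sized to a fixed vocabulary in advance. The sketch below (plain Python with NumPy; a conceptual illustration, not the actual TensorFlow or PAI implementation) shows the basic idea.

```python
import numpy as np

class DynamicEmbedding:
    """Grows on demand: unseen feature IDs get a fresh row at lookup time."""
    def __init__(self, dim, init_scale=0.01, seed=0):
        self.dim = dim
        self.table = {}                        # id -> vector, no fixed vocab size
        self.rng = np.random.default_rng(seed)
        self.init_scale = init_scale

    def lookup(self, ids):
        rows = []
        for i in ids:
            if i not in self.table:            # a new feature appears online
                self.table[i] = self.rng.normal(scale=self.init_scale,
                                                size=self.dim)
            rows.append(self.table[i])
        return np.stack(rows)

emb = DynamicEmbedding(dim=8)
emb.lookup(["u_1", "i_42"])                    # table grows to 2 rows
emb.lookup(["u_1", "i_99", "i_100"])           # grows again as new IDs arrive
print(len(emb.table))                          # 4
```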

Users can go directly to the Alibaba Cloud machine learning platform (PAI), use the optimized framework directly, and get support for training ultra-large-scale sparse models.

The above covers the issues an AI architect needs to consider in search, recommendation, and advertising, such as examining the business scenarios and requirements and the places where trade-offs are needed. The next chapter covers deep learning in image, speech, and natural language processing, where the skill requirements are different.
