Milestones of Machine Learning in Industry

Reposted from a WeChat official account.

1. Systems: the economic base determines the superstructure

In engineering systems, improving revenue is the fundamental motivation for optimizing algorithms. A popular saying in the industry goes: "If adding a few machines can solve the problem, don't make people optimize it." At first glance this sounds counter-intuitive, but in context its core idea is to keep hold of the general direction: machines are cheap and people are expensive. In a stage of rapid business growth there are many more important things to do, and whether you can move one cycle faster than the competition is the difference between life and death for the team. In that situation, excessive pursuit of algorithmic improvement may not be wrong in principle, but it is not wise.

Compared with academia, Internet search, recommendation, and advertising scenarios have an obvious characteristic: the data scale is large, training data is abundant, and the cost of obtaining positive and negative feedback is low. This gives rise to solutions quite different from traditional machine learning algorithms, which are usually not that economical in these settings.

In addition, most engineering systems are designed around the business side or the product side, and algorithms are rarely treated as a real stakeholder. A common criticism engineers level at researchers is that the algorithms they develop lack corresponding requirements. Business-side requirements, even when their implementation difficulty is sometimes unreasonable, are usually an objective reflection of the market. As a result, most engineering designs for algorithms are treated more like extra requirements on top of the mainstream ones, and end up being cut back again and again.

When applying machine learning, several issues come up repeatedly around data quality: how A/B testing is done, whether traffic fluctuates significantly, how much statistical confidence the experiments carry, whether the event-tracking (instrumentation) plan is verified by a third party, and whether the data is unified.
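
To make the "confidence in the experiment" point concrete, here is a minimal sketch (my own illustration, not from the original article) of a two-proportion z-test on CTR for an A/B experiment; the click and impression counts are made-up numbers.

```python
from math import sqrt, erf

def ab_test_ctr(clicks_a, views_a, clicks_b, views_b):
    """Two-proportion z-test for a CTR A/B experiment (illustrative sketch)."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled click-through rate under the null hypothesis of no difference.
    p_pool = (clicks_a + clicks_b) / (views_a + views_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_a, p_b, z, p_value

# Hypothetical traffic split: control vs. treatment buckets.
p_a, p_b, z, p_value = ab_test_ctr(clicks_a=5200, views_a=200_000,
                                   clicks_b=5450, views_b=200_000)
print(f"CTR A={p_a:.4f}, CTR B={p_b:.4f}, z={z:.2f}, p={p_value:.4f}")
```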

2. Why is large-scale Logistic Regression a milestone?

In the past two years, major companies have implemented a wave of Parameter Servers, each claiming feature scales in the hundreds of billions. The core technology behind this is the click-through rate (CTR) estimation task for advertising, first proposed by Google; the breakthrough chosen by domestic companies was to introduce ID features into Logistic Regression, which created an enormous amount of computation. As we all know, LR is a linear model and requires feature crossing. Internet users, products, and content are all of exaggerated magnitude, and after crossing them, one often ends up with an extremely large feature set.
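
As a rough illustration of why ID features plus feature crossing blow up the feature space, the sketch below hashes a hypothetical user_id x ad_id cross into a fixed-size sparse space and trains logistic regression with SGD. The field names and data are invented; real parameter-server systems shard exactly this kind of sparse weight vector across many machines.

```python
import math
import random

D = 2 ** 22  # hashed feature space; production systems use far larger spaces

def features(sample):
    """Hash raw ID features and their crosses into column indices.
    Note: built-in hash() is salted per process; real systems use a
    stable hash such as MurmurHash so indices are reproducible."""
    return [hash(f"user={sample['user_id']}") % D,
            hash(f"ad={sample['ad_id']}") % D,
            # Cross feature: each user x ad combination gets its own weight.
            hash(f"user_x_ad={sample['user_id']}|{sample['ad_id']}") % D]

def predict(w, cols):
    """Sparse logistic regression: only touched weights contribute."""
    s = sum(w.get(c, 0.0) for c in cols)
    return 1.0 / (1.0 + math.exp(-s))

def sgd_update(w, cols, y, lr=0.05):
    g = predict(w, cols) - y  # gradient of log loss w.r.t. the score
    for c in cols:
        w[c] = w.get(c, 0.0) - lr * g

# Toy training loop on synthetic clicks (illustration only).
random.seed(0)
w = {}
for _ in range(10_000):
    sample = {"user_id": random.randrange(1000), "ad_id": random.randrange(200)}
    y = 1 if random.random() < 0.05 else 0  # fake click label
    sgd_update(w, features(sample), y)
print("non-zero weights:", len(w))
```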

Going large-scale first requires solving the compute problem. The machine learning teams at many Internet companies have plenty of data, but if training cannot run at that scale, they can only use part of it. And because the usable training data is insufficient, feature engineering cannot go very far either: features have to be selected manually, which is time-consuming and laborious. With sufficient computing power and a larger sample size, this problem largely goes away.

The same logic applies to machine learning experiments: a large company may make a dozen attempts a day, while a small company can only manage one or two. It is like facing artillery with cold steel; the only outcome is to be crushed. The state of the art proposed by the "young marshal" in 2014 (100 TB of data, 1 billion features, 100 training iterations in half an hour) is a level of computing power that even now very few companies can reach.

Another aspect is online serving. With such a large model, how do you deploy it online, and how do you keep online data consistent when the model is updated? These are perennial difficulties. The model is large and has a correspondingly large number of features, so where are those features stored? Offline features can be cached, but what about real-time features, which require data to move back and forth: can that be done in real time? And if the model cannot fit into a single machine's memory, the difficulty rises by another order of magnitude.
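
To picture the serving problem, here is a toy sketch of one possible approach, assuming weights are sharded by hashing the feature index and every lookup carries a model version tag so a single request never mixes two model versions. This is a simplified illustration, not any specific company's design.

```python
from typing import Dict, List

NUM_SHARDS = 4  # a real deployment would use many servers, not in-process dicts

class ParameterShard:
    """Holds one slice of the model, keyed by (version, feature_index)."""
    def __init__(self):
        self.store: Dict[tuple, float] = {}

    def put(self, version: str, index: int, value: float):
        self.store[(version, index)] = value

    def get(self, version: str, index: int) -> float:
        return self.store.get((version, index), 0.0)

class ShardedModel:
    """Routes each feature index to a shard; lookups are pinned to one version."""
    def __init__(self, shards: List[ParameterShard]):
        self.shards = shards

    def _shard_of(self, index: int) -> ParameterShard:
        return self.shards[index % len(self.shards)]

    def publish(self, version: str, weights: Dict[int, float]):
        for index, value in weights.items():
            self._shard_of(index).put(version, index, value)

    def score(self, version: str, active_indices: List[int]) -> float:
        # All lookups in one request use the same version tag, so a
        # concurrent model push cannot produce a mixed, inconsistent score.
        return sum(self._shard_of(i).get(version, i) for i in active_indices)

model = ShardedModel([ParameterShard() for _ in range(NUM_SHARDS)])
model.publish("v1", {7: 0.3, 12345: -0.1, 987654: 0.8})
print(model.score("v1", [7, 12345, 987654, 42]))  # 42 is an unseen feature -> 0.0
```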

Taken together, large-scale LR is a test of a team's engineering and systems capabilities. Seen from another angle, it embodies an industrial-grade philosophy: pursue generality, pursue efficiency, reduce the model's dependence on any individual algorithm, and defeat small-workshop-style feature engineering by stacking enormous numbers of features. It is full of brute-force aesthetics.

3. Why pursue deep learning? Because of efficiency

The large-scale LR above may look like a "dumb method". In recent years the industry has also invested heavily in deep learning, another promising path. To be honest, most deep learning work in recommendation and search has not achieved results as impressive as those in the image domain, but it has one fatal attraction: it needs little or no manual feature engineering.

Even if such a solution brings no improvement over the previous model, the fact that it requires no feature engineering already delivers a huge efficiency gain. It also makes solutions more general: a piece of business that used to require several colleagues grinding through months of feature engineering can now be handled quickly with a deep learning solution.
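
As a sketch of what "little or no manual feature engineering" can look like, the toy PyTorch model below feeds raw user and ad IDs through embedding layers and lets an MLP learn the interactions that LR would have needed hand-crafted cross features for. The field names and sizes are hypothetical.

```python
import torch
import torch.nn as nn

class TinyCTRNet(nn.Module):
    """Toy deep CTR model: ID embeddings + MLP instead of hand-made crosses."""
    def __init__(self, num_users=1000, num_ads=200, dim=16):
        super().__init__()
        self.user_emb = nn.Embedding(num_users, dim)
        self.ad_emb = nn.Embedding(num_ads, dim)
        # The MLP learns user x ad interactions from the concatenated embeddings.
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, user_ids, ad_ids):
        x = torch.cat([self.user_emb(user_ids), self.ad_emb(ad_ids)], dim=-1)
        return torch.sigmoid(self.mlp(x)).squeeze(-1)

# One step on a fake batch, just to show the shapes line up.
model = TinyCTRNet()
users = torch.randint(0, 1000, (32,))
ads = torch.randint(0, 200, (32,))
labels = torch.randint(0, 2, (32,)).float()
loss = nn.functional.binary_cross_entropy(model(users, ads), labels)
loss.backward()
print(f"toy batch loss: {loss.item():.4f}")
```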

Source: blog.csdn.net/weixin_43901214/article/details/108723949