The machine learning platform that data people have to know

About the author

@飞狐冲冲

Works on data mining and analysis at a well-known central state-owned enterprise in China. Previously worked as an algorithm engineer at large Internet companies such as JD.com and Meituan, and has hands-on experience in algorithm development.

01 Why do you need a machine learning platform?

Big data and artificial intelligence are driving rapid development across many fields. Major companies use machine learning algorithms to mine the commercial value in their businesses, build AI products, and quickly turn data into returns.

As a result, business, data, and algorithms have become the three key ingredients of AI products: algorithmic modeling of data empowers the business and generates value. Anyone with some knowledge of algorithms knows that algorithm development generally includes data preparation, feature engineering, algorithm modeling, model evaluation, model tuning, model deployment, and model monitoring, as shown in the following figure:
[Figure: the algorithm development process]
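
The first few stages of this process can be sketched end to end in a few lines. The following is a minimal illustration in plain Python, not taken from any particular platform: the toy data, the function names (`prepare`, `standardize`, `train_centroids`, `predict`, `accuracy`) and the choice of a nearest-centroid classifier are all assumptions made for brevity. Tuning, deployment, and monitoring are left out.

```python
from statistics import mean, pstdev

# Hypothetical toy data: (feature_1, feature_2, label). None marks a missing value.
RAW = [
    (1.0, 2.0, 0), (1.2, 1.8, 0), (0.8, 2.2, 0), (None, 2.0, 0),
    (5.0, 8.0, 1), (5.2, 7.8, 1), (4.8, 8.2, 1), (5.1, 8.1, 1),
]

def prepare(rows):
    """Data preparation: drop rows that contain a missing value."""
    return [r for r in rows if None not in r]

def standardize(rows):
    """Feature engineering: scale each feature to zero mean and unit variance."""
    cols = list(zip(*[r[:-1] for r in rows]))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    scaled = [
        tuple((v - m) / s for v, (m, s) in zip(r[:-1], stats)) + (r[-1],)
        for r in rows
    ]
    return scaled, stats

def train_centroids(rows):
    """Modeling: nearest-centroid classifier; the model is one centroid per label."""
    by_label = {}
    for *features, label in rows:
        by_label.setdefault(label, []).append(features)
    return {label: [mean(col) for col in zip(*pts)] for label, pts in by_label.items()}

def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: dist2(model[label]))

def accuracy(model, rows):
    """Model evaluation: fraction of rows classified correctly."""
    hits = sum(predict(model, r[:-1]) == r[-1] for r in rows)
    return hits / len(rows)

clean = prepare(RAW)
scaled, stats = standardize(clean)
train, test = scaled[::2], scaled[1::2]  # simple alternating train/test split
model = train_centroids(train)
score = accuracy(model, test)
print(f"rows kept: {len(clean)}, test accuracy: {score}")
```

Even in this toy version, most of the code is plumbing around the single modeling step, which is the point the next paragraph makes.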

Looking at the whole process of developing and applying algorithms, many steps have little to do with machine learning itself but are closely tied to other engineering disciplines and come up all the time: model deployment, task monitoring, model visualization, and even environment setup and resource scheduling. How to free practitioners from this tedious engineering work, get machine learning into use quickly, and offer these capabilities in a general-purpose way is what the concept of a machine learning platform is meant to address.

02 Glossary

To avoid confusion, here are brief explanations of the terms related to machine learning platforms.

Machine learning algorithm. Often shortened to "algorithm", it refers to an algorithm implemented in some programming language and, in most cases, independent of any specific business. It covers statistical methods, traditional machine learning algorithms, deep learning, and even certain mathematical rules. Examples include the unsupervised K-means clustering algorithm; supervised algorithms such as LR (logistic regression), random forest, and GBDT; and deep learning algorithms such as DNN and RNN.

Machine learning model. Often shortened to "model", it is the set of algorithm parameters used to make predictions on new data. It is strongly tied to a specific business and usually has to be used together with a machine learning algorithm. Examples include financial risk control models, recommendation models, advertising click-through rate models, and sales forecast models.

Machine learning framework. Also called a machine learning runtime environment, it is a software system that directly supports writing machine learning algorithms, training models, and applying models, such as TensorFlow and MXNet. These frameworks schedule computing and storage resources directly, and their operating mechanisms are independent of the specific business scenario.

Machine learning platform. As explained above, a platform encapsulates the entire machine learning modeling process, lets users develop algorithms with mainstream machine learning frameworks, and in most cases provides a visual way to build the workflow. Examples include Alibaba Cloud PAI and Tencent Ti-ML (described in detail later). Its purpose is to get machine learning applied in engineering quickly so that it generates value.
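
The difference between an algorithm and a model can be made concrete with a small sketch. Here `fit_line` (a hypothetical name) plays the role of the algorithm: ordinary least squares for a single feature, with no knowledge of any business. What it returns, a dictionary of parameters, is the model, which only makes sense for the data it was trained on. The ad-spend and sales figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Algorithm: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": my - slope * mx}

def predict(model, x):
    """Apply a trained model (a parameter set) to a new input."""
    return model["slope"] * x + model["intercept"]

# Invented business data: monthly ad spend (x) and sales (y), exactly y = 2x + 1.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

sales_model = fit_line(ad_spend, sales)   # the model: parameters learned from this data
forecast = predict(sales_model, 5.0)      # prediction on new data
print(sales_model, forecast)
```

The same `fit_line` could be trained on risk-control or click-through data and would yield a different model each time, which is why the model, not the algorithm, is the business-specific artifact.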

03 Machine learning platform functions

A machine learning platform supports one-stop algorithm services such as algorithm development, sharing, model training, deployment, and monitoring. Its general framework and functions are shown in the figure below. Its main functions include a large library of built-in basic algorithms, unified data management, an integrated runtime environment, visual modeling, and model reuse; on this basis, an algorithm marketplace can also be developed to accumulate reusable solutions. Here we mainly introduce visual modeling.

[Figure: general framework and functions of a machine learning platform]

Visual modeling differs from algorithm engineers developing algorithms in a programming language (such as Python or Java). It maps data to graphical elements through drag and drop and guides users to operate on and explore data intuitively, as shown below:

[Figures: examples of visual modeling]

Using the data processing and algorithm nodes built into the platform, visual modeling lets users quickly build machine learning, deep learning, natural language processing, and other models, covering everything from joining data to model prediction. This reduces the reliance on professional algorithm engineers for modeling work and makes algorithm development more intelligent and efficient.
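
One way to think about a drag-and-drop workflow is as a directed graph of nodes that is executed in dependency order. The sketch below assumes that representation; the node names and the `run_workflow` helper are hypothetical and do not correspond to any specific platform's API. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

def run_workflow(nodes, edges):
    """Run each node after its upstream nodes, passing their outputs as inputs.

    nodes: {name: function taking a list of upstream outputs}
    edges: {name: [names of upstream nodes]}
    """
    results = {}
    order = list(TopologicalSorter(edges).static_order())
    for name in order:
        inputs = [results[up] for up in edges.get(name, [])]
        results[name] = nodes[name](inputs)
    return order, results

# Hypothetical canvas: each entry stands for one box the user dragged in.
nodes = {
    "read_table": lambda _: [3.0, 1.0, None, 2.0],
    "drop_missing": lambda ins: [v for v in ins[0] if v is not None],
    "normalize": lambda ins: [v / max(ins[0]) for v in ins[0]],
    "mean_score": lambda ins: sum(ins[0]) / len(ins[0]),
}
# Each entry stands for the arrows pointing into a box.
edges = {
    "drop_missing": ["read_table"],
    "normalize": ["drop_missing"],
    "mean_score": ["normalize"],
}

order, results = run_workflow(nodes, edges)
print(order)
print(results["mean_score"])
```

Under this view, the user only wires boxes together; ordering the steps and moving data between them is handled by the executor.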

04 Introduction to machine learning platforms in the industry

Well-known machine learning platforms in the industry include Alibaba Cloud PAI, Tencent Ti-ML, 4Paradigm Prophet, and Meritdata Tempo.

4.1 Alibaba Cloud PAI

Alibaba Cloud PAI is currently the most widely used machine learning platform in China and is recognized as one of the most powerful. Its main advantages are:

1. Multi-framework support

2. Indirect multi-language support (a Python programming entry point is provided by default; for other languages, users must supply their own runtime environment)

3. Deep integration with Alibaba Cloud

4. Rich APIs


4.2 Tencent Ti-ML

Ti-ML comprises three machine learning sub-products:

(1) Ti-ONE, a one-stop machine learning platform that provides AutoML capabilities to build machine learning programs automatically;

(2) Ti-EMS, which automatically infers resource requirements and schedules resources based on the customer's machine learning program;

(3) Ti-Insight, which has built-in templates for mainstream machine learning scenarios tailored to industry needs, so users can build their own machine learning applications directly from templates.

Tencent launched its machine learning platform relatively late, but its functions and positioning are the same as those of Alibaba Cloud PAI. Its main advantages are:

1. Multi-framework support.

2. Indirect multi-language support.

3. Deep integration with Tencent Cloud.

4. Rich APIs, and so on.


4.3 4Paradigm Prophet

4Paradigm is an AI technology and service provider that specializes in machine learning platforms, and Prophet is one of the highest-profile machine learning platforms in China. The international research firm IDC released its first "IDC MarketScape: China Machine Learning Development Platform Market Evaluation".

According to that evaluation, 4Paradigm ranks first in market share in China and is the market leader in machine learning platforms. Its main advantages are:

1. It is self-contained and can usually be deployed independently with little effort.

2. As a domestic commercial company specializing in machine learning, it can usually provide secondary development services conveniently.

3. Its self-developed GBDT implementation has clear performance advantages when processing large-scale data and high-dimensional features.

4.4 Meritdata Tempo

Meritdata was established in 1998 and has a relatively long history. Although the company is not large, it has accumulated considerable experience in data analysis.

Its main product is the TempoData machine learning platform, whose main advantages are:

1. The barrier to entry is low, and its functionality is far less complex than that of Alibaba Cloud PAI.

2. As a specialized commercial services company, it is well placed to provide secondary development.


Other platforms include Baidu EasyDL and Jiuzhang Yunji, which are not covered here.

05 Conclusion

Most machine learning platforms were first used widely inside the companies that built them. As artificial intelligence and big data technology became popular and the products matured, they were eventually brought to market. A machine learning platform built for AI applications is the foundation for putting business innovation into practice quickly. It frees algorithm engineers from tedious engineering work so they can focus their limited energy on iterating the algorithm strategies they are best at, and it lets non-specialist IT staff do visual modeling, which lowers the barrier to AI development.

The above is a summary of my personal understanding. If anything is wrong, corrections are welcome. Thank you!


Origin blog.51cto.com/13526224/2607896