Understanding Machine Learning in One Article: Big Data, Natural Language Processing, and Algorithms

Content from: Understanding Machine Learning

 

1. Definition of Machine Learning

Broadly speaking, machine learning is an approach that empowers machines to learn, so that they can do things that cannot be done by direct programming. In a more practical sense, machine learning is a method of using data to train a model, and then using that model to make predictions.

 

 

Pattern Recognition

Pattern Recognition = Machine Learning. The main difference between the two is that the former is a concept that grew out of industry, while the latter comes mainly from computer science.

Data Mining

Data Mining = Machine Learning + Databases. The key is a person who thinks in data-mining terms and also has a deep understanding of the data; only then is it possible to derive patterns from data that guide business improvement. Most algorithms in data mining are optimizations of machine learning algorithms for databases.

Statistical Learning

Statistical learning is approximately equal to machine learning. Statistical learning is a discipline that overlaps heavily with machine learning, because most methods in machine learning come from statistics. But to a certain extent the two differ: statistical learners focus on the development and optimization of statistical models and lean toward mathematics, while machine learners focus more on solving problems and lean toward practice. Machine learning researchers therefore concentrate on improving the efficiency and accuracy of learning algorithms executed on computers.

Computer Vision

Computer Vision = Image Processing + Machine Learning. Image processing techniques turn images into suitable inputs for machine learning models, which are responsible for identifying relevant patterns in those images.

Speech Recognition

Speech Recognition = Speech Processing + Machine Learning. Speech recognition combines audio processing technology with machine learning. Speech recognition is rarely used alone; it is generally combined with natural language processing techniques.

Natural Language Processing

Natural Language Processing = Text Processing + Machine Learning. Natural language processing is the field that enables machines to understand human language. It borrows heavily from compiler techniques such as lexical analysis and syntax analysis, and, at the level of understanding, uses technologies such as semantic analysis and machine learning.

 

2. Approaches to Machine Learning

1. Regression algorithm

There are two important subclasses of regression algorithms: linear regression and logistic regression.

Linear Regression: how do I fit a straight line that best matches all of my data? It is usually solved with the method of least squares. The idea of least squares is this: assume that the fitted line represents the true values of the data, while the observed data are values with errors. To minimize the effect of those errors, we solve for the straight line that minimizes the sum of squared errors.
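As a minimal sketch of this idea (the housing numbers below are made up for illustration), we can solve the least-squares line directly with NumPy's linear algebra routines:

```python
import numpy as np

# Toy data: area vs. price, roughly price = 3*area + 10 with a little noise
x = np.array([50.0, 60.0, 80.0, 100.0, 120.0])
y = np.array([160.0, 195.0, 250.0, 310.0, 370.0])

# Least squares: find w, b minimizing the sum of squared errors of y ~ w*x + b.
# Stack x with a column of ones so the intercept b is part of the solution.
A = np.column_stack([x, np.ones_like(x)])
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"price = {w:.2f} * area + {b:.2f}")
```

The solver returns the slope and intercept that make the squared errors as small as possible, which is exactly the "least squares" criterion described above.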

In computer science there is a discipline called numerical computing, devoted to improving the accuracy and efficiency of computer calculations. The famous gradient descent method and Newton's method, for example, are classical algorithms in numerical computing, and both are well suited to finding the extrema of a function. Gradient descent is one of the simplest and most effective methods for solving regression models.

 

Logistic regression is an algorithm very similar to linear regression, but in essence the types of problems they handle are different. Linear regression deals with numerical problems, that is, the final prediction is a number, such as a house price. Logistic regression is a classification algorithm: its predictions are discrete classes, such as whether an email is spam, or whether a user will click on an advertisement.

In terms of implementation, logistic regression simply applies a sigmoid function to the result of linear regression, converting the numerical result into a probability between 0 and 1. (The graph of the sigmoid function is not particularly intuitive; you only need to know that the larger the input, the closer the function gets to 1, and the smaller the input, the closer it gets to 0.) We can then make predictions based on this probability: for example, if the probability is greater than 0.5, classify the email as spam, or the tumor as malignant, and so on.
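A tiny sketch makes the sigmoid-plus-threshold step concrete (the linear score below is a hypothetical model output, not a trained one):

```python
import math

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Large positive inputs approach 1, large negative inputs approach 0
print(sigmoid(6))    # close to 1
print(sigmoid(-6))   # close to 0
print(sigmoid(0))    # exactly 0.5

# Turning a linear score into a spam decision at the 0.5 threshold
score = 2.3                      # hypothetical output of a linear model
is_spam = sigmoid(score) > 0.5   # probability > 0.5 -> classify as spam
```

This is the whole trick: the linear part produces a score, the sigmoid converts it to a probability, and the threshold turns the probability into a discrete class.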

 

 

The classification boundary drawn by the logistic regression algorithm is basically linear. (There are logistic regression variants that draw nonlinear boundaries, but such models become very inefficient on large amounts of data.) This means that when the boundary between the two classes is not linear, the expressive power of logistic regression is insufficient.

The following two algorithms are the most powerful and important algorithms in the machine learning world, and both can fit nonlinear classification lines.

2. Neural network

What is the learning mechanism of neural network? Simply put, it is decomposition and integration .

 

For example, a square is broken down into four polylines that enter the next layer of visual processing. Four neurons each process one polyline. Each polyline is further decomposed into two straight lines, and each straight line into two black-and-white faces. A complex image thus becomes many details entering the neurons, which process them and then integrate the results, finally concluding that what was seen is a square. This is the mechanism of visual recognition in the brain, and it is also how neural networks work.

Let's look at the logical architecture of a simple neural network. The network is divided into an input layer, hidden layers, and an output layer. The input layer receives the signal, the hidden layers decompose and process the data, and the final result is integrated in the output layer. Each circle in a layer represents a processing unit, which can be thought of as simulating a neuron. Several processing units form a layer, and several layers form a network: a "neural network".

 

In a neural network, each processing unit is actually a logistic regression model: it receives input from the previous layer and passes its prediction as output to the next layer. Through this process, neural networks can perform very complex nonlinear classification.
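To make "a network of logistic regressions" concrete, here is a minimal forward pass with hand-picked weights (chosen for illustration, not learned) that computes XOR, a classic task no single linear classifier can solve:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A 2-input, 2-hidden-unit, 1-output network. Weights are hand-chosen so
# hidden unit 1 acts like OR, hidden unit 2 like NAND, and the output
# unit like AND -- together: XOR, a nonlinear classification.
W1 = np.array([[20.0, 20.0],      # hidden unit 1 (OR-like)
               [-20.0, -20.0]])   # hidden unit 2 (NAND-like)
b1 = np.array([-10.0, 30.0])
W2 = np.array([20.0, 20.0])       # output unit (AND-like)
b2 = -30.0

def forward(x):
    h = sigmoid(W1 @ x + b1)      # hidden layer: two logistic units
    return sigmoid(W2 @ h + b2)   # output layer: one logistic unit

for x in ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]):
    print(x, round(float(forward(np.array(x)))))
```

Each unit really is just a logistic regression over the previous layer's outputs; stacking them is what buys the nonlinear decision boundary.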

The figure below demonstrates a well-known application of neural networks in image recognition: a program called LeNet, a neural network built on multiple hidden layers. LeNet can recognize a variety of handwritten digits, with high recognition accuracy and good robustness.

Figure 10 Effect display of LeNet

The square at the bottom right shows the image input to the computer, and the computer's output appears after the word "answer" in red above the square. The three vertical image columns on the left show the outputs of the three hidden layers in the network. As the layers go deeper, the details they process become more basic; layer 3, for example, is essentially handling line details.

3. SVM (Support Vector Machine)

In a sense, the support vector machine algorithm is an enhancement of logistic regression: by imposing stricter optimization conditions, it can obtain better classification boundaries than logistic regression does. Without a certain kind of function trick, however, the SVM algorithm is at best a better linear classification technique.

However, combined with a Gaussian "kernel", a support vector machine can express very complex classification boundaries and thus achieve a good classification effect. A "kernel" is a special kind of function whose most typical feature is that it can map a low-dimensional space to a high-dimensional space.

For example, as shown in the following figure:

 

Figure 11 Legend of support vector machine

How do we draw a circular classification boundary in a two-dimensional plane? It is difficult in the 2D plane itself, but with a "kernel" we can map the 2D space into 3D and then use a linear plane to achieve a similar effect. In other words, a nonlinear classification boundary in the two-dimensional plane is equivalent to a linear classification boundary in three-dimensional space, so we can achieve a nonlinear division in the plane by a simple linear division in the space.

 

Figure 12 Cutting in three-dimensional space
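A small sketch of this lifting idea (using the explicit mapping (x, y) → (x, y, x² + y²); a real Gaussian kernel does such a mapping implicitly, and the sample points here are invented):

```python
import numpy as np

# Points inside the unit circle vs. points outside it: not linearly
# separable in 2D. After lifting to 3D with z = x^2 + y^2, the flat
# plane z = 1 separates the two groups perfectly.
inside  = np.array([[0.1, 0.2], [-0.4, 0.3], [0.5, -0.5]])
outside = np.array([[1.5, 0.0], [-1.2, 1.1], [0.0, -2.0]])

def lift(points):
    z = (points ** 2).sum(axis=1, keepdims=True)  # z = x^2 + y^2
    return np.hstack([points, z])

print(lift(inside)[:, 2])    # all z values below 1
print(lift(outside)[:, 2])   # all z values above 1
```

The circular boundary in the plane has become a linear cut (z = 1) in the lifted space, which is exactly the equivalence the figure illustrates.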

 

4. Clustering algorithm

A salient feature of the previous algorithms is that the training data contain labels, and the trained model can predict labels for other, unknown data. In the following algorithms the training data are unlabeled, and the purpose of the algorithm is to infer labels for these data through training. The general term for such algorithms is unsupervised algorithms (algorithms on pre-labeled data are supervised algorithms). The most typical representatives of unsupervised algorithms are clustering algorithms.

Let us again take two-dimensional data as an example, where each data point contains two features. If I want to label the different types of points with a clustering algorithm, how can I do that? Simply put, a clustering algorithm computes distances within the population and divides the data into multiple groups according to those distances.

The most typical representative of the clustering algorithm is the K-Means algorithm .
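A minimal K-Means sketch shows the two alternating steps, assigning each point to its nearest center and moving each center to the mean of its group (the 2-D points are made up, with two obvious clusters):

```python
import numpy as np

def kmeans(data, k, iters=20, seed=0):
    """Minimal K-Means: alternate (1) assign points to the nearest
    center and (2) move each center to the mean of its group."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        # distance from every point to every center
        d = np.linalg.norm(data[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        centers = np.array([data[labels == j].mean(axis=0)
                            for j in range(k)])
    return labels, centers

# Two obvious groups of 2-D points
data = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                 [5.0, 5.1], [5.2, 5.0], [5.1, 5.2]])
labels, centers = kmeans(data, k=2)
print(labels)
```

Note this sketch skips production concerns such as empty clusters and convergence checks; it exists only to show the distance-based grouping the text describes.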

5. Dimensionality reduction algorithm

A dimensionality reduction algorithm is also an unsupervised learning algorithm; its main feature is reducing data from a high-dimensional to a low-dimensional level. Here, the dimension represents the number of features of the data. For example, house-price data might include four features: the length, width, and area of the house, and the number of rooms, i.e., 4-dimensional data. But the length and width overlap with the information carried by the area, since area = length × width. With a dimensionality reduction algorithm we can remove this redundancy and reduce the data to two features, area and number of rooms, compressing it from 4 dimensions to 2. Reducing data from high to low dimensions is not only convenient for representation but also speeds up computation.

The redundancy removed just now was visible to the naked eye, so compressing it loses no information. If the redundancy is not visible, or there are no redundant features, a dimensionality reduction algorithm still works, but it will lose some information. However, it can be proven mathematically that dimensionality reduction preserves the information in the data to the greatest extent possible when compressing from a higher to a lower dimension. So there are still many benefits to using dimensionality reduction algorithms.

The main role of dimensionality reduction algorithms is to compress data and improve the efficiency of other machine learning algorithms: data with thousands of features can be compressed down to a few. Another benefit is the visualization of data; for example, 5-dimensional data compressed to 2 dimensions can be plotted in a 2-D plane. The main representative of dimensionality reduction algorithms is the PCA algorithm (i.e., Principal Component Analysis).
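A compact PCA sketch on synthetic data shaped like the housing example above (length, width, area = length × width, rooms; the numbers are generated for illustration):

```python
import numpy as np

# 4-D house data: [length, width, area, rooms]; area = length * width,
# so the 4 features carry far fewer than 4 dimensions of information.
rng = np.random.default_rng(1)
length = rng.uniform(8, 15, 100)
width = rng.uniform(5, 10, 100)
rooms = rng.integers(2, 6, 100).astype(float)
X = np.column_stack([length, width, length * width, rooms])

# PCA: center the data, then project onto the top principal
# directions, obtained here via the singular value decomposition.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / (S**2).sum()   # fraction of variance per component
X2 = Xc @ Vt[:2].T                # compress from 4-D down to 2-D

print(explained.round(3))         # the first components dominate
```

Because the area feature is redundant with length and width, the first two components retain almost all of the variance, which is the "information preserved to the greatest extent" claim in concrete form.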

6. Recommendation algorithm

Recommendation algorithms are currently very popular in industry and are widely used in e-commerce, for example at Amazon, Tmall, and JD.com. The main feature of a recommendation algorithm is that it can automatically recommend to users the things they are most interested in, thereby increasing purchase rates and efficiency. There are two main categories of recommendation algorithms:

One is recommendation based on item content: recommend items similar in content to those the user has purchased. The premise is that every item must carry several tags, so that items similar to those the user bought can be found. The advantage of this approach is that the recommendations are highly relevant; the drawback is that labeling every item is a lot of work.

The other is recommendation based on user similarity: recommend to the target user the items purchased by other users with the same interests. For example, user A has bought items B and C; user D, who is similar to A, has also purchased item E, so item E is recommended to A.

Both types of recommendation have their own advantages and disadvantages; in typical e-commerce applications they are used in combination. The most famous recommendation algorithm is collaborative filtering.
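The user-similarity idea above can be sketched in a few lines (user and item names follow the A/B/C/D/E example and are purely illustrative; real collaborative filtering uses richer similarity measures than raw overlap counts):

```python
# Toy purchase history: user A bought {B, C}; user D has similar taste
# and also bought E, so E should be recommended to A.
purchases = {
    "A": {"B", "C"},
    "D": {"B", "C", "E"},
    "F": {"G"},
}

def recommend(target, purchases):
    """Recommend the items bought by the most similar other user,
    where similarity = number of items purchased in common."""
    mine = purchases[target]
    best = max((u for u in purchases if u != target),
               key=lambda u: len(purchases[u] & mine))
    return purchases[best] - mine   # their items the target lacks

print(recommend("A", purchases))    # recommends E to A
```

User D shares two purchases with A while F shares none, so D's remaining item E is what gets recommended, exactly the scenario described above.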

7. Other

Besides the algorithms above, the machine learning community has others, such as Gaussian discriminant analysis, naive Bayes, and decision trees. But the six algorithms listed above are the most widely used and the most influential. One characteristic of the machine learning world is that it has many algorithms: a hundred flowers bloom.

To summarize: according to whether the training data have labels, the algorithms above can be divided into supervised and unsupervised learning algorithms. Recommendation algorithms are special: they belong to neither supervised nor unsupervised learning and form a separate category.

Supervised Learning Algorithms: Linear Regression, Logistic Regression, Neural Networks, SVM

Unsupervised Learning Algorithms: Clustering Algorithms, Dimensionality Reduction Algorithms

Special Algorithms: Recommendation Algorithms

Beyond these algorithms, there are others whose names appear frequently in machine learning. They are not machine learning algorithms per se but were born to solve sub-problems; you can think of them as sub-algorithms of the algorithms above, used to speed up or improve the training process. Representatives include: the gradient descent method, mainly used in linear regression, logistic regression, neural networks, and recommendation algorithms; Newton's method, mainly used in linear regression; the BP algorithm, mainly used in neural networks; and the SMO algorithm, mainly used in SVMs.

 

 
