Into Machine Learning

About the author: I am a sophomore majoring in artificial intelligence. I have studied C, C++, Java, Python, MySQL, and other programming topics, and I am now committed to learning artificial intelligence. Thanks to CSDN for bringing us together; I will keep sharing what I learn here.

Column introduction: This is a deep learning column. It focuses on the theoretical knowledge behind important frameworks such as neural networks and reinforcement learning, introduces some algorithms, and includes a small amount of example code; detailed code study will come later in the neural network learning column. The goal is to help everyone understand the relationship between neural networks, reinforcement learning, deep learning, and artificial intelligence, and to use this knowledge to solve some problems in daily life.

Learning goals:

(1) Master the mathematical theory behind deep learning.

(2) Use convolutional neural networks and capsule networks to study computer vision.

(3) Implement NLP tasks using recurrent networks and attention models.

(4) Explore and understand reinforcement learning and deep learning.

(5) Understand and explore the applications of deep learning in daily life.

Daily sharing: Work hard every day, not for anything else, but so that in the future you have more choices: to choose comfortable days and the people you like!

Table of contents

Discussing Artificial Intelligence

Overview of Machine Learning

Machine Learning Algorithms

Supervised Learning

1. Linear Regression and Logistic Regression

2. Support Vector Machine

3. Decision Tree

4. Naive Bayes

Unsupervised Learning

K-means

Reinforcement Learning

Q-learning

Summary


Discussing Artificial Intelligence

The development of artificial intelligence has gone through three waves from World War II to today, with ups and downs. The first two rises were in the 20th century, and the third runs from 2012 to the present. Many of you have probably already heard of ChatGPT; its birth can be called an epoch-making event, and some people say it marks the beginning of the fourth industrial revolution, so developing artificial intelligence is very necessary. So what is artificial intelligence, and what is its relationship with deep learning, machine learning, and neural networks?

Here is the mind map I drew, which is also my understanding of artificial intelligence. Artificial intelligence is not an exact concept, so anything that can interact with the outside world could be called artificial intelligence, from neural interaction down to voice communication and so on. My understanding of artificial intelligence is: a system that can interact with its environment.

Overview of Machine Learning

Machine learning is often mentioned together with terms like big data and artificial intelligence, but there are big differences among the three. Big data makes use of machine learning, and machine learning needs big data; together they are also important pillars of artificial intelligence.

It is said that Google processes about 20 PB of data in a single day, which is a staggering amount, and it is still increasing. According to IBM estimates, humans create about 2.5 exabytes of data every day, and 90% of the world's data was created in the last two years.

With so much data, relying on humans alone to sort through it would be hopeless, and so machine learning was born. Machine learning, as the name suggests, discovers the logic, patterns, and laws that exist in data. Like a human brain, it enables machines to respond by analyzing data coming back from sensors. Take a camera-based person recognition and early warning system: the face information collected by the camera is sent back to the overall system, and after feature extraction and comparison, the system judges whether the person is dangerous and issues an instruction to the alarm system. Issuing correct instructions requires accurate processing of the information and generation of the corresponding commands.

Machine learning was discussed above, so what is deep learning? You can roughly understand deep learning as a subfield of machine learning built on neural networks, although it is not entirely a subfield. Deep learning assists machine learning, and the same is true of neural networks: they complement each other, yet each can also stand alone.

Machine Learning Algorithms

The term "machine learning" is just a generic way of referring to general techniques for inferring patterns from large data sets, roughly understood as the ability to make predictions about new data based on existing data. Machine learning algorithms can be roughly divided into the following categories:

(1) Supervised Learning.

(2) Unsupervised Learning.

(3) Reinforcement Learning (RL).

Supervised Learning

Supervised learning algorithms are a class of machine learning algorithms that use previously labeled data to learn features and then classify similar unlabeled data. An example makes this easier to understand:

Everyone should be familiar with filtering on mobile phones: you can filter messages, videos, and so on. How is it done? In fact, items are classified according to labels. For example, we mark spam as 0, work mail as 1, family mail as 2, and mail from relatives as 3. Afterwards, classification only needs to follow these labels, and it is not done by us but by the software we use. Of course, this is still a complicated matter: a lot of data is needed to train an accurate model, and it is in this context that neural networks and deep learning developed.
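To make this concrete, here is a minimal sketch of supervised classification with scikit-learn; the toy mails and labels are invented for illustration:

# A minimal sketch of label-based mail classification (toy data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

mails = [
    "win a free prize now",       # spam
    "meeting agenda attached",    # work
    "dinner at home tonight",     # family
    "cousin visiting next week",  # relative
]
labels = [0, 1, 2, 3]  # 0 = spam, 1 = work, 2 = family, 3 = relative

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(mails)     # turn text into word-count features
model = MultinomialNB().fit(X, labels)  # learn from the labeled examples

new_mail = vectorizer.transform(["win a prize"])
print(model.predict(new_mail))          # -> [0], classified as spam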

1. Linear Regression and Logistic Regression

Regression algorithms (Regression Algorithms) are supervised algorithms that use the features of the input data to predict values; in simple terms, they find the parameter values that best fit the input data set. An example may help: suppose you have 100 yuan, you know the prices of the fruits, and you want to buy as much fruit as possible. How should the money be allocated? You calculate, from the prices of the various fruits, the maximum amount of each fruit you can buy. Seen this way, it is easy to understand.

In the linear regression algorithm, the goal is to find, for a function of the input data, the parameters that bring its output closest to the target values, that is, to minimize the cost function (Cost Function). The cost function measures the error: the gap between the true values and the predicted values. It is usually the mean squared error (MSE), which takes the square of the difference between the expected value and the predicted result.
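Written out (this is the standard definition, with y_{i} denoting the model's prediction for sample i and t_{i} the target value):

MSE=\frac{1}{n}\sum_{i=1}^{n}(y_{i}-t_{i})^{2}, where for linear regression y_{i}=\sum_{j}w_{j}x_{ij}.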

Suppose there is a residential area where houses are for sale and you need to buy one. You have 1,000,000 yuan, and you do not yet know how big a house you can get: how many bedrooms, toilets, kitchens, which block, which area. Suppose the bedroom price is 2000, the toilet price is 1000, the kitchen price is 1500, there are 10 blocks, and the areas are numbered 0 to 5. Then its price is:

2000w_{1}+1000w_{2}+1500w_{3}+10w_{4}+5w_{5}=1000000. Of course, the result obtained here may not be perfect; a lot of data is needed. For example, you might use 1,000 houses to estimate the values of these parameters w_{i} and reach the optimal solution.

Initialize the vector w with some random values

Repeat:
        E = 0  # initialize the cost with 0
        For each sample (x_{i}, t_{i}) in the training set:  # t_{i} is the actual price of house i
                E += (\sum_{j} w_{j}x_{ij} - t_{i})^{2}
        MSE = E / total number of samples  # mean squared error
        Update the weights w with gradient descent, based on the MSE
Until the MSE is below a threshold

So our goal is to obtain the optimal w, that is, the optimal parameters. Once the model is trained it can be used directly. When we introduce neural networks later, we will cover how to train in detail.
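Here is a minimal NumPy sketch of the training loop above; the data is made up (y = 2x + 1 plus noise), and the learning rate and stopping threshold are arbitrary choices for illustration:

# Linear regression fitted by gradient descent on made-up data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))
t = 2 * x[:, 0] + 1 + rng.normal(0, 0.5, size=100)  # noisy targets

X = np.hstack([x, np.ones((100, 1))])  # add a bias column
w = rng.normal(size=2)                 # random initial weights
lr = 0.01                              # learning rate

for step in range(1000):
    y = X @ w                          # predictions for all samples
    error = y - t
    mse = np.mean(error ** 2)          # the cost function
    grad = 2 * X.T @ error / len(t)    # gradient of the MSE w.r.t. w
    w -= lr * grad                     # gradient descent update
    if mse < 0.3:                      # stop once the MSE is below a threshold
        break

print(w)  # should end up close to [2, 1]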

Besides linear regression there is logistic regression. The difference lies in the final output: linear regression outputs \sum_{i} x_{i}w_{i}, while logistic regression passes that sum through the special logistic function \sigma(\vec{x}\cdot\vec{w}). The result can be understood as a probability value in the range [0, 1]. For example, in the World Cup, we know the position of a player's shot; combining the data of his previous shot positions, we train a model on that data set, and then we only need to enter this shot's values to obtain the probability that the shot goes in (ignoring other factors). The closer to 1, the more likely the event.

Strictly speaking, logistic regression outputs a probability rather than a class label, but it can also be used as a classification algorithm, for example with 0.5 as the decision value: if the result is greater than 0.5, the shot is predicted to go in; otherwise it is not.
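A tiny sketch of the logistic function and the 0.5 threshold; the weights and shot features below are hypothetical:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))  # squashes any value into (0, 1)

w = np.array([0.8, -0.5])  # hypothetical learned weights
x = np.array([1.2, 0.3])   # hypothetical features of this shot

p = sigmoid(x @ w)         # probability that the shot goes in
print(p, "in" if p > 0.5 else "out")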

2. Support Vector Machine

Support Vector Machine (SVM) is a supervised machine learning algorithm, mainly used for classification. As explained earlier, the idea is to find a hyperplane that divides the data.

As in the picture above, the red points and blue points are two different categories, and our purpose is to separate them. The plane in the middle is the segmentation plane, also called a hyperplane. A hyperplane is a concept from high-dimensional space: a hyperplane is a point in one-dimensional space, a line in two-dimensional space, and a plane in three-dimensional space. Of course, in real situations the two data sets will be interleaved, so a simple plane cannot classify them accurately, and a nonlinear function is needed (related ideas were introduced in the computer vision column).

For such nonlinear classification, two common methods are: introducing a soft margin (Soft Margin) or using the kernel trick (Kernel Trick).

The soft margin works by allowing mistakes: some data may be misclassified, and as long as the proportion stays within a prescribed limit, this is acceptable. It is the simplest method, but for data that must be classified very precisely it produces larger errors and poor fits.

The kernel trick, simply put, changes the dimension: if a problem cannot be solved in two-dimensional space, it may be solvable in three dimensions. In this way, problems that cannot be linearly classified in two dimensions may become classifiable in three.
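Here is a minimal sketch of the kernel trick with scikit-learn's SVC: a cluster inside a ring cannot be separated by any straight line in two dimensions, but an RBF kernel separates it easily (the data is made up for illustration):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
inner = rng.normal(0, 0.5, size=(50, 2))               # class 0: a small cluster
angles = rng.uniform(0, 2 * np.pi, size=50)
outer = np.c_[3 * np.cos(angles), 3 * np.sin(angles)]  # class 1: a surrounding ring
X = np.vstack([inner, outer])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="rbf").fit(X, y)     # the kernel implicitly maps to a higher dimension
print(clf.predict([[0, 0], [3, 0]]))  # expected: [0 1]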

3. Decision Tree

Finally, we introduce one more supervised algorithm, the Decision Tree. A decision tree is built on the principle of a tree: it consists of decision nodes and leaf nodes, where a decision node performs a test on a specific attribute and a leaf node indicates the value of the target attribute.

We borrow an iris data set for analysis:
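As a minimal sketch (assuming scikit-learn, whose built-in copy of the iris data set is used here), we can fit a small tree and print its decision nodes and leaves:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2).fit(iris.data, iris.target)

# Print the learned decision nodes (attribute tests) and leaf nodes (classes).
print(export_text(tree, feature_names=iris.feature_names))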

After years of development, decision trees have been improved in two major ways. The first is the Random Forest, an ensemble method that combines the predictions of multiple trees; the second is the Gradient Boosting Machine, which creates multiple sequential decision trees, each tree trying to correct the errors of the previous one. Thanks to these improvements, decision tree algorithms have become more and more widely accepted.

4. Naive Bayes

Naive Bayes differs from other machine learning algorithms in its starting point: most machine learning techniques attempt to estimate the probability p(Y|X) of an event Y given a condition X directly.

Naive Bayes instead works through Bayes' theorem: assuming the probability of event Y is known and X is the observed sample, p(Y|X) = p(X|Y)p(Y)/p(X), where p(X|Y) is the probability of X given Y. This is why naive Bayes is called a generative approach (Generative Approach). An example follows:

Suppose there is a cancer that mostly affects older people: only 2% of people under the age of 50 have it, and only 3.9% of tests on people under 50 come back positive. The question: if the test is 98% accurate for cancer, what is the probability that a 45-year-old who tests positive actually has cancer?

p(cancer|test=positive) = p(test=positive|cancer) × p(cancer) / p(test=positive) = 0.98 × 0.02 / 0.039 ≈ 0.50.
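The same computation in Python, just substituting the numbers into Bayes' theorem:

p_cancer = 0.02                 # prior: 2% of people under 50 have this cancer
p_positive = 0.039              # 3.9% of tests in this group come back positive
p_positive_given_cancer = 0.98  # the test is 98% accurate for cancer

p_cancer_given_positive = p_positive_given_cancer * p_cancer / p_positive
print(p_cancer_given_positive)  # about 0.50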

This classifier is called naive Bayes because it assumes independence between the different events when computing their probabilities. In fact, this is very similar to the calculations in probability theory; it is simply a practical application of them.

Unsupervised Learning

Unsupervised learning is the second category of machine learning algorithms. It does not require the data to be labeled in advance; instead, the algorithm draws conclusions on its own. The most common form of unsupervised learning is clustering, a technique that tries to separate the data into subsets.

To put it simply, data with similar characteristics are put into the same group, in the spirit of the saying "birds of a feather flock together".

Deep learning also uses unsupervised learning, although not only clustering. In natural language processing (NLP), unsupervised (or semi-supervised, depending on the task) algorithms are used to learn word vector representations; the most common method is word2vec.
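As a minimal sketch (assuming gensim 4.x; the three-sentence corpus is invented for illustration):

from gensim.models import Word2Vec

sentences = [
    ["machine", "learning", "finds", "patterns", "in", "data"],
    ["deep", "learning", "uses", "neural", "networks"],
    ["neural", "networks", "learn", "from", "data"],
]
model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, epochs=200)

print(model.wv["learning"])                       # the learned vector for one word
print(model.wv.most_similar("learning", topn=2))  # nearest words in vector space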

Another interesting application of unsupervised learning is the generative model (Generative Model). Unlike a discriminative model, a generative model is trained on a large amount of data from a specific domain (such as images or text), and then tries to generate new data similar to the data it was trained on.

K-means

K-means is a clustering algorithm that groups elements of a dataset into k distinct clusters (the origin of the k in the name).

(1) Select k random points in the feature space, called centroids (Centroids), to represent the centers of the k clusters.

(2) Assign each sample of the dataset (that is, each point in the feature space) to the cluster whose centroid is closest.

(3) For each cluster, recalculate its centroid by taking the mean of all points in the cluster.

(4) With the new centroids, repeat (2) and (3) until a stopping condition is met (see the sketch below).
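Here is a minimal NumPy sketch of these four steps on made-up two-dimensional data:

import numpy as np

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.5, (50, 2)),   # blob around (0, 0)
                  rng.normal(4, 0.5, (50, 2))])  # blob around (4, 4)

k = 2
centroids = data[rng.choice(len(data), k, replace=False)]  # step (1)

for _ in range(100):
    # Step (2): assign each point to the cluster with the nearest centroid.
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Step (3): recompute each centroid as the mean of its cluster's points.
    new_centroids = np.array([data[labels == j].mean(axis=0) for j in range(k)])
    # Step (4): stop when the centroids no longer move.
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

print(centroids)  # close to (0, 0) and (4, 4)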


Reinforcement Learning

Reinforcement learning is the third category of machine learning, and it is used a great deal. As the name suggests, reinforcement learning gives machines the ability to learn: depending on the goal, an action or choice is made at each step. In reinforcement learning, an agent takes actions that change the state of the environment, and then uses the new state and the reward to decide its next move. For example, a few years ago AlphaGo played a match against the Go master Lee Sedol; AlphaGo uses reinforcement learning, choosing the best move based on the predicted board positions.

Of course, the applications of reinforcement learning are not limited to this: adversarial games, autonomous driving, stock investment, and so on. Reinforcement learning is a very large system and can be regarded as a rather important branch of machine learning.

Q-learning

Q-learning is an off-policy, temporal-difference reinforcement learning algorithm. It is called Q-learning because it uses a Q-table, which stores a value for every state-action combination. For example, in a chess game, the Q-table is empty at first, but each step is recorded in it, and Q values are formed from these combinations. The larger the Q value, the more attractive the action, that is, the higher its weight and the more likely it is to be chosen as the next action.

Initialize the Q-table with arbitrary values

For each episode (state sequence):
        Observe the initial state s
        For each step of the episode:
                Select an action a using a Q-table-based strategy
                Observe the reward r and the new state s'
                Update q(s, a) in the Q-table using the Bellman equation
                s = s'
        Until the terminal state of the episode is reached

The Q-learning algorithm uses the Bellman equation (Bellman Equation) to update q(s, a) after each new action. This is only a brief introduction; it will be covered in detail later.
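For reference, the standard Q-learning update, with learning rate \alpha and discount factor \gamma, is:

q(s,a)\leftarrow q(s,a)+\alpha(r+\gamma \max_{a'}q(s',a')-q(s,a))

A minimal sketch of this update in Python, with the Q-table as a dictionary keyed by (state, action) pairs; the states, actions, and parameter values are arbitrary examples:

# One Q-table update following the equation above (alpha = learning rate,
# gamma = discount factor).
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)

Q = {}
q_update(Q, s=0, a="right", r=1.0, s_next=1, actions=["left", "right"])
print(Q)  # {(0, 'right'): 0.1}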

However, you may notice that the Q-table in Q-learning grows larger and larger as actions accumulate, and reading it becomes slower and slower, so a neural network can be used to replace the Q-table. This approach has played a big role in games such as Go, Dota 2, and Doom.

Summary

That is all for this section. Many people have only a vague understanding of machine learning, so here I introduced some of its basic knowledge; later I will explain in detail how to use it. The next section covers neural networks. Likes and favorites are welcome.
