Artificial Intelligence Algorithms Popular Explanation Series (2): Logistic Regression

  Today's machine learning algorithm is called logistic regression, often abbreviated LR.

As before, let's start with an example before introducing the algorithm, and then look at how the algorithm solves the problem in that case.

   Here, we will reuse the case from the "K-nearest neighbors" lesson. Briefly: a company developed a game and collected some user data, shown below:

Each point in the figure represents a user. The horizontal axis is the user's age, and the vertical axis is the number of hours the user spends on the phone each day. Red means the user likes the game; blue means the user doesn't. Now a new user arrives, shown in green. The company wants to know: will this new user like the game?

In K-nearest neighbors, we find the new user's nearest neighbors and see which color is more common among them. On the assumption that similar users have similar preferences, the new user should share the preference of the majority of his neighbors, and we judge whether he likes the game accordingly.

However, K-nearest neighbors has a problem: the amount of computation is very large. For each new user, we must compute his distance to every old user, sort those distances from smallest to largest, and only then find his K nearest neighbors.

If there are 10,000 old users, then using K-nearest neighbors to classify 10,000 new users requires a total of 10,000 x 10,000 = 100,000,000 distance calculations, which is a heavy load and consumes a lot of resources.
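The article gives no code, but the cost estimate above is easy to sketch. This is an illustrative snippet with made-up random data, not the company's actual dataset: every new user needs one distance per old user, so the total is the product of the two counts.

```python
import numpy as np

# Illustrative data: 10,000 old users and 10,000 new users,
# each described by two numbers (age, daily phone hours).
rng = np.random.default_rng(0)
old_users = rng.random((10_000, 2))
new_users = rng.random((10_000, 2))

# For a single new user, KNN needs 10,000 distance computations,
# then a sort to pick out the K nearest old users:
dists = np.linalg.norm(old_users - new_users[0], axis=1)
k_nearest = np.argsort(dists)[:5]   # indices of the 5 nearest old users

# Repeating this for every new user gives the total from the text:
total = len(dists) * len(new_users)
print(total)  # → 100000000
```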

The logistic regression algorithm we introduce today takes a different approach. It first learns a rule from the old users, and then applies that rule directly to classify each new user.

Next, let's take a look at how logistic regression is done.

For this case, we first draw a line that divides the existing users into two sides.

This first line is drawn arbitrarily. But notice that the points above the line are mostly red, and the points below it are mostly blue. So we can state a preliminary rule: "users above the line tend to like this game, and users below the line tend not to." If the new user falls above the line, we judge that he probably likes the game too, and he should be red. If he falls below the line, he most likely doesn't like the game, and he should be blue.

But how do we know whether a line divides well? The criterion is called the error rate: the fraction of users the line puts on the wrong side. The error rate is a ratio, but since the total number of users in our case is fixed, for convenience we use the number of errors instead.


Take this line, for example. It treats users above it as red and users below it as blue. However, the 4 blue points above it are misclassified, and the 4 red points below it are also misclassified, so its error count is 4 + 4 = 8.
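Counting errors for a candidate line is simple enough to write down. This is a minimal sketch with made-up toy data, where the line is written in the standard form w0*age + w1*hours + b = 0 and points with a positive value are predicted red:

```python
def count_errors(w0, w1, b, users):
    """Count how many users a line puts on the wrong side.

    users: list of (age, hours, label) tuples, label 'red' or 'blue'.
    A user is predicted 'red' when w0*age + w1*hours + b > 0,
    i.e. when the point lies above the line.
    """
    errors = 0
    for age, hours, label in users:
        predicted = 'red' if w0 * age + w1 * hours + b > 0 else 'blue'
        if predicted != label:
            errors += 1
    return errors

# Toy data: four users the line gets right, one blue it gets wrong.
users = [(20, 25, 'red'), (30, 40, 'red'), (40, 10, 'blue'),
         (50, 20, 'blue'), (25, 30, 'blue')]

# The line hours = age, written as -1*age + 1*hours + 0 = 0:
print(count_errors(-1, 1, 0, users))  # → 1
```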

 

Let's tweak this line a bit and see if we can do better. Rotate it slightly counterclockwise. Now there are still 4 reds below it that are misclassified, but only 3 blues above it. The error count drops to 7, a little better than the 8 we had before, which means we have made progress. That is, this line is more reasonable than the previous one.

 

Then we fine-tune again and recount the errors, fine-tune and recount, over and over. Whenever a fine-tune raises the error count, we discard that result; whenever it lowers the count, we keep it. Eventually the error count can't be reduced any further, and we have found an optimal dividing line.
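The fine-tune-and-recount loop above can be sketched directly. This toy version nudges the line in random directions and keeps only the nudges that don't make things worse; the data and starting line are made up, and, as explained later in the article, real logistic regression steers the nudges with gradient descent rather than trying random ones:

```python
import random

random.seed(42)

def count_errors(w0, w1, b, users):
    """Number of points the line w0*x + w1*y + b = 0 misclassifies
    (label 1 predicted for points above the line, 0 below)."""
    errors = 0
    for x, y, label in users:
        predicted = 1 if w0 * x + w1 * y + b > 0 else 0
        errors += (predicted != label)
    return errors

# Toy data: label 1 points sit above the line y = x, label 0 below.
users = [(1, 3, 1), (2, 4, 1), (3, 4, 1), (3, 1, 0), (4, 2, 0), (2, 2, 0)]

w0, w1, b = 1.0, 1.0, 0.0            # start with an arbitrary line
best = count_errors(w0, w1, b, users)

for _ in range(1000):
    # Nudge the line a little in a random direction.
    d0, d1, db = (random.uniform(-0.1, 0.1) for _ in range(3))
    trial = count_errors(w0 + d0, w1 + d1, b + db, users)
    if trial <= best:                # no worse: keep the nudge
        w0, w1, b, best = w0 + d0, w1 + d1, b + db, trial
    # Otherwise: discard the nudge and try another.

print(best)  # error count of the best line found
```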

For the optimal line in the figure, 3 blues on the upper-left side are misclassified and 2 reds on the lower-right side are misclassified, 5 in total.

We then use this line as the standard for predicting new users, as shown below:

 

In this figure, the green new user falls on the upper-left side, so we judge that he should be red. That is, he probably likes the game, and the company should promote it to him.

In the logistic regression algorithm, the fine-tuning and error counting are done automatically. Moreover, the direction of each fine-tune is not random: it is steered by a method called "gradient descent," which ensures that every adjustment moves in the "correct" direction. Given the data, the algorithm adjusts automatically until it reaches the optimum, and then the program stops. This is the principle of logistic regression.
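Here is a hedged sketch of that automatic process on made-up toy data. Instead of counting errors directly, logistic regression scores each point with a smooth sigmoid probability, and gradient descent repeatedly nudges the line's parameters in the direction that lowers the resulting loss:

```python
import numpy as np

# Toy data: label 1 ("red") points sit above the line y = x, label 0 below.
X = np.array([[1, 3], [2, 4], [3, 4], [3, 1], [4, 2], [2, 2]], dtype=float)
y = np.array([1, 1, 1, 0, 0, 0], dtype=float)

w = np.zeros(2)        # the line's direction parameters, start at zero
b = 0.0                # the line's intercept
lr = 0.2               # learning rate: size of each fine-tuning step

for _ in range(5000):
    # Sigmoid turns the signed distance to the line into a probability.
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    # Gradient of the log-loss tells us the "correct" direction to nudge.
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

# Predict "red" wherever the fitted probability exceeds one half.
pred = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)
print((pred == y).sum())  # how many training points the line now gets right
```

The key difference from random nudging is that the gradient makes every step an improvement by construction, so the program converges instead of wandering.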


In fact, we don't have to divide with a straight line. A curve works too, and the effect may be better.

For example, with a curve like this one, the error count drops to 4, lower than the straight line's optimum of 5. That is, this division works better.

The curve is found the same way: start with a random curve, then gradually adjust its position, the curvature of each bend, and so on, until it settles into its final shape.
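One common way to get such a curve, sketched here with illustrative hand-picked weights rather than trained ones, is to feed the same machinery extra features. Add squared and cross terms, and a "straight line" in that larger feature space bends into a curve back in the original age-hours plane:

```python
import numpy as np

def curve_features(x, y):
    """Original coordinates plus quadratic terms.

    A linear rule over these five features traces a conic section
    (circle, ellipse, parabola, ...) in the original (x, y) plane.
    """
    return np.array([x, y, x * x, y * y, x * y])

# Illustrative weights chosen by hand, not learned: with these values
# the decision rule  w @ features + b > 0  is x^2 + y^2 > 1, a circle.
w = np.array([0.0, 0.0, 1.0, 1.0, 0.0])
b = -1.0

print(w @ curve_features(0.0, 0.0) + b > 0)  # → False: origin is inside
print(w @ curve_features(2.0, 0.0) + b > 0)  # → True: outside the circle
```

Training then adjusts these weights exactly as before; only the shape of the boundary changes.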

We call this process "training." "Training" is a common term in the field of artificial intelligence. For example, AlphaGo, which defeated the human Go champion, was trained; the models used in image recognition, speech recognition, autonomous driving, and so on were trained as well.

The word "training" is borrowed from everyday life. Before the advent of artificial intelligence, the place the word was used most was probably the circus. For a circus to have a monkey perform on a bicycle, for example, the monkey must be trained first. A monkey can't ride a bicycle at first, and falls off when it tries. But after a few more rides, it makes a little progress. Each time it improves a little, the trainer gives it a small reward, such as a bit of food or a few pats. If it rides badly, say it falls, the trainer punishes it, perhaps by scolding it or even striking it. In this way, even though monkeys don't understand human language, they can be taught to ride bicycles through the mechanism of reward and punishment alone.

Algorithms in artificial intelligence work the same way: engineers train models by rewarding and punishing the computer program.

So how do you reward a computer program? It's actually quite simple. Take our logistic regression as an example: the program's constant adjustment of the dividing line is really constant adjustment of the parameters that describe the line. If adjusting a parameter lowers the error rate, the adjustment was good, and we keep the new parameters; "keeping" is the program's reward. If the adjustment raises the error rate, we throw it away; "discarding" is the program's punishment. In this way, the parameters left at the end are the best parameters, and the final curve is the optimal curve.


What actually does this is something called a loss function. It plays the same role as the reward-and-punishment mechanism in training monkeys: it judges whether the model is doing "well" or "badly" and quantifies that judgment. With a well-designed loss function, we can train a better model.
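As a concrete example of quantifying "good" and "bad," here is a minimal sketch of the log-loss that logistic regression uses. A confident correct prediction earns a small loss (the reward); a confident wrong one earns a large loss (the punishment):

```python
import math

def log_loss(p, label):
    """Log-loss for one prediction.

    p: predicted probability of the positive class (e.g. "likes the game").
    label: the true answer, 1 or 0.
    """
    eps = 1e-12                       # clamp to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    return -math.log(p) if label == 1 else -math.log(1 - p)

print(round(log_loss(0.9, 1), 3))    # → 0.105  confident and right: small loss
print(round(log_loss(0.1, 1), 3))    # → 2.303  confident and wrong: large loss
```

Training simply pushes the parameters toward whatever makes the total loss over all users as small as possible.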

Like the monkeys, computers can't understand human language, but we can still train a good model just by rewarding and punishing them.

Well, today we introduced the logistic regression algorithm and used it to predict a new user's preference.

Finally, I leave you with a thought question. For today's case, here is a new dividing curve whose error count is only 3, lower than the previous curve's. Do you think this is a better way to divide the users? If an answer comes to mind, leave it in the comments.

 

Related articles:

Artificial Intelligence Algorithms Popular Explanation Series (1): K-Nearest Neighbors

Artificial Intelligence Algorithms Popular Explanation Series (2): Logistic Regression

Artificial Intelligence Algorithms Popular Explanation Series (3): Decision Trees
