This section covers logistic regression, which can be considered one of the more orthodox machine learning algorithms. What do I mean by orthodox? In my view, a machine learning algorithm generally follows these steps:

1) Given a problem, we describe it in mathematical language and build a model, such as a regression model or a classification model, to represent it;

2) We establish the model's cost function through maximum likelihood, maximum a posteriori (MAP) estimation, or minimization of classification error. This turns the task into an optimization problem, whose solution is the set of model parameters that best fits our data;

3) We then solve for the optimum of this cost function. There are several cases:

a) The optimization problem has an analytical solution. For example, to find an extremum we usually take the derivative of the cost function and look for the point where the derivative is 0, which is a candidate maximum or minimum. If the cost function is easy to differentiate and the equation obtained by setting the derivative to 0 has a closed-form solution, we can obtain the optimal parameters directly.

b) The function is hard to differentiate, for example because it contains latent variables or its variables are coupled, that is, mutually dependent. Or the equation obtained after differentiation cannot be solved directly, for example because the number of unknown parameters exceeds the number of equations. In these cases we need an iterative algorithm to approach the best solution step by step. Iteration is a magical thing. It keeps the lofty goal in mind (finding the optimal solution, like reaching the top of the mountain) and sets itself a short-term goal (every step should bring it a little closer to the lofty goal). Down-to-earth and without excuses, it climbs step by step like a snail, sustained by a single belief: as long as every step takes me a little higher, after enough steps I will reach the peak of my life and enjoy the pride and self-forgetfulness of standing on the summit.
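To make cases a) and b) concrete, here is a minimal sketch on a made-up one-parameter cost, J(w) = (w - 3)^2 + 1. The cost function, the starting point, and the step size are all assumptions chosen for illustration; they do not come from any particular model.

```python
def grad(w):
    """Derivative of the hypothetical cost J(w) = (w - 3)^2 + 1."""
    return 2.0 * (w - 3.0)

# Case a): setting the derivative 2 (w - 3) to 0 gives the answer directly.
w_analytic = 3.0

# Case b): pretend no closed form exists and walk downhill step by step.
def gradient_descent(w0, learning_rate=0.1, steps=200):
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)  # each step moves a little closer to the minimum
    return w

w_iterative = gradient_descent(w0=-10.0)
print(w_analytic, round(w_iterative, 6))
```

Both routes end up at the same parameter here; the iterative one simply needs many small steps to get there.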

Another thing to consider: if the cost function is convex, any local optimum is also the global optimum. There is only one mountain for 500 miles around, and it is destined to be the one you are looking for. If the function is non-convex, there may be many local optima, mountain after mountain without end. Human vision is limited, so you cannot tell which mountain is the highest. Perhaps fate plays a trick on you and you end up innocently trapped in a local optimum, like a frog gazing at the sky from the bottom of a well, convinced that what you found is the best. Little do you know that there are mountains beyond the mountains and people beyond the people, and the light always blooms silently in the unknown distance. Or perhaps fate favors the kind-hearted and always leads you to the best destination. There are also many people who do not believe in fate, who think that man can conquer the heavens and vow to find the very best, never giving up and never compromising, even if they can only go as far as one breath will carry them. So sad... haha.
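The "many mountains" situation can be sketched with a small hill-climbing experiment. The landscape `height(x) = -x^4 + 3x^2 - x` is a hypothetical non-convex function picked only because it has two peaks of different height; the step size and starting points are likewise assumptions.

```python
def height(x):
    """Hypothetical landscape with two peaks of different height."""
    return -x**4 + 3 * x**2 - x

def slope(x):
    """Derivative of height: -4x^3 + 6x - 1."""
    return -4 * x**3 + 6 * x - 1

def climb(x, step_size=0.01, steps=2000):
    """Gradient ascent: always step uphill from the current position."""
    for _ in range(steps):
        x += step_size * slope(x)
    return x

left_peak = climb(-2.0)   # climber starting on the left
right_peak = climb(2.0)   # climber starting on the right
print(f"start -2.0 -> x = {left_peak:.3f}, height = {height(left_peak):.3f}")
print(f"start  2.0 -> x = {right_peak:.3f}, height = {height(right_peak):.3f}")
```

The climber that starts on the right stops on the lower peak (roughly x = 1.13) and has no way of knowing that a higher one exists on the left (roughly x = -1.30). Which peak you reach depends only on where you start.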

Uh, I have wandered off, and I am not sure whether what I said is right, so please feel free to correct me. Let's get to the point. As described above, logistic regression follows this process: faced with a regression or classification problem, establish a cost function, iteratively solve for the optimal model parameters with an optimization method, and then test to verify the quality of the solved model. In the vast sea of people and the rolling mortal world, have we found the one that suits her best?

**1. Logistic Regression**

Logistic regression is a machine learning method commonly used in industry to estimate the probability of something. I previously saw it used for ad click prediction in the classic "The Beauty of Mathematics": based on how likely each advertisement is to be clicked by the user, the advertisement most likely to be clicked is placed where the user can see it, calling out "click me!" If the user clicks, money comes in. That is why our computers are now flooded with ads.

Similar examples include the probability that a user buys a certain product, the probability that a patient has a certain disease, and so on. The world is random (except for man-made deterministic systems, of course, although even those may produce noise or erroneous results; the chance of such an error is so small, perhaps not once in millions of years, that it is negligible), so the occurrence of anything can be expressed as a probability or as odds. "Odds" refers to the ratio of the probability that something happens to the probability that it does not.
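As a quick illustration of the definition, the conversion between probability and odds can be sketched as follows. The helper names `odds` and `probability` are made up for this example.

```python
def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return p / (1.0 - p)

def probability(o):
    """Inverse conversion: probability of an event with odds o."""
    return o / (1.0 + o)

for p in (0.2, 0.5, 0.8):
    print(f"probability {p} -> odds {odds(p):.2f}")
```

A probability of 0.5 corresponds to odds of 1 (equally likely to happen or not), and a probability of 0.8 corresponds to odds of about 4, i.e. four times as likely to happen as not.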

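Logistic regression can be used for regression as well as for classification, mainly binary classification. Remember the support vector machine (SVM) from the previous sections? It is a binary classifier: it separates samples of two different classes by finding the hyperplane that best distinguishes them. But when you hand it a new sample, it gives you only one answer: whether the sample is positive or negative. For example, if you ask an SVM whether a certain girl likes you, it will only answer "yes" or "no". That is too blunt for us: either hope or despair, and neither is good for your health. It would be gentler if it could tell you that she likes you a lot, likes you a little, does not like you much, or does not like you at all so do not even think about it. Being told that she has a 49% chance of liking you is kinder than being told flatly that she does not. It also gives you extra information: how much hope you have of winning her over and how much harder you need to try. Know yourself and know your enemy, and you will win every battle, haha. Logistic regression is gentle in exactly this way: what it gives us is the probability that your sample belongs to the positive class.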

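We still need a little math. (For a deeper understanding, see the references.) Suppose our sample is {**x**, y}, where y is 1 or 0, denoting the positive or the negative class, and **x** is our m-dimensional feature vector. Then the "probability" that sample **x** belongs to the positive class, that is, y=1, can be expressed by the following logistic function:

$$P(y=1 \mid \mathbf{x}; \boldsymbol{\theta}) = \sigma(\boldsymbol{\theta}^{T}\mathbf{x}) = \frac{1}{1+\exp(-\boldsymbol{\theta}^{T}\mathbf{x})}$$

Here **θ** is the vector of model parameters, that is, the regression coefficients, and σ is the sigmoid function. This function is in fact obtained by transforming the log-odds below (the logarithm of the ratio between the probability that **x** belongs to the positive class and the probability that it belongs to the negative class):

$$\ln\frac{P(y=1 \mid \mathbf{x}; \boldsymbol{\theta})}{P(y=0 \mid \mathbf{x}; \boldsymbol{\theta})} = \ln\frac{P(y=1 \mid \mathbf{x}; \boldsymbol{\theta})}{1-P(y=1 \mid \mathbf{x}; \boldsymbol{\theta})} = \boldsymbol{\theta}^{T}\mathbf{x} = \theta_{1}x_{1} + \theta_{2}x_{2} + \dots + \theta_{m}x_{m}$$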
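For readers who want the intermediate steps, here is one way to write out how the log-odds turn into the sigmoid. The shorthand p for the probability of the positive class and z for the weighted sum **θ**ᵀ**x** is introduced only for this derivation. Starting from the assumption that the log-odds equal z:

$$
\begin{aligned}
\ln\frac{p}{1-p} &= z \\
\frac{p}{1-p} &= e^{z} \\
p &= (1-p)\,e^{z} \\
p\,(1+e^{z}) &= e^{z} \\
p &= \frac{e^{z}}{1+e^{z}} = \frac{1}{1+e^{-z}} = \sigma(z)
\end{aligned}
$$

So modeling the log-odds as a linear function of the features is the same as passing that linear function through the sigmoid.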

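In other words, y is the variable we care about, for example whether she likes you, and it depends on several independent variables (factors): what your character is like, whether your vehicle has two wheels or four, whether you are more handsome than Pan An or look more like Brother Sharp, whether you own a grand mansion or a tiny thatched hut, and so on. We denote these factors as x_{1}, x_{2},…, x_{m}. How does the girl weigh them? The quickest way is to add up the scores of all the factors: the larger the sum, the more she likes you. But everyone carries their own scale and weighs the factors differently; to each their own. For example, this girl values your character most, so character gets a weight of 0.6. She does not care much whether you have money (if you are broke, you can work hard together), so money gets a weight of 0.001, and so on. We call the weights corresponding to x_{1}, x_{2},…, x_{m} the regression coefficients, written θ_{1}, θ_{2},…, θ_{m}. The weighted sum of the factors is your total score. Please choose the man of your heart, just like on the dating show If You Are the One! Haha.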

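So the logistic regression above is a linear classification model. It differs from linear regression in that it compresses the wide-ranging output of linear regression, say from negative infinity to positive infinity, into the interval between 0 and 1, because only such an output value can convincingly be presented to the public as a "probability". Compressing large values into this range has another nice benefit: it dampens the influence of variables with extreme values (I am not sure whether my understanding here is correct). Achieving this great feat takes only one ordinary step: applying a logistic function to the output. In addition, for binary classification we can simply say: if the probability that sample **x** belongs to the positive class is greater than 0.5, classify it as positive; otherwise classify it as negative. In fact, the closest thing an SVM has to a class probability is the sample's distance to the decision boundary, and turning such a score into a probability is exactly the job that logistic regression does.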
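The scoring and thresholding described above can be sketched in a few lines. The weights 0.6 (character) and 0.001 (money) follow the example in the text; the third weight and both feature vectors are hypothetical values made up for illustration.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(theta, x):
    """Probability of the positive class: sigmoid of the weighted sum of the factors."""
    score = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(score)

def classify(theta, x, threshold=0.5):
    """Positive class (1) if the probability exceeds the threshold, else negative (0)."""
    return 1 if predict_proba(theta, x) > threshold else 0

theta = [0.6, 0.001, 0.3]        # character, money, looks (the last weight is assumed)
suitor_a = [2.0, 5.0, -1.0]      # hypothetical factor scores
suitor_b = [-3.0, 100.0, 1.0]

for name, x in (("A", suitor_a), ("B", suitor_b)):
    print(name, round(predict_proba(theta, x), 3), classify(theta, x))
```

However large or small the weighted sum gets, the output always stays between 0 and 1, which is what lets us read it as a probability.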

So logistic regression is simply a linear regression normalized by the logistic function, nothing more.
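Putting the pieces together, the overall process described earlier (set up a cost function, solve it iteratively, then test) might look like the sketch below. It assumes the cost is the log-likelihood of the data and maximizes it with plain batch gradient ascent, which is one common choice and not necessarily the only one. The toy data, the learning rate, and the function name `fit` are all made up for illustration.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(theta, x):
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def fit(samples, labels, learning_rate=0.5, epochs=1000):
    """Batch gradient ascent on the average log-likelihood (assumed cost function)."""
    n, m = len(samples), len(samples[0])
    theta = [0.0] * m
    for _ in range(epochs):
        gradient = [0.0] * m
        for x, y in zip(samples, labels):
            error = y - predict_proba(theta, x)   # gradient of the log-likelihood
            for j in range(m):
                gradient[j] += error * x[j]
        theta = [t + learning_rate * g / n for t, g in zip(theta, gradient)]
    return theta

# Toy data: the first feature is a constant 1 (intercept), the second is one factor.
samples = [[1.0, -2.0], [1.0, -1.5], [1.0, -0.5], [1.0, 0.5], [1.0, 1.0], [1.0, 2.5]]
labels = [0, 0, 0, 1, 1, 1]

theta = fit(samples, labels)
train_predictions = [1 if predict_proba(theta, x) > 0.5 else 0 for x in samples]
test_predictions = [1 if predict_proba(theta, x) > 0.5 else 0
                    for x in ([1.0, -1.0], [1.0, 1.5])]   # unseen points
print(theta, train_predictions, test_predictions)
```

Because this toy data is perfectly separable, the coefficients would keep growing if the loop ran forever; stopping after a fixed number of epochs is a simplification made for this sketch.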