
Before discussing logistic regression (*Logistic Regression*), let's consider some real-life situations: Is an email spam? Is a transaction fraudulent? Is a document valid? We call this kind of problem a **classification problem** (*Classification Problem*). In a classification problem, we try to predict whether the result belongs to a certain class (e.g., true or false).

We start our discussion with a binary classification problem, i.e., one where the outcome is either true or false.

We call the two classes that the *dependent variable* may belong to the **Negative Class** and the **Positive Class**, respectively; the dependent variable is then:

$$y \in \{0, 1\}$$

Here, 0 represents the negative class and 1 represents the positive class.

Consider the **classification problem** of predicting whether a tumor is malignant or benign (*Malignant or Benign*). Suppose that whether the tumor is malignant or benign is related to its size; a linear regression method could then be used to find a straight line that fits the data:

The linear regression model can only predict continuous values, while for a classification problem the output must be 0 or 1, so we can predict:

- when $h(x) \geq 0.5$, predict $y = 1$;
- when $h(x) < 0.5$, predict $y = 0$.
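As a minimal sketch of this rule (assuming a hypothesis function whose continuous output for one example is already computed), the thresholding could look like:

```python
def predict(h_x, threshold=0.5):
    """Map a continuous model output to a 0/1 class label.

    h_x: the hypothesis value h(x) for one example (a float).
    Returns 1 (positive class) if h_x >= threshold, else 0 (negative class).
    """
    return 1 if h_x >= threshold else 0

# Outputs at or above the threshold map to the positive class.
print(predict(0.7))  # 1
print(predict(0.3))  # 0
```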

For the data shown above, such a linear model seems to work well for the classification task. But suppose we observe another malignant tumor of a very large size and add it to the training set as a new instance: this outlier pulls on the linear model and produces a new fitted line.

At this time, it is not so appropriate to use 0.5 as a threshold to predict whether the tumor is benign or malignant. It can be seen that the linear regression model, because its predicted value can go beyond the range of [0, 1], is not suitable for solving such a problem.
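This effect can be demonstrated on hypothetical toy data (the sizes and labels below are made up for illustration): fitting a least-squares line, adding one very large malignant tumor, and comparing where each line crosses 0.5.

```python
import numpy as np

# Hypothetical tumor sizes and 0/1 labels: small tumors benign (0),
# large tumors malignant (1).
sizes  = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
labels = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Fit a straight line y = a*x + b by least squares.
a, b = np.polyfit(sizes, labels, 1)
# The x where the line crosses 0.5 is the implied decision threshold.
threshold_before = (0.5 - b) / a

# Add one very large malignant tumor as a new training example.
sizes2  = np.append(sizes, 30.0)
labels2 = np.append(labels, 1.0)
a2, b2 = np.polyfit(sizes2, labels2, 1)
threshold_after = (0.5 - b2) / a2

# The outlier flattens the line and pushes the 0.5 crossing to the right,
# so some malignant tumors that were correctly classified now fall below 0.5.
print(threshold_before, threshold_after)
```

On this data the crossing point moves from 3.5 to roughly 4.5, so the malignant tumor of size 4 would now be misclassified as benign.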

We therefore introduce a new model, logistic regression, whose output always lies between 0 and 1.

The hypothesis of the logistic regression model is:

$$h_\theta(x) = g(\theta^T x)$$

where $x$ represents the feature vector and $g$ represents the logistic function (*Logistic Function*), a commonly used S-shaped function (*Sigmoid Function*):

$$g(z) = \frac{1}{1 + e^{-z}}$$

The graph of the function is an S-shaped curve that approaches 0 as $z \to -\infty$, approaches 1 as $z \to +\infty$, and crosses 0.5 at $z = 0$.

Combining the logistic function with the hypothesis yields the hypothesis for the logistic regression model:

$$h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$$

The model can be understood as follows:

The purpose of $h_\theta(x)$ is, for a given input variable, to compute the **estimated probability** (*Estimated Probability*) that the output variable equals 1 under the chosen parameters, namely:

$$h_\theta(x) = P(y = 1 \mid x; \theta)$$

For example, if for a given $x$ we compute $h_\theta(x) = 0.7$ with the determined parameters, there is a 70% probability that $y$ is the positive class, and correspondingly a 30% probability ($1 - 0.7 = 0.3$) that $y$ is the negative class.
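Since $h_\theta(x)$ is $P(y = 1 \mid x; \theta)$, the probability of the negative class is simply its complement. A small sketch (the value of $\theta^T x$ below is chosen, hypothetically, so that $h_\theta(x) \approx 0.7$):

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical example: suppose the learned parameters give theta^T x ≈ 0.847,
# so h(x) ≈ 0.7 for this input.
p_positive = sigmoid(0.847)    # P(y = 1 | x; theta)
p_negative = 1.0 - p_positive  # P(y = 0 | x; theta) = 1 - h(x)

print(round(p_positive, 2))  # 0.7
print(round(p_negative, 2))  # 0.3
```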

That concludes the logistic regression model. Next time we will discuss the **Decision Boundary** and the **Cost Function**.