Coursera Stanford Wu Enda Machine Learning Course Notes (2)

After a week of watching the course, I found I had already forgotten most of it, so I decided to write these notes for review. If there is any infringement problem, please contact me; I will delete the material immediately and apologize.

At the same time, reprinting of any kind, whether in full or in part, is prohibited. If you want to use this material, please contact me at [email protected]. If I find infringement — well, I have studied intellectual property law, hehe.


Week 3 Logistic Regression

Classification

Common classification problems are:

(1) spam filtering (2) tumor classification (3) credit card fraud detection

Example: classifying tumors as benign or malignant

In the figure below, y = 1 means benign and y = 0 means malignant. We have eight instances (x) distributed as shown. The resulting hypothesis is the blue line (this is only an illustration; in fact linear regression is not suitable here: if there are a few outliers, the linear fit collapses).

At this point, we classify using a threshold: when x is greater than the point (marked in red) where the fitted line crosses the threshold, we predict y = 1; when x is less than that point, we predict y = 0.
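To make the outlier remark above concrete, here is a minimal numeric sketch (my own, not from the course; the data points are invented) of thresholding a least-squares fit and watching one extreme point break it:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)  # feature values (hypothetical)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)  # class labels, 0 or 1

def fit_and_predict(x_train, y_train, x_eval, threshold=0.5):
    slope, intercept = np.polyfit(x_train, y_train, 1)   # least-squares line
    return (slope * x_eval + intercept >= threshold).astype(int)

print(fit_and_predict(x, y, x))       # [0 0 0 0 1 1 1 1] -- separates the groups cleanly

# Add one extreme (but perfectly valid) positive example far to the right:
# the refit line flattens, and the point at x = 5 is now misclassified as 0.
x2, y2 = np.append(x, 40.0), np.append(y, 1.0)
print(fit_and_predict(x2, y2, x))     # [0 0 0 0 0 1 1 1]
```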



Logistic Regression Model

Logistic regression is a confusing name (because of the word "regression"), but in fact it is applied to classification problems. Its hypothesis satisfies 0 <= hθ(x) <= 1 (in contrast, a linear regression hypothesis can take values in any range).


The linear regression we talked about last week is form A, hθ(x) = θ^T x; the logistic regression model uses form B, hθ(x) = g(θ^T x), where g(z) = 1 / (1 + e^(-z)). (The two x's in my handwritten figure are the same; they came out different sizes only because my arm wouldn't behave when I wrote the second formula.) The g(z) in B is called the sigmoid function / logistic function.
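A minimal sketch of the sigmoid in Python (my own illustration; the function itself is the standard one from the course):

```python
import numpy as np

def sigmoid(z):
    """The sigmoid/logistic function g(z) = 1 / (1 + e^(-z)); output is always in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))    # 0.5 -- the midpoint
print(sigmoid(10.0))   # ~1.0 for large positive z
print(sigmoid(-10.0))  # ~0.0 for large negative z
```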


Example: suppose we have the hypothesis hθ(x) = g(z), where z = θ^T x. Assume that y = 1 when the tumor is benign and y = 0 when it is malignant. Suppose the calculation gives hθ(x) = 0.7 = P(y = 1 | x; θ), i.e. the tumor has a 70% probability of being benign. Then 1 - hθ(x) = 0.3 = P(y = 0 | x; θ), so the tumor has a 30% probability of being malignant.
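A small sketch of that probability reading (theta and x below are invented purely so the numbers match the 0.7 example):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values: chosen so that theta^T x ~= 0.847 and g(0.847) ~= 0.7.
theta = np.array([0.847, 0.0])
x = np.array([1.0, 0.0])        # x[0] = 1 is the usual intercept feature

h = sigmoid(theta @ x)
print(f"P(y=1 | x; theta) = {h:.2f}")        # ~0.70 -> 70% probability benign
print(f"P(y=0 | x; theta) = {1.0 - h:.2f}")  # ~0.30 -> 30% probability malignant
```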

A doubt arises here: the result of this classification problem should be binary (0 or 1), so why do we need a probability to explain the result?


In my course "Statistical Modeling" there is the same concept: the link function (see figure below). In a GLM, the explanatory and response variables have a two-layer relationship: on the response scale, the explanatory variable x has a nonlinear relationship with the response f(x); on the link scale, x has a linear relationship with the link function g(x). Personally, I feel this concept is very similar to the logistic regression idea in the previous paragraph. Their relationship is like the relationship between data mining and machine learning: the two fields share many concepts, and even machine learning's supervised and unsupervised learning have similar theories in data mining. Data mining approaches problems from a statistical point of view, while machine learning leans more toward a computer science point of view. But note the difference in purpose: in logistic regression we wrap the hypothesis in the sigmoid so that its range lies in [0, 1], whereas in a GLM we use a link function to better fit the model or to deal with problems in the model assumptions.
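A small numeric check of the link-scale claim (my own illustration, not course material): the logit link g(p) = log(p / (1 - p)) is the inverse of the sigmoid, so applying it to the sigmoid's output recovers the linear predictor exactly.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """The logit link g(p) = log(p / (1 - p)), the inverse of the sigmoid."""
    return np.log(p / (1.0 - p))

z = np.linspace(-3.0, 3.0, 7)      # the linear predictor theta^T x
p = sigmoid(z)                     # response scale: nonlinear in z, confined to (0, 1)
print(np.allclose(logit(p), z))    # True: on the link scale the relationship is linear
```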



Decision Boundary

Here we will combine the above knowledge with a tumor example to explain what a decision boundary is.

The black and green formulas below are the logistic function we will be using, and the blue curve is its graph. We use threshold = 0.5 to separate benign tumors (y = 1) from malignant ones (y = 0). From the blue curve we can see that hθ(x) >= 0.5 exactly when z >= 0, so we predict y = 1 when z >= 0 and y = 0 when z < 0. Here z = θ0 + θ1*x1 + θ2*x2 with (θ0, θ1, θ2) = (-3, 1, 1): when -3 + x1 + x2 >= 0 we predict y = 1, and likewise when x1 + x2 < 3 we predict y = 0. The black line x1 + x2 = 3 is marked in the coordinate diagram in the lower right corner. This line, on which z = 0, is called the decision boundary: on its lower-left side the prediction is y = 0, and on its upper-right side it is y = 1 (it is best not to lean on the plotted data points to understand this; think of it as the two sides of the line itself).

Of course, z does not have to be linear. For nonlinear problems we can borrow the circle and ellipse knowledge from high school to construct concepts like "inside the circle (y = 0)" and "outside the circle (y = 1)", i.e. regions where z < 0 and z >= 0 (I think the equals sign can be placed on either side; it is not very important which side gets it).
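A minimal sketch of both boundaries (θ = (-3, 1, 1) comes from the example above; the circular coefficients are invented for illustration):

```python
import numpy as np

theta = np.array([-3.0, 1.0, 1.0])     # theta0, theta1, theta2 from the example

def predict(x1, x2):
    z = theta[0] + theta[1] * x1 + theta[2] * x2
    return int(z >= 0)                 # h(x) >= 0.5  <=>  z >= 0

print(predict(1.0, 1.0))   # 0: lower-left of the line x1 + x2 = 3
print(predict(3.0, 3.0))   # 1: upper-right of the line

# A nonlinear boundary in the spirit of the circle remark:
# z = -1 + x1^2 + x2^2, i.e. the boundary is the unit circle x1^2 + x2^2 = 1.
def predict_circle(x1, x2):
    return int(-1.0 + x1**2 + x2**2 >= 0)   # 1 outside the circle, 0 inside

print(predict_circle(0.5, 0.5))   # 0: inside
print(predict_circle(2.0, 0.0))   # 1: outside
```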

What needs to be realized here is that:

(1) Usually real data is not this perfect, with benign and malignant tumors cleanly split into two groups. In future analyses the classes are more likely to be mixed and blurred; pay attention to how to handle that.

(2) The number of covariates/features in reality is often far more than two (x1, x2). For now we can build intuition from 2D and 3D coordinate diagrams, but how do we understand things once the dimension rises beyond what diagrams can express? We need to build our own knowledge system: the number of covariates changes, but the rules of the game do not, so the knowledge should be abstracted.

(3) "The decision boundary is a property of the hypothesis, not of the training set." (Quoted from Wu Enda's words in class.) The boundary is not decided directly by the training set. Just imagine: if the boundary changed whenever the training set changed, that would be unreasonable. Instead, the boundary is decided by the parameters, which are in turn determined by the training set, so it has a certain stability.



