Logistic Regression - Classification

Abstract: This article is the transcript of Lesson 46, "Classification", in Chapter 7, "Logistic Regression", of Andrew Ng's Machine Learning course. I wrote it down while studying the videos and lightly edited it to make it more concise and easier to read, for later reference. I'm sharing it here; if there are any errors, corrections are warmly welcomed and sincerely appreciated. I hope it is also helpful to others in their studies.

In this and the next few videos, I want to start to talk about classification problems, where the variable y that you want to predict is a discrete value. We'll develop an algorithm called logistic regression, which is one of the most popular and most widely used learning algorithms today.

Two-class / Binary Classification:

Email: Spam / Not Spam?

Online transactions: Fraudulent (Yes / No)?

Tumor: Malignant / Benign?

y ∈ {0, 1}

Where,

0: "Negative Class" (e.g., benign tumor)

1: "Positive Class" (e.g., malignant tumor)


Multi-class classification:

y ∈ {0, 1, 2, 3}

Here are some examples of classification problems. Earlier, we talked about email spam classification as an example of a classification problem. Another example would be classifying online transactions. So, if you have a website that sells stuff and you want to know whether a particular transaction is fraudulent or not, for example whether someone is using a stolen credit card or has stolen the user's password, that's another classification problem. And earlier we also talked about the example of classifying tumors as cancerous (malignant) or as benign. In all of these problems, the variable that we're trying to predict is a variable y that we can think of as taking on two values, either 0 or 1: either spam or not spam, fraudulent or not fraudulent, malignant or benign. Another name for the class we denote with 0 is the negative class, and another name for the class that we denote with 1 is the positive class. So 0 may denote a benign tumor, and 1, the positive class, may denote a malignant tumor. The assignment of the two classes, you know, spam, not spam, and so on, to positive and negative, to 0 and 1, is somewhat arbitrary and it doesn't really matter. But often there is the intuition that the negative class conveys the absence of something, like the absence of a malignant tumor, whereas 1, the positive class, conveys the presence of something that we may be looking for. But the definition of which is negative and which is positive is somewhat arbitrary, and it doesn't matter that much. For now, we're going to start with classification problems with just two classes: 0 and 1. Later on, we'll talk about multi-class problems as well, where the variable y may take on, say, four values: 0, 1, 2 and 3. This is called a multi-class classification problem, but for the next few videos, let's start with the two-class or binary classification problem, and we'll worry about the multi-class setting later.

So, how do we develop a classification algorithm? Here's an example of a training set for a classification task: classifying a tumor as malignant or benign. Notice that malignancy takes on only two values, 0 (no) or 1 (yes). So, one thing we could do, given this training set, is to apply the algorithm that we already know, linear regression, to this data set and just try to fit a straight line to the data. So, if you take this training set and fit a straight line to it, maybe you get a hypothesis that looks like this. Alright, so that's my hypothesis, h_\theta(x) = \theta^T x. If you want to make predictions, one thing you could try doing is to threshold the classifier's outputs at 0.5, that is, at the vertical-axis value 0.5. If the hypothesis outputs a value that's greater than or equal to 0.5, you predict y equals 1; if it's less than 0.5, you predict y equals 0. Let's see what happens when we do that. So, let's take 0.5, and so that's where the threshold is, and let's use linear regression this way. Everything to the right of this point we will end up predicting as the positive class, because the output values are greater than 0.5 on the vertical axis, and everything to the left of that point we will end up predicting as the negative class. In this particular example, it looks like linear regression is actually doing something reasonable, even though this is a classification task we're interested in.
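To make the thresholding step concrete, here is a minimal Python sketch of this idea. The tumor sizes and labels below are made up for illustration; they are not the data set from the lecture.

```python
import numpy as np

# Made-up tumor sizes (cm) and labels: 0 = benign, 1 = malignant.
X = np.array([1.0, 1.5, 2.0, 2.5, 3.5, 4.0, 4.5, 5.0])
y = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

# Fit h_theta(x) = theta_0 + theta_1 * x by ordinary least squares.
A = np.column_stack([np.ones_like(X), X])       # design matrix [1, x]
theta, *_ = np.linalg.lstsq(A, y, rcond=None)

# Threshold the real-valued output of the line at 0.5 to get class predictions.
h = A @ theta
predictions = (h >= 0.5).astype(int)

print("theta:      ", np.round(theta, 3))
print("h_theta(x): ", np.round(h, 2))
print("predictions:", predictions)
print("labels:     ", y.astype(int))
```

On this small, well-separated data set the thresholded line classifies every example correctly, which matches the "something reasonable" behavior described above.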

But now let's try changing the problem a bit. Let me extend the horizontal axis a little, and let's say we got one more training example way out there on the right. Notice that this additional training example out here doesn't actually change anything, right? Looking at the training set, it is pretty clear what a good hypothesis is: everything to the right of somewhere around here we should predict as positive, and everything to the left we should probably predict as negative, because from this training set it looks like all the tumors larger than a certain value around here are malignant, and all the tumors smaller than that are not malignant, at least for this training set. But once we've added that extra example out here, if you now run linear regression, you instead get a different straight-line fit to the data, which might look like this. If you now threshold this hypothesis at 0.5, you end up with a threshold that's around here, so that everything to the right of this point you predict as positive and everything to the left of that point you predict as negative. And this seems a pretty bad thing for linear regression to have done, right? Because these are our positive examples and these are our negative examples, it's pretty clear that we should really be separating the two classes somewhere around there. But somehow, by adding one example way out here to the right, an example that really isn't giving us any new information (it should be no surprise to the learning algorithm that the example way out here turns out to be malignant), we caused linear regression to change its straight-line fit to the data from this magenta line to this blue line over here, and caused it to give us a worse hypothesis. So, applying linear regression to a classification problem often isn't a great idea. In the first example, before I added this extra training example, linear regression was just getting lucky and gave us a hypothesis that worked well for that particular data set. If you apply linear regression to a data set like this, you might get lucky, but often it isn't a good idea, so I wouldn't use linear regression for classification problems. The sketch below illustrates this effect.
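The following sketch uses the same made-up numbers as before: adding a single example far to the right shifts the tumor size at which the fitted line crosses 0.5, even though the extra example adds no new information about where the classes separate.

```python
import numpy as np

def decision_point(X, y):
    """Fit a line by least squares and return the x where it crosses 0.5."""
    A = np.column_stack([np.ones_like(X), X])
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return (0.5 - theta[0]) / theta[1]          # solve theta_0 + theta_1 * x = 0.5

# Same made-up data as before.
X = np.array([1.0, 1.5, 2.0, 2.5, 3.5, 4.0, 4.5, 5.0])
y = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

# Add one obviously malignant tumor far out to the right.
X_extra = np.append(X, 20.0)
y_extra = np.append(y, 1.0)

print(f"decision point without the outlier: {decision_point(X, y):.2f}")
print(f"decision point with the outlier:    {decision_point(X_extra, y_extra):.2f}")
# The crossing point moves to the right, so the malignant example at 3.5
# is now predicted as negative, even though the class boundary hasn't changed.
```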

Here's one other funny thing about what would happen if we were to use linear regression for a classification problem. For classification, we know that y is either 0 or 1. But if you're using linear regression, the hypothesis can output values much larger than 1 or less than 0, even if all of the training examples have labels y equals 0 or 1. And it seems kind of strange that, even though we know the labels should be 0 or 1, the algorithm can output values much larger than 1 or much smaller than 0. So what we'll do in the next few videos is develop an algorithm called logistic regression, which has the property that its outputs, its predictions, are always between 0 and 1; they don't become bigger than 1 or less than 0. And by the way, logistic regression is, and we will use it as, a classification algorithm. It may sometimes be confusing that the term "regression" appears in the name, even though logistic regression is actually a classification algorithm. That's just the name it was given for historical reasons, so don't be confused by it. Logistic regression is a classification algorithm that we apply to settings where the label y is a discrete value, 0 or 1. So hopefully you now know why, if you have a classification problem, using linear regression isn't a good idea. In the next video, we'll start working out the details of the logistic regression algorithm.
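As a preview of what's developed in the next video, logistic regression passes \theta^T x through the sigmoid function, g(z) = 1 / (1 + e^{-z}), and that is what keeps the output between 0 and 1. A small sketch, with arbitrary \theta values chosen only for illustration:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: output always lies strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Logistic regression's hypothesis is h_theta(x) = sigmoid(theta^T x).
# The theta values below are arbitrary, for illustration only.
theta = np.array([-3.0, 1.0])

for x in [0.0, 1.0, 3.0, 6.0, 100.0]:
    z = theta @ np.array([1.0, x])              # theta^T x with an intercept term
    print(f"x = {x:6.1f}  ->  h_theta(x) = {sigmoid(z):.6f}")
```

Even for a very large input like x = 100, the output only approaches 1; it never exceeds it, unlike the straight line fit above.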

<end>



Reposted from blog.csdn.net/edward_wang1/article/details/104522435