Andrew Ng Machine Learning (1) - Supervised Learning and Unsupervised Learning

Before getting into supervised and unsupervised learning, let's talk about what machine learning (ML) is.

Machine Learning:

First of all, learning can be described as a process of imitation. For example: we often take exams, and the questions on the exam are ones we may never have done before. But before the exam we usually work through a lot of practice problems and learn problem-solving methods from them, so we can still work out the answers to unfamiliar questions on the exam.

The idea of machine learning is similar: we give the machine some training data (the problems already done) so that it can use what it learned from them (the problem-solving methods) to analyze unknown data (the exam questions). It is much like a teacher telling us before an exam what the exam is likely to cover.

In one sentence: machine learning lets a machine learn from large amounts of data to obtain a model that reflects the real-world pattern, and by using that model the machine performs better than it did before.

Supervised Learning:

Definition: For an existing data set, the relationship between input and output is known, and an optimal model is trained according to this known relationship. In other words, the training data in supervised learning has both features and labels. Through training, the machine learns the connection between features and labels, so that when it faces data that has only features and no label, it can determine the label.
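
To make the idea of features and labels concrete, here is a minimal sketch in Python. The data (hours studied and a pass/fail label) is made up for illustration, and the nearest-neighbor rule in the hypothetical predict_label function is just one simple way to connect features to labels, not the method from the course.

```python
# Hypothetical labeled training data: each sample is (feature, label).
# Feature: hours studied; label: "pass" or "fail". All values are made up.
training_data = [(1.0, "fail"), (2.0, "fail"), (6.0, "pass"), (8.0, "pass")]

def predict_label(feature, data):
    """Return the label of the training sample whose feature is closest."""
    nearest = min(data, key=lambda sample: abs(sample[0] - feature))
    return nearest[1]

# New data that has a feature but no label: the model supplies the label.
print(predict_label(7.5, training_data))
print(predict_label(1.2, training_data))
```

The training samples carry both a feature and a label, while the two new inputs carry only a feature.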

To summarize, in the words of Andrew Ng's video:
In this lesson we introduced supervised learning. The basic idea is that every sample in our data set has a corresponding "right answer", and we then make predictions based on these samples, as in the house and tumor examples.
We also introduced the regression problem, where the goal is to predict a continuous output, and then the classification problem, where the goal is to predict a discrete set of results.

Put plainly, supervised learning can be understood as us teaching the machine how to do things.

Types of supervised learning: regression and classification

Regression
Regression problems deal with continuous variables.

For example: predicting house prices

Suppose you want to predict house prices and plot a data set like the one below. The horizontal axis is the size of the house in square feet, and the vertical axis is the price in thousands of dollars. Given this data, suppose someone has a 750 square foot house that he wants to sell and wants to know how much it could sell for.
[Figure: house price (in thousands of dollars) plotted against house size (in square feet)]
This is where regression, a supervised learning algorithm, comes in handy: we can fit the data set with a straight line, a second-order (quadratic) function, and so on.

From the figure we can see that the straight-line fit gives about 150k and the curve fit gives about 200k. We therefore keep training and learning to find the model that best fits the data (house prices).

Put simply, regression analyzes the existing points (the training data) and fits a suitable function y = f(x) to them, where y is the label. For a new independent variable x, the function then gives the label y.
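
Fitting y = f(x) with a straight line and with a quadratic can be sketched as follows. This is only an illustration: the house sizes and prices are made-up numbers, not the data from the course, and NumPy's polyfit is used here as one convenient least-squares fitting routine.

```python
import numpy as np

# Hypothetical training data (made up for illustration):
# house size in square feet and price in thousands of dollars.
sizes = np.array([500.0, 1000.0, 1500.0, 2000.0, 2500.0])
prices = np.array([120.0, 210.0, 280.0, 330.0, 360.0])

# Fit y = f(x) with a straight line (degree 1) and a quadratic (degree 2).
linear_model = np.poly1d(np.polyfit(sizes, prices, deg=1))
quadratic_model = np.poly1d(np.polyfit(sizes, prices, deg=2))

# Predict the label y (price) for a new x (a 750 square foot house).
new_size = 750.0
print(f"linear:    {linear_model(new_size):.1f}k")
print(f"quadratic: {quadratic_model(new_size):.1f}k")
```

The two models give different predictions for the same house, which is why choosing the model that best fits the data matters.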

Classification
The biggest difference from regression is that the result of classification is discrete: the output takes only a limited number of values.

For example: estimating the nature of a tumor

Suppose someone discovers a breast tumor, a lump in the breast. A malignant tumor is dangerous and harmful; a benign tumor is harmless.

Suppose that in the data set the horizontal axis is the size of the tumor and the vertical axis is 0 or 1 (it could equally be Y or N). In the known tumor samples, malignant is labeled 1 and benign is labeled 0. So the blue samples are benign and the red ones are malignant.

[Figure: tumor samples, with tumor size on the horizontal axis and the label (0 or 1) on the vertical axis]
Here, the machine learning task is to estimate whether a tumor is malignant or benign.

This is where classification comes in handy. We train a model on various data from the input samples (here the size of the tumor; in real life more data would be used, such as age), so that given a person's data the model produces a judgment. The result must be discrete: whether the tumor is cancerous, only "yes" or "no".

Put simply, classification analyzes the input feature vectors so that, for a new vector, it can obtain its label.
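
A minimal classification sketch on the tumor example follows. The tumor sizes are made up, and the single-threshold rule learned by the hypothetical train_threshold function is a deliberately simple stand-in for a real classifier, not the algorithm taught in the course.

```python
# Hypothetical training data (made up): tumor size and label,
# where 1 = malignant and 0 = benign.
sizes = [1.0, 1.5, 2.0, 2.5, 4.0, 4.5, 5.0, 6.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

def classify(size, threshold):
    """Predict 1 (malignant) if size exceeds the threshold, else 0 (benign)."""
    return 1 if size > threshold else 0

def train_threshold(sizes, labels):
    """Pick the midpoint between neighboring sizes with the fewest training errors."""
    ordered = sorted(sizes)
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]

    def errors(threshold):
        return sum(classify(s, threshold) != y for s, y in zip(sizes, labels))

    return min(candidates, key=errors)

threshold = train_threshold(sizes, labels)
print(threshold)
print(classify(3.0, threshold), classify(5.5, threshold))
```

Whatever the input size, the output is always one of two discrete values, which is what separates this from the regression case.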

Unsupervised Learning:

Definition: We do not know the relationships among the data or features in the data set; instead, we obtain the relationships among the data by clustering or by fitting some model.

You could say that, compared with supervised learning, unsupervised learning is more like self-study: the machine learns to do things on its own, and there are no labels.

Let's reuse the exam example from the explanation of machine learning above to better understand the difference between the two:

Supervised learning is like our usual exam preparation: we have done many problems and know the standard answers to all of them. During learning we can check against the answers and work out how to analyze a problem, so the next time we face a problem without an answer we can often solve it correctly. In unsupervised learning we do not know any of the answers, and we do not know whether what we did is right. But in the process of doing the problems, even without the answers, we can roughly separate them into Chinese language, mathematics, and English, because problems of the same subject share some inherent connection.

As shown below, in unsupervised learning we are given only a set of data, and our goal is to find the structure in that data. For example, an unsupervised learning algorithm may divide this data set into two distinct clusters; such an algorithm is called a clustering algorithm.
[Figure: an unlabeled data set divided into two clusters]
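
Dividing unlabeled data into two clusters can be sketched with a minimal k-means loop. The text does not name a specific clustering algorithm, so k-means is an assumption here, and both the points and the naive initialization in the hypothetical k_means function are chosen only for illustration.

```python
import numpy as np

# Hypothetical unlabeled data (made up): two loose groups of 2-D points.
points = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                   [8.0, 8.0], [8.5, 9.0], [9.0, 8.5]])

def k_means(points, k=2, iterations=10):
    """Minimal k-means: alternate between assigning points and updating centers."""
    centers = points[:k].copy()  # naive initialization: the first k points
    for _ in range(iterations):
        # Distance from every point to every center, shape (n_points, k).
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assignments = distances.argmin(axis=1)
        for j in range(k):
            members = points[assignments == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return assignments, centers

assignments, centers = k_means(points)
print(assignments)
print(centers)
```

No labels are passed in anywhere: the grouping comes only from how close the points are to each other.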

Real-Life Applications:

1. Google News groups stories into topics such as finance, entertainment, and sports according to their content. This is clustering, a form of unsupervised learning.

2. Grouping people according to their genes. The figure shows DNA data: for a group of different people, we measure how strongly each person expresses a particular gene. A clustering algorithm can then divide the measurements into different types of people. This is unsupervised learning because we are only given some data, and we do not know in advance which people are of the first type, which are of the second type, and so on.
[Figure: gene expression data for a group of different people]

Origin blog.csdn.net/linjpg/article/details/103657763