1 Introduction

1.1 Welcome

Machine learning algorithms in everyday life:

(1) Every time you search with Google or Bing, you find what you need largely because these search engines use a good learning algorithm.

(2) Every time you read your e-mail, a spam filter screens out a large amount of spam for you.

Why machine learning is so popular:

(1) It is used throughout the field of artificial intelligence

Examples: finding the shortest path between A and B, web search, photo tagging, spam filtering

(2) It reaches into every industry and into basic science

Database mining

Electronic medical records: turning medical records into medical knowledge

Computational biology: biologists collect large amounts of gene-sequence data (DNA sequencing and so on), and running learning algorithms on that data helps us better understand the human genome

Engineering: in every area of engineering we have larger and larger data sets, and we try to use learning algorithms to make sense of the data

1.2 What is machine learning?

1.2.1 Machine learning has no widely accepted definition

  The first definition of machine learning comes from Arthur Samuel. He defined machine learning as the field of study that gives computers the ability to learn without being explicitly programmed.

  A more recent definition comes from Tom Mitchell of Carnegie Mellon University, who frames machine learning as a well-posed learning problem: a program is said to learn from experience E with respect to some task T and some performance measure P if its performance on T, as measured by P, improves with experience E.

1.2.2 The two main types are supervised learning and unsupervised learning

  In supervised learning, we teach the computer how to complete a task; in unsupervised learning, we let it learn on its own.

1.3 Supervised Learning

Example 1: Predicting house prices

  Recently, a student collected some housing-price data from a research institute in Portland, Oregon. Plotted, the data looks like this: the horizontal axis is the area of the house in square feet, and the vertical axis is the price in thousands of dollars. Now suppose a friend owns a 750-square-foot house, wants to sell it, and would like to know how much it could sell for.

 

  You can fit a straight line to this data. According to that line, the house might sell for about $150,000. This is not the only possible algorithm, and there may be a better one: instead of a straight line, a quadratic function might fit the data better. Reading the quadratic curve at the same point suggests the house could sell for nearly $200,000. Later we will discuss how to choose the learning algorithm, for example how to decide between a linear and a quadratic fit. Either fit gives your friend a reasoned estimate of the selling price. Both are good examples of learning algorithms, and both are examples of supervised learning.
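The two fits described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the course: the data points are invented for the example, the helper names fit_polynomial and predict are hypothetical, and house sizes are expressed in thousands of square feet to keep the arithmetic well behaved.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ...] of c0 + c1*x + c2*x^2 + ...
    """
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Invented example data: size in thousands of square feet, price in $1000s.
sizes = [0.5, 1.0, 1.5, 2.0, 2.5]
prices = [100.0, 200.0, 260.0, 300.0, 320.0]

line = fit_polynomial(sizes, prices, degree=1)
curve = fit_polynomial(sizes, prices, degree=2)

# Estimate the price of a 750-square-foot house with each model.
print("linear estimate:   ", predict(line, 0.75))
print("quadratic estimate:", predict(curve, 0.75))
```

The two models generally give different estimates for the same house, which is exactly why the choice between them matters.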

Example 2: Predicting breast cancer

  Suppose you want to use medical records to predict whether a breast tumor is malignant or benign. A malignant tumor is harmful and dangerous; a benign tumor is much less harmful.

 

  The horizontal axis is the size of the tumor; the vertical axis takes the value 1 or 0, marking whether the tumor is malignant. For the tumors we have already seen, a malignant one is recorded as 1 and a benign one as 0.

  Now suppose a friend has unfortunately been found to have a breast tumor of roughly a certain size. The machine learning problem is to estimate the probability that the tumor is malignant rather than benign. In technical terms, this is a classification problem.

  Classification means that we try to predict a discrete output value: 0 or 1, benign or malignant. In fact, a classification problem can have more than two output values. For example, there may be three types of breast cancer, so you would predict a discrete output of 0, 1, 2 or 3, where 0 means benign, 1 means type 1 cancer, 2 means type 2 cancer and 3 means type 3 cancer. This is still a classification problem.

  Now I will plot the data with different symbols. Since tumor size is the feature we use to tell malignant from benign, I can draw all the points along a single axis and let the symbol show the class, that is, negative versus positive examples. Instead of drawing every point as an X, benign tumors are drawn as O and malignant tumors stay as X. The task is still to predict whether a tumor is malignant.
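As a rough sketch of estimating the probability that a tumor is malignant from its size, the code below trains a one-feature logistic regression with plain gradient descent. The text above does not name a specific algorithm, so logistic regression is an assumption made for illustration; the tumor sizes and labels are invented, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def probability_malignant(w, b, size):
    """Estimated probability that a tumor of the given size is malignant."""
    return sigmoid(w * size + b)


# Invented example data: tumor size (arbitrary units), 1 = malignant, 0 = benign.
sizes = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = train_logistic(sizes, labels)
for size in (1.0, 2.75, 4.5):
    p = probability_malignant(w, b, size)
    predicted_class = 1 if p >= 0.5 else 0
    print(f"size={size}: p(malignant)={p:.2f}, predicted class={predicted_class}")
```

Thresholding the probability at 0.5 turns the estimate into the discrete 0/1 output that makes this a classification problem.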

Example 3: Predicting tumors from multiple features

  In other machine learning problems, you may have more than one feature. For example, we might know not only the size of the tumor but also the age of the patient. Usually there are more features still: friends of mine who study this problem typically use features such as clump thickness, uniformity of cell size, uniformity of cell shape, and so on. This leads to one of the most interesting learning algorithms we will study.

That algorithm can handle not just two, three or five features, but even an infinite number of features.

 

 

 

  So far we have listed five different features in total: two on the axes and three more beside the plot. In some learning problems, however, you want to use far more than three or five features. You want to use an unlimited number of them, so that your algorithm has many features, or clues, on which to base its predictions. How do you handle an unlimited number of features? Even storing them is a problem, because your computer's memory is certainly not large enough. Later we will cover an algorithm called the support vector machine, which uses a neat mathematical trick that lets a computer work with an unlimited number of features.
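The text does not say what the mathematical trick is. A common reading is the kernel trick, and the sketch below works under that assumption with the Gaussian (RBF) kernel on a single scalar input. It shows that one cheap kernel evaluation equals a dot product between infinitely long feature vectors, which are approximated here by truncating them. The function names are hypothetical.

```python
import math


def rbf_kernel(x, y):
    """Gaussian kernel exp(-(x - y)^2 / 2), computed directly from the inputs."""
    return math.exp(-0.5 * (x - y) ** 2)


def explicit_features(x, n_features):
    """First n_features entries of the infinite feature vector behind the kernel.

    Entry n is exp(-x^2 / 2) * x^n / sqrt(n!), which follows from the
    Taylor series exp(x*y) = sum over n of (x*y)^n / n!.
    """
    scale = math.exp(-0.5 * x * x)
    return [scale * x ** n / math.sqrt(math.factorial(n)) for n in range(n_features)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, y = 0.8, 1.5
print("kernel value:", rbf_kernel(x, y))
for n in (2, 5, 10, 30):
    approx = dot(explicit_features(x, n), explicit_features(y, n))
    print(f"dot product with {n} explicit features: {approx}")
```

As more explicit features are included, the dot product approaches the kernel value, so the algorithm can work with the kernel alone and never has to store the full feature vector.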

  The basic idea of supervised learning: every sample in our data set comes with a corresponding "right answer", and we make predictions based on those samples, as in the house and tumor examples.

  Regression problem: the goal is to predict a continuous-valued output.

  Classification problem: the goal is to predict a discrete-valued output.

 Quiz:

  1. You have a large stock of identical goods, say thousands of copies of the same item waiting to be sold, and you want to predict how many will sell over the next three months.

  2. You have many customers and want to write software that checks each customer's account and decides, for each account, whether it has been hacked.

Is each of these a classification problem or a regression problem?

Problem 1 is a regression problem: with thousands of items in stock, I would treat the number sold as a real number, a continuous value.

Problem 2 is a classification problem: I can set the value to predict to 0 for an account that has not been hacked and 1 for an account that has. The algorithm then predicts whether an account is 0 or 1, and because there are only a small number of discrete values, I treat it as a classification problem.

That covers supervised learning.

1.4 Unsupervised Learning

 

 

Recall the data sets used in supervised learning: every example is labelled as negative or positive, that is, benign or malignant. So for every example in a supervised learning training set, we know the correct answer, benign or malignant.

In unsupervised learning, the data has no labels, or every point has the same label. We are given a data set but are not told what to do with it or what each data point is; it is simply a set of data. Can you find some structure in the data? Given such a data set, an unsupervised learning algorithm might decide that the data falls into two distinct clusters: this is one cluster, and that is another. An algorithm that does this is called a clustering algorithm.
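To make the idea of splitting unlabelled data into two clusters concrete, here is a minimal sketch using k-means. The text does not name a specific clustering algorithm, so k-means is chosen here only as one well-known example; the points are invented, and the deterministic initialisation is a simplification for this sketch.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k=2, max_iters=100):
    """Cluster points into k groups; returns (labels, centroids)."""
    # Simplified deterministic start: take the first point, then repeatedly
    # add the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(farthest)

    labels = None
    for _ in range(max_iters):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean_point(members)
    return labels, centroids


# Invented, unlabelled 2-D points forming two visible groups.
data = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.8), (4.9, 5.3)]
labels, centroids = k_means(data, k=2)
print("cluster labels:", labels)
print("centroids:", centroids)
```

Note that the algorithm is never told which group a point belongs to; the cluster numbers it returns are arbitrary names for the structure it found.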

One application of clustering is Google News. If you have never seen it, you can visit news.google.com. Every day Google News collects a very large number of news stories from the web and groups them into sets of related stories. In other words, Google News takes a great many news items and automatically clusters them, so that stories on the same topic are displayed together.

1.4.1 Unsupervised Learning

We do not tell the algorithm anything in advance, for example that these people are of the first type, those are of the second type, those are of the third, and so on. We simply say: here is a pile of data. I do not know what is in it, I do not know who belongs to which type, and I do not even know what the different types of people are. Can you automatically find the structure in the data, that is, automatically cluster these individuals into types that I do not know in advance? Because we do not give the algorithm the correct answers for the examples in the data set, this is unsupervised learning.

1.4.2 Applications of unsupervised learning and clustering

  The first application is organizing large computer clusters. I have friends who work in large data centers with big clusters of computers, and they try to work out which machines tend to work together. If those machines can be arranged to work together, the data center runs more efficiently.

  The second application is social network analysis. Given information about which friends you e-mail most often, or your Facebook friends or Google+ circles, can we automatically identify groups of friends, that is, groups in which everyone knows everyone else?

  The third application is market segmentation. Many companies have large databases of customer information. Can you take the customer data set, automatically discover market segments, and automatically assign each customer to a segment, so that you can sell more effectively within each segment? This is also unsupervised learning: we have all the customer data, but we do not know in advance what the market segments are or which segment each customer in the data set belongs to. We do not know who is in segment one, who is in segment two, and so on, so the algorithm has to discover all of this from the data.

  Finally, unsupervised learning is also used in astronomical data analysis, where clustering algorithms have yielded surprising, interesting and useful theories about how galaxies form. All of these are examples of clustering, and clustering is just one kind of unsupervised learning.


Origin www.cnblogs.com/weststar/p/11548175.html