Machine Learning in Action (1) —— basics

 

1.  Difference between Supervised learning and Unsupervised learning.

 

2. Supervised learning or Unsupervised learning? How to choose a proper machine learning algorithm?

(1)  Consider your goal

What are you trying to get out of this?

If you’re trying to predict or forecast a target value, then you need to look into supervised learning; otherwise, consider unsupervised learning.


If you’ve chosen supervised learning, what is your target value?

Discrete value -> classification algorithm (kNN, decision trees, SVM, Naive Bayes, etc.)

Continuous value -> regression (linear, locally weighted linear, ridge, lasso, etc.)

 

If you’ve chosen unsupervised learning and you are trying to fit your data into discrete groups, consider a clustering algorithm. If you also need a numerical estimate of how strongly the data fits into each group, you should probably look into a density estimation algorithm.
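The questions above can be read as a small decision procedure. The sketch below is only an illustration of that reasoning: the function name `suggest_algorithm_family` and its parameters are hypothetical, not part of any library.

```python
def suggest_algorithm_family(has_target, target_is_discrete=False,
                             need_fit_strength=False):
    """Map the questions above to an algorithm family (hypothetical helper)."""
    if has_target:
        # Supervised learning: the type of the target value decides the family.
        return "classification" if target_is_discrete else "regression"
    # Unsupervised learning: there is no target value to predict.
    if need_fit_strength:
        # We also want a numerical estimate of how strong the fit is.
        return "density estimation"
    return "clustering"


print(suggest_algorithm_family(True, target_is_discrete=True))
print(suggest_algorithm_family(True, target_is_discrete=False))
print(suggest_algorithm_family(False, need_fit_strength=True))
```

In practice the choice is rarely this mechanical, but the two questions (is there a target value, and what type is it) narrow the candidates quickly.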

……

(2)  Consider your data set

What data do you have or can you collect?

Are the features nominal or continuous?

Are there missing values in the data set?

Are there outliers in the data set? (We should distinguish between missing values and outliers.)
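To make the distinction between missing values and outliers concrete, here is a minimal sketch for a single numeric feature. It assumes that missing values are encoded as `None` and that an outlier is any value outside 1.5 times the interquartile range; both are conventions chosen for this example, and the helper name `summarize_feature` and the sample data are hypothetical.

```python
import statistics


def summarize_feature(values, k=1.5):
    """Count missing entries and flag outliers in one numeric feature.

    Assumptions: a missing value is stored as None, and an outlier is any
    value outside [Q1 - k * IQR, Q3 + k * IQR].
    """
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)

    # Quartiles of the values that are actually present.
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr

    outliers = [v for v in present if v < low or v > high]
    return {"missing": missing, "outliers": outliers}


# Made-up heights in cm: two entries are missing, one is clearly wrong.
heights_cm = [170, 165, None, 172, 168, 169, 171, 999, None, 167]
print(summarize_feature(heights_cm))
```

A missing value is an absent measurement, while an outlier is a measurement that exists but looks implausible, so the two are counted separately here.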

……

3.  Steps in developing a machine learning application.

(1)  Collect the initial data set for training and testing the algorithm.

(2)  Prepare the input data for our program. We need to make sure that the input data is in a usable format; that is, we should convert the initial data set into a data structure that our programming language can handle. Also note that different algorithms may need the data in different formats.

(3)  Analyze the input data. Make sure that the input data is indeed valid, and look for obvious patterns or outliers. Plot the data; with more than 3 features it’s hard to plot all of them at once, so if necessary distill the features down to 2 or 3 so that the data can be visualized.

(4)  Check whether there is garbage in the data set. Outliers are usually easy to recognize in a plot. (This step involves a human.)

(5)  Train the algorithm. Pick the right algorithm and feed it the cleaned data from the steps above. It is better if the result of this step is stored in a format that can be reused later. Note that in the case of unsupervised learning there’s no training step, because we don’t have a target value.

(6)  Test the algorithm. Use the test data set to evaluate how well the algorithm does. In the case of supervised learning, you have some known values you can use to evaluate the algorithm. In unsupervised learning, you may have to use some other metrics to evaluate success.

(7)  Use it. Put the algorithm to practical work, and keep adjusting it by revisiting the steps above until you are satisfied with it.
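As a rough illustration of steps (2), (5) and (6), the sketch below parses tab-separated text into feature vectors and labels, holds out part of the data for testing, and measures the error rate of a simple k-nearest-neighbors classifier. kNN is used here only as a stand-in algorithm; the input format, the sample data and the function names (`parse_lines`, `knn_classify`, `error_rate`) are assumptions made for this example.

```python
import math
from collections import Counter


def parse_lines(lines):
    """Step (2): turn tab-separated lines into feature vectors and labels.

    Assumes the last column of every line is the class label.
    """
    features, labels = [], []
    for line in lines:
        *values, label = line.strip().split("\t")
        features.append([float(v) for v in values])
        labels.append(label)
    return features, labels


def knn_classify(query, train_x, train_y, k=3):
    """Step (5): kNN keeps the training data and votes among the k nearest."""
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(query, pair[0]),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def error_rate(test_x, test_y, train_x, train_y, k=3):
    """Step (6): fraction of held-out samples that are misclassified."""
    wrong = sum(
        knn_classify(x, train_x, train_y, k) != y
        for x, y in zip(test_x, test_y)
    )
    return wrong / len(test_x)


# Made-up data set: two features per sample, then the label.
raw = [
    "1.0\t1.1\tA", "1.0\t1.0\tA", "0.9\t1.2\tA",
    "0.0\t0.0\tB", "0.0\t0.1\tB", "0.1\t0.0\tB",
    "1.1\t0.9\tA", "0.1\t0.2\tB",
]
features, labels = parse_lines(raw)

# Hold out the last two samples for testing.
train_x, train_y = features[:6], labels[:6]
test_x, test_y = features[6:], labels[6:]
print(error_rate(test_x, test_y, train_x, train_y))
```

With real data the split would normally be randomized and the features normalized first; both are left out to keep the sketch short.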


Reposted from blog.csdn.net/qq_39464562/article/details/80952863