"Machine learning real" HD Chinese tagged PDF + HD English PDF + source code

1 Foreword

Recently I found a good book for doing machine learning projects: "Machine Learning in Action". It introduces the mainstream machine learning algorithms and how to implement them, and its content is aimed at practical, everyday tasks. Rather than revealing the mathematical principles behind the algorithms from a theoretical angle, "Machine Learning in Action" teaches each algorithm through the pattern of "brief principle + example problem + actual code + running results".

As we all know, machine learning is a very important research area within artificial intelligence. In today's era of big data, capturing data and extracting valuable information or patterns from it has become a decisive means for industries to survive and grow. This has turned a field that used to be the exclusive concern of analysts and mathematicians into one that attracts more and more people.

For getting started, I recommend "Machine Learning in Action". It is divided into four parts: classification (supervised learning: kNN, decision trees, naive Bayes, logistic regression, SVM, and the ensemble methods bagging and AdaBoost, which alter the sampling or weighting of the training examples); regression (supervised learning: linear regression, locally weighted linear regression, shrinkage methods such as ridge regression and the lasso for when the feature dimension exceeds the number of samples, and regression trees; I have not mastered this part very well); unsupervised learning (k-means, Apriori, FP-growth); and other tools (PCA, SVD, MapReduce).
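
To give a taste of the book's "brief principle + example problem + actual code + running results" style, here is a minimal k-nearest-neighbor classifier in Python with NumPy, the language and library the book uses throughout. This sketch is mine, written for illustration only; the name knn_classify and the toy group/labels data are assumptions, not code taken from the book's source.

import numpy as np

def knn_classify(in_x, data_set, labels, k):
    """Vote among the k training points nearest to in_x (illustrative sketch)."""
    # Euclidean distance from in_x to every row of data_set
    dists = np.sqrt(((data_set - in_x) ** 2).sum(axis=1))
    # Tally the labels of the k closest points
    votes = {}
    for i in dists.argsort()[:k]:
        votes[labels[i]] = votes.get(labels[i], 0) + 1
    # The majority label wins
    return max(votes, key=votes.get)

# Toy data: two clusters labeled 'A' and 'B'
group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(knn_classify(np.array([0.1, 0.2]), group, labels, 3))  # prints 'B'

Chapter 2 of the book builds this same idea out into the dating-site and handwritten-digit examples listed in the table of contents below.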

2 Study materials

"Machine learning real" Chinese high-definition version, 339 with directory bookmarks, text can be copied; HD English, 382 with directory bookmarks, text can be copied;

The Chinese and English editions can be studied side by side. The explanations are detailed and the source code is included.

Baidu net-disk download link for the book: "Machine Learning in Action" HD bookmarked PDF + source code download

3 Table of contents

Part I Classification

Chapter 1 Machine learning basics
1.1 What is machine learning?
1.1.1 Sensors and massive data
1.1.2 Machine learning is very important
1.2 Key terminology
1.3 The main tasks of machine learning
1.4 How to choose the right algorithm
1.5 Steps in developing a machine learning application
1.6 Advantages of the Python language
1.6.1 Executable pseudo-code
1.6.2 Python is popular
1.6.3 Features of the Python language
1.6.4 Drawbacks of the Python language
1.7 NumPy library basics
1.8 Summary

Chapter 2 The k-nearest neighbor algorithm
2.1 Overview of the k-nearest neighbor algorithm
2.1.1 Prepare: importing data with Python
2.1.2 Parsing data from a text file
2.1.3 How to test a classifier
2.2 Example: using the k-nearest neighbor algorithm to improve matches from a dating site
2.2.1 Prepare: parsing data from a text file
2.2.2 Analyze: creating scatter plots with Matplotlib
2.2.3 Prepare: normalizing numeric values
2.2.4 Test: verifying the classifier as a complete program
2.2.5 Use: building a complete usable system
2.3 Example: a handwriting recognition system
2.3.1 Prepare: converting images into test vectors
2.3.2 Test: recognizing handwritten digits with the k-nearest neighbor algorithm
2.4 Summary

Chapter 3 Decision trees
3.1 Constructing decision trees
3.1.1 Information gain
3.1.2 Splitting the data set
3.1.3 Recursively building the tree
3.2 Plotting trees in Python with Matplotlib annotations
3.2.1 Matplotlib annotations
3.2.2 Constructing a tree of annotations
3.3 Testing and storing the classifier
3.3.1 Test: using the decision tree for classification
3.3.2 Use: storing the decision tree
3.4 Example: using decision trees to predict contact lens type
3.5 Summary

Chapter 4 Classifying with probability theory: naive Bayes
4.1 Classifying with Bayesian decision theory
4.2 Conditional probability
4.3 Classifying with conditional probabilities
4.4 Document classification with naive Bayes
4.5 Classifying text with Python
4.5.1 Prepare: making word vectors from text
4.5.2 Train: calculating probabilities from word vectors
4.5.3 Test: modifying the classifier for real-world conditions
4.5.4 Prepare: the bag-of-words document model
4.6 Example: filtering spam email with naive Bayes
4.6.1 Prepare: tokenizing text
4.6.2 Test: cross-validation with naive Bayes
4.7 Example: using naive Bayes to reveal regional preferences from personal ads
4.7.1 Collect: importing RSS feeds
4.7.2 Analyze: displaying regionally related words
4.8 Summary

Chapter 5 Logistic regression
5.1 Classification with logistic regression and the sigmoid function
5.2 Using optimization to find the best regression coefficients
5.2.1 Gradient ascent
5.2.2 Train: finding the best parameters with gradient ascent
5.2.3 Analyze: plotting the decision boundary
5.2.4 Train: stochastic gradient ascent
5.3 Example: estimating the mortality of sick horses from colic symptoms
5.3.1 Prepare: dealing with missing values in the data
5.3.2 Test: classifying with logistic regression
5.4 Summary

Chapter 6 Support vector machines
6.1 Separating data with the maximum margin
6.2 Finding the maximum margin
6.2.1 The optimization problem solved by the classifier
6.2.2 A general framework for SVM applications
6.3 Efficient optimization with the SMO algorithm
6.3.1 Platt's SMO algorithm
6.3.2 Solving small data sets with the simplified SMO algorithm
6.4 Speeding up optimization with the full Platt SMO algorithm
6.5 Using kernels on complex data
6.5.1 Mapping data to higher dimensions with kernel functions
6.5.2 The radial basis kernel function
6.5.3 Using a kernel for testing
6.6 Example: revisiting handwriting recognition
6.7 Summary

Chapter 7 Improving classification with the AdaBoost meta-algorithm
7.1 Classifiers based on multiple sampling of the data set
7.1.1 Bagging: building classifiers from randomly resampled data
7.1.2 Boosting
7.2 Train: improving classifier performance by focusing on errors
7.3 Building a weak classifier from a single-level decision tree
7.4 Implementing the full AdaBoost algorithm
7.5 Test: classifying with AdaBoost
7.6 Example: applying AdaBoost to a difficult data set
7.7 The classification-imbalance problem
7.7.1 Alternative performance metrics: precision, recall, and the ROC curve
7.7.2 Controlling classifier decisions with a cost function
7.7.3 Data sampling for dealing with imbalanced classes
7.8 Summary

Part II Forecasting numeric values with regression

Chapter 8 Predicting numeric values: regression
8.1 Finding best-fit lines with linear regression
8.2 Locally weighted linear regression
8.3 Example: predicting the age of an abalone
8.4 Shrinking coefficients to "understand" the data
8.4.1 Ridge regression
8.4.2 The lasso
8.4.3 Forward stagewise regression
8.5 Trading off bias and variance
8.6 Example: predicting the price of LEGO sets
8.6.1 Collect: using the Google Shopping API
8.6.2 Train: building a model
8.7 Summary

Chapter 9 Tree-based regression
9.1 Locally modeling complex data
9.2 Building trees with continuous and discrete features
9.3 Using CART for regression
9.3.1 Building the tree
9.3.2 Running the code
9.4 Tree pruning
9.4.1 Prepruning
9.4.2 Postpruning
9.5 Model trees
9.6 Example: comparing tree-based methods to standard regression
9.7 Creating a GUI with Python's Tkinter library
9.7.1 Building a GUI with Tkinter
9.7.2 Integrating Matplotlib and Tkinter
9.8 Summary

Part III Unsupervised learning

Chapter 10 Grouping unlabeled data with the k-means clustering algorithm
10.1 The k-means clustering algorithm
10.2 Improving clustering performance with postprocessing
10.3 The bisecting k-means algorithm
10.4 Example: clustering points on a map
10.4.1 The Yahoo! PlaceFinder API
10.4.2 Clustering geographic coordinates
10.5 Summary

Chapter 11 Association analysis with the Apriori algorithm
11.1 Association analysis
11.2 The Apriori principle
11.3 Finding frequent itemsets with the Apriori algorithm
11.3.1 Generating candidate itemsets
11.3.2 Organizing the complete Apriori algorithm
11.4 Mining association rules from frequent itemsets
11.5 Example: uncovering patterns in congressional voting
11.5.1 Collect: building a transaction data set of U.S. congressional voting records
11.5.2 Test: mining association rules from congressional voting records
11.6 Example: finding similar features in poisonous mushrooms
11.7 Summary

Chapter 12 Efficiently finding frequent itemsets with the FP-growth algorithm
12.1 FP-trees: an efficient way to encode a data set
12.2 Building an FP-tree
12.2.1 Creating the FP-tree data structure
12.2.2 Constructing the FP-tree
12.3 Mining frequent itemsets from an FP-tree
12.3.1 Extracting conditional pattern bases
12.3.2 Creating conditional FP-trees
12.4 Example: finding co-occurring words in a Twitter feed
12.5 Example: mining a clickstream from a news site
12.6 Summary

Part IV Other Tools 

Chapter 13 Using PCA to simplify data
13.1 Dimensionality reduction techniques
13.2 PCA
13.2.1 Moving the coordinate axes
13.2.2 Performing PCA in NumPy
13.3 Example: using PCA to reduce the dimensionality of semiconductor manufacturing data
13.4 Summary

Chapter 14 Simplifying data with the SVD
14.1 Applications of the SVD
14.1.1 Latent semantic indexing
14.1.2 Recommendation systems
14.2 Matrix factorization
14.3 SVD in Python
14.4 Collaborative-filtering-based recommendation engines
14.4.1 Measuring similarity
14.4.2 Item-based or user-based similarity?
14.4.3 Evaluating recommendation engines
14.5 Example: a restaurant dish recommendation engine
14.5.1 Recommending dishes not yet tasted
14.5.2 Improving recommendations with the SVD
14.5.3 Challenges in building recommendation engines
14.6 Example: SVD-based image compression
14.7 Summary

Chapter 15 Big data and MapReduce
15.1 MapReduce: a framework for distributed computing
15.2 Hadoop Streaming
15.2.1 Distributed computation of the mean and variance: the mapper
15.2.2 Distributed computation of the mean and variance: the reducer
15.3 Running Hadoop jobs on Amazon Web Services
15.3.1 Services available on AWS
15.3.2 Getting started with Amazon Web Services
15.3.3 Running a Hadoop job on EMR
15.4 Machine learning in MapReduce
15.5 Using mrjob to automate MapReduce in Python
15.5.1 Seamless integration of mrjob and EMR
15.5.2 The anatomy of a MapReduce script in mrjob
15.6 Example: the Pegasos algorithm for distributed SVMs
15.6.1 The Pegasos algorithm
15.6.2 Train: implementing a MapReduce version of the SVM with mrjob
15.7 Do you really need MapReduce?
15.8 Summary

Appendix A Getting started with Python
Appendix B Linear algebra
Appendix C Probability review
Appendix D Resources
Index


Source: www.cnblogs.com/pfm-cnblogs1/p/11780406.html