1 Editorial
Recently I found a good book for hands-on machine learning projects: Machine Learning in Action. It introduces implementations of the mainstream machine learning algorithms and applies them to practical, everyday tasks. Rather than deriving the mathematics behind each algorithm from a purely theoretical standpoint, Machine Learning in Action teaches through a pattern of "principle overview + example problem + actual code + running results".
As we all know, machine learning is a very important research field within artificial intelligence. In today's era of big data, capturing data and extracting valuable information or patterns from it has become a decisive means of survival and growth for industry, which is why a research area that was once the exclusive concern of analysts and mathematicians now attracts more and more people.
As an introduction, I recommend Machine Learning in Action. It is divided into four parts: classification (supervised learning, covering kNN, decision trees, naive Bayes, logistic regression, SVMs, and ensemble methods that reweight samples, namely bagging and AdaBoost); regression (supervised learning: linear regression, locally weighted regression, coefficient-shrinkage methods such as ridge regression and the lasso for cases where the feature dimension exceeds the number of samples, and regression trees; I have not mastered this part well); unsupervised learning (k-means, Apriori, FP-growth); and additional tools (PCA, SVD, MapReduce).
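To give a taste of the book's "principle overview + example + code + results" style, here is a minimal kNN classifier sketch in plain Python/NumPy (the language the book uses throughout). This is my own illustrative reconstruction of the idea behind Chapter 2, not the book's actual listing; the function and variable names are hypothetical.

    # Minimal kNN sketch (illustrative reconstruction, not the book's listing).
    # Classifies a sample by majority vote among its k nearest training points.
    import numpy as np

    def knn_classify(x, train_data, train_labels, k=3):
        # Euclidean distance from x to every training sample
        distances = np.sqrt(((train_data - x) ** 2).sum(axis=1))
        # Labels of the k closest training samples
        nearest_labels = [train_labels[i] for i in distances.argsort()[:k]]
        # Majority vote decides the predicted class
        return max(set(nearest_labels), key=nearest_labels.count)

    if __name__ == "__main__":
        data = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
        labels = ["A", "A", "B", "B"]
        print(knn_classify(np.array([0.2, 0.1]), data, labels))  # prints "B"

Each chapter of the book builds an algorithm like this from scratch and then applies it to a real data set.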
2 Study references
"Machine learning real" Chinese high-definition version, 339 with directory bookmarks, text can be copied; HD English, 382 with directory bookmarks, text can be copied;
English version of the two comparative studies. Explain in detail and with source code.
The book Baidu cloud disk download link: "Machine learning real" HD tagged PDF + download source code
3 Table of contents
Part 1 Classification
Chapter 1 Machine learning basics
1.1 What is machine learning?
1.1.1 Sensors and massive data
1.1.2 Machine learning is very important
1.2 Key terminology
1.3 The main tasks of machine learning
1.4 How to choose the right algorithm
1.5 Steps in developing a machine learning application
1.6 Advantages of the Python language
1.6.1 Executable pseudo-code
1.6.2 Python is popular
1.6.3 Features of the Python language
1.6.4 Drawbacks of the Python language
1.7 NumPy library basics
1.8 Summary
Chapter 2 Classifying with k-nearest neighbors
2.1 Overview of the k-nearest neighbors algorithm
2.1.1 Prepare: importing data with Python
2.1.2 Parsing data from a text file
2.1.3 How to test a classifier
2.2 Example: using kNN to improve matches from a dating site
2.2.1 Prepare: parsing data from a text file
2.2.2 Analyze: creating scatter plots with Matplotlib
2.2.3 Prepare: normalizing numeric values
2.2.4 Test: verifying the classifier as a complete program
2.2.5 Use: building a complete, usable system
2.3 Example: a handwriting recognition system
2.3.1 Prepare: converting images into test vectors
2.3.2 Test: using kNN to recognize handwritten digits
2.4 Summary
Chapter 3 Decision trees
3.1 Constructing decision trees
3.1.1 Information gain
3.1.2 Splitting the data set
3.1.3 Recursively building the tree
3.2 Plotting trees in Python with Matplotlib annotations
3.2.1 Matplotlib annotations
3.2.2 Constructing a tree of annotations
3.3 Testing and storing the classifier
3.3.1 Test: using the decision tree for classification
3.3.2 Use: persisting the decision tree
3.4 Example: using decision trees to predict contact-lens type
3.5 Summary
Chapter 4 Classifying with probability theory: naive Bayes
4.1 Classifying with Bayesian decision theory
4.2 Conditional probability
4.3 Classifying with conditional probabilities
4.4 Document classification with naive Bayes
4.5 Classifying text with Python
4.5.1 Prepare: making word vectors from text
4.5.2 Train: calculating probabilities from word vectors
4.5.3 Test: modifying the classifier for real-world conditions
4.5.4 Prepare: the bag-of-words document model
4.6 Example: filtering spam email with naive Bayes
4.6.1 Prepare: tokenizing text
4.6.2 Test: cross-validation with naive Bayes
4.7 Example: using naive Bayes to reveal regional attitudes from personal ads
4.7.1 Collect: importing RSS feeds
4.7.2 Analyze: displaying region-related words
4.8 Summary
Chapter 5 Logistic regression
5.1 Classification with logistic regression and the sigmoid function
5.2 Using optimization methods to find the best regression coefficients
5.2.1 Gradient ascent
5.2.2 Train: using gradient ascent to find the best parameters
5.2.3 Analyze: plotting the decision boundary
5.2.4 Train: stochastic gradient ascent
5.3 Example: estimating horse fatalities from colic
5.3.1 Prepare: dealing with missing values in the data
5.3.2 Test: classifying with logistic regression
5.4 Summary
Chapter 6 Support vector machines
6.1 Separating data with the maximum margin
6.2 Finding the maximum margin
6.2.1 Framing the optimization problem in terms of the classifier
6.2.2 Approaching SVMs with the general framework
6.3 Efficient optimization with the SMO algorithm
6.3.1 Platt's SMO algorithm
6.3.2 Applying the simplified SMO algorithm to small data sets
6.4 Speeding up optimization with the full Platt SMO algorithm
6.5 Using kernels on complex data
6.5.1 Mapping data to higher dimensions with kernel functions
6.5.2 The radial basis kernel function
6.5.3 Using a kernel function for testing
6.6 Example: revisiting handwriting recognition
6.7 Summary
Chapter 7 Improving classification with the AdaBoost meta-algorithm
7.1 Building classifiers from multiple samples of the data set
7.1.1 Bagging: building classifiers from randomly resampled data
7.1.2 Boosting
7.2 Train: improving the classifier by focusing on misclassified samples
7.3 Building a weak classifier based on a single-level decision tree
7.4 Implementing the full AdaBoost algorithm
7.5 Test: classifying with AdaBoost
7.6 Example: applying AdaBoost to a difficult data set
7.7 Classification imbalance
7.7.1 Alternative performance metrics: precision, recall, and the ROC curve
7.7.2 Manipulating the classifier's decision with a cost function
7.7.3 Data sampling approaches to the imbalance problem
7.8 Summary
Part 2 Forecasting numeric values with regression
Chapter 8 Predicting numeric values: regression
8.1 Finding the best-fit line with linear regression
8.2 Locally weighted linear regression
8.3 Example: predicting the age of an abalone
8.4 Shrinking coefficients to "understand" the data
8.4.1 Ridge regression
8.4.2 The lasso
8.4.3 Forward stagewise regression
8.5 The tradeoff between bias and variance
8.6 Example: forecasting the price of LEGO sets
8.6.1 Collect: using the Google Shopping API
8.6.2 Train: building the model
8.7 Summary
Chapter 9 Tree-based regression
9.1 Modeling complex data locally
9.2 Building trees with continuous and discrete features
9.3 Using CART for regression
9.3.1 Building the tree
9.3.2 Running the code
9.4 Tree pruning
9.4.1 Prepruning
9.4.2 Postpruning
9.5 Model trees
9.6 Example: comparing tree-based regression with standard regression
9.7 Using Python's Tkinter library to create a GUI
9.7.1 Building a GUI in Tkinter
9.7.2 Integrating Matplotlib and Tkinter
9.8 Summary
Part 3 Unsupervised learning
Chapter 10 Grouping unlabeled items using k-means clustering
10.1 The k-means clustering algorithm
10.2 Improving cluster performance with postprocessing
10.3 The bisecting k-means algorithm
10.4 Example: clustering points on a map
10.4.1 The Yahoo! PlaceFinder API
10.4.2 Clustering geographic coordinates
10.5 Summary
Chapter 11 Association analysis with the Apriori algorithm
11.1 Association analysis
11.2 The Apriori principle
11.3 Finding frequent itemsets with the Apriori algorithm
11.3.1 Generating candidate itemsets
11.3.2 Putting together the full Apriori algorithm
11.4 Mining association rules from frequent itemsets
11.5 Example: uncovering patterns in congressional voting
11.5.1 Collect: building a transaction data set of U.S. congressional voting records
11.5.2 Test: mining association rules from congressional voting records
11.6 Example: finding similar features in poisonous mushrooms
11.7 Summary
Chapter 12 Efficiently finding frequent itemsets with the FP-growth algorithm
12.1 FP-trees: an efficient way to encode a data set
12.2 Building an FP-tree
12.2.1 Creating the FP-tree data structure
12.2.2 Constructing the FP-tree
12.3 Mining frequent itemsets from an FP-tree
12.3.1 Extracting conditional pattern bases
12.3.2 Creating conditional FP-trees
12.4 Example: finding co-occurring words in a Twitter feed
12.5 Example: mining a clickstream from a news site
12.6 Summary
Part 4 Additional tools
Chapter 13 Using PCA to simplify data
13.1 Dimensionality reduction techniques
13.2 PCA
13.2.1 Moving the coordinate axes
13.2.2 Performing PCA in NumPy
13.3 Example: using PCA to reduce the dimensionality of semiconductor manufacturing data
13.4 Summary
Chapter 14 Simplifying data with SVD
14.1 Applications of the SVD
14.1.1 Latent semantic indexing
14.1.2 Recommendation systems
14.2 Matrix factorization
14.3 Performing SVD in Python
14.4 Collaborative-filtering-based recommendation engines
14.4.1 Measuring similarity
14.4.2 Item-based or user-based similarity?
14.4.3 Evaluating recommendation engines
14.5 Example: a restaurant dish recommendation engine
14.5.1 Recommending untasted dishes
14.5.2 Improving recommendations with the SVD
14.5.3 Challenges in building recommendation engines
14.6 Image compression with the SVD
14.7 Summary
Chapter 15 Big data and MapReduce
15.1 MapReduce: a framework for distributed computing
15.2 Hadoop Streaming
15.2.1 A mapper for the distributed computation of mean and variance
15.2.2 A reducer for the distributed computation of mean and variance
15.3 Running Hadoop jobs on Amazon Web Services
15.3.1 Services available on AWS
15.3.2 Getting started with Amazon Web Services
15.3.3 Running a Hadoop job on EMR
15.4 Machine learning in MapReduce
15.5 Using mrjob to automate MapReduce in Python
15.5.1 Using mrjob for seamless integration with EMR
15.5.2 The anatomy of a MapReduce script in mrjob
15.6 Example: the Pegasos algorithm for distributed SVMs
15.6.1 The Pegasos algorithm
15.6.2 Train: implementing a MapReduce version of the SVM with mrjob
15.7 Do you really need MapReduce?
15.8 Summary
Appendix A Getting started with Python
Appendix B Linear algebra
Appendix C Probability refresher
Appendix D Resources
Index
Copyright notice
Selected excerpts