Machine Learning | Li Hang's "Statistical Learning Methods" Notes (1): Introduction to Statistical Learning Methods

This series collects my study notes on Li Hang's "Statistical Learning Methods". The table of contents is as follows:

  (1) Introduction to statistical learning methods
  (2) Perceptron
  (3) k-nearest neighbors
  (4) Naive Bayes
  (5) Decision trees
  (6) Logistic regression and maximum entropy models
  (7) Support vector machines
  (8) Boosting methods
  (9) EM algorithm and its extensions
  (10) Hidden Markov models
  (11) Conditional random fields

Chapter 1: Introduction to Statistical Learning Methods

1.1 Statistical Learning

The object of statistical learning is data; the basic assumption is that data of the same kind possess a certain statistical regularity.
· Characteristics of statistical learning methods:
The data are assumed to be independent and identically distributed; the model to be learned belongs to some hypothesis space (the learning range); an optimal model is selected under a given evaluation criterion; the selection of the optimal model is carried out by an algorithm.

1.2 Supervised Learning

Starting from a given, finite set of training data, supervised learning assumes that the data are independent and identically distributed and that the model belongs to some hypothesis space; applying an evaluation criterion, it selects from the hypothesis space an optimal model, one that gives the most accurate predictions under that criterion on both the training data and unknown test data.
* Types of supervised learning problems: classification, tagging (sequence prediction) and regression
· Basic concepts:
Input space, output space and feature space
Joint probability distribution P(X, Y)
Hypothesis space (the set of candidate models; see the training-data notation below)
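In the book's notation, the training data are a set of input-output pairs assumed to be drawn i.i.d. from the joint distribution P(X, Y):

    T = {(x1, y1), (x2, y2), ..., (xN, yN)}

where each xi is an input (a feature vector) and yi is the corresponding output.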

1.3 The Three Elements

a. Model
The model is the conditional probability distribution P(Y | X) to be learned (a probabilistic model) or the decision function y = f(x) (a non-probabilistic model).
 
b. Strategy
The goal of statistical learning is to select the best model from the hypothesis space.
The loss function L(Y, f(X)) measures how bad a single prediction is; the risk function (expected risk) is its expectation over the joint distribution P(X, Y): R_exp(f) = E_P[ L(Y, f(X)) ].
The goal of learning is to select the model with the smallest expected risk. Since P(X, Y) is unknown, the expected risk cannot be computed directly, so in practice it is approximated by the empirical risk, the average loss over the training samples.
 
· Learning strategies (the choice of objective function to optimize; see the sketch after this list):
1) Empirical risk minimization (ERM), e.g. maximum likelihood estimation
2) Structural risk minimization (SRM): a regularization term penalizing model complexity is added to the empirical risk to prevent over-fitting, e.g. maximum a posteriori (MAP) Bayesian estimation
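As a rough illustration of the two strategies, a minimal sketch (not from the book; the quadratic loss for a linear model, the L2 penalty, and the value of lam are my own choices for the example):

    import numpy as np

    def empirical_risk(w, X, y):
        # Empirical risk: average quadratic loss of the linear model f(x) = x . w
        # over the N training samples.
        return np.mean((X @ w - y) ** 2)

    def structural_risk(w, X, y, lam=0.1):
        # Structural risk: empirical risk plus a regularization term that
        # penalizes model complexity (here the squared L2 norm of w).
        return empirical_risk(w, X, y) + lam * np.sum(w ** 2)

    # ERM chooses the w that minimizes empirical_risk; SRM chooses the w that
    # minimizes structural_risk, trading training fit against model complexity.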
 
c. Algorithm
The algorithm is the concrete computational method used to solve the optimization problem defined by the model and the strategy.
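When the optimization problem has no closed-form solution, a numerical algorithm is needed; gradient descent is one common choice (a hypothetical sketch continuing the linear-model example above; the learning rate and iteration count are arbitrary):

    import numpy as np

    def gradient_descent(X, y, lr=0.01, n_iter=1000):
        # Minimize the empirical (mean squared) risk of a linear model by
        # repeatedly stepping against the gradient.
        w = np.zeros(X.shape[1])
        for _ in range(n_iter):
            grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the empirical risk
            w -= lr * grad
        return w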

1.4 Model Evaluation and Model Selection

Training error and test error
Overfitting
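The book illustrates over-fitting with polynomial curve fitting; a minimal sketch in that spirit (the sine target, the noise level and the polynomial degrees are my own choices):

    import numpy as np

    rng = np.random.default_rng(0)
    x_train = np.linspace(0, 1, 10)
    y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, size=10)
    x_test = np.linspace(0, 1, 100)
    y_test = np.sin(2 * np.pi * x_test)

    for degree in (1, 3, 9):
        coef = np.polyfit(x_train, y_train, degree)   # fit a degree-M polynomial
        train_err = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
        test_err = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
        print(degree, train_err, test_err)
    # As the degree grows the training error keeps shrinking, but the test error
    # eventually increases again: the most complex model over-fits the noise.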

1.5 Regularization and Cross-Validation (model selection methods)

Regularization embodies Occam's razor; from the Bayesian estimation point of view, the regularization term corresponds to the prior probability of the model, with complex models assigned a smaller prior probability (i.e. a larger penalty).
Cross-validation: simple cross-validation, S-fold cross-validation, and leave-one-out cross-validation (a sketch of S-fold cross-validation follows).
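A minimal sketch of S-fold cross-validation (the fold count, the fit/predict callables and the quadratic loss are placeholders for the illustration, not the book's notation):

    import numpy as np

    def s_fold_cv(X, y, fit, predict, S=5, seed=0):
        # Randomly split the data into S folds; for each fold, train on the other
        # S-1 folds, evaluate on the held-out fold, and average the test errors.
        rng = np.random.default_rng(seed)
        folds = np.array_split(rng.permutation(len(y)), S)
        errors = []
        for i in range(S):
            test_idx = folds[i]
            train_idx = np.concatenate([folds[j] for j in range(S) if j != i])
            model = fit(X[train_idx], y[train_idx])
            errors.append(np.mean((predict(model, X[test_idx]) - y[test_idx]) ** 2))
        return np.mean(errors)   # average held-out error, used to compare models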

1.6 Generalization Ability

If the learned model is f̂, then its error when predicting unknown data is the generalization error (the expected risk): R_exp(f̂) = E_P[ L(Y, f̂(X)) ].
For the binary classification problem, will a model with a small training error also have a small generalization error?
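The book answers this with a generalization error bound. For binary classification with a finite hypothesis space containing d functions, a Hoeffding-type bound (the exact constants are stated here from memory and should be checked against the book) says that, with probability at least 1 − δ,

    R(f) ≤ R̂(f) + ε(d, N, δ),   where ε(d, N, δ) = sqrt( (1/(2N)) · (log d + log(1/δ)) )

That is, the generalization error R(f) is bounded by the training error R̂(f) plus a term that shrinks as the sample size N grows and grows with the size d of the hypothesis space.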

1.7 Generative Models and Discriminative Models

The generative approach learns the joint probability distribution P(X, Y) from the data and then obtains the conditional probability distribution P(Y | X) = P(X, Y) / P(X) as the prediction model.
It is called generative because the model represents how an output Y is generated from a given input X. Typical examples are Naive Bayes and hidden Markov models.

The discriminative approach learns the decision function or the conditional probability distribution P(Y | X) directly from the data. Typical examples include k-nearest neighbors, the perceptron, decision trees, maximum entropy models, support vector machines, etc.
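A toy numerical illustration of the generative route P(Y | X) = P(X, Y) / P(X) (the joint-probability values are made up for the example):

    import numpy as np

    # Hypothetical joint distribution P(X, Y), rows indexed by X in {0, 1},
    # columns by Y in {0, 1}; the entries sum to 1.
    P_xy = np.array([[0.3, 0.1],
                     [0.2, 0.4]])

    P_x = P_xy.sum(axis=1, keepdims=True)   # marginal P(X)
    P_y_given_x = P_xy / P_x                # conditional P(Y | X) = P(X, Y) / P(X)
    print(P_y_given_x)                      # each row sums to 1

    # A generative model estimates the joint P(X, Y) and derives P(Y | X) from it;
    # a discriminative model estimates P(Y | X) (or a decision function) directly.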
 
