1. Introduction to machine learning (statistical learning)

1. Introduction to machine learning methods

Statistical learning, or machine learning, is a field with a broad scope, rich content, and a wide range of applications.

  1. Object of machine learning: data with certain statistical regularity;
  2. The basic classification of machine learning can be divided into:
    * Supervised learning : trains a model from labeled training data. Mainly includes classification, regression, and sequence labeling tasks.
    * Unsupervised learning : trains a model from unlabeled training data. Mainly includes clustering and dimensionality reduction tasks. (A small code sketch contrasting these two settings follows this list.)
    * Reinforcement learning : trains a model from a large amount of interaction between the system (agent) and the environment.
    * There are also semi-supervised learning and active learning .
  3. According to the types of algorithms, machine learning can be divided into:
    * Traditional machine learning : machine learning methods based on mathematical models, such as SVM, logistic regression, and decision trees. These algorithms rest on rigorous mathematical reasoning, are highly interpretable, run fast, and can be applied to small-scale data sets.
    * Deep learning : learning methods based on neural networks, including feedforward, convolutional, and recurrent neural networks. These algorithms have poor interpretability and depend strongly on the size of the data set, but they have been very successful in speech, vision, and natural language processing.
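To make the supervised/unsupervised distinction concrete, here is a minimal sketch (my own, assuming scikit-learn is available): the classifier is trained on labeled pairs, while the clustering algorithm never sees the labels.

```python
# A minimal sketch, assuming scikit-learn is installed; illustrative only.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Supervised learning: the model is trained on labeled pairs (X, y).
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("class predictions:", clf.predict(X[:3]))

# Unsupervised learning: the model sees only X, never the labels y.
km = KMeans(n_clusters=3, n_init=10).fit(X)
print("cluster assignments:", km.labels_[:3])
```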

2. Basic terminology of machine learning

Suppose we have collected a batch of watermelon data, for example: (color = green; root = curled up; knock = muffled), (color = dark; root = slightly curled; knock = dull), (color = light white; root = stiff; knock = crisp)... Each parenthesized triple is the record of one watermelon. Using this example we define the following terms (a small code sketch after the list shows the records as feature vectors):

  • The collection of all records is a data set.
  • Each record is an instance or sample.
  • A single item such as "color = green" is a feature or attribute.
  • A full record such as (color = green; root = curled up; knock = muffled) is a feature vector; each watermelon corresponds to one feature vector.
  • The number of features of a sample is its dimensionality; each watermelon here has dimensionality 3. When the dimensionality is very large, learning suffers from the "curse of dimensionality".
  • The collection of all training samples is the training set.
  • The collection of all test samples is the test set.
  • The ability of a machine learning model to perform well on new samples is generalization, i.e., going from the special to the general.
  • The problem of predicting discrete values is classification. For example: judging whether a watermelon is good or bad.
  • The problem of predicting continuous values is regression. For example: predicting the future population from population data over the years; population size is a continuous value.
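As a small sketch of these terms (the encoding below is my own choice, not from the original post), the watermelon records can be stored as feature vectors and split into a training set and a test set:

```python
# Minimal sketch: watermelon records as feature vectors (encoding is illustrative).
# Each record has dimensionality 3: (color, root, knock).
dataset = [
    {"color": "green",       "root": "curled up",       "knock": "muffled", "good": True},
    {"color": "dark",        "root": "slightly curled", "knock": "dull",    "good": True},
    {"color": "light white", "root": "stiff",           "knock": "crisp",   "good": False},
]

# A sample's feature vector, e.g. the first record:
x0 = (dataset[0]["color"], dataset[0]["root"], dataset[0]["knock"])
print("feature vector:", x0, "dimensionality:", len(x0))

# A simple split: the first two records form the training set, the rest the test set.
train_set, test_set = dataset[:2], dataset[2:]
print(len(train_set), "training samples,", len(test_set), "test samples")
```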

3. The three elements of machine learning

The three elements of machine learning are: model, strategy, and algorithm.

3.1 Model

  1. The model defines the solution space (hypothesis space). In supervised learning, the model is the conditional probability distribution or decision function to be learned.
  2. Once the representation of the solution is determined, the solution space and its scale are determined.
  3. The learning process can be viewed as searching the solution space, with the goal of finding a solution that matches the training set. (A small sketch of a concrete solution space follows this list.)
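As a minimal illustration (my own example, assuming a linear model), choosing the family of decision functions $f(x; w, b) = w \cdot x + b$ fixes the solution space; learning then searches over the parameters $(w, b)$:

```python
import numpy as np

# The model choice fixes the solution space: here, all linear decision functions
# f(x; w, b) = w . x + b, indexed by the parameters (w, b).
def f(x, w, b):
    return np.dot(w, x) + b

# Two different points in the solution space (two candidate solutions):
x = np.array([1.0, 2.0])
print(f(x, np.array([0.5, -0.3]), 0.1))
print(f(x, np.array([1.2, 0.7]), -0.4))
```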

3.2 Strategy

The strategy specifies the criterion by which the model is learned, i.e., it defines the optimization objective.

3.2.1 Loss function

For a given input $x$, the model's predicted output $\hat{y}$ may be inconsistent with the true value $\tilde{y}$. A loss function (or cost function) is used to measure the degree of prediction error.

Common loss functions include:

  • 0-1 loss: $L(y, f(x)) = 1$ if $y \neq f(x)$, otherwise $0$.
  • Squared loss: $L(y, f(x)) = (y - f(x))^2$.
  • Absolute loss: $L(y, f(x)) = |y - f(x)|$.
  • Logarithmic (log-likelihood) loss: $L(y, P(y \mid x)) = -\log P(y \mid x)$.

The average loss of the model over the training set is called the empirical risk. Starting from the empirical risk leads to the two basic strategies of supervised learning: empirical risk minimization and structural risk minimization.
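A minimal sketch of these loss functions in code (written for this post, illustrative only):

```python
import numpy as np

# Common loss functions, each measuring the error between a true value y
# and a prediction (f(x), or a predicted probability p for the true label).
def zero_one_loss(y, y_hat):
    return float(y != y_hat)        # 1 if the prediction is wrong, else 0

def squared_loss(y, y_hat):
    return (y - y_hat) ** 2

def absolute_loss(y, y_hat):
    return abs(y - y_hat)

def log_loss(p_y_given_x):
    return -np.log(p_y_given_x)     # loss of the probability assigned to the true label

print(zero_one_loss(1, 0), squared_loss(3.0, 2.5), absolute_loss(3.0, 2.5), log_loss(0.8))
```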

3.2.2 Maximum likelihood estimation and empirical risk minimization

Maximum likelihood estimation is an example of empirical risk minimization: when the model is a conditional probability distribution and the loss function is the log loss, empirical risk minimization is equivalent to maximum likelihood estimation.
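A short reconstruction of the missing derivation (the original figure is unavailable): with the log loss $L(y, P(y \mid x)) = -\log P(y \mid x)$, the empirical risk is

$$R_{emp}(f) = \frac{1}{N} \sum_{i=1}^{N} -\log P(y_i \mid x_i),$$

so minimizing the empirical risk is exactly maximizing the log-likelihood $\sum_{i=1}^{N} \log P(y_i \mid x_i)$, i.e., maximum likelihood estimation.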

3.2.3 Maximum a posteriori estimation and structural risk minimization

Maximum a posteriori estimation is an example of structural risk minimization: when the model is a conditional probability distribution, the loss function is the log loss, and model complexity is measured by the prior probability of the model, structural risk minimization is equivalent to maximum a posteriori estimation.
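A short reconstruction of the missing derivation: by Bayes' rule the posterior satisfies $P(f \mid D) \propto P(D \mid f)\, P(f)$, so maximizing it is equivalent to minimizing the negative log posterior

$$-\sum_{i=1}^{N} \log P(y_i \mid x_i) - \log P(f),$$

where the first term is (up to the factor $\frac{1}{N}$) the empirical risk under the log loss, and the prior term $-\log P(f)$ plays the role of the regularization term $\lambda J(f)$.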

3.3 Algorithm

The algorithm refers to the concrete computational method for solving the learning model. The problem is usually solved by numerical methods, such as gradient descent.
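As a minimal illustration (my own sketch, not from the original post), here is gradient descent fitting a one-parameter least-squares model:

```python
import numpy as np

# Gradient descent sketch: fit y = w * x by minimizing the mean squared error.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])       # roughly y = 2x

w, lr = 0.0, 0.01                        # initial parameter and learning rate
for _ in range(200):
    grad = np.mean(2 * (w * x - y) * x)  # d/dw of mean((w*x - y)^2)
    w -= lr * grad                       # step against the gradient
print("learned w:", round(w, 3))         # converges close to 2
```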

4. Model evaluation and selection

Many factors can lead to over-fitting. The most common is that the model's learning capacity is too strong, so it learns peculiarities of the training samples; under-fitting, by contrast, is usually caused by insufficient learning capacity. Under-fitting is relatively easy to overcome, for example by increasing the number of training rounds in a deep neural network, while over-fitting is much more troublesome. To prevent over-fitting during learning and select the optimal model, that is, a model of appropriate complexity that minimizes the test error, two model selection methods are commonly used: regularization and cross-validation .
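The post does not expand on cross-validation, so here is a minimal sketch (assuming scikit-learn) of using 5-fold cross-validation to compare two model complexities instead of trusting the training error:

```python
# Minimal sketch, assuming scikit-learn: choose between two model complexities
# by 5-fold cross-validation rather than by training error.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
for depth in (2, None):  # a shallow tree vs. a fully grown (more complex) tree
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(model, X, y, cv=5)
    print("max_depth =", depth, "mean CV accuracy:", scores.mean().round(3))
```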

4.1 Regularization

The typical method of model selection is regularization . Regularization implements the strategy of structural risk minimization: it adds a regularization term (penalty term) to the empirical risk .

Regularization generally has the following form:

$$\min_{f \in \mathcal{F}} \; \frac{1}{N} \sum_{i=1}^{N} L(y_i, f(x_i)) + \lambda J(f)$$

where the first term is the empirical risk, the second term $J(f)$ is the regularization term, and $\lambda \geq 0$ is a coefficient that adjusts the relationship between the two.

Regularization can take different forms. For example, in a regression problem where the loss function is the squared loss, the regularization term can be the $L_2$ norm of the parameter vector $w$:

$$L(w) = \frac{1}{N} \sum_{i=1}^{N} \left( f(x_i; w) - y_i \right)^2 + \frac{\lambda}{2} \|w\|^2$$

The regularization term can also be the $L_1$ norm of the parameter vector:

$$L(w) = \frac{1}{N} \sum_{i=1}^{N} \left( f(x_i; w) - y_i \right)^2 + \lambda \|w\|_1$$
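As a minimal numeric sketch of the two regularized objectives above (my own code; the toy data, model, and $\lambda$ are arbitrary):

```python
import numpy as np

# Toy data and a linear model f(x; w) = w . x (choices here are illustrative).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
w, lam = np.array([0.9, 1.8]), 0.1

mse = np.mean((X @ w - y) ** 2)                  # empirical risk (squared loss)
l2_objective = mse + (lam / 2) * np.sum(w ** 2)  # L2-regularized structural risk
l1_objective = mse + lam * np.sum(np.abs(w))     # L1-regularized structural risk
print(round(l2_objective, 4), round(l1_objective, 4))
```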
