Basics of machine learning - loss function and risk function

The goal of machine learning is to select the optimal model from the model's hypothesis space; the main strategies for doing so are empirical risk minimization and structural risk minimization. The following briefly introduces the related concepts of the loss function and the risk function. See Li Hang's "Statistical Learning Methods".

loss function

The loss function measures the quality of a single prediction of the model, that is, the difference between the model's predicted value f(X) and the true value Y, denoted L(Y, f(X)). The smaller the loss, the more accurate the model's prediction.

The loss functions commonly used in machine learning are:
(1) 0-1 loss function

$$L(Y, f(X)) = \begin{cases} 1, & Y \neq f(X) \\ 0, & Y = f(X) \end{cases}$$

(2) square loss function

$$L(Y, f(X)) = (Y - f(X))^2$$

(3) absolute loss function

$$L(Y, f(X)) = |Y - f(X)|$$

(4) logarithmic loss function

$$L(Y, P(Y \mid X)) = -\log P(Y \mid X)$$
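The four loss functions above can be written directly as small Python functions; this is a minimal sketch, with `y` the true value, `y_hat` the prediction, and `p_y_given_x` the probability the model assigns to the true label:

```python
import math

def zero_one_loss(y, y_hat):
    """0-1 loss: 1 if the prediction is wrong, 0 if it is correct."""
    return 0 if y == y_hat else 1

def square_loss(y, y_hat):
    """Square loss: (y - y_hat)^2, common in regression."""
    return (y - y_hat) ** 2

def absolute_loss(y, y_hat):
    """Absolute loss: |y - y_hat|."""
    return abs(y - y_hat)

def log_loss(p_y_given_x):
    """Logarithmic loss: -log P(Y|X), where p_y_given_x is the
    probability the model assigns to the true label."""
    return -math.log(p_y_given_x)
```

Note that the log loss takes a probability rather than a point prediction: a confident correct prediction (probability near 1) gives a loss near 0, while a confident wrong one gives a very large loss.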

risk function

The risk function, also known as the expected loss, is the expectation of the loss function; it measures the quality of the model's predictions in the average sense. The goal of machine learning is to choose the model with the smallest expected risk.

The expression of the risk function is:
$$R_{exp}(f) = E_P[L(Y, f(X))] = \int_{\mathcal{X} \times \mathcal{Y}} L(y, f(x))\, P(x, y)\, dx\, dy$$

Because the joint distribution P(X, Y) is unknown, the expected risk above cannot be computed directly. In fact, learning a model by minimizing the expected risk would require the very joint distribution P(X, Y) that we are trying to learn, which is why supervised learning is an ill-posed problem. The risk function as written is therefore not used directly in machine learning; instead, we turn to empirical risk or structural risk.
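To see what the expected risk means concretely, here is a toy Monte Carlo sketch in which we *choose* P(X, Y) ourselves (X uniform on [0, 1], Y = 2X plus Gaussian noise, both assumptions of this example); only because we chose the distribution can we estimate the expected risk by sampling, which is exactly what is impossible in a real problem:

```python
import random

random.seed(0)

def f(x):
    """Candidate model: in this toy setup it matches the true mean of Y."""
    return 2.0 * x

def square_loss(y, y_hat):
    return (y - y_hat) ** 2

# Approximate R_exp(f) = E[L(Y, f(X))] by sampling from our chosen P(X, Y).
N = 100_000
total = 0.0
for _ in range(N):
    x = random.random()                    # X ~ Uniform(0, 1)
    y = 2.0 * x + random.gauss(0.0, 0.1)   # Y = 2X + noise (std 0.1)
    total += square_loss(y, f(x))

print(total / N)  # close to the noise variance, 0.01
```

Since f matches the true regression function here, the estimated expected risk converges to the irreducible noise variance; no model can do better than that under the square loss.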

empirical risk

The empirical risk is the average loss of the model f(X) with respect to the training dataset. Its expression is:
$$R_{emp}(f) = \frac{1}{N} \sum_{i=1}^{N} L(y_i, f(x_i))$$

According to the law of large numbers, as the sample size N tends to infinity, the empirical risk converges to the expected risk. Therefore, when the training set is large enough, the model can be selected using the strategy of empirical risk minimization.
The empirical risk minimization (ERM) strategy holds that the model with the smallest empirical risk is the optimal model.
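ERM can be sketched in a few lines: compute the empirical risk of each candidate model on the training set and keep the one with the smallest value. The data points and the three candidate models below are made up for illustration:

```python
# Training set of (x_i, y_i) pairs, roughly following y = 2x.
train = [(0.0, 0.1), (1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

# Three hypothetical candidate models from the hypothesis space.
candidates = {
    "f1: y = x":      lambda x: x,
    "f2: y = 2x":     lambda x: 2.0 * x,
    "f3: y = 2x + 1": lambda x: 2.0 * x + 1.0,
}

def empirical_risk(f, data):
    """R_emp(f) = (1/N) * sum of square losses over the training set."""
    return sum((y - f(x)) ** 2 for x, y in data) / len(data)

risks = {name: empirical_risk(f, train) for name, f in candidates.items()}
best = min(risks, key=risks.get)  # ERM: smallest empirical risk wins
print(best)
```

On this data the model y = 2x attains the smallest empirical risk, so ERM selects it.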

structural risk

When the training set is small, the empirical risk minimization strategy often performs poorly and overfitting occurs. Structural risk minimization was therefore proposed; it is equivalent to regularization. Structural risk adds to the empirical risk a regularization term representing model complexity:
$$R_{srm}(f) = \frac{1}{N} \sum_{i=1}^{N} L(y_i, f(x_i)) + \lambda J(f)$$

Here J(f) is the complexity of the model: the more complex the model, the larger J(f) is. The coefficient λ ≥ 0 trades off empirical risk against model complexity. The structural risk minimization strategy thus requires both the empirical risk and the model complexity to be small.
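A minimal sketch of the trade-off: two hypothetical candidates fit the training data equally well, but one is more complex. Taking J(f) to be the polynomial degree (a simple stand-in for complexity; the data, the candidates, and λ = 0.1 are all assumptions of this example), structural risk prefers the simpler model:

```python
# Training set that both candidates fit exactly.
train = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]

# (model, J(f)) pairs, with J(f) = polynomial degree as a toy complexity measure.
candidates = {
    "linear (degree 1)":    (lambda x: x,              1),
    "quadratic (degree 2)": (lambda x: 0.0 * x**2 + x, 2),
}

def empirical_risk(f, data):
    return sum((y - f(x)) ** 2 for x, y in data) / len(data)

lam = 0.1  # trade-off coefficient lambda
structural = {
    name: empirical_risk(f, train) + lam * degree
    for name, (f, degree) in candidates.items()
}
best = min(structural, key=structural.get)
print(best)
```

Both models have zero empirical risk on this data, so ERM cannot distinguish them; the complexity penalty λ·J(f) breaks the tie in favor of the simpler linear model, which is exactly the behavior structural risk minimization is designed to produce.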


Origin: blog.csdn.net/qq_43673118/article/details/123353814