Machine Learning Essay One

1. Positive semi-definite matrix

Positive definite and positive semi-definite matrices can be understood in the same way, so let us take a positive semi-definite matrix as the example.
First, a positive semi-definite matrix is defined by the condition $X^T M X \ge 0$,
where X is a vector and M is a transformation matrix.
Let's look at this condition from a different way of thinking. As a matrix transformation, MX transforms the vector X. Suppose the transformed vector is Y, denoted Y = MX. Then the positive semi-definite condition can be written as $X^T Y \ge 0$.
Does this look familiar? It is the inner product of two vectors. At the same time, we also have the formula
$X^T Y = \|X\| \, \|Y\| \cos\theta$,
where $\|X\|$ and $\|Y\|$ are the lengths of the vectors X and Y, and $\theta$ is the angle between them. So positive semi-definiteness simply means $\cos\theta \ge 0$.

Intuitively, then, a positive definite or positive semi-definite matrix is one for which the angle between any vector and its image under the transformation is less than or equal to 90 degrees.

Baidu Encyclopedia explains it as follows: a positive semi-definite matrix is a generalization of a positive definite matrix. A real symmetric matrix A is called positive semi-definite if the quadratic form X'AX is positive semi-definite, that is, X'AX ≥ 0 for every non-zero real column vector X.
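To make the intuition concrete, here is a minimal numpy sketch (the helper name and the example matrix are my own illustrative assumptions, not from the original post): it checks positive semi-definiteness via the eigenvalues of a symmetric matrix and verifies that the angle between X and MX never exceeds 90 degrees.

```python
import numpy as np

def is_positive_semidefinite(M, tol=1e-10):
    """A real symmetric matrix is PSD iff all of its eigenvalues are >= 0."""
    M = np.asarray(M, dtype=float)
    eigenvalues = np.linalg.eigvalsh(M)  # eigvalsh assumes M is symmetric
    return bool(np.all(eigenvalues >= -tol))

# Example: any matrix of the form A'A is positive semi-definite.
A = np.random.randn(3, 3)
M = A.T @ A
print(is_positive_semidefinite(M))  # True

# The angle between X and Y = MX is at most 90 degrees, i.e. cos(theta) >= 0.
for _ in range(1000):
    X = np.random.randn(3)
    Y = M @ X
    assert X @ Y >= -1e-10  # X'MX = ||X|| ||Y|| cos(theta) >= 0
```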

2. The role of training set, validation set and test set in machine learning

Usually, when training a supervised machine learning model, the data is divided into a training set, a validation set, and a test set, with a typical split ratio of 0.6 : 0.2 : 0.2. The purpose of dividing the original data into three sets is to select the model with the best performance (which can be understood as accuracy) and the best generalization ability.

Training set

Its role is to fit the model: the classifier is trained by setting its parameters on this data. When combined with the validation set, several classifiers are fitted, one for each candidate value of the same parameter.

Validation set (cross-validation set)

Its role is to select, among the models trained on the training set, the one that performs best: each model predicts the validation-set data, its accuracy is recorded, and the parameters of the best-performing model are kept. In other words, the validation set is used to tune the model's hyperparameters, such as the parameter C and the kernel function in an SVM.

Test set

After the optimal model has been obtained from the training set and the validation set, it is evaluated by predicting on the test set, which measures the performance and classification ability of the final model. The test set can be regarded as data that has never been seen: once the model parameters have been determined, it is used only to estimate the model's performance.

Splitting the original data into three sets also helps prevent the model from overfitting. If all of the original data were used to train the model, the model would likely fit that data as closely as possible, in effect existing only to fit it. When new samples appear and the model is used to make predictions, its performance may well be worse than that of a model trained on only part of the data and validated on the rest.
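A minimal scikit-learn sketch of this 0.6 : 0.2 : 0.2 workflow (the SVM classifier, the candidate C values, and the synthetic data are illustrative assumptions, not prescribed by the text):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Split off 20% as the test set, then split the remainder 75/25,
# giving an overall ratio of 0.6 : 0.2 : 0.2.
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=0)

# Train one model per candidate value of C and keep the best one on the validation set.
best_C, best_acc = None, -1.0
for C in [0.01, 0.1, 1, 10, 100]:
    model = SVC(C=C, kernel="rbf").fit(X_train, y_train)
    acc = model.score(X_val, y_val)
    if acc > best_acc:
        best_C, best_acc = C, acc

# Only now touch the test set, to estimate the generalization of the chosen model.
final_model = SVC(C=best_C, kernel="rbf").fit(X_train, y_train)
print("validation accuracy:", best_acc)
print("test accuracy:", final_model.score(X_test, y_test))
```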

3. Overfitting

Two vivid analogies from a Zhihu expert:

  1. When you were learning to drive and practicing parallel parking, your coach probably taught you a fixed routine: turn the wheel when such-and-such a bar lines up with such-and-such a corner of the window frame, turn back when such-and-such lines up, and so on. This formula can get you through Subject 2 of the driving test, but if you change the car or the practice lot you will find it is useless. We say the routine has merely overfitted a particular car and a particular lot (the training data); on a new test set (a new car and a new lot) its generalization performance is zero.
  2. Or suppose you want to learn how to court girls. You start by asking your cousin what she likes. She says she likes boys who are clean and handsome, and she also likes Jay Chou, hot pot, and fish with pickled vegetables, 100 rules in all. You study according to her requirements, eventually meet every one of them with zero error, finish training, and confidently head out to court someone. But with a different girl you find that what you learned is not nearly as useful as you imagined: the second girl only cares that you are clean and handsome and is indifferent to the other 98 items; she even hates hot pot, so those remaining 98 rules only add error. That is overfitting. How do you prevent it? Use cross validation: instead of learning the rules from your cousin and testing them on your cousin, learn them from your cousin and test them on your second sister, swapping training and test subjects back and forth. That way the rules will not be overfitted. A commenter mentioned that adding regularization can also cure overfitting, so here is the same picture. You are still learning to court girls, but a man has his dignity and his bottom line: you cannot keep piling up rules to learn without limit. So you introduce Lasso to penalize the number of rules. In plain terms: girl, if you ask me to learn three rules I will put up with it, but ask me to learn a hundred and I quit. Regularization can take different forms, and Lasso is one of them; introducing regularization adds information that helps find a good solution (see the code sketch after this list).
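A minimal scikit-learn sketch of these two remedies, cross-validation and Lasso (L1) regularization; the synthetic regression data and the alpha value are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.model_selection import cross_val_score

# 100 "rules" (features), but only a handful actually matter.
X, y = make_regression(n_samples=200, n_features=100, n_informative=5,
                       noise=10.0, random_state=0)

# Cross-validation: train and score on different folds instead of
# evaluating the model on the same data it was fitted to.
plain = LinearRegression()
lasso = Lasso(alpha=1.0)  # the L1 penalty shrinks useless "rules" to exactly zero
print("plain CV R^2:", cross_val_score(plain, X, y, cv=5).mean())
print("lasso CV R^2:", cross_val_score(lasso, X, y, cv=5).mean())

# How many of the 100 rules does Lasso actually keep?
lasso.fit(X, y)
print("non-zero coefficients:", np.sum(lasso.coef_ != 0))
```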

4. What is the difference between the 0-norm, 1-norm and 2-norm

The definitions of the commonly used vector norms and matrix norms are listed below.

Vector norm
1-norm:
$\|x\|_1 = \sum_{i=1}^{n} |x_i|$
that is, the sum of the absolute values of the vector elements; MATLAB calls the function norm(x, 1).
2-norm:
$\|x\|_2 = \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2}$
the Euclidean norm (commonly used to compute the length of a vector), that is, the square root of the sum of the squares of the vector elements; MATLAB calls the function norm(x, 2).
∞-norm:
$\|x\|_\infty = \max_i |x_i|$
that is, the maximum of the absolute values of all vector elements; MATLAB calls the function norm(x, inf).
-∞-norm:
$\|x\|_{-\infty} = \min_i |x_i|$
that is, the minimum of the absolute values of all vector elements; MATLAB calls the function norm(x, -inf).
p-norm:
$\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}$
that is, the sum of the p-th powers of the absolute values of the vector elements, raised to the power 1/p; MATLAB calls the function norm(x, p).
Matrix norm
1-norm:
$\|A\|_1 = \max_j \sum_{i=1}^{m} |a_{ij}|$
the column-sum norm, that is, the maximum over all columns of the sum of the absolute values of the column entries; MATLAB calls the function norm(A, 1).
2-norm:
$\|A\|_2 = \sqrt{\lambda_{\max}(A^T A)}$
the spectral norm, that is, the square root of the largest eigenvalue of the matrix A'A; MATLAB calls the function norm(A, 2).
∞-norm:
$\|A\|_\infty = \max_i \sum_{j=1}^{n} |a_{ij}|$
the row-sum norm, that is, the maximum over all rows of the sum of the absolute values of the row entries; MATLAB calls the function norm(A, inf).
F-norm:
$\|A\|_F = \left( \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 \right)^{1/2}$
the Frobenius norm, that is, the square root of the sum of the squares of the absolute values of the matrix elements; MATLAB calls the function norm(A, 'fro').
Nuclear norm:
$\|A\|_* = \sum_i \sigma_i(A)$
the sum of the singular values of A.
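For readers without MATLAB, here is a small numpy sketch computing the same norms; the test vector and matrix are arbitrary illustrative values, and np.linalg.norm mirrors the norm(...) calls listed above:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])
A = np.array([[1.0, -2.0], [3.0, 4.0]])

# Vector norms, mirroring norm(x, 1), norm(x, 2), norm(x, inf), norm(x, -inf), norm(x, p)
print(np.linalg.norm(x, 1))        # sum of absolute values
print(np.linalg.norm(x, 2))        # Euclidean length
print(np.linalg.norm(x, np.inf))   # largest absolute value
print(np.linalg.norm(x, -np.inf))  # smallest absolute value
print(np.linalg.norm(x, 3))        # general p-norm, here p = 3

# Matrix norms, mirroring norm(A, 1), norm(A, 2), norm(A, inf), norm(A, 'fro')
print(np.linalg.norm(A, 1))        # maximum absolute column sum
print(np.linalg.norm(A, 2))        # spectral norm: sqrt of largest eigenvalue of A'A
print(np.linalg.norm(A, np.inf))   # maximum absolute row sum
print(np.linalg.norm(A, 'fro'))    # Frobenius norm
print(np.linalg.svd(A, compute_uv=False).sum())  # nuclear norm: sum of singular values
```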
