"Machine Learning with Graphics" Notes

From the book "Illustrated Deep Learning" we can learn about both supervised and unsupervised learning; a deep neural network can be trained in either a supervised or an unsupervised way, depending on the recognition task.
For tasks such as speech recognition, if the data is pre-trained first, the memory required for unsupervised recognition will not be too large.
Regression
#Most machine learning algorithms focus on how to make a specific function approximate the data set.
1. Basis functions:
A multi-dimensional basis function is built by multiplying (multiplicative model) or adding (additive model) several one-dimensional basis functions (a sketch follows).
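
As a concrete sketch of the two constructions (standard notation assumed rather than taken from the book: b basis functions per dimension, d input dimensions):

```latex
% Multiplicative model: product of one-dimensional basis functions
f_\theta(x) = \sum_{j_1=1}^{b} \cdots \sum_{j_d=1}^{b}
    \theta_{j_1,\dots,j_d} \, \phi_{j_1}\big(x^{(1)}\big) \cdots \phi_{j_d}\big(x^{(d)}\big)

% Additive model: sum of one-dimensional basis functions
f_\theta(x) = \sum_{k=1}^{d} \sum_{j=1}^{b} \theta_{k,j} \, \phi_{j}\big(x^{(k)}\big)
```

The multiplicative model needs b^d parameters and therefore suffers from the curse of dimensionality, while the additive model needs only b·d parameters but is less expressive.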
2. Kernel function
The Gaussian kernel model places a kernel around each training sample, so the number of parameters depends on the number of samples rather than on the input dimension; this keeps the dimensionality down and helps avoid the curse of dimensionality.
A deep neural network is a hierarchical (layered) model. Comparing the two, the hierarchical model is more flexible than the kernel model.
#Use the stochastic gradient algorithm to learn the Gaussian kernel model by least squares; a sketch follows.
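
A minimal sketch of this procedure (the variable names, bandwidth h, and step size eps are my assumptions, not the book's notation): the model is f(x) = Σ_j θ_j K(x, x_j) with a Gaussian kernel centered on each training point, and θ is updated by stochastic gradient descent on the squared error.

```python
import numpy as np

def gaussian_kernel(a, b, h):
    """K(a, b) = exp(-||a - b||^2 / (2 h^2)) for all pairs of rows in a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * h ** 2))

def sgd_kernel_least_squares(x, y, h=0.3, eps=0.1, n_epochs=100, seed=0):
    """Learn theta for f(x) = sum_j theta_j K(x, x_j) by SGD on the squared error."""
    rng = np.random.default_rng(seed)
    n = len(x)
    K = gaussian_kernel(x, x, h)          # n x n design matrix
    theta = np.zeros(n)
    for _ in range(n_epochs):
        for i in rng.permutation(n):      # one pass over shuffled samples
            residual = K[i] @ theta - y[i]
            theta -= eps * residual * K[i]  # gradient of 0.5 * residual^2
    return theta

# Toy usage: fit a noisy sine curve.
x = np.linspace(-3, 3, 50)[:, None]
y = np.sin(np.pi * x[:, 0]) + 0.1 * np.random.default_rng(1).standard_normal(50)
theta = sgd_kernel_least_squares(x, y)
y_hat = gaussian_kernel(x, x, 0.3) @ theta
```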
Plain least squares tends to overfit when the training data is noisy, so constrained (regularized) least squares is used instead, to control the effective number of features.
How is the range of the parameter θ restricted here? Either through an orthogonal projection matrix P (subspace-constrained least squares) or through an ℓ2 penalty with regularization parameter λ; different bandwidths h and regularization parameters λ are then selected for different scenarios (a closed-form sketch follows).
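
For the ℓ2-regularized variant the solution has a closed form. This sketch (again with assumed names) solves min_θ ½‖Kθ − y‖² + (λ/2)‖θ‖², so larger λ shrinks θ toward zero:

```python
import numpy as np

def l2_regularized_kernel_least_squares(x, y, h=0.3, lam=0.1):
    """Closed form theta = (K^T K + lam I)^{-1} K^T y for the Gaussian kernel model.
    Uses gaussian_kernel from the SGD sketch above."""
    K = gaussian_kernel(x, x, h)
    n = len(x)
    return np.linalg.solve(K.T @ K + lam * np.eye(n), K.T @ y)
```

The bandwidth h and the regularization parameter λ are exactly the quantities the notes say must be chosen per scenario, typically by cross-validation (next section).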
Model Selection Process

Evaluating the model:
Cross-validation: in actual training, the results are usually good on the training set (the fit is sensitive to the training data), but the fit on data outside the training set is usually less satisfying. Therefore we usually do not use the whole data set for training; instead we set aside a part (which does not participate in training) to test the parameters produced from the training set, so that we can judge relatively objectively how well those parameters fit data outside the training set. This idea is called cross-validation. A sketch follows.
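
A minimal sketch of k-fold cross-validation for choosing (h, λ) in the kernel model above (the grid values are arbitrary placeholders, and the toy x, y from the SGD sketch are reused):

```python
import numpy as np

def k_fold_cv_error(x, y, h, lam, k=5):
    """Average held-out squared error over k folds (folds taken in order;
    shuffle the indices first if the data is sorted)."""
    n = len(x)
    folds = np.array_split(np.arange(n), k)
    errs = []
    for fold in folds:
        train = np.setdiff1d(np.arange(n), fold)
        K_tr = gaussian_kernel(x[train], x[train], h)
        theta = np.linalg.solve(K_tr.T @ K_tr + lam * np.eye(len(train)),
                                K_tr.T @ y[train])
        K_te = gaussian_kernel(x[fold], x[train], h)
        errs.append(np.mean((K_te @ theta - y[fold]) ** 2))
    return np.mean(errs)

# Pick the (h, lam) pair with the smallest cross-validation error.
grid = [(h, lam) for h in (0.1, 0.3, 1.0) for lam in (0.01, 0.1, 1.0)]
best_h, best_lam = min(grid, key=lambda p: k_fold_cv_error(x, y, *p))
```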

Robustness: for samples with outliers, there are robust alternatives to plain least squares (sketched after this list):
Least absolute deviations (minimizing the ℓ1 loss)
Huber loss minimization
Robust learning in sparse learning
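
A sketch of the residual losses mentioned above (the Huber threshold eta=1.0 is an assumed example value):

```python
import numpy as np

def l2_loss(r):
    return 0.5 * r ** 2                 # squared loss: sensitive to outliers

def l1_loss(r):
    return np.abs(r)                    # least absolute deviations

def huber_loss(r, eta=1.0):
    """Quadratic for |r| <= eta, linear beyond: robust to large residuals."""
    small = np.abs(r) <= eta
    return np.where(small, 0.5 * r ** 2, eta * (np.abs(r) - 0.5 * eta))
```

The ℓ1 and Huber losses grow only linearly for large residuals, so a single outlier cannot dominate the fit the way it does under the squared loss.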
Chapter 6 evaluates the differences between these robust methods.

#Classifier
Let's start with the simplest case, binary classification, using the binary label values -1 and +1.

#0/1 loss: I don't quite understand least-squares classification (Chapter 7).
Comparison of various losses: 0/1 loss vs. ℓ2 loss. What is a surrogate (proxy) loss? Since the 0/1 loss is discontinuous and hard to optimize directly, a tractable surrogate such as the ℓ2 or hinge loss is minimized in its place; a sketch follows.
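
One way to see the comparison (a sketch; writing each loss in terms of the margin m = f(x)·y is the standard convention):

```python
import numpy as np

def zero_one_loss(m):
    return (m <= 0).astype(float)   # 1 if misclassified, else 0; not convex

def l2_loss_margin(m):
    return (1 - m) ** 2             # squared loss, valid since y is -1 or +1

def hinge_loss(m):
    return np.maximum(0.0, 1 - m)   # convex surrogate used by the SVM

m = np.linspace(-2, 2, 9)
# Both surrogates upper-bound the 0/1 loss on this grid, so driving
# them down also drives down the classification error.
```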

e.g., recognizing multiple letters. The first approach, one-vs-rest, may suffer from sample imbalance: the "rest" class contains many more samples than the target class.
The second approach, one-vs-one, trains a classifier for every pair of classes, and the combined vote may not be accurate (a sketch of both follows).
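
A minimal sketch of both reductions using scikit-learn (the choice of LogisticRegression as the base binary classifier is my assumption, just to make the example concrete):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

X, y = load_digits(return_X_y=True)

# One-vs-rest: 10 binary classifiers, each "digit k" vs "all other digits";
# the negative class is ~9x larger, which is the imbalance noted above.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# One-vs-one: 10 * 9 / 2 = 45 pairwise classifiers combined by voting.
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
```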

Support vector machine classifier: used for pattern recognition.
#Based on the maximum-margin principle. What does this mean? The SVM chooses the separating boundary that maximizes the distance (margin) to the nearest training samples.

If the data are linearly separable, use a hard-margin SVM classifier.
If the data are not linearly separable, use a soft-margin SVM classifier, which allows some classification errors.

The SVM algorithm is a learning method proposed by Vapnik to remedy theoretical weaknesses of traditional neural-network learning; the support vector machine was first derived from the optimal separating hyperplane problem.
The algorithm maps the original problem into a high-dimensional feature space through a nonlinear transformation and constructs a linear discriminant function there, which realizes a nonlinear discriminant function in the original space. This special property gives the machine good generalization ability, and at the same time it cleverly sidesteps the curse of dimensionality, making the algorithm's complexity independent of the sample dimension.
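
A minimal soft-margin SVM sketch with scikit-learn (the RBF kernel, the value C=1.0, and the moons data set are illustrative assumptions): C controls how many margin violations are tolerated, and the kernel supplies the nonlinear map to a high-dimensional feature space without ever computing it explicitly.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# Soft-margin SVM: smaller C allows more violations (wider margin),
# larger C approaches hard-margin behaviour.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print(clf.score(X, y), len(clf.support_))  # accuracy, number of support vectors
```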

Hinge loss: when the margin is negative (the sample is misclassified), the loss increases linearly.
A loss function with an upper bound, the ramp loss, clips the hinge loss and thus enhances robustness to abnormal values; a sketch follows.
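
A sketch of the clipped ("ramp") hinge loss (the cap value of 1.0 follows one common convention; treat it as an assumption):

```python
import numpy as np

def hinge(m):
    return np.maximum(0.0, 1 - m)

def ramp(m, cap=1.0):
    """Hinge loss clipped at `cap`: outliers with very negative margins
    contribute a bounded loss, which improves robustness."""
    return np.minimum(cap, hinge(m))
```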

Ensemble classification
Weak classifier
Multiple weak learners can be turned into a strong learner by weighted averaging: a weak classifier with a smaller classification error rate e_m receives a larger weight α_m. Therefore, a weak classifier with a smaller classification error rate plays a greater role in the final classifier.
This effect can be achieved with the weak-classifier weighting rule, sketched below:
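
A sketch of the rule (this is the standard AdaBoost weight, which matches the description above; the example error rates are made up):

```python
import numpy as np

def adaboost_alpha(e_m):
    """Weight of a weak classifier with weighted error rate e_m (0 < e_m < 0.5)."""
    return 0.5 * np.log((1 - e_m) / e_m)

# A smaller error rate yields a larger weight in the final vote:
for e in (0.4, 0.2, 0.05):
    print(e, adaboost_alpha(e))

# Final strong classifier: the sign of the alpha-weighted sum of weak outputs,
# F(x) = sign( sum_m alpha_m * f_m(x) ).
```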

Probabilistic classification: probabilistic methods work better for recognizing multiple categories. (Compared with the methods above: can those be applied to multiple categories directly, or only via one-vs-one?) Logistic regression model learning: the class is judged by the posterior probability, and the model is learned by maximum likelihood.
When there are many training samples, the least-squares probabilistic classification method is used; when training samples are relatively few, the logistic regression method is used (a sketch follows).
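
A sketch of logistic regression trained by maximum likelihood via gradient ascent (binary case with labels 0/1; the function names and step size are assumptions):

```python
import numpy as np

def fit_logistic(X, y, eps=0.1, n_iters=500):
    """Maximize the log-likelihood sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)]."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))    # posterior p(y=1 | x)
        w += eps * X.T @ (y - p) / len(y)   # gradient of the mean log-likelihood
    return w

def predict_proba(X, w):
    return 1.0 / (1.0 + np.exp(-X @ w))
```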

Text recognition; language and word processing

Classification of sequence data


Origin blog.csdn.net/Carol_learning/article/details/98871778