Summary of Keywords and Algorithms for Machine Learning

       With the worldwide push toward data governance and intelligent assistance for digital transformation, machine learning (including deep learning) is gradually penetrating all walks of life, so it is worth summarizing its common terms, classic algorithms, and application scenarios. At its core, machine learning aims to solve the many classification and regression problems people face: through human design, machines can be made automated or intelligent. What a machine mainly learns is the inductive, synthesizing side of human problem-solving; deductive logic is not yet within its reach. It is especially suited to data-intensive scenarios that demand fast, timely, and accurate results, such as face recognition, license plate recognition, pedestrian detection, safety warnings, and spam filtering.
Several key terms of machine learning: training set, test set, feature value, supervised, unsupervised, semi-supervised, classification, and regression.
Training set: the set of data used to train the algorithm and form a model; generally a labeled data set.
Test set: the data set used to test and validate the model formed by training; the ratio of test set to training set is generally 2:8.
Feature value: the attributes in the data set that are used to predict the target attribute or target value.
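As a quick illustration of these terms, a minimal sketch of a 2:8 train/test split with scikit-learn (the arrays below are made up; test_size=0.2 encodes the ratio):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # feature values (made-up data)
y = np.array([0, 1] * 5)          # labels, making this a labeled data set

# test_size=0.2 reserves 20% for testing: the 2:8 ratio mentioned above.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
print(len(X_train), len(X_test))  # 8 2
```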
Supervised, unsupervised, semi-supervised, classification and regression are explained in detail below.
Two broad classes of machine learning problems
Classification problems: target labels are categorical data
Regression problems: target labels are continuous values
Three Types of Machine Learning Problems
Supervised learning: the training set is labeled with categories
Unsupervised learning: the training set is unlabeled
Semi-supervised learning: labeled training set + unlabeled training set
Deep learning: the extension and expansion of machine learning, emphasizing feature extraction and combination
Reinforcement learning: Learning from environment to behavior mapping to maximize the reward signal (reinforcement signal) function value, emphasizing feedback
Transfer learning: the ability to generalize to new situations (different from the training set) and to transfer knowledge learned elsewhere to new scenarios. Humans have a strong capacity for transfer learning: after learning to ride a bicycle, riding a motorcycle is very easy; after learning badminton, learning tennis is much easier. The emphasis is on adaptability and on finding the commonality between scenarios (the biggest difficulty). For example, a commonality when driving abroad: the driver's seat is always the one closer to the middle of the road.
Characteristics of transfer learning: small-scale data, reliability and personalization.
How is the model formed by algorithm training evaluated?
1. Accuracy or precision
2. Speed
3. Robustness
4. Scalability
5. Interpretability

The difference between deep learning and classical machine learning

The classic machine learning algorithms are as follows:
1. Decision tree
Information entropy: the uncertainty of information; it measures the amount of information. The greater the uncertainty of the information, the greater the entropy. When building the tree, the feature with the largest information gain (equivalently, the smallest conditional entropy) is taken as the root node.
Information gain (Information Gain), also called mutual information: a measure of how much information a feature attribute provides. Gain(A) = Info(D) − Info_A(D)
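As a concrete illustration, a minimal sketch that computes Info(D) and Gain(A) for one feature on a toy data set (the data are invented):

```python
import math
from collections import Counter

def entropy(labels):
    # Info(D): Shannon entropy of a list of class labels.
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    # Gain(A) = Info(D) - Info_A(D): entropy reduction from splitting on A.
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)  # group labels by feature value
    cond = sum(len(g) / n * entropy(g) for g in groups.values())  # Info_A(D)
    return entropy(labels) - cond

# Toy example: a perfectly informative binary feature.
windy = [True, True, True, False, False, False]
play  = ["no", "no", "no", "yes", "yes", "yes"]
print(information_gain(windy, play))  # 1.0: the split removes all uncertainty
```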
Advantages: easy to understand, intuitive, effective for small-scale data sets
Disadvantages: handles continuous variables poorly; when there are many categories, the error grows quickly; scales only moderately.
To visualize the decision tree with scikit-learn, install graphviz, export the tree to a dot file, and convert it to a PDF:
dot -Tpdf test.dot -o output.pdf
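A sketch of the full workflow (the iris data set and the test.dot filename are stand-ins for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Train a small tree; max_depth=3 keeps the diagram readable.
iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3).fit(iris.data, iris.target)

# Write the tree to test.dot, then on the command line run:
#   dot -Tpdf test.dot -o output.pdf
export_graphviz(clf, out_file="test.dot",
                feature_names=iris.feature_names,
                class_names=iris.target_names,
                filled=True)
```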
2. k-Nearest Neighbors classification (KNN)
Instance-based, lazy learning: no explicit model is built up front
To classify an unknown instance, it is compared against all known instances
1. Select parameter K
2. Calculate the distance between the unknown instance and all known instances
3. Select the K nearest known instances
4. Majority vote: the unknown instance is assigned the category held by the most of its K nearest neighbors (see the sketch below)
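A minimal sketch of these four steps using scikit-learn (the iris data and K=5 are arbitrary choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), test_size=0.2, random_state=0)

# Step 1: choose K. Steps 2-4 (compute distances, take the K nearest,
# majority vote) happen inside predict(); fit() merely stores the data.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))  # accuracy on the held-out test set
```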
KD tree construction:
Compute the variance of each of the n features, and take the k-th dimensional feature $n_k$, the one with the largest variance, as the root node. For this feature, choose as the split point the sample corresponding to the median $n_{kv}$ of the values of feature $n_k$.
Distance metrics: Euclidean distance (square root of summed squared coordinate differences), cosine similarity, correlation, Manhattan distance
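SciPy ships a KD-tree implementation that can be used directly (a sketch with random points; note SciPy's internal splitting rule differs in detail from the variance/median rule sketched above):

```python
import numpy as np
from scipy.spatial import cKDTree

pts = np.random.rand(100, 3)  # 100 random points in 3-D
tree = cKDTree(pts)           # build the KD tree once

# Query the 5 nearest neighbors of a probe point, faster than brute force.
dist, idx = tree.query([0.5, 0.5, 0.5], k=5)
print(dist, idx)
```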
Advantages: Simple and easy to understand, easy to implement, and can increase robustness through the selection of K
Disadvantages: high memory use and algorithmic cost. When the samples are imbalanced and one class dominates, a new instance is easily assigned to that dominant class even when this is actually wrong.
Improvement: weight each neighbor's vote by its distance, e.g., 1/d (where d is the distance); see the sketch below
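scikit-learn's distance weighting implements essentially this scheme (a sketch; the data and K are again arbitrary):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X_train, X_test, y_train, y_test = train_test_split(
    *load_iris(return_X_y=True), test_size=0.2, random_state=0)

# weights="distance" makes each neighbor's vote proportional to 1/d,
# so closer neighbors count for more than distant ones.
knn_w = KNeighborsClassifier(n_neighbors=5, weights="distance")
knn_w.fit(X_train, y_train)
print(knn_w.score(X_test, y_test))
```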
3. Support Vector Machine (SVM)
General Framework for Machine Learning
Training set → extract feature vectors → apply an algorithm (a classifier such as a decision tree or KNN) → model
SVM finds the hyperplane that separates the two classes with the maximum margin
Features:
1. The complexity of the algorithm is determined by the number of support vectors, not by the dimensionality of the data, so it is not prone to overfitting
2. The model depends entirely on the support vectors: even if all non-support vectors are removed, retraining yields the same model
3. If training yields relatively few support vectors, the model tends to generalize well
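A small sketch that makes the support vectors visible (linear kernel and iris data chosen arbitrarily):

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="linear").fit(X, y)

# The trained model is defined entirely by its support vectors:
print(clf.support_vectors_.shape)  # (number of support vectors, n_features)
print(clf.n_support_)              # support vector count per class
```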
4. Artificial neural network
Multilayer feed-forward neural network
input layer → hidden layer(s) → output layer
Cross-validation: split the data into ten parts, hold out one part at a time, train ten models, and average the results
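A minimal ten-fold cross-validation sketch (the classifier and data set are placeholders):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# cv=10: ten folds, each held out once; the ten scores are then averaged.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=10)
print(scores.mean())
```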
Neural networks can be used to solve classification problems as well as regression problems
backpropagation algorithm
1. Iteratively process the instances in the training set
2. Compare the network's predicted output for each input with the true value to obtain the error
3. Working in the reverse direction (output → hidden → input), update the weights and biases of each connection so as to minimize the error
Termination conditions:
Weight updates fall below a certain threshold
The prediction error rate falls below a certain threshold
A preset number of cycles is reached
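scikit-learn's MLPClassifier is one such multilayer feed-forward network trained by backpropagation, and it exposes two of these termination conditions directly (the layer size and thresholds below are arbitrary):

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

# max_iter caps the number of training cycles; tol stops training early
# once the loss improvement falls below that threshold.
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, tol=1e-4,
                    random_state=0)
mlp.fit(X, y)
print(mlp.score(X, y))
```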
5. Simple linear regression
Mean, median, mode, variance, and standard deviation (the last two describe the dispersion of the data: the larger they are, the more dispersed the data)
In regression, the Y variable is a continuous numeric type: e.g., house price, number of people
In classification, the Y variable is categorical: e.g., color, rock type, computer brand, reputation
A problem involving one independent variable and one dependent variable is called a simple linear regression problem
A problem involving two or more independent variables is called a multiple regression problem.
Simple linear regression model: $y = \beta_0 + \beta_1 x + \varepsilon$
Simple linear regression equation: $E(y) = \beta_0 + \beta_1 x$
Estimated simple linear regression equation: $\hat{y} = b_0 + b_1 x$
Estimation method: least squares, i.e., choose $b_0$ and $b_1$ to minimize the sum of squared residuals
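A minimal least-squares sketch on made-up points, using the closed-form estimates $b_1 = \sum(x-\bar{x})(y-\bar{y}) / \sum(x-\bar{x})^2$ and $b_0 = \bar{y} - b_1\bar{x}$:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least-squares estimates of slope and intercept.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)  # estimated intercept and slope of ŷ = b0 + b1·x
```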
Multiple regression analysis
multiple independent variables
Nonlinear regression (logistic regression)
Probability: a measure of the likelihood that an event happens, with $0 \le P \le 1$
Conditional Probability
gradient descent algorithm
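A small gradient-descent sketch for logistic regression on invented data (the learning rate and iteration count are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: one feature, binary labels.
X = np.array([[0.5], [1.5], [2.5], [3.5]])
y = np.array([0, 0, 1, 1])
Xb = np.hstack([np.ones((len(X), 1)), X])  # prepend a bias column

w = np.zeros(2)
lr = 0.1
for _ in range(5000):
    p = sigmoid(Xb @ w)             # predicted probabilities, 0 < p < 1
    grad = Xb.T @ (p - y) / len(y)  # gradient of the log-loss
    w -= lr * grad                  # step against the gradient
print(w, sigmoid(Xb @ w))
```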
Correlation and R-squared in regression: the Pearson correlation coefficient
R-squared (coefficient of determination): the proportion of the dependent variable's total variation that can be explained by the independent variables through the regression relationship. For example, an R-squared of 0.8 means the regression explains 80% of the dependent variable's variation; in other words, if we could hold the independent variables constant, the variation of the dependent variable would be reduced by 80%.
R-squared increases as more independent variables are added, and it is also related to the sample size.
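R-squared follows directly from its definition (a sketch with made-up numbers):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.2])

ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
print(1 - ss_res / ss_tot)  # 0.991: the fit explains ~99% of the variation
```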
6. Clustering
K-means algorithm: one of the top ten classic algorithms in data mining
The algorithm takes a parameter K and partitions the n input data objects into k clusters, so that similarity within a cluster is high and similarity between different clusters is low
Advantages: fast and simple
Disadvantages: the final result depends on the choice of initial points and easily falls into a local optimum; the value of K must be known in advance
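A brief K-means sketch (the points and K=2 are invented); n_init reruns the algorithm from several random initializations, which mitigates the local-optimum issue noted above:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 1], [1.5, 2], [8, 8], [8.5, 9], [1, 0.5], [9, 8.5]])

# K must be chosen up front; n_init=10 tries ten random initializations
# and keeps the best clustering to reduce sensitivity to the start points.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster assignment of each point
print(km.cluster_centers_)  # final cluster centers
```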
Hierarchical clustering
Initialization: place each sample in its own cluster and compute the distance between every pair of clusters, i.e., the similarity between samples
Then find and merge the two closest clusters, which reduces the total number of clusters by one; repeat until the desired number of clusters remains
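A matching sketch of this bottom-up (agglomerative) procedure with scikit-learn, on the same toy points as above:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 1], [1.5, 2], [8, 8], [8.5, 9], [1, 0.5], [9, 8.5]])

# Each sample starts as its own cluster; the two closest clusters are
# merged repeatedly (one fewer cluster per step) until 2 remain.
agg = AgglomerativeClustering(n_clusters=2).fit(X)
print(agg.labels_)
```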
Origin blog.csdn.net/hhue2007/article/details/132007791