The previous sections covered conventional data analysis algorithms. In real scenarios, feature engineering and model evaluation are the two most important steps: the former determines the quality of the model, while the latter scores and selects among candidate models. In model evaluation, classification problems, ranking problems, and regression problems often need to be assessed with different metrics. Among the many evaluation metrics, most reflect only one aspect of a model's performance. If evaluation metrics are not used appropriately, not only will the model's own problems go undetected, but wrong conclusions may also be drawn.
1. Feature Engineering
2. Model selection and evaluation
Accuracy (accuracy): the number of correctly classified samples divided by the total number of samples. Generally speaking, the higher the accuracy, the better the classifier.
Precision (precision): the proportion of samples predicted as positive that are truly positive; it measures how reliable the classifier's positive predictions are.
Recall/sensitivity (recall): the proportion of all true positive examples that are correctly identified; it measures the classifier's ability to recognize positive examples.
ROC curve: plots the true positive rate against the false positive rate as the classification threshold varies. The point on the curve closest to the upper-left corner corresponds to the best threshold with the least classification error, i.e., the smallest total number of false positives and false negatives.
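The metrics above can be sketched in a few lines of plain Python. This is a minimal illustration, not from the original text: the function names and the toy labels/scores below are made up for demonstration. Labels use 1 = positive, 0 = negative; the threshold sweep mirrors the ROC discussion by picking the cutoff that minimizes false positives plus false negatives.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, and recall for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)               # correct / all samples
    precision = tp / (tp + fp) if tp + fp else 0.0   # true positives / predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0      # true positives / actual positives
    return accuracy, precision, recall

def best_threshold(y_true, scores, thresholds):
    """Pick the cutoff minimizing false positives + false negatives,
    as the upper-left point of the ROC curve does."""
    def errors(th):
        preds = [1 if s >= th else 0 for s in scores]
        fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
        fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
        return fp + fn
    return min(thresholds, key=errors)

# Toy example (illustrative data only)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
acc, prec, rec = binary_metrics(y_true, y_pred)  # 0.75, 0.75, 0.75
```

In practice one would typically use `sklearn.metrics` (`accuracy_score`, `precision_score`, `recall_score`, `roc_curve`) rather than hand-rolled counts; the sketch only makes the definitions concrete.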
To be continued...