2022 Eleventh "Xiaomei" Contest, Problem C: Step-by-Step Tutorial and Complete Code

Hello everyone. This post provides a complete walkthrough and code for Problem C of the Xiaomei Contest. All the videos and code are my own original work; plagiarism will be pursued. Please look for the original author: unknown mathematician P.

The Q&A and supporting documents for this problem are as follows:

2022 Xiaomei Contest Problem C step-by-step code

The video walkthrough has been released.


Problem C: Classification of Human Activities

2. Problem analysis and approach
1. Please design a set of features and an effective algorithm to classify the 19 types of human behavior from the data of these wearable sensors.
The data are organized, merged, and summarized into X and Y, as shown below. I added two extra variables: people, which identifies the experimenter, and Behavior, which identifies the human behavior.
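A minimal sketch of this merge step. The file layout, the feature summary, and the synthetic segment shapes are all assumptions for illustration; only the two added columns, `people` and `Behavior`, come from the description above.

```python
import numpy as np
import pandas as pd

def summarize_segment(segment: np.ndarray) -> np.ndarray:
    """Summarize one recording segment into simple per-channel features
    (mean and standard deviation of each sensor channel)."""
    return np.concatenate([segment.mean(axis=0), segment.std(axis=0)])

rows, labels, people = [], [], []
# hypothetical loop: 19 behaviors x 8 experimenters, one segment each
for behavior in range(1, 20):
    for person in range(1, 9):
        segment = np.random.randn(125, 45)   # stand-in for one loaded sensor segment
        rows.append(summarize_segment(segment))
        labels.append(behavior)
        people.append(person)

X = pd.DataFrame(rows)
X["people"] = people                      # identifies the experimenter
Y = pd.Series(labels, name="Behavior")    # identifies the human behavior
print(X.shape, Y.shape)
```

In a real run, the inner loop would load each recording file from disk instead of generating random segments.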


2. Due to the high cost of data collection, the model must generalize well on a limited data set. Please study this issue concretely and devise a feasible method to evaluate the generalization ability of your model.
Generalization ability is the model's predictive ability on unseen data. In plain language: after the model is trained and deployed in a real scenario, will it underperform, or can it achieve the same results as during training? Generalization ability essentially reflects whether the model truly describes the underlying phenomenon or has merely overfit. There are generally two ways to evaluate it:
1. Hold-out method.
When dividing the data into a training set and a test set, the data distribution should be kept as consistent as possible, i.e., the original class ratios should be preserved (stratification). Typically 70-80% of the samples are used for training and the remaining samples for testing.
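A stratified hold-out split can be sketched with scikit-learn; the data here are synthetic placeholders for the contest data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# placeholder data: 19 balanced classes, 10 samples each
X = np.random.randn(190, 5)
y = np.repeat(np.arange(19), 10)

# stratify=y preserves the original class ratios in both splits; 80/20 split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```

With stratification, every one of the 19 classes appears in the test set in its original proportion.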
2. Cross-validation method.
Cross-validation, as the name implies, splits the data set multiple times. Compared with the single split into training and test sets introduced above, cross-validation gives a more stable estimate. We generally use k-fold cross-validation.
In k-fold cross-validation, the whole data set is divided into k parts, where k is usually 5 or 10.
The first part is used as the test set and the remaining parts as the training set; then the second part is used as the test set and the rest as the training set; this is repeated until every part has served as the test set exactly once.
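The k-fold procedure above can be sketched as follows; the classifier and the synthetic data are placeholders, not the contest's actual model or data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# placeholder data: 19 balanced classes, 10 samples each
rng = np.random.default_rng(0)
X = rng.normal(size=(190, 5))
y = np.repeat(np.arange(19), 10)

# 5-fold stratified CV: each fold serves as the test set exactly once
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
print(scores.mean())   # average accuracy over the 5 folds
```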
Here I use the hold-out method.


3. Please study and overcome the overfitting problem so that your classification algorithm can be widely applied to human action classification problems. Overfitting means the model performs very well on the training set but poorly on the test set. We can therefore compare two models: first fit a simple baseline model, such as logistic regression or naive Bayes classification, and observe that its accuracy is low; then switch to a stronger model, such as XGBoost or a neural network, and show that by choosing a more suitable model the gap between training and test performance shrinks, i.e., the overfitting problem has been addressed.
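A sketch of this baseline-versus-stronger-model comparison. The data are synthetic, and `RandomForestClassifier` stands in for XGBoost so the example needs no extra dependencies; the key idea is printing train and test accuracy side by side, since a large gap signals overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# synthetic multi-class data standing in for the sensor features
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(random_state=0)):
    model.fit(X_tr, y_tr)
    # train vs test accuracy: a large gap means the model memorized
    # the training set rather than learning a generalizable rule
    print(type(model).__name__,
          round(model.score(X_tr, y_tr), 3),
          round(model.score(X_te, y_te), 3))
```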


Origin: blog.csdn.net/weixin_44099072/article/details/128148827