1 Introduction to Kaggle
Classification method one, by competition type:
Featured: business or scientific research problems; the prize money is generally generous
Recruitment: the reward is typically a job interview with the sponsoring company
Research: scientific and academic competitions that generally require strong domain and professional knowledge
Playground: simple tasks for getting familiar with the platform and competition format
Getting Started: simple introductory tasks for newcomers to the platform and its competitions
In Class: for classroom project assignments or exams
Classification method two:
By submission type: online submission (code/kernel competitions) vs. offline submission (uploading prediction files)
Classification method three:
By data type: data mining (tabular data), image, speech, natural language
2 General procedure of the competition
1) EDA (exploratory data analysis)
Look at what the data looks like, think about how to approach the problem based on the structure and distribution of the data, and use tricks to compensate for problems in the data
2) Feature engineering
CV competitions are now more common, and in them this part plays a smaller role
3) Model training
Choose a baseline and a model framework to train with. There are many tricks for the trained model, applied either during training or when the model is built.
4) Offline verification
Analyze bad cases on the validation set to understand why the model underperforms, and look for ways to improve it
Accumulate reusable tools to meet the needs of future competitions
3 Data sample analysis:
Examine the bbox distribution of the training set: the number of samples without a bbox, the number of bboxes per sample, and whether the distribution is roughly normal.
Some samples contain very large bboxes. There are two options: 1) remove these samples outright; 2) keep them as noise, which may improve generalization. Which option works better should be decided by offline validation.
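As a sketch of these checks with pandas, assuming a per-bbox annotation table (the column names and the toy data below are made up for illustration):

```python
import pandas as pd

# hypothetical annotations: one row per bbox
ann = pd.DataFrame({
    "image_id": [0, 0, 1, 2, 2, 2, 4],
    "w": [10, 20, 15, 30, 5, 8, 200],
    "h": [12, 18, 10, 25, 6, 9, 180],
})
all_ids = range(5)  # assume 5 training images; id 3 has no bbox

# bboxes per image, including images with none
per_image = ann.groupby("image_id").size().reindex(all_ids, fill_value=0)
print("images without bbox:", int((per_image == 0).sum()))
print(per_image.describe())  # rough look at the distribution

# flag unusually large bboxes (candidates to drop, or to keep as noise)
ann["area"] = ann["w"] * ann["h"]
large = ann[ann["area"] > ann["area"].quantile(0.99)]
print(len(large), "very large bbox(es)")
```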
4 Introduction to Baseline Ideas
1) Basic data augmentation (commonly used in CV)
HSV color-space transforms, brightness and contrast changes, horizontal flip, vertical flip, grayscale conversion, random cropping
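Several of the basic augmentations above can be sketched directly in NumPy (an illustrative sketch, not tied to any particular augmentation library; the function names are my own):

```python
import numpy as np

rng = np.random.default_rng(42)

def hflip(img):
    # horizontal flip: reverse the width axis
    return img[:, ::-1]

def vflip(img):
    # vertical flip: reverse the height axis
    return img[::-1]

def brightness_contrast(img, alpha=1.2, beta=10.0):
    # alpha scales contrast, beta shifts brightness (uint8 image)
    return np.clip(img.astype(np.float32) * alpha + beta, 0, 255).astype(np.uint8)

def to_grayscale(img):
    # standard luminance weights for an RGB image
    return (img @ np.array([0.299, 0.587, 0.114])).astype(np.uint8)

def random_crop(img, size):
    # crop a random size x size window
    h, w = img.shape[:2]
    y = int(rng.integers(0, h - size + 1))
    x = int(rng.integers(0, w - size + 1))
    return img[y:y + size, x:x + size]

img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
aug = random_crop(hflip(img), 24)
print(aug.shape)  # (24, 24, 3)
```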
2) Advanced data augmentation
Cutout: randomly mask out regions of the sample and fill them with zero pixel values; the classification label stays unchanged. This simulates occlusion and imitates the effect of dropout: instead of randomly dropping neurons, it randomly drops pixels.
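A minimal Cutout sketch in NumPy (the function name and patch-size default are my own choices):

```python
import numpy as np

def cutout(img, size=8, rng=None):
    """Zero out one random size x size square; the label stays unchanged."""
    rng = rng if rng is not None else np.random.default_rng(0)
    h, w = img.shape[:2]
    cy = int(rng.integers(0, h))  # random patch center
    cx = int(rng.integers(0, w))
    y1, y2 = max(0, cy - size // 2), min(h, cy + size // 2)
    x1, x2 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = img.copy()
    out[y1:y2, x1:x2] = 0  # fill with 0 pixel values, i.e. "drop" pixels
    return out
```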
CutMix: cut out a region but, instead of filling it with zeros, fill it with the corresponding region from another training sample; the classification labels are mixed in proportion to the area each image occupies.
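A CutMix sketch in NumPy, assuming one-hot label vectors (the function signature is my own; the mixing ratio is recomputed from the pasted area after clipping, so the label weights match the actual pixel proportions):

```python
import numpy as np

def cutmix(img_a, label_a, img_b, label_b, rng=None):
    """Paste a random patch of img_b into img_a; mix one-hot labels by area."""
    rng = rng if rng is not None else np.random.default_rng(0)
    h, w = img_a.shape[:2]
    lam = float(rng.beta(1.0, 1.0))           # target mixing ratio
    cut_h = int(h * np.sqrt(1.0 - lam))       # patch size from lambda
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy = int(rng.integers(0, h))              # random patch center
    cx = int(rng.integers(0, w))
    y1, y2 = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, h)
    x1, x2 = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, w)
    out = img_a.copy()
    out[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    # recompute lambda from the actual pasted area after clipping
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    return out, lam * label_a + (1.0 - lam) * label_b
```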
3) Training strategy
K-fold training
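K-fold training can be sketched with scikit-learn's `KFold` (the toy data and fold bookkeeping here are illustrative; in a competition you would train a model per fold and often ensemble the k models at submission time):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # toy features
y = np.arange(10) % 2             # toy labels

kf = KFold(n_splits=5, shuffle=True, random_state=42)
fold_sizes = []
for fold, (train_idx, val_idx) in enumerate(kf.split(X)):
    X_tr, X_val = X[train_idx], X[val_idx]
    # train a model on X_tr here and evaluate on X_val;
    # the reported score is the average over the k folds
    fold_sizes.append(len(val_idx))
print(fold_sizes)  # [2, 2, 2, 2, 2]
```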
Learning rate policy:
ReduceLROnPlateau: adaptively adjusts the learning rate; when a monitored metric stops improving (stops decreasing or increasing), the learning rate is reduced
LambdaLR: sets the learning rate of each parameter group to the initial lr multiplied by a given function of the epoch
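Both policies exist under these names in PyTorch's `torch.optim.lr_scheduler`; a minimal sketch (the toy single-parameter optimizers are only for illustration):

```python
import torch

# LambdaLR: lr = initial_lr * lr_lambda(epoch)
p = torch.nn.Parameter(torch.zeros(1))
opt = torch.optim.SGD([p], lr=0.1)
sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=lambda epoch: 0.9 ** epoch)
sched.step()  # epoch 1 -> lr = 0.1 * 0.9

# ReduceLROnPlateau: halve lr after `patience` epochs with no improvement
p2 = torch.nn.Parameter(torch.zeros(1))
opt2 = torch.optim.SGD([p2], lr=0.1)
plateau = torch.optim.lr_scheduler.ReduceLROnPlateau(
    opt2, mode="min", factor=0.5, patience=2)
for _ in range(4):       # validation loss stuck at 1.0 for 4 epochs
    plateau.step(1.0)

print(opt.param_groups[0]["lr"], opt2.param_groups[0]["lr"])
```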