How to Get Started with Kaggle Competitions

1 Introduction to Kaggle

Classification method one, by competition type:

Featured: business or scientific-research problems; the prize money is generally the most generous

Recruitment: the competition's reward is a job interview opportunity with the sponsoring company

Research: scientific and academic competitions that generally require strong domain expertise

Playground: simple tasks for getting familiar with the platform and with competing

Getting Started: simple tasks for newcomers to familiarize themselves with the platform and with competitions

In Class: For classroom project assignments or exams

Classification method two:

By submission type: online submission (code runs on Kaggle's servers, as in notebook/code competitions) versus offline submission (predictions are generated locally and uploaded)

Classification method three:

By data type: data mining (tabular data), image, speech, and natural language

2 General procedure of the competition

1) EDA (exploratory data analysis)

Look at what the data looks like, think about how the structure and distribution of the data suggest a way to solve the problem, and use tricks to compensate for problems in the data.
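For example, a minimal first-look sketch in pandas; "train.csv" and the "label" column are placeholders for whatever the competition actually provides:

```python
import pandas as pd

# First look at the data; file and column names are placeholders.
df = pd.read_csv("train.csv")

print(df.shape)           # rows and columns
print(df.dtypes)          # numeric vs. categorical columns
print(df.isna().sum())    # missing values per column (a typical data problem)
print(df.describe())      # distribution summary of the numeric columns

# Class balance often shapes the whole approach (resampling, loss weights, ...)
print(df["label"].value_counts(normalize=True))
```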

2) Feature engineering

CV competitions are more common now, and this step plays a smaller role in them, since deep models largely learn features on their own.

3) Model training

Choose a baseline and a model framework and train. There are many tricks for the trained model, applied either during training or when the model is constructed.
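A minimal sketch of such a baseline loop in PyTorch; random tensors stand in for real competition data, and the model and every hyperparameter here are placeholders:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Random stand-in data: 256 samples, 20 features, 2 classes.
X = torch.randn(256, 20)
y = torch.randint(0, 2, (256,))
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

# Placeholder baseline model; in a real competition this would be the
# chosen framework's model (e.g. a pretrained CNN for a CV task).
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(5):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```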

4) Offline validation

Use the validation set to analyze bad cases, work out why the model performs poorly on them, and find ways to improve the model.
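Continuing the training sketch above, one way to mine bad cases: run the model on held-out data (random stand-ins here) and sort the misclassified samples by how confident the model was.

```python
import torch

# Held-out stand-in data; reuses `model` from the training sketch above.
val_X = torch.randn(64, 20)
val_y = torch.randint(0, 2, (64,))

model.eval()
with torch.no_grad():
    probs = torch.softmax(model(val_X), dim=1)
    preds = probs.argmax(dim=1)

wrong = (preds != val_y).nonzero(as_tuple=True)[0]
conf = probs[wrong, preds[wrong]]
# Confidently wrong samples are the most informative bad cases:
# label noise? a missing feature? a hard subclass?
for i in wrong[conf.argsort(descending=True)][:10]:
    print(f"sample {i.item()}: pred {preds[i].item()}, true {val_y[i].item()}")
```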

Accumulate reusable tools along the way to meet the needs of future competitions.

3 Data sample analysis

Look at the bbox distribution of the training-set samples: the number of samples without any bbox, the number of bboxes per sample, and whether the distribution is roughly normal.

Some samples have very large bboxes. There are two options: 1. remove these samples outright; 2. keep them as noise, which may improve generalization. Which option to choose should be decided by offline experiments.
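A sketch of these checks, assuming a bbox annotation table with one row per box and image_id / width / height columns (all placeholder names):

```python
import pandas as pd

ann = pd.read_csv("train_annotations.csv")  # placeholder file name

boxes_per_image = ann.groupby("image_id").size()
print(boxes_per_image.describe())   # bbox count per sample
# Samples with no bbox won't appear here; compare against the full image list.

area = ann["width"] * ann["height"]
print(area.describe())              # overall bbox size distribution
print("skewness:", area.skew())     # rough sanity check against normality

# Candidates for the "large bbox" decision: drop them, or keep them as noise.
large = ann[area > area.quantile(0.99)]
print(f"{len(large)} boxes above the 99th-percentile area")
```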

4 Introduction to Baseline Ideas

1) Basic data augmentation (commonly used in CV)

HSV color-channel transforms, brightness and contrast adjustment, horizontal flip, vertical flip, grayscale conversion, random cropping.
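One way to express this list with torchvision transforms; the probabilities, magnitudes, and crop size below are placeholder values to tune per competition:

```python
import torchvision.transforms as T

train_tf = T.Compose([
    T.RandomCrop(224, padding=4),            # random cropping (placeholder size)
    T.RandomHorizontalFlip(p=0.5),           # horizontal flip
    T.RandomVerticalFlip(p=0.5),             # vertical flip
    T.ColorJitter(brightness=0.2, contrast=0.2,
                  saturation=0.2, hue=0.05), # brightness/contrast + HSV-style jitter
    T.RandomGrayscale(p=0.1),                # grayscale conversion
    T.ToTensor(),
])
```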

2) Advanced data augmentation

Cutout: randomly cut out regions of the sample and fill them with zero pixel values while the classification label stays unchanged. This simulates occlusion and imitates dropout: where dropout randomly discards neurons, Cutout randomly discards pixels.
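A minimal sketch of the Cutout idea on a single CHW image tensor (the patch size is a placeholder):

```python
import torch

def cutout(img: torch.Tensor, size: int = 16) -> torch.Tensor:
    """Zero out one random size x size square of a CHW image; label unchanged."""
    _, h, w = img.shape
    cy = torch.randint(0, h, (1,)).item()
    cx = torch.randint(0, w, (1,)).item()
    y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)
    img = img.clone()
    img[:, y0:y1, x0:x1] = 0.0  # fill with 0 pixels, i.e. randomly drop pixels
    return img
```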

CutMix: cut out a region but, instead of filling it with zeros, fill it with the corresponding pixels of another randomly chosen training sample; the classification label is mixed between the two samples in proportion to the area ratio.
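And a minimal batch-level CutMix sketch; the training loss then becomes lam * loss(pred, y_a) + (1 - lam) * loss(pred, y_b):

```python
import torch

def cutmix(x: torch.Tensor, y: torch.Tensor, alpha: float = 1.0):
    """CutMix for a batch x of shape (N, C, H, W) with integer labels y.
    Returns mixed images plus (y_a, y_b, lam) for the ratio-weighted loss."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    index = torch.randperm(x.size(0))
    _, _, h, w = x.shape
    # Cut a box whose area fraction is (1 - lam).
    rh, rw = int(h * (1 - lam) ** 0.5), int(w * (1 - lam) ** 0.5)
    cy, cx = torch.randint(0, h, (1,)).item(), torch.randint(0, w, (1,)).item()
    y0, y1 = max(0, cy - rh // 2), min(h, cy + rh // 2)
    x0, x1 = max(0, cx - rw // 2), min(w, cx + rw // 2)
    x = x.clone()
    x[:, :, y0:y1, x0:x1] = x[index, :, y0:y1, x0:x1]  # fill from other samples
    lam = 1 - (y1 - y0) * (x1 - x0) / (h * w)          # exact area ratio
    return x, y, y[index], lam
```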

3) Training strategy

K-fold training
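A K-fold split sketch with scikit-learn (random data as a stand-in); each fold trains one model, and the K offline scores and models can be averaged or ensembled:

```python
import numpy as np
from sklearn.model_selection import KFold

X, y = np.random.randn(100, 20), np.random.randint(0, 2, 100)

kf = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(kf.split(X)):
    X_tr, X_val = X[train_idx], X[val_idx]
    y_tr, y_val = y[train_idx], y[val_idx]
    # Train a model on (X_tr, y_tr), score on (X_val, y_val), keep for ensembling.
    print(f"fold {fold}: train {len(train_idx)}, val {len(val_idx)}")
```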

Learning rate schedules:

ReduceLROnPlateau: adaptively adjusts the learning rate; when a monitored metric stops improving (stops decreasing or increasing), the learning rate is reduced.

LambdaLR: sets the learning rate of each parameter group to the initial lr multiplied by a given function (typically of the epoch).
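How these two schedulers are wired up in PyTorch; the model, optimizer, and all values below are placeholders, and in practice you would use one scheduler at a time:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# ReduceLROnPlateau: shrink lr by `factor` when the monitored metric has not
# improved for `patience` epochs; step() takes the metric value.
plateau = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=3)

# LambdaLR: lr = initial_lr * lr_lambda(epoch); here an exponential decay.
lamb = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: 0.95 ** epoch)

for epoch in range(10):
    val_loss = 1.0 / (epoch + 1)  # stand-in for a real validation loss
    plateau.step(val_loss)        # the plateau scheduler watches the metric
    # lamb.step()                 # (only if using LambdaLR instead)
```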


Origin blog.csdn.net/qq_40016005/article/details/127723240