Practical tutorial | Top 10 deep learning skills to dominate Kaggle

By Samuel Lynn-Evans

Source丨Qubit

Edit丨Gokushi Platform

On the leaderboards of many Kaggle competitions you will find programmers who have only recently entered the field of deep learning, and most of them have one thing in common:

All have taken courses from Fast.ai.

These free, hands-on courses strongly encourage students to take part in Kaggle competitions to test their abilities. Of course, students are also taught a lot of deep learning techniques that dominate Kaggle.

What is the secret that lets beginners build state-of-the-art DL algorithms in such a short time? Samuel Lynn-Evans, a student at School 42 in Paris, summed it up in ten lessons.


His article was published on the official FloydHub blog, because in addition to the tips from Fast.ai, he also used FloydHub's setup-free deep learning GPU cloud platform.

Next, let's take a look at the top ten skills he learned from fast.ai:

1. Using the Fast.ai library

This one is the most straightforward.

from fastai import *

The Fast.ai library is a novice-friendly deep learning toolbox and is currently the number one choice for reproducing the latest algorithms.

Whenever the Fast.ai team and AI researchers find an interesting paper, they test it on various datasets and determine the appropriate tuning method. They will add better model implementations to this library, and users can quickly load these models.

As a result, the Fast.ai library has become a powerful toolbox for quickly loading some of the latest algorithm implementations, such as stochastic gradient descent with restarts, differential learning rates, and test-time augmentation, among others that will not all be listed here.

The following sections describe each of these techniques and show how the Fast.ai library makes them quick to apply.

The library is built on top of PyTorch, and the two can be used together seamlessly when building models.

Fast.ai library address:
https://github.com/fastai/fastai

2. Use multiple learning rates instead of a single learning rate


Differential learning rates mean training different layers of the network with different learning rates, rather than applying a single rate to the whole model.

Fine-tuning a network that starts from an existing pretrained model is a proven way to get better results on computer vision tasks.

Most existing networks (such as ResNet, VGG, and Inception) were trained on the ImageNet dataset, so the network weights need to be adjusted according to how similar your dataset is to the ImageNet images.

When modifying these weights, we usually modify the last few layers of the model: the earlier layers detect basic features (such as edges and contours) that are common to almost all images, while the later layers detect more complex features that differ from dataset to dataset.

First, to use the Fast.ai library to get a pre-trained model, the code is as follows:

from fastai.conv_learner import *
# import the module that creates learner objects for convolutional networks

model = VGG16()
# assign the model: resnet, vgg, or even your own custom model

PATH = './folder_containing_images'
data = ImageClassifierData.from_paths(PATH)
# create a fast.ai data object; with from_paths, each image class inside
# PATH sits in its own folder

learn = ConvLearner.pretrained(model, data, precompute=True)
# create a learn object to quickly utilise state-of-the-art
# techniques from the fast.ai library

After creating the learn object, freeze the earlier layers and fine-tune only the last layers to solve the problem:

learn.freeze()

# freeze layers up to the last one, so weights will not be updated.

learning_rate = 0.1
learn.fit(learning_rate, epochs=3)

# train only the last layer for a few epochs

When the later layers are producing good results, we unfreeze the model and apply differential learning rates to adjust the earlier layers as well. In practice, the learning rate is usually reduced by a factor of 10 from one group of layers to the next earlier group:

learn.unfreeze()

# set requires_grads to be True for all layers, so they can be updated

learning_rate = [0.001, 0.01, 0.1]
# learning rates are set so that the earliest third of the layers train at 0.001,
# the middle third at 0.01, and the final layers at 0.1

learn.fit(learning_rate, epochs=3)
# train the model for three epochs using differential learning rates
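
Under the hood, differential learning rates are just different learning rates assigned to different groups of layers. As a rough sketch of the same idea in plain PyTorch rather than the fast.ai API (the choice of VGG16 and the split point at layer 16 are arbitrary), the same effect can be achieved with optimizer parameter groups:

import torch
import torchvision

# A minimal sketch of differential learning rates in plain PyTorch:
# each group of layers gets its own learning rate via optimizer parameter groups.
# The 0.001 / 0.01 / 0.1 split mirrors the fast.ai example above.
model = torchvision.models.vgg16(pretrained=True)

optimizer = torch.optim.SGD([
    {"params": model.features[:16].parameters(), "lr": 0.001},  # earliest layers
    {"params": model.features[16:].parameters(), "lr": 0.01},   # middle layers
    {"params": model.classifier.parameters(),    "lr": 0.1},    # final layers
], momentum=0.9)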

3. How to find the right learning rate

The learning rate is one of the most important hyperparameters in neural network training, yet choosing a good value has long been difficult in practice.

Leslie Smith's paper on cyclical learning rates provides an answer: a relatively unknown result that saw little use until the Fast.ai course popularized it.

The paper is: Cyclical Learning Rates for Training Neural Networks

https://arxiv.org/abs/1506.01186

In this method, we start training the network with a very low learning rate and increase it exponentially with every batch; the corresponding code is as follows:

learn.lr_find()
# run on the learn object; the learning rate is increased exponentially

learn.sched.plot_lr()
# plot the learning rate against the number of iterations
△ The learning rate increases exponentially after each iteration

At the same time, record the Loss value corresponding to each learning rate, and then draw the relationship between the learning rate and the Loss value:

learn.sched.plot()
# plots the loss against the learning rate
△ Find the point where the loss is still decreasing but has not yet leveled off

The optimal learning rate is determined by finding the value where the learning rate is the highest and the Loss value is still decreasing. In the above case, the value would be 0.01.
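
The learning rate finder is also straightforward to reproduce by hand. Below is a minimal sketch of the idea in plain PyTorch, assuming a model, train_loader, and criterion already exist (they are placeholders, not fast.ai objects) and that the loader yields (inputs, targets) pairs:

import torch

def lr_range_test(model, train_loader, criterion,
                  lr_start=1e-7, lr_end=10, num_iters=100):
    # Increase the learning rate exponentially each batch and record the loss.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr_start)
    gamma = (lr_end / lr_start) ** (1 / num_iters)   # per-batch multiplier
    lrs, losses = [], []
    for i, (inputs, targets) in enumerate(train_loader):
        if i >= num_iters:
            break
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
        lrs.append(optimizer.param_groups[0]["lr"])
        losses.append(loss.item())
        for group in optimizer.param_groups:         # exponential increase
            group["lr"] *= gamma
    # plot losses against lrs and pick the highest rate where loss still falls
    return lrs, losses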

4. Cosine Annealing

When using mini-batch stochastic gradient descent, the network should get closer and closer to the global minimum of the loss. As it approaches that minimum, the learning rate should therefore become smaller, so the model does not overshoot and can settle as close to this point as possible.

Cosine annealing uses the cosine function to reduce the learning rate to solve this problem, as shown in the following figure:

△ Cosine value decreases as x increases

As the graph shows, as x increases the cosine value decreases slowly at first, then more quickly, and then slowly again. This pattern of decay works very well as a learning rate schedule and is cheap to compute.

learn.fit(0.1, 1)
# Calling learn fit automatically takes advantage of cosine annealing

We can use the learn.fit() function in the Fast.ai library to quickly apply this algorithm; the learning rate decreases continuously over the course of a cycle, as shown in the following figure:

△ The learning rate decreases continuously over a cycle of 200 iterations
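
For readers not using the fast.ai wrapper, plain PyTorch ships a scheduler with the same decay curve. A minimal sketch, in which a single dummy parameter stands in for a real model and the 200-step cycle mirrors the figure above:

import torch

# Minimal sketch: anneal the learning rate from 0.1 towards 0 over a
# cycle of 200 iterations with PyTorch's built-in cosine scheduler.
params = [torch.nn.Parameter(torch.zeros(1))]        # stand-in for model.parameters()
optimizer = torch.optim.SGD(params, lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)

for step in range(200):
    # ... forward pass, loss.backward() and optimizer.step() would go here ...
    scheduler.step()            # the learning rate follows the cosine curve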

At the same time, based on this method, we can further introduce a restart mechanism.

5. SGD algorithm with restart

During training, gradient descent algorithms can get stuck in local minima, rather than global minima.

△ Gradient descent algorithm stuck in local minimum

By suddenly increasing the learning rate, gradient descent can "jump out" of a local minimum and find a path toward the global minimum. This approach is called stochastic gradient descent with restarts (SGDR), and it was shown to be very effective in an ICLR paper by Loshchilov and Hutter.

The paper is: SGDR: Stochastic Gradient Descent with Warm Restarts
https://arxiv.org/abs/1608.03983

The SGDR algorithm is easy to use via the Fast.ai library. When learn.fit(learning_rate, epochs) is called, the learning rate is reset to its original input value at the start of each epoch and then decays over the epoch as described in the cosine annealing section above.

△ The learning rate is reset to its initial value at the start of each cycle

Whenever the learning rate drops to a minimum point, which is every 100 iterations in the above figure, we call it a cycle.

cycle_len = 1
# decide how many epochs it takes for the learning rate to fall to
# its minimum point: in this case, 1 epoch

cycle_mult = 2
# at the end of each cycle, multiply the cycle length by 2

learn.fit(0.1, 3, cycle_len=cycle_len, cycle_mult=cycle_mult)
# in this case there will be three restarts: the first cycle has
# cycle_len of 1, so it takes 1 epoch to complete. With cycle_mult=2,
# the next cycle lasts two epochs, and the one after that four.
△ Each cycle lasts twice as many epochs as the previous one
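
Plain PyTorch also provides a scheduler for this restart schedule. A minimal sketch, with T_0 and T_mult playing the roles of cycle_len and cycle_mult (again, the single dummy parameter stands in for a real model):

import torch

# Minimal sketch of SGDR-style warm restarts in plain PyTorch.
# T_0 is the length of the first cycle and T_mult doubles each cycle,
# mirroring cycle_len=1, cycle_mult=2 above.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=1, T_mult=2)

for epoch in range(7):          # cycles of length 1, 2 and 4 epochs
    # ... one epoch of training would go here ...
    scheduler.step()            # the learning rate resets at each cycle boundary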

Tuning these parameters, combined with differential learning rates, is at the heart of how Fast.ai users get good results on image classification problems.

The Fast.ai forum has a thread dedicated to the cycle_mult and cycle_len parameters:
http://forums.fast.ai/t/understanding-cycle-len-and-cycle-mult/9413/8

For more details on learning rates, see this Fast.ai lesson:
http://course.fast.ai/lessons/lesson2.html

6. Personalize your activation function

Softmax only likes to choose one thing;

Sigmoid wants to tell you where you fall between 0 and 1, and once the input is far from zero it no longer cares how much bigger you get;

Relu is a club bouncer trying to keep negative numbers out.

……

It may seem silly to describe activation functions this way, but giving each one a personality helps ensure it is used for the right task.

As fast.ai founder Jeremy Howard has pointed out, even some academic papers use the Softmax function for multi-label classification problems, where it is a poor fit; I have also seen it misused more than once in papers and blogs while learning DL.
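
In practice, the choice comes down to pairing the output activation with the right loss function. A minimal PyTorch sketch of the rule of thumb (the batch size and class count are made up for illustration):

import torch
import torch.nn as nn

logits = torch.randn(8, 5)                    # batch of 8 examples, 5 classes

# Single-label classification: softmax picks exactly one class.
# nn.CrossEntropyLoss applies log-softmax internally, so pass raw logits.
single_label_targets = torch.randint(0, 5, (8,))
loss_single = nn.CrossEntropyLoss()(logits, single_label_targets)

# Multi-label classification: each class is an independent yes/no decision,
# so use a sigmoid per class (nn.BCEWithLogitsLoss applies it internally).
multi_label_targets = torch.randint(0, 2, (8, 5)).float()
loss_multi = nn.BCEWithLogitsLoss()(logits, multi_label_targets)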

7. Transfer learning is very effective in NLP problems

Just as pretrained models are effective in computer vision tasks, it has been shown that natural language processing (NLP) models can also benefit from this approach.

In Lesson 4 of Fast.ai, Jeremy Howard used transfer learning to build a model to judge whether movie reviews on IMDB are positive or negative.

The effect was immediate: the accuracy he achieved surpassed all previous results presented in this Salesforce paper:
https://einstein.ai/research/learned-in-translation-contextualized-word-vectors.

△ Pre-existing architecture provides state-of-the-art NLP performance

The key to this model is to first train the model to gain some understanding of the language, and then use this pretrained model as part of a new model to analyze sentiment.

To create the first model, we train a recurrent neural network (RNN) to predict the next word in a text sequence; this is called language modeling. Once the trained network reaches a certain accuracy, the encodings it has learned for each word are passed to a new model for sentiment analysis.

In the example above, the language model is integrated with another model for sentiment analysis, but the same approach can be applied to any other NLP task, including translation and data extraction.

Also, some techniques from computer vision, such as freezing the network layers and using differential learning rates mentioned above, can also achieve better results here.

Applying this method to NLP tasks involves many details, so the full code is not reproduced here; the corresponding lesson and notebook are linked below, followed by a rough sketch of the idea.

Course:
http://course.fast.ai/lessons/lesson4.html

Code: https://github.com/fastai/fastai/blob/master/courses/dl1/lesson4-imdb.ipynb
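
For readers who only want the shape of the idea, here is a rough PyTorch sketch rather than the fast.ai implementation (the vocabulary size, layer dimensions, and two-class head are arbitrary choices): a language model is trained first, and its embedding and recurrent layers are then reused as the encoder of a sentiment classifier.

import torch
import torch.nn as nn

VOCAB_SIZE, EMB_DIM, HID_DIM = 10000, 300, 512

class LanguageModel(nn.Module):
    # predicts the next word at every position of a token sequence
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMB_DIM)
        self.rnn = nn.LSTM(EMB_DIM, HID_DIM, batch_first=True)
        self.decoder = nn.Linear(HID_DIM, VOCAB_SIZE)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        hidden, _ = self.rnn(self.embed(tokens))
        return self.decoder(hidden)             # (batch, seq_len, vocab)

class SentimentClassifier(nn.Module):
    # reuses the pretrained embedding + LSTM as its encoder
    def __init__(self, pretrained_lm):
        super().__init__()
        self.embed = pretrained_lm.embed        # transferred weights
        self.rnn = pretrained_lm.rnn
        self.head = nn.Linear(HID_DIM, 2)       # positive / negative

    def forward(self, tokens):
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden[:, -1])         # classify from the last time step

lm = LanguageModel()
# ... train lm on unlabeled text with nn.CrossEntropyLoss over next-word targets ...
clf = SentimentClassifier(lm)                   # fine-tune on labeled reviews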

8. The advantages of deep learning in processing structured data

The Fast.ai course demonstrates the outstanding performance of deep learning for processing structured data without resorting to feature engineering and domain-specific knowledge.

This library takes full advantage of the embedding function in PyTorch, allowing fast conversion of categorical variables into embedding matrices.

The technique they showed is relatively straightforward, converting categorical variables to numbers and assigning an embedding vector to each value:

△ Four values are embedded for each day of the week

The traditional approach to such data is to create dummy variables, i.e. one-hot encoding. The advantage of embeddings is that each day is described by four learned values instead of a single indicator, which yields a higher-dimensional representation and richer relationships.
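
A minimal sketch of the same idea in plain PyTorch, matching the figure's example of a 4-dimensional embedding for the day of the week (the sizes are illustrative):

import torch
import torch.nn as nn

# Map the 7 categorical values Monday...Sunday to integer codes 0-6,
# then learn a 4-dimensional embedding vector for each value.
day_embedding = nn.Embedding(num_embeddings=7, embedding_dim=4)

days = torch.tensor([0, 3, 6])        # e.g. Monday, Thursday, Sunday
vectors = day_embedding(days)         # shape (3, 4): four values per day
# unlike one-hot encoding, these four values are learned during training
# and can capture relationships between the days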

This approach took third place in the Rossmann Kaggle competition, narrowly losing out to two domain experts who used their expertise to create many additional features.

Related courses:
http://course.fast.ai/lessons/lesson4.html

Code:
https://github.com/fastai/fastai/blob/master/courses/dl1/lesson3-rossman.ipynb

This idea of using deep learning to reduce reliance on feature engineering has also been confirmed by Pinterest, who have mentioned that they are getting better results with less work by moving to deep learning models.

9. More built-in functions: dropout, image resizing, TTA

On April 30, the Fast.ai team won the ImageNet- and CIFAR-10-based classification tasks in the DAWNBench competition run by Stanford University. In Jeremy's write-up of the win, he attributed the success to some of the extra functions in the fast.ai library.

One of them is dropout, proposed by Geoffrey Hinton and colleagues in a seminal paper. Although initially popular, it seems to have been somewhat neglected in recent computer vision papers. The paper is:

Dropout: A Simple Way to Prevent Neural Networks from Overfitting:

https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf

However, the PyTorch library makes its implementation simple, and loading it with the Fast.ai library is even easier.
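
As a minimal illustration in plain PyTorch (the layer sizes and p=0.5 are arbitrary), dropout is just an extra layer that randomly zeroes activations during training and switches itself off at evaluation time:

import torch
import torch.nn as nn

head = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),     # randomly zeroes half the activations during training
    nn.Linear(256, 10),
)

head.train()               # dropout is active while training
out = head(torch.randn(4, 512))
head.eval()                # dropout is automatically disabled for validation/inference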

△ The blank spaces indicate where the dropout function takes effect

Dropout combats overfitting, so it was important for winning on a relatively small dataset like CIFAR-10. The Fast.ai library adds dropout automatically when the learn object is created, and the amount can be adjusted through the ps parameter, as shown below:

learn = ConvLearner.pretrained(model, data, ps=0.5, precompute=True)
# creates dropout of 0.5 (i.e. half the activations are dropped) during training.
# Dropout is automatically turned off for the validation set

Another very simple and effective method, often used to handle overfitting and improve accuracy, is to train on small images first, then increase the image size and train the same model again.

# create a data object with images of sz * sz pixels
def get_data(sz):
    tfms = tfms_from_model(model, sz)
    # tells what size the images should be; additional transformations such
    # as image flips and zooms can easily be added here too

    data = ImageClassifierData.from_paths(PATH, tfms=tfms)
    # creates a fastai data object of the requested size

    return data

learn.set_data(get_data(299))
# changes the data in the learn object to images of size 299
# without changing the model

learn.fit(0.1, 3)
# train for a few epochs on the larger versions of the images, avoiding overfitting

Another advanced technique that can add several percentage points of accuracy is test-time augmentation (TTA). Several different versions of each original image are created, for example by cropping different regions or changing the zoom level, and passed through the model; the average of the outputs is then taken as the final prediction score for the image. This is done by calling learn.TTA():

preds, target = learn.TTA()

This technique works because any single crop of the original image may miss important features; feeding several versions of the image into the model and averaging the outputs compensates for this.
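
A minimal sketch of what TTA does under the hood, in plain PyTorch with only a horizontal flip as the extra view (the model argument is a placeholder, and a real TTA setup would use more augmentations):

import torch

def predict_with_tta(model, images):
    # average the model's softmax predictions over several augmented views
    model.eval()
    views = [images, torch.flip(images, dims=[3])]     # original + horizontal flip
    with torch.no_grad():
        probs = [torch.softmax(model(v), dim=1) for v in views]
    return torch.stack(probs).mean(dim=0)              # average over the views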

10. Innovation is key

In the DAWNBench competition, the Fast.ai team's models were not only the fastest but also cheap to run. This is a reminder that building a successful DL application is not just a matter of throwing massive GPU resources at the problem; it also calls for creativity, intuition, and innovation.

The breakthroughs discussed in this article, including dropout, cosine annealing, and SGD with restarts, came from researchers thinking differently about a problem, and they often improve accuracy more than simply enlarging the training dataset would.

Many big Silicon Valley companies have vast GPU resources, but that does not put state-of-the-art results out of reach: innovation and new ideas can still challenge the leaderboards.

In fact, limited computing power can itself be an opportunity, because necessity is the driving force of innovation.

About the author

Samuel Lynn-Evans taught life sciences for the past 10 years. After noticing the great potential of machine learning in scientific research, he began studying artificial intelligence at School 42 in Paris, with the goal of applying NLP techniques to biological and medical problems.

Original: https://blog.floydhub.com/ten-techniques-from-fast-ai/
