Deep Learning - Large Scale Image Classification Experience

Reference blog

Large-Scale Image Classification (Chinese Translation of MXNet Tutorial)

Notes: Troubleshooting Guide

Validation set accuracy

Achieving reasonable validation accuracy is often straightforward, but matching the results reported in recent papers can be much harder. If that is your goal, try the suggestions listed below.

  • Data augmentation often reduces the gap between training accuracy and validation accuracy. Reduce the amount of augmentation toward the end of training.
  • Use a larger learning rate at the beginning of training and keep it high for longer. For example, when training on CIFAR10, you can use a learning rate of 0.1 for the first 200 epochs and then reduce it to 0.01.
  • Do not use batches that are too large, especially if the batch size far exceeds the number of categories.
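The step-wise learning-rate policy above can be sketched as a plain function (a minimal sketch: the epoch boundaries and decay factor mirror the CIFAR10 example, independent of any particular framework API):

```python
def lr_at_epoch(epoch, base_lr=0.1, boundaries=(200,), factor=0.1):
    """Step learning-rate schedule: hold base_lr early in training,
    then multiply by `factor` at each epoch boundary."""
    lr = base_lr
    for b in boundaries:
        if epoch >= b:
            lr *= factor
    return lr

# First 200 epochs train at 0.1; afterwards the rate drops to ~0.01.
print(lr_at_epoch(0))    # 0.1
print(lr_at_epoch(250))  # ~0.01
```

MXNet's image-classification example scripts expose the same idea through command-line flags such as --lr, --lr-factor, and --lr-step-epochs (check your version of the script for the exact names).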

Speed

  • Distributed training can greatly improve training speed, but it also adds significant per-batch overhead. Make sure your workload is not too small (such as training LeNet on the MNIST dataset) and that the batch size is reasonably large.
  • Make sure data reading and preprocessing are not bottlenecks. Set the --test-io option to 1 to check how many images per second the worker cluster can process.
  • Increase the number of threads for data processing by setting --data-nthreads (default is 4).
  • Data preprocessing is implemented through opencv. If the opencv you are using is compiled from source, please make sure it works correctly.
  • Set --benchmark to 1 to train on randomly generated data instead of real data; comparing the two runs helps determine whether data loading is the bottleneck.
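The --test-io check above amounts to timing the data iterator alone, with no training step. A minimal, framework-free sketch of that measurement (the dummy batch list is a placeholder for your real data iterator):

```python
import time

def measure_io_speed(batches, batch_size):
    """Iterate over the data source without any training step and
    report images processed per second."""
    start = time.perf_counter()
    n_images = 0
    for _ in batches:
        n_images += batch_size
    elapsed = time.perf_counter() - start
    return n_images / elapsed

# Placeholder data source: 50 "batches" of 128 images each.
dummy_batches = [None] * 50
speed = measure_io_speed(dummy_batches, batch_size=128)
print(f"{speed:.0f} images/sec")
```

If the rate reported here is lower than the images/sec your GPUs can consume, data loading is the bottleneck, and raising --data-nthreads is the first thing to try.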

GPU memory

If the batch size is too large, it will exceed the GPU's memory capacity. When this happens, you will see an error message similar to "cudaMalloc failed: out of memory". There are several ways to solve this problem.

  • Reduce batch size.
  • Set the environment variable MXNET_BACKWARD_DO_MIRROR to 1, which reduces memory consumption at the expense of speed. For example, with a batch size of 64, training inception-v3 uses 10 GB of GPU memory and processes about 30 images per second on a K80 GPU. With mirroring enabled, the same 10 GB accommodates a batch size of 128 for inception-v3, but throughput drops to about 27 images per second.
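MXNET_BACKWARD_DO_MIRROR must be set in the process environment before MXNet is imported. A sketch, including the rough trade-off arithmetic from the inception-v3 numbers above:

```python
import os

# Trade memory for speed: with mirroring, some forward activations are
# recomputed during the backward pass instead of being kept resident.
os.environ["MXNET_BACKWARD_DO_MIRROR"] = "1"  # set before `import mxnet`

# Rough trade-off from the numbers above:
#   without mirroring: batch 64,  ~30 images/sec in 10 GB
#   with mirroring:    batch 128, ~27 images/sec in the same 10 GB
slowdown = (30 - 27) / 30
print(f"~{slowdown:.0%} throughput cost for 2x the batch size")
```

In other words, mirroring here buys a doubled batch size for roughly a 10% throughput cost, which is usually a good trade when you are memory-bound.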

Source: blog.csdn.net/s000da/article/details/89486004