Want to improve your convolutional neural network? Check out these 14 design patterns!

Abstract: These 14 original design patterns can help inexperienced researchers try to combine deep learning with new applications, and are a good starting point for those without a PhD in machine learning.

Since 2011, deep convolutional neural networks (CNNs) have significantly outperformed humans in image classification, and they have become the standard in computer vision for tasks such as image segmentation, object detection, scene labeling, tracking, and text detection.
However, becoming proficient at training neural networks is not easy. As with earlier machine learning approaches, the details make or break the result, but a neural network has even more details to manage. What are your data and hardware constraints? Which network should you start from? How many dense layers should you build versus convolutional layers? How should your activation functions be configured? Even if you choose the most popular activation functions, you still have to get regularization right.
The learning rate is the most important hyperparameter for tuning neural network training, and also one of the hardest to optimize. Too small and you may never converge on a solution; too large and you may overshoot the optimum. And if you turn to an adaptive learning rate method, it can be computationally expensive, which means spending more on hardware resources.
Design choices and hyperparameter settings greatly affect the training and performance of CNNs, but for newcomers to deep learning, building intuition for architecture design is hampered by how scarce and scattered the relevant resources are.
"Neural Networks: Tradeoff Techniques" is a book mainly focused on practical tuning, published in 2003 and republished in 2012. The popularity of deep learning began in 2012 when the New York Times reported on the surprising success of Geoffrey Hinton's team at the Merck Drug Discovery Challenge. However, the most advanced research in recent years has disappeared.
Fortunately, Leslie Smith, a researcher at the U.S. Naval Research Laboratory, has published a systematic study of CNN architectures and recent technical improvements. Below are some of the most important design patterns he highlights.

14 CNN Design Patterns for Image Classification
According to Smith, "These 14 original design patterns can help inexperienced researchers try to combine deep learning with new applications". While advanced AI researchers can rely on intuition, experience, and targeted experimentation, these recommendations are a good starting point for those without a PhD in machine learning.
1) Architecture follows application
You may be drawn to the dazzling new models invented by imaginative labs such as Google Brain or DeepMind, but many of them are either impossible or impractical for your needs. Perhaps you should instead use the model that makes the most sense for your particular application, which may be very simple yet still powerful, such as VGG.

2) Proliferation of paths
Each year the winner of the ImageNet Challenge uses a deeper network than the previous year's winner. From AlexNet to Inception to ResNets, Smith and his team also observed a trend of "multiplying the number of paths in the network", noting that a ResNet can be seen as an exponential ensemble of networks of different lengths.
3) Pursue simplicity
Bigger is not necessarily better. In a case study titled "Bigger is not necessarily better", Springenberg et al. demonstrate how to achieve state-of-the-art results with fewer units.
4) Increase symmetry
In both architecture and biology, symmetry is considered a hallmark of quality and craftsmanship. Smith attributes the elegance of FractalNet to the symmetry of its design.
5) Pyramid shape
You are always making a trade-off between representational power and reducing redundant or useless information. CNNs usually downsample the activations and increase the number of channels from the input layer to the final layer.
6) Over-train
Another trade-off is between training accuracy and generalization ability. Using regularization methods such as drop-out or drop-path to improve generalization is an important strength of neural networks. Train your network on a harder problem than the actual use case to improve its generalization performance.
7) Cover the problem space
To expand your training data and improve generalization, add noise and artificially enlarge the training set, for example with random rotations, crops, and other image manipulations; a minimal example is sketched below.
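As a rough illustration (assuming a PyTorch/torchvision pipeline; the specific transforms and parameter values are placeholders, not the paper's recipe):

```python
import torchvision.transforms as T

# Illustrative augmentation pipeline: random rotations, crops, flips, and mild
# photometric noise to artificially enlarge the training set.
train_transforms = T.Compose([
    T.RandomRotation(degrees=15),                  # random rotations
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),    # random crops back to 224x224
    T.RandomHorizontalFlip(p=0.5),                 # mirror images
    T.ColorJitter(brightness=0.2, contrast=0.2),   # mild photometric noise
    T.ToTensor(),
])
# e.g. torchvision.datasets.ImageFolder("train/", transform=train_transforms)
```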
8) Incremental feature construction
As architectures have become more successful, they have simplified the "job" of each layer. In very deep neural networks, each layer only incrementally modifies the input; in ResNets, the output of each layer may be similar to its input. So, in practice, keep the skip connections in ResNets short.
9) Normalize the input of the layer
Normalization is a shortcut that makes the work of the computational layers easier and, in practice, can improve training accuracy. The inventors of batch normalization believe it works because it deals with internal covariate shift, but Smith argues that "normalization puts the input samples of all layers on an equal footing (similar to converting units), which allows backpropagation to train more efficiently."
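A minimal sketch of the idea, assuming PyTorch (channel counts are arbitrary): each convolution is followed by batch normalization before the nonlinearity, so the next layer always sees inputs on a comparable scale.

```python
import torch.nn as nn

# Conv block that normalizes layer inputs with batch normalization.
conv_block = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(128),       # puts activations on an equal footing for the next layer
    nn.ReLU(inplace=True),
)
```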
10) Input Transformation
Studies have shown that in Wide ResNets performance improves with the number of channels, although there is a trade-off between training cost and accuracy. AlexNet, VGG, Inception and ResNets all perform input transformations in the first layer so that the input data can be examined in multiple ways.
11) Available resources determine layer width
The number of outputs to choose for a layer is not obvious; it depends on your hardware capabilities and the desired accuracy.
12) Summation joining
Summation is a popular way of merging branches. In ResNets, using summation as the joining mechanism allows each branch to compute a residual as well as an overall approximation. If the input skip connection is always present, summation makes each layer learn the right thing (e.g. the difference from the input). In networks where any branch can be dropped (e.g. FractalNet), you should use the mean instead to keep the output smooth.
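For concreteness, a minimal residual block with summation joining, sketched in PyTorch (the channel count is illustrative):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """The branch computes a residual; summation joins it with the identity skip."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Summation joining: the layer learns the difference from its input.
        return self.relu(x + self.branch(x))
```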
13) Downsampling transformation
When pooling, use concatenation joining to increase the number of outputs. When using a stride greater than 1, this handles joining and increasing the number of channels at the same time.
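A hedged sketch of such a downsampling transition in PyTorch (channel counts are arbitrary and even spatial dimensions are assumed): a strided convolution and a pooled copy of the input are concatenated, so resolution halves while the channel count grows.

```python
import torch
import torch.nn as nn

class DownsampleJoin(nn.Module):
    """Concatenation joining during downsampling: stride-2 conv + 2x2 max-pool."""
    def __init__(self, in_ch: int = 64, conv_ch: int = 64):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, conv_ch, kernel_size=3, stride=2, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output has in_ch + conv_ch channels at half the spatial resolution.
        return torch.cat([self.conv(x), self.pool(x)], dim=1)
```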
14) Maxout for competition
Maxout is used in networks with local competition, where only one of several activation paths needs to be chosen. Unlike approaches that combine all activations, maxout keeps only the single "winner". An obvious use case for Maxout is pairing branches with different kernel sizes, since taking the maximum makes the result scale-invariant.
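One way to picture this (a sketch, not Smith's exact formulation): two convolutional branches with different kernel sizes produce the same number of channels, and only the element-wise winner is kept.

```python
import torch
import torch.nn as nn

class MaxoutBranches(nn.Module):
    """Maxout-style competition between a 3x3 branch and a 5x5 branch."""
    def __init__(self, in_ch: int = 64, out_ch: int = 64):
        super().__init__()
        self.branch3 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only the winning activation from the two kernel sizes survives.
        return torch.maximum(self.branch3(x), self.branch5(x))
```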

Tips & Tricks
In addition to these design patterns, there are several tips and tricks to reduce architectural complexity and training time.
1) Use a fine-tuned pre-trained network
Mike Tung, CEO of the machine learning company Diffbot, says: "If your visual data is similar to ImageNet, then using a pre-trained network will help you learn faster." The lower layers of a CNN can often be reused because they mostly detect generic patterns such as lines and edges. Replace the classification layers with your own and train only the last few layers on your specific data, as in the sketch below.
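A minimal fine-tuning sketch, assuming torchvision's pre-trained ResNet-18 and a hypothetical 10-class target task:

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)   # weights learned on ImageNet

# Freeze the reusable low-level feature extractor (lines, edges, textures).
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head and train only the new layer(s) on your data.
model.fc = nn.Linear(model.fc.in_features, 10)
```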
2) Use freeze-drop-path
Drop-path randomly drops branches during training iterations. Smith tested the opposite approach, called freeze-path, in which the weights of some paths are fixed and made non-trainable instead of being removed entirely. The network may achieve higher accuracy because later branches contain more layers than earlier ones, making the correction term easier to learn. A rough sketch follows.
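A rough interpretation of the freezing idea in PyTorch (this is a sketch, not Smith's published code): before an update, some branches are randomly marked non-trainable instead of being dropped.

```python
import random
import torch.nn as nn

def freeze_paths(branches: nn.ModuleList, freeze_prob: float = 0.5) -> None:
    """Randomly freeze whole branches: their weights still take part in the
    forward pass but receive no gradient updates, unlike drop-path which
    removes the branch entirely."""
    for branch in branches:
        trainable = random.random() >= freeze_prob
        for param in branch.parameters():
            param.requires_grad = trainable
```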
3) Use a cyclical learning rate
Experimenting with learning rates consumes a lot of time and exposes you to errors. Adaptive learning rates can be computationally expensive, but cyclical learning rates are not. With a cyclical learning rate, you set maximum and minimum bounds and let the rate vary within that range. Smith provides a method for calculating the maximum and minimum values of the learning rate in his paper.
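A minimal sketch using PyTorch's built-in CyclicLR scheduler (the model, bounds, and step size here are placeholders; Smith's paper describes how to find the actual bounds):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(512, 10)                      # stand-in model
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Triangular cyclical learning rate between the chosen min and max bounds.
scheduler = optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-1, step_size_up=2000, mode="triangular"
)
# In the training loop: loss.backward(); optimizer.step(); scheduler.step()
```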
4) Use bootstrapping with noisy labels
In practice, much data is messy: labels are subjective or missing, and prediction targets may never have been seen before. Reed et al. describe a method for injecting consistency into the network's prediction targets. Intuitively, the network relies on its known representation of the environment (implicit in its parameters) to filter input data whose training labels may be inconsistent, and to clean up that data during training. A hedged sketch of such a loss follows.
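A sketch of the "soft" bootstrapped loss in the spirit of Reed et al., written in PyTorch (the mixing coefficient beta is illustrative): the training target is a blend of the possibly-noisy label and the network's own prediction.

```python
import torch
import torch.nn.functional as F

def soft_bootstrap_loss(logits: torch.Tensor, targets: torch.Tensor,
                        beta: float = 0.95) -> torch.Tensor:
    """Cross-entropy against a target blended from the given label and the model's belief."""
    probs = F.softmax(logits, dim=1)
    log_probs = F.log_softmax(logits, dim=1)
    one_hot = F.one_hot(targets, num_classes=logits.size(1)).float()
    blended = beta * one_hot + (1.0 - beta) * probs   # consistency from the model itself
    return -(blended * log_probs).sum(dim=1).mean()
```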
5) Use ELUs with Maxout instead of ReLUs
ELUs are a relatively smooth version of ReLUs that can speed up convergence and improve accuracy. Research shows that, unlike ReLUs, ELUs can take negative values, which allows them to push mean unit activations closer to zero, like batch normalization but with less computational complexity. They are especially effective together with Maxout in fully connected layers.
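As a small illustration (layer sizes are arbitrary, and pairing ELU with a two-piece maxout is just one possible reading of this tip):

```python
import torch
import torch.nn as nn

class EluMaxoutHead(nn.Module):
    """Fully connected head: maxout over two linear pieces, followed by ELU."""
    def __init__(self, in_features: int = 512, hidden: int = 256, num_classes: int = 10):
        super().__init__()
        self.fc_a = nn.Linear(in_features, hidden)
        self.fc_b = nn.Linear(in_features, hidden)
        self.act = nn.ELU()
        self.out = nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.maximum(self.fc_a(x), self.fc_b(x))  # maxout: keep the winning linear piece
        return self.out(self.act(h))
```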
About the Author:
Mariya is Director of Research and Design at TOPBOTS.
Linkedin: https://www.linkedin.com/in/mariyayao
Twitter: http://www.twitter.com/thinkmariya
The above is the translation.
This article was recommended by @爱可可-爱生活 of Beijing University of Posts and Telecommunications (Beiyou), and translated by the Aliyun Yunqi community.
The original title of the article "14 DESIGN PATTERNS TO IMPROVE YOUR CONVOLUTIONAL NEURAL NETWORKS", author: Mariya Yao, translator: Yuan Hu, proofreader: I am the theme song brother.
The article is a simplified translation. For more detailed content, please check the original text.

