Seven Practical Deep Learning Tips

Deep learning has become the go-to approach for many challenging real-world problems. For object detection, speech recognition, and language translation, it is by far the best-performing method available. Many people regard deep neural networks (DNNs) as magical black boxes: we put in a bunch of data and out comes our solution! In reality, things are not so simple.
Designing and applying a DNN to a specific problem can present many challenges. To reach the performance standards that real-world applications demand, correct design and implementation are critical at every stage: data preparation, network design, training, and inference. Here I will share seven practical tips to help you get the most out of your deep neural networks.

1 - Data, data, data

It's no big secret: deep learning machines that work well require fuel, and lots of it. That fuel is data. The more labeled data we have, the better the model performs. The link between more data and better performance has been confirmed at massive scale by Google's exploration of a 300-million-image dataset!
When you deploy a deep learning model in a real application, you should keep feeding it more data and fine-tuning it so that its performance keeps improving. Feed the monster: if you want to improve your model's performance, get more data!

Increasing data yields better performance

2 - Which optimizer should you use?

Over the years, many gradient descent optimization algorithms have been developed, each with its advantages and disadvantages. Some of the most popular include:
· Stochastic gradient descent (SGD) with momentum
· Adam
· RMSprop
· Adadelta
RMSprop, Adadelta, and Adam are considered adaptive optimization algorithms because they automatically update their learning rates. With SGD you must manually select the learning rate and momentum parameter, and usually decay the learning rate over time.
In practice, adaptive optimizers tend to converge faster than SGD; however, their final performance is usually slightly worse. SGD usually reaches a better minimum and thus better final accuracy, but it may take much longer than some of the other optimizers. It also depends more on a good initialization and learning-rate decay schedule, which can be very difficult to get right in practice.
So if you need quick results, or just want to test a new technique, choose an adaptive optimizer. I have found Adam easy to use, because it is not very sensitive to the learning rate you pick. If you want the absolute best final performance, use SGD with momentum and tune the learning rate, decay schedule, and momentum value to maximize performance.
You can have the best of both worlds!
Recent research has shown that you really can get the best of both worlds: train fast to top-notch performance by switching from Adam to SGD! The idea is that the early stage of training is exactly when SGD is most sensitive to parameter tuning and initialization. So we can start training with Adam, which saves considerable time without having to worry about initialization and hyperparameter tuning. Then, once Adam has things up and running, we can switch to SGD with momentum to reach the best final performance!
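The two update rules and the switch between them can be sketched on a toy problem. This is a framework-free illustration only, not the procedure from the study: the quadratic loss, hyperparameters, and step counts are all made up for demonstration.

```python
import numpy as np

def grad(w):
    """Gradient of the toy quadratic loss f(w) = 0.5 * ||w||^2."""
    return w

w = np.array([5.0, -3.0])            # initial parameters

# Phase 1: Adam for fast early progress without careful tuning.
m, v = np.zeros_like(w), np.zeros_like(w)
lr, b1, b2, eps = 0.1, 0.9, 0.999, 1e-8
for t in range(1, 51):
    g = grad(w)
    m = b1 * m + (1 - b1) * g        # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g * g    # second-moment (variance) estimate
    m_hat = m / (1 - b1 ** t)        # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)

# Phase 2: switch to SGD with momentum to refine toward a better minimum.
velocity = np.zeros_like(w)
lr, momentum = 0.05, 0.9
for _ in range(100):
    velocity = momentum * velocity - lr * grad(w)
    w = w + velocity

final_loss = 0.5 * float(w @ w)
```

On a real network the switch point matters and is itself a hyperparameter; here both phases trivially converge on the convex toy loss.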

Adam vs. SGD performance. Thanks to its robustness and adaptive learning rate, Adam does well at the start, but SGD ultimately reaches a better global minimum.

3 - How to handle imbalanced data

In many cases you will be dealing with imbalanced data, especially in real-world applications. Take a simple but practical example: for security purposes, you are training a deep network to predict whether someone in a video feed is holding a lethal weapon. But in your training data you have only 50 videos of people holding weapons and 1,000 videos of people without weapons! If you simply train your network on this data, your model will certainly be heavily biased toward predicting that nobody has a weapon!
There are a few things you can do about this:
· Use class weights in the loss function: in essence, the under-represented class gets a higher weight in the loss function, so any misclassification of that class incurs a very high loss.
· Over-sampling: repeat some training examples from the under-represented class to help balance the distribution. This works best when the available data is small.
· Under-sampling: simply skip some training examples from the over-represented class. This works best when the available data is very large.
· Data augmentation for the minority class: synthetically create more training examples for the under-represented class! For example, in the lethal-weapon detection example above, you could vary the color and lighting of the videos that belong to the weapon class.
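The first option, class weights, is easy to implement by hand. A minimal NumPy sketch using the made-up 50-vs-1000 counts from the weapon example; deep learning frameworks expose the same idea as a weight argument on their loss functions.

```python
import numpy as np

# Hypothetical label counts from the example above:
# class 0 = no weapon (1000 clips), class 1 = weapon (50 clips).
counts = np.array([1000, 50])

# Inverse-frequency weights: the rare class gets a much larger weight.
class_weights = counts.sum() / (len(counts) * counts)

def weighted_cross_entropy(probs, labels, weights):
    """Mean cross-entropy with each sample scaled by its class weight."""
    p_true = probs[np.arange(len(labels)), labels]  # prob of the true class
    return float(np.mean(-weights[labels] * np.log(p_true)))

# Being wrong about a weapon now costs far more than being wrong
# about "no weapon", which counteracts the bias toward the majority class.
miss_weapon = weighted_cross_entropy(np.array([[0.9, 0.1]]), np.array([1]), class_weights)
miss_none = weighted_cross_entropy(np.array([[0.1, 0.9]]), np.array([0]), class_weights)
```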

4 - Transfer learning

As we saw in the first tip, deep networks need lots of data. Unfortunately, for many new applications this data can be hard to obtain and expensive. If we want our model to perform well, we may need tens or hundreds of thousands of new training examples to train on. If such a dataset is not readily available, it all has to be collected and labeled by hand.
This is where transfer learning comes in. With transfer learning, we don't need much data! The idea is to start from a network previously trained on millions of images, such as a ResNet pretrained on ImageNet. We then fine-tune the ResNet model by retraining only the last few layers while keeping the others frozen. That way, we are fine-tuning the information (image features) that ResNet learned from millions of images so that we can apply it to a different task. This works because the feature information of images is often very similar across domains, while the interpretation of those features can differ by application.
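The freeze-and-retrain idea can be sketched without any framework. Here a fixed random projection stands in for the pretrained ResNet backbone (an assumption purely for illustration), and only a small linear head is trained on a tiny synthetic "new task":

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "backbone": a fixed projection standing in for pretrained layers.
W_frozen = rng.normal(size=(64, 16)) / np.sqrt(64)
W_backup = W_frozen.copy()            # kept to verify it is never updated

def features(x):
    """Frozen feature extractor: applied, but never trained."""
    return np.maximum(x @ W_frozen, 0.0)

# Tiny synthetic dataset for the new task.
X = rng.normal(size=(200, 64))
y = (X[:, 0] > 0).astype(float)

# Trainable head: the "last few layers" we actually retrain.
W_head = np.zeros(16)

def loss(w):
    p = 1.0 / (1.0 + np.exp(-(features(X) @ w)))
    return float(np.mean(-y * np.log(p) - (1 - y) * np.log(1 - p)))

loss_before = loss(W_head)
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(features(X) @ W_head)))
    W_head -= 0.5 * features(X).T @ (p - y) / len(y)  # update the head only
loss_after = loss(W_head)
```

In a real framework the same effect is achieved by marking the backbone's parameters as non-trainable and passing only the head's parameters to the optimizer.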

A basic transfer learning pipeline

5 - Quick and easy data augmentation to boost performance

We have now said it several times: more data = better performance. Besides transfer learning, another quick and easy way to improve your model's performance is data augmentation. Data augmentation means generating synthetic training examples by altering some of the original images in the dataset while keeping the original class labels. Common ways of augmenting image data include:
· Rotating or flipping the image horizontally or vertically
· Changing the image's brightness and color
· Randomly blurring the image
· Randomly cropping patches from the image
Basically, you can apply any change that alters the image's appearance but not its overall content: you can make a photo of a blue dog, but you should still be able to clearly see that the photo shows a dog.
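Several of the transforms above are only a few lines each in NumPy. A minimal sketch (real pipelines would typically use a library such as torchvision or albumentations, which are not shown here):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, rng):
    """Return a randomly altered copy of an HxWx3 uint8 image."""
    out = image.copy()
    if rng.random() < 0.5:          # random horizontal flip
        out = out[:, ::-1]
    if rng.random() < 0.5:          # random vertical flip
        out = out[::-1, :]
    # Random brightness shift, clipped to the valid pixel range, so the
    # content stays recognizable (a brighter dog is still a dog).
    shift = int(rng.integers(-30, 31))
    out = np.clip(out.astype(int) + shift, 0, 255).astype(np.uint8)
    return out

image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
batch = [augment(image, rng) for _ in range(8)]  # 8 synthetic variants
```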

Data fission!

6 - Train an ensemble of models!

In machine learning, ensembling means training multiple models and then combining them to obtain higher performance. So the idea is to train multiple deep network models on the same task with the same dataset. The models' results can then be combined through a voting scheme: the prediction with the most votes wins.
To ensure that the models differ, you can use random weight initializations and random data augmentation. An ensemble is usually considerably more accurate than a single model, because its multiple models approach the task from different angles. In real-world applications, and especially in challenges and competitions, almost all of the top entries use ensembles.
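The voting scheme itself is tiny. A sketch with made-up model outputs:

```python
from collections import Counter

def majority_vote(votes):
    """Pick the label with the most votes; ties go to the label seen first."""
    return Counter(votes).most_common(1)[0][0]

# Three hypothetical models classify the same four samples.
model_outputs = [
    ["dog", "cat", "dog", "bird"],   # model 1
    ["dog", "dog", "dog", "bird"],   # model 2
    ["cat", "cat", "dog", "cat"],    # model 3
]

# Transpose so all models' votes for each sample are combined together.
ensemble_prediction = [majority_vote(votes) for votes in zip(*model_outputs)]
# ensemble_prediction == ["dog", "cat", "dog", "bird"]
```

For probabilistic models, averaging the predicted class probabilities before taking the argmax is a common alternative to hard voting.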

An ensemble of models

7 - Speed up with pruning

We know that model accuracy increases with depth, but what about speed? More layers mean more parameters, and more parameters mean more computation, more memory consumption, and slower speed. Ideally, we want to increase speed while keeping accuracy high. We can do this with pruning.

Steps of deep neural network pruning
The idea is that among a network's many parameters, some are redundant and contribute little to the output. If you can rank the neurons in the network by how much they contribute, you can remove the low-ranking neurons to form a smaller, faster network. Neurons can be ranked by the L1/L2 norm of their weights, their mean activations, how often a neuron is nonzero on the validation set, and other creative methods. Faster, smaller networks are essential for running deep learning networks on mobile devices.
The most basic method is simply to prune away some of the network's convolutional filters. A recent article did this quite successfully. The neuron ranking in that work is fairly simple: it is the L1 norm of each filter's weights. In each pruning iteration, they rank all the filters, prune the m lowest-ranked filters across all layers, retrain, and repeat!
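The L1-norm ranking step is simple to write down. A sketch on a made-up weight tensor (the retraining between iterations is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def prune_filters(weights, m):
    """Drop the m conv filters with the smallest L1 norm of their weights.

    weights: (num_filters, in_channels, k, k) tensor for one conv layer.
    Returns the pruned tensor and the indices of the filters kept.
    """
    scores = np.abs(weights).sum(axis=(1, 2, 3))  # L1 norm per filter
    keep = np.sort(np.argsort(scores)[m:])        # drop the m lowest-ranked
    return weights[keep], keep

layer = rng.normal(size=(16, 8, 3, 3))    # a toy layer with 16 filters
pruned, kept = prune_filters(layer, m=4)  # remove the 4 weakest filters
```

In a real network, pruning a layer's output filters also requires removing the corresponding input channels from the next layer's weights.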
When layers are deleted, networks with residual (shortcut) connections, such as ResNets, retain good accuracy and are far more robust than networks without any shortcut connections, such as VGG or AlexNet. This interesting finding has important practical significance: it tells us that when we prune networks for deployment, the network design matters a great deal (use ResNets!). So it's always best to use the latest and greatest architecture!


Origin: blog.51cto.com/14352303/2412565