Paper Reading - Fundamentals Series - Tricks for Image Classification with CNN

Posted 2022-05-09 20:47:27

Bag of Tricks for Image Classification with Convolutional Neural Networks

More at 计算机视觉-Paper&Code (Computer Vision - Paper & Code) on Zhihu.

Abstract

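This paper comes from AWS, so GPUs were not a constraint: training in 2018 used eight V100s. The authors compare the effective training tricks and ablate each one on its own. Combined, the improvements lift a range of CNN models, and the tweaked ResNet-50 even beats SE-ResNeXt-50, which was the newest model at the time.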

Algorithm

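Efficient Training

This line of work stems from a Google paper that made a rather bold claim. Kaiming He's team later reported that once the batch size reaches 2048 there is no further performance gain, so bigger is not simply better.

If the batch size is too small, the model does not converge within 200 epochs.
As the batch size grows, the same amount of data is processed faster.
As the batch size grows, more epochs are needed to reach the same accuracy.

So how can training be sped up? Here are the methods that have been handed down to this day. They look routine now, but they were hard-won at the time.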

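Linear scaling learning rate: the paper uses 0.1 as the initial learning rate for batch_size=256. For a batch size of b, the initial learning rate is scaled linearly to 0.1×b/256.
Learning rate warmup: the warmup strategy proposed by Facebook. Over the first few batches, the learning rate rises linearly from 0 to the configured value.
Zero γ: every residual block in ResNet ends with a BN layer. This trick initializes that layer's γ to 0 instead of the usual 1, so each block initially just passes its input through. The network then behaves as if it had fewer layers, which makes it easier to train in the initial stage. As for what γ and β do: BN applies a linear transform γx̂ + β to the normalized data x̂, with a learnable scale γ and shift β, so that if the normalization does not help, gradients can update γ and β to cancel part of it.
No bias decay: in hindsight, weight decay does not raise model performance by itself; it only guards against overfitting. Tencent proposed applying L2 regularization only to the weights of convolution and fully connected layers, leaving biases and the BN γ and β undecayed. The paper also mentions LARS (layer-wise adaptive rate scaling) for distributed training, but it restricts itself to batch sizes up to 2K. When the batch grows, the learning rate has to grow too, which can make convergence unstable; LARS addresses this by multiplying each layer's learning rate by the ratio of the weight norm to the gradient norm.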
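The first two items combine naturally into one schedule. Below is a minimal sketch in plain Python, assuming the reference point of 0.1 at batch size 256 quoted above. The function names and the 4-step warmup in the usage lines are illustrative choices, not something the paper prescribes.

```python
def scaled_lr(batch_size, base_lr=0.1, base_batch=256):
    """Linear scaling: the initial learning rate grows in proportion to batch size."""
    return base_lr * batch_size / base_batch


def warmup_lr(step, warmup_steps, target_lr):
    """Linear warmup: ramp from 0 to target_lr over the first warmup_steps batches."""
    if warmup_steps > 0 and step < warmup_steps:
        return target_lr * step / warmup_steps
    return target_lr


# Usage: batch size 1024 with a (hypothetical) 4-step warmup.
target = scaled_lr(1024)
schedule = [warmup_lr(step, 4, target) for step in range(6)]
```

After the warmup window the rate simply stays at the scaled value; a decay schedule would normally take over from there.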

Model Tweaks

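Several architectural tweaks to ResNet.

ResNet-B changes the downsampling block: in path A, the stride-2 downsampling moves from the first 1x1 convolution to the 3x3 convolution, because a 1x1 convolution with stride 2 ignores 3/4 of the feature map.

ResNet-C changes the 7x7 convolution in the input stem, replacing it with three 3x3 convolutions. The first two have 32 output channels and the last has 64. A single 3x3 convolution is much cheaper than a 7x7 one (9 versus 49 weights per input-output channel pair), and the stack adds nonlinearity. With these channel widths, though, the stem as a whole does not end up with fewer parameters: 9,408 weights for the 7x7 versus 28,512 for the three 3x3, assuming a 3-channel input.

ResNet-D: inspired by ResNet-B, the authors noticed that the 1x1 convolution in path B of the downsampling block also ignores 3/4 of the feature map. They therefore add a 2x2 average pooling layer with stride 2 before that convolution and change the convolution's stride to 1.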
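A quick count of the stem weights is a useful check on the parameter claim for ResNet-C. This is a back-of-the-envelope sketch that assumes a 3-channel RGB input and a 64-channel output for the original 7x7 convolution, and ignores biases and BN parameters; `conv_weights` is a helper introduced here for illustration.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Number of weights in a kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * in_channels * out_channels


# Original stem: one 7x7 convolution, 3 -> 64 channels (assumed RGB input).
original_stem = conv_weights(7, 3, 64)

# ResNet-C stem: three 3x3 convolutions, 3 -> 32 -> 32 -> 64 channels.
resnet_c_stem = (
    conv_weights(3, 3, 32)
    + conv_weights(3, 32, 32)
    + conv_weights(3, 32, 64)
)

print(original_stem, resnet_c_stem)  # prints "9408 28512"
```

Under these assumptions the three-convolution stem has more weights than the single 7x7, so the benefit comes from the extra nonlinearity rather than from a smaller stem.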
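The 3/4 figure quoted for ResNet-B and ResNet-D can be checked by counting which input positions are actually read. This is a small sketch in plain Python; the 56x56 feature-map size is only an example and both helper functions are illustrative.

```python
def strided_1x1_coverage(height, width, stride):
    """Fraction of input positions read by a 1x1 convolution with this stride."""
    visited = {
        (row, col)
        for row in range(0, height, stride)
        for col in range(0, width, stride)
    }
    return len(visited) / (height * width)


def avg_pool_coverage(height, width, pool=2):
    """Fraction of input positions read by non-overlapping pool x pool average pooling."""
    covered = set()
    for row in range(0, height - pool + 1, pool):
        for col in range(0, width - pool + 1, pool):
            for d_row in range(pool):
                for d_col in range(pool):
                    covered.add((row + d_row, col + d_col))
    return len(covered) / (height * width)


conv_fraction = strided_1x1_coverage(56, 56, stride=2)  # reads 1/4, ignores 3/4
pool_fraction = avg_pool_coverage(56, 56)               # every position contributes
```

A stride-2 1x1 convolution samples one position out of each 2x2 window, whereas 2x2 average pooling followed by a stride-1 1x1 convolution lets all four positions contribute.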

Low-precision training

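Current hardware ships with dedicated arithmetic units for low-precision computation, so low-precision training can greatly speed up training and cut memory use, which in turn allows a larger batch size. Despite the performance benefit, the narrower range of low precision makes values more likely to go out of range. One remedy is mixed precision: values are stored in FP16 while an FP32 copy of the parameters is used for the update. In addition, the loss is multiplied by a constant so that the gradients fit within the FP16 range (loss scaling).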

For example, the previously mentioned Nvidia V100 offers 14 TFLOPS in FP32 but over 100 TFLOPS in FP16.
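Loss scaling can be illustrated without a GPU. The sketch below uses the standard library's half-precision `struct` format to round values to FP16; Python's own floats stand in for the higher-precision master copy. The gradient value and the scale of 1024 are made-up numbers chosen to show the effect.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


grad = 1e-8          # a tiny gradient, below FP16's smallest subnormal (about 6e-8)
loss_scale = 1024.0  # hypothetical constant; real setups tune or adapt it

unscaled = to_fp16(grad)              # underflows to 0.0, the signal is lost
scaled = to_fp16(grad * loss_scale)   # large enough to survive in FP16
recovered = scaled / loss_scale       # unscale in higher precision before the update
```

Scaling the loss scales every gradient by the same factor, so dividing by that factor before the weight update restores the original magnitude with only a small rounding error.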

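Training Refinements (Accuracy Improvements)

Cosine Learning Rate Decay: judging from the paper's figure, the cosine curve is simply smoother, and the final results are about the same. The choice of decay schedule mainly depends on the task.

Label Smoothing: one-hot targets push the output scores to be dramatically different from each other, which can lead to overfitting, hence label smoothing. The smoothed target assigns 1 − ε to the true class and ε/(K − 1) to every other class, where K is the total number of classes and ε is a small constant.

Knowledge Distillation: a large model is used to train a small one, with the small model fitting the large model's output score distribution as closely as possible. T is a temperature hyperparameter that makes the softmax outputs smoother; z is the student's output and r is the teacher's.

Mix Up: each time, two samples are drawn at random and blended by weighted interpolation, inputs and labels alike, to produce a new sample.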
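For comparison, here is a minimal sketch of cosine decay, η_t = ½(1 + cos(tπ/T))·η, next to a step schedule. The step schedule's "divide by 10 every 30 epochs" setting and the 120-epoch horizon are assumptions chosen for illustration, and warmup is left out.

```python
import math


def cosine_lr(step, total_steps, base_lr):
    """Cosine decay from base_lr at step 0 down to 0 at total_steps."""
    return 0.5 * (1.0 + math.cos(math.pi * step / total_steps)) * base_lr


def step_lr(epoch, base_lr, drop_every=30, factor=0.1):
    """Step decay: multiply the rate by factor every drop_every epochs."""
    return base_lr * factor ** (epoch // drop_every)


# Usage: compare the two schedules over a (hypothetical) 120-epoch run.
cosine_curve = [cosine_lr(epoch, 120, 0.1) for epoch in range(121)]
step_curve = [step_lr(epoch, 0.1) for epoch in range(121)]
```

The cosine curve starts slowly, falls almost linearly in the middle, and flattens again near the end, while the step curve stays constant between abrupt drops.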
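A minimal sketch of label smoothing in plain Python, using the formulation in which the true class gets 1 − ε and each of the other K − 1 classes gets ε/(K − 1). The value ε = 0.1 and the helper names are illustrative assumptions.

```python
import math


def smooth_labels(true_class, num_classes, eps=0.1):
    """Smoothed target: 1 - eps on the true class, eps / (K - 1) elsewhere."""
    off_value = eps / (num_classes - 1)
    return [1.0 - eps if i == true_class else off_value for i in range(num_classes)]


def log_softmax(logits):
    """Numerically stable log-softmax."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return [v - log_norm for v in logits]


def cross_entropy(target, logits):
    """Cross-entropy between a target distribution and softmax(logits)."""
    return -sum(q * lp for q, lp in zip(target, log_softmax(logits)))


# Usage: 5 classes, true class 2.
target = smooth_labels(2, 5)
loss = cross_entropy(target, [0.5, 0.1, 3.0, -1.0, 0.2])
```

Because the target is no longer exactly one-hot, the loss is minimized at a finite gap between the true-class logit and the rest, rather than by pushing that gap toward infinity.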
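The distillation objective can be sketched as the usual cross-entropy on the true label plus a temperature-softened term that pulls the student toward the teacher: ℓ(p, softmax(z)) + T²·ℓ(softmax(r/T), softmax(z/T)). The code below is a plain-Python illustration for a single example; the function names and the temperature used in the usage lines are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax with temperature; a higher temperature gives a smoother distribution."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, probs):
    """Cross-entropy between two distributions, skipping zero-weight targets."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def distillation_loss(one_hot, student_logits, teacher_logits, temperature):
    """Hard-label loss plus T^2 times the softened teacher-student loss."""
    hard = cross_entropy(one_hot, softmax(student_logits))
    soft = cross_entropy(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    )
    return hard + temperature ** 2 * soft


# Usage: z is the student output, r the teacher output (made-up values).
p = [0.0, 1.0, 0.0]
r = [1.0, 3.0, 0.5]
loss_agree = distillation_loss(p, [1.0, 3.0, 0.5], r, temperature=2.0)
loss_disagree = distillation_loss(p, [3.0, 1.0, 0.5], r, temperature=2.0)
```

A student that reproduces the teacher's scores gets the lower loss of the two, which is the behavior the soft term is meant to encourage.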
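A minimal sketch of mixup for one pair of samples: draw λ from Beta(α, α) and interpolate both the inputs and the label vectors with it. Inputs are flat lists of floats here for simplicity, and α = 0.2 is an illustrative default.

```python
import random


def mixup(x_i, y_i, x_j, y_j, alpha=0.2, rng=random):
    """Blend two (input, label) pairs with a weight drawn from Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x_mixed = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mixed = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mixed, y_mixed, lam


# Usage: two toy samples with one-hot labels and a seeded generator.
rng = random.Random(0)
x_new, y_new, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], rng=rng)
```

Only the blended sample is used for training; with a small α most draws of λ land near 0 or 1, so the new sample usually stays close to one of the two originals.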

Experiment

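Where the paper uses Xavier initialization, Kaiming He's initialization can be used today. As the results show, every added trick keeps paying off. Distillation, however, does not work well on Inception and MobileNet. The authors attribute this to the teacher being a ResNet: the architectures differ, so the teacher's predictions follow a different distribution.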

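Transfer learning tells the same story: for object detection, where the paper swaps the detector's VGG-19 base network for the models trained with these tricks, the results are similarly strong.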


Reposted from blog.csdn.net/weixin_43953700/article/details/124218998
