Training Neural Networks, part I - 代码天地


Category: Other · Posted 2020-01-27 11:30:33

Part I:

- Activation Functions

- Data Preprocessing

- Weight Initialization

- Batch Normalization

- Babysitting the Learning Process

- Hyperparameter Optimization

Activation Functions

Sigmoid

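Problems:

1. Vanishing gradients: saturated neurons (large |x|) have a local gradient close to 0, which kills the gradient flowing backward

2. Outputs are not zero-centered ------> inefficient gradient updates (if a neuron's inputs are all positive, the gradients on its weights all share the same sign, so the updates zig-zag)

3. exp() is relatively expensive to compute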
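A minimal Python sketch of the sigmoid and its local gradient s(x) * (1 - s(x)) makes problems 1 and 2 concrete: the gradient is at most 0.25 (at x = 0) and only about 4.5e-5 at x = ±10, and every output is positive. The function names are illustrative, not from the original notes.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^-x), split by sign so math.exp never overflows."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x: float) -> float:
    """Local gradient d sigmoid / dx = s * (1 - s); its maximum is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (-10.0, 0.0, 10.0):
    # outputs always lie in (0, 1), so they are never zero-centered
    print(f"x={x:+5.1f}  sigmoid={sigmoid(x):.5f}  grad={sigmoid_grad(x):.1e}")
```

Backprop multiplies the upstream gradient by this local factor at every sigmoid, so even in the best case it is scaled by at most 0.25.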

tanh

Zero-centered, but it still saturates, so the vanishing-gradient problem is not solved
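The same kind of sketch for tanh, whose local gradient is 1 - tanh(x)^2: outputs are symmetric around 0, but the gradient still collapses once |x| is a few units away from 0. The inputs below are arbitrary illustration values.

```python
import math


def tanh_grad(x: float) -> float:
    """Local gradient of tanh: 1 - tanh(x)^2 (equals 1 at x = 0)."""
    t = math.tanh(x)
    return 1.0 - t * t


# tanh is an odd function, so symmetric inputs give outputs that average to 0
xs = [-3.0, -1.0, 0.0, 1.0, 3.0]
mean_out = sum(math.tanh(x) for x in xs) / len(xs)
print(f"mean output: {mean_out:.3f}")

# ...but it still saturates: the gradient is tiny for large |x|
for x in (0.0, 3.0, 10.0):
    print(f"x={x:4.1f}  grad={tanh_grad(x):.1e}")
```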

ReLU

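Fast to compute, converges much faster than sigmoid/tanh in practice, and does not saturate in the positive region

Not zero-centered, and saturates in the negative region (including 0), where its gradient is 0

A "dead ReLU" never activates and therefore never gets updated. Common causes:

(1) Bad weight initialization

(2) Learning rate too high ---> large weight updates can knock the neuron into a region where it never activates again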
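A minimal sketch of why a dead ReLU never recovers, assuming a single hypothetical neuron relu(w · x + b) on random inputs in [-1, 1]: once the bias has been pushed far negative (for instance by one oversized update), the pre-activation is negative for every input, so every gradient is exactly 0 and gradient descent can never move the neuron back.

```python
import random


def relu_grad(z: float) -> float:
    """Local ReLU gradient: 1 for z > 0, 0 for z <= 0 (including z == 0)."""
    return 1.0 if z > 0 else 0.0


def neuron_grads(w, b, inputs):
    """Gradients of sum_i relu(w . x_i + b) with respect to w and b."""
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x in inputs:
        z = sum(wj * xj for wj, xj in zip(w, x)) + b
        g = relu_grad(z)
        grad_w = [gw + g * xj for gw, xj in zip(grad_w, x)]
        grad_b += g
    return grad_w, grad_b


random.seed(0)
data = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(100)]
w = [0.5, -0.3, 0.8]  # hypothetical weights; |w . x| <= 1.6 for these inputs

healthy_w, healthy_b = neuron_grads(w, 0.1, data)
dead_w, dead_b = neuron_grads(w, -10.0, data)  # bias far below -1.6: never active
print("healthy bias grad:", healthy_b)  # counts the inputs that activate the neuron
print("dead grads:", dead_w, dead_b)    # all zeros, so no update ever happens
```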

Leaky ReLU

PReLU

ELU

Maxout
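The four variants named above can be sketched as plain functions. The 0.01 slope for Leaky ReLU and alpha = 1 for ELU are commonly quoted defaults, assumed here; in PReLU, alpha would be learned by backprop, here it is just an argument. Maxout is shown with two linear pieces on a vector input.

```python
import math


def leaky_relu(x: float, slope: float = 0.01) -> float:
    """max(slope * x, x): a small non-zero slope for x < 0, so the gradient never dies."""
    return x if x > 0 else slope * x


def prelu(x: float, alpha: float) -> float:
    """Parametric ReLU: like Leaky ReLU, but alpha is a learned parameter."""
    return x if x > 0 else alpha * x


def elu(x: float, alpha: float = 1.0) -> float:
    """Exponential Linear Unit: x for x > 0, alpha * (e^x - 1) otherwise (saturates at -alpha)."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def maxout(x, w1, b1, w2, b2) -> float:
    """Two-piece Maxout: max(w1 . x + b1, w2 . x + b2); doubles the parameters per neuron."""
    z1 = sum(wi * xi for wi, xi in zip(w1, x)) + b1
    z2 = sum(wi * xi for wi, xi in zip(w2, x)) + b2
    return max(z1, z2)


print(leaky_relu(-2.0), prelu(-2.0, 0.25), elu(-2.0), elu(3.0))
# with w1 = 0 and b1 = 0, Maxout reduces to ReLU(w2 . x + b2)
print(maxout([-1.0, 0.5], [0.0, 0.0], 0.0, [1.0, 1.0], 0.0))
```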

In practice:

Data Preprocessing

Zero-centering the data is usually done

Normalization (scaling each feature to unit variance) is not always done
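A minimal sketch of both steps on a small feature matrix (rows = samples), in plain Python: zero-centering subtracts each feature's mean, and the optional normalization then divides each centered feature by its standard deviation. As usual, the statistics would be computed on the training set only and reused unchanged for validation and test data; the data below is a toy example.

```python
import math


def zero_center(X):
    """Subtract the per-feature mean; return (centered data, means)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X], means


def normalize(Xc):
    """Divide each already-centered feature by its standard deviation."""
    n, d = len(Xc), len(Xc[0])
    # "or 1.0" guards against a constant feature with std = 0
    stds = [math.sqrt(sum(row[j] ** 2 for row in Xc) / n) or 1.0 for j in range(d)]
    return [[row[j] / stds[j] for j in range(d)] for row in Xc], stds


X_train = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]  # toy data
X_centered, means = zero_center(X_train)
X_norm, stds = normalize(X_centered)
print(means)   # [2.0, 400.0]
print(X_norm)  # each column now has mean 0 and standard deviation 1
```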

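Centering-only methods: subtract the mean image (computed over the training set, same shape as one input image), or subtract the per-channel mean (one number per color channel)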
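For image data, two common centering-only variants are subtracting the mean image and subtracting the per-channel mean. A minimal sketch, assuming images stored as nested lists indexed [height][width][channel]: the mean image has the same shape as one input, while the per-channel mean is just one number per channel. The two 2x2 RGB "images" are toy data.

```python
def mean_image(images):
    """Pixel-wise mean over the training set: same H x W x C shape as one image."""
    n = len(images)
    H, W, C = len(images[0]), len(images[0][0]), len(images[0][0][0])
    return [[[sum(img[h][w][c] for img in images) / n for c in range(C)]
             for w in range(W)] for h in range(H)]


def channel_mean(images):
    """Mean of each color channel over all images and all pixels: C numbers."""
    H, W, C = len(images[0]), len(images[0][0]), len(images[0][0][0])
    count = len(images) * H * W
    return [sum(img[h][w][c] for img in images for h in range(H) for w in range(W)) / count
            for c in range(C)]


def subtract_mean_image(img, mean_img):
    return [[[p - m for p, m in zip(px, mpx)] for px, mpx in zip(row, mrow)]
            for row, mrow in zip(img, mean_img)]


def subtract_channel_mean(img, ch_mean):
    return [[[p - m for p, m in zip(px, ch_mean)] for px in row] for row in img]


train = [  # toy training set: two 2x2 RGB images
    [[[10, 20, 30], [20, 30, 40]], [[30, 40, 50], [40, 50, 60]]],
    [[[30, 40, 50], [40, 50, 60]], [[50, 60, 70], [60, 70, 80]]],
]
mimg = mean_image(train)     # 2 x 2 x 3 array
cmean = channel_mean(train)  # 3 numbers
print(cmean)
print(subtract_channel_mean(train[0], cmean)[0][0])
```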

Weight Initialization

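1. Initializing all weights to 0 (or any identical value)

Every neuron then computes the same output and receives the same gradient, so they all keep doing the same thing: the symmetry between parameters is never broken.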
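A minimal sketch of the symmetry problem, assuming a tiny hypothetical network (3 inputs, 4 tanh hidden units, 1 linear output, squared-error loss, no biases) trained with plain SGD on two made-up samples: when every weight starts at the same constant, all hidden units receive identical gradients, so after any number of updates they are still exact copies of each other. With exactly 0 it is even worse in this toy: every gradient is 0.

```python
import math


def sgd_step(W1, W2, x, y, lr=0.1):
    """One SGD step for: h = tanh(W1 x), out = W2 . h, loss = 0.5 * (out - y)^2."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    out = sum(w2 * hj for w2, hj in zip(W2, h))
    dout = out - y
    dW2 = [dout * hj for hj in h]
    # backprop through W2 and tanh (derivative 1 - h^2) into each hidden unit
    dz = [dout * w2 * (1.0 - hj * hj) for w2, hj in zip(W2, h)]
    dW1 = [[dzj * xi for xi in x] for dzj in dz]
    W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, dW1)]
    W2 = [w - lr * g for w, g in zip(W2, dW2)]
    return W1, W2


data = [([1.0, -2.0, 0.5], 1.0), ([0.3, 0.8, -1.0], -1.0)]  # hypothetical samples
W1 = [[0.5] * 3 for _ in range(4)]  # every hidden unit starts identical
W2 = [0.5] * 4
for _ in range(100):
    for x, y in data:
        W1, W2 = sgd_step(W1, W2, x, y)

# all four hidden units are still exact copies: symmetry was never broken
print(all(row == W1[0] for row in W1), len(set(W2)) == 1)
```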

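2. Initializing with small random numbers

Fine for small networks, but in a deep network the activations shrink toward 0 layer after layer, so the gradients on the weights go to 0 as well.

3. Initializing with large random numbers (e.g. a unit Gaussian, std 1)

With tanh, almost every neuron saturates at -1 or +1, so the local gradients are again close to 0.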
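A scaled-down experiment in the spirit of the lecture, in plain Python: push random inputs through a 6-layer tanh network (50 units per layer, weights drawn from a Gaussian with the given standard deviation) and look at the activations. The depth, width, batch size and the 0.99 saturation threshold are arbitrary choices for illustration.

```python
import math
import random


def tanh_net_stats(weight_std, n_layers=6, width=50, batch=40, seed=0):
    """Return (std of activations, fraction with |a| > 0.99) for each layer of a deep tanh MLP."""
    rng = random.Random(seed)
    acts = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(batch)]
    stats = []
    for _ in range(n_layers):
        W = [[rng.gauss(0.0, weight_std) for _ in range(width)] for _ in range(width)]
        cols = list(zip(*W))  # cols[j][i] == W[i][j]
        acts = [[math.tanh(sum(a * w for a, w in zip(row, col))) for col in cols] for row in acts]
        flat = [v for row in acts for v in row]
        mean = sum(flat) / len(flat)
        std = math.sqrt(sum((v - mean) ** 2 for v in flat) / len(flat))
        saturated = sum(abs(v) > 0.99 for v in flat) / len(flat)
        stats.append((std, saturated))
    return stats


small = tanh_net_stats(0.01)  # "small random numbers": activations collapse toward 0
large = tanh_net_stats(1.0)   # "large random numbers": most units saturate
print("small init, std per layer:", [round(s, 6) for s, _ in small])
print("large init, saturated fraction per layer:", [round(f, 2) for _, f in large])
```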

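4. Xavier initialization (tanh): draw weights with variance 1/fan_in, i.e. std = 1/sqrt(fan_in), so the scale of the activations stays roughly constant from layer to layer

5. Xavier with ReLU: ReLU zeroes out about half of its inputs, so with Xavier initialization the activations shrink again; He et al. (2015) fix this by doubling the variance, std = sqrt(2/fan_in)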
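The same kind of scaled-down experiment (6 layers of 50 units, random Gaussian inputs), now comparing weight standard deviations of 1/sqrt(fan_in) (Xavier) and sqrt(2/fan_in) (He et al.), and reporting the root-mean-square of the last layer's activations. The sizes are arbitrary illustration choices.

```python
import math
import random


def relu(z: float) -> float:
    return max(0.0, z)


def last_layer_rms(nonlinearity, init, n_layers=6, width=50, batch=40, seed=0):
    """RMS of the final activations of a deep MLP with Xavier- or He-initialized weights."""
    rng = random.Random(seed)
    scale = {"xavier": 1.0, "he": 2.0}[init]
    std = math.sqrt(scale / width)  # fan_in == width for every layer here
    f = math.tanh if nonlinearity == "tanh" else relu
    acts = [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(batch)]
    for _ in range(n_layers):
        W = [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
        cols = list(zip(*W))
        acts = [[f(sum(a * w for a, w in zip(row, col))) for col in cols] for row in acts]
    flat = [v for row in acts for v in row]
    return math.sqrt(sum(v * v for v in flat) / len(flat))


xavier_tanh = last_layer_rms("tanh", "xavier")
xavier_relu = last_layer_rms("relu", "xavier")
he_relu = last_layer_rms("relu", "he")
print(f"tanh + Xavier: {xavier_tanh:.3f}")
print(f"ReLU + Xavier: {xavier_relu:.3f}")
print(f"ReLU + He:     {he_relu:.3f}")
```

With ReLU, Xavier roughly halves the mean squared activation at every layer, while the extra factor of 2 in the He variance keeps it roughly constant.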

Batch Normalization
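- Reduces the impact of bad initialization
- Speeds up convergence
- Allows larger learning rates
- Acts as a mild regularizer, which can help reduce overfitting

Normalize, then scale and shift: for each feature, x_hat = (x - μ_B) / sqrt(σ_B² + ε) using the mini-batch mean μ_B and variance σ_B², then y = γ · x_hat + β with learned parameters γ and β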
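A minimal sketch of the training-time forward pass, in plain Python on a toy mini-batch with badly scaled features: each feature is normalized with the mini-batch statistics, then scaled by gamma and shifted by beta. The gamma and beta values are fixed here for illustration (a real network learns them), and the test-time behavior, where fixed statistics such as running averages collected during training replace the mini-batch ones, is omitted.

```python
import math


def batchnorm_forward(X, gamma, beta, eps=1e-5):
    """Training-time batch norm for a mini-batch X (N rows x D features)."""
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    var = [sum((row[j] - mu[j]) ** 2 for row in X) / n for j in range(d)]
    out = []
    for row in X:
        # normalize each feature, then scale and shift
        x_hat = [(row[j] - mu[j]) / math.sqrt(var[j] + eps) for j in range(d)]
        out.append([gamma[j] * x_hat[j] + beta[j] for j in range(d)])
    return out


# toy activations: feature 0 spans 1..4, feature 1 spans 100..700
X = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0], [4.0, 700.0]]
Y = batchnorm_forward(X, gamma=[1.0, 2.0], beta=[0.0, 5.0])
print(Y)  # feature 0: mean ~0, std ~1; feature 1: mean ~5, std ~2
```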

Babysitting the Learning Process

Step 1: Preprocess the data

Step 2: Choose the architecture
Step 3: Double check that the loss is reasonable
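One concrete version of this check, assuming a softmax classifier with small random weights: the class scores are all close to 0, so the predicted distribution is nearly uniform and the initial loss should be about ln(C), roughly 2.3 for C = 10 classes. The sketch simulates near-zero scores directly rather than building a real network.

```python
import math
import random


def softmax_loss(scores, label):
    """Cross-entropy loss of a softmax over class scores, via a stable log-sum-exp."""
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[label]


random.seed(0)
num_classes = 10
losses = []
for _ in range(200):
    # near-zero scores, as a network with small random weights would produce (simulated)
    scores = [random.gauss(0.0, 1e-3) for _ in range(num_classes)]
    losses.append(softmax_loss(scores, random.randrange(num_classes)))

avg_loss = sum(losses) / len(losses)
print(f"average initial loss {avg_loss:.4f} vs ln(C) = {math.log(num_classes):.4f}")
```

A first loss far from ln(C) usually points to a bug in the loss or in the initialization.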

Step 4: Try to train

1. Make sure that you can overfit a very small portion of the training data

Very small loss, train accuracy 1.00, nice!

2. Start with small regularization and find a learning rate that makes the loss go down.

Loss not going down: learning rate too low

3. Try a big learning rate

Loss exploding (growing without bound, often to NaN): learning rate too high

4. A rough range of learning rates worth cross-validating is somewhere in [1e-5, 1e-3]
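A toy illustration of these symptoms, assuming full-batch gradient descent on a hypothetical least-squares problem with 10 samples and 20 features (more features than samples, so the model can fit the training data exactly, the analogue of overfitting a small subset in item 1). The learning rates only make sense for this toy problem; the learning-rate range quoted above refers to the lecture's network setup.

```python
import math
import random


def train_least_squares(lr, steps=2000, n=10, d=20, seed=0):
    """Full-batch gradient descent on (1 / 2n) * ||X w - y||^2; returns the loss history."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    w = [0.0] * d
    history = []
    for _ in range(steps):
        residuals = [sum(wj * xj for wj, xj in zip(w, row)) - yi for row, yi in zip(X, y)]
        loss = sum(r * r for r in residuals) / (2 * n)
        history.append(loss)
        if not math.isfinite(loss) or loss > 1e6:
            break  # loss exploding: stop early
        grad = [sum(r * row[j] for r, row in zip(residuals, X)) / n for j in range(d)]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return history


too_low = train_least_squares(lr=1e-6)  # loss barely moves
good = train_least_squares(lr=0.1)      # loss goes down, here to ~0 (all 10 points fitted)
too_high = train_least_squares(lr=2.0)  # loss explodes within a few steps
for name, h in [("too low", too_low), ("good", good), ("too high", too_high)]:
    print(f"{name:8s} first={h[0]:.3g} last={h[-1]:.3g} steps={len(h)}")
```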

Hyperparameter Optimization

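Grid Search problem: the space is not searched thoroughly, because each hyperparameter is only tried at a limited, fixed set of values

Random Search: samples more distinct values of each hyperparameter for the same number of trials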
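A minimal sketch of why random search covers the space better for the same budget, assuming two hyperparameters (learning rate and regularization strength) sampled on a log scale. The learning-rate range reuses the rough range from the babysitting section; the regularization range [1e-4, 1] is a hypothetical choice. With 9 trials, a 3x3 grid only ever tries 3 distinct learning rates, while random search tries 9.

```python
import itertools
import random


def grid_points(lr_values, reg_values):
    """Every combination of the listed values (grid search)."""
    return list(itertools.product(lr_values, reg_values))


def random_points(n_trials, seed=0):
    """Independent samples, uniform on a log scale (random search)."""
    rng = random.Random(seed)
    # learning rate in [1e-5, 1e-3]; regularization in [1e-4, 1] (hypothetical range)
    return [(10 ** rng.uniform(-5, -3), 10 ** rng.uniform(-4, 0)) for _ in range(n_trials)]


grid = grid_points([1e-5, 1e-4, 1e-3], [1e-4, 1e-2, 1.0])
rand = random_points(9)
# same budget of 9 trials, very different coverage of the learning-rate axis
print("distinct learning rates, grid:  ", len({lr for lr, _ in grid}))
print("distinct learning rates, random:", len({lr for lr, _ in rand}))
```

When one hyperparameter matters much more than the other, those extra distinct values along the important axis are what make random search more effective.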

Hyperparameters to play with:

- network architecture

- learning rate, its decay schedule, update type

- regularization (L2/Dropout strength)

Summary

_likyoo


Reposted from blog.csdn.net/li_k_y/article/details/86695758
