PyTorch_RNN_Gradient Exploding

Gradient exploding often appears when training RNN models. A CNN is a feed-forward network, so no layer is applied repeatedly, but an RNN has only a few layers and reuses the same weights at every time step. During training the optimizer is updated with the gradient, and if the magnitude of that gradient carries even a small deviation (with the direction unchanged), the repeated multiplication through the recurrence amplifies it enormously: a factor of 0.9 raised to the 20th power shrinks toward zero (gradient vanishing), while 1.1 raised to the 20th power keeps growing (gradient exploding). The following method addresses this situation.
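As a quick sanity check of the numbers quoted above, here is a purely illustrative snippet (the factors 0.9 and 1.1 and the 20 steps are just the example from the paragraph) showing how repeated multiplication drives a value toward zero or toward a blow-up:

# repeatedly scaling a gradient-like value by a factor slightly below or
# slightly above 1.0, as the recurrence in an RNN backward pass does,
# shrinks it toward zero or makes it grow without bound
factor_vanish, factor_explode, steps = 0.9, 1.1, 20

grad = 1.0
for _ in range(steps):
    grad *= factor_vanish
print(grad)   # about 0.1216 -> shrinks toward zero (gradient vanishing)

grad = 1.0
for _ in range(steps):
    grad *= factor_explode
print(grad)   # about 6.7275 -> keeps growing (gradient exploding)
              # over 100 steps, 1.1 ** 100 is already above 13,000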

(Writing code and Chinese comments at the same time is cumbersome, so I have started practicing my written English in the comments; please point out any mistakes in wording or in the technical content.)

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
@author: 1234
@file: clipping.py
@time: 2020/07/13
@desc:
    Format code: Ctrl + Alt + L
    Run code: Ctrl + Shift + F10
    Comment / uncomment code: Ctrl + /
"""
import torch
from torch import nn

# when we train a model, the optimizer updates the weights with the gradient
# through the recurrent computation the gradient norm can grow desperately large
# this phenomenon is called gradient exploding
# to handle this issue we can refer to the 2013 gradient clipping paper
# (Pascanu et al., "On the difficulty of training recurrent neural networks")
# the algorithm is:
# given a threshold (common practice puts it somewhere around 8-15)
# if the norm of the gradient exceeds the threshold,
# rescale the gradient to gradient * (threshold / norm)
# this shrinks the magnitude but does not change the direction of the gradient
# and on that basis we get more stable updates

# because the phenomenon is so common,
# pytorch already ships this algorithm as torch.nn.utils.clip_grad_norm_
# the following is the basic procedure
loss = criterion(output, y)
model.zero_grad()
loss.backward()
for p in model.parameters():
    print(p.grad.norm())
    # check whether any gradient norm is exploding
torch.nn.utils.clip_grad_norm_(model.parameters(), 10)
# pass in the parameters and set the threshold to 10
optimizer.step()
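For reference, the scaling rule described in the comments above can also be written out by hand. This is only a minimal sketch of the idea (the helper name clip_by_global_norm and the threshold value are my own illustrative choices), not a replacement for the built-in function:

import torch

def clip_by_global_norm(parameters, threshold):
    # gather the gradients that exist after loss.backward()
    grads = [p.grad for p in parameters if p.grad is not None]
    # global L2 norm over all gradients, treated as one long vector
    total_norm = torch.norm(torch.stack([g.norm() for g in grads]))
    if total_norm > threshold:
        scale = threshold / total_norm
        for g in grads:
            g.mul_(scale)  # shrink the magnitude, keep the direction
    return total_norm

# usage, between loss.backward() and optimizer.step():
# norm_before_clipping = clip_by_global_norm(model.parameters(), 10)

The built-in torch.nn.utils.clip_grad_norm_ follows the same idea and returns the total norm it measured before clipping, so that value can also be logged to watch for spikes instead of printing every per-parameter norm.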

Reposted from blog.csdn.net/soulproficiency/article/details/107327699