Rate-of-change curves of gradient descent algorithms

This was partly an exercise in plotting with matplotlib, and partly a way to compare the rate-of-change curves produced by various optimization algorithms.

First, the best performer, the RMSprop algorithm (350 iterations). The test function used throughout is f(x) = x^3 - x + x^2, whose derivative 3x^2 + 2x - 1 vanishes at x = 1/3; starting from x = 0, every run below is descending toward that local minimum.

import math
import matplotlib.pyplot as plt   # only pyplot is actually used below

def f(x):
    return x**3 - x + x**2

def derivative_f(x):
    # f'(x) = 3x^2 + 2x - 1, with roots x = 1/3 (minimum) and x = -1 (maximum)
    return 3*(x**2) + 2*x - 1


x = 0.0
y = 0.0
learning_rate = 0.001
gradient = 0.0
e = 0.00000001   # epsilon: keeps the denominator away from zero
d = 0.9          # decay rate for the running averages

Egt = 0.0        # E[g^2]: decaying average of squared gradients (RMSprop state)
Edt = 0.0        # E[delta^2]: unused here, needed only by the Adadelta variant below

delta = 0.0


xx = []   # x value at each step
dd = []   # update step (delta) at each step
gg = []   # gradient at each step
yy = []   # f(x) at each step
for i in range(100000):
    print('x = {:6f}, f(x) = {:6f}, gradient = {:6f}'.format(x, y, gradient))
    # Stop once |gradient| has decayed into (1e-6, 1e-5); the lower bound
    # keeps the very first pass (gradient still 0) from triggering the break.
    if 0.000001 < abs(gradient) < 0.00001:
        print("break at " + str(i))
        break
    xx.append(x)
    gradient = derivative_f(x)
    gg.append(gradient)
    # RMSprop: a decaying average of squared gradients rescales the step
    Egt = d * Egt + (1 - d) * (gradient ** 2)
    delta = learning_rate * gradient / math.sqrt(Egt + e)
    dd.append(delta)
    x = x - delta
    y = f(x)
    yy.append(y)



fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.plot(xx, dd, label='delta', color='red')

ax2 = ax1.twinx()   # second y-axis on the same x-axis, so the two scales don't fight
ax2.plot(xx, gg, label='gradient', color='blue')

plt.savefig('latex-rms.png', dpi=75)
plt.show()

[Figure: RMSprop run, red = delta, blue = gradient]
The blue curve is the derivative over the iterations; the red curve is the change in the update step (delta). Note that the red curve has a smooth flat stretch, sitting right at learning_rate.
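That plateau is not a coincidence: once the gradient g holds roughly steady, Egt converges to g**2, so delta = learning_rate * g / sqrt(g**2 + e) settles at learning_rate * sign(g). A quick standalone check (my own sketch, separate from the script above):

import math

g = -1.0                  # pretend the gradient is stuck at -1
d, lr, e = 0.9, 0.001, 1e-8
Egt = 0.0
delta = 0.0
for i in range(100):
    Egt = d * Egt + (1 - d) * g ** 2     # converges to g**2 = 1
    delta = lr * g / math.sqrt(Egt + e)
print(abs(delta))         # ~0.001, i.e. right at learning_rate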

Next, ADADELTA (760 iterations):

[Figure: ADADELTA run, red = delta, blue = gradient]
The red curve swings back and forth like the rim of a bowl, showing the momentum-like term rising and falling.
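The post doesn't show the ADADELTA version of the loop, but the Edt variable that the RMSprop script declares and never touches suggests the variants shared one file. A minimal sketch of the update, assuming it mirrors the RMSprop loop above (my reconstruction, not the author's original code):

import math

def derivative_f(x):
    return 3 * (x ** 2) + 2 * x - 1

x, d, e = 0.0, 0.9, 1e-8
Egt = 0.0    # decaying average of squared gradients
Edt = 0.0    # decaying average of squared updates; this replaces the fixed learning_rate
for i in range(2000):
    g = derivative_f(x)
    Egt = d * Egt + (1 - d) * g ** 2
    # Adadelta: the step is the ratio of the two running RMS values times the gradient
    delta = math.sqrt(Edt + e) / math.sqrt(Egt + e) * g
    Edt = d * Edt + (1 - d) * delta ** 2
    x = x - delta
print(x)     # should head toward the local minimum at x = 1/3

Because Edt starts at zero, the first steps are tiny, then swell as updates accumulate, which fits the back-and-forth shape described above.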

Next, ADAGRAD (1454 iterations):

[Figure: ADAGRAD run, red = delta, blue = gradient]
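ADAGRAD replaces the decaying average with a plain running sum of squared gradients, so the effective step can only shrink, which is why it needs the most iterations of the adaptive methods here. A minimal sketch along the same lines (again my reconstruction, with the same learning_rate assumed):

import math

def derivative_f(x):
    return 3 * (x ** 2) + 2 * x - 1

x, lr, e = 0.0, 0.001, 1e-8
G = 0.0      # accumulated sum (not average) of squared gradients
for i in range(5000):
    g = derivative_f(x)
    G += g ** 2                          # only ever grows, so the step only ever shrinks
    x = x - lr * g / math.sqrt(G + e)
print(x)     # creeps toward x = 1/3, more slowly than RMSprop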
Then Adam (861 iterations):

[Figure: Adam run, red = delta, blue = gradient]
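Adam combines the two ideas: an RMSprop-style average of squared gradients plus a momentum-style average of the gradients themselves, with a bias correction for the zero-initialized state. A minimal sketch (the decay rates b1, b2 are the usual defaults, assumed here since the post doesn't list its settings):

import math

def derivative_f(x):
    return 3 * (x ** 2) + 2 * x - 1

x, lr, e = 0.0, 0.001, 1e-8
b1, b2 = 0.9, 0.999
m, v = 0.0, 0.0
for t in range(1, 2001):                 # t starts at 1 for the bias correction
    g = derivative_f(x)
    m = b1 * m + (1 - b1) * g            # first moment: momentum-style gradient average
    v = b2 * v + (1 - b2) * g ** 2       # second moment: RMSprop-style average of g^2
    m_hat = m / (1 - b1 ** t)            # undo the bias toward zero from initialization
    v_hat = v / (1 - b2 ** t)
    x = x - lr * m_hat / (math.sqrt(v_hat) + e)
print(x)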
Finally, vanilla gradient descent (3018 iterations):

[Figure: vanilla gradient descent run, red = delta, blue = gradient]
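Vanilla gradient descent has no scaling at all: the step is just learning_rate * gradient, and since the gradient shrinks as x nears the minimum, progress stalls, which matches the roughly 3000 iterations reported. A minimal sketch:

def derivative_f(x):
    return 3 * (x ** 2) + 2 * x - 1

x, lr = 0.0, 0.001
for i in range(5000):
    g = derivative_f(x)
    x = x - lr * g      # fixed step: the closer to the minimum, the slower the progress
print(x)                # converges to x = 1/3 only after a few thousand steps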

Reposted from dikar.iteye.com/blog/2391374