1. Assignment requirements
- Implement linear backpropagation
- Problem link
2. Solution
1) Recompute the contributions Δw and Δb from scratch in every iteration:
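For reference, the setup implied by the code in this section (the original problem statement is not reproduced here): drive z = (2w + 3b)(2b + 1) to z_true = 150 starting from w = 3, b = 4. Writing x = 2w + 3b and y = 2b + 1:

```latex
z = x y, \qquad x = 2w + 3b, \qquad y = 2b + 1
\frac{\partial z}{\partial w} = 2y, \qquad
\frac{\partial z}{\partial b} = 2x + 3y
```

Method 1) below attributes half of the error Δz to each parameter, i.e. Δw = (Δz/2)/(2y) and Δb = (Δz/2)/(2x + 3y).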
import numpy as np

def func_z(w, b):
    return (2*w + 3*b)*(2*b + 1)

z_true = 150
w = 3
b = 4
count = 0
while np.abs(func_z(w, b) - z_true) >= 1e-5:
    count += 1
    z = func_z(w, b)
    dz = np.abs(z - z_true)/2    # attribute half of the error to w, half to b
    y = 2*b + 1
    x = 2*w + 3*b
    dw = dz/(2*y)                # dz/dw = 2y
    db = dz/(2*x + 3*y)          # dz/db = 2x + 3y
    print("w=%f, b=%f, z=%f, delta_z=%f, delta_b=%f" %(w, b, z, 2*dz, db))
    w = w - dw
    b = b - db
print("w=%f, b=%f, z=%f, delta_z=%f" %(w, b, func_z(w, b), func_z(w, b)-z_true))
print(f"Iteration count:{count}, final_w:{w}, final_b:{b}")
#output:******************************************************
w=3.000000, b=4.000000, z=162.000000, delta_z=12.000000, delta_b=0.095238
w=2.666667, b=3.904762, z=150.181406, delta_z=0.181406, delta_b=0.001499
w=2.661519, b=3.903263, z=150.000044, delta_z=0.000044, delta_b=0.000000
w=2.661517, b=3.903263, z=150.000000, delta_z=0.000000
Iteration count:3, final_w:2.661517402927456, final_b:3.9032629057674404
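As a sanity check on the hand-derived gradients used above (2y for w, 2x + 3y for b), a small sketch comparing them against central finite differences at the starting point; func_z is repeated here so the snippet is self-contained:

```python
import numpy as np

def func_z(w, b):
    return (2*w + 3*b)*(2*b + 1)

w, b = 3.0, 4.0
x, y = 2*w + 3*b, 2*b + 1
h = 1e-6

# central finite differences in each parameter
fd_w = (func_z(w + h, b) - func_z(w - h, b)) / (2*h)
fd_b = (func_z(w, b + h) - func_z(w, b - h)) / (2*h)

print(fd_w, 2*y)        # both should be ~18
print(fd_b, 2*x + 3*y)  # both should be ~63
```

Since z is linear in w and only quadratic in b, the central differences match the analytic values essentially to machine precision.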
2) Gradient descent
Here gradient descent is applied to z itself, so the learning rate has to be very small; precision is then hard to improve, but the loss does keep decreasing.
w = 3
b = 4
loss = []
eta = 1e-6
count = 0
# Cap the iterations: since we descend z itself, a step can jump past the
# +/-1e-3 window around z_true, in which case the loop would never terminate.
while np.abs(func_z(w, b) - z_true) >= 1e-3 and count < 100000:
    count += 1
    y = 2*b + 1
    x = 2*w + 3*b
    gradient_w = 2*y          # dz/dw
    gradient_b = 2*x + 3*y    # dz/db
    w = w - eta*gradient_w
    b = b - eta*gradient_b
    loss.append(func_z(w, b) - z_true)
    if np.abs(func_z(w, b) - z_true) <= 1e-3:
        print(func_z(w, b) - z_true)
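Why does the learning rate have to be so small? The loop above descends z itself rather than the error |z - z_true|, and z = (2w + 3b)(2b + 1) is unbounded below, so with a larger eta the iterate steps past 150 and keeps going down. A small sketch of that failure mode (the fixed step count only exists to stop the runaway loop):

```python
import numpy as np

def func_z(w, b):
    return (2*w + 3*b)*(2*b + 1)

w, b = 3.0, 4.0
eta = 1e-4   # ~100x the learning rate used above
for step in range(1000):
    y = 2*b + 1
    x = 2*w + 3*b
    w -= eta * 2*y            # dz/dw
    b -= eta * (2*x + 3*y)    # dz/db

# z is now well below the 150 target and would keep decreasing forever
print(func_z(w, b))
```

With eta = 1e-4 each step lowers z by roughly eta * ||grad z||^2 ≈ 0.4 at the start, so the iterate crosses 150 within a few dozen steps and never comes back.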
3) Automatic differentiation with TensorFlow
import tensorflow as tf   # TensorFlow 1.x API

w = tf.Variable(3.0, dtype=tf.float64, name='w')
b = tf.Variable(4.0, dtype=tf.float64, name='b')
f = (2*w + 3*b)*(2*b + 1)
loss = tf.abs(f - 150.0)
# Manual-gradient alternative (tf.gradients returns a list, hence [0]):
#grads_w = tf.gradients(f, [w])
#grads_b = tf.gradients(f, [b])
#training_op1 = tf.assign(w, w - learning_rate*grads_w[0])
los = np.infty
training_op = tf.train.AdamOptimizer(0.0001).minimize(loss)
init = tf.global_variables_initializer()
with tf.Session() as sess:
    init.run()
    while los >= 1e-5:
        _, los = sess.run([training_op, loss])
    print("Final w, b", sess.run([w, b]))
    print(sess.run(loss))
    print(los)
#output:******************************************************
Final w, b [2.8489226246747603, 3.8490755625488835]
0.0002418711751772662
7.760580075455437e-06
In theory loss and los should be equal. The likely cause of the gap: los is fetched in the same sess.run call as training_op, so it reflects the loss in unspecified order relative to that run's variable update, while the final sess.run(loss) is evaluated after the last update has been applied. Gradient descent still struggles to reach an exact solution with this loss: the gradient of tf.abs(f - 150) has constant magnitude no matter how close f is to 150, so the optimizer keeps stepping across the target. Changing the loss to tf.square(f - 150) gives a much more precise solution. That is probably not because the squared loss is convex in (w, b) (f is bilinear, so in general it is not), but because it is smooth: its gradient shrinks to zero as f approaches 150, and the step sizes shrink with it.
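The effect can be reproduced without TensorFlow. With loss |f - 150| the gradient magnitude never decays, so fixed-rate steps hop back and forth across the target; with (f - 150)**2 the gradient vanishes at the target and the iterate settles. A plain-NumPy sketch of both (the learning rate and step counts are arbitrary choices):

```python
import numpy as np

def func_z(w, b):
    return (2*w + 3*b)*(2*b + 1)

def run(use_square, eta=1e-5, steps=2000):
    w, b = 3.0, 4.0
    for _ in range(steps):
        r = func_z(w, b) - 150.0
        y = 2*b + 1
        x = 2*w + 3*b
        # d|r|/dz = sign(r);  d(r^2)/dz = 2r
        coeff = 2*r if use_square else np.sign(r)
        w -= eta * coeff * 2*y            # chain rule through dz/dw
        b -= eta * coeff * (2*x + 3*y)    # chain rule through dz/db
    return abs(func_z(w, b) - 150.0)

print("abs loss, final error:   ", run(False))
print("square loss, final error:", run(True))
```

The absolute-value run stalls at an error on the order of eta * ||grad z||^2 (its oscillation amplitude), while the squared-loss run contracts the error geometrically toward zero.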