6.3 Loss Functions and Their Gradients
MSE
- $\text{loss} = \dfrac{1}{N}\sum_{i}\left[y_i - f_\theta(x_i)\right]^2$

Derivative:

$\dfrac{\partial\,\text{loss}}{\partial\theta} = \dfrac{2}{N}\sum_{i}\left[f_\theta(x_i) - y_i\right]\dfrac{\partial f_\theta(x_i)}{\partial\theta}$
import tensorflow as tf

x = tf.random.normal([1, 3])
w = tf.ones([3, 2])
b = tf.ones([2])
y = tf.constant([0., 1.])  # one-hot target; float so MSE's dtypes match

with tf.GradientTape() as tape:
    tape.watch([w, b])          # w, b are plain tensors, so watch them explicitly
    logits = tf.sigmoid(x @ w + b)   # sigmoid output, i.e. already a probability
    loss = tf.reduce_mean(tf.losses.MSE(y, logits))
grads = tape.gradient(loss, [w, b])
print('w grad:', grads[0])
print('b grad:', grads[1])
w grad: tf.Tensor(
[[-0.0309717 0.04595036]
[-0.1207474 0.17914379]
[ 0.01667333 -0.02473696]], shape=(3, 2), dtype=float32)
b grad: tf.Tensor([ 0.09684253 -0.14367795], shape=(2,), dtype=float32)
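As a sanity check, the same gradients can be derived by hand: with $p = \sigma(xw + b)$ and two outputs, the chain rule above gives $\partial\,\text{loss}/\partial b_j = (p_j - y_j)\,p_j(1 - p_j)$ and $\partial\,\text{loss}/\partial w = x^{\mathsf T}$ times that row. A minimal sketch, assuming TF 2.x (the seed is added here only so the check is reproducible):

import tensorflow as tf

tf.random.set_seed(42)
x = tf.random.normal([1, 3])
w = tf.ones([3, 2])
b = tf.ones([2])
y = tf.constant([0., 1.])

with tf.GradientTape() as tape:
    tape.watch([w, b])
    p = tf.sigmoid(x @ w + b)
    loss = tf.reduce_mean(tf.losses.MSE(y, p))
grads = tape.gradient(loss, [w, b])

# Chain rule: loss = (1/2) * sum_j (y_j - p_j)^2 and sigmoid' = p * (1 - p),
# so dloss/db_j = (p_j - y_j) * p_j * (1 - p_j) and dloss/dw = x^T @ db.
db = (p - y) * p * (1 - p)     # shape (1, 2)
dw = tf.transpose(x) @ db      # shape (3, 2)
print(tf.reduce_max(tf.abs(grads[1] - tf.squeeze(db, 0))))  # ≈ 0
print(tf.reduce_max(tf.abs(grads[0] - dw)))                 # ≈ 0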
Cross Entropy
softmax:
- converts logits into probability values (non-negative, summing to 1)
- enlarges the gaps between values: larger logits receive disproportionately more probability mass (see the sketch below)
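Both properties are easy to verify numerically; a quick sketch (the logit values are just illustrative):

import tensorflow as tf

a = tf.constant([2.0, 1.0, 0.1])
p = tf.nn.softmax(a)
print(p)                 # ~[0.659, 0.242, 0.099] — each in [0, 1]
print(tf.reduce_sum(p))  # 1.0 — a valid probability distribution
# The 2:1 ratio between the top two logits becomes roughly 2.7:1
# in probability space, i.e. softmax stretches the differences.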
Derivative of softmax:

With $p_i = \dfrac{e^{a_i}}{\sum_k e^{a_k}}$, the quotient rule gives:

$\dfrac{\partial p_i}{\partial a_j} = p_i\left(\delta_{ij} - p_j\right)$, i.e. $p_i(1 - p_i)$ when $i = j$ and $-p_i\,p_j$ when $i \neq j$.
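This closed form can be checked against autodiff via GradientTape.jacobian; a small sketch, assuming TF 2.x:

import tensorflow as tf

a = tf.constant([[2.0, 1.0, 0.1]])
with tf.GradientTape() as tape:
    tape.watch(a)
    p = tf.nn.softmax(a)
jac = tf.reshape(tape.jacobian(p, a), [3, 3])  # drop the batch axes

# Analytic form: dp_i/da_j = p_i * (delta_ij - p_j) = diag(p) - p p^T
p = tf.reshape(p, [3])
analytic = tf.linalg.diag(p) - tf.tensordot(p, p, axes=0)
print(tf.reduce_max(tf.abs(jac - analytic)))   # ≈ 0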
x = tf.random.normal([1, 3])
w = tf.random.normal([3, 2])
b = tf.random.normal([2])
y = tf.constant([0., 1.])  # one-hot target; float to match the logits

with tf.GradientTape() as tape:
    tape.watch([w, b])
    logits = x @ w + b      # raw scores; softmax is applied inside the loss
    loss = tf.reduce_mean(
        tf.losses.categorical_crossentropy(y, logits, from_logits=True))
grads = tape.gradient(loss, [w, b])
print('w grad:', grads[0])
print('b grad:', grads[1])
w grad: tf.Tensor(
[[ 0.23242529 -0.23242529]
[ 0.9089024 -0.9089024 ]
[ 0.58031267 -0.58031267]], shape=(3, 2), dtype=float32)
b grad: tf.Tensor([ 0.6567008 -0.6567008], shape=(2,), dtype=float32)
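Note how each column of w grad is the negative of the other. That is expected: combining the softmax Jacobian above with $\partial\,\text{CE}/\partial p_i = -y_i/p_i$ collapses the gradient with respect to the logits to $p - y$, and with two classes $(p_0 - y_0) = -(p_1 - y_1)$. A sketch verifying this under the same setup:

import tensorflow as tf

x = tf.random.normal([1, 3])
w = tf.random.normal([3, 2])
b = tf.random.normal([2])
y = tf.constant([0., 1.])

with tf.GradientTape() as tape:
    tape.watch([w, b])
    logits = x @ w + b
    loss = tf.reduce_mean(
        tf.losses.categorical_crossentropy(y, logits, from_logits=True))
grads = tape.gradient(loss, [w, b])

# dloss/dlogits = softmax(logits) - y, hence dw = x^T (p - y), db = p - y
# (batch size is 1, so the reduce_mean does not rescale anything).
p = tf.nn.softmax(logits)
db = p - y                     # shape (1, 2); its two entries sum to 0
dw = tf.transpose(x) @ db      # shape (3, 2)
print(tf.reduce_max(tf.abs(grads[1] - tf.squeeze(db, 0))))  # ≈ 0
print(tf.reduce_max(tf.abs(grads[0] - dw)))                 # ≈ 0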