Exploring the automatic differentiation mechanism of TensorFlow 2.2

This column explores the use of TensorFlow 2.2 on 64-bit macOS. In the first article of this column, the author listed several learning websites that present TensorFlow clearly; interested readers can explore them on their own. There are many ways to learn TensorFlow from different angles, so I won't go into details here.


  • Introduction

In the previous article of this column, when using TensorFlow to compute the derivative d/dx (x²) at x = 2,

We get the following output:

tf.Tensor(4.0, shape=(), dtype=float32) tf.Tensor(4.0, shape=(), dtype=float32)

We can see that this output contains the correct derivative, 4.0, but it is followed by two attributes: shape and dtype. How does TensorFlow perform derivative calculations, and what do these two attributes mean? The author describes this in detail in the following sections.

  • Deep dive

1.1 Tensors (Tensor)

TensorFlow uses the tensor (Tensor) as its basic unit of data; a tensor's most important properties are its shape and its type (dtype).

import tensorflow as tf
A = tf.constant([[1., 2., 3.], [4., 5., 6.]])
print(A)

Printing the tensor yields the following representation of its attributes:

tf.Tensor(
[[1. 2. 3.]
 [4. 5. 6.]], shape=(2, 3), dtype=float32)

Therefore, we can conclude that the value of the shape attribute is (number of rows, number of columns). TensorFlow's tensor is conceptually equivalent to a multidimensional array, and we can use it to describe mathematical scalars (0-dimensional arrays), vectors (1-dimensional arrays), matrices (2-dimensional arrays), and so on.

The tensor attribute dtype refers to the data type; float32 indicates a 32-bit floating-point data type.
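As a plain-Python sketch of what the shape attribute records (using a hypothetical helper, not part of TensorFlow), one can recover the dimensions of a nested list recursively:

```python
def nested_shape(x):
    """Return the shape of a regular nested list, e.g. (2, 3) for a 2x3 matrix."""
    if not isinstance(x, list):
        return ()                      # a scalar has shape ()
    return (len(x),) + nested_shape(x[0])

print(nested_shape(3.0))                           # scalar -> ()
print(nested_shape([1., 2., 3.]))                  # vector -> (3,)
print(nested_shape([[1., 2., 3.], [4., 5., 6.]]))  # matrix -> (2, 3)
```

This mirrors what tf.constant infers: a 0-dimensional input gives shape (), a flat list gives a 1-dimensional shape, and a list of rows gives (rows, columns).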

1.2 Mathematical operations (Operation)

1) Addition operation of Tensorflow2

Addition:
import tensorflow as tf
A = tf.constant([[1., 2., 3.], [4., 5., 6.]])
B = tf.constant([[7., 8., 9.], [1., 2., 3.]])
C = tf.add(A, B)    # compute the sum of matrices A and B
print(C)

Output:
tf.Tensor(
[[ 8. 10. 12.]
 [ 5.  7.  9.]], shape=(2, 3), dtype=float32)
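tf.add performs elementwise addition, so the result can be verified by hand in plain Python (a sketch without TensorFlow):

```python
A = [[1., 2., 3.], [4., 5., 6.]]
B = [[7., 8., 9.], [1., 2., 3.]]

# elementwise sum: C[i][j] = A[i][j] + B[i][j]
C = [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]
print(C)  # [[8.0, 10.0, 12.0], [5.0, 7.0, 9.0]]
```

Note that in TensorFlow 2 the overloaded operator A + B produces the same result as tf.add(A, B).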

2) Multiplication operation of Tensorflow2

Multiplication:
import tensorflow as tf
A = tf.constant([[1., 2., 3.], [4., 5., 6.]])
B = tf.constant([[7., 8.], [9., 1.], [2., 3.]])
C = tf.matmul(A, B) # compute the matrix product of A and B
print(C)

Output:
tf.Tensor(
[[31. 19.]
 [85. 55.]], shape=(2, 2), dtype=float32)
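The shapes follow the matrix-product rule: a (2, 3) matrix times a (3, 2) matrix yields a (2, 2) matrix. As a sanity check, here is the same product computed in plain Python (a sketch without TensorFlow):

```python
A = [[1., 2., 3.], [4., 5., 6.]]    # shape (2, 3)
B = [[7., 8.], [9., 1.], [2., 3.]]  # shape (3, 2)

# C[i][j] = sum over k of A[i][k] * B[k][j]
C = [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(2)]
     for i in range(2)]
print(C)  # [[31.0, 19.0], [85.0, 55.0]]
```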

2.1 Automatic differentiation mechanism

2.1.1 Differentiation mechanism

In machine learning, we often need to compute the derivatives of functions. TensorFlow provides a powerful automatic differentiation mechanism for this: it introduces tf.GradientTape(), a "gradient recorder", to implement automatic differentiation. Variables need an initialization step, so an initial value can be specified via the initial_value parameter of tf.Variable().

The following code shows how to use tf.GradientTape() to compute the derivative of the function y = x² at x = 4.

import tensorflow as tf
x = tf.Variable(initial_value=4.)
with tf.GradientTape() as tape:
    y = tf.square(x)
y_grad = tape.gradient(y, x)
print(y, y_grad)

Output:

tf.Tensor(16.0, shape=(), dtype=float32) tf.Tensor(8.0, shape=(), dtype=float32)
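The result matches the analytic derivative dy/dx = 2x = 8 at x = 4. It can also be checked numerically with a central finite difference (a plain-Python sketch, independent of TensorFlow):

```python
def f(x):
    return x ** 2

x, h = 4.0, 1e-5
# central difference approximates f'(x); it is exact for a quadratic
approx = (f(x + h) - f(x - h)) / (2 * h)
print(approx)  # ~ 8.0
```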

2.2 Computing partial derivatives

For complex functions, we often need to compute partial derivatives with respect to individual variables. In machine learning it is common to take partial derivatives of multivariate functions and to differentiate with respect to vectors or matrices, so TensorFlow's mechanism for computing partial derivatives is also very important.

Compute the partial derivatives of the function L(w, b) = ‖Xw + b − y‖² with respect to w and b at w = (2, 3)ᵀ and b = 2,

where X = [[1, 2], [3, 4]] and y = [[1], [2]].

import tensorflow as tf
X = tf.constant([[1., 2.], [3., 4.]])
y = tf.constant([[1.], [2.]])
w = tf.Variable(initial_value=[[2.], [3.]])
b = tf.Variable(initial_value=2.)
with tf.GradientTape() as tape:
    L = tf.reduce_sum(tf.square(tf.matmul(X, w) + b - y))
w_grad, b_grad = tape.gradient(L, [w, b])        # compute the partial derivatives of L(w, b) with respect to w and b
print(L, w_grad, b_grad)

Output:

tf.Tensor(405.0, shape=(), dtype=float32) tf.Tensor(
[[126.]
 [180.]], shape=(2, 1), dtype=float32) tf.Tensor(54.0, shape=(), dtype=float32)

As can be seen from the output, TensorFlow has computed for us:

L(w, b) = 405 at w = (2, 3)ᵀ, b = 2

∂L/∂w = (126, 180)ᵀ at w = (2, 3)ᵀ, b = 2

∂L/∂b = 54 at w = (2, 3)ᵀ, b = 2
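These values can be checked against the analytic gradients ∂L/∂w = 2Xᵀ(Xw + b − y) and ∂L/∂b = 2Σᵢ(Xw + b − y)ᵢ, here computed in plain Python as a sanity check (a sketch without TensorFlow):

```python
X = [[1., 2.], [3., 4.]]
y = [1., 2.]
w = [2., 3.]
b = 2.0

# residual r = Xw + b - y
r = [sum(X[i][k] * w[k] for k in range(2)) + b - y[i] for i in range(2)]
L = sum(ri ** 2 for ri in r)                                            # 405.0
w_grad = [2 * sum(X[i][k] * r[i] for i in range(2)) for k in range(2)]  # [126.0, 180.0]
b_grad = 2 * sum(r)                                                     # 54.0
print(L, w_grad, b_grad)
```

The residual Xw + b − y evaluates to (9, 18)ᵀ, from which L = 81 + 324 = 405 and the gradients above follow directly.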


Epilogue

This article introduced TensorFlow's automatic differentiation mechanism, moving from its theoretical expression to its implementation in code. These simple, clear mechanisms are the foundation for using TensorFlow to perform linear fitting or implement other algorithms.

PS: Corrections are welcome! If you found this chapter helpful, please follow, comment, and like!


Origin blog.csdn.net/zhaomengsen/article/details/130874984