The vector-Jacobian product in PyTorch autograd

autograd is a central piece of PyTorch, and the vector-Jacobian product is the key to understanding it.

Take a three-dimensional vector-valued function as an example:

X=\left[x_{1}, x_{2}, x_{3}\right], \quad Y=X^{2}

Because tensor operations in PyTorch are element-wise, this actually means:

Y=\left[y_{1}=x_{1}^{2}, y_{2}=x_{2}^{2}, y_{3}=x_{3}^{2}\right]

The derivative of Y with respect to X is not simply 2X but a Jacobian matrix (because X and Y are vectors, not scalars):

J=\left(\begin{array}{ccc} \frac{\partial y_{1}}{\partial x_{1}} & \frac{\partial y_{1}}{\partial x_{2}} & \frac{\partial y_{1}}{\partial x_{3}} \\ \frac{\partial y_{2}}{\partial x_{1}} & \frac{\partial y_{2}}{\partial x_{2}} & \frac{\partial y_{2}}{\partial x_{3}} \\ \frac{\partial y_{3}}{\partial x_{1}} & \frac{\partial y_{3}}{\partial x_{2}} & \frac{\partial y_{3}}{\partial x_{3}} \end{array}\right)=\left(\begin{array}{ccc} 2 x_{1} & 0 & 0 \\ 0 & 2 x_{2} & 0 \\ 0 & 0 & 2 x_{3} \end{array}\right)

Here y_{1}=f_{1}\left(x_{1}, x_{2}, x_{3}\right)=x_{1}^{2} is formally a function of \left(x_{1}, x_{2}, x_{3}\right), not just of x_{1}; that it happens to depend only on x_{1} is a consequence of the element-wise operation, and the same holds for y_{2} and y_{3}.
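As a quick sanity check of this diagonal structure, the full Jacobian can be built explicitly with torch.autograd.functional.jacobian (a minimal sketch with arbitrary example values, not part of the original experiments):

import torch

# Sketch: build the full Jacobian of y = x**2 explicitly.
x = torch.tensor([1.0, 2.0, 3.0])          # arbitrary example values
J = torch.autograd.functional.jacobian(lambda t: t ** 2, x)
print(J)
# tensor([[2., 0., 0.],
#         [0., 4., 0.],
#         [0., 0., 6.]])
# The diagonal entries are 2*x_i and everything else is zero,
# exactly as the element-wise analysis above predicts.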

What y.backward(v) writes into each x_{j}.grad is not the Jacobian itself but the vector-Jacobian product v^{\top} J: the partial derivatives \frac{\partial y_{i}}{\partial x_{j}} of the component functions (i = 1, 2, 3), accumulated along the direction v, i.e. \sum_{i} v_{i} \frac{\partial y_{i}}{\partial x_{j}}. In the experiments below the natural choice is v = (1, 1, 1); since Y is not a scalar, PyTorch requires v to be passed explicitly.

You can also pass in a different direction, which is the externally fed gradient ("feed external gradient") mentioned in the official documentation.

It can be understood as the projection of the vector of partial derivatives onto the direction v;

or, equivalently, as a weighted sum of the partial derivatives of the component functions, with v supplying the weights (this reading is sketched in code below).

Once v is determined, the weight attached to each component y_{i} is determined and is the same for every input x_{j}.
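A small sketch of the "weighted sum of per-component gradients" reading (the values and variable names here are my own, chosen for illustration): compute the gradient of each y_{i} separately with torch.autograd.grad, combine them with the weights v_{i}, and compare with a single backward(v) call.

import torch

# Sketch: backward(v) as a weighted sum of per-component gradients.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2
v = torch.tensor([3.0, 2.0, 1.0])   # an arbitrary direction v

# Gradient of each component y_i on its own (retain_graph keeps the graph alive).
grads = [torch.autograd.grad(y[i], x, retain_graph=True)[0] for i in range(3)]

# Weighted sum  sum_i v_i * grad(y_i) ...
weighted = sum(v[i] * grads[i] for i in range(3))

# ... equals what a single backward(v) call accumulates into x.grad.
y.backward(v)
print(weighted)   # tensor([6., 8., 6.])
print(x.grad)     # tensor([6., 8., 6.])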

Let's verify this with experiments.

One: simple implicit Jacobian

>>> x = torch.randn(3, requires_grad = True)
>>> x
tensor([-0.9238,  0.4353, -1.3626], requires_grad=True)
>>> y = x**2
>>> y.backward(torch.ones(3))
>>> x.grad
tensor([-1.8476,  0.8706, -2.7252])
>>> x
tensor([-0.9238,  0.4353, -1.3626], requires_grad=True)
>>>
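The same result can be obtained without touching x.grad by calling torch.autograd.grad with an explicit grad_outputs argument, which plays the role of v (a sketch reusing the values printed above):

import torch

# Sketch: the same vector-Jacobian product via torch.autograd.grad.
x = torch.tensor([-0.9238, 0.4353, -1.3626], requires_grad=True)
y = x ** 2

# grad_outputs plays the role of v; here v = (1, 1, 1), as in the run above.
(g,) = torch.autograd.grad(y, x, grad_outputs=torch.ones(3))
print(g)   # tensor([-1.8476,  0.8706, -2.7252]) -- matches x.grad above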

Two: explicit Jacobian verification

>>> x1=torch.tensor(1, requires_grad=True, dtype = torch.float)
>>> x2=torch.tensor(2, requires_grad=True, dtype = torch.float)
>>> x3=torch.tensor(3, requires_grad=True, dtype = torch.float)
>>> y=torch.randn(3) # placeholder tensor; its entries are overwritten below to define the vector function
>>> y[0]=x1**2+2*x2+x3 # define each component function
>>> y[1]=x1+x2**3+x3**2
>>> y[2]=2*x1+x2**2+x3**3
>>> y.backward(torch.ones(3))
>>> x1.grad
tensor(5.)
>>> x2.grad
tensor(18.)
>>> x3.grad
tensor(34.)

The Jacobian matrix in the above code is: J=\left(\begin{array}{ccc} 2 x_{1} & 2 & 1 \\ 1 & 3 x_{2}^{2} & 2 x_{3} \\ 2 & 2 x_{2} & 3 x_{3}^{2} \end{array}\right)

The component functions are: \left\{\begin{array}{l} y_{1}=x_{1}^{2}+2 x_{2}+x_{3} \\ y_{2}=x_{1}+x_{2}^{3}+x_{3}^{2} \\ y_{3}=2 x_{1}+x_{2}^{2}+x_{3}^{3} \end{array}\right.

Projection direction: v = (1,1,1)

v^{\top} J=\left[2 x_{1}+1+2,\; 2+3 x_{2}^{2}+2 x_{2},\; 1+2 x_{3}+3 x_{3}^{2}\right]=[5,18,34] (evaluated at x_{1}=1, x_{2}=2, x_{3}=3), so the code results and the hand analysis confirm each other.
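The same check can be done in code by building the full Jacobian with torch.autograd.functional.jacobian and multiplying by v on the left (a sketch; the wrapper f below is my own packaging of the three component functions):

import torch

# Sketch: verify v^T J for the explicit example.
def f(x):
    # the three component functions from the text, x = (x1, x2, x3)
    x1, x2, x3 = x[0], x[1], x[2]
    return torch.stack([x1**2 + 2*x2 + x3,
                        x1 + x2**3 + x3**2,
                        2*x1 + x2**2 + x3**3])

x = torch.tensor([1.0, 2.0, 3.0])
J = torch.autograd.functional.jacobian(f, x)
print(J)
# tensor([[ 2.,  2.,  1.],
#         [ 1., 12.,  6.],
#         [ 2.,  4., 27.]])

v = torch.ones(3)
print(v @ J)   # tensor([ 5., 18., 34.]) -- matches x1.grad, x2.grad, x3.grad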

 

Three: projection onto a different direction, v = (3, 2, 1)

Hand analysis first:

v^{\top} J=\left[3 \cdot 2 x_{1}+2 \cdot 1+1 \cdot 2,\; 3 \cdot 2+2 \cdot 3 x_{2}^{2}+1 \cdot 2 x_{2},\; 3 \cdot 1+2 \cdot 2 x_{3}+1 \cdot 3 x_{3}^{2}\right] = [10,34,42] (at x_{1}=1, x_{2}=2, x_{3}=3)

 

Code verification:

>>> x1=torch.tensor(1, requires_grad=True, dtype = torch.float)
>>> x2=torch.tensor(2, requires_grad=True, dtype = torch.float)
>>> x3=torch.tensor(3, requires_grad=True, dtype = torch.float)
>>> y=torch.randn(3)
>>> y[0]=x1**2+2*x2+x3
>>> y[1]=x1+x2**3+x3**2
>>> y[2]=2*x1+x2**2+x3**3
>>> v=torch.tensor([3,2,1],dtype=torch.float)
>>> y.backward(v)
>>> x1.grad
tensor(10.)
>>> x2.grad
tensor(34.)
>>> x3.grad
tensor(42.)

A perfect match: the code results agree with the hand analysis.
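For completeness, recent PyTorch versions also expose the vector-Jacobian product directly as torch.autograd.functional.vjp; a sketch reproducing the v = (3, 2, 1) case (again using my own wrapper f around the three component functions):

import torch

# Sketch: the same result via torch.autograd.functional.vjp,
# which returns the function value and v^T J in one call.
def f(x):
    x1, x2, x3 = x[0], x[1], x[2]
    return torch.stack([x1**2 + 2*x2 + x3,
                        x1 + x2**3 + x3**2,
                        2*x1 + x2**2 + x3**3])

x = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([3.0, 2.0, 1.0])

y, vjp = torch.autograd.functional.vjp(f, x, v)
print(y)     # tensor([ 8., 18., 33.])  -- the component values y1, y2, y3
print(vjp)   # tensor([10., 34., 42.])  -- matches x1.grad, x2.grad, x3.grad above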


Original post: blog.csdn.net/qq_32146369/article/details/105372307