Autograd is an important part of PyTorch, and the vector-Jacobian product (vJP) is the key to understanding it.
Take the three-dimensional vector-valued function $Y = X^2$ as an example, with $X = (x_1, x_2, x_3)$ and $Y = (y_1, y_2, y_3)$.
Because tensor operations are element-wise, this actually means:

$y_1 = x_1^2, \quad y_2 = x_2^2, \quad y_3 = x_3^2$
The derivative of $Y$ with respect to $X$ is therefore not $2X$ but a Jacobian matrix (because $X$ and $Y$ are vectors, not one-dimensional real numbers):

$J = \begin{pmatrix} \partial y_1/\partial x_1 & \partial y_1/\partial x_2 & \partial y_1/\partial x_3 \\ \partial y_2/\partial x_1 & \partial y_2/\partial x_2 & \partial y_2/\partial x_3 \\ \partial y_3/\partial x_1 & \partial y_3/\partial x_2 & \partial y_3/\partial x_3 \end{pmatrix} = \begin{pmatrix} 2x_1 & 0 & 0 \\ 0 & 2x_2 & 0 \\ 0 & 0 & 2x_3 \end{pmatrix}$
Here $y_1$ is in principle a function of all of $x_1, x_2, x_3$, not just of $x_1$; the off-diagonal zeros are a particularity of the element-wise operation. The same holds for $y_2$ and $y_3$.
What `backward(v)` computes for each input component $x_j$ is not the full Jacobian but the accumulation of the partial derivatives $\partial y_i/\partial x_j$, $i = 1, 2, 3$, along the direction $v$, i.e. the vector-Jacobian product $v^T J$. For a scalar output the default $v$ is $1$; for a vector output $v$ must be passed explicitly, and the conventional choice (used below) is $v = (1, 1, 1)$.
You can also pass in a different direction $v$, which is exactly the external gradient that the official documentation says is easy to feed into `backward`.
The result can be understood as the projection of the partial-derivative vector $(\partial y_1/\partial x_j, \partial y_2/\partial x_j, \partial y_3/\partial x_j)$ onto the direction $v$; equivalently, as a weighted sum of the partial derivatives of the component functions, with $v$ supplying the weights:

$x_j.\mathrm{grad} = \sum_{i} v_i \, \frac{\partial y_i}{\partial x_j}$

Once $v$ is fixed, the weights are the same for every $x_j$.
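To make this concrete, here is a minimal sketch (assuming a PyTorch version that provides torch.autograd.functional.jacobian, i.e. 1.5 or later) that builds the Jacobian explicitly and checks that backward(v) returns exactly $v^T J$:

import torch

def f(x):
    return x ** 2  # element-wise, so the Jacobian is diagonal

x = torch.randn(3, requires_grad=True)
v = torch.ones(3)

J = torch.autograd.functional.jacobian(f, x)  # full 3x3 Jacobian
y = f(x)
y.backward(v)

print(torch.allclose(x.grad, v @ J))  # True: x.grad is exactly v^T J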
Now let's experiment:
One: a simple implicit Jacobian
>>> x = torch.randn(3, requires_grad = True)
>>> x
tensor([-0.9238, 0.4353, -1.3626], requires_grad=True)
>>> y = x**2
>>> y.backward(torch.ones(3))
>>> x.grad
tensor([-1.8476, 0.8706, -2.7252])
>>> x
tensor([-0.9238, 0.4353, -1.3626], requires_grad=True)
As expected, x.grad is exactly $2X$: the Jacobian is diagonal, so with $v = (1, 1, 1)$ each component receives $x_i.\mathrm{grad} = 2x_i$. Note that x itself is unchanged by the backward pass.
Two: a simple explicit Jacobian verification
>>> x1=torch.tensor(1, requires_grad=True, dtype = torch.float)
>>> x2=torch.tensor(2, requires_grad=True, dtype = torch.float)
>>> x3=torch.tensor(3, requires_grad=True, dtype = torch.float)
>>> y=torch.randn(3) # placeholder vector to hold the component functions
>>> y[0]=x1**2+2*x2+x3 # define each component function
>>> y[1]=x1+x2**3+x3**2
>>> y[2]=2*x1+x2**2+x3**3
>>> y.backward(torch.ones(3))
>>> x1.grad
tensor(5.)
>>> x2.grad
tensor(18.)
>>> x3.grad
tensor(34.)
The Jacobian matrix in the above code is:

$J = \begin{pmatrix} 2x_1 & 2 & 1 \\ 1 & 3x_2^2 & 2x_3 \\ 2 & 2x_2 & 3x_3^2 \end{pmatrix} = \begin{pmatrix} 2 & 2 & 1 \\ 1 & 12 & 6 \\ 2 & 4 & 27 \end{pmatrix} \quad \text{at } (x_1, x_2, x_3) = (1, 2, 3)$

The component functions are:

$y_1 = x_1^2 + 2x_2 + x_3, \quad y_2 = x_1 + x_2^3 + x_3^2, \quad y_3 = 2x_1 + x_2^2 + x_3^3$

Projection direction: $v = (1, 1, 1)$, so

$v^T J = (2 + 1 + 2, \; 2 + 12 + 4, \; 1 + 6 + 27) = (5, 18, 34)$

which is exactly what the code printed: the results and the analysis confirm each other.
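As a cross-check, here is a short sketch (again assuming torch.autograd.functional.jacobian is available) that recomputes this Jacobian numerically:

import torch

def f(x):
    x1, x2, x3 = x
    return torch.stack([x1**2 + 2*x2 + x3,
                        x1 + x2**3 + x3**2,
                        2*x1 + x2**2 + x3**3])

x = torch.tensor([1.0, 2.0, 3.0])
J = torch.autograd.functional.jacobian(f, x)
print(J)                  # [[2, 2, 1], [1, 12, 6], [2, 4, 27]]
print(torch.ones(3) @ J)  # tensor([ 5., 18., 34.])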
Three: projecting onto a different direction
Prior analysis: with $v = (3, 2, 1)$, the expected result is

$v^T J = (3 \cdot 2 + 2 \cdot 1 + 1 \cdot 2, \; 3 \cdot 2 + 2 \cdot 12 + 1 \cdot 4, \; 3 \cdot 1 + 2 \cdot 6 + 1 \cdot 27) = (10, 34, 42)$
Code verification:
>>> x1=torch.tensor(1, requires_grad=True, dtype = torch.float)
>>> x2=torch.tensor(2, requires_grad=True, dtype = torch.float)
>>> x3=torch.tensor(3, requires_grad=True, dtype = torch.float)
>>> y=torch.randn(3)
>>> y[0]=x1**2+2*x2+x3
>>> y[1]=x1+x2**3+x3**2
>>> y[2]=2*x1+x2**2+x3**3
>>> v=torch.tensor([3,2,1],dtype=torch.float)
>>> y.backward(v)
>>> x1.grad
tensor(10.)
>>> x2.grad
tensor(34.)
>>> x3.grad
tensor(42.)
The output matches the prior analysis exactly.
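Finally, a sketch of a natural consequence of the weighted-sum view: choosing $v$ as a one-hot vector extracts a single row of the Jacobian, so the full matrix could be assembled from three backward passes (this loop is an illustration, not a dedicated PyTorch API):

import torch

for i in range(3):
    x1 = torch.tensor(1.0, requires_grad=True)
    x2 = torch.tensor(2.0, requires_grad=True)
    x3 = torch.tensor(3.0, requires_grad=True)
    y = torch.stack([x1**2 + 2*x2 + x3,
                     x1 + x2**3 + x3**2,
                     2*x1 + x2**2 + x3**3])
    v = torch.zeros(3)
    v[i] = 1.0  # one-hot: selects row i of J
    y.backward(v)
    print(x1.grad.item(), x2.grad.item(), x3.grad.item())
# 2.0 2.0 1.0
# 1.0 12.0 6.0
# 2.0 4.0 27.0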