1. x*y is element-wise multiplication, also called the Hadamard product; x**y is likewise element-wise (each element of x raised to the power of the corresponding element of y)
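A quick sketch of both operators (plain PyTorch, nothing project-specific):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([4.0, 5.0, 6.0])

print(x * y)   # element-wise (Hadamard) product: tensor([ 4., 10., 18.])
print(x ** y)  # element-wise power: tensor([  1.,  32., 729.])
```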
2. torch.cat with dim=0 stacks tensors along rows (the result has more rows); with dim=1 it stacks along columns (the result has more columns)
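A minimal example of the two concatenation directions:

```python
import torch

A = torch.arange(6).reshape(2, 3)
B = torch.arange(6).reshape(2, 3)

rows = torch.cat((A, B), dim=0)  # stack along rows -> shape (4, 3)
cols = torch.cat((A, B), dim=1)  # stack along columns -> shape (2, 6)
print(rows.shape, cols.shape)
```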
3. print(z.sum())  # the sum of all values collapses to a single scalar; print(z.numel())  # the number of elements
The mean over all elements, z.mean(), equals z.sum() / z.numel()
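Checking that identity on a small tensor:

```python
import torch

z = torch.arange(12, dtype=torch.float32).reshape(3, 4)
print(z.sum())    # sum of all 12 values -> tensor(66.)
print(z.numel())  # number of elements -> 12
print(z.mean(), z.sum() / z.numel())  # both tensor(5.5000)
```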
4. Matrix norm: the matrix is flattened into a vector and its Euclidean length is computed (the Frobenius norm)
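A sketch of that "flatten, then measure length" view of the Frobenius norm:

```python
import torch

A = torch.tensor([[3.0, 4.0], [0.0, 0.0]])
# Frobenius norm = Euclidean length of the flattened matrix: sqrt(9 + 16) = 5
print(torch.norm(A))              # tensor(5.)
print(torch.norm(A.reshape(-1)))  # same value via explicit flattening
```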
5. A matrix is symmetric when it equals its transpose: A == A.T
6. sum() drops the summed dimension; to keep it, pass keepdim=True to sum() (NumPy spells it keepdims)
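A small demonstration of the dropped vs. kept dimension (keeping it is what makes broadcasting like A / k work):

```python
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
s = A.sum(dim=1)                # shape (2,): the summed dimension is dropped
k = A.sum(dim=1, keepdim=True)  # shape (2, 1): dimension kept for broadcasting
print(s.shape, k.shape)
print(A / k)                    # broadcasts: each row divided by its own sum
```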
7. A.cumsum computes a cumulative (running) sum along a dimension, which is interesting
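For example, the running totals of [1, 2, 3, 4]:

```python
import torch

A = torch.arange(1, 5)   # tensor([1, 2, 3, 4])
print(A.cumsum(dim=0))   # running totals: tensor([ 1,  3,  6, 10])
```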
8. The inner product torch.dot multiplies the two vectors element by element and then adds everything up, returning a scalar
9. torch.mv is the matrix-vector product
10. A 1-D tensor in torch behaves like a row vector; a true column vector must be represented as a matrix, i.e. a 2-D tensor of shape (n, 1)
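The three points above in one sketch: dot on 1-D tensors, mv for matrix times vector, and reshaping to get an explicit column vector.

```python
import torch

x = torch.arange(4, dtype=torch.float32)  # 1-D tensor, shape (4,)
w = torch.ones(4)
print(torch.dot(x, w))                    # (x * w).sum() -> tensor(6.)

A = torch.arange(8, dtype=torch.float32).reshape(2, 4)
print(torch.mv(A, x))                     # matrix-vector product -> tensor([14., 38.])

col = x.reshape(-1, 1)                    # explicit column vector: a (4, 1) matrix
print(col.shape)
```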
11. At a point where a function is not differentiable, a subderivative can be any value between the left and right derivatives
12. <x,w> denotes the inner product of x and w
13.
x = torch.arange(4)
y1 = 2 * torch.dot(x, x)  # y1 is a scalar: x dotted with itself, then doubled -> 28
y2 = x * x  # y2 multiplies x by itself element-wise -> tensor([0, 1, 4, 9])
y2 applies a non-linear function to each element of the vector, but each output element depends only on its own input element; so taking sum() first and then backward gives the same per-element partial derivatives as differentiating each element on its own, which is why backward can still be called on the summed scalar
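That reasoning can be checked directly: for y = x * x, the gradient of y.sum() is 2x, exactly the per-element derivatives.

```python
import torch

x = torch.arange(4.0, requires_grad=True)
y = x * x            # element-wise; each y[i] depends only on x[i]
y.sum().backward()   # backward needs a scalar, so sum first
print(x.grad)        # tensor([0., 2., 4., 6.]) == 2 * x
```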
14. With print(..., end='\r') nothing seems to be printed until you press Enter once: '\r' (carriage return) only moves the cursor back to the start of the line without a newline, and its behavior differs between Linux and Win10 terminals
15. Detaching from the computation graph: u = y.detach() returns a new variable u with the same value as y, but it discards all information about how y was computed in the graph. In other words, gradients do not flow backwards through u to x; at u the gradient is cut off.
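A small check of the cut-off: below, u has the values of x**2 but is treated as a constant, so the gradient of (u * x).sum() is u itself rather than 3 * x**2.

```python
import torch

x = torch.arange(4.0, requires_grad=True)
y = x * x
u = y.detach()       # same values as y, but cut off from the graph
z = u * x
z.sum().backward()
print(x.grad)        # tensor([0., 1., 4., 9.]) == u, not 3 * x**2
```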
16. I still can't understand how gradients are computed through Python control flow (loops and branches)
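One way to see it: autograd does not differentiate the Python code itself, it records only the tensor operations that actually ran. In the sketch below (a fixed input swapped in for the usual random one), whatever path the loop and branch take, f ends up multiplying a by some constant k, so f is linear in a and the gradient must equal d / a.

```python
import torch

def f(a):
    b = a * 2
    while b.norm() < 1000:   # how many iterations run depends on a's value
        b = b * 2
    if b.sum() > 0:          # which branch is taken also depends on a
        c = b
    else:
        c = 100 * b
    return c

a = torch.tensor(1.0, requires_grad=True)
d = f(a)
d.backward()
# the recorded operations amount to d = k * a for some constant k,
# so the gradient is exactly d / a
print(a.grad == d / a)       # tensor(True)
```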