These are just notes for my own reference; feel free to skip.
Matrix multiplication (x*w1)
# numpy
h = x.dot(w1)
# torch
h = x.mm(w1)
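A minimal sketch of the matrix product, using small hypothetical arrays (the shapes here are my own example, not from the original); the torch call x.mm(w1) produces the same result:

```python
import numpy as np

# Hypothetical shapes: 2 samples with 3 features each, hidden size 4.
x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
w1 = np.ones((3, 4))

h = x.dot(w1)   # matrix product; torch equivalent: h = x.mm(w1)
print(h.shape)  # (2, 4)
print(h[0, 0])  # 1 + 2 + 3 = 6.0
```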
Elements greater than 0 are kept and elements less than 0 are set to 0: the effect of the ReLU activation function
# numpy
h_relu = np.maximum(h, 0)
# torch
h_relu = h.clamp(min=0)
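A quick check of the ReLU behavior on a tiny made-up array; h.clamp(min=0) in torch gives the same values:

```python
import numpy as np

h = np.array([[-1.0, 2.0],
              [3.0, -4.0]])

# Negative entries become 0, positive entries pass through.
h_relu = np.maximum(h, 0)   # torch equivalent: h.clamp(min=0)
print(h_relu.tolist())      # [[0.0, 2.0], [3.0, 0.0]]
```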
Subtract two arrays, then sum the squares of each element of the difference (the squared-error loss)
# numpy
loss = np.square(y_pred - y).sum()
# torch
loss = (y_pred - y).pow(2).sum().item()
Note that in torch you also need to call .item() to extract a plain Python number from the resulting 0-dimensional tensor
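A small worked example of the squared-error sum, with made-up values; in torch the same computation is (y_pred - y).pow(2).sum().item():

```python
import numpy as np

y_pred = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 0.0, 5.0])

# Element-wise difference, squared, then summed to a scalar.
loss = np.square(y_pred - y).sum()  # torch: (y_pred - y).pow(2).sum().item()
print(loss)                         # 0 + 4 + 4 = 8.0
```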
Take the transpose of the matrix
# numpy
grad_w2 = h_relu.T.dot(grad_y_pred)
# torch
grad_w2 = h_relu.t().mm(grad_y_pred)
In torch, .t() is used for the transpose
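A sketch of the transpose-then-multiply step with hypothetical shapes (chosen here only for illustration); the torch version h_relu.t().mm(grad_y_pred) is equivalent:

```python
import numpy as np

h_relu = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])        # shape (3, 2)
grad_y_pred = np.ones((3, 2))          # shape (3, 2)

# Transposing h_relu to (2, 3) makes the shapes line up for the product.
grad_w2 = h_relu.T.dot(grad_y_pred)    # torch: h_relu.t().mm(grad_y_pred)
print(grad_w2.shape)                   # (2, 2)
print(grad_w2[0, 0])                   # 1 + 3 + 5 = 9.0
```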
Copy array
# numpy
grad_h = grad_h_relu.copy()
# torch
grad_h = grad_h_relu.clone()
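A small demonstration that copy() (clone() in torch) yields an independent array, so mutating the copy leaves the original untouched; the values are made up for illustration:

```python
import numpy as np

grad_h_relu = np.array([1.0, 2.0])

grad_h = grad_h_relu.copy()  # torch equivalent: grad_h_relu.clone()
grad_h[0] = 99.0             # modify the copy only

print(grad_h_relu[0])        # 1.0, the original is unchanged
print(grad_h[0])             # 99.0
```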