Learn PyTorch by Example

A simple understanding of gradient descent

Gradient descent is widely used in machine learning. Whether in linear regression or logistic regression, its main purpose is to iteratively find, or converge to, the minimum value of the objective function.


The basic process of gradient descent is much like the scene of walking down a mountain.

Suppose we have a differentiable function that represents a mountain; our goal is to find the function's minimum value, which corresponds to the bottom of the mountain. Following the downhill analogy, the fastest way down is to find the steepest direction at the current position and take a step that way. For the function, this means computing the gradient at the given point and then moving in the direction opposite to the gradient. This makes the function value decrease fastest, because the gradient points in the direction in which the function increases fastest.
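
To make this concrete, here is a minimal one-dimensional sketch (added for illustration, not from the original post): minimize f(x) = x^2, whose gradient is 2x, by repeatedly stepping opposite the gradient.

# Minimal 1-D gradient descent sketch: minimize f(x) = x^2 (gradient 2x)
x = 5.0                          # arbitrary starting point
learning_rate = 0.1
for step in range(50):
    grad = 2 * x                 # gradient of f at the current point
    x -= learning_rate * grad    # step in the direction opposite the gradient
print(x)                         # close to 0, the minimizer of f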
 

Warming up: NumPy

Before introducing PyTorch, we will first implement the network using numpy.

Numpy provides an n-dimensional array object, and many functions for manipulating these arrays. Numpy is a general framework for scientific computing. It doesn't know anything about computational graphs, deep learning, or gradients. However, we can easily use numpy to fit a third order polynomial to the sine function by manually implementing the forward and backward passes of the network using numpy operations:

# -*- coding: utf-8 -*-
import numpy as np
import math

# Create random input and output data
x = np.linspace(-math.pi, math.pi, 2000)
y = np.sin(x)

# Randomly initialize weights
a = np.random.randn()
b = np.random.randn()
c = np.random.randn()
d = np.random.randn()

learning_rate = 1e-6
for t in range(2000):   # As the number of iterations increases, the loss keeps shrinking, but its change approaches zero
    # Forward pass: compute predicted y
    # y = a + b x + c x^2 + d x^3
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss
    # numpy.square() returns a new array whose elements are the squares of the source array's elements; the source array is unchanged
    loss = np.square(y_pred - y).sum()
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of a, b, c, d with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update weights
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f'Result: y = {a} + {b} x + {c} x^2 + {d} x^3')

Running the script prints the loss every 100 iterations (it keeps shrinking as training proceeds) and finishes by printing the fitted polynomial coefficients.

PyTorch: Tensors

Numpy is a great framework, but it doesn't take advantage of the GPU to accelerate its numerical calculations. For modern deep neural networks, GPUs typically provide speedups of 50x or more, so unfortunately numpy is not sufficient for modern deep learning.

Here, we introduce the most fundamental PyTorch concept: tensors. PyTorch tensors are conceptually the same as numpy arrays: tensors are n-dimensional arrays, and PyTorch provides a number of functions that operate on these tensors. Behind the scenes, tensors keep track of computational graphs and gradients, but they also serve as general tools for scientific computing.

Unlike numpy, PyTorch tensors can leverage the GPU to accelerate their numerical computations. To run PyTorch tensors on the GPU, you just need to specify the correct device.
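
For example, a common pattern (shown here as a sketch, not part of the original code) is to pick the device at runtime depending on whether a CUDA GPU is available:

import torch

# Fall back to the CPU when no GPU is present
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.linspace(-3.0, 3.0, 10, device=device)  # tensor created directly on that device
print(x.device)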

Here we fit a third order polynomial to a sine function using PyTorch tensors. Like the numpy example above, we need to manually implement the forward and backward passes through the network:

# -*- coding: utf-8 -*-

import torch
import math

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU

# Create random input and output data
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Randomly initialize weights
a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of a, b, c, d with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update weights using gradient descent
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')

Tensors and Autograd

Here we again fit a third-order polynomial to predict y = sin(x) from -π to π, trained by minimizing the squared Euclidean distance.

This implementation uses PyTorch tensor operations for forward propagation and PyTorch Autograd for computing gradients.

A PyTorch tensor represents a node in a computational graph. If x is a tensor with x.requires_grad=True, then x.grad is another tensor holding the gradient of some scalar value (for example, the loss) with respect to x.
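
As a minimal illustration (a sketch added here, not part of the original post), autograd can compute the derivative of a simple scalar expression:

import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3           # y = x^3, so dy/dx = 3 * x^2
y.backward()         # populates x.grad with dy/dx evaluated at x = 2
print(x.grad)        # tensor(12.)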

import torch
import math

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on GPU

# Create Tensors to hold input and outputs.
# By default, requires_grad=False, which indicates that we do not need to
# compute gradients with respect to these Tensors during the backward pass.
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Create random Tensors for weights. For a third order polynomial, we need
# 4 weights: y = a + b x + c x^2 + d x^3
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
a = torch.randn((), device=device, dtype=dtype, requires_grad=True)
b = torch.randn((), device=device, dtype=dtype, requires_grad=True)
c = torch.randn((), device=device, dtype=dtype, requires_grad=True)
d = torch.randn((), device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y using operations on Tensors.
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss using operations on Tensors.
    # Now loss is a zero-dimensional (scalar) Tensor
    # loss.item() gets the scalar value held in the loss.
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call a.grad, b.grad. c.grad and d.grad will be Tensors holding
    # the gradient of the loss with respect to a, b, c, d respectively.
    loss.backward()

    # Manually update weights using gradient descent. Wrap in torch.no_grad()
    # because weights have requires_grad=True, but we don't need to track this
    # in autograd.
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        # Manually zero the gradients after updating weights
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None

print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')

Use torch.nn

As before, we fit a third-order polynomial, trained by minimizing the squared Euclidean distance, to predict y = sin(x) from -π to π.

This implementation uses PyTorch's nn package to build the network. PyTorch autograd makes it easy to define computational graphs and compute gradients, but raw autograd can be too low-level for defining large, complex neural networks. This is where the nn package helps: it defines a set of Modules, which you can think of as neural network layers that take an input, produce an output, and may hold some trainable weights.
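
As a quick illustration of a single module (a sketch with a toy batch, not part of the original example), nn.Linear holds a weight matrix and a bias and maps each input row to an output:

import torch

layer = torch.nn.Linear(3, 1)    # weight of shape (1, 3), bias of shape (1,)
inp = torch.randn(5, 3)          # a batch of 5 samples with 3 features each
out = layer(inp)                 # output of shape (5, 1): one value per sample
print(out.shape, layer.weight.shape, layer.bias.shape)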

import torch
import math

# Create Tensors to hold input and outputs
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Prepare the input tensor (x, x^2, x^3):
# x.unsqueeze(-1) has shape (2000, 1), and pow(p) broadcasts it to shape (2000, 3)
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

# Define the model as a sequence of layers. The Linear module computes its output
# from the input using a weight and a bias; the Flatten layer flattens the output
# of the linear layer to a 1D tensor, to match the shape of y.
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)

# The nn package also contains definitions of popular loss functions; here
# we use Mean Squared Error (MSE), summed over all elements, as our loss.
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y by passing xx to the model
    y_pred = model(xx)

    # Compute and print loss
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero the gradients before running the backward pass
    model.zero_grad()

    # Backward pass: compute the gradient of the loss with respect to all
    # learnable parameters of the model
    loss.backward()

    # Update the weights using gradient descent. Each parameter is a Tensor,
    # so we can access its gradient as before.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

linear_layer = model[0]
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')

Use optim

Continuing the example above:

Instead of manually updating the model's weights as before, let's use the optim package to define an optimizer that will update the weights for us. The optim package defines many optimization algorithms commonly used in deep learning, including SGD + momentum, RMSprop, Adam, and others.

import torch
import math

# Create Tensors to hold input and outputs
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# tensor (x, x^2, x^3)
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)

loss_fn = torch.nn.MSELoss(reduction='sum')

# Use the optim package to define an Optimizer that will update the weights
# of the model for us. Here we use RMSprop; the first argument tells the
# optimizer which Tensors it should update.
learning_rate = 1e-3
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)
for t in range(2000):
    y_pred = model(xx)
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Before the backward pass, use the optimizer to zero all of the
    # gradients of the Tensors it will update (the learnable weights)
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to parameters
    loss.backward()

    # Calling step() updates the parameters using the computed gradients
    optimizer.step()

linear_layer = model[0]
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')
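
Because the update logic lives entirely in the optimizer object, trying a different algorithm from the optim package is mostly a one-line change. A minimal sketch (not from the original post), reusing the same model architecture but with Adam instead of RMSprop:

import torch

# Same model architecture as above; only the optimizer construction changes.
model = torch.nn.Sequential(torch.nn.Linear(3, 1), torch.nn.Flatten(0, 1))

# Adam (or SGD with momentum) plugs into the identical training loop:
# zero_grad(), backward(), step().
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)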

Custom nn module

This implementation defines the model as a custom Module subclass. Whenever you want a model more complex than a simple sequence of existing Modules, you will need to define your model this way.

import torch
import math

class SZTU(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Instantiate four scalar parameters and register them as module
        # parameters so they show up in model.parameters() and get gradients.
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))

    def forward(self, x):
        # Accept a Tensor of input data and return a Tensor of output data
        return self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3

    def string(self):
        # A custom helper method that returns the learned polynomial as a string
        return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3'

x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Construct our model by instantiating the class defined above
model = SZTU()

# Construct our loss function and an optimizer. The call to model.parameters()
# in the SGD constructor will include the learnable parameters of the module.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-6)

for t in range(2000):
    y_pred = model(x)

    loss = criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f'Result: {model.string()}')


Origin blog.csdn.net/m0_61897853/article/details/126958985