02 | Li Mu's Dive into Deep Learning v2 (notes)

Basic optimization algorithms


1. Gradient descent

  • Our model has no explicit (closed-form) solution. (Real-world data rarely fits a linear model exactly, and most models have no closed-form solution.)
  • Gradient: The direction in which the value of a function increases the fastest.
  • Negative gradient: The direction in which the value of a function decreases fastest.
  • Learning rate η: how far to move along that direction in each step. (η is the Greek letter eta.)
  • Stepping by (-η * gradient) moves in the direction in which the function decreases fastest, so w1 = w0 + (-η * gradient) is the next position (written out as a formula after this list).
    Insert image description here
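A compact way to write the update rule from the bullets above (the notation is mine, not copied from the slide): for parameters w, loss ℓ, and learning rate η,

$$\mathbf{w}_t = \mathbf{w}_{t-1} - \eta\,\nabla_{\mathbf{w}}\,\ell(\mathbf{w}_{t-1})$$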

1.1 Mini-batch stochastic gradient descent

  • Each time the gradient is computed, the entire loss function must be differentiated, and this loss function is the average loss over all of our samples. In other words, one gradient computation requires a full pass over all samples, which is very expensive (see the formula after this list).
    Insert image description here
    Insert image description here
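Mini-batch SGD replaces the full-dataset average with the average over a randomly sampled mini-batch B of size |B|. Written out (again my notation, as a restatement of the point above, with ℓ_i the loss on sample i):

$$\mathbf{w}_t = \mathbf{w}_{t-1} - \frac{\eta}{|B|}\sum_{i\in B}\nabla_{\mathbf{w}}\,\ell_i(\mathbf{w}_{t-1})$$

Full gradient descent would sum over all n samples instead, which is why each of its steps is so much more expensive.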

1.2 Summary

Insert image description here

Linear regression implementation

1. Process data

  • If you do not have the d2l package, open cmd as an administrator and run: pip install -U d2l -i https://mirrors.aliyun.com/pypi/simple/
  • If it reports ModuleNotFoundError: No module named 'torchvision', run directly in the Jupyter notebook: pip install torchvision -i https://mirrors.aliyun.com/pypi/simple/
    Insert image description here
    Insert image description here

This code defines a function called synthetic_data, which is used to generate synthetic data.

This function accepts three parameters:

  • w: A one-dimensional tensor (vector) representing the weight of the model.
  • b: A scalar representing the bias term of the model.
  • num_examples: Integer, indicating the number of data samples to be generated.

The main steps of the function are as follows:

  1. torch.normal(0, 1, (num_examples, len(w))) generates a random tensor X of shape (num_examples, len(w)), drawn from a standard normal distribution with mean 0 and standard deviation 1.
  2. The matrix multiplication torch.matmul(X, w) multiplies X by the weights w, and the bias term b is added to obtain the predicted values y.
  3. torch.normal(0, 0.01, y.shape) generates random noise with the same shape as y, drawn from a normal distribution with mean 0 and standard deviation 0.01; it is added to y to simulate the noise in real data.
  4. Finally, y.reshape((-1, 1)) converts y into a 2D tensor of shape (-1, 1), where -1 means the size of that dimension is computed automatically from the other dimensions. In short, reshape((-1, 1)) turns the array into a two-dimensional array with a single column and an automatically determined number of rows.

The function returns the generated features X and the reshaped labels y.
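Since the screenshot is not reproduced here, a sketch of the function following the four steps above (this matches the d2l book's synthetic_data; treat it as a reconstruction rather than the exact code in the image):

```python
import torch

def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise."""
    X = torch.normal(0, 1, (num_examples, len(w)))  # features ~ N(0, 1)
    y = torch.matmul(X, w) + b                      # linear model
    y += torch.normal(0, 0.01, y.shape)             # add small Gaussian noise
    return X, y.reshape((-1, 1))                    # labels as a column vector

# Example usage; true_w and true_b here are the values commonly used in the
# book, taken as an assumption rather than read from the screenshot.
true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)
```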

Insert image description here
This code is an example of using the d2l library to draw a scatter plot.

d2l.set_figsize() is used to set the figure size; a width and height can be specified.

d2l.plt.scatter(features[:,1].detach().numpy(), labels.detach().numpy(), 1) draws the scatter plot, where:

  • features[:,1] takes the second column of the feature matrix as the x coordinates;
  • labels is the label data, used as the y coordinates;
  • 1 sets the size of the scatter points to 1.

The detach() method detaches the tensor from the computational graph and returns a tensor that no longer requires gradients, so it can be converted for plotting without affecting autograd. The numpy() method converts the tensor into a NumPy array for use by the plotting function.
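A minimal sketch of the plotting snippet, assuming features and labels come from the synthetic_data call above:

```python
from d2l import torch as d2l

d2l.set_figsize()
d2l.plt.scatter(features[:, 1].detach().numpy(),  # x: second feature column
                labels.detach().numpy(),          # y: labels
                1)                                # point size 1
d2l.plt.show()
```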

1.3 Generate a mini-batch of size batch_size

Insert image description here
This code defines a function called data_iter, a generator that yields batches of training data.

The function accepts three parameters: batch_size, features, and labels. Here batch_size is the number of samples in each batch, features is the feature matrix, and labels is the label vector.

First, the function counts the total number of samples num_examples and creates a list indices containing the indices of all samples. Then random.shuffle() is used to shuffle the indices randomly.

Next, the function loops over the shuffled index list, taking batch_size indices at a time and converting them into a tensor batch_indices. It then uses a yield statement to return the feature matrix and label vector corresponding to the current batch.

Because it uses yield, this function is a generator: it produces batches one by one inside a loop without loading all the data into memory at once, which reduces memory usage and improves training efficiency.
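The generator described above can be sketched as follows (consistent with the d2l book's data_iter; the code in the screenshot may differ slightly):

```python
import random
import torch

def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)  # visit samples in random order
    for i in range(0, num_examples, batch_size):
        batch_indices = torch.tensor(
            indices[i: min(i + batch_size, num_examples)])  # last batch may be smaller
        yield features[batch_indices], labels[batch_indices]

# Example: look at one batch of 10 samples
batch_size = 10
for X, y in data_iter(batch_size, features, labels):
    print(X.shape, y.shape)
    break
```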

2. Processing models

Insert image description here
This code defines two PyTorch tensors, w and b, which serve as the parameters of the linear regression model.

w = torch.normal(0, 0.01, size=(2,1), requires_grad=True) creates a tensor w of shape (2,1) whose elements are randomly sampled from a normal distribution with mean 0 and standard deviation 0.01. requires_grad=True indicates that gradients must be computed for this tensor so that it can be updated during backpropagation.

b = torch.zeros(1, requires_grad=True) creates a tensor b of shape (1,) whose element is initialized to 0. requires_grad=True likewise indicates that its gradient must be computed so that it can be updated during backpropagation.

In neural networks, w and b usually represent the weights and the bias term of the linear regression model, respectively. By repeatedly applying an optimization algorithm (such as stochastic gradient descent), the values of w and b are updated so that the model's predictions get closer and closer to the true values.
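A sketch of the initialization described above, together with the model net and loss loss that the later sections refer to. The latter two are assumed to be the book's linreg and squared_loss; they are not described in this note's text, so take them as an assumption:

```python
# Parameter initialization as described above
w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

def linreg(X, w, b):
    """Linear regression model: y_hat = Xw + b."""
    return torch.matmul(X, w) + b

def squared_loss(y_hat, y):
    """Squared loss per sample, not yet averaged over the batch."""
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2
```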

3. Model evaluation

Insert image description here
This code defines a function called sgd, which performs one step of the mini-batch stochastic gradient descent algorithm.

The function accepts three parameters: params, lr, and batch_size, where params is a list of tensors containing the model parameters, lr is the learning rate, and batch_size is the number of samples in each batch.

The function uses the torch.no_grad() context manager to disable gradient tracking, so that the parameter update itself is not recorded in the computational graph.

Next, the function loops over each parameter in params. For each parameter, it divides the current gradient param.grad by the batch size batch_size, multiplies by the learning rate lr, and subtracts this value from the parameter in place. Finally, param.grad.zero_() zeroes that parameter's gradient so that a fresh gradient can be accumulated in the next iteration.

In summary, this code implements a simple stochastic gradient descent algorithm for training neural network models.
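Putting the description together, the update can be sketched as follows (matching the book's sgd; treat it as a reconstruction of the code in the image):

```python
def sgd(params, lr, batch_size):
    """Mini-batch stochastic gradient descent."""
    with torch.no_grad():  # the update itself is not tracked by autograd
        for param in params:
            param -= lr * param.grad / batch_size  # average the summed gradient over the batch
            param.grad.zero_()                     # reset the gradient for the next step
```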

4. Training process

Insert image description here
The line obscured in the image is: print(f'epoch{epoch+1}, loss{float(train_l.mean()):f}')
This code is the complete process for training the model: forward propagation, loss calculation, backpropagation, and parameter updates.

First, a for loop iterates over multiple epochs. In each epoch, the data_iter() function generates batches of data X, y, where batch_size is the number of samples per batch, features is the feature matrix, and labels is the label vector.

Next, for each mini-batch X, net(X, w, b) performs forward propagation to obtain the predictions. Then loss(net(X, w, b), y) computes the loss between the predictions and the true labels y.

Next, l.sum().backward() performs backpropagation on the loss to compute the gradient of each parameter. Then sgd([w, b], lr, batch_size) applies stochastic gradient descent to update the parameters w and b.

After each epoch, the with torch.no_grad(): context manager turns off gradient tracking during evaluation. Then train_l = loss(net(features, w, b), labels) computes the loss of the current model on the entire training set, and the current epoch and average loss are printed.

In short, this code implements a standard training loop, improving the model's performance by repeatedly applying the optimization algorithm.
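A sketch of the whole loop described above, building on the earlier snippets; lr, num_epochs, batch_size, and the aliases net and loss are assumptions based on the book's usual setup, not read off the screenshot:

```python
lr = 0.03
num_epochs = 3
batch_size = 10
net = linreg
loss = squared_loss

for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y)     # forward pass: per-sample losses of the mini-batch
        l.sum().backward()            # backward pass: accumulate gradients into w.grad, b.grad
        sgd([w, b], lr, batch_size)   # update parameters, then zero the gradients
    with torch.no_grad():
        train_l = loss(net(features, w, b), labels)
        print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')
```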

Insert image description here

Origin blog.csdn.net/qq_41714549/article/details/132651132