Writing a deep learning framework by hand (1): building a neural network with PyTorch

Goal

Starting today, I will try to implement a deep neural network framework with numpy, using the PyTorch implementation as a reference, to gain a deeper understanding of how neural networks work.

objective.png

References

  • George Hotz's sessions on tinygrad
  • PyTorch official documentation

reference.jpeg

Basic requirements

  • A general understanding of deep learning
  • Familiarity with the Python programming language
  • Proficiency with mainstream Python libraries such as numpy and matplotlib
  • Some familiarity with PyTorch

requirement.jpeg

Preparation

For now, we will build the deep learning framework on top of numpy. numpy already provides efficient matrix operations, which saves us a lot of time we would otherwise spend reinventing overly basic wheels. The framework's API design will borrow from torch, an object-oriented and modular deep learning framework, using it as a teacher to compare against and imitate step by step.

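%matplotlib inline
# %pylab is discouraged in current IPython; import only the plotting functions used below
from matplotlib.pyplot import imshow, plot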
import numpy as np
from tqdm import trange
np.set_printoptions(suppress=True)
import torch
import torch.nn as nn
# torch.set_printoptions(precision=2)
torch.set_printoptions(sci_mode=False)

Prepare the dataset

We start by using PyTorch to implement a simple neural network that recognizes handwritten digits. The input is the vector obtained by flattening a digit image, and the output is one score per digit class; the class with the highest score is the predicted digit.

Before starting, we need a suitable dataset. This time we choose the classic entry-level MNIST dataset, which has 60k training samples and 10k test samples. With the dataset in hand, we first define a 2-layer neural network in PyTorch and train it to recognize MNIST digits. Then we will try to implement the same network as a numpy-based class, including forward propagation and backpropagation.

def fetch(url):
  import requests, gzip, os, hashlib
  # Cache each download in /tmp, named by the MD5 hash of its URL
  fp = os.path.join("/tmp", hashlib.md5(url.encode('utf-8')).hexdigest())
  if os.path.isfile(fp):
    with open(fp, "rb") as f:
      dat = f.read()
  else:
    # Download first, then write, so a failed request does not leave an empty cache file.
    # Note: yann.lecun.com may no longer serve these files reliably; a mirror may be needed.
    resp = requests.get(url)
    resp.raise_for_status()
    dat = resp.content
    with open(fp, "wb") as f:
      f.write(dat)
  return np.frombuffer(gzip.decompress(dat), dtype=np.uint8).copy()
# Skip the IDX headers: 16 bytes (0x10) for image files, 8 bytes for label files
X_train = fetch("http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz")[0x10:].reshape((-1, 28, 28))
Y_train = fetch("http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz")[8:]
X_test = fetch("http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz")[0x10:].reshape((-1, 28, 28))
Y_test = fetch("http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz")[8:]
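The slices `[0x10:]` and `[8:]` skip the headers of the IDX files. Per the format description on the MNIST page, an image file starts with a 16-byte header (magic number, image count, rows, columns) and a label file with an 8-byte header (magic number, label count), all big-endian 32-bit integers. As a sketch, the header can be read instead of hard-coding the offsets. `parse_idx` is a hypothetical helper (not part of numpy or the original code), and it assumes the input is the already decompressed bytes, e.g. `gzip.decompress(dat)`.

```python
import struct
import numpy as np

def parse_idx(raw: bytes) -> np.ndarray:
    """Parse an uncompressed unsigned-byte IDX buffer (the MNIST file format).

    Header, big-endian: 2 zero bytes, 1 byte type code (0x08 = unsigned byte),
    1 byte number of dimensions, then one 4-byte size per dimension.
    """
    zero, type_code, ndim = struct.unpack(">HBB", raw[:4])
    if zero != 0 or type_code != 0x08:
        raise ValueError("not an unsigned-byte IDX buffer")
    dims = struct.unpack(f">{ndim}I", raw[4:4 + 4 * ndim])
    offset = 4 + 4 * ndim  # 16 bytes for images (3 dims), 8 bytes for labels (1 dim)
    data = np.frombuffer(raw, dtype=np.uint8, offset=offset)
    return data.reshape(dims).copy()

# Synthetic example: a header describing 2 images of 28 x 28, followed by pixel bytes
header = struct.pack(">HBB3I", 0, 0x08, 3, 2, 28, 28)
images = parse_idx(header + bytes(2 * 28 * 28))
print(images.shape)  # (2, 28, 28)
```

Reading the dimensions from the header also gives the reshape for free, so the `reshape((-1, 28, 28))` in the original code would no longer be needed.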

minist_dataset.png

X_train.shape #(60000, 28, 28)

The training set contains 60k images, each 28 x 28 pixels. Let's display one sample to see what the data looks like:

imshow(X_train[0],cmap='gray')

001.png

Define the model

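We write a simple 2-layer neural network (the input layer is usually not counted among the network's layers). Each 28 x 28 image is flattened into a 784-dimensional input vector, so an input batch has shape (m, 784), where m is the number of samples in the batch. The first layer has parameters of shape (784, 128), so after it the data has shape (m, 128); for the activation function we choose ReLU. The second layer has parameters of shape (128, 10) and produces an output of shape (m, 10), where 10 corresponds to the 10 classes.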

neural_network.png

class ANet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(784, 128)  # (m, 784) -> (m, 128)
        self.act = nn.ReLU()
        self.l2 = nn.Linear(128, 10)   # (m, 128) -> (m, 10): one score per digit class
    def forward(self, x):
        x = self.l1(x)
        x = self.act(x)
        x = self.l2(x)
        return x
model = ANet()
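For comparison with `ANet`, here is a rough sketch of the same forward pass in plain numpy, the direction this series is heading. The parameter names, the random initialization and the batch are hypothetical. One detail worth knowing: `nn.Linear` stores its weight as (out_features, in_features) and computes `x @ W.T + b`; here the weights are stored already transposed as (in, out) so the shapes read left to right.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters mirroring nn.Linear(784, 128) and nn.Linear(128, 10),
# stored as (in_features, out_features)
W1 = rng.standard_normal((784, 128)).astype(np.float32) * 0.01
b1 = np.zeros(128, dtype=np.float32)
W2 = rng.standard_normal((128, 10)).astype(np.float32) * 0.01
b2 = np.zeros(10, dtype=np.float32)

def forward(x):
    h = x @ W1 + b1          # (m, 784) @ (784, 128) -> (m, 128)
    h = np.maximum(h, 0)     # ReLU, shape unchanged
    return h @ W2 + b2       # (m, 128) @ (128, 10) -> (m, 10)

m = 32
x = rng.random((m, 784), dtype=np.float32)
scores = forward(x)
print(scores.shape)  # (32, 10)

# 784*128 + 128 + 128*10 + 10
n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)  # 101770
```

The same count should come out of `sum(p.numel() for p in model.parameters())` for the PyTorch model, since both networks have the same layer sizes.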

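The input data must match the format and type the model expects:

  • Flatten the last two dimensions so the input has shape (m, 784)
  • Use the tensor type provided by PyTorch, with a floating-point dtype

model(torch.tensor(X_train[0:10].reshape((-1,28*28))).float())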
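The same two preparation steps can be sketched in numpy alone. `prepare_batch` is a hypothetical helper; the assumption behind the cast is that the model's parameters use PyTorch's default float32 dtype, which is why the uint8 pixel data has to be converted (this is what `.float()` does on the tensor).

```python
import numpy as np

def prepare_batch(images: np.ndarray) -> np.ndarray:
    """Flatten (m, 28, 28) uint8 images into an (m, 784) float32 matrix."""
    m = images.shape[0]
    # reshape(m, -1) keeps the batch axis and merges the two pixel axes
    return images.reshape(m, -1).astype(np.float32)

batch = np.zeros((10, 28, 28), dtype=np.uint8)
flat = prepare_batch(batch)
print(flat.shape, flat.dtype)  # (10, 784) float32
```

`torch.from_numpy(flat)` would then give a float32 tensor that shares memory with the numpy array, instead of copying it as `torch.tensor` does.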

epochs = 10
tbar = trange(epochs)  # trange(n) is tqdm's shorthand for tqdm(range(n))
for i in tbar:
    tbar.set_description(f"iterate {i}")

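In each iteration, a fixed number of samples is drawn at random from the dataset. Here np.random.randint generates size random integers in the half-open interval [low, high), which serve as the indices of the sampled images.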

epochs = 10
batch_size = 32
tbar = trange(epochs)
for i in tbar:
    samp = np.random.randint(0,X_train.shape[0],size=(batch_size))
    print(samp)

Training

training.jpeg

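To start training, we define epochs, which here means the number of iterations rather than epochs in the usual sense, so the name is somewhat misleading: an epoch normally means one pass in which every sample in the dataset is used for training once. Here, each iteration instead trains on one randomly drawn mini-batch, whose size is set by batch_size.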

epochs = 10
batch_size = 32
# Loss function: cross-entropy
loss_fn = nn.CrossEntropyLoss()
# Optimizer
optim = torch.optim.Adam(model.parameters())
for i in (t:=trange(epochs)):
    # Draw a random mini-batch from the training set
    samp = np.random.randint(0,X_train.shape[0],size=(batch_size))
    X = torch.tensor(X_train[samp].reshape((-1,28*28))).float()
    Y = torch.tensor(Y_train[samp]).long()
    # Reset the accumulated gradients
    optim.zero_grad()

    # Forward pass: model output
    out = model(X)
    # Compute the loss
    loss = loss_fn(out,Y)
    # Backpropagate to compute the gradients
    loss.backward()
    # Update the parameters using the gradients
    optim.step()
    t.set_description(f"loss {loss.item():0.2f}")
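Because `np.random.randint` samples indices with replacement, each iteration of the loop above is one mini-batch, not a full pass over the data. If real epochs are wanted, one common approach is to shuffle the indices once per epoch and slice them into batches. The helper `iterate_minibatches` below is a hypothetical sketch of that idea; with 60000 MNIST training images and batch_size = 32 it would yield 1875 batches per epoch.

```python
import numpy as np

def iterate_minibatches(n_samples, batch_size, rng):
    """Yield index arrays covering every sample exactly once (one epoch),
    in a freshly shuffled order; the last batch may be smaller."""
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
batches = list(iterate_minibatches(100, 32, rng))
print([len(b) for b in batches])  # [32, 32, 32, 4]
```

Inside the training loop, each yielded index array would take the place of `samp`.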

Defining the loss function

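After passing through the network, each sample becomes a 10-dimensional output, where 10 is the number of classes. Writing C for the number of classes, the output has shape (m, C): m is the number of samples, and each sample gets one score per class indicating how likely it is to belong to that class. These raw scores need to be normalized into a probability distribution, so that a larger probability means the sample is more likely to belong to that class. This is usually done with softmax: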

$$\mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_{j=1}^{C} \exp(x_j)}$$
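A direct numpy translation of the formula, applied row by row to an (m, C) score matrix, might look like the sketch below. The function name is hypothetical. It subtracts each row's maximum before exponentiating: the constant cancels between numerator and denominator, so the result is unchanged, but `exp` no longer overflows for large scores.

```python
import numpy as np

def softmax(x):
    """Row-wise softmax for an (m, C) array of scores."""
    z = x - x.max(axis=1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

scores = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1000.0, 1000.0]])  # naive exp(1000) would overflow
p = softmax(scores)
print(p.sum(axis=1))  # both rows sum to 1
```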

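For the loss function we use the multi-class nn.CrossEntropyLoss(). CrossEntropyLoss actually combines two steps: it applies (log-)softmax to normalize the outputs into probabilities, then uses the negative log-likelihood to measure the distance between the predicted and the true distributions. For class i this term is $-y_i \log(s_i)$, where $y_i$ is the true label (1 for the correct class, 0 otherwise) and $s_i$ is the softmax probability, so only the correct class contributes. Because the softmax happens inside the loss, ANet outputs raw scores and has no softmax layer of its own.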

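In PyTorch's notation, for a batch of N samples with inputs $x$ and targets $y$:

$$\ell(x, y) = L = \{l_1, \dots, l_N\}^\top, \qquad l_n = -w_{y_n} \log \frac{\exp(x_{n,y_n})}{\sum_{c=1}^{C} \exp(x_{n,c})}$$

Here $w_{y_n}$ is an optional per-class weight. With no class weights ($w = 1$) and the default reduction='mean', the $l_n$ are averaged into a single loss value.
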
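As a sketch of the formula with $w = 1$ and mean reduction, the following numpy function computes the loss from raw scores and integer labels. The name `cross_entropy` is hypothetical; it works in log space (log-softmax, then pick the log-probability of the correct class) rather than taking the log of an already computed softmax, which avoids `log(0)` when a probability underflows.

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean cross-entropy of raw scores (m, C) against integer labels (m,)."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # log-softmax
    m = logits.shape[0]
    # negative log-likelihood of the correct class, averaged over the batch
    return -log_probs[np.arange(m), targets].mean()

logits = np.zeros((4, 10))          # identical scores for all 10 classes
targets = np.array([3, 1, 4, 1])
print(cross_entropy(logits, targets))  # log(10), about 2.3026
```

The example also gives a sanity check: a network whose outputs are nearly uniform over 10 classes has a loss close to log(10) ≈ 2.30.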
epochs = 10
batch_size = 32
# Loss function: cross-entropy
loss_fn = nn.CrossEntropyLoss()
# Optimizer
optim = torch.optim.Adam(model.parameters())

losses,accs = [],[]

for i in (t:=trange(epochs)):
    # Draw a random mini-batch from the training set
    samp = np.random.randint(0,X_train.shape[0],size=(batch_size))
    X = torch.tensor(X_train[samp].reshape((-1,28*28))).float()
    Y = torch.tensor(Y_train[samp]).long()
    # Reset the accumulated gradients
    optim.zero_grad()

    # Forward pass: model output
    out = model(X)
    # Accuracy: fraction of samples whose highest-scoring class matches the label
    pred = torch.argmax(out,dim=1)
    acc = (pred == Y).float().mean()

    # Compute the loss
    loss = loss_fn(out,Y)
    # Backpropagate to compute the gradients
    loss.backward()
    # Update the parameters using the gradients
    optim.step()
    # Record plain Python numbers for plotting
    loss, acc = loss.item(),acc.item()
    losses.append(loss)
    accs.append(acc)
    t.set_description(f"loss:{loss:0.2f}, acc: {acc:0.2f}")

Before computing new gradients, the previously computed ones must be cleared with optim.zero_grad(); otherwise PyTorch accumulates them. Next, the predicted output out and the true labels Y are passed to the loss function to compute the loss value. Calling loss.backward() then performs backpropagation, computing the derivative of the loss with respect to every model parameter. Finally, optim.step() uses those gradients to update each parameter once.

plot(losses)

002.png
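To connect `optim.zero_grad()`, `loss.backward()` and `optim.step()` to the numpy framework this series is aiming for, here is a rough sketch of what they compute for the same 784-128-10 network. Assumptions: the gradients are derived by hand for this specific architecture, the update is plain SGD with an arbitrary learning rate of 0.1 (the PyTorch code uses Adam), the data is a random synthetic batch, and the names `np_params`, `loss_and_grads` and `sgd_step` are hypothetical. It is one possible design, not necessarily the one later parts will use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for the 784-128-10 network, stored as (in, out)
np_params = {
    "W1": rng.standard_normal((784, 128)) * 0.01, "b1": np.zeros(128),
    "W2": rng.standard_normal((128, 10)) * 0.01,  "b2": np.zeros(10),
}

def loss_and_grads(p, x, y):
    """Mean cross-entropy loss and its gradients w.r.t. every parameter."""
    m = x.shape[0]
    # forward pass (same shapes as ANet)
    z1 = x @ p["W1"] + p["b1"]                 # (m, 128)
    h = np.maximum(z1, 0)                      # ReLU
    logits = h @ p["W2"] + p["b2"]             # (m, 10)
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)  # softmax
    loss = -np.log(probs[np.arange(m), y]).mean()

    # backward pass: d(mean cross-entropy)/d(logits) = (softmax - one_hot) / m
    dlogits = probs.copy()
    dlogits[np.arange(m), y] -= 1
    dlogits /= m
    dh = dlogits @ p["W2"].T                   # (m, 128)
    dz1 = dh * (z1 > 0)                        # ReLU passes gradient only where z1 > 0
    grads = {
        "W2": h.T @ dlogits, "b2": dlogits.sum(axis=0),
        "W1": x.T @ dz1,     "b1": dz1.sum(axis=0),
    }
    return loss, grads

def sgd_step(p, grads, lr=0.1):
    """The role optim.step() plays, using plain SGD instead of Adam."""
    for k in p:
        p[k] -= lr * grads[k]

# Train repeatedly on one synthetic batch; the loss should go down
x = rng.random((32, 784))
y = rng.integers(0, 10, size=32)
sketch_losses = []
for _ in range(50):
    loss, grads = loss_and_grads(np_params, x, y)
    sgd_step(np_params, grads)
    sketch_losses.append(loss)
print(sketch_losses[0], sketch_losses[-1])
```

In this design there is nothing to zero: `loss_and_grads` builds a fresh gradient dictionary on every call, whereas PyTorch accumulates into each parameter's `.grad`, which is why `optim.zero_grad()` is needed there.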


Origin juejin.im/post/7116868104113618957