Python implementation of deep learning series [forward propagation and back propagation]

Preface

Before diving into deep learning frameworks, we should understand, and ideally implement ourselves, the process by which a network learns and adjusts its parameters; only then can we really grasp the mechanics of deep learning.

To that end, I provide a self-written example here that walks everyone through the forward propagation and back propagation of network training.

In addition, to support batch reading, I designed a simple DataLoader class that simulates the data-iterator sampling used in deep learning, and provided functions to save and load the model.


It is worth noting that everything is implemented in plain Python (with NumPy, Matplotlib and EasyDict), so the environment requirements are minimal. I hope you will star my blog and GitHub to learn more useful knowledge!


Table of Contents

One, the achieved effect

Two, the overall code framework

Three, detailed code description

1. Data processing

2. Network design

3. Activation function

4. Training

Four, training demonstration

Five, summary

 


One, the achieved effect

Implement a network composed of multiple Linear layers to fit a function. Project address: https://github.com/nickhuang1996/HJLNet . Run:

python demo.py

The function to fit is y = \sin(2\pi x),\ 0 \leqslant x \leqslant 2 (scaled to an amplitude of 20 in the code).

The following results (shown as images in the original post) go from left to right with a learning rate of 0.03 and a batch size of 90, at epochs 400, 1000, 2000, and 10000 or more.


Two, the overall code framework
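The original post shows the project layout as an image. Judging from the import paths used in the code below, the structure is roughly the following (a reconstruction, not copied from the repository; the __init__.py files required for the package imports are omitted):

HJLNet/
    demo.py
    code/
        config/
            default_config.py
        lib/
            Activation/
                Sigmoid.py
            Data/
                DataLoader.py
            Module/
                Linear.py
        scripts/
            Dataset.py
            network.py
            trainer.py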


Three, detailed code description

1. Data processing

Dataset.py

x is the data from 0 to 2 with a step size of 0.01, so there are 200 data points;

y is the objective function, with an amplitude of 20;

length is the number of data points;

_build_items() builds a list of dicts, one per (x, y) pair;

_transform() reshapes x and y into row vectors of shape (1, 200);

import numpy as np


class Dataset:
    def __init__(self):

        self.x = np.arange(0.0, 2.0, 0.01)
        self.y = 20 * np.sin(2 * np.pi * self.x)
        self.length = len(list(self.x))
        self._build_items()
        self._transform()

    def _build_items(self):
        self.items = [{
            'x': list(self.x)[i],
            'y': list(self.y)[i]
        } for i in range(self.length)]

    def _transform(self):
        self.x = self.x.reshape(1, self.__len__())
        self.y = self.y.reshape(1, self.__len__())

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        return self.items[index]
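As a quick sanity check (not code from the repository; run it from the project root so that the code package is importable), the class behaves as follows:

from code.scripts.Dataset import Dataset

ds = Dataset()
print(len(ds))                   # 200
print(ds[0])                     # {'x': 0.0, 'y': 0.0}
print(ds.x.shape, ds.y.shape)    # (1, 200) (1, 200)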

DataLoader.py

Similar to the DataLoader in PyTorch, it takes two arguments at initialization: dataset and batch_size

__next__() is executed on each iteration; it uses __len__() to get the dataset length and __getitem__() to fetch samples from the dataset;

_concate() concatenates the samples of one batch;

_transform() reshapes the concatenated batch data into row vectors of shape (1, batch length);

import numpy as np


class DataLoader:
    def __init__(self, dataset, batch_size):
        self.dataset = dataset
        self.batch_size = batch_size
        self.current = 0

    def __next__(self):
        if self.current < self.dataset.__len__():
            if self.current + self.batch_size <= self.dataset.__len__():
                item = self._concate([self.dataset.__getitem__(index) for index in range(self.current, self.current + self.batch_size)])
                self.current += self.batch_size
            else:
                item = self._concate([self.dataset.__getitem__(index) for index in range(self.current, self.dataset.__len__())])
                self.current = self.dataset.__len__()
            return item
        else:
            # reset so the loader can be iterated again in the next epoch
            self.current = 0
            raise StopIteration

    def _concate(self, dataset_items):
        concated_item = {}
        for item in dataset_items:
            for k, v in item.items():
                if k not in concated_item:
                    concated_item[k] = [v]
                else:
                    concated_item[k].append(v)
        concated_item = self._transform(concated_item)
        return concated_item

    def _transform(self, concated_item):
        for k, v in concated_item.items():
            concated_item[k] = np.array(v).reshape(1, len(v))
        return concated_item

    def __iter__(self):
        return self
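A minimal usage sketch (assuming the import paths from the layout above): with 200 samples and batch_size = 90, the loader yields two full batches followed by a final batch of the remaining 20 samples.

from code.scripts.Dataset import Dataset
from code.lib.Data.DataLoader import DataLoader

dataset = Dataset()
loader = DataLoader(dataset=dataset, batch_size=90)
for batch in loader:
    print(batch['x'].shape, batch['y'].shape)
# (1, 90) (1, 90)
# (1, 90) (1, 90)
# (1, 20) (1, 20)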

2. Network design

Linear.py

Similar to Linear in PyTorch, it takes three arguments at initialization: in_features, out_features and bias

_init_parameters() initializes the weight and the bias: the weight has shape [out_features, in_features] and the bias has shape [out_features, 1]

forward() is the forward propagation: y = Wx + b

import numpy as np


class Linear:
    def __init__(self, in_features, out_features, bias=False):
        self.in_features = in_features
        self.out_features = out_features
        self.bias = bias
        self._init_parameters()

    def _init_parameters(self):
        self.weight = np.random.random([self.out_features, self.in_features])
        if self.bias:
            self.bias = np.zeros([self.out_features, 1])
        else:
            self.bias = None

    def forward(self, input):
        # y = Wx + b; skip the bias term when bias=False
        if self.bias is None:
            return self.weight.dot(input)
        return self.weight.dot(input) + self.bias
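A quick shape check (a sketch, not code from the repository): for the first layer of the [1, 25, 1] configuration used later, a batch of 90 scalar inputs arrives as a (1, 90) array.

import numpy as np
from code.lib.Module.Linear import Linear

layer = Linear(in_features=1, out_features=25, bias=True)
x = np.random.random([1, 90])    # 90 scalar inputs as a row vector
out = layer.forward(x)
print(out.shape)                 # (25, 90): weight (25, 1) dot x (1, 90); bias (25, 1) broadcasts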

network.py

A simple multi-layer Linear network

_init_parameters() stores the weights and biases of the Linear layers in a dict;

forward() is the forward propagation; the last layer does not go through a Sigmoid;

backward() is the back propagation, using gradient descent to propagate the error and adjust the parameters. For example, for a network with two Linear layers (indexed 0 and 1, with S the sigmoid, \ast element-wise multiplication, and a^{[i]} the activation of layer i) the back propagation is:

dz^{[1]} = a^{[1]} - y

dW^{[1]} = dz^{[1]} a^{[0]T}

db^{[1]} = dz^{[1]}

dz^{[0]} = W^{[1]T} dz^{[1]} \ast S'(z^{[0]})

dW^{[0]} = dz^{[0]} x^{T}

db^{[0]} = dz^{[0]}

(In the code, dW and db are additionally averaged over the batch size m.)

update_grads() updates the weights and biases using the learning rate;

# -*- coding: UTF-8 -*-
import numpy as np
from ..lib.Activation.Sigmoid import sigmoid_derivative, sigmoid
from ..lib.Module.Linear import Linear

class network:
    def __init__(self, layers_dim):
        self.layers_dim = layers_dim
        self.linear_list = [Linear(layers_dim[i - 1], layers_dim[i], bias=True) for i in range(1, len(layers_dim))]
        self.parameters = {}
        self._init_parameters()

    def _init_parameters(self):
        for i in range(len(self.layers_dim) - 1):
            self.parameters["w" + str(i)] = self.linear_list[i].weight
            self.parameters["b" + str(i)] = self.linear_list[i].bias

    def forward(self, x):
        a = []
        z = []
        caches = {}
        a.append(x)
        z.append(x)

        layers = len(self.parameters) // 2

        for i in range(layers):
            z_temp = self.linear_list[i].forward(a[i])
            self.parameters["w" + str(i)] = self.linear_list[i].weight
            self.parameters["b" + str(i)] = self.linear_list[i].bias
            z.append(z_temp)
            if i == layers - 1:
                a.append(z_temp)
            else:
                a.append(sigmoid(z_temp))
        caches["z"] = z
        caches["a"] = a
        return caches, a[layers]

    def backward(self, caches, output, y):
        layers = len(self.parameters) // 2
        grads = {}
        m = y.shape[1]

        for i in reversed(range(layers)):
            # the last layer is assumed to have no activation function,
            # following the formulas given above
            if i == layers - 1:
                grads["dz" + str(i)] = output - y
            else:  # all earlier layers use the sigmoid activation
                grads["dz" + str(i)] = self.parameters["w" + str(i + 1)].T.dot(
                    grads["dz" + str(i + 1)]) * sigmoid_derivative(
                    caches["z"][i + 1])
            grads["dw" + str(i)] = grads["dz" + str(i)].dot(caches["a"][i].T) / m
            grads["db" + str(i)] = np.sum(grads["dz" + str(i)], axis=1, keepdims=True) / m
        return grads

    # update all the weights and biases
    def update_grads(self, grads, learning_rate):
        layers = len(self.parameters) // 2
        for i in range(layers):
            self.parameters["w" + str(i)] -= learning_rate * grads["dw" + str(i)]
            self.parameters["b" + str(i)] -= learning_rate * grads["db" + str(i)]

3. Activation function

Sigmoid.py

Formula definition: S(x) = \frac{1}{1+e^{-x}}

The derivative can be expressed in terms of the function itself: S'(x) = \frac{e^{-x}}{(1+e^{-x})^{2}} = S(x)(1-S(x))

import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))
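A quick numeric check (illustrative values only): the derivative peaks at 0.25 at x = 0 and is symmetric around it.

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))               # approx. [0.1192 0.5    0.8808]
print(sigmoid_derivative(x))    # approx. [0.1050 0.25   0.1050]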

4. Training

demo.py

The entry script for the model: it runs training, testing, and model saving.

from code.scripts.trainer import Trainer
from code.config.default_config import _C


if __name__ == '__main__':
    trainer = Trainer(cfg=_C)
    trainer.train()
    trainer.test()
    trainer.save_models()

default_config.py

Configuration file:

layers_dim specifies the input and output dimensions of the Linear layers;

batch_size is the size of a batch;

total_epochs is the total number of training epochs; one pass over the whole x is one epoch;

resume decides whether to continue training from a saved checkpoint (set it to False for the first run, before ckpt.npy exists);

result_img_path is the path where the result image is stored;

ckpt_path is the path where the model checkpoint is stored;

from easydict import EasyDict


_C = EasyDict()
_C.layers_dim = [1, 25, 1] # [1, 30, 10, 1]
_C.batch_size = 90
_C.total_epochs = 40000
_C.resume = True  # False means retraining
_C.result_img_path = "D:/project/Pycharm/HJLNet/result.png"
_C.ckpt_path = 'D:/project/Pycharm/HJLNet/ckpt.npy'

trainer.py

I won't go into details here; train() runs the training loop and test() runs the evaluation.

from ..lib.Data.DataLoader import DataLoader
from ..scripts.Dataset import Dataset
from ..scripts.network import network
import matplotlib.pyplot as plt
import numpy as np


class Trainer:
    def __init__(self, cfg):
        self.ckpt_path = cfg.ckpt_path
        self.result_img_path = cfg.result_img_path
        self.layers_dim = cfg.layers_dim
        self.net = network(self.layers_dim)
        if cfg.resume:
            self.load_models()
        self.dataset = Dataset()
        self.dataloader = DataLoader(dataset=self.dataset, batch_size=cfg.batch_size)
        self.total_epochs = cfg.total_epochs
        self.iterations = 0
        self.x = self.dataset.x
        self.y = self.dataset.y
        self.draw_data(self.x, self.y)

    def train(self):
        for i in range(self.total_epochs):

            for item in self.dataloader:
                caches, output = self.net.forward(item['x'])
                grads = self.net.backward(caches, output, item['y'])
                self.net.update_grads(grads, learning_rate=0.03)
                if i % 100 == 0:
                    print("Epoch: {}/{} Iteration: {} Loss: {}".format(i + 1,
                                                                       self.total_epochs,
                                                                       self.iterations,
                                                                       self.compute_loss(output, item['y'])))
                self.iterations += 1

    def test(self):
        caches, output = self.net.forward(self.x)
        self.draw_data(self.x, output)
        self.save_results()
        self.show()

    def save_models(self):
        ckpt = {
            "layers_dim": self.net.layers_dim,
            "parameters": self.net.linear_list
        }
        np.save(self.ckpt_path, ckpt)
        print('Save models finish!!')

    def load_models(self):
        ckpt = np.load(self.ckpt_path, allow_pickle=True).item()  # allow_pickle is required on newer NumPy versions
        self.net.layers_dim = ckpt["layers_dim"]
        self.net.linear_list = ckpt["parameters"]
        print('load models finish!!')

    def draw_data(self, x, y):
        plt.scatter(x, y)

    def show(self):
        plt.show()

    def save_results(self):
        # the figure size should be set via plt.figure(figsize=...) before plotting; savefig only needs the path
        plt.savefig(fname=self.result_img_path)

    # compute the loss (mean squared error)
    def compute_loss(self, output, y):
        return np.mean(np.square(output - y))
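For reference, compute_loss() is the mean squared error over the batch: L = \frac{1}{m}\sum_{j=1}^{m}(\hat{y}_{j} - y_{j})^{2}, where m is the batch size.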

Four, training demonstration

During training, the current epoch, iteration count, and loss are printed; after training, the model and the result image are saved. The corresponding screenshots appear as images in the original post.

1. Start training

2. After training, load the saved model to continue training

3. Result display


Five, summary

This covers the forward and back propagation steps of a basic network training loop. More detailed code and principles will follow in later posts to help you learn the concepts of deep learning~

Original post: https://blog.csdn.net/qq_36556893/article/details/109224708