Writing a simple fully connected network from scratch (Python version) - fully connected neural network (1)


Motivation

Everything needs a motivation, that is, a reason for doing it. So why write a simple neural network from scratch in Python? There are plenty of excellent neural network frameworks that let us define whatever network we want, and my implementation will certainly not be better than theirs; nothing is absolute, but surpassing them is basically impossible, and in truth there is no comparison to be made. So why write one anyway, isn't it a waste of time? The motivation comes from Andrej Karpathy's advice that the best way to learn deep learning is to write a network yourself, and that is why I am writing one from scratch.

Designing the model

First of all, our model is a set of parameters. Once the network structure is fixed, the function set is determined; through training we look for a suitable function within this set. Each function corresponds to one set of parameters, and it is these parameters that distinguish the functions in the set from one another.

Prerequisites

We already understand neural networks in theory; now we need to implement one from scratch, and we will do so in a functional style:

  • To make it easier to explain each step (building the model, forward propagation, backpropagation, and updating the parameters), we do not code in an object-oriented style but in a functional one.
  • We will focus on the backpropagation process, which is the hard part of implementing a neural network.
  • We use numpy to implement gradient descent for optimizing the parameters, which should give a deeper understanding of gradient descent (a small sketch follows this list).

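As a taste of what "implementing gradient descent with numpy" means, here is a minimal, self-contained sketch that minimizes a toy quadratic rather than a network loss; `target`, `w`, and `lr` are illustrative names and values, not part of the network code that follows.

import numpy as np

# Gradient descent on f(w) = ||w - target||^2 (illustrative only).
target = np.array([3.0, -1.0])   # the minimizer we want gradient descent to find
w = np.zeros(2)                  # initial parameters
lr = 0.1                         # learning rate

for step in range(100):
    grad = 2 * (w - target)      # gradient of f at the current w
    w = w - lr * grad            # move against the gradient

print(w)                         # close to [ 3. -1.]
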
First, the model is stored as a dictionary, and its parameters (the weights and biases) are kept in that dictionary as well.

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

Here we only use numpy, a library for manipulating arrays, and matplotlib to do some visualization work.

Initialize the model

First, it should be said that the network implemented here is still fairly elementary: a simple sequential structure with nothing complicated in it. But I hope this series of posts will not stop there.

def initialize_model(dimensions):
    '''
    The input `dimensions` is a list giving the number of neurons in each layer of the network.
    The return value is a dictionary with the following fields, from which the rough
    structure (shape) of the network can be reconstructed:
      model['nlayers'] : number of layers in the network
      model['weights'] : list of weight matrices, one per layer
      model['biases']  : list of bias vectors, one per layer
    '''
    weights, biases = [], []
    L = len(dimensions) - 1 # number of layers (i.e., excludes input layer)
    for l in range(L):
        W = np.random.randn(dimensions[l+1], dimensions[l])
        b = np.random.randn(dimensions[l+1], 1)
        weights.append(W)
        biases.append(b)
    return dict(weights=weights, biases=biases, nlayers=L)

From the dimensions below we can see that the input is a 784-dimensional vector, the hidden layer is a fully connected layer with 15 neurons, and the output is a 10-dimensional vector. Anyone who has worked with neural networks will smile at this point: isn't this "hello MNIST", the hello world of neural networks? 784 is the dimension of a flattened 28 × 28 digit image, and the 10 outputs are the classes for the digits 0 through 9.

dimensions = [784, 15, 10]
model = initialize_model(dimensions)
for k, (W, b) in enumerate(zip(model['weights'], model['biases'])):
    print(f'Layer {k+1}:\tShape of W{k+1}: {W.shape}\tShape of b{k+1}: {b.shape}')
Layer 1: Shape of W1: (15, 784) Shape of b1: (15, 1) 
Layer 2: Shape of W2: (10, 15) Shape of b2: (10, 1)

In fact there are only 2 fully connected layers here. The weight matrix of the first layer is $W_{15 \times 784}$ and the input is a 784-dimensional vector:

$$y_{15 \times 1} = W_{15 \times 784}\,x_{784 \times 1} + b_{15 \times 1}$$

For layer 2, the so-called output layer, it works the same way: $y_{10 \times 1} = W_{10 \times 15}\,x_{15 \times 1} + b_{10 \times 1}$.
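To make these shapes concrete, here is a small sketch, assuming the `model` built above and ignoring the activation function between layers; `x`, `z1`, and `z2` are illustrative names.

x = np.random.randn(784, 1)        # stand-in for a flattened 28 x 28 image
W1, b1 = model['weights'][0], model['biases'][0]
W2, b2 = model['weights'][1], model['biases'][1]
z1 = W1 @ x + b1                   # layer 1: (15, 784) @ (784, 1) + (15, 1) -> (15, 1)
z2 = W2 @ z1 + b2                  # layer 2: (10, 15) @ (15, 1) + (10, 1) -> (10, 1)
print(z1.shape, z2.shape)          # (15, 1) (10, 1)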

We can inspect the randomly initialized weight values with the following code:

print(f'W2:\n\n{model["weights"][1]}')  
print(f'b2:\n\n{model["biases"][1]}')   

Implementing the activation function, the loss function, and their derivatives

Activation function

Here the sigmoid is used as the activation function:

$$\sigma(x) = \frac{1}{1 + \exp(-x)} = \frac{\exp(x)}{1 + \exp(x)}$$
import math

def sigmoid(x):
    # Scalar version: math.exp works on a single number, not on a numpy array;
    # a vectorized, numerically safer version follows below.
    return 1 / (1 + math.exp(-x))

Derivative of the sigmoid function

$$\sigma^{\prime}(x) = \sigma(x)\,(1 - \sigma(x))$$

To make the sigmoid function more robust for large positive or large negative inputs (avoiding overflow in the exponential), we modify it as follows:

$$\sigma(x) = \begin{cases} \dfrac{1}{1 + \exp(-x)} & x \ge 0\\[1ex] 1 - \dfrac{1}{1 + \exp(x)} & \text{otherwise} \end{cases}$$
def sigma(x):
    '''Vectorized sigmoid: the input here is a numpy array (vector).'''
    # Note: np.where evaluates both branches, so a harmless overflow warning
    # may still be printed; the selected values themselves are safe.
    return np.where(x >= 0, 1/(1 + np.exp(-x)), 1 - 1/(1 + np.exp(x)))

def sigma_prime(x):
    return sigma(x)*(1 - sigma(x)) # Derivative of the logistic (sigmoid) function
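As a quick sanity check, here is a minimal sketch: verify that `sigma` stays within [0, 1] even for very large inputs, compare `sigma_prime` with a numerical derivative, and use matplotlib for a small visualization; the names `xs` and `eps` are illustrative.

xs = np.linspace(-10, 10, 201)
eps = 1e-5

print(sigma(np.array([-1000.0, 0.0, 1000.0])))   # ~[0.  0.5 1. ]
# Numerical check: (sigma(x+eps) - sigma(x-eps)) / (2*eps) should match sigma_prime(x)
print(np.max(np.abs((sigma(xs + eps) - sigma(xs - eps)) / (2*eps) - sigma_prime(xs))))

plt.plot(xs, sigma(xs), label='sigma(x)')
plt.plot(xs, sigma_prime(xs), label="sigma'(x)")
plt.legend()
plt.show()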
