Python handwritten digit recognition: the corresponding mathematical formulas and a detailed walkthrough of the code
I. The mathematical derivation
A particularly detailed mathematical derivation, easy to understand.
II. The program code
III. The code and the corresponding formulas
Look at the mathematical derivation first; without it the code is hard to make sense of, but once the derivation is clear, the code follows easily.
1. The four core formulas of the BP algorithm
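The post does not write these four formulas out, so here they are in the standard textbook notation (not quoted from the original): \(\delta^l\) denotes the error term of layer \(l\), \(z^l\) its pre-activation, \(a^l = \sigma(z^l)\) its activation, and \(C\) the cost.

```latex
\begin{align}
\delta^{L} &= \nabla_{a} C \odot \sigma'(z^{L})
  && \text{(output-layer error)} \\
\delta^{l} &= \left( (W^{l+1})^{T} \delta^{l+1} \right) \odot \sigma'(z^{l})
  && \text{(error back-propagation)} \\
\frac{\partial C}{\partial b^{l}_{j}} &= \delta^{l}_{j}
  && \text{(bias gradient)} \\
\frac{\partial C}{\partial w^{l}_{jk}} &= a^{l-1}_{k}\, \delta^{l}_{j}
  && \text{(weight gradient)}
\end{align}
```

In the code below, error = label - a[-1] is the negative of \(\nabla_a C\) for the squared-error cost, which is why the parameters are later updated with += rather than -=.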
The parameter list:
x = np.array(x)
y = np.array(y)
weights = []  # list of weight matrices
bias = []     # list of bias vectors
Here x is the input variable and y the output variable.
x holds 33,555 sample values; each sample x has 784 components, [x1, x2, x3, ..., x784].
y likewise holds 33,555 sample values; each y is the digit 0-9 that the corresponding sample represents.
weights are the weights and bias the biases,
analogous to y = ax + b: the weights play the role of a, the bias the role of b.
weights contains two elements, as the picture above shows: one for the connections from the first layer (input) to the second layer (hidden), and one for those from the second layer (hidden) to the third layer (output). In other words, this is a three-layer perceptron network.
Note: the weights and biases are randomly initialized, e.g.:
layers = [784, 784, 10]
for i in range(1, len(layers)):  # normal-distribution initialization
    self.weights.append(np.random.randn(layers[i-1], layers[i]))
    self.bias.append(np.random.randn(layers[i]))
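As a quick check of the shapes this initialization produces (a standalone sketch, using plain lists instead of self.weights / self.bias):

```python
import numpy as np

layers = [784, 784, 10]
weights, bias = [], []
for i in range(1, len(layers)):  # normal-distribution initialization
    weights.append(np.random.randn(layers[i - 1], layers[i]))
    bias.append(np.random.randn(layers[i]))

print([w.shape for w in weights])  # [(784, 784), (784, 10)]
print([b.shape for b in bias])     # [(784,), (10,)]
```

Each weight matrix maps one layer's outputs to the next layer's inputs, so its shape is (size of previous layer, size of next layer).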
Activation function
def sigmoid(x):  # the activation function is the sigmoid
    return 1 / (1 + np.exp(-x))

# Derivative of the sigmoid with respect to its input x. Note that the
# training code below applies this to activations a; strictly speaking,
# the derivative expressed at an activation a is a * (1 - a).
def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))
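A tiny numeric check of the two functions (standalone, repeating the definitions above):

```python
import numpy as np

def sigmoid(x):  # the activation function is the sigmoid
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):  # derivative of the sigmoid
    return sigmoid(x) * (1 - sigmoid(x))

print(sigmoid(0.0))             # 0.5, the midpoint of the S-curve
print(sigmoid_derivative(0.0))  # 0.25, the sigmoid's maximum slope
```

The derivative peaks at 0.25 and shrinks toward zero as |x| grows, which is why deep sigmoid networks can suffer from vanishing gradients.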
The back-propagation algorithm
a = [x[k % n]]  # list storing the activations (outputs) of each layer
a holds the neuron outputs of every layer; for the input layer, the "output" is simply the k-th sample value x.
Forward propagation begins:
for lay in range(len(self.weights)):
    a.append(self.activation(np.dot(a[lay], self.weights[lay]) + self.bias[lay]))
Here activation is the activation function. The loop says: take each layer's output, multiply it by the corresponding weights, add the bias, and pass the result through the activation function to get the next layer's output.
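As a standalone sketch of this forward pass (the random seed and input are made-up stand-ins, not values from the post):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)  # illustrative seed
layers = [784, 784, 10]
weights = [rng.standard_normal((layers[i - 1], layers[i])) for i in range(1, len(layers))]
bias = [rng.standard_normal(layers[i]) for i in range(1, len(layers))]

x = rng.random(784)  # stand-in for one flattened 28x28 image
a = [x]              # a[0] is the input layer's "output"
for lay in range(len(weights)):
    a.append(sigmoid(np.dot(a[lay], weights[lay]) + bias[lay]))

print([v.shape for v in a])  # [(784,), (784,), (10,)]
```

After the loop, a[-1] is the 10-element output vector, one entry per digit class.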
Back-propagation begins:
label = np.zeros(a[-1].shape)  # shape gives the dimensions of a[-1]
label[y[k % n]] = 1            # build the one-hot label from the class number
error = label - a[-1]          # the error at the output layer
# a[-1] is the last element of a, i.e. the output layer's activations
deltas = [error * self.activation_deriv(a[-1])]  # list storing each layer's error term
layer_num = len(a) - 2  # start from the second-to-last layer
for j in range(layer_num, 0, -1):
    deltas.append(deltas[-1].dot(self.weights[j].T) * self.activation_deriv(a[j]))  # back-propagate the error
deltas.reverse()
The for loop runs with step -1, that is, in reverse order, pushing the deltas backward layer by layer; deltas.reverse() then restores them to front-to-back order. This backward pass is precisely the point of back-propagation.
Updating the weights, front to back:
for i in range(len(self.weights)):  # update the weights front to back
    layer = np.atleast_2d(a[i])
    delta = np.atleast_2d(deltas[i])
    self.weights[i] += learning_rate * layer.T.dot(delta)
    self.bias[i] += learning_rate * deltas[i]
The two update lines

self.weights[i] += learning_rate * layer.T.dot(delta)
self.bias[i] += learning_rate * deltas[i]

solve for the corresponding parameters; here layer.T is the transpose of the matrix layer, and the sign is += because error = label - a[-1] already points in the direction that reduces the error.
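Putting the pieces above together, a minimal standalone sketch of one gradient step on a single made-up sample (the seed, learning rate, and data are illustrative assumptions; the sigmoid derivative is written directly as a * (1 - a) because it is applied to activations):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)  # fixed seed, illustrative only
layers = [784, 784, 10]
weights = [rng.standard_normal((layers[i - 1], layers[i])) * 0.1
           for i in range(1, len(layers))]
bias = [rng.standard_normal(layers[i]) * 0.1 for i in range(1, len(layers))]
learning_rate = 0.01            # illustrative value

x = rng.random(784)             # one made-up "image"
y = 3                           # made-up class label

# forward propagation
a = [x]
for lay in range(len(weights)):
    a.append(sigmoid(np.dot(a[lay], weights[lay]) + bias[lay]))

# back-propagation; sigma'(z) expressed via the activation as a * (1 - a)
label = np.zeros(a[-1].shape)
label[y] = 1
error = label - a[-1]
deltas = [error * a[-1] * (1 - a[-1])]
for j in range(len(a) - 2, 0, -1):
    deltas.append(deltas[-1].dot(weights[j].T) * a[j] * (1 - a[j]))
deltas.reverse()

# update the parameters, front to back
for i in range(len(weights)):
    layer = np.atleast_2d(a[i])
    delta = np.atleast_2d(deltas[i])
    weights[i] += learning_rate * layer.T.dot(delta)
    bias[i] += learning_rate * deltas[i]

print([d.shape for d in deltas])  # [(784,), (10,)]
```

np.atleast_2d turns the 1-D activation and delta vectors into row matrices so that layer.T.dot(delta) yields an outer product with the same shape as the weight matrix.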
End
That covers the basics of the program and the formulas it corresponds to. If anything here is wrong, corrections from more experienced readers are welcome.
Finally, thanks to the experts who provided the formula derivations and wrote the code.