Fully Connected Feedforward Neural Networks in Practice: Handwritten Digit Recognition

This article first appeared on my personal blog.

Introduction

An earlier article covered logistic regression, the softmax multi-class classification algorithm, and gradient descent; those concepts lead quite naturally into deep learning.

Wikipedia's definition, for reference:

Deep learning is a branch of machine learning. It attempts to perform high-level abstraction of data using algorithms with a complex structure consisting of multiple processing layers, each composed of multiple nonlinear transformations.

As early as 1958 the perceptron was proposed, the simplest linear model of its kind. It caused a great sensation at the time, and some even claimed machines would soon replace people, but it was later called into question because the limitations of a linear perceptron are obvious.

Then in the 1980s, building on the earlier perceptron, the multi-layer perceptron (also known as the neural network) was proposed; that model is not substantially different from today's deep neural networks. Back-propagation followed in 1986, but going beyond roughly three hidden layers usually brought no improvement, and training also suffered from the vanishing gradient problem.

Later, in 2006, improvements such as RBM initialization were applied to the neural network model described above, and the multi-layer perceptron was relaunched under a new name: deep learning. Around 2009 GPUs began to be used for deep learning computation, and the ground-breaking applications that followed in various fields made it take off.

So deep learning is nothing new; it is largely an old model, slightly improved and renamed.

Fully connected feedforward neural networks

An example of a fully connected feedforward network is shown below. The activation function is the sigmoid function mentioned earlier. Once the weights and biases are known, the input vector is transformed layer by layer as it passes through the network and finally comes out as an output vector.

[Figure: a small fully connected feedforward network with sigmoid activations]
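To make that concrete, here is a minimal NumPy sketch of such a forward pass (my own illustration, not from the original post), using made-up weights for a tiny 2-4-2 network:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up weights and biases for a hypothetical 2-4-2 network
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 2)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)

x = np.array([1.0, -1.0])    # input vector
h = sigmoid(W1 @ x + b1)     # hidden layer activations
y = sigmoid(W2 @ h + b2)     # output vector
print(y)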

In general, the architecture of a fully connected feedforward network looks like the figure below: every output of one layer is connected to every neuron in the next layer:
[Figure: general architecture of a fully connected feedforward network]

Both the input and the output are vectors, though not necessarily of the same dimension. In between there are usually several hidden layers, and that is where the "Deep" in Deep Learning comes from.

In essence, what a neural network computes is a series of matrix operations, and that is exactly why GPUs can accelerate neural networks.

[Figure: a network layer expressed as a matrix operation]
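As a rough illustration (again my own sketch, not the article's), applying one layer to an entire batch of inputs is a single matrix multiplication, which is exactly the kind of operation a GPU parallelizes well:

import numpy as np

batch = np.random.rand(100, 784)       # 100 flattened 28x28 images, one per row
W = np.random.randn(784, 500)          # weights of a 784 -> 500 layer
b = np.random.randn(500)

# One matrix multiply evaluates the layer for the whole batch (ReLU applied afterwards)
hidden = np.maximum(0, batch @ W + b)
print(hidden.shape)                    # (100, 500)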

Examples

As in earlier articles, handwritten digit recognition is used as the example. Two fully connected feedforward network models are built, one with Keras and one with PyTorch, trained on the MNIST dataset.

First the Keras version (using the TensorFlow backend); the code is as follows:

#!/usr/local/bin/python3.6

import numpy as np
import os
import matplotlib.pyplot as plt
import keras
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.utils import np_utils
from keras import backend as K

# Multi-core CPU settings
K.set_session(K.tf.Session(config=K.tf.ConfigProto(device_count={"CPU": 8},
                inter_op_parallelism_threads=8,
                intra_op_parallelism_threads=8,
                log_device_placement=True)))

# TensorBoard visualization
tbCallBack = keras.callbacks.TensorBoard(log_dir='./Graph',
                                         histogram_freq=1,
                                         write_graph=True,
                                         write_images=True)

# Load the dataset
def load_data(file_path):
    f = np.load(file_path)
    x_train, y_train = f['x_train'], f['y_train']
    x_test, y_test = f['x_test'], f['y_test']
    f.close()
    return (x_train, y_train), (x_test, y_test)


# Prepare the data
(X_train, y_train), (X_test, y_test) = load_data('./mnist.npz')

X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

nb_classes = 10
# Convert labels to one-hot vectors, since the training loss is categorical_crossentropy
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

# Initialize a model
model = Sequential()
# Add the first layer: 784-dimensional input, 500 units, ReLU activation
model.add(Dense(500, input_shape=(784,)))
model.add(Activation('relu'))
# model.add(Dropout(0.2))
# Add the second layer: 500 units, ReLU activation
model.add(Dense(500))
model.add(Activation('relu'))
# model.add(Dropout(0.2))
# Add the output layer: 10-dimensional output, softmax activation
model.add(Dense(10))
model.add(Activation('softmax'))

# Configure training: multi-class log loss, the Adam optimizer, and accuracy as the evaluation metric
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Start training: batch_size 100, 10 epochs, TensorBoard callback
model.fit(X_train, Y_train,
          batch_size=100, epochs=10,
          validation_data=(X_test, Y_test),
          callbacks=[tbCallBack]
          )

score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])

This is a fully connected neural network with two hidden layers, trained for 10 epochs; the accuracy is as follows:
[Figure: Keras training log with per-epoch accuracy]

Without a GPU it runs rather slowly on the CPU. The accuracy reaches 97.7%, and the network structure is as follows:
[Figure: structure of the Keras model]
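As a small follow-up sketch (my addition, not part of the original post), and assuming the script above has just been run so that model, X_test and y_test are still in scope, the trained model can be used to classify a single test image:

# Hedged sketch: classify one test image with the trained model
probs = model.predict(X_test[:1])            # softmax probabilities, shape (1, 10)
print('predicted digit:', np.argmax(probs))
print('actual digit:', y_test[0])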

PyTorch is not as simple to use as Keras; its code is as follows:

import os
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

# Multi-core CPU settings
os.environ["OMP_NUM_THREADS"] = "8"
os.environ["MKL_NUM_THREADS"] = "8"

# Run on the CPU
device = torch.device('cpu')

# Hyperparameters
input_size = 784
hidden_size = 500
num_classes = 10
num_epochs = 10
batch_size = 100
learning_rate = 0.001
# 1. MNIST dataset: load the image data
train_dataset = torchvision.datasets.MNIST(root='.',
                                           train=True,
                                           transform=transforms.ToTensor(),
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='.',
                                          train=False,
                                          transform=transforms.ToTensor())

# 2. Data loader: PyTorch's data loading mechanism, which TensorFlow does not have
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)


# 3. Fully connected neural network with one hidden layer: define the network
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out


model = NeuralNet(input_size, hidden_size, num_classes).to(device)

# 4. Loss and optimizer: define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(),
                             lr=learning_rate)

# 5. Train the model
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):  # iterate over mini-batches of size batch_size
        # Move tensors to the configured device
        images = images.reshape(-1, 28*28).to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

# Test the model
# In test phase, we don't need to compute gradients (for memory efficiency)
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28*28).to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Accuracy of the network on the 10000 test images: {} %'
            .format(100 * correct / total))

# Save the model checkpoint
torch.save(model.state_dict(), 'model.ckpt')

The accuracy is as follows:
[Figure: PyTorch training log and final test accuracy]
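As a final sketch (again my addition, assuming the definitions from the script above are available), the saved checkpoint can be loaded back into a fresh model instance and used for inference:

# Rebuild the network and load the saved weights
model2 = NeuralNet(input_size, hidden_size, num_classes)
model2.load_state_dict(torch.load('model.ckpt'))
model2.eval()

# Classify the first test image
image, label = test_dataset[0]
with torch.no_grad():
    logits = model2(image.reshape(-1, 28*28))
    predicted = logits.argmax(dim=1).item()
print('predicted:', predicted, 'actual:', label)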

Overall it runs more slowly than the TensorFlow version, and it is still slow even when compiled and installed from source.
