Preparation

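When I was learning this, I found that downloading the dataset through the API that TensorFlow provides was unusably slow (with or without a proxy), so I decided to download the dataset by hand and parse it myself.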

Project layout

fashion
- fashion.py
- mnist_loader.py
- data
  - fashion
    - t10k-images-idx3-ubyte.gz
    - t10k-labels-idx1-ubyte.gz
    - train-labels-idx1-ubyte.gz
    - train-images-idx3-ubyte.gz

The data files can be downloaded from the Fashion-MNIST GitHub repository.

mnist_loader.py

Reads and parses the data files.

import gzip
import os

import numpy as np


def load_mnist(path, kind='train'):
    """Load MNIST-format images and labels from `path`."""
    labels_path = os.path.join(path,
                               '%s-labels-idx1-ubyte.gz'
                               % kind)
    images_path = os.path.join(path,
                               '%s-images-idx3-ubyte.gz'
                               % kind)

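    # The label file starts with an 8-byte header (magic number, item count).
    with gzip.open(labels_path, 'rb') as lbpath:
        labels = np.frombuffer(lbpath.read(), dtype=np.uint8,
                               offset=8)

    # The image file starts with a 16-byte header (magic number, item count,
    # rows, columns); each image is then 28 * 28 = 784 bytes.
    with gzip.open(images_path, 'rb') as imgpath:
        images = np.frombuffer(imgpath.read(), dtype=np.uint8,
                               offset=16).reshape(len(labels), 784)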

    return images, labels
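
The offsets 8 and 16 used above come from the IDX file header. The following is a minimal sketch, not part of the original loader, that reads that header instead of hard-coding its size; the function name `read_idx_header` is made up for illustration.

```python
import gzip
import struct


def read_idx_header(path):
    """Return (type_code, dims, header_size) of a gzip-compressed IDX file."""
    with gzip.open(path, 'rb') as f:
        # Bytes 0-1 are zero, byte 2 is the data type code (0x08 means
        # unsigned byte), byte 3 is the number of dimensions.
        _zero, type_code, ndim = struct.unpack('>HBB', f.read(4))
        # Each dimension size is a big-endian unsigned 32-bit integer.
        dims = struct.unpack('>' + 'I' * ndim, f.read(4 * ndim))
    return type_code, dims, 4 + 4 * ndim
```

A label file has one dimension, so its header is 4 + 4 = 8 bytes; an image file has three (count, rows, columns), so its header is 4 + 12 = 16 bytes. Calling `read_idx_header('data/fashion/train-images-idx3-ubyte.gz')` on the training images should report dimensions of (60000, 28, 28).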

My model

fashion.py

import mnist_loader as ml
from tensorflow import keras


x_train, y_train = ml.load_mnist('data/fashion', 'train')
x_test, y_test = ml.load_mnist('data/fashion', 't10k')

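# Add a channel axis: Conv2D expects input of shape (height, width, channels).
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)

# Scale pixel values from [0, 255] to [0, 1].
x_train = x_train / 255.0
x_test = x_test / 255.0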

model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.Flatten(),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer=keras.optimizers.Adam(), 
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=20)

test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test acc: %f' % test_acc)
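
To see what this stack does to the tensor shapes, here is a rough sketch in plain Python that walks through the layers of the model above and counts the trainable parameters. It assumes the Keras defaults the script relies on (unpadded convolutions with stride 1, pooling stride equal to the pool size); the helper functions are illustrative and not part of Keras.

```python
def conv2d(shape, filters, kernel):
    """Shape and parameter count after an unpadded, stride-1 convolution."""
    h, w, c = shape
    kh, kw = kernel
    params = (kh * kw * c + 1) * filters  # kernel weights plus one bias per filter
    return (h - kh + 1, w - kw + 1, filters), params


def max_pool(shape, pool):
    """Shape after non-overlapping max pooling; pooling has no parameters."""
    h, w, c = shape
    return (h // pool[0], w // pool[1], c), 0


def flatten(shape):
    h, w, c = shape
    return (h * w * c,), 0


def dense(shape, units):
    (n_in,) = shape
    return (units,), (n_in + 1) * units  # weights plus one bias per unit


# Dropout layers are left out: they change neither shapes nor parameter counts.
steps = [
    ('Conv2D(32, (3, 3))', lambda s: conv2d(s, 32, (3, 3))),
    ('MaxPooling2D((2, 2))', lambda s: max_pool(s, (2, 2))),
    ('Conv2D(64, (3, 3))', lambda s: conv2d(s, 64, (3, 3))),
    ('Flatten', flatten),
    ('Dense(64)', lambda s: dense(s, 64)),
    ('Dense(10)', lambda s: dense(s, 10)),
]

shape, total = (28, 28, 1), 0
for name, step in steps:
    shape, params = step(shape)
    total += params
    print(f'{name:22s} {str(shape):14s} {params:>7d}')
print('total parameters:', total)  # 515146
```

Almost all of the parameters (495,680 of them) sit in the first Dense layer, which is one reason Dropout around that layer is a natural place to fight overfitting.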

Results

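The model reaches 91.99% accuracy on the test set and 92.24% on the training set. The two numbers are nearly equal, so the model shows little overfitting. I attribute this to the two Dropout layers I placed around the fully connected layer: without them, the same model scored 91.93% on the test set but 97.93% on the training set, which is clear overfitting.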
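
As a quick worked check of the numbers above, the gap between training and test accuracy can be computed directly. This is only arithmetic on the figures already reported; the name `generalization_gap` is chosen here for illustration.

```python
# Accuracies (in percent) reported in this post.
results = {
    'with Dropout': {'train': 92.24, 'test': 91.99},
    'without Dropout': {'train': 97.93, 'test': 91.93},
}


def generalization_gap(train_acc, test_acc):
    """Training accuracy minus test accuracy, in percentage points."""
    return round(train_acc - test_acc, 2)


gaps = {name: generalization_gap(r['train'], r['test'])
        for name, r in results.items()}
for name, gap in gaps.items():
    print(f'{name}: {gap:.2f} points')
```

The gap is 0.25 points with Dropout and 6.00 points without it, while the test accuracy itself barely moves.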

Comparison with other models

Model	Test accuracy
$C_{32, (3,3)}P_{(2,2)}C_{64, (3,3)}P_{(2,2)}C_{64,(3,3)}F_{64}F_{10}$	91.10%
$C_{32, (3,3)}P_{(2,2)}C_{64, (3,3)}P_{(2,2)}C_{64,(3,3)}F_{64}F_{32}F_{10}$	90.08%
$C_{32, (3,3)}P_{(2,2)}C_{64,(3,3)}F_{64}F_{10}$	91.93%
$C_{32, (3,3)}C_{64, (3,3)}F_{64}F_{10}$	90.93%
$C_{32, (3,3)}P_{(2,2)}C_{64, (3,3)}P_{(2,2)}C_{64,(3,3)}D_{0.5}F_{64}D_{0.5}F_{10}$	91.99%
$F_{64}F_{10}$	88.02%
$F_{128}F_{64}F_{10}$	88.95%

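where

$C_{32, (3,3)}$ denotes a convolutional layer with 32 kernels of size 3*3
$P_{(2,2)}$ denotes a MaxPooling layer with a 2*2 window
$F_{64}$ denotes a fully connected layer with 64 units
$D_{0.5}$ denotes a Dropout layer with a drop probability of 0.5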
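
The notation is regular enough to be read by a program. The sketch below is one possible way to turn a model string from the table into a list of layer descriptions; `parse_model` and the tuple layout are assumptions made for this example, not something used in the experiments.

```python
import re


def parse_model(notation):
    """Parse a string such as 'C_{32, (3,3)}P_{(2,2)}F_{10}' into layer tuples."""
    layers = []
    # Each layer is one letter followed by '_{...}'.
    for letter, body in re.findall(r'([CPFD])_\{([^}]*)\}', notation):
        numbers = re.findall(r'\d+(?:\.\d+)?', body)
        if letter == 'C':    # filters, then kernel height and width
            layers.append(('conv', int(numbers[0]),
                           (int(numbers[1]), int(numbers[2]))))
        elif letter == 'P':  # pooling window
            layers.append(('max_pool', (int(numbers[0]), int(numbers[1]))))
        elif letter == 'F':  # number of units
            layers.append(('dense', int(numbers[0])))
        else:                # 'D': drop probability
            layers.append(('dropout', float(numbers[0])))
    return layers


for layer in parse_model('C_{32, (3,3)}P_{(2,2)}C_{64,(3,3)}F_{64}F_{10}'):
    print(layer)
```

Each tuple maps one-to-one onto a Keras layer (`Conv2D`, `MaxPooling2D`, `Dense`, `Dropout`), with a `Flatten` needed before the first dense layer.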

Summary

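For image recognition, CNNs outperform plain fully connected networks (88.02% to 88.95% test accuracy for the $F$-only models versus 90.08% to 91.99% for the convolutional ones)
Pooling layers effectively reduce overfitting (the model without pooling, $C_{32, (3,3)}C_{64, (3,3)}F_{64}F_{10}$, reaches 99.16% training accuracy) and cut the amount of computation, but they can also increase error because some information is discarded
Dropout layers also reduce overfitting, at a relatively small cost in accuracy
More layers do not necessarily give better results (adding an $F_{32}$ layer lowered test accuracy from 91.10% to 90.08%)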
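
To make the two regularizing layers above concrete, here is a small NumPy sketch of what each one computes on a single 2-D array. It is a simplified illustration under stated assumptions (even height and width, one channel, training-time behavior only), not the Keras implementation; the function names are made up.

```python
import numpy as np


def max_pool_2x2(x):
    """2*2 max pooling with stride 2 on an (H, W) array with even H and W."""
    h, w = x.shape
    # Split the array into 2*2 blocks and keep the largest value of each block.
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def dropout(x, rate, rng):
    """Inverted dropout: zero each entry with probability `rate` and rescale
    the survivors so the expected value stays the same."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


x = np.arange(16).reshape(4, 4)
pooled = max_pool_2x2(x)
print(pooled)  # [[ 5  7]
               #  [13 15]]

rng = np.random.default_rng(0)
dropped = dropout(np.ones((4, 4)), 0.5, rng)
print(dropped)  # every entry is either 0.0 or 2.0
```

Pooling keeps one value out of every four, which is where both the savings in computation and the loss of information come from. Dropout keeps the shape but randomly silences units during training, so the network cannot rely on any single one.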

Keras (1): Fashion MNIST