Deep Learning: Mini-Batch

Copyright notice: This blog post is the author's personal study notes; reference materials are listed at the end with thanks. If you repost it, please cite the source: https://blog.csdn.net/woai8339/article/details/83473558

This post is a set of study notes from a Udacity course, recorded for later reference.
Let's start with an introduction:

Mini-batching: in this section, you will learn what mini-batching is and how to apply it in TensorFlow.

Mini-batching is a technique for training on a small subset of the dataset at a time instead of the entire training set. It makes it possible to train a model on a machine whose memory is too small to hold the whole dataset at once.

Mini-batching is computationally less efficient, because you can't compute the loss over all samples at once. But that small cost is well worth it compared to not being able to run the model at all.

It also combines well with stochastic gradient descent (SGD). The idea is to randomly shuffle the data before each training epoch, create the mini-batches, and then train the network weights with gradient descent on each mini-batch. Because the batches are random, you are effectively performing SGD on each batch.
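As a rough illustration (my own sketch, not code from the course), the shuffle-then-batch loop described above could look like this in plain NumPy; X, y, and take_gradient_step are hypothetical placeholders for your data and update step:

import numpy as np

def train_with_minibatches(X, y, batch_size, n_epochs, take_gradient_step):
    """Shuffle the data each epoch, then run one SGD step per mini-batch."""
    n_samples = len(X)
    for epoch in range(n_epochs):
        # Random shuffle before every epoch
        perm = np.random.permutation(n_samples)
        X_shuffled, y_shuffled = X[perm], y[perm]
        # Walk through the shuffled data in chunks of batch_size
        for start in range(0, n_samples, batch_size):
            X_batch = X_shuffled[start:start + batch_size]
            y_batch = y_shuffled[start:start + batch_size]
            take_gradient_step(X_batch, y_batch)  # placeholder for the actual weight update

The course example below sets up the MNIST data, weights, and bias: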

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np  # needed for the .astype(np.float32) calls below

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

Question 1

Calculate how many bytes of memory train_features, train_labels, weights, and bias each occupy. You can ignore any per-object overhead; just compute how much memory is needed to store the raw data.

train_features Shape: (55000, 784) Type: float32
train_labels Shape: (55000, 10) Type: float32
weights Shape: (784, 10) Type: float32
bias Shape: (10,) Type: float32

How many bytes of memory does train_features occupy?
A float32 is 32 bits, i.e. 4 bytes.
55000*784*4 = 172480000
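As a quick sanity check (my own snippet, not part of the course), you can compute all four requirements directly, using 4 bytes per float32 value:

# Memory needed for each array at 4 bytes per float32 value
shapes = {
    'train_features': (55000, 784),
    'train_labels': (55000, 10),
    'weights': (784, 10),
    'bias': (10,),
}
total_bytes = 0
for name, shape in shapes.items():
    n_values = 1
    for dim in shape:
        n_values *= dim
    n_bytes = n_values * 4
    total_bytes += n_bytes
    print('{}: {} bytes'.format(name, n_bytes))
print('total: about {:.1f} MB'.format(total_bytes / 1e6))
# train_features: 172480000, train_labels: 2200000, weights: 31360, bias: 40
# total: about 174.7 MB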

The total memory required for the inputs, weights, and bias is about 174 MB, which is not very much. You could train on the entire dataset on either a CPU or a GPU.

But the datasets you use in the future may be measured in gigabytes or more. You could buy more memory, but that gets expensive: for example, a Titan X GPU with 12 GB of VRAM costs over $1,000. So, to run large models on your own machine, you need to learn how to use mini-batching.

Let's look at how to implement mini-batching in TensorFlow.

Let's see whether your machine can train the weights and bias for the MNIST dataset.

TensorFlow Mini-batching

To use mini-batching, you first have to divide your dataset into batches.

Unfortunately, it's sometimes impossible to split the data into batches of exactly equal size. For example, suppose you have 1000 data points and want batches of 128. Since 1000 is not evenly divisible by 128, you end up with 7 batches of 128 data points and 1 batch of 104 data points. (7*128 + 1*104 = 1000)

Because the number of data points per batch can vary, you can use TensorFlow's tf.placeholder() function to accept batches of different sizes.

Continuing the example above, if each sample has n_input = 784 features and there are n_classes = 10 possible labels, the dimensions of features should be [None, n_input] and the dimensions of labels should be [None, n_classes].

Features and Labels

    features = tf.placeholder(tf.float32, [None, n_input])
    labels = tf.placeholder(tf.float32, [None, n_classes])

What is None doing here?

The None dimension is a placeholder for the batch size. At runtime, TensorFlow will accept any batch size greater than 0.

Going back to the earlier example, this setup lets you feed features and labels into the model whether the batch contains 128 or 104 data points.
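As a small demonstration (my own sketch, in the same TensorFlow 1.x style as the rest of this post), the same placeholder accepts both batch sizes because its first dimension is None:

import numpy as np
import tensorflow as tf

features = tf.placeholder(tf.float32, [None, 784])

with tf.Session() as sess:
    for size in (128, 104):
        batch = np.zeros((size, 784), dtype=np.float32)  # dummy batch, for illustration only
        # Both feeds work because the placeholder's first dimension is None
        print(sess.run(tf.shape(features), feed_dict={features: batch}))
        # prints [128 784] and then [104 784]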
Question 2

Given the following parameters, how many batches are there, and how many data points are in the last batch?

features is (50000, 400)
labels is (50000, 10)
batch_size is 128

The number of batches is 50000 // 128 + 1 = 391. How many data points are in the last batch? 50000 - 128*390 = 80.
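You can verify this arithmetic with a couple of lines of Python (my own quick check):

n_samples, batch_size = 50000, 128
n_batches = n_samples // batch_size + (1 if n_samples % batch_size else 0)
last_batch_size = n_samples - batch_size * (n_batches - 1)
print(n_batches, last_batch_size)  # 391 80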
Now that you know the basic concept, let's learn how to implement mini-batching.

Question 3

featureslabels 实现一个 batches 函数。这个函数返回每个有最大 batch_size 数据点的 batch。下面有例子来说明一个示例 batches 函数的输出是什么。

# 4 samples of features
example_features = [
    ['F11','F12','F13','F14'],
    ['F21','F22','F23','F24'],
    ['F31','F32','F33','F34'],
    ['F41','F42','F43','F44']]

# 4 samples of labels
example_labels = [
    ['L11','L12'],
    ['L21','L22'],
    ['L31','L32'],
    ['L41','L42']]

example_batches = batches(3, example_features, example_labels)
The example_batches variable looks like this:

[
    # 2 batches:
    #   First is a batch of size 3
    #   Second is a batch of size 1
    [
        # First batch is size 3
        [
            # 3 samples of features.
            # There are 4 features per sample.
            ['F11', 'F12', 'F13', 'F14'],
            ['F21', 'F22', 'F23', 'F24'],
            ['F31', 'F32', 'F33', 'F34']
        ], [
            # 3 samples of labels.
            # There are 2 labels per sample.
            ['L11', 'L12'],
            ['L21', 'L22'],
            ['L31', 'L32']
        ]
    ], [
        # Second batch is size 1.
        # Since the batch size is 3, only one of the four samples is left over for this batch.
        [
            # 1 sample of features
            ['F41', 'F42', 'F43', 'F44']
        ], [
            # 1 sample of labels
            ['L41', 'L42']
        ]
    ]
]

Save the following file as quiz.py:

import math
def batches(batch_size, features, labels):
    """
    Create batches of features and labels
    :param batch_size: The batch size
    :param features: List of features
    :param labels: List of labels
    :return: Batches of (Features, Labels)
    """
    assert len(features) == len(labels)
    # TODO: Implement batching
    """
    output_batches = []
    
    sample_size = len(features)
    for start_i in range(0, sample_size, batch_size):
        end_i = start_i + batch_size
        batch = [features[start_i:end_i], labels[start_i:end_i]]
        output_batches.append(batch)
        
    return output_batches
    
    """
    row = len(features)
    # Number of slices needed (the +1 covers a possible smaller final batch;
    # any empty trailing slice is filtered out below)
    n_batches = row // batch_size + 1
    result = []
    results = []
    feature = []
    label = []
    # Slice features and labels into chunks of at most batch_size samples.
    # Note: the loop must run n_batches times, not batch_size times.
    for i in range(0, n_batches):
        feature.append(features[i*batch_size:(i+1)*batch_size])
        label.append(labels[i*batch_size:(i+1)*batch_size])

    # Pair each feature slice with its label slice, skipping empty slices
    for feature_i in range(0, len(feature)):
        if len(feature[feature_i]) > 0:
            result.append(feature[feature_i])
            result.append(label[feature_i])
            results.append(result)
            result = []
    return results


But the code above is rather convoluted; we can simplify it to the following form:

import math
def batches(batch_size, features, labels):
    """
    Create batches of features and labels
    :param batch_size: The batch size
    :param features: List of features
    :param labels: List of labels
    :return: Batches of (Features, Labels)
    """
    assert len(features) == len(labels)
    # TODO: Implement batching
    output_batches = []
    
    sample_size = len(features)
    for start_i in range(0, sample_size, batch_size):
        end_i = start_i + batch_size
        batch = [features[start_i:end_i], labels[start_i:end_i]]
        output_batches.append(batch)
        
    return output_batches
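To see what this returns, you could run it on the 4-sample example data from Question 3 (a quick check of my own, not part of the original quiz):

from pprint import pprint

example_batches = batches(3, example_features, example_labels)
pprint(example_batches)
# Expected: two batches, the first with 3 samples and the second with the single remaining sample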

Now let's use mini-batching to feed the MNIST features and labels to a linear model in batches.

Set the batch size and use the batches function to iterate over all the data. The recommended batch size is 128; you can change it depending on how much memory you have.

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
from helper import batches

learning_rate = 0.001
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

# The features are already scaled and the data is shuffled
train_features = mnist.train.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


# TODO: Set batch size
batch_size = 128
assert batch_size is not None, 'You must set the batch size'

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    # TODO: Train optimizer on all batches
    for batch_features, batch_labels in batches(batch_size, train_features, train_labels):
        sess.run(optimizer, feed_dict={features: batch_features, labels: batch_labels})

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))

The accuracy isn't high, but as you may know, the training set doesn't have to be used only once: you can train a model over the dataset multiple times. We'll discuss the topic of "epochs" in the next section.
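As a rough preview (my own sketch, reusing the names from the code above), training for multiple epochs simply means wrapping the batch loop in an outer loop:

# Hypothetical sketch: repeat the mini-batch training loop for several epochs
epochs = 10
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(epochs):
        for batch_features, batch_labels in batches(batch_size, train_features, train_labels):
            sess.run(optimizer, feed_dict={features: batch_features, labels: batch_labels})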
