TensorFlow Deep Learning (2): Image classification

This article is one of my study notes on the book "TensorFlow Deep Learning".

Theoretical part

Collecting handwritten digit pictures

The generalization ability of a model means that the model also performs well on new samples. To improve generalization, we should try to increase the size and diversity of the dataset, so that the distribution of the training data is close to that of real handwritten digit pictures. For example, since everyone writes digits with different habits, we should collect pictures of as many different writing styles as possible.

Let's take a look at how a picture is represented: a picture usually has h rows and w columns, and each position stores a pixel value. The pixel value is generally an integer that expresses the color intensity:

  • Color picture: each pixel uses a one-dimensional vector of length 3 to store the R, G, and B values
  • Grayscale picture: only one value is needed to express the intensity, e.g. 0 means pure black and 255 means pure white (see the sketch after this list)
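As a minimal sketch of these grayscale values (using the MNIST dataset that the experiment below loads), each digit picture is a 28×28 matrix of integers in [0, 255]:

from tensorflow.keras import datasets

# Load MNIST: 60000 training pictures, each a 28 x 28 matrix of integer pixel values in [0, 255]
(x, y), _ = datasets.mnist.load_data()
print(x.shape, x.dtype, x.min(), x.max())  # (60000, 28, 28) uint8 0 255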

Considering the input format: a grayscale image is stored as a matrix with h rows and w columns, b images are stored together as a tensor of shape [b, h, w], and each matrix is then flattened into a vector.
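A minimal sketch of this storage and flattening step (assuming 28×28 pictures, with a random batch standing in for real images):

import tensorflow as tf

# A batch of b = 4 grayscale pictures, stored as a tensor of shape [b, h, w] = [4, 28, 28]
images = tf.random.uniform([4, 28, 28], maxval=255., dtype=tf.float32)
# Flatten each h x w matrix into a vector of length h * w, giving shape [4, 784]
flat = tf.reshape(images, [-1, 28 * 28])
print(flat.shape)  # (4, 784)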
Considering the output format, we use one-hot encoding: if a sample belongs to the i-th category, the element at index i is set to 1 and all other elements are set to 0.

Digital (integer) encoding is compact and convenient, so it is generally used for storage, even though it implies a natural ordering between the digits; for calculation, the labels are converted with tf.one_hot() to the sparser one-hot encoding.
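A small sketch of this conversion (the label values here are made up for illustration):

import tensorflow as tf

# Digital (integer) labels for a batch of 4 samples
labels = tf.constant([0, 3, 9, 1])
# Convert them to one-hot vectors of length 10 for the calculation
y_onehot = tf.one_hot(labels, depth=10)
print(y_onehot.numpy()[1])  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]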

Error calculation

In classification problems, the cross-entropy loss function is used more often, and the mean squared error (MSE) common in regression problems less often; the example in this article nevertheless keeps the MSE loss for simplicity. The loss function can be defined as

L(o, y) = (1/N) Σ_i ||o^(i) − y^(i)||^2

and the goal of model training is the optimization

W*, b* = argmin_{W, b} L(o, y)

where o = Wx + b.
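A minimal sketch of this loss calculation, mirroring the training loop in the complete code below (the outputs and labels here are random, just for illustration):

import tensorflow as tf

# A hypothetical batch of model outputs and integer labels, batch size b = 32
out = tf.random.normal([32, 10])
y = tf.random.uniform([32], maxval=10, dtype=tf.int32)
y_onehot = tf.one_hot(y, depth=10)
# Sum of the squared errors, averaged over the batch (a scalar)
loss = tf.reduce_sum(tf.square(out - y_onehot)) / out.shape[0]
print(float(loss))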

Non-linear model

The expressive ability of a linear model is weak; a more suitable model can be learned, for example, by using a quadratic polynomial. We can also use an activation function to convert the linear model into a nonlinear one, such as the Sigmoid function or the ReLU function. The model then becomes

o = ReLU(Wx + b)
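A minimal sketch of one such nonlinear layer (the input and output sizes are chosen arbitrarily for illustration):

import tensorflow as tf

# One nonlinear layer o = ReLU(Wx + b), mapping 784 inputs to 256 outputs
x = tf.random.normal([4, 784])                             # a batch of 4 flattened pictures
W = tf.Variable(tf.random.normal([784, 256], stddev=0.1))  # weight matrix
b = tf.Variable(tf.zeros([256]))                           # bias vector
o = tf.nn.relu(x @ W + b)                                  # output, shape [4, 256]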

Experimental part

Model building

from tensorflow import keras
from tensorflow.keras import layers

# Use a Sequential container to wrap 3 network layers; the output of each layer is the input of the next
model = keras.Sequential([
    # Create a fully connected layer with 256 output nodes and ReLU activation
    layers.Dense(256, activation='relu'),
    layers.Dense(128, activation='relu'),
    layers.Dense(10)
])
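To sanity-check the stacked layers, a hypothetical forward pass could look like this (the layer weights are created lazily on the first call):

import tensorflow as tf

# Feed a dummy batch of 4 flattened 28 x 28 pictures through the model
out = model(tf.random.normal([4, 28 * 28]))
print(out.shape)  # (4, 10)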

Complete code


import tensorflow as tf
from tensorflow.keras import datasets, layers, optimizers, Sequential, metrics

# Configure how the GPU is used
# Get the list of GPUs
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Enable memory growth so GPU memory is allocated on demand
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        # Print the exception
        print(e)

(xs, ys), _ = datasets.mnist.load_data()
print('datasets:', xs.shape, ys.shape, xs.min(), xs.max())

batch_size = 32

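# Scale the pixel values from [0, 255] to [0, 1]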
xs = tf.convert_to_tensor(xs, dtype=tf.float32) / 255.
db = tf.data.Dataset.from_tensor_slices((xs, ys))
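# Batch into mini-batches of 32 samples and repeat the dataset 30 times (i.e. 30 epochs)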
db = db.batch(batch_size).repeat(30)

model = Sequential([layers.Dense(256, activation='relu'),
                    layers.Dense(128, activation='relu'),
                    layers.Dense(10)])
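# Build the model with a dummy batch size of 4 so that the weights are created and summary() can be printed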
model.build(input_shape=(4, 28 * 28))
model.summary()

optimizer = optimizers.SGD(learning_rate=0.01)
acc_meter = metrics.Accuracy()

for step, (x, y) in enumerate(db):

    with tf.GradientTape() as tape:
        # Flatten the pictures, [b, 28, 28] => [b, 784]
        x = tf.reshape(x, (-1, 28 * 28))
        # Step 1. Compute the model output, [b, 784] => [b, 10]
        out = model(x)
        # One-hot encode the labels, [b] => [b, 10]
        y_onehot = tf.one_hot(y, depth=10)
        # Compute the squared error, [b, 10]
        loss = tf.square(out - y_onehot)
        # Average the total squared error over the batch (a scalar)
        loss = tf.reduce_sum(loss) / x.shape[0]

    acc_meter.update_state(tf.argmax(out, axis=1), y)

    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    if step % 200 == 0:
        print(step, 'loss:', float(loss), 'acc:', acc_meter.result().numpy())
        acc_meter.reset_states()

Experimental results

(Screenshots of the model summary and of the loss/accuracy printed during training are omitted here.)

Summary

This experiment uses a 3-layer nonlinear neural network for image classification, and its expressive ability is stronger than that of a single-layer linear regression model. With TensorFlow it is very convenient to update the parameters layer by layer with gradient descent, and the final result is also very good. This was my first taste of the power of deep learning and TensorFlow.

Origin: blog.csdn.net/Protocols7/article/details/107953481