TensorFlow Learning: Image classification using official models and your own training data

Preface

Tutorial source: the video course "A Tsinghua University expert talks about machine vision again! TensorFlow+OpenCV: a practical tutorial on deep learning for machine vision, covering image processing, object detection, defect detection, and image recognition"

Note:

This tutorial differs somewhat from the official website tutorial. The API it uses is relatively old, but the core ideas have not changed.

The previous article, "TensorFlow learning: using the official model for image classification and using your own data to fine-tune the model", implemented classification based on the official example. This article implements classification from a different angle.

Basic knowledge

I had not studied these basics before, so this time I only covered them briefly, following the video tutorial.

Keras

Introduction
Keras is an open source deep learning framework, which is a high-level neural network API built on Python. It provides a simple, intuitive interface that makes it easier to build, train, and deploy deep learning models.

Keras was later integrated into TensorFlow during the 1.x releases, so its API can be used directly from TensorFlow as tf.keras.

Keras related modules

  • applications: Keras Applications are fixed architectures with pre-trained weights
  • callbacks: utilities called at certain points during model training
  • datasets: Keras built-in datasets
  • initializers: Keras initializers, which set the initial values of the weights and biases of a neural network model. These initial values strongly affect training and convergence speed.
  • layers: the Keras layer API. The layers module provides the layer types used to build different neural network architectures, for example Dense (fully connected layer) and Conv2D (convolutional layer)
  • losses: defines loss functions. A loss function measures the difference between the model's predictions and the true labels.
  • metrics: defines evaluation metrics that measure model performance, for example accuracy
  • models: model classes such as Sequential and Model
  • optimizers: built-in optimizers
  • preprocessing: data preprocessing tools
  • regularizers: built-in regularizers
  • utils: built-in utility functions and classes

Build a neural network model

The following code is an official example: https://tensorflow.google.cn/overview?hl=zh-cn

It is recommended to watch the introduction to neural networks in the video tutorial to have a better understanding.

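import tensorflow as tf

# Step 1: load the dataset and normalize pixel values to [0, 1]
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Step 2: build the neural network model
model = tf.keras.models.Sequential([
    # Flatten each 28x28 input image into a one-dimensional array
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    # Fully connected layer with 128 neurons and ReLU activation, used to extract image features
    tf.keras.layers.Dense(128, activation='relu'),
    # Dropout layer to reduce overfitting
    tf.keras.layers.Dropout(0.2),
    # Output layer: 10 neurons with softmax activation, producing a probability
    # distribution over the classes. There are 10 neurons because there are 10 classes.
    tf.keras.layers.Dense(10, activation='softmax')
])
# Step 3: configure the optimizer, loss function, and evaluation metric
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Step 4: train for 5 epochs. In each epoch the data is split into batches of 128 images.
# Powers of two are a common convention for batch sizes.
model.fit(x_train, y_train, epochs=5, batch_size=128)
# Step 5: evaluate the model on the test set; returns the loss and the accuracy
model.evaluate(x_test, y_test)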


Why use the ReLU activation function
When building a neural network model, the choice of activation function usually depends on the following factors:

  • Nonlinearity: The nonlinearity of the activation function is what allows a neural network to learn and represent complex functional relationships. Because a stack of linear layers is still linear, nonlinear functions are needed to introduce nonlinear transformations. Common nonlinear activation functions include ReLU (Rectified Linear Unit), Sigmoid, and Tanh.

  • Vanishing and exploding gradients: In deep neural networks, gradients can vanish or explode as they propagate. A vanishing gradient means that during backpropagation the gradient shrinks toward zero, so the weights of the early layers are updated very slowly. An exploding gradient means that the gradient keeps growing, so the weight updates become very large and training becomes unstable. A suitable activation function can alleviate these problems. For example, ReLU has a gradient of exactly 1 for positive inputs, which helps against vanishing gradients, although it does not by itself prevent exploding gradients.

  • Computational efficiency: The cost of computing the activation function is also a factor. Some activation functions are cheap to compute and can speed up both training and inference.

Choosing an activation function for a specific task and network structure is an experimental process. In practice, ReLU is one of the most commonly used activation functions for hidden layers, but other activation functions can be tried as needed to improve model performance.
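
The gradient argument above can be made concrete with a small NumPy sketch. It is not part of the video tutorial, and the function names are illustrative. It compares the derivative of ReLU with the derivative of sigmoid, whose maximum value is 0.25.

```python
import numpy as np

def relu(x):
    """ReLU: max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return (x > 0).astype(float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative of sigmoid: s * (1 - s), never larger than 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("relu:        ", relu(x))
print("relu grad:   ", relu_grad(x))
print("sigmoid grad:", sigmoid_grad(x))

# Backpropagation multiplies one such factor per layer. Ten sigmoid layers
# contribute at most 0.25 ** 10, while active ReLU units contribute a factor of 1.
print("0.25 ** 10 =", 0.25 ** 10)
```

The sketch also shows the cost side: ReLU needs only a comparison, while sigmoid needs an exponential.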

Why use the softmax activation function

When building a classification model, softmax is often used as the activation function of the last layer. It converts the output of the neural network into a probability distribution, which suits multi-class classification tasks.

The softmax function converts the input vector into a probability distribution vector in which each element is the probability of the corresponding class. Specifically, it maps the output value of every neuron in the output layer to a real number between 0 and 1, and all elements sum to 1. The advantage is that the model output can be read directly as the confidence or probability of each class.
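
As a minimal sketch of that conversion (written with NumPy for illustration; the input scores are made up and this is not the Keras implementation), softmax exponentiates each score and divides by the sum of the exponentials:

```python
import numpy as np

def softmax(logits):
    """Convert a vector of raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

logits = np.array([2.0, 1.0, 0.1])      # hypothetical raw outputs for 3 classes
probs = softmax(logits)
print("probabilities:", probs)
print("sum:", probs.sum())
print("predicted class:", int(np.argmax(probs)))
```

Subtracting the maximum does not change the result, but it keeps np.exp from overflowing when the scores are large.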

Convolutional neural network

For the underlying principles, see: https://www.bilibili.com/video/BV1ee411K7WU?p=36&vd_source=fd72ff60b43cc949b3316d103871c31c

Basic structure
Convolutional neural networks are generally used for image problems. They mainly consist of the following parts:

  • Convolutional layer: extracts different features from the input
  • Pooling layer: reduces the size of the feature maps, which avoids too many parameters in the fully connected layer
  • Fully connected layer: usually follows the convolutional and pooling layers. Their output is flattened, passed through one or more fully connected layers, and finally turned into the prediction result.

Convolutional Neural Network API

  • Conv2D: Implementing convolution
  • MaxPool2D: Pooling operation

For example:

# 32 filters, 5x5 kernel, stride 1, 'same' padding, channels last, ReLU activation
tf.keras.layers.Conv2D(32, kernel_size=5, strides=1, padding='same',
                       data_format='channels_last', activation='relu')
# 2x2 pooling window, stride 2, 'same' padding
tf.keras.layers.MaxPool2D(pool_size=2, strides=2, padding='same')

For the convolutional layer in image classification tasks, common kernel_size values are 3 and 5, while some object detection models use a larger kernel_size. An odd kernel_size is generally recommended, because the kernel then has a well-defined center pixel and the padding can be symmetric.

The number of convolution kernels (filters) in a convolutional layer is an important hyperparameter that affects model performance. Typically, the number of filters increases with depth: a common approach is to start with a small number and increase it layer by layer until the model meets the performance requirements.

In the pooling layer, the pool_size parameter sets the size of the pooling window. Common pool_size values include 2x2, 3x3, and 4x4.
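
To see how these hyperparameters interact, the sketch below computes the spatial output size and the parameter count by hand. The helper names are hypothetical, and the 28x28 single-channel input is an assumption borrowed from the MNIST example. The size formulas are the ones TensorFlow documents for 'same' and 'valid' padding.

```python
import math

def output_size(input_size, kernel_size, stride, padding):
    """Output size of a convolution or pooling layer along one spatial dimension."""
    if padding == "same":
        return math.ceil(input_size / stride)
    if padding == "valid":
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError(f"unknown padding: {padding!r}")

def conv2d_param_count(kernel_size, in_channels, filters):
    """Trainable parameters of a Conv2D layer: one kernel plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters

size = 28                                           # assumed 28x28 input image
after_conv = output_size(size, 5, 1, "same")        # Conv2D(32, kernel_size=5, strides=1)
after_pool = output_size(after_conv, 2, 2, "same")  # MaxPool2D(pool_size=2, strides=2)
print("after conv:", after_conv, "after pool:", after_pool)
print("conv parameters:", conv2d_param_count(5, 1, 32))
```

With 'same' padding and stride 1 the convolution keeps the spatial size, and the stride-2 pooling halves it. Doubling the number of filters doubles the layer's parameter count.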

Image basics

Composition
The feature values that make up an image are its pixel values, organized along three dimensions: image height, image width, and number of channels.

A pixel in a grayscale image needs only one value to describe it, so the image has a single channel. A pixel in a color image is described by three values (R, G, and B), so the image has three channels.

  • Grayscale image: single channel
  • Color image: three channels

In TensorFlow, images are represented by tensors

  • Single image: (height, width, number of channels)
  • Multiple pictures: (number of pictures in a batch, height, width, number of channels)
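
The shapes listed above can be checked with plain NumPy arrays. This is only an illustration: the arrays are zero-filled placeholders rather than real images, and the sizes are assumptions.

```python
import numpy as np

gray = np.zeros((28, 28, 1), dtype=np.uint8)     # grayscale: (height, width, 1)
color = np.zeros((224, 224, 3), dtype=np.uint8)  # color: (height, width, 3)

# Stacking 32 images along a new first axis gives a batch:
# (number of images, height, width, number of channels)
batch = np.stack([color] * 32, axis=0)

print(gray.shape, color.shape, batch.shape)
```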

Image reading and processing

Read pictures

import tensorflow as tf

# Load the image and resize it to 224x224
image = tf.keras.preprocessing.image.load_img('./images/flower.jpg', target_size=(224, 224))

print("Image:", image)

Different models have different requirements for the size of input images. The image size needs to be adjusted to match the input of the model.
Convert the image to array format
The read image cannot be used directly and needs to be converted into array format (tensor)

# Convert to an array
img_arr = tf.keras.preprocessing.image.img_to_array(image)
print("Image array:", img_arr)

Some models also require the array to be normalized: img_arr = img_arr / 255.0. The divisor is 255 because each of the three color channels takes values from 0 to 255.

Note: img_to_array has a second parameter, data_format, whose value is channels_first or channels_last. It determines whether the channel dimension comes first or last. Different frameworks may have different requirements; TensorFlow defaults to channels last.
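
The difference between the two layouts, and the effect of dividing by 255, can be sketched with NumPy alone. The random array below is only a stand-in for the output of img_to_array, not a real image.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for img_to_array output in channels_last layout: (height, width, channels)
img_last = rng.integers(0, 256, size=(224, 224, 3)).astype("float32")

# channels_first layout: move the channel axis to the front -> (channels, height, width)
img_first = np.transpose(img_last, (2, 0, 1))

# Normalization: map pixel values from [0, 255] to [0, 1]
normalized = img_last / 255.0

print(img_last.shape, img_first.shape)
print(normalized.min(), normalized.max())
```

Both layouts hold the same values; only the order of the axes differs.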

Image shape
The image passed to a model is generally three-dimensional or four-dimensional. The shape can be inspected or changed to make sure it meets the model's requirements.

# Load the image and resize it to 224x224
image = tf.keras.preprocessing.image.load_img(
    './images/flower.jpg', target_size=(224, 224))

print("Image:", image)

# Convert to an array
img_arr = tf.keras.preprocessing.image.img_to_array(image)

print("Image shape:", img_arr.shape)  # three-dimensional: (224, 224, 3)

# Some models expect four-dimensional input, so add a batch dimension
new_img = img_arr.reshape(1, img_arr.shape[0], img_arr.shape[1], img_arr.shape[2])
print("Four-dimensional:", new_img.shape)  # (1, 224, 224, 3)
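
As an aside that the tutorial does not cover, NumPy offers shorter ways to add the batch dimension than spelling out every axis in reshape. The zero-filled array below is a placeholder for a real image array.

```python
import numpy as np

img_arr = np.zeros((224, 224, 3), dtype="float32")  # placeholder for a real image array

via_reshape = img_arr.reshape(1, *img_arr.shape)    # same idea as the reshape call above
via_expand = np.expand_dims(img_arr, axis=0)        # insert a new axis at position 0
via_newaxis = img_arr[np.newaxis, ...]              # indexing shorthand for the same thing

print(via_reshape.shape, via_expand.shape, via_newaxis.shape)
```

All three produce an array of shape (1, 224, 224, 3), that is, a batch containing a single image.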


Image classification

Here is a brief introduction to transfer learning based on mobilenet_v2. One method was already introduced in "TensorFlow learning: using the official model for image classification and using your own data to fine-tune the model"; the method in that article comes from the official documentation.

The method here comes from the video tutorial: Model definition

Training the model

import tensorflow as tf
import datetime

# Load the built-in model; include_top=False drops the default classification head
base_model = tf.keras.applications.mobilenet_v2.MobileNetV2(include_top=False)

# Freeze the base model so that the pre-trained weights are not changed by training.
# With little training data, it is enough to train only the new fully connected layers.
for layer in base_model.layers:
    layer.trainable = False

# Create the data generators and rescale pixel values to [0, 1]
train_generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0/255.0)
test_generator = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0/255.0)
# Read the training set
train = train_generator.flow_from_directory(
    directory='data/train',  # data directory
    target_size=(224, 224),  # resize images to (height, width)
    batch_size=32,  # number of images per batch
    class_mode='categorical'  # one-hot labels; classes are inferred from the subfolder names
)
# Read the validation set
test = test_generator.flow_from_directory(
    directory='data/validation',  # data directory
    target_size=(224, 224),  # resize images to (height, width)
    batch_size=32,  # number of images per batch
    class_mode='categorical'  # one-hot labels; classes are inferred from the subfolder names
)

#print(train, test)
base_model.summary()  # summary() prints the layer table itself
#print("Input:", base_model)
# Fine-tune the model
x = base_model.outputs[0]   # output of the base model with the classifier removed
#print('x:', x)
# Apply global average pooling before the fully connected layers
x = tf.keras.layers.GlobalAveragePooling2D()(x)
# Add a fully connected layer with 1024 neurons and ReLU activation
x = tf.keras.layers.Dense(1024, activation='relu')(x)
y_predict = tf.keras.layers.Dense(2, activation='softmax')(x)  # output layer; two neurons because there are only two image classes

# New model
new_model = tf.keras.models.Model(inputs=base_model.inputs, outputs=y_predict)
print("New model:", new_model)

# Compile the model
new_model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

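# Record training logs
log_dir = "logs/fit/" + datetime.datetime.now().strftime('%Y%m%d-%H%M%S')
# Collects model metrics and summary data during training and writes them to TensorBoard log files
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=log_dir,
    histogram_freq=1
)
# Model.fit accepts generators directly; fit_generator is deprecated in TensorFlow 2.x
history = new_model.fit(train, epochs=10, validation_data=test, callbacks=[tensorboard_callback])

# Export the model. In TensorFlow 2.x this writes a SavedModel directory;
# newer Keras versions require a .keras or .h5 file extension instead.
export_path = 'tmp/cat_dog_model'
new_model.save(export_path)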

This method requires a fixed directory structure: data/train and data/validation each contain one subfolder per class, and flow_from_directory takes the class names from those subfolder names.
The exported model is written to tmp/cat_dog_model.
Use the trained model to make predictions

import tensorflow as tf
# matplotlib is a library for plotting charts and visualizing data
import matplotlib.pyplot as plt
import numpy as np

# 1. Load a local image and resize it to 224x224
image = tf.keras.preprocessing.image.load_img('./images/cat.png', target_size=(224, 224))
# 2. Convert it to an array
image = tf.keras.preprocessing.image.img_to_array(image)
print("Image shape:", image.shape)
# 3. Add a batch dimension: (224, 224, 3) -> (1, 224, 224, 3)
image = image.reshape(1, image.shape[0], image.shape[1], image.shape[2])
# 4. Scale pixels to [0, 1], the same preprocessing the training script applied (rescale=1.0/255.0).
#    mobilenet_v2.preprocess_input scales to [-1, 1] instead and would not match the training input.
image = image / 255.0
# 5. Load the model
model = tf.keras.models.load_model('./tmp/cat_dog_model')
# 6. Predict
predictions = model.predict(image)
index = np.argmax(predictions, axis=1)[0]
# The label order must match the class indices assigned during training
# (flow_from_directory sorts the subfolder names alphabetically)
label = ['cat', 'dog'][index]
print("Prediction:", predictions, index, label)
# 7. Visualize the result
plt.figure()  # create a figure window
plt.xticks([])
plt.yticks([])
plt.grid(False)  # hide grid lines
plt.imshow(image[0])  # show the image
plt.xlabel(label)
plt.show()  # display the figure window
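
The hard-coded label list works only if its order matches the class indices used during training. A slightly more defensive sketch is shown below. The dictionary and the prediction values are hypothetical stand-ins: in the training script the mapping would come from train.class_indices, and the scores from model.predict.

```python
import numpy as np

# Hypothetical mapping; in the training script this would be train.class_indices
class_indices = {"cat": 0, "dog": 1}
index_to_label = {index: name for name, index in class_indices.items()}

# Hypothetical model.predict output for a batch containing one image
predictions = np.array([[0.08, 0.92]])

index = int(np.argmax(predictions, axis=1)[0])  # position of the highest probability
label = index_to_label[index]
confidence = float(predictions[0, index])
print(f"{label} ({confidence:.0%})")
```

Saving the mapping alongside the model is one way to avoid label mix-ups when folders are added or renamed.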


Origin blog.csdn.net/weixin_41897680/article/details/133839428