TensorFlow Getting Started: Image Classification - Cat and Dog Classification - MobileNet Optimization

        In the previous article, "TensorFlow Getting Started: Image Classification - Cat and Dog Classification - Android", I walked through the whole process of training a cat and dog image classifier with TensorFlow and using it in an Android application.

        In this post, MobileNet will be used to retrain a cat and dog image classifier.

1. Introduction to MobileNet 

        MobileNet is a lightweight neural network architecture designed primarily for computer vision applications on mobile and embedded devices. Developed by researchers at Google, it aims to deliver efficient image classification, object detection, and semantic segmentation by reducing the number of model parameters and the computational cost.

        MobileNet replaces standard convolutions with depthwise separable convolutions, which greatly reduces computational cost. A depthwise separable convolution splits the operation into two steps: first a separate spatial (depthwise) convolution is applied to each input channel, and then a 1×1 pointwise convolution combines the results across channels. This factorization significantly reduces the number of parameters and computations, shrinking the model to a fraction of the size of its standard-convolution counterpart while maintaining high accuracy, as illustrated below.
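        To make the parameter savings concrete, here is a minimal sketch (not from the original article) comparing a standard convolution against Keras's SeparableConv2D, which implements the depthwise + pointwise factorization:

import tensorflow as tf

# Compare parameter counts of a standard convolution and a depthwise
# separable convolution on the same 32-channel input (illustrative only).
inputs = tf.keras.Input(shape=(224, 224, 32))
standard = tf.keras.Model(inputs, tf.keras.layers.Conv2D(64, 3, padding='same')(inputs))
separable = tf.keras.Model(inputs, tf.keras.layers.SeparableConv2D(64, 3, padding='same')(inputs))

print(standard.count_params())   # 3*3*32*64 + 64 biases          = 18496
print(separable.count_params())  # 3*3*32 + 1*1*32*64 + 64 biases =  2400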

        MobileNet also uses a global average pooling layer instead of a fully connected layer to further reduce model size and computational complexity. In addition, it introduces a linear bottleneck structure (in MobileNetV2) and batch normalization to improve performance and training stability.
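        As a small illustration (assuming MobileNetV2's final 7×7×1280 feature map at a 224×224 input), global average pooling collapses each feature map to a single value with zero added parameters:

import tensorflow as tf

# (batch, 7, 7, 1280) -> (batch, 1280): each of the 1280 feature maps
# is averaged over its 7x7 spatial positions.
feature_maps = tf.random.normal((1, 7, 7, 1280))
pooled = tf.keras.layers.GlobalAveragePooling2D()(feature_maps)
print(pooled.shape)  # (1, 1280)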

        Overall, MobileNet is a very effective neural network architecture that enables efficient computer vision applications on mobile and embedded devices.

2. Data preparation

        This example still uses the cat and dog dataset introduced in the previous article. Link: Download Kaggle Cats and Dogs Dataset from Official Microsoft Download Center

        The dataset also needs to be cleaned of corrupted images first:

import os
from PIL import Image
 
# Set the directory to search for corrupted files
directory = 'path/to/directory'
 
# Loop through all files in the directory
for filename in os.listdir(directory):
 
    # Check if the file is an image (case-insensitive extension check)
    if filename.lower().endswith(('.jpg', '.png')):
        
        # Attempt to open image with PIL
        try:
            img = Image.open(os.path.join(directory, filename))
            img.verify()
            img.close()
        except (IOError, SyntaxError) as e:
            print(f"Deleting {filename} due to error: {e}")
            os.remove(os.path.join(directory, filename))

        Next, the dataset needs to be divided into a training set, a validation set, and a test set.

        Reference Code:

# Split the dataset into training, validation, and test sets
import os
import shutil
import numpy as np

dataset_dir = 'path/to/PetImages'  # root of the extracted dataset (placeholder)

train_dir = os.path.join(os.path.dirname(dataset_dir), 'PetImages_train')
val_dir = os.path.join(os.path.dirname(dataset_dir), 'PetImages_validation')
test_dir = os.path.join(os.path.dirname(dataset_dir), 'PetImages_test')

if not os.path.exists(train_dir):
    train_ratio = 0.7  # training set ratio
    val_ratio = 0.15   # validation set ratio
    test_ratio = 0.15  # test set ratio

    classes = ['Cat', 'Dog']
    for cls in classes:
        # Get all filenames for this class
        filenames = os.listdir(os.path.join(dataset_dir, cls))

        # Compute the size of each split
        num_samples = len(filenames)
        num_train = int(num_samples * train_ratio)
        num_val = int(num_samples * val_ratio)
        num_test = num_samples - num_train - num_val

        # Shuffle the filenames
        shuffle_indices = np.random.permutation(num_samples)
        filenames = [filenames[i] for i in shuffle_indices]

        os.makedirs(os.path.join(train_dir, cls), exist_ok=True)
        os.makedirs(os.path.join(val_dir, cls), exist_ok=True)
        os.makedirs(os.path.join(test_dir, cls), exist_ok=True)

        # Copy files into the corresponding split directories
        for i in range(num_train):
            src_path = os.path.join(dataset_dir, cls, filenames[i])
            dst_path = os.path.join(train_dir, cls, filenames[i])
            shutil.copy(src_path, dst_path)

        for i in range(num_train, num_train + num_val):
            src_path = os.path.join(dataset_dir, cls, filenames[i])
            dst_path = os.path.join(val_dir, cls, filenames[i])
            shutil.copy(src_path, dst_path)

        for i in range(num_train + num_val, num_samples):
            src_path = os.path.join(dataset_dir, cls, filenames[i])
            dst_path = os.path.join(test_dir, cls, filenames[i])
            shutil.copy(src_path, dst_path)
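        After copying, it is worth sanity-checking how many images landed in each split (a small check, not in the original code):

# Count the images in each split directory.
for split_dir in [train_dir, val_dir, test_dir]:
    for cls in ['Cat', 'Dog']:
        n = len(os.listdir(os.path.join(split_dir, cls)))
        print(f'{os.path.basename(split_dir)}/{cls}: {n} images')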

3. Model training

3.1 Preparation

import os
import tensorflow as tf
from tensorflow.keras import layers

3.2 Data loading and data augmentation

# Load the datasets
train_dir = 'I:/数据集/kagglecatsanddogs_5340/PetImages_train'
val_dir = 'I:/数据集/kagglecatsanddogs_5340/PetImages_validation'
test_dir = 'I:/数据集/kagglecatsanddogs_5340/PetImages_test'
batch_size = 32
image_size = 224

train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(image_size, image_size),
    batch_size=batch_size,
    class_mode='categorical')

val_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)


validation_generator = val_datagen.flow_from_directory(
    val_dir,
    target_size=(image_size, image_size),
    batch_size=batch_size,
    class_mode='categorical')

        This code handles data preprocessing and creates the data generators used for training and validating the model.

        First, the folder paths of the training, validation, and test datasets are specified. The batch_size variable is the number of images per batch, and image_size is the uniform size the images are resized to.

        Then an ImageDataGenerator object, train_datagen, is created to augment the training dataset by applying rescaling plus random shear, zoom, and horizontal flips to the input images.

        Next, train_datagen.flow_from_directory creates the training data generator train_generator, which reads images from the train_dir directory and converts them into batches of tensors on the fly. The class_mode parameter is set to 'categorical', so labels are one-hot encoded.

        Similarly, an ImageDataGenerator object val_datagen is created that applies only the rescaling. val_datagen.flow_from_directory then creates the validation data generator validation_generator, with class_mode likewise set to 'categorical', so the model's accuracy can be evaluated during training.
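        To confirm the generators produce what the model expects, one can pull a single batch and inspect it (a quick check, not part of the original pipeline):

# Fetch one batch from the training generator and inspect shapes and labels.
x_batch, y_batch = next(train_generator)
print(x_batch.shape)                  # e.g. (32, 224, 224, 3), values scaled to [0, 1]
print(y_batch.shape)                  # e.g. (32, 2), one-hot because class_mode='categorical'
print(train_generator.class_indices)  # e.g. {'Cat': 0, 'Dog': 1}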

3.3 Model Design

input_shape = (image_size, image_size, 3)
num_classes = 2

# Load the pre-trained model
base_model = tf.keras.applications.MobileNetV2(input_shape=input_shape, include_top=False, weights='imagenet')

# Freeze the base layers
for layer in base_model.layers:
    layer.trainable = False

# Add a new fully connected head
x = base_model.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(128, activation='relu')(x)
x = layers.Dropout(0.5)(x)
predictions = layers.Dense(num_classes, activation='sigmoid')(x)

# Build the full model
model = tf.keras.models.Model(inputs=base_model.input, outputs=predictions)
    
# Compile the model
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001), loss='binary_crossentropy', metrics=['accuracy'])

        First, the shape of the input image (input_shape) and the number of classes (num_classes) are specified.

        Then, tf.keras.applications.MobileNetV2 loads a pre-trained MobileNetV2 model to serve as the backbone of the network. The include_top parameter is set to False, meaning only the convolutional base is kept and the original classification head is dropped. The weights parameter is set to 'imagenet', which means the weights pre-trained on ImageNet are used.

        Next, all layers of the MobileNetV2 base are frozen, that is, their trainable attribute is set to False, so their weights will not be updated during training.

        A new head is then added: x = base_model.output takes the output of MobileNetV2 as the input to the new layers; layers.GlobalAveragePooling2D() averages the values at all positions of each feature map to obtain a fixed-length vector; layers.Dense(128, activation='relu') adds a fully connected layer of 128 neurons with a ReLU nonlinearity; layers.Dropout(0.5) adds a dropout layer after it to reduce the risk of overfitting; and the final layers.Dense(num_classes, activation='sigmoid') outputs a probability for each class. (For one-hot two-class labels, a softmax output paired with categorical_crossentropy would be the more conventional choice; sigmoid with binary_crossentropy also trains well here, as the results below show.)

        Finally, tf.keras.models.Model stitches the MobileNetV2 base and the new head together into a complete model. The model is then compiled with the Adam optimizer, the binary_crossentropy loss, and accuracy as the evaluation metric.
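        Before training, it can also be useful to confirm that only the new head will be updated (a quick check; the exact counts depend on the Keras version):

# Verify that the MobileNetV2 base is frozen and only the new head is trainable.
model.summary()
trainable = sum(tf.keras.backend.count_params(w) for w in model.trainable_weights)
frozen = sum(tf.keras.backend.count_params(w) for w in model.non_trainable_weights)
print(f'Trainable params: {trainable}, frozen params: {frozen}')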

3.4 Model training

# Train the model
epochs = 5
steps_per_epoch = train_generator.n // batch_size
validation_steps = validation_generator.n // batch_size

history = model.fit(
    train_generator,
    steps_per_epoch=steps_per_epoch,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=validation_steps)

        First, the epochs parameter specifies how many times to traverse the entire training set. steps_per_epoch is the number of batches run per epoch, computed here as train_generator.n // batch_size; validation_steps is computed the same way from validation_generator.

        Next, model.fit starts training. train_generator and validation_generator are the data generators for the training and validation sets; steps_per_epoch, epochs, and validation_steps are the parameters defined above; validation_data specifies the dataset used for validation. model.fit returns a History object containing the loss values and evaluation metrics recorded during training.

        During training, each epoch feeds all the training samples through the model and reports the validation results, so the model's performance can be monitored. Afterwards, the returned History object can be used to analyze performance across the training and validation phases and to tune accordingly, as sketched below.
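        For example, the curves recorded in history.history can be plotted with matplotlib (a minimal sketch, assuming matplotlib is installed):

import matplotlib.pyplot as plt

# Plot training vs. validation accuracy across epochs.
plt.plot(history.history['accuracy'], label='train accuracy')
plt.plot(history.history['val_accuracy'], label='val accuracy')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()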

Code output:
Epoch 1/10 546/546 [=======] - 135s 239ms/step - loss: 0.1517 - accuracy: 0.9488 - val_loss: 0.0612 - val_accuracy: 0.9797

Epoch 2/10 546/546 [=======] - 125s 230ms/step - loss: 0.0720 - accuracy: 0.9749 - val_loss: 0.0544 - val_accuracy: 0.9826

Epoch 3/10 546/546 [=======] - 128s 234ms/step - loss: 0.0615 - accuracy: 0.9788 - val_loss: 0.0531 - val_accuracy: 0.9818

Epoch 4/10 546/546 [=======] - 124s 228ms/step - loss: 0.0557 - accuracy: 0.9805 - val_loss: 0.0525 - val_accuracy: 0.9810

Epoch 5/10 546/546 [=======] - 126s 231ms/step - loss: 0.0510 - accuracy: 0.9802 - val_loss: 0.0499 - val_accuracy: 0.9829

Notes:

  • From the output above, the model's accuracy already reaches about 94% by the end of the first training epoch. For comparison, in the previous article, training entirely from scratch needed 15 epochs to reach 90% accuracy.
  • This shows that using MobileNet as the base model can effectively improve accuracy and shorten training time.

        Because a pre-trained MobileNet is used as the base, its weights have already been thoroughly trained and tuned and carry strong feature-extraction capability, so it can be expected to perform well on the cat and dog classification task. A model trained from scratch, by contrast, needs more samples and a longer training time to reach similar performance.

3.5 Model evaluation

# Create a data generator for the test set
test_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(image_size, image_size),
    batch_size=batch_size,
    class_mode='categorical')

# Evaluate the model's performance on the test set
loss, acc = model.evaluate(test_generator)
print(f'Test loss: {loss}, Test accuracy: {acc}')

Code output:

Found 3752 images belonging to 2 classes.

118/118 [==============] - 5s 44ms/step - loss: 0.0590 - accuracy: 0.9784

Test loss: 0.05904101952910423, Test accuracy: 0.9784114956855774

Notes:

  • Judging from the output, both the training accuracy and the test accuracy exceed 97%, which is very high. (A quick single-image sanity check is sketched below.)
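        For instance, a single image can be run through the model as a spot check (a sketch; the image path is a placeholder):

import numpy as np

# Classify one image; preprocessing must match training (resize + rescale).
img = tf.keras.preprocessing.image.load_img('path/to/some_image.jpg',
                                            target_size=(image_size, image_size))
x = np.expand_dims(tf.keras.preprocessing.image.img_to_array(img) / 255.0, axis=0)

probs = model.predict(x)[0]
labels = {v: k for k, v in test_generator.class_indices.items()}
print(labels[int(np.argmax(probs))], probs)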

3.6 Model saving

        Save in the TensorFlow SavedModel format:

# Save the model
model.save('cat_dog_classfier_v2.tf', overwrite=True, include_optimizer=True)
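        The saved model can be loaded back later for further training or conversion (a minimal sketch):

# Reload the SavedModel to confirm the round trip works.
restored = tf.keras.models.load_model('cat_dog_classfier_v2.tf')
restored.summary()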

        Save it in tflite format for use on mobile:

# Export a TensorFlow Lite model
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open('cat_dog_classfier_v2.tflite', 'wb') as f:
    f.write(tflite_model)
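        Before shipping the .tflite file, it can be sanity-checked with the TensorFlow Lite interpreter (a sketch; with the default optimizations the input remains a float32 224×224×3 tensor):

import numpy as np

# Run one inference through the converted model.
interpreter = tf.lite.Interpreter(model_path='cat_dog_classfier_v2.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

dummy = np.random.rand(1, image_size, image_size, 3).astype(np.float32)
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]['index']))  # two class probabilities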

        Comparing the exported files, the v2 model trained with MobileNet is only 2.54 MB, while the v1 version without MobileNet is about 10 MB.

4. Summary

        This article introduced how to use MobileNet as the base architecture to train a cat and dog image classifier well suited to mobile devices: it reduces training time, improves accuracy, and shrinks the model file size.


Origin: blog.csdn.net/eieihihi/article/details/130475137