Computer Vision with PaddlePaddle: Principles and Practice of Image Classification Algorithms

Basic theory:

Image classification was the first task in which deep learning achieved breakthrough results in the vision field. This chapter first introduces the development history and evaluation metrics of the image classification task, and then presents, from three angles, three families of models that occupy an important position in image classification. The first is models based on residual networks; the chapter focuses on ResNet, DenseNet and DPN. The second is models based on the Transformer idea; the chapter focuses on ViT and Swin-Transformer. The third is lightweight models for mobile devices; the chapter focuses on MobileNet and PP-LCNet. Finally, the chapter uses the PaddlePaddle framework to complete a peach-sorting project. After finishing this chapter, readers should master the following points:

  1. Understand the development history of image classification;
  2. Master the characteristics of models based on the residual idea;
  3. Master the characteristics of models based on the Transformer idea;
  4. Master the characteristics of lightweight network models.

Image classification was the earliest computer vision task to adopt deep learning methods, and many classic network architectures were first applied to it; the deep learning models used in image classification can therefore be regarded as the cornerstone of other computer vision tasks.

Early image classification methods described the whole image with handcrafted features and then used a classifier to decide the category, so the core of image classification lay in classifying features, and how to extract image features was crucial. Low-level features contain a large amount of redundant noise, so to improve the robustness of the feature representation, a feature transformation algorithm is used to encode the low-level features; this is called feature encoding. Feature encoding is generally followed by a spatial constraint, also called feature pooling: within a spatial neighborhood, the maximum or average of each feature dimension is taken, yielding a feature representation with a certain degree of invariance.


After low-level feature extraction, feature encoding and feature pooling, the image can be expressed as a fixed-dimensional vector and classified by running a classifier on that vector.
The kernel-based SVM was the most widely used classifier among traditional methods and performed very well on traditional image classification tasks. Traditional methods work for simple, clear-cut classification scenarios, but real situations are complex, and when faced with complex scenes they cannot achieve satisfactory results. This is because handcrafted feature extraction cannot describe image features comprehensively and accurately, and manually designed features cannot cope with problems such as multiple views and angles of the same object, varying illumination, occlusion, and the many forms an object can take.
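To make the classical pipeline concrete, below is a minimal sketch under stated assumptions: the random arrays stand in for real handcrafted descriptors such as SIFT, max-pooling plays the role of feature aggregation, and scikit-learn's SVC provides the kernel SVM. None of this is part of the later experiment; it only illustrates the paragraph above.

# A minimal sketch of the traditional pipeline: local descriptors -> pooling -> SVM.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def image_to_vector(descriptors):
    # Feature aggregation: the per-dimension maximum over all local descriptors
    # yields one fixed-dimensional vector per image.
    return descriptors.max(axis=0)

# Fake data: 50 "images" per class, each with 50 local 128-d descriptors.
X = np.stack([image_to_vector(rng.normal(loc=c, size=(50, 128)))
              for c in (0.0, 0.5) for _ in range(50)])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel='rbf')  # kernel-method SVM, the classic choice
clf.fit(X, y)
print('training accuracy:', clf.score(X, y))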
 

The shift from CNNs to Transformers resembles a change from focusing on local detail to specializing in the global picture, which is closer to human vision (humans are good at quickly capturing global features). CNNs excel at extracting small, precise local information but are weaker at capturing global context. Transformers rely on long-range global modeling and pay little attention to local information, so they lose CNN properties such as translation and flip invariance; this brings redundancy in the captured information and a much larger appetite for data.


Taken together, deep learning for image classification has progressed through simple MLPs, CNNs, Transformers, and more elaborate MLP variants, and each family has its own character. Convolution uses only local connections, so it is computationally efficient; self-attention uses dynamic weights, so model capacity is larger, and it also has a global receptive field; MLPs likewise have a global receptive field but use no dynamic weights. Convolution and self-attention are therefore complementary: convolution generalizes best, while the Transformer has the largest model capacity of the three architectures. Convolution is the natural choice for designing lightweight models, whereas the Transformer deserves consideration for large models, and convolutional local modeling can be used to help improve the performance of Transformers and MLPs. In short, sparse connections help generalization, while dynamic weights and global receptive fields increase model capacity. Innovative methods therefore keep emerging in image classification, opening a wider application space for computer vision.

Experiment name: Construction and training of peach classification model

1. Experimental goals

This experiment shows how to use the PaddlePaddle deep learning framework to build a peach classification model and complete the whole process of training and testing.

After completing this experiment, the abilities you can master include:

  • Master the use of the PaddlePaddle deep learning framework;
  • Master how to use the PaddlePaddle deep learning framework to build a peach classification model;
  • Master how to complete the deep learning workflow of model training, evaluation, saving and prediction.

2. Experimental background introduction

Image classification is the basis of computer vision and of other, more complex vision tasks. With the development of deep learning technology, many excellent image classification algorithms have appeared. For this peach-sorting task we will use a typical one, ResNet.

3. Introduction to the general process of using the paddlepaddle framework

PaddlePaddle has now released version 2.0, which introduces a high-level API that makes code simpler and easier to write.

Currently, the high-level API of PaddlePaddle consists of five modules: data loading, model construction, model training, model visualization and advanced usage.

 

It is easy to carry out a deep learning project with the PaddlePaddle framework by following a general process (the figure below shows the overall flow), which consists of five major steps: data processing, model design, training configuration, training, and model saving. The data processing stage prepares data that the model can consume, covering the collection and preprocessing of local or network data. The model design stage is the model construction most discussed in deep learning projects; the key is the design and implementation of the network structure, and PaddlePaddle provides a large library of industrially validated models and pretrained weights that developers can adapt or use directly. In the training configuration stage, developers set parameters such as the optimizer type and learning rate decay, and specify whether computation runs on GPU or CPU. The training stage is where the framework actually runs the computation, repeatedly performing forward propagation, back propagation and gradient descent. The last step is model saving: when the model reaches the target metrics or the planned number of training epochs, the trained model can be saved for further training or deployment.

General process of a deep learning project with the PaddlePaddle framework
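As a quick illustration of these five steps with the high-level API, here is a minimal sketch that uses the built-in MNIST data set and LeNet model rather than this chapter's peach data; it is only meant to show the shape of the workflow.

import paddle
from paddle.vision.transforms import ToTensor

# 1. Data processing: load a ready-made data set
train_data = paddle.vision.datasets.MNIST(mode='train', transform=ToTensor())
# 2. Model design: wrap a built-in network with paddle.Model
model = paddle.Model(paddle.vision.models.LeNet())
# 3. Training configuration: optimizer, loss and metric
model.prepare(paddle.optimizer.Adam(parameters=model.parameters()),
              paddle.nn.CrossEntropyLoss(),
              paddle.metric.Accuracy())
# 4. Training process
model.fit(train_data, epochs=1, batch_size=64, verbose=1)
# 5. Model saving
model.save('./output/mnist')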

4. Experiment content

4.1 Introduction to data sets

The data set used in this experiment contains four kinds of peaches, stored in four folders; the name of each folder corresponds to one kind of peach.

peach data set

 

Observed with the naked eye, the peaches appear to be divided into four categories by size and color; is that really the case? After completing this experiment, the deep learning model will be able to judge for itself what the classification is based on.

A data set has been prepared for this experiment and is stored in the "data/enhancement_data/" folder. The original peach-sorting data set contains two folders, "train" and "test", and each of them contains four class folders: "B1", "M2", "R0" and "S3".
Training set:
    train_B1: 1601 pictures
    train_M2: 1800 pictures
    train_R0: 1601 pictures
    train_S3: 1635 pictures
Test set:
    test_B1: 16 pictures
    test_M2: 18 pictures
    test_R0: 18 pictures
    test_S3: 15 pictures
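If you want to verify these statistics yourself after decompression, a small sketch such as the following can be run (the paths assume the unzip target used in the next cell):

import os

base = './data/enhancement_data'
for split in ('train', 'test'):
    for label in ('B1', 'M2', 'R0', 'S3'):
        folder = os.path.join(base, split, label)
        print(split, label, len(os.listdir(folder)))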

# Data decompression. If you have already decompressed it once, comment out this code.
# !unzip /home/aistudio/data/data103593/data.zip -d /home/aistudio/data/enhancement_data/
Experimental file introduction

The code and data set for this experiment have been prepared in advance; the directory structure is shown below:

The file structure of this experiment


 

4.2 Import the libraries required for experiments

In the first step of the experiment, you need to import relevant libraries, the most important ones are the following:

  • os: provides a rich set of methods for working with files and directories.
  • sys: provides variables and functions related to the Python runtime environment.
  • shutil: a module for high-level file operations such as copying.
  • numpy: an extension library for Python that supports large multi-dimensional arrays and matrices, along with a large collection of mathematical functions to operate on them.
  • random: the Python module for generating random numbers.
  • paddle.vision.datasets: contains data loading functions, e.g. for common data sets such as MNIST.
  • paddle.vision.transforms: contains functions for transforming images, such as converting an HWC-format image into a CHW input tensor; it also includes Paddle's image preprocessing methods, which quickly handle common operations such as adjusting hue, contrast and image size.
  • paddle.io.Dataset: the Paddle base class for data sets; together with DataLoader it enables batched and asynchronous data loading "with one click".
import os
import sys
import shutil
import numpy as np
import paddle
import random
from paddle.io import Dataset, DataLoader
from paddle.vision.datasets import DatasetFolder, ImageFolder
from paddle.vision import transforms as T

4.3 Data set preparation

A data set has been prepared for this experiment and is stored in the "data/enhancement_data/" folder.

Data preprocessing for this experiment includes:

1. Generate txt file
2. Split training set and validation set

4.3.1 Generate txt file

Why generate txt files? We see that in the data set, each folder corresponds to a category; but there is no txt file to specify the label; so we first need to generate a txt file;

To keep the code neat and concise, we put parameters such as the data set paths into a global dictionary train_parameters, explained as follows:

  • 'train_data_dir' is the enhanced original training set provided;
  • 'test_image_dir' is the original test set provided;
  • 'train_image_dir' and 'eval_image_dir' are the actual training set and validation set generated by splitting the original training set
  • 'train_list_dir' and 'test_list_dir' are the generated txt file paths
  • 'saved_model' folder where training results are stored
'''
Parameter configuration:
'train_data_dir' is the enhanced original training set provided;
'test_image_dir' is the original test set provided;
'train_image_dir' and 'eval_image_dir' are the actual training set and validation set generated by splitting the original training set
'train_list_dir' and 'test_list_dir' are the generated txt file paths
'saved_model' folder where training results are stored
'''
train_parameters = {
    'train_image_dir': './data/splitted_training_data/train_images',
    'eval_image_dir': './data/splitted_training_data/eval_images',
    'test_image_dir': './data/enhancement_data/test',
    'train_data_dir': './data/enhancement_data/train',
    'train_list_dir': './data/enhancement_data/train.txt',
    'test_list_dir': './data/enhancement_data/test.txt',
    'saved_model': './saved_model/'
}
# The 4 category labels of the data set
labels = ['R0', 'B1', 'M2', 'S3']
labels.sort()
# Generate a txt file with training set file names and label names
write_file_name = train_parameters['train_list_dir']
# Open the write_file_name file in write mode
with open(write_file_name, "w") as write_file:
    # Handle each classification label in turn
    for label in labels:
        # Find all the pictures under the folder of this label
        train_txt_dir = train_parameters['train_data_dir'] + '/' + label + '/'
        for file_name in os.listdir(train_txt_dir):
            temp_line = label + '/' + file_name + '\t' + label + '\n'    # e.g. "B1/101.png\tB1"
            write_file.write(temp_line)
    
# Generate a txt file with test set file names and label names
write_file_name = train_parameters['test_list_dir']
# Open the write_file_name file in write mode
with open(write_file_name, "w") as write_file:
    # Handle each classification label in turn
    for label in labels:
        # Find all the pictures under the folder of this label
        test_txt_dir = train_parameters['test_image_dir'] + '/' + label + '/'
        for file_name in os.listdir(test_txt_dir):
            temp_line = label + '/' + file_name + '\t' + label + '\n'    # e.g. "B1/101.png\tB1"
            write_file.write(temp_line)

After the above steps finish, two files, train.txt and test.txt, are generated in the data/enhancement_data/ directory.
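To sanity-check the generated files, you can print their first few lines; a small sketch using the paths from train_parameters:

# Peek at the first lines of the generated list file.
with open(train_parameters['train_list_dir']) as f:
    for line in f.readlines()[:3]:
        print(repr(line))  # each line has the form 'B1/<file name>\tB1\n'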


 
4.3.2 Split the training set and validation set
  • We already have a training set and a test set; it is best to split the training set once more and carve a validation set out of it.
  • This way, during training we can use the validation set to check the model's progress in real time and adjust parameters promptly.
#Determine whether the splitted_training_data folder exists. If it does not exist, create a new one.
if not os.path.exists('data/splitted_training_data'):
    os.makedirs('data/splitted_training_data')
#Define a function to split the training set and validation set
def create_train_eval():
    '''
    Divide training set and validation set
    '''
    train_dir = train_parameters['train_image_dir']
    eval_dir = train_parameters['eval_image_dir']
    train_list_path = train_parameters['train_list_dir']  
    train_data_dir = train_parameters['train_data_dir']
    
    print('creating training and eval images')
    #If the folder does not exist, create the corresponding folder
    if not os.path.exists(train_dir):
        os.mkdir(train_dir)
    if not os.path.exists(eval_dir):
        os.mkdir(eval_dir) 
    # Open the txt file and read the data
    file_name = train_list_path
    f = open(file_name, 'r')
    # Read the data line by line
    lines = f.readlines()
    f.close()
        
    for i in range(len(lines)):
        # Split each line on the tab character; the first part is the relative image path, e.g. R0/1.png
        img_path = lines[i].split('\t')[0]
        # The second part is the label, e.g. R0
        class_label = lines[i].split('\t')[1].strip('\n')
        # Take one of every 8 pictures as validation data; the others are used for training
        if i % 8 == 0:
            # Combine the validation directory and the label into one target path
            eval_target_dir = os.path.join(eval_dir, class_label)
            # Combine the data directory with the relative image path to get the source image path
            eval_img_path = os.path.join(train_data_dir, img_path)
            if not os.path.exists(eval_target_dir):
                os.mkdir(eval_target_dir)
            # Copy the image into the folder of its label in the validation set
            shutil.copy(eval_img_path, eval_target_dir)
        else:           
            train_target_dir = os.path.join(train_dir, class_label)                                 
            train_img_path = os.path.join(train_data_dir, img_path)
            if not os.path.exists(train_target_dir):
                os.mkdir(train_target_dir)
            shutil.copy(train_img_path, train_target_dir) 
    print('Splitting the training set and validation set is completed!')
# Make a data set. If you have already done it, please comment out the code.
create_train_eval()
creating training and eval images
Splitting the training set and validation set is completed!

After running the above code, the training set and validation set have been split and placed under the ./data/splitted_training_data/ directory:
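A quick way to confirm the roughly 7:1 split is to count the copied files; a small sketch:

import os

# Count the images that ended up in each split.
for split_dir in (train_parameters['train_image_dir'], train_parameters['eval_image_dir']):
    total = 0
    for label in os.listdir(split_dir):
        total += len(os.listdir(os.path.join(split_dir, label)))
    print(split_dir, total)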


 

4.5 Custom data set class

The Paddle Framework has made some of our commonly used data sets into APIs, which are open to users. The corresponding APIs are paddle.vision.datasets and paddle.text.datasets. When we use it, we can directly call these APIs to download and use the data set. These integrated data sets include:

  • Vision-related datasets: ['DatasetFolder', 'ImageFolder', 'MNIST', 'FashionMNIST', 'Flowers', 'Cifar10', 'Cifar100', 'VOC2012']
  • Natural language related data sets: ['Conll05st', 'Imdb', 'Imikolov', 'Movielens', 'UCIHousing', 'WMT14', 'WMT16']

However, in actual usage scenarios, we often need to use our own data sets. For example, in this experiment, we used our own peach data set.

Paddle provides users with the paddle.io.Dataset base class, which lets users quickly define a data set through class inheritance.

PaddlePaddle's loading method for data sets is to uniformly use Dataset (data set definition) + DataLoader (multi-process data set loading).

Dataset definition: Dataset
  • First we define the data set;
  • Defining a data set mainly means implementing a new Dataset class that inherits from the parent class paddle.io.Dataset;
  • Then implement two abstract methods of the parent class, "__getitem__" and "__len__":


 
class PeachDataset(Dataset):
    """
    Step 1: Inherit the paddle.io.Dataset class
    """
    def __init__(self, mode='train'):
        """
        Step 2: Implement the constructor, define the data reading method, and divide the training, verification and test data sets
        """
        super(PeachDataset, self).__init__()
        train_image_dir = train_parameters['train_image_dir']  # path of the training set
        eval_image_dir = train_parameters['eval_image_dir']
        test_image_dir = train_parameters['test_image_dir']        
        
        # transform: data augmentation functions; here they also change the tensor layout.
        # Transpose() converts the image from (height, width, channels) to Paddle's
        # (channels, height, width) layout.
        mean = [127.5, 127.5, 127.5]  # normalization: mean
        std = [127.5, 127.5, 127.5]   # normalization: standard deviation
        transform_train = T.Compose([T.ColorJitter(0.4, 0.4, 0.4, 0.4)
                                     ,T.Resize(size=(224,224)) 
                                     ,T.Transpose()
                                     ,T.Normalize(mean, std)
                                    ])
        transform_eval = T.Compose([T.Resize(size=(224,224)) 
                                    ,T.Transpose()
                                    ,T.Normalize(mean, std)
                                    ])
        transform_test = T.Compose([T.Resize(size=(224,224)) 
                                    ,T.Transpose()
                                    ,T.Normalize(mean, std)
                                    ])
        
        '''         
        # Reference API: https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/vision/Overview_cn.html#about-transforms
        # Transpose() converts the image from (height, width, channels) to Paddle's (channels, height, width) layout.
        # ColorJitter randomly adjusts the brightness, contrast, saturation and hue of an image.
        # hflip flips the input image horizontally.
        # Normalize: normalization with mean = [127.5, 127.5, 127.5], std = [127.5, 127.5, 127.5].
        # RandomHorizontalFlip flips the image horizontally with a given probability.
        # RandomVerticalFlip flips the image vertically with a given probability.
        mean = [127.5, 127.5, 127.5] # Normalization, mean
        std = [127.5, 127.5, 127.5] # Normalization, standard deviation
        transform_train = T.Compose([T.Resize(size=(224,224)), 
                                     T.Transpose(),                                
                                     T.ColorJitter(0.4, 0.4, 0.4, 0.4),
                                     T.RandomHorizontalFlip(prob=0.5,),
                                     T.RandomVerticalFlip(prob=0.5,),
                                     T.Normalize(mean, std)])
        transform_eval = T.Compose([T.Resize(size=(224,224)), T.Transpose()])
        transform_test = T.Compose([T.Resize(size=(224,224)), T.Transpose()])
        ''' 
        # Paddle recommends using paddle.io.DataLoader to load data; it produces an iterator over the data set.
        # Reference API: https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/io/DataLoader_cn.html#cn-api-fluid-io-dataloader
        # Load the training set; train_data_folder is an iterable data set
        # Reference API: https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/vision/datasets/DatasetFolder_cn.html#datasetfolder
        train_data_folder = DatasetFolder(train_image_dir, transform=transform_train)
        #Load the validation set, eval_data_folder is an iterator
        eval_data_folder = DatasetFolder(eval_image_dir, transform=transform_eval)
        #Load the test set, test_data_folder is an iterator
        test_data_folder = DatasetFolder(test_image_dir, transform=transform_test)
        self.mode = mode
        if self.mode == 'train':
            self.data = train_data_folder
        elif self.mode == 'eval':
            self.data = eval_data_folder
        elif self.mode == 'test':
            self.data = test_data_folder
    #Return data and corresponding labels during each iteration
    def __getitem__(self, index):
        """
        Step 3: Implement the __getitem__ method, define how to obtain data when specifying index, and return a single piece of data (training data, corresponding label)
        """
        data = np.array(self.data[index][0]).astype('float32')
        label = np.array([self.data[index][1]]).astype('int64')
        return data, label
    # Return the total number of the entire data set
    def __len__(self):
        """
        Step 4: Implement the __len__ method and return the total number of data sets
        """
        return len(self.data)
#Use the custom PeachDataset class to load your own data set
train_dataset = PeachDataset(mode='train')
val_dataset = PeachDataset(mode='eval')
test_dataset = PeachDataset(mode='test')
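Before moving on, a quick sanity check of the custom data set can save debugging time later; a small sketch:

# Check the sizes of the three splits and the shape of one sample.
print(len(train_dataset), len(val_dataset), len(test_dataset))
sample, label = train_dataset[0]
print(sample.shape)  # (3, 224, 224) after Resize and Transpose
print(label)         # e.g. [0]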
Data set loading: DataLoader

DataLoader returns an iterator. Each element in the data returned by the iterator is a Tensor. The calling method is as follows:

class paddle.io.DataLoader(dataset, feed_list=None, places=None, return_list=False, batch_sampler=None, batch_size=1, shuffle=False, drop_last=False, collate_fn=None, num_workers=0, use_buffer_reader=True, use_shared_memory=True, timeout=0, worker_init_fn=None) 

DataLoader iterates a given dataset once (the order is given by batch_sampler).
DataLoader supports single-process and multi-process data loading modes. When num_workers is greater than 0, multi-process mode will be used to load data asynchronously.

Detailed introduction https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/io/DataLoader_cn.html

The following code shows how to use DataLoader

# DataLoader sample code
# Load libraries
import cv2 as cv  # use OpenCV
print("The opencv version number is: " + cv.__version__)  # check the version number
# Normally OpenCV would need to be installed first, but AI-Studio has already pre-installed opencv-python 4.1.1.26 for developers.
from matplotlib import pyplot as plt  # draw figures on this page
%matplotlib inline
# Construct a DataLoader
test_loader = DataLoader(test_dataset,
                    batch_size=2,
                    shuffle=True,
                    drop_last=True,
                    num_workers=2)
The opencv version number is: 4.1.1 
# Use DataLoader to traverse the data set
for mini_batch in test_loader():  # fetch a mini_batch from the DataLoader
    print("The type of mini_batch is: " + str(type(mini_batch)))
    pic_list = mini_batch[0]    # image data
    label_list = mini_batch[1]  # labels
    print("The size of mini_batch is: " + str(len(pic_list)))
    # Convert the image display to numpy format, and set the internal numbers to integer types
    pic_1 = pic_list[0]
    pic_2 = pic_list[1]
    arr1 = np.asarray(pic_1, dtype=np.float64) 
    print(arr1.shape)
    arr2 = np.asarray(pic_2, dtype=np.float64)    
    print(arr2.shape)
    break  #Since this is an example, only the first mini_batch is taken out
    
The type of mini_batch is: <class 'list'>
The size of mini_batch is: 2
(3, 224, 224)
(3, 224, 224)
# Display the obtained image data
r = arr1[0]
g = arr1[1]
b = arr1[2]
img = cv.merge([r,g,b])
plt.imshow(img)
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
<matplotlib.image.AxesImage at 0x7fb4678d8350>

As shown in the picture above, the image obtained at this point does not look like the original picture. That is because the PeachDataset class applies the transform methods to the image data, i.e. it makes some changes to the image. To illustrate this point, the following comparison can be made.

The picture below shows the changes that have taken place.

For comparison, modify the code in the PeachDataset class by commenting out the Normalize step, as shown below. The purpose is to resize the image to meet Paddle's input requirements while avoiding the changes to pixel values caused by normalization.

transform_test = T.Compose([ T.Resize(size=(224,224)) ,T.Transpose() #,T.Normalize(mean, std) ]) 

After modifying the code, uncomment the code below, restart the kernel, and click "Run -> Run the currently selected and previous cells" in the upper right corner.

You can see the pictures in the original data set, as shown below:

After viewing the data, do not forget to restore the comments in the PeachDataset class and comment the code below out again.

'''
# Display the obtained image data
arr1 = arr1 / 255 # Change each pixel to between 0-1
r = arr1[0]
g = arr1[1]
b = arr1[2]
img = cv.merge([r,g,b])
plt.imshow(img)
'''

4.6 Build a classification model

Next, we will build an image classification model, which can be used to classify the peach data set.

How to build a classification model?

  • We can design DNN, CNN or other network models according to our own ideas, but this requires strong algorithm research skills;
  • Or we can use mature, classic network models such as VGG and ResNet, and build them with the Paddle framework for our own use.

In this experiment we use the residual network ResNet as our classification model (the code below uses the 18-layer variant, ResNet-18).

Moreover, to improve the model's performance, we also use transfer learning in this experiment. So why use transfer learning, and how?

4.6.1 Transfer learning

In actual engineering development, few people train a complete neural network from scratch.

Why?

Because our data sets are generally not very large, the generalization ability of the trained model is often not strong. And training is very time-consuming.

How to do it?

A common approach is to find a large public data set (such as ImageNet, which contains 1.2 million pictures in 1,000 categories) and first train a neural network model A on it (such an A has usually already been trained by someone else), then use A as a starting point, fine-tune it, and train it on our own data set. This A is also called a "pre-trained model".

This is a form of transfer learning, also called fine-tuning.

So what is the theoretical basis of fine-tuning? That is, why can we fine-tune on the basis of models trained by others? This requires an analysis of the structural principles of convolutional neural networks.

  • In a convolutional network, the first few layers learn general features, such as image edges; as the layers deepen, later layers focus on more specific features, such as body parts, faces and other composite features.
  • The final fully connected layer is usually considered to capture information relevant to solving the corresponding task. For example, the fully connected layer of AlexNet can indicate which category of 1000 categories of objects the extracted features belong to.
  • For example, in face recognition, the first few convolutional layers extract general features such as straight lines and curves; intermediate layers learn specific parts such as eyes and noses; and high-level convolutions learn composite features, from which the network judges that the image is a face. This characteristic of convolutional neural networks is the theoretical basis of fine-tuning.

Some students may ask: Why don't we just use the model trained by others on a large data set (such as ImageNet), but also fine-tune it?

  • Because a model trained by others may not be completely suitable for our own task: their network may do more than our task needs, or be more complex than our simpler task requires. For example, to train a binary cat/dog classifier we might first think of directly using a network trained on ImageNet, but ImageNet has 1,000 categories and we need only 2, so we fine-tune for our own task, for instance by freezing the relevant layers of the original network and modifying its output layer so the results fit our needs.

In PaddlePaddle 2.0, to use a pretrained model you only need to set the model parameter pretrained=True.
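Going one step further, you can also freeze the pretrained backbone so that only the new classification head is updated. The sketch below uses Paddle's stop_gradient flag; freezing everything except the fc layer is an illustrative choice, not the setting used later in this experiment.

import paddle

# Load ImageNet weights, then stop gradients for everything except the final fc layer.
net = paddle.vision.models.resnet18(pretrained=True, num_classes=4)
for name, param in net.named_parameters():
    if not name.startswith('fc'):  # keep only the classification head trainable
        param.stop_gradient = True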

4.6.2 Building the model

A very convenient aspect of PaddlePaddle is that the framework has many built-in models, and a single line of code can instantiate a deep learning model.

Currently, the models built into the paddle framework are all models in the CV field. In the paddle.vision.models directory, they specifically include the following models:

Built-in models of the PaddlePaddle framework: ['ResNet', 'resnet18', 'resnet34', 'resnet50', 'resnet101', 'resnet152', 'VGG', 'vgg11', 'vgg13', 'vgg16', 'vgg19', 'MobileNetV1', 'mobilenet_v1', 'MobileNetV2', 'mobilenet_v2', 'LeNet']

For example, the ResNet we use this time is already available as a built-in model.


 
# Using the built-in models, you can choose among several networks; the resnet18 network is selected here.
# pretrained (bool, optional) - whether to load weights pretrained on the ImageNet data set
model = paddle.vision.models.resnet18(pretrained=True, num_classes=4)
# Try a different network structure: MobileNetV2
# MobileNetV2 reference document: https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/vision/models/MobileNetV2_cn.html
# model = paddle.vision.models.mobilenet_v2(pretrained=True, num_classes=4)
# Use paddle.Model to wrap the network into a class that can quickly use the high-level API for training and prediction.
model = paddle.Model(model)
100%|██████████| 69183/69183 [00:01<00:00, 47292.60it/s]
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1301: UserWarning: Skip loading for fc.weight. fc.weight receives a shape [512, 1000], but the expected shape is [512, 4].
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1301: UserWarning: Skip loading for fc.bias. fc.bias receives a shape [1000], but the expected shape is [4].
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))

Use model.summary to inspect the network structure

Reference documentation: https://github.com/PaddlePaddle/Paddle/blob/release/2.1/python/paddle/hapi/model.py#L883

There is a slight difference between the API documentation and the real code

# The following is the source code
def summary(self, input_size=None, dtype=None):
    """Prints a string summary of the network.

    Args:
        input_size (tuple|InputSpec|list[tuple|InputSpec], optional): size of input tensor.
            if not set, input_size will get from ``self._inputs`` if network only have one input,
            input_size can be tuple or InputSpec. if model have multiple input, input_size must
            be a list which contain every input's shape. Default: None.
        dtypes (str, optional): if dtypes is None, 'float32' will be used, Default: None.

    Returns:
        Dict: a summary of the network including total params and total trainable params.

    Examples:
        .. code-block:: python

            import paddle
            from paddle.static import InputSpec

            input = InputSpec([None, 1, 28, 28], 'float32', 'image')
            label = InputSpec([None, 1], 'int64', 'label')

            model = paddle.Model(paddle.vision.models.LeNet(), input, label)
            optim = paddle.optimizer.Adam(
                learning_rate=0.001, parameters=model.parameters())
            model.prepare(optim, paddle.nn.CrossEntropyLoss())

            params_info = model.summary()
            print(params_info)
    """
    assert (input_size is not None or self._inputs is not None
            ), "'input_size' or 'self._input' must be set"

    if input_size is not None:
        _input_size = input_size
    else:
        _input_size = self._inputs
    return summary(self.network, _input_size, dtype)

Parameters:

  • input_size (tuple|InputSpec|list) - the size of the input tensor. If the network has only one input, set this to a tuple or InputSpec; if the model has multiple inputs, set it to a list[tuple|InputSpec] containing the shape of every input. If not set, self._inputs is used as the input. Default: None.
  • dtype (str, optional) - the data type of the input tensor; float32 is used if not given. Default: None.

Returns: a dictionary containing the total number of parameters of the network and the number of trainable parameters.

# Use summary to observe network information
model.summary(input_size=(1, 3, 224, 224), dtype='float32')
-------------------------------------------------------------------------------
   Layer (type)         Input Shape          Output Shape         Param #    
===============================================================================
     Conv2D-1        [[1, 3, 224, 224]]   [1, 64, 112, 112]        9,408     
   BatchNorm2D-1    [[1, 64, 112, 112]]   [1, 64, 112, 112]         256      
      ReLU-1        [[1, 64, 112, 112]]   [1, 64, 112, 112]          0       
    MaxPool2D-1     [[1, 64, 112, 112]]    [1, 64, 56, 56]           0       
     Conv2D-2        [[1, 64, 56, 56]]     [1, 64, 56, 56]        36,864     
   BatchNorm2D-2     [[1, 64, 56, 56]]     [1, 64, 56, 56]          256      
      ReLU-2         [[1, 64, 56, 56]]     [1, 64, 56, 56]           0       
     Conv2D-3        [[1, 64, 56, 56]]     [1, 64, 56, 56]        36,864     
   BatchNorm2D-3     [[1, 64, 56, 56]]     [1, 64, 56, 56]          256      
   BasicBlock-1      [[1, 64, 56, 56]]     [1, 64, 56, 56]           0       
     Conv2D-4        [[1, 64, 56, 56]]     [1, 64, 56, 56]        36,864     
   BatchNorm2D-4     [[1, 64, 56, 56]]     [1, 64, 56, 56]          256      
      ReLU-3         [[1, 64, 56, 56]]     [1, 64, 56, 56]           0       
     Conv2D-5        [[1, 64, 56, 56]]     [1, 64, 56, 56]        36,864     
   BatchNorm2D-5     [[1, 64, 56, 56]]     [1, 64, 56, 56]          256      
   BasicBlock-2      [[1, 64, 56, 56]]     [1, 64, 56, 56]           0       
     Conv2D-7        [[1, 64, 56, 56]]     [1, 128, 28, 28]       73,728     
   BatchNorm2D-7     [[1, 128, 28, 28]]    [1, 128, 28, 28]         512      
      ReLU-4         [[1, 128, 28, 28]]    [1, 128, 28, 28]          0       
     Conv2D-8        [[1, 128, 28, 28]]    [1, 128, 28, 28]       147,456    
   BatchNorm2D-8     [[1, 128, 28, 28]]    [1, 128, 28, 28]         512      
     Conv2D-6        [[1, 64, 56, 56]]     [1, 128, 28, 28]        8,192     
   BatchNorm2D-6     [[1, 128, 28, 28]]    [1, 128, 28, 28]         512      
   BasicBlock-3      [[1, 64, 56, 56]]     [1, 128, 28, 28]          0       
      ReLU-5         [[1, 128, 28, 28]]    [1, 128, 28, 28]          0       
     Conv2D-9        [[1, 128, 28, 28]]    [1, 128, 28, 28]       147,456    
   BatchNorm2D-9     [[1, 128, 28, 28]]    [1, 128, 28, 28]         512      
     Conv2D-10       [[1, 128, 28, 28]]    [1, 128, 28, 28]       147,456    
  BatchNorm2D-10     [[1, 128, 28, 28]]    [1, 128, 28, 28]         512      
   BasicBlock-4      [[1, 128, 28, 28]]    [1, 128, 28, 28]          0       
     Conv2D-12       [[1, 128, 28, 28]]    [1, 256, 14, 14]       294,912    
  BatchNorm2D-12     [[1, 256, 14, 14]]    [1, 256, 14, 14]        1,024     
      ReLU-6         [[1, 256, 14, 14]]    [1, 256, 14, 14]          0       
     Conv2D-13       [[1, 256, 14, 14]]    [1, 256, 14, 14]       589,824    
  BatchNorm2D-13     [[1, 256, 14, 14]]    [1, 256, 14, 14]        1,024     
     Conv2D-11       [[1, 128, 28, 28]]    [1, 256, 14, 14]       32,768     
  BatchNorm2D-11     [[1, 256, 14, 14]]    [1, 256, 14, 14]        1,024     
   BasicBlock-5      [[1, 128, 28, 28]]    [1, 256, 14, 14]          0       
     Conv2D-14       [[1, 256, 14, 14]]    [1, 256, 14, 14]       589,824    
  BatchNorm2D-14     [[1, 256, 14, 14]]    [1, 256, 14, 14]        1,024     
      ReLU-7         [[1, 256, 14, 14]]    [1, 256, 14, 14]          0       
     Conv2D-15       [[1, 256, 14, 14]]    [1, 256, 14, 14]       589,824    
  BatchNorm2D-15     [[1, 256, 14, 14]]    [1, 256, 14, 14]        1,024     
   BasicBlock-6      [[1, 256, 14, 14]]    [1, 256, 14, 14]          0       
     Conv2D-17       [[1, 256, 14, 14]]     [1, 512, 7, 7]       1,179,648   
  BatchNorm2D-17      [[1, 512, 7, 7]]      [1, 512, 7, 7]         2,048     
      ReLU-8          [[1, 512, 7, 7]]      [1, 512, 7, 7]           0       
     Conv2D-18        [[1, 512, 7, 7]]      [1, 512, 7, 7]       2,359,296   
  BatchNorm2D-18      [[1, 512, 7, 7]]      [1, 512, 7, 7]         2,048     
     Conv2D-16       [[1, 256, 14, 14]]     [1, 512, 7, 7]        131,072    
  BatchNorm2D-16      [[1, 512, 7, 7]]      [1, 512, 7, 7]         2,048     
   BasicBlock-7      [[1, 256, 14, 14]]     [1, 512, 7, 7]           0       
     Conv2D-19        [[1, 512, 7, 7]]      [1, 512, 7, 7]       2,359,296   
  BatchNorm2D-19      [[1, 512, 7, 7]]      [1, 512, 7, 7]         2,048     
      ReLU-9          [[1, 512, 7, 7]]      [1, 512, 7, 7]           0       
     Conv2D-20        [[1, 512, 7, 7]]      [1, 512, 7, 7]       2,359,296   
  BatchNorm2D-20      [[1, 512, 7, 7]]      [1, 512, 7, 7]         2,048     
   BasicBlock-8       [[1, 512, 7, 7]]      [1, 512, 7, 7]           0       
AdaptiveAvgPool2D-1   [[1, 512, 7, 7]]      [1, 512, 1, 1]           0       
     Linear-1            [[1, 512]]             [1, 4]             2,052     
===============================================================================
Total params: 11,188,164
Trainable params: 11,168,964
Non-trainable params: 19,200
-------------------------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 57.04
Params size (MB): 42.68
Estimated Total Size (MB): 100.30
-------------------------------------------------------------------------------

{'total_params': 11188164, 'trainable_params': 11168964}
# Use Paddle's VisualDL callback to save training information to a directory.
# log_dir (str) - the path where the output logs are saved.
callback = paddle.callbacks.VisualDL(log_dir='visualdl_log_dir')
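Once training has produced logs, the curves can be inspected by launching the VisualDL service from a terminal with visualdl --logdir ./visualdl_log_dir (assuming the visualdl package is installed) and opening the printed address in a browser.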

4.6.3 Training configuration

Optimizer configuration

After encapsulating the model with paddle.Model, you need to configure the model before training. Use the Model.prepare interface to make advance configuration preparations for training, including setting the model optimizer, loss calculation method, accuracy calculation method, etc.

 
  • The learning rate (learning_rate) parameter is important.
  • If the accuracy fluctuates up and down during training, try lowering the learning rate.
#Use the Model.prepare interface to configure and prepare for training in advance, including setting the model optimizer, Loss calculation method, accuracy calculation method, etc.
# Optimizer API documentation:  https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/Overview_cn.html#paddle-optimizer
# Learning rate decay strategy
# Learning rate decay strategy API  documentation: https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/Overview_cn.html#about-lr
scheduler_StepDecay = paddle.optimizer.lr.StepDecay(learning_rate=0.1, step_size=50, gamma=0.9, verbose=False)
# Note: PiecewiseDecay expects len(values) == len(boundaries) + 1
scheduler_PiecewiseDecay = paddle.optimizer.lr.PiecewiseDecay(boundaries=[100, 1000, 4000, 5000, 6000], values=[0.1, 0.5, 0.01, 0.005], verbose=False)
# Try using SGD, Momentum methods
sgd = paddle.optimizer.SGD(
                learning_rate=scheduler_StepDecay,
                parameters=model.parameters())
adam = paddle.optimizer.Adam(
                learning_rate=0.01,  # tune this parameter
                parameters=model.parameters())
model.prepare(optimizer=adam,
              loss=paddle.nn.CrossEntropyLoss(),
              metrics=paddle.metric.Accuracy())
Computing resource configuration

Set the specific computing resources used for this calculation.
First, you can view the computing devices you are currently using. (This step is not required.)
Then, set up the computing device used for this training.

# Check the current computing device
device = paddle.device.get_device()
print(device)
# Use GPU training
device = paddle.set_device('gpu')  # or 'cpu'
print(device)

4.6.4 Training model

After completing the preliminary preparations, we formally call the fit() interface to start training. At least three key arguments must be specified: the training data set, the number of epochs, and the batch size.


 

Training time description:

  • Running 10 epochs on the CPU takes about 1.5 hours;
  • Running 10 epochs on the GPU takes about 30 minutes;
# fit API documentation: https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/Model_cn.html#fit-train-data-none-eval-data-none-batch-size-1-epochs-1-eval-freq-1-log-freq-10-save-dir-none-save-freq-1-verbose-2-drop-last-false-shuffle-true-num-workers-0-callbacks-none
# Start model training: specify the training data set, the number of epochs, the batch size and the logging format
# epochs: the total number of training epochs
# batch_size: the number of samples in one batch
# If you run out of memory, try lowering batch_size
# verbose: log display; 0 prints no log, 1 prints a progress bar, 2 prints one line per epoch
model.fit(train_dataset,
          val_dataset,
          epochs=1,
          batch_size=2,
          callbacks=callback,
          verbose=1)
The loss value printed in the log is the current step, and the metric is the average value of previous steps.
Epoch 1/1
step 2904/2904 [==============================] - loss: 0.0549 - acc: 0.5786 - 62ms/step         
Eval begin...
step 415/415 [==============================] - loss: 0.0867 - acc: 0.7819 - 23ms/step        
Eval samples: 830

4.6.5 Model evaluation and saving

After training finishes we have a trained model, but we still need to evaluate how well it actually performs.

What is model evaluation?

  • Model evaluation simply means feeding the reserved test data into the model for real predictions and checking them against the labels, to see how the model performs on the test set.
  • The code for model evaluation is also very simple with the high-level API: once the evaluation data set has been defined, the trained model can be evaluated with the model.evaluate interface; when the call finishes, the relevant metrics are computed and returned according to the loss and metric configured in the prepare interface.

Evaluation metric for this experiment:

In this experiment the evaluation metric is accuracy, abbreviated acc.

Compare with your classmates: how did your model do in evaluation, and what acc did you reach?

With reasonable data augmentation this experiment can reach a very high accuracy; try to push acc above 90%.

# Model evaluation
# A trained model can be evaluated with the model.evaluate interface; when the call finishes, the relevant metrics are computed and returned according to the loss and metric configured in prepare.
# Evaluation metric reference document: https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/Model_cn.html#evaluate-eval-data-batch-size-1-log-freq-10-verbose-2-num-workers-0-callbacks-none
model.evaluate(test_dataset, verbose=1)
Eval begin...
step 67/67 [==============================] - loss: 0.0052 - acc: 0.7910 - 13ms/step        
Eval samples: 67
{'loss': [0.0051732725], 'acc': 0.7910447761194029}
# Save the model
model.save('./saved_model/saved_model')  # save a checkpoint for further training
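Besides the training checkpoint above, paddle.Model.save can also export an inference-only model by passing training=False; a minimal sketch (the output path here is illustrative):

# Export network structure + parameters for deployment rather than further training.
model.save('./saved_model/inference_model', training=False)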

4.6.6 Model prediction

In the above steps, we have completed model training, model evaluation, and model saving; if the model performs well after evaluation, it can be used. We can use this saved model to make predictions.

How to make model predictions?

  • The high-level API of PaddlePaddle provides the model.predict interface to make it easy to run predictions with a trained model;
  • We only need to put the "predicted data + saved model" into the model.predict interface for calculation. The interface will return the prediction results calculated by the model, thereby completing our task.
#predictive model
results = model.predict(test_dataset)
Predict begin...
step 67/67 [==============================] - 12ms/step         
Predict samples: 67
# Observe result
print(type(results)) #list
print(len(results)) #len == 1
#Print the results line by line
for i in results[0]:
    print(i)
<class 'list'>
1
[[ 0.4494054   1.8589294  -2.709025   -0.98785317]]
[[ 0.80108535  2.0312922  -2.3985271  -1.667168  ]]
[[-0.487098    2.5169828  -3.8384209   0.09941977]]
[[ 1.1755923  1.9356494 -2.7956083 -1.824508 ]]
[[ 0.6587918  1.5227697 -1.9370861 -1.2466118]]
[[ 1.9423198  1.8514836 -2.0579038 -3.0512297]]
[[-0.12070499  2.1658874  -3.2705145  -0.2214822 ]]
[[ 2.30185    1.9300838 -2.6378424 -3.3231502]]
[[ 1.7931688  1.7564571 -2.713827  -2.3772974]]
[[ 1.018136   1.9348547 -2.1037087 -2.093875 ]]
[[ 1.2455556  1.7356219 -2.3573794 -1.9229555]]
[[ 1.3166553  2.0454793 -2.1393437 -2.5154655]]
[[ 2.2485528  2.5826378 -2.3228188 -4.113832 ]]
[[ 0.6856951  1.9657588 -2.340539  -1.5627216]]
[[ 0.34038985  2.5555618  -3.4037375  -1.1876322 ]]
[[ 1.7155951  2.2181606 -2.2069125 -3.0874062]]
[[-0.9589406  2.3568041 -3.914858   0.8861027]]
[[-2.2687616  3.561953  -6.1434994  2.204158 ]]
[[-0.8965972   2.812673   -4.498936    0.67248255]]
[[-1.7266133  3.0567627 -5.3219457  1.823607 ]]
[[-1.2236824  2.9153998 -5.2624416  1.1972692]]
[[-1.6313993  2.393093  -4.390437   1.8520648]]
[[-2.261466   3.1709478 -5.7391357  2.475055 ]]
[[-2.0998657  2.7529852 -5.1272326  2.396462 ]]
[[-1.6497151  2.9010382 -5.0573497  1.7648369]]
[[-2.6754675  2.9362612 -5.56551    2.9678605]]
[[-1.073315   2.3352654 -4.07773    1.1857122]]
[[-0.88414484  2.4533503  -4.0443926   0.775055  ]]
[[-1.7560171  3.3508494 -5.375548   1.4013046]]
[[-2.615417   4.013784  -6.8865647  2.4297483]]
[[-1.829337   3.1974657 -5.3266735  1.5116838]]
[[-1.1488906  2.4435222 -4.151718   1.1106087]]
[[-2.672726   3.7604275 -6.60363    2.6530373]]
[[-1.3436769  2.810868  -4.783174   1.3363845]]
[[-7.1727552 -4.178957   6.645717   1.3258969]]
[[-10.802859   -8.898961   13.038587    0.8829916]]
[[-6.100724  -3.6756551  5.3887143  2.429795 ]]
[[-6.956199  -4.8285522  7.192293   1.4987972]]
[[-6.806343  -4.737133   7.0949545  1.9803424]]
[[-10.631139   -8.797351   12.851841    0.9559243]]
[[-9.890509  -7.7998743 11.965744   1.0906614]]
[[-6.637445  -4.125729   6.246958   2.3932679]]
[[-4.850948   -3.7300088   5.50579    -0.28020984]]
[[-5.89312   -3.9382315  5.5570445  1.115171 ]]
[[-9.489717  -7.5113807 11.062157   1.4899993]]
[[-4.060526  -4.7304277  7.44195   -1.7170902]]
[[-6.123046  -5.145837   7.891695  -0.3783728]]
[[-6.7471647  -5.1568007   7.3376994  -0.14631017]]
[[-5.768033  -6.0288777  9.360904  -1.9037125]]
[[-7.037687  -5.0647235  7.345336   1.0650041]]
[[-6.3333025 -4.003666   6.096233   2.0686429]]
[[-8.165305  -4.0971665  5.59594    4.208836 ]]
[[-6.3591156 -0.0809775 -2.1494312  5.8446784]]
[[-5.998541  -0.3071279 -1.633659   5.444659 ]]
[[-5.982375   -0.13737446 -2.0219755   5.588227  ]]
[[-6.2784123  -0.28474385 -1.8074901   5.720227  ]]
[[-5.9097333   0.21499354 -2.4844441   5.4800773 ]]
[[-5.815046    0.34615326 -2.749436    5.516311  ]]
[[-6.144201    0.20839332 -2.5092714   5.6507225 ]]
[[-6.217258   -0.11974069 -2.2099724   5.8341565 ]]
[[-6.0395765   0.08458082 -2.2998967   5.641852  ]]
[[-6.292765   -0.22815469 -1.8958219   5.7871137 ]]
[[-5.9349203   0.03097157 -2.209548    5.578063  ]]
[[-4.8454432  0.6837326 -2.8405902  4.569208 ]]
[[-5.5436296 -0.4322207 -1.2610528  5.0055714]]
[[-5.8578863  -0.32924837 -1.6607574   5.3581743 ]]
[[-5.7073674   0.08094054 -2.3335297   5.431057  ]]
# Process the result with softmax and turn it into a probability value
x = paddle.to_tensor(results[0])
m = paddle.nn.Softmax()
out = m(x)
print(out)
Tensor(shape=[67, 1, 4], dtype=float32, place=CUDAPlace(0), stop_gradient=True,
       [[[0.18607847, 0.76180643, 0.00790692, 0.04420818]],

        [[0.21990354, 0.75249618, 0.00896723, 0.01863303]],

        [[0.04347746, 0.87683898, 0.00152336, 0.07816018]],

        [[0.31181487, 0.66678441, 0.00587796, 0.01552279]],

        [[0.27809274, 0.65979725, 0.02074026, 0.04136980]],

        [[0.51592660, 0.47112727, 0.00944741, 0.00349878]],

        [[0.08482961, 0.83483732, 0.00363582, 0.07669736]],

        [[0.58813888, 0.40553078, 0.00420919, 0.00212116]],

        [[0.50240386, 0.48429418, 0.00554229, 0.00775965]],

        [[0.27857813, 0.69674349, 0.01227855, 0.01239989]],

        [[0.37013263, 0.60421354, 0.01008376, 0.01557007]],

        [[0.31991184, 0.66306269, 0.01009506, 0.00693045]],

        [[0.41515639, 0.57983309, 0.00429428, 0.00071625]],

        [[0.21048497, 0.75708681, 0.01020809, 0.02222010]],

        [[0.09612054, 0.88075089, 0.00227385, 0.02085473]],

        [[0.37300166, 0.61655551, 0.00738223, 0.00306051]],

        [[0.02863417, 0.78866822, 0.00148986, 0.18120776]],

        [[0.00232973, 0.79350960, 0.00004836, 0.20411235]],

        [[0.02143462, 0.87504727, 0.00058431, 0.10293392]],

        [[0.00643685, 0.76924914, 0.00017670, 0.22413737]],

        [[0.01332989, 0.83638644, 0.00023486, 0.15004875]],

        [[0.01116226, 0.62454951, 0.00070716, 0.36358106]],

        [[0.00290894, 0.66527551, 0.00008983, 0.33172569]],

        [[0.00456953, 0.58538061, 0.00022136, 0.40982854]],

        [[0.00792769, 0.75078166, 0.00026256, 0.24102813]],

        [[0.00179510, 0.49116838, 0.00009976, 0.50693673]],

        [[0.02448242, 0.73991507, 0.00121354, 0.23438902]],

        [[0.02903091, 0.81717736, 0.00123135, 0.15256041]],

        [[0.00527186, 0.87065840, 0.00014126, 0.12392850]],

        [[0.00109510, 0.82885396, 0.00001529, 0.17003568]],

        [[0.00550288, 0.83888549, 0.00016662, 0.15544505]],

        [[0.02129946, 0.77363062, 0.00105744, 0.20401244]],

        [[0.00120668, 0.75071740, 0.00002368, 0.24805219]],

        [[0.01260382, 0.80315262, 0.00040434, 0.18383917]],

        [[0.00000099, 0.00001980, 0.99510950, 0.00486970]],

        [[0.00000000, 0.00000000, 0.99999475, 0.00000526]],

        [[0.00000973, 0.00011000, 0.95056945, 0.04931074]],

        [[0.00000071, 0.00000600, 0.99663687, 0.00335647]],

        [[0.00000091, 0.00000722, 0.99401951, 0.00597238]],

        [[0.00000000, 0.00000000, 0.99999321, 0.00000682]],

        [[0.00000000, 0.00000000, 0.99998105, 0.00001892]],

        [[0.00000248, 0.00003062, 0.97920632, 0.02076050]],

        [[0.00003168, 0.00009718, 0.99681073, 0.00306045]],

        [[0.00001052, 0.00007432, 0.98827934, 0.01163586]],

        [[0.00000000, 0.00000001, 0.99993038, 0.00006964]],

        [[0.00001010, 0.00000517, 0.99987948, 0.00010525]],

        [[0.00000082, 0.00000218, 0.99974102, 0.00025600]],

        [[0.00000076, 0.00000375, 0.99943382, 0.00056168]],

        [[0.00000027, 0.00000021, 0.99998665, 0.00001282]],

        [[0.00000057, 0.00000407, 0.99812609, 0.00186927]],

        [[0.00000393, 0.00004036, 0.98245114, 0.01750455]],

        [[0.00000084, 0.00004937, 0.80008936, 0.19986045]],

        [[0.00000500, 0.00266204, 0.00033643, 0.99699652]],

        [[0.00001068, 0.00316434, 0.00083980, 0.99598515]],

        [[0.00000940, 0.00324916, 0.00049351, 0.99624795]],

        [[0.00000613, 0.00245906, 0.00053635, 0.99699843]],

        [[0.00001125, 0.00514054, 0.00034567, 0.99450254]],

        [[0.00001192, 0.00565004, 0.00025565, 0.99408239]],

        [[0.00000751, 0.00430946, 0.00028455, 0.99539846]],

        [[0.00000582, 0.00258814, 0.00032005, 0.99708599]],

        [[0.00000841, 0.00384306, 0.00035409, 0.99579442]],

        [[0.00000566, 0.00243412, 0.00045929, 0.99710089]],

        [[0.00000996, 0.00388200, 0.00041306, 0.99569499]],

        [[0.00007983, 0.02011120, 0.00059271, 0.97921628]],

        [[0.00002605, 0.00432196, 0.00188679, 0.99376523]],

        [[0.00001340, 0.00337382, 0.00089095, 0.99572182]],

        [[0.00001447, 0.00472310, 0.00042231, 0.99484009]]])

In order to observe the prediction results, we also need to convert the labels:


 
#Use a dictionary to map each numeric value to its label name
label_dic = {}
for i, label in enumerate(labels):
    label_dic[i] = label
#Write the predicted label results to predict_labels
predict_labels = []
#Get the prediction array of each picture in results[0] in turn
for result in results[0]: 
    #np.argmax: Returns the index of the maximum value in a numpy array
    #Note: what we want is the index (which is the label), not the maximum value itself
    lab_index = np.argmax(result)
    lab = label_dic[lab_index]
    predict_labels.append(lab)
#Look at the prediction results
print(predict_labels)
['M2', 'M2', 'M2', 'M2', 'M2', 'B1', 'M2', 'B1', 'B1', 'M2', 'M2', 'M2', 'M2', 'M2', 'M2', 'M2', 'M2', 'M2', 'M2', 'M2', 'M2', 'M2', 'M2', 'M2', 'M2', 'S3', 'M2', 'M2', 'M2', 'M2', 'M2', 'M2', 'M2', 'M2', 'R0', 'R0', 'R0', 'R0', 'R0', 'R0', 'R0', 'R0', 'R0', 'R0', 'R0', 'R0', 'R0', 'R0', 'R0', 'R0', 'R0', 'R0', 'S3', 'S3', 'S3', 'S3', 'S3', 'S3', 'S3', 'S3', 'S3', 'S3', 'S3', 'S3', 'S3', 'S3', 'S3']
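Since test.txt stores the ground-truth label of every test image, we can recompute the test accuracy by comparing it with predict_labels. This sketch makes the same assumption as the result.csv code below, namely that DatasetFolder enumerates the test images in the same order as test.txt:

# Recompute test accuracy from the predictions and the labels in test.txt.
with open(train_parameters['test_list_dir']) as f:
    true_labels = [line.split('\t')[1].strip() for line in f.readlines()]
correct = sum(p == t for p, t in zip(predict_labels, true_labels))
print('test accuracy:', correct / len(true_labels))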

To observe the prediction results more intuitively, we generate a result.csv file listing the predictions; running the code below creates result.csv in the current directory.

Open the result.csv file and we can see the results:


 
final_result = []
file_name_test = train_parameters['test_list_dir']
f = open(file_name_test, 'r')
# Read the data line by line
data = f.readlines()
for i in range(len(data)):
    # Split each line on the tab character; the first part is the relative image path, e.g. R0/1.png
    img_path = data[i].split('\t')[0]
    final_result.append(img_path + ',' + str(predict_labels[i]) + '\n')
f.close()
with open('result.csv', "w") as f:
    f.writelines(final_result)

5. Summary

In this experiment, we used the PaddlePaddle deep learning framework to build an image classification model and completed the peach classification task.

Through this experiment, we learned:

  • How to use the PaddlePaddle deep learning framework;
  • How to use the PaddlePaddle deep learning framework to build a peach classification model;
  • How to complete the deep learning workflow of model training, evaluation, saving and prediction.

The image classification task is a basic task in the field of computer vision (CV). Although it is not difficult, it is very important and is the cornerstone of other computer vision tasks. We must do more hands-on work, debug more code, increase proficiency, and lay the foundation for more complex deep learning projects.

Parts of this article are reproduced from "Image Classification - Peach Sorting" on the Paddle AI Studio Galaxy Community (baidu.com); the other parts are original.


Origin blog.csdn.net/m0_63309778/article/details/133513426