Object Recognition in Deep Learning

About the author

The author is a sophomore focusing on artificial intelligence research and uses CSDN to share knowledge about artificial intelligence, C++, Python, web crawlers, and related topics. Follow the blog for updates, and corrections are welcome if you spot a mistake.

Column introduction: This column covers computer vision, including algorithms, case studies, and network models. It also introduces commonly used data processing algorithms and a number of third-party Python libraries.

A sentence I like very much: "Work a little harder every day, not for anything else, but so that in the future you have more choices: to choose a comfortable life, and to choose the people you like!"


Table of contents

Foreword
The concept of object recognition
Use of neural networks
Methods for building datasets
Build a neural network
Training and effect evaluation
Solve overfitting
Data augmentation
Transfer learning


Foreword

Earlier we introduced neural networks in deep learning, so now we can officially begin deep learning itself. We also introduced the four major tasks of computer vision: object recognition, object detection, object tracking, and object segmentation. The most basic of these is object recognition. Almost all of computer vision builds on it, so if we cannot solve the recognition problem, we cannot construct the rest of the computer vision edifice.

In this chapter, we work through object recognition with a real project. The dataset used in this section is the cats-and-dogs dataset, which the code below downloads.

The concept of object recognition

The concept of object recognition was introduced earlier. Computer vision gets its name because it is a computational model that imitates human vision. But a computer is still a computer: it cannot distinguish objects directly the way a person does; instead, it assigns labels. For example, when it processes a set of photos, a picture recognized as a cat is labeled 1, and a picture recognized as a dog is labeled 2. In other words, the categories are fixed in advance, and the computer cannot adapt to categories it was never given.

Second, the computer outputs a probability for each category. For example, the probability of the first category might be 0.9 and the probability of the second 0.1. The category with the highest probability is reported as the result. This differs from how people recognize objects, and it shows that the computer is cautious: it will generally not claim that a category has a probability of exactly 100%.

Because the computer outputs a probability for every category, there is an extended concept: top-k accuracy. As the name implies, the model outputs the k categories with the highest probabilities, and the prediction counts as correct if any one of them matches the label. This metric is very common in evaluating object recognition, because a picture often contains several objects but has only one label, so judging the model on a single guess is unreasonable. The value of k depends on the number of categories and is generally 5 to 10.
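
To make top-k accuracy concrete, here is a minimal sketch in plain Python. The function name top_k_accuracy and the probability values are made up for illustration; they are not taken from any library or from the project in this article.

```python
def top_k_accuracy(probabilities, labels, k):
    """Fraction of samples whose true label is among the k most probable classes."""
    hits = 0
    for probs, label in zip(probabilities, labels):
        # Class indices sorted from most probable to least probable.
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Three samples, four classes (made-up probabilities).
probabilities = [
    [0.10, 0.60, 0.20, 0.10],  # true class 1: correct for k = 1
    [0.40, 0.35, 0.15, 0.10],  # true class 1: wrong for k = 1, correct for k = 2
    [0.05, 0.15, 0.30, 0.50],  # true class 0: wrong for k = 1 and k = 2
]
labels = [1, 1, 0]

top1 = top_k_accuracy(probabilities, labels, k=1)
top2 = top_k_accuracy(probabilities, labels, k=2)
print(f"top-1 accuracy: {top1:.3f}, top-2 accuracy: {top2:.3f}")
```

Raising k can only keep the score the same or increase it, which is why top-k accuracy is more forgiving than plain accuracy when a picture contains several objects.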

Use of Neural Networks 

Methods for building datasets

In training neural networks, the dataset is very important. We can use an open-source dataset or build one ourselves. Here we use the cats-and-dogs dataset (known as "Dogs vs. Cats"). Building a dataset involves the following steps:

(1) Resize the pictures to the same size, because a typical convolutional neural network requires a fixed input size.

(2) Construct a label for each picture. For the cats-and-dogs dataset, cat pictures are labeled 0 and dog pictures are labeled 1.

(3) Divide the dataset into a training set and a test set, typically at a ratio of 4:1 or 5:1. To detect overfitting, we train on the training set and then evaluate on the test set. When performance on the two sets is similar, we can consider that the model is not overfitting. The final reported result should be the accuracy on the test set.

(4) Split the data into batches. For deep learning we generally use mini-batch gradient descent, so we need to decide how many pictures go into each batch. The number is limited by CPU or GPU memory; 64 or 128 pictures per batch is typical.

(5) Randomly shuffle the order of the pictures in the training set. To improve the training results, we reshuffle the training set after each pass over it, so that the pictures in each batch are a random sample; otherwise training can easily get stuck in a local extremum.
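
Steps (3) to (5) can be sketched in plain Python, independent of TensorFlow. This is only an illustration under assumptions: the file names are hypothetical, and the helpers split_dataset and iterate_batches are names invented for this sketch, not part of any library.

```python
import random


def split_dataset(samples, train_parts=4, test_parts=1, seed=0):
    """Shuffle once, then split into training and test sets (4:1 by default)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]


def iterate_batches(train_set, batch_size, rng):
    """Reshuffle the training set, then yield it in mini-batches."""
    order = train_set[:]
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


# Hypothetical file names; label 0 = cat, 1 = dog, as in step (2).
samples = [(f"cat_{i}.jpg", 0) for i in range(50)] + [(f"dog_{i}.jpg", 1) for i in range(50)]
train_set, test_set = split_dataset(samples)

rng = random.Random(1)
for epoch in range(2):
    # A fresh shuffle happens inside iterate_batches on every epoch.
    batches = list(iterate_batches(train_set, batch_size=32, rng=rng))
    print(f"epoch {epoch}: {len(batches)} batches, last batch has {len(batches[-1])} samples")
```

In the real project, flow_from_directory with shuffle=True takes care of the resizing, labeling, batching, and shuffling.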

Here is the code:

import tensorflow as tf
import os

# Download and extract the dataset
_URL='https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'
path_to_zip=tf.keras.utils.get_file('cats_and_dogs.zip',origin=_URL,extract=True)
PATH=os.path.join(os.path.dirname(path_to_zip),'cats_and_dogs_filtered')
# Training and validation directories
train_dir=os.path.join(PATH,'train')
validation_dir=os.path.join(PATH,'validation')

# Cat and dog picture directories
train_cats_dir=os.path.join(train_dir,'cats')
train_dogs_dir=os.path.join(train_dir,'dogs')
validation_cats_dir=os.path.join(validation_dir,'cats')
validation_dogs_dir=os.path.join(validation_dir,'dogs')
# Batch size and number of epochs
batch_size=64
epochs=20
# Input pictures are resized to 150x150
IMG_HEIGHT=150
IMG_WIDTH=150
# Build the datasets from the directories; rescale maps pixel values to [0, 1],
# and shuffle=True randomly shuffles the order of the data
train_data_gen=tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)
train_data_gentor=train_data_gen.flow_from_directory(batch_size=batch_size,directory=train_dir,
                                                     shuffle=True,target_size=(IMG_HEIGHT,IMG_WIDTH),class_mode='binary')
val_data_gen=tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)
val_data_gentor=val_data_gen.flow_from_directory(batch_size=batch_size,directory=validation_dir,
                                                 target_size=(IMG_HEIGHT,IMG_WIDTH),class_mode='binary')

Build a neural network

Next, we build a neural network suited to the size of the pictures. For beginners, a network of about 10 layers is recommended. In general, only the first and last layers of a neural network have constraints on their input and output sizes: the input of the first layer must match the shape of the image, and the output of the last layer must match the number of object categories (for a two-class problem, a single sigmoid output is enough).

The first layer: 3x3 convolutional layer, 32 output channels, input shape equal to the picture shape (150x150x3), 1 pixel of padding, ReLU activation.

The second layer: 2x2 max pooling layer.

The third layer: 3x3 convolutional layer, 64 output channels, 1 pixel of padding, ReLU activation.

The fourth layer: 2x2 max pooling layer.

The fifth layer: 3x3 convolutional layer, 64 output channels, 1 pixel of padding, ReLU activation.

The sixth layer: 2x2 max pooling layer.

The seventh layer: fully connected layer with a 256-dimensional output (applied to the flattened feature maps), ReLU activation.

The eighth layer: fully connected layer with a 1-dimensional output, sigmoid activation.


# Build the neural network
# Each line is one layer of the network
model=tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32,3,padding='same',activation='relu',input_shape=(IMG_HEIGHT,IMG_WIDTH,3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64,3,padding='same',activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64,3,padding='same',activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(), # flatten the feature maps into a vector
    tf.keras.layers.Dense(256,activation='relu'),
    tf.keras.layers.Dense(1,activation='sigmoid')
])
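
To see how the picture size changes from layer to layer, the following sketch traces the output shapes and parameter counts of the network above in plain Python. It assumes 'same' padding for the convolutions and 2x2 pooling with stride 2 and no padding (the Keras defaults for MaxPooling2D); the helper names are invented for this sketch.

```python
def conv_same(height, width, channels_in, filters, kernel=3):
    """'same' padding keeps height and width; returns (shape, parameter count)."""
    params = kernel * kernel * channels_in * filters + filters  # weights + biases
    return (height, width, filters), params


def max_pool(height, width, channels, pool=2):
    """2x2 max pooling, stride 2, no padding: sizes are floor-divided by 2."""
    return (height // pool, width // pool, channels), 0


def dense(units_in, units_out):
    """Fully connected layer; returns (output units, parameter count)."""
    return units_out, units_in * units_out + units_out


shape = (150, 150, 3)
total_params = 0
for filters in (32, 64, 64):
    shape, params = conv_same(*shape, filters)
    total_params += params
    shape, _ = max_pool(*shape)
    print("after conv + pool:", shape)

flat_units = shape[0] * shape[1] * shape[2]
units, params = dense(flat_units, 256)
total_params += params
units, params = dense(units, 1)
total_params += params
print("flattened units:", flat_units)
print("total parameters:", total_params)
```

Almost all of the parameters sit in the first fully connected layer, which is worth keeping in mind when reducing the model size to fight overfitting.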

Training and Effect Evaluation

To catch possible errors early, we do not start by training on the entire dataset. We first train on a small subset and check that the model can overfit it, and only then move to the full dataset.

Next, we can train. We need to choose an optimizer; generally speaking, the Adam optimizer handles most problems. For the loss function we generally choose cross-entropy loss, and in this example we use binary cross-entropy. During training, we can monitor and record the current loss and accuracy in real time to judge how well training is going. When we find that the loss is no longer decreasing, training should be stopped.


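# Training
# Compile the model: set the optimizer, the loss function, and the metrics to record
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
# Train; model.fit accepts generators directly (fit_generator is deprecated)
history=model.fit(
    train_data_gentor,
    steps_per_epoch=train_data_gentor.samples//batch_size, # steps per epoch
    epochs=epochs,
    validation_data=val_data_gentor,
    validation_steps=val_data_gentor.samples//batch_size
)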
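
The rule "stop when the loss no longer decreases" can be sketched as a small function. This is an illustration, not the author's method: the patience parameter (how many epochs without improvement to tolerate) is an assumption added here, and the loss values are made up. Keras offers a similar mechanism through the tf.keras.callbacks.EarlyStopping callback.

```python
def stopping_epoch(losses, patience=2, min_delta=0.0):
    """Return the index of the epoch at which training would stop.

    Training stops once the loss has failed to improve on its best value
    by more than min_delta for `patience` consecutive epochs.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(losses) - 1  # never triggered: all epochs were used


# Made-up loss values for illustration only.
losses = [0.69, 0.60, 0.52, 0.47, 0.48, 0.49, 0.50]
stop = stopping_epoch(losses, patience=2)
print("training would stop at epoch index", stop)
```

Setting patience=1 corresponds to stopping immediately at the first epoch without improvement.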

Solve overfitting

When the dataset is relatively small, we need to take measures to prevent overfitting. The first method is to reduce the number of model parameters, that is, to use a smaller network.


model1=tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32,3,padding='same',activation='relu',input_shape=(IMG_HEIGHT,IMG_WIDTH,3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32,3,padding='same',activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128,activation='relu'),
    tf.keras.layers.Dense(1,activation='sigmoid')
])

The second method is to add a regularization term; L1 and L2 regularization are the common choices. In neural networks we generally use L2 regularization. The weight coefficient has to be tuned; 0.0005 is a "magic" value that works as a starting point for most problems.


# Add L2 regularization (coefficient 0.0005) to every layer
# input_shape is only needed on the first layer
model=tf.keras.Sequential([
    tf.keras.layers.Conv2D(32,3,padding='same',activation='relu',input_shape=(IMG_HEIGHT,IMG_WIDTH,3),kernel_regularizer=tf.keras.regularizers.l2(0.0005)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64,3,padding='same',activation='relu',kernel_regularizer=tf.keras.regularizers.l2(0.0005)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64,3,padding='same',activation='relu',kernel_regularizer=tf.keras.regularizers.l2(0.0005)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256,activation='relu',kernel_regularizer=tf.keras.regularizers.l2(0.0005)),
    tf.keras.layers.Dense(1,activation='sigmoid',kernel_regularizer=tf.keras.regularizers.l2(0.0005))
])
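
What the L2 term adds to the loss can be shown with a few lines of plain Python. The weights and the data loss below are made-up numbers; the penalty is the coefficient times the sum of squared weights, which is how an L2 kernel regularizer contributes to the total loss.

```python
def l2_penalty(weights, coefficient=0.0005):
    """L2 regularization term: coefficient * sum of squared weights."""
    return coefficient * sum(w * w for w in weights)


def l2_gradient(weights, coefficient=0.0005):
    """Gradient of the penalty with respect to each weight: 2 * coefficient * w."""
    return [2 * coefficient * w for w in weights]


weights = [0.5, -1.0, 2.0]  # made-up weights
data_loss = 0.30            # made-up loss without regularization
total_loss = data_loss + l2_penalty(weights)

print("penalty:", l2_penalty(weights))
print("total loss:", total_loss)
print("penalty gradient:", l2_gradient(weights))
```

The gradient of the penalty is proportional to each weight, so large weights are pushed toward zero more strongly than small ones.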

The third method is to add Dropout layers. The principle of the Dropout layer was covered earlier. Generally speaking, Dropout works better than the first two methods. The hyperparameter to tune is the probability of dropping a neuron, which is generally set to 0.5.


# Add Dropout layers
model=tf.keras.Sequential([
    tf.keras.layers.Conv2D(32,3,padding='same',activation='relu',input_shape=(IMG_HEIGHT,IMG_WIDTH,3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Dropout(0.5), # Dropout layer with drop probability 0.5
    tf.keras.layers.Conv2D(64,3,padding='same',activation='relu'),
    tf.keras.layers.Dropout(0.5), # Dropout layer
    tf.keras.layers.Conv2D(64,3,padding='same',activation='relu'),
    tf.keras.layers.MaxPooling2D(), # pooling layer
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256,activation='relu'), # fully connected layer
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1,activation='sigmoid')
])
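
As a reminder of what a Dropout layer does during training, here is a minimal sketch of "inverted" dropout in plain Python: each value is zeroed with probability rate, and the surviving values are scaled by 1/(1-rate) so the expected sum stays the same. The function is written for illustration and is not the Keras implementation.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each value with probability `rate`, scale the rest."""
    if not training or rate == 0.0:
        return list(activations)  # at inference time nothing is dropped
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)
activations = [1.0] * 10000
dropped = dropout(activations, rate=0.5, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(f"fraction zeroed: {zero_fraction:.3f}, mean after dropout: {mean_after:.3f}")
```

With rate=0.5, about half of the values become 0 and the rest become 2.0, so the mean stays close to the original 1.0.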

Of course, we can combine these methods to build a better model. Finally, deep learning has another very important hyperparameter: the learning rate. Generally speaking, we can start tuning from 0.001. When the learning rate is too high, it is difficult to reach high accuracy; when it is too low, training takes a very long time.



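# Adjust the learning rate
# First train with a learning rate of 0.001
# (the output layer is a single sigmoid unit, so the loss is binary cross-entropy)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),loss='binary_crossentropy',metrics=['accuracy'])
# Then reduce the learning rate to 1/10 of the original value
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),loss='binary_crossentropy',metrics=['accuracy'])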
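
The effect of the learning rate is easy to see on a toy problem. The sketch below minimizes f(w) = w**2 with plain gradient descent (not Adam), so the specific numbers are not comparable to the 0.001 used above; it only illustrates the two failure modes.

```python
def gradient_descent(learning_rate, steps=50, start=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; returns the final w."""
    w = start
    for _ in range(steps):
        gradient = 2 * w  # derivative of w**2
        w -= learning_rate * gradient
    return w


for lr in (1.1, 0.1, 0.001):
    final_w = gradient_descent(lr)
    print(f"learning rate {lr}: |w| after 50 steps = {abs(final_w):.6f}")
```

With the largest rate the iterate overshoots the minimum at 0 and moves farther away on every step; with the smallest rate it is still close to its starting point after 50 steps.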

Data augmentation

If we have very little data, the model we train will not be very accurate. This is where data augmentation comes in. Put simply, it enlarges the dataset. For example, if you have 2,000 pictures, you can transform them by flipping, changing colors, and so on, to obtain more data. In the subsequent training, each epoch still uses 2,000 pictures, but after the random transformations the pictures differ from epoch to epoch, and the resulting model is more accurate.


# Random horizontal flip
image_gen=tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255,horizontal_flip=True)
# Random vertical flip
image_gen=tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255,vertical_flip=True)
# Random rotation (up to 45 degrees)
image_gen=tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255,rotation_range=45)
# Random zoom; a zoom_range between 0 and 1 gives a zoom factor in [1-zoom_range,1+zoom_range]
image_gen=tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255,zoom_range=0.5)
# Apply all of them together
image_gen_train=tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    rotation_range=45,
    width_shift_range=.15,
    height_shift_range=.15,
    horizontal_flip=True,
    vertical_flip=True,
    zoom_range=0.5
)
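
To show what a flip actually does to the pixels, here is a tiny sketch that treats an image as a nested list of rows. It is a simplified illustration of horizontal_flip and vertical_flip, not the ImageDataGenerator implementation, and the helper names are invented for this sketch.

```python
import random


def horizontal_flip(image):
    """Mirror left-right: reverse every row."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Mirror top-bottom: reverse the order of the rows."""
    return image[::-1]


def random_augment(image, rng):
    """Apply each flip independently with probability 0.5."""
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    if rng.random() < 0.5:
        image = vertical_flip(image)
    return image


# A 2x3 "image" with made-up pixel values.
image = [
    [1, 2, 3],
    [4, 5, 6],
]

rng = random.Random(0)
for epoch in range(3):
    print(f"epoch {epoch}:", random_augment(image, rng))
```

The pixel values never change, only their positions, so the label of the picture stays valid while the network sees a different arrangement in different epochs.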

Transfer learning

What is transfer learning?

Put simply, it means borrowing someone else's trained model for your own task. The advantage is that it is fast and effective. As an analogy, the knowledge in a book is detailed and extensive, but it may take a long time to understand on your own; if someone explains it to you, you can grasp it much faster. Transfer learning absorbs what others have already learned in the same way.

There are two main methods of transfer learning. The first is called fine-tuning: as the name implies, it makes small adjustments to an already trained model. Generally, we adjust only the last few layers of the model.

The second method is called adding layers: we add a few layers at the end of the model and train only those layers.

Let me use the ResNet50 model to briefly show how transfer learning works:



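# Choose the base model; include_top=False drops the 1000-class ImageNet head,
# and pooling='avg' returns one feature vector per image
base_model=tf.keras.applications.ResNet50(weights='imagenet',include_top=False,
                                          input_shape=(IMG_HEIGHT,IMG_WIDTH,3),pooling='avg')
base_model.summary()

# Adding layers
# Freeze the base model so that only the new layers are trained
base_model.trainable=False
prediction_layer1=tf.keras.layers.Dense(128,activation='relu')
prediction_layer2=tf.keras.layers.Dense(1,activation='sigmoid')
model=tf.keras.Sequential([
    base_model,
    prediction_layer1,
    prediction_layer2
])

# Fine-tuning
# Unfreeze the base model, then freeze every layer before fine_tune_at,
# so that only the last layers are trained
base_model.trainable=True
fine_tune_at=150
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable=False
base_model.summary()
prediction_layer=tf.keras.layers.Dense(1,activation='sigmoid')
model=tf.keras.Sequential([
    base_model,
    prediction_layer
])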

 

 

That gives us a preliminary understanding of how to build our own neural network and dataset, along with some standard techniques. In the next section, we will start to learn about vision in neural networks. See you next time!


Origin blog.csdn.net/qq_59931372/article/details/129974860