Cat and Dog Recognition Based on Convolutional Neural Network (CNN)

Table of contents

Introduction

1. What is a Convolutional Neural Network?

1.1 What is a neural network?

1.2 What is convolution?

2. Preparations

2.1 Some knowledge:

2.2 Keras

2.3 Conv2D

2.4 MaxPooling2D

3. Cat and dog recognition based on convolutional neural network

3.1 Import necessary libraries

3.2 Model Definition

3.3 Instantiate the model and train

3.4 Get verified pictures

3.5 Verifying

3.6 Display prediction results

4. Summary

5. Code and Dataset


Introduction

First, let's look at a few pictures:

As humans, we can easily tell that the first picture shows a cat and the last two show dogs. How do we know? Ever since childhood, our parents, our teachers, and everyone around us have pointed at dogs and told us "that is a dog", and pointed at cats and told us "that is a cat". In other words, we were taught to recognize dogs and cats. But what about a computer? No one has taught it. In its eyes, these colorful pictures are nothing more than matrices of numbers made up of pixels. How can we make it recognize them? That is the job of the convolutional neural network we use today.


1. What is a Convolutional Neural Network?

1.1 What is a neural network?

As the name suggests, a neural network is modeled on the neurons in the human brain. Anyone who has studied biology knows that neurons are connected to each other. A signal is passed from neuron to neuron and finally prompts the body to respond, for example by pulling a hand back immediately after it is pricked by a needle. You can also think of a neural network simply as a function: one or more values are passed in, and one or more values come out after a series of transformations. The simplest example is y=x+1: pass in a value and a value comes out. Pass in x=2 and you get 3; pass in x=3 and you get 4. Real neural networks are much more complex, however. Here is an example of a BP (backpropagation) neural network:

For now, all you need to know is that a neural network is one big function: inputs go in, outputs come out.
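To make the "big function" idea concrete, here is a minimal sketch in plain Python. The weights, biases, and the names `neuron` and `tiny_network` are made up for illustration; a real network learns its weights from data instead of having them written by hand.

```python
def relu(z):
    """ReLU activation: pass positive values through, clamp negatives to 0."""
    return max(0.0, z)


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of the inputs plus a bias, then ReLU."""
    return relu(sum(w * x for w, x in zip(weights, inputs)) + bias)


def tiny_network(x):
    """Two hidden neurons feeding one linear output neuron (hand-picked weights)."""
    h1 = neuron([x], [1.0], 0.0)    # relu(x)
    h2 = neuron([x], [-1.0], 0.0)   # relu(-x)
    return 1.0 * h1 - 1.0 * h2 + 1.0  # relu(x) - relu(-x) + 1 equals x + 1


print(tiny_network(2))  # 3.0
print(tiny_network(3))  # 4.0
```

With these particular weights the network reproduces y=x+1 exactly; training is the process of finding such weights automatically.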

If you want to continue to study in depth, you can refer to: What is a neural network?

1.2 What is convolution?

First, here is the formula for convolution:
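The formula appeared as an image in the original post. Assuming it was the standard definition of continuous convolution, it reads:

```latex
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau
```

Note that g is evaluated at t - τ, so as τ increases, g is traversed backwards.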

In Chinese, convolution is called 卷积, literally "roll" (卷) plus "product" (积). The product is easy to spot: it is the product of f(t) and g(t). But where is the roll? The author believes it shows up in two ways:

The first roll: if the graphs of f(t) and g(t) are drawn one above the other on the same vertical plane, the lines connecting corresponding points cross over one another. Flip the g function, and the picture looks much tidier, doesn't it?

The second roll: the g function is not itself the convolution kernel. Only after g is rotated by 180 degrees does it become the convolution kernel.

See also: Where is the "roll" in a convolutional neural network?

Strongly recommended: From "convolution", to "image convolution operation", and then to "convolutional neural network": three changes in the meaning of "convolution"
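The flip described above can be checked numerically. The sketch below is plain Python with made-up sequences `f` and `g`; `convolve` and `correlate` are illustrative helper names, not library functions. It shows that true convolution gives the same numbers as sliding the flipped g over f. In one dimension, "rotating by 180 degrees" is simply reversing the list.

```python
def convolve(f, g):
    """Full discrete convolution: out[n] = sum over k of f[k] * g[n - k]."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fv in enumerate(f):
        for j, gv in enumerate(g):
            out[i + j] += fv * gv
    return out


def correlate(f, kernel):
    """Slide the kernel over f WITHOUT flipping it."""
    n = len(f) - len(kernel) + 1
    return [sum(f[i + j] * kernel[j] for j in range(len(kernel))) for i in range(n)]


f = [1, 2, 3, 4]
g = [1, 0, -1]

full = convolve(f, g)
# Keep only the positions where g fully overlaps f.
valid = full[len(g) - 1 : len(f)]
# Sliding the reversed g over f gives the same numbers.
flipped = correlate(f, g[::-1])
print(valid, flipped)  # [2, 2] [2, 2]
```

Deep learning libraries generally slide the kernel without flipping it, which is harmless because the kernel values are learned anyway.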

2. Preparations

2.1 Some knowledge:

1. The eyes only capture the image; recognition happens in the brain, which abstracts the image layer by layer.

2. Artificial neurons and neural networks simulate the neurons in the brain and the connections between them.

3. To a computer, a picture is a grid of numbers representing brightness. A color picture is made up of three channels: red, green, and blue (RGB).

4. A neural network must be trained to find the best model parameters.

5. The main design idea of the convolutional neural network is to make better use of the properties of pictures:

  • A pattern in a picture is much smaller than the whole picture

  • The same pattern can appear in different areas of a picture

  • Scaling a picture down does not change the objects in it

6. The convolutional layer scans the picture for features.

7. The max pooling layer scales the picture down and reduces the number of parameters.

8. After several rounds of convolution and pooling, a Flatten layer connects the result to a fully connected layer.
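Point 6 can be illustrated with a few lines of plain Python. This is a minimal sketch, not how Keras is implemented: the function name `scan`, the tiny 4x5 picture, and the 2x2 kernel are all invented for the example.

```python
def scan(image, kernel):
    """Slide a small kernel over a 2D image (stride 1, no padding) and
    record the weighted sum at every position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# A 4x5 grayscale "picture": dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
# Hypothetical 2x2 kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = scan(image, kernel)
for row in feature_map:
    print(row)  # each row is [0, 2, 0, 0]
```

The output is non-zero only in the column where the edge sits, whichever row the kernel is on. That is the sense in which one small kernel finds the same pattern in different areas of the picture.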

2.2 Keras

  1. Keras is a high-level neural network API written in Python

  2. The Sequential model

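import numpy as np
import keras
from keras import layers

# Placeholder data so the snippet runs: 100 samples, 10 input values, 10 one-hot classes
x = np.random.rand(100, 10)
y = keras.utils.to_categorical(np.random.randint(10, size=100), num_classes=10)

model = keras.Sequential()  # create the model
model.add(layers.Dense(20, activation="relu", input_shape=(10,)))  # fully connected layer: 20 neurons, ReLU activation, 10 input values
model.add(layers.Dense(20, activation="relu"))  # a second fully connected layer
model.add(layers.Dense(10, activation="softmax"))  # output layer with 10 classes
model.compile(optimizer="sgd", loss="categorical_crossentropy")  # a model must be compiled before training
model.fit(x, y, epochs=10, batch_size=32)  # x: inputs (the pictures), y: their labels; epochs: 10 passes over the data; batch_size: 32 samples at a time

  • keras.Sequential() creates the model

  • model.add() adds a layer

  • model.compile() sets the optimizer and loss function

  • model.fit() trains the model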

2.3 Conv2D

keras.layers.Conv2D(filters, kernel_size, strides=(1, 1), padding="valid", data_format=None)
  • filters: an integer, the dimensionality of the output space, i.e. the number of convolution kernels.

  • kernel_size: an integer, or a tuple or list of 2 integers, giving the height and width of the 2D convolution window. A single integer uses the same value for both spatial dimensions.

  • strides: an integer, or a tuple or list of 2 integers, giving the stride of the convolution along the height and width. A single integer uses the same value for both spatial dimensions.

  • padding: "valid" or "same" (case-insensitive), controls edge handling. "valid" adds no padding, so the output shrinks; "same" zero-pads the edges so that, with a stride of 1, the output has the same height and width as the input.
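How these parameters determine the output size can be sketched with a small helper. The function name `conv_output_size` is made up for this example; the two formulas are the usual ones for "valid" and "same" padding.

```python
import math


def conv_output_size(n, kernel_size, stride=1, padding="valid"):
    """Output size of a convolution along one spatial dimension of length n."""
    if padding == "valid":   # no padding: the window must fit inside the input
        return (n - kernel_size) // stride + 1
    if padding == "same":    # zero-pad so that the output is ceil(n / stride)
        return math.ceil(n / stride)
    raise ValueError("padding must be 'valid' or 'same'")


print(conv_output_size(200, 3, padding="valid"))           # 198
print(conv_output_size(200, 3, padding="same"))            # 200
print(conv_output_size(200, 3, stride=2, padding="same"))  # 100
```

For a 200*200 input and a 3*3 kernel, "valid" loses one pixel on each side, while "same" keeps the size unchanged.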

2.4 MaxPooling2D

keras.layers.MaxPooling2D(pool_size=(2, 2), strides=None, padding="valid", data_format=None)
  • pool_size: Integer, or a tuple of 2 integers, factor to scale down in the (vertical, horizontal) direction. (2,2) will reduce both dimensions of the input tensor by half. If only one integer is used, then both dimensions will use the same window length.

  • strides : integer, tuple of 2 integers, or None. Indicates the step value. If None, then the default is pool_size.

  • padding: "valid" or "same"
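The following plain-Python sketch mimics what max pooling does with "valid" padding. The helper `max_pool_2d` and the 4x4 input are invented for illustration; they are not the Keras implementation.

```python
def max_pool_2d(image, pool_size=(2, 2), strides=None):
    """Max pooling with 'valid' padding; strides=None falls back to pool_size."""
    ph, pw = pool_size
    sh, sw = strides if strides is not None else pool_size
    out_h = (len(image) - ph) // sh + 1
    out_w = (len(image[0]) - pw) // sw + 1
    return [
        [
            max(image[r * sh + i][c * sw + j] for i in range(ph) for j in range(pw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


image = [
    [1, 3, 2, 0],
    [4, 6, 5, 1],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
]
pooled = max_pool_2d(image)
print(pooled)  # [[6, 5], [7, 9]]
```

Each 2*2 block is replaced by its largest value, so the 4*4 input becomes 2*2: both dimensions are halved, as described for pool_size=(2,2).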

3. Cat and dog recognition based on convolutional neural network

3.1 Import necessary libraries

import sys
from matplotlib import pyplot
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPool2D
from keras.layers import Dense
from keras.layers import Flatten
from keras.optimizers import SGD
from keras.preprocessing.image import ImageDataGenerator
from keras.models import load_model
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

If any of these libraries are missing, install them with pip install followed by the library name. Downloads may be slow depending on your network.

3.2 Model Definition

def define_cnn_model():
    # Use the Sequential model
    model = Sequential()
    # Convolutional layer: as the first layer it must declare the input shape;
    # 3 is the number of color channels
    model.add(Conv2D(32, (3, 3), activation="relu", padding="same", input_shape=(200, 200, 3)))
    # Max pooling layer with a 2x2 pooling window
    model.add(MaxPool2D((2, 2)))
    # Flatten layer
    model.add(Flatten())
    # Fully connected layers
    model.add(Dense(128, activation="relu"))  # 128 is the number of neurons
    model.add(Dense(1, activation="sigmoid"))
    # Compile the model
    opt = SGD(learning_rate=0.001, momentum=0.9)  # stochastic gradient descent (the old lr argument is deprecated)
    model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["accuracy"])
    return model

First, create a Sequential model.

Add a convolutional layer. The first argument is the number of convolution kernels; the second is the kernel size, where (3,3) means 3*3; the third is the activation function; the fourth is the edge-handling (padding) mode; and the fifth is needed because the convolutional layer is the first layer, so it must define the shape of the input image: (200, 200, 3) means 200*200 pixels, with 3 channels indicating a color image.

Next, add a pooling layer. (2,2) means that every 2*2 block is reduced to a single cell.

Then add a Flatten layer to unroll the pooled result into a one-dimensional vector.

Add a fully connected layer. The first argument is the number of neurons and the second is the activation function.

Finally, add one more fully connected layer to output the result. We only need to tell cats from dogs, so one neuron is enough.

The last step compiles the model with stochastic gradient descent (SGD) as the optimizer. Readers who are interested in this topic can look up the details and study it on their own.
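To get a feel for the size of this model, the shapes and parameter counts can be worked out by hand. The sketch below is only arithmetic based on the layer settings in 3.2; the variable names are chosen for the example.

```python
# Layer-by-layer shapes and parameter counts for the model in 3.2.
height, width, channels = 200, 200, 3
filters, kernel = 32, 3

# Conv2D with padding="same" keeps 200x200; each filter has 3*3*3 weights + 1 bias.
conv_params = filters * (kernel * kernel * channels + 1)

# MaxPool2D((2, 2)) halves height and width and has no parameters.
pooled_h, pooled_w = height // 2, width // 2

# Flatten turns the 100x100x32 feature maps into one long vector.
flat = pooled_h * pooled_w * filters

# Dense layers: one weight per input per neuron, plus one bias per neuron.
dense1_params = flat * 128 + 128
dense2_params = 128 * 1 + 1

total = conv_params + dense1_params + dense2_params
print(conv_params, flat, dense1_params, dense2_params, total)
# 896 320000 40960128 129 40961153
```

Almost all of the roughly 41 million parameters sit in the first fully connected layer, because the flattened vector is so long.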

3.3 Instantiate the model and train

def train_cnn_model():
    # Instantiate the model
    model = define_cnn_model()
    # Create the image generator
    datagen = ImageDataGenerator(rescale=1.0/255.0)
    train_it = datagen.flow_from_directory(
        "./ma1ogo3ushu4ju4ji2/dogs_cats/data/train/",
        class_mode="binary",
        batch_size=64,
        target_size=(200, 200))  # batch_size: images per batch; target_size: resize every image to this size
    # Train the model (fit accepts generators; fit_generator is deprecated)
    model.fit(train_it,
              steps_per_epoch=len(train_it),
              epochs=5,
              verbose=1)
    model.save("my_model.h5")

First, call the function from 3.2 to instantiate the model, then create an image generator. Its job is to feed the pictures in the folder to the model for training; that is all you need to know for now. The batch_size parameter means that only 64 pictures are passed in at a time, which avoids running out of memory. An important training parameter, epochs, is set to 5 here, meaning the model goes over the full set of pictures 5 times. For example, I passed in 2,500 pictures in total, so five passes amount to 12,500 pictures seen. This repeated learning can effectively improve accuracy, but a large epochs value makes training very time-consuming. Finally, the trained model is saved to the project folder.
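The relationship between the number of pictures, batch_size, and epochs is simple arithmetic. The sketch below uses the 2,500 pictures mentioned above; the variable names are chosen for the example.

```python
import math

num_images = 2500   # figure quoted in the text above
batch_size = 64
epochs = 5

# len(train_it) is the number of batches per epoch; the last batch may be partial.
steps_per_epoch = math.ceil(num_images / batch_size)
last_batch = num_images - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs
images_seen = num_images * epochs

print(steps_per_epoch, last_batch, total_updates, images_seen)
# 40 4 200 12500
```

So each epoch consists of 40 batches (39 full ones and a final batch of 4 pictures), and five epochs perform 200 weight updates in total.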

3.4 Get verified pictures

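import os
import random
from PIL import Image

def read_random_image():
    # Pick a random picture from the test folder
    folder = r"./ma1ogo3ushu4ju4ji2/dogs_cats/data/test/"
    file_path = os.path.join(folder, random.choice(os.listdir(folder)))
    pil_im = Image.open(file_path, 'r')
    return pil_im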

3.5 Verifying

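import numpy as np

def get_predict(pil_im, model):
    # Resize to the model's input size and force 3 color channels
    pil_im = pil_im.convert("RGB").resize((200, 200))
    # Convert to a numpy array, scaled to [0, 1] to match the rescale=1/255 used in training
    array_im = np.asarray(pil_im, dtype="float32") / 255.0
    # Add a batch dimension: (200, 200, 3) -> (1, 200, 200, 3)
    array_im = array_im[np.newaxis,:]
    # Run the prediction
    result = model.predict(array_im)
    if result[0][0] > 0.5:
        name = "It's a dog!"
        print("Prediction: dog")
    else:
        name = "It's a cat!"
        print("Prediction: cat")
    return name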

Note one line of code:

array_im = array_im[np.newaxis,:]

The array_im on the previous line is a three-dimensional array, which is not the shape the model expects. It must be converted into a four-dimensional array here (with a leading batch dimension), otherwise an error is raised!
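What this line does to the array's shape can be seen in isolation. The sketch below uses an array of zeros as a stand-in for a real picture.

```python
import numpy as np

# Stand-in for one 200x200 RGB picture (zeros instead of real pixels).
array_im = np.zeros((200, 200, 3), dtype="uint8")
print(array_im.shape)   # (200, 200, 3)

# np.newaxis inserts a new axis of length 1 at the front: the batch axis.
batch = array_im[np.newaxis, :]
print(batch.shape)      # (1, 200, 200, 3)

# np.expand_dims is an equivalent, more explicit spelling.
same = np.expand_dims(array_im, axis=0)
print(same.shape)       # (1, 200, 200, 3)
```

The model always works on batches of pictures, so even a single picture has to be wrapped as a batch of size 1.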

3.6 Display prediction results

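import matplotlib.pyplot as plt
from keras.models import load_model

model = load_model("my_model.h5")  # the model saved in 3.3
pil_im = read_random_image()
plt.imshow(np.asarray(pil_im))
plt.title(get_predict(pil_im, model))
plt.show()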

That's it! Let's take a look at the prediction results:

Overall, the predictions are good. With 5 epochs, the accuracy can reach 70%. If you are interested, try increasing the number of epochs to see the effect! The red warnings are caused by GPU and CPU processing issues, and you can ignore them for now.

4. Summary

This cat and dog project is only an introductory project, but the idea behind it is important. Think about it: as long as we have data and a model, can a computer learn to recognize anything we want it to? Face recognition on mobile phones and face payment on Alipay already exist; in the future, might we even mount cameras on drones to help catch criminal suspects? The world of artificial intelligence is wide open, and the future is waiting for us to explore!

The author is a beginner, so mistakes are inevitable. Corrections are welcome, and if you are interested, let's discuss in the comment area!

5. Code and Dataset

The first part of the code is:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
"""
@Project :Neural network cat and dog recognition
@File    :CNN.py
@IDE     :PyCharm 
@Author  :咋
@Date    :2022/10/2 10:37 
"""
import sys
from matplotlib import pyplot
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPool2D
from keras.layers import Dense
from keras.layers import Flatten
from keras.optimizers import SGD
from keras.preprocessing.image import ImageDataGenerator
from keras.models import load_model
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

def define_cnn_model():
    # Use the Sequential model
    model = Sequential()
    # Convolutional layer: as the first layer it must declare the input shape;
    # 3 is the number of color channels
    model.add(Conv2D(32, (3, 3), activation="relu", padding="same", input_shape=(200, 200, 3)))
    # Max pooling layer with a 2x2 pooling window
    model.add(MaxPool2D((2, 2)))
    # Flatten layer
    model.add(Flatten())
    # Fully connected layers
    model.add(Dense(128, activation="relu"))  # 128 is the number of neurons
    model.add(Dense(1, activation="sigmoid"))
    # Compile the model
    opt = SGD(learning_rate=0.001, momentum=0.9)  # stochastic gradient descent (the old lr argument is deprecated)
    model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["accuracy"])
    return model

def train_cnn_model():
    # Instantiate the model
    model = define_cnn_model()
    # Create the image generator
    datagen = ImageDataGenerator(rescale=1.0/255.0)
    train_it = datagen.flow_from_directory(
        "./ma1ogo3ushu4ju4ji2/dogs_cats/data/train/",
        class_mode="binary",
        batch_size=64,
        target_size=(200, 200))  # batch_size: images per batch; target_size: resize every image to this size
    # Train the model (fit accepts generators; fit_generator is deprecated)
    model.fit(train_it,
              steps_per_epoch=len(train_it),
              epochs=5,
              verbose=1)
    model.save("my_model.h5")

train_cnn_model()

The second part of the code is:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
"""
@Project :Neural network cat and dog recognition
@File    :CNN_test.py
@IDE     :PyCharm 
@Author  :咋
@Date    :2022/10/2 12:12 
"""

import os
import random

import matplotlib.pyplot as plt
import numpy as np
from keras.models import load_model
from PIL import Image

model_path = "my_model.h5"
model = load_model(model_path)

def read_random_image():
    # Pick a random picture from the test folder
    folder = r"./ma1ogo3ushu4ju4ji2/dogs_cats/data/test/"
    file_path = os.path.join(folder, random.choice(os.listdir(folder)))
    pil_im = Image.open(file_path, 'r')
    return pil_im

def get_predict(pil_im, model):
    # Resize to the model's input size and force 3 color channels
    pil_im = pil_im.convert("RGB").resize((200, 200))
    # Convert to a numpy array, scaled to [0, 1] to match the rescale=1/255 used in training
    array_im = np.asarray(pil_im, dtype="float32") / 255.0
    # Add a batch dimension: (200, 200, 3) -> (1, 200, 200, 3)
    array_im = array_im[np.newaxis,:]
    # Run the prediction
    result = model.predict(array_im)
    if result[0][0] > 0.5:
        name = "It's a dog!"
        print("Prediction: dog")
    else:
        name = "It's a cat!"
        print("Prediction: cat")
    return name

pil_im = read_random_image()
plt.imshow(np.asarray(pil_im))
plt.title(get_predict(pil_im, model))
plt.show()

Dataset download link: Cat and Dog Dataset

Video tutorial link: Cat and Dog Recognition Based on Convolutional Neural Networks

Origin blog.csdn.net/weixin_63866037/article/details/127150062