Deep Learning 1. Convolutional Neural Network-CNN

Table of contents

Convolutional Neural Network – CNN

What problem does CNN solve?

The amount of data to be processed is too large

Preserve image features

Principles of Human Vision

Convolutional Neural Networks - Fundamentals of CNN

Convolution - extracting features

Pooling layer (downsampling) - data dimensionality reduction to avoid overfitting

Fully connected layer - output result

What are some practical applications of CNNs?

Summary

Baidu Encyclopedia + Wikipedia


Convolutional Neural Network – CNN

Convolutional neural networks (CNNs) are best known for image processing. Their design was inspired by the human visual nervous system.

CNNs have two major characteristics:

  1. They can effectively reduce a large amount of data to a much smaller amount
  2. They can effectively retain image features, in line with the principles of image processing

CNNs are now widely used in face recognition, autonomous driving, photo editing apps such as Meitu Xiuxiu, security, and many other fields.

What problem does CNN solve?

Before CNNs, images were a difficult problem for AI for two reasons:

  1. An image contains too much data to process, resulting in high cost and low efficiency
  2. It is difficult to retain the original features of the image during digitization, resulting in low image-processing accuracy

The amount of data to be processed is too large

Images are made up of pixels, and each pixel is made up of colors.

An image is made up of pixels, and each pixel is made up of colors

Today almost any picture is larger than 1000×1000 pixels, and each pixel needs 3 values (R, G, B) to represent its color.

If we process a 1000×1000 pixel image, we need to process 3 million values!

1000×1000×3=3,000,000

Processing this much data is very resource-intensive, and this is not even a particularly large picture!
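The arithmetic above can be checked in a few lines. The sketch below also estimates how many weights one fully connected layer would need if it read every input value directly. That second figure is not in the original text, and the layer width of 1000 units is an assumption chosen only for illustration.

```python
# Count the raw input values of a 1000x1000 RGB image.
width, height, channels = 1000, 1000, 3
input_values = width * height * channels
print(input_values)  # 3000000

# Assumption for illustration: a fully connected layer with 1000 units
# that reads every input value needs one weight per (input, unit) pair.
hidden_units = 1000
dense_weights = input_values * hidden_units
print(dense_weights)  # 3000000000
```

Even this single hypothetical layer would need billions of weights, which is why reducing the data first matters.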

The first problem a CNN solves is to "simplify a complex problem": it reduces a large amount of data to a small amount and then processes it.

More importantly, in most scenarios dimensionality reduction does not affect the result. For example, shrinking a 1000-pixel picture to 200 pixels does not change whether a person sees a cat or a dog in it, and the same is true for machines.

Preserve image features

Consider a simplified version of the traditional way of digitizing pictures, similar to the process shown in the figure below:

Simple digitization of images cannot preserve image features

Suppose a cell is 1 where there is a circle and 0 where there is none. Circles in different positions then produce completely different data representations. From a visual point of view, however, the content (essence) of the image has not changed; only the position has changed.

So when objects in the image move, the values obtained in the traditional way are very different! This does not meet the requirements of image processing.

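CNNs solve this problem. They retain image features in a way that resembles vision, so similar images can still be recognized when an object is shifted. Recognizing flipped, rotated, or otherwise transformed images is also possible, although in practice this usually depends on the network having seen such variations during training.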
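The position problem can be illustrated with a toy example. The sketch below is not a real CNN. It places the same 2×2 object at two positions, shows that the flattened 0/1 vectors have nothing in common, and then shows that sliding a small detector over the image and keeping the strongest response gives the same answer in both cases. The helper names (`flatten`, `conv2d`, `place_blob`) are hypothetical and exist only for this illustration.

```python
def flatten(image):
    """Turn a 2D grid into one long list, row by row."""
    return [v for row in image for v in row]

def conv2d(image, kernel):
    """Slide the kernel over the image without padding and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def place_blob(size, top, left):
    """Hypothetical helper: a size x size image of zeros with a 2x2 block of ones."""
    image = [[0] * size for _ in range(size)]
    for i in range(2):
        for j in range(2):
            image[top + i][left + j] = 1
    return image

image_a = place_blob(6, 0, 0)  # object in the top-left corner
image_b = place_blob(6, 3, 4)  # same object, moved

# Naive digitization: the two flattened vectors share no active position.
overlap = sum(a * b for a, b in zip(flatten(image_a), flatten(image_b)))
print(overlap)  # 0

# A 2x2 detector slid over the image, followed by a global maximum.
kernel = [[1, 1], [1, 1]]
response_a = max(flatten(conv2d(image_a, kernel)))
response_b = max(flatten(conv2d(image_b, kernel)))
print(response_a, response_b)  # 4 4
```

The detector's strongest response is the same wherever the object sits, which is the shift tolerance described above.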

So how does a convolutional neural network achieve this? Before looking at how CNNs work, let's look at how human vision works.

Principles of Human Vision

Many results in deep learning are closely tied to research on how the brain perceives the world, especially research on vision.

The 1981 Nobel Prize in Physiology or Medicine was awarded to David Hubel (a Canadian-born American neurobiologist) and Torsten Wiesel, as well as Roger Sperry. The main contribution of the first two was the discovery of "information processing in the visual system", including the finding that the visual cortex is organized hierarchically.

Human vision works roughly as follows: it starts with raw signal intake (the pupil takes in pixels), then performs preliminary processing (certain cells in the cerebral cortex detect edges and orientations), then abstracts (the brain determines that the object in front of you is round), and then abstracts further (the brain determines that the object is a balloon). Here is an example of the human brain doing face recognition:

Principles of Human Vision 1

For different objects, human vision also recognizes them layer by layer in this way:

Principles of Human Vision 2

We can see that the lowest-level features are basically the same across objects: they are various edges. The higher the layer, the more object-specific the extracted features become (wheels, eyes, torso, etc.).

At the top layer, the high-level features are combined into the corresponding image, allowing humans to distinguish different objects accurately.

This naturally raises a question: can we imitate this characteristic of the human brain and build a multi-layer neural network in which the lower layers recognize primary image features, several lower-level features combine into higher-level features, and the combination of multiple levels finally produces a classification at the top?

The answer is yes, and it is the inspiration for many deep learning algorithms, including CNNs.

Convolutional Neural Networks - Fundamentals of CNN

A typical CNN consists of 3 parts:

  1. convolutional layer
  2. pooling layer
  3. fully connected layer

Put simply:

The convolutional layer is responsible for extracting local features in the image;

The pooling layer is used to greatly reduce the amount of data (dimensionality reduction);

The fully connected layer works like a traditional neural network and is used to output the desired result.

A typical CNN consists of 3 parts

Convolution - extracting features

As the figure below shows, the convolutional layer scans the entire picture with a convolution kernel:

Convolution layer operation process

We can think of this process as using a filter (the convolution kernel) to filter each small area of the image and obtain a feature value for each area.
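The scanning process can be written out directly. The following is a minimal sketch in plain Python, assuming no padding and a step of one pixel. Like most deep learning libraries, it does not flip the kernel. The image and kernel values are made up for illustration: the image is dark on the left and bright on the right, and the kernel responds to exactly that kind of vertical edge.

```python
def conv2d(image, kernel):
    """Compute one feature value for every small area the kernel can cover."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    feature_map = []
    for top in range(out_h):
        row = []
        for left in range(out_w):
            # Multiply the kernel with the area under it and sum the result.
            value = 0
            for a in range(kernel_h):
                for b in range(kernel_w):
                    value += image[top + a][left + b] * kernel[a][b]
            row.append(value)
        feature_map.append(row)
    return feature_map

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Illustrative kernel: negative on the left, positive on the right.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, kernel)
for row in feature_map:
    print(row)
# [0, 2, 0]
# [0, 2, 0]
# [0, 2, 0]
```

The feature map is non-zero only in the column where the dark-to-bright edge sits, so the kernel has extracted that local feature.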

In practice, there are usually multiple convolution kernels. Each kernel can be thought of as representing one image pattern. If convolving an image block with a kernel produces a large value, the image block is considered very close to that kernel's pattern.

If we design 6 convolution kernels, we are in effect assuming that the image contains 6 basic texture patterns, that is, that the image can be described using 6 basic patterns. The following is an example of 25 different convolution kernels:

25 different convolution kernels

Summary: The convolutional layer extracts local features in the picture through the filtering of the convolution kernel, which is similar to the feature extraction of human vision mentioned above.
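The idea that each kernel stands for one pattern can be sketched with a small bank of kernels. The two kernels and the image below are hand-picked assumptions for illustration. In a trained CNN the kernel values are learned rather than designed.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image without padding and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]

# One kernel per pattern we want to detect (illustrative values).
kernels = {
    "vertical_edge": [[-1, 1], [-1, 1]],
    "horizontal_edge": [[-1, -1], [1, 1]],
}

# Dark on top, bright on the bottom: the image contains a horizontal edge only.
image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]

# Each kernel produces its own feature map.
feature_maps = {name: conv2d(image, k) for name, k in kernels.items()}

# The largest value in each map says how well that pattern matches somewhere.
strongest = {name: max(max(row) for row in fmap)
             for name, fmap in feature_maps.items()}
print(strongest)  # {'vertical_edge': 0, 'horizontal_edge': 2}
```

Only the kernel whose pattern is present in the image produces a large value, matching the description above.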

Pooling layer (downsampling) - data dimensionality reduction to avoid overfitting

The pooling layer is simply downsampling, which can greatly reduce the dimensionality of the data. The process is as follows:

pooling layer process

In the picture above, the original image is 20×20. We downsample it with a 10×10 sampling window, which yields a 2×2 feature map.

We do this because the image is still very large even after convolution (the convolution kernel is relatively small), so downsampling is used to reduce the data dimension.
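The 20×20 to 2×2 example can be reproduced with a short sketch. The text does not say which pooling rule the figure uses. The code below assumes max pooling with non-overlapping windows, and the input values are arbitrary illustration data.

```python
def max_pool(image, window):
    """Non-overlapping max pooling; assumes the side lengths divide evenly."""
    out_h = len(image) // window
    out_w = len(image[0]) // window
    return [
        [
            max(image[i * window + a][j * window + b]
                for a in range(window) for b in range(window))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A 20x20 feature map filled with made-up values (row index + column index).
feature_map = [[r + c for c in range(20)] for r in range(20)]

pooled = max_pool(feature_map, 10)
print(len(feature_map) * len(feature_map[0]))  # 400 values before pooling
print(len(pooled) * len(pooled[0]))            # 4 values after pooling
print(pooled)                                  # [[18, 28], [28, 38]]
```

Each output cell keeps only the largest value of its 10×10 window, so 400 values shrink to 4.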

Summary: The pooling layer reduces the data dimension more effectively than the convolutional layer. This not only greatly reduces the amount of computation but also helps reduce overfitting.

Fully connected layer - output result

This part is the last step. The data processed by the convolutional layer and the pooling layer are input to the fully connected layer to get the final desired result.

Only after the convolutional and pooling layers have reduced the dimensionality of the data can the fully connected layer "run" it. Otherwise, the amount of data is too large, the computation is costly, and the efficiency is low.

fully connected layer
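As a rough sketch of this last step, the code below flattens a small pooled feature map, passes it through one fully connected layer, and turns the scores into probabilities with softmax. The weights, the two class names, and the use of softmax are assumptions for illustration. A real network learns its weights during training.

```python
import math

def dense(inputs, weights, biases):
    """Fully connected layer: every output is a weighted sum of all inputs."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up output of the last pooling layer: one 2x2 feature map.
pooled = [[18, 28], [28, 38]]
features = [v for row in pooled for v in row]  # flatten to 4 values

# Hypothetical hand-picked weights, one row per class.
classes = ["cat", "dog"]
weights = [
    [0.01, 0.02, 0.00, -0.01],
    [-0.01, 0.00, 0.01, 0.02],
]
biases = [0.0, 0.0]

scores = dense(features, weights, biases)
probabilities = softmax(scores)
predicted = classes[probabilities.index(max(probabilities))]
print(predicted)  # dog
```

Because the input has already been reduced to 4 values, this layer needs only 8 weights.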

A typical CNN is not just the 3-layer structure mentioned above, but a multi-layer structure. For example, the structure of LeNet-5 is shown in the following figure:

Convolutional Layer - Pooling Layer - Convolutional Layer - Pooling Layer - Convolutional Layer - Fully Connected Layer

LeNet-5 network structure
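To see how the data shrinks through such a stack, the sketch below follows the feature map size layer by layer. The layer names, the 32×32 input, the 5×5 kernels, the 2×2 pooling windows, and the map counts come from the commonly cited description of LeNet-5, not from this article, so treat them as assumptions.

```python
def conv_out(size, kernel):
    """Side length after a convolution with no padding and a step of 1."""
    return size - kernel + 1

def pool_out(size, window):
    """Side length after non-overlapping pooling."""
    return size // window

# (layer name, layer type, kernel or window size, number of feature maps)
layers = [
    ("C1", "conv", 5, 6),
    ("S2", "pool", 2, 6),
    ("C3", "conv", 5, 16),
    ("S4", "pool", 2, 16),
    ("C5", "conv", 5, 120),
]

size = 32  # assumed input: a 32x32 grayscale image
shapes = []
for name, kind, param, maps in layers:
    size = conv_out(size, param) if kind == "conv" else pool_out(size, param)
    shapes.append((name, maps, size, size))
    print(f"{name}: {maps} feature maps of {size}x{size}")
# C1: 6 feature maps of 28x28
# S2: 6 feature maps of 14x14
# C3: 16 feature maps of 10x10
# S4: 16 feature maps of 5x5
# C5: 120 feature maps of 1x1
```

By the time the data reaches the fully connected layer, each of the 120 maps has been reduced to a single value.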

After understanding the basic principles of CNN, let's focus on the practical applications of CNN.

What are some practical applications of CNNs?

CNNs are very good at processing images. Because a video is a sequence of images, they are also good at processing video content. Here are some of the more mature applications:

Image classification, retrieval

Image classification is a fairly basic application. It classifies images effectively and can save a great deal of manual labor. For pictures in some specific domains, classification accuracy can exceed 95%, which makes it a highly usable application.

Typical Scenario: Image Search…

CNN application - image classification, retrieval

Target location detection

Targets can be located in the image, and the position and size of the target can be determined.

Typical scenarios: autonomous driving, security, medical...

CNN Application - Target Location Detection

Target segmentation

Put simply, this is pixel-level classification.

It can distinguish the foreground from the background at the pixel level and, at a more advanced level, identify and classify the target.
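Pixel-level classification can be pictured as making one decision per pixel. In the sketch below, the score grid stands in for the per-pixel output of a segmentation network. The scores and the 0.5 threshold are made-up assumptions, and no actual network is involved.

```python
# Hypothetical per-pixel foreground scores between 0.0 and 1.0.
foreground_score = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.7, 0.9, 0.2],
    [0.0, 0.1, 0.2, 0.1],
]

threshold = 0.5  # assumed cut-off between background (0) and foreground (1)

# Classify every pixel independently.
mask = [[1 if score > threshold else 0 for score in row]
        for row in foreground_score]

for row in mask:
    print(row)
# [0, 0, 0, 0]
# [0, 1, 1, 0]
# [0, 1, 1, 0]
# [0, 0, 0, 0]

foreground_pixels = sum(sum(row) for row in mask)
print(foreground_pixels)  # 4
```

The resulting mask marks exactly which pixels belong to the foreground object.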

Typical scenarios: Meitu Xiuxiu, video post-processing, image generation...

CNN Application - Target Segmentation

Face recognition

Face recognition is already a very popular application, and it is widely used in many fields.

Typical scenarios: security, finance, life...

CNN Application - Face Recognition

Skeleton recognition

Skeleton recognition can identify the key bones of the body and track their movements.

Typical scenarios: security, movies, image and video generation, games...

CNN Application - Skeleton Recognition

Summary

Today we introduced the value, basic principles and application scenarios of CNN. A brief summary is as follows:

Value of CNNs:

  1. They can effectively reduce a large amount of data to a small amount (without affecting the result)
  2. They can preserve the features of pictures, similar to the principles of human vision

The basic principle of CNN:

  1. Convolutional layer – the main function is to extract and preserve the features of the image
  2. Pooling layer – the main function is to reduce the data dimension, which can effectively avoid overfitting
  3. Fully connected layer – output the results we want according to different tasks

Practical applications of CNNs:

  1. Image classification and retrieval
  2. Target location detection
  3. Target segmentation
  4. Face recognition
  5. Skeleton recognition

Baidu Encyclopedia + Wikipedia

A convolutional neural network (CNN) is a type of feedforward neural network that involves convolution computations and has a deep structure. It is one of the representative algorithms of deep learning. Because convolutional neural networks can perform shift-invariant classification, they are also called "shift-invariant artificial neural networks (SIANN)".

Research on convolutional neural networks began in the 1980s and 1990s. Time-delay neural networks and LeNet-5 were the earliest convolutional neural networks. In the 21st century, with the introduction of deep learning theory and improvements in numerical computing hardware, convolutional neural networks have developed rapidly and have been widely applied in computer vision, natural language processing, and other fields.

In deep learning, a convolutional neural network (CNN or ConvNet) is a class of deep neural networks most commonly used to analyze visual images.

CNNs use a variant of the multi-layer perceptron designed to require minimal preprocessing. They are also known as shift-invariant or space-invariant artificial neural networks (SIANNs), based on their shared-weight architecture and translation-invariance characteristics. Convolutional networks were inspired by biological processes: the connectivity pattern between their neurons resembles the organization of the animal visual cortex. Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap such that they cover the entire visual field.

Compared with other image classification algorithms, CNNs use relatively little preprocessing. This means that the network learns the filters that were hand-engineered in traditional algorithms. This independence from prior knowledge and human effort in feature design is a major advantage.

They can be used in image and video recognition, recommender systems, image classification, medical image analysis, and natural language processing.


Origin blog.csdn.net/qq_38998213/article/details/132515652