Andrew Ng Course Series - Convolutional Neural Networks (Source: NetEase Cloud Classroom)

1. Convolutional Neural Networks

1. Computer Vision

Image classification (image recognition): given a 64*64 image, the computer decides whether it shows a cat

Object detection: detect which objects are in the image and where they are

Style transfer: the generated image combines the outline (content) of image 1 with the style of image 2

e.g. the feature vector of a 1000*1000 image has 1000*1000*3 entries (3 RGB channels). A standard fully connected network on this input would need a huge number of weights, which is why convolutional neural networks are introduced.
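
To make the size argument concrete, here is a small arithmetic sketch. The 1000-unit hidden layer and the 10 filters of size 5*5 are assumptions chosen for illustration; they are not given in these notes.

```python
# Input size of a 1000*1000 RGB image flattened into a feature vector.
height, width, channels = 1000, 1000, 3
n_inputs = height * width * channels              # 3,000,000 features

# Assumption for illustration: a first fully connected layer with 1000 units.
hidden_units = 1000
fc_weights = n_inputs * hidden_units              # weights only, biases ignored

# Assumption for comparison: a convolutional layer with 10 filters of size 5*5*3.
n_filters, f = 10, 5
conv_params = n_filters * (f * f * channels + 1)  # +1 bias per filter

print(n_inputs, fc_weights, conv_params)
```

Under these assumptions the fully connected layer needs three billion weights, while the convolutional layer needs a few hundred parameters regardless of the image size.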

 

2. Example of edge detection

The convolution operation is the basis of CNN, and edge detection is used as an introductory learning example.

Vertical edge detection: convolution kernel (filter) [1 0 -1;1 0 -1;1 0 -1]. Transposing it gives the horizontal edge detector [1 1 1;0 0 0;-1 -1 -1].

 

Python: conv_forward, TensorFlow: tf.nn.conv2d, Keras: Conv2D
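
A minimal NumPy sketch of the edge detection example, assuming a 6*6 test image with a bright left half and a dark right half. The helper name conv2d_valid is hypothetical, not a library function; like CNN layers, it slides the filter without flipping it.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding (no kernel flip, as in CNNs)."""
    n_h, n_w = image.shape
    f_h, f_w = kernel.shape
    out = np.zeros((n_h - f_h + 1, n_w - f_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f_h, j:j + f_w] * kernel)
    return out

# 6*6 image: bright left half (10), dark right half (0) -> one vertical edge.
image = np.hstack([np.full((6, 3), 10.0), np.zeros((6, 3))])

vertical = np.array([[1, 0, -1],
                     [1, 0, -1],
                     [1, 0, -1]], dtype=float)
horizontal = vertical.T          # [1 1 1; 0 0 0; -1 -1 -1]

edges_v = conv2d_valid(image, vertical)    # every row is [0, 30, 30, 0]
edges_h = conv2d_valid(image, horizontal)  # all zeros: no horizontal edge
print(edges_v)
print(edges_h)
```

The vertical filter responds strongly in the two middle columns of the 4*4 output, where the window straddles the edge, and the horizontal filter does not respond at all.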

 

 

Sobel operator: [1 2 1;0 0 0;-1 -2 -1] (horizontal edges; the vertical form is its transpose [1 0 -1;2 0 -2;1 0 -1]). It gives extra weight to the central pixels, which makes the detector more robust.

 

3.Padding

Original image n*n, convolution kernel f*f: the image after convolution is (n-f+1)*(n-f+1)

=> (1) The image shrinks after every convolution (2) Corner pixels are covered by only one filter position, while pixels in the middle are covered many times, so information near the border is lost

=> pad the border of the original image with zeros

 

Valid/same convolutions

Valid: no padding

Same: the output has the same size as the input. If p pixels are padded on each side, the output is (n+2p-f+1)*(n+2p-f+1) = n*n

p=(f-1)/2

 

f is usually odd, so that p=(f-1)/2 is an integer and the filter has a central pixel
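
The padding rule can be checked with np.pad. This is a small sketch; the helper name same_padding and the 6*6 example size are assumptions for illustration.

```python
import numpy as np

def same_padding(f):
    """Padding p that keeps an n*n input at n*n for stride 1: p = (f - 1) / 2."""
    if f % 2 == 0:
        raise ValueError("an even f has no symmetric 'same' padding")
    return (f - 1) // 2

n, f = 6, 3
p = same_padding(f)                                   # 1
image = np.arange(n * n, dtype=float).reshape(n, n)
padded = np.pad(image, p, mode="constant", constant_values=0)

valid_size = n - f + 1               # valid: no padding
same_size = n + 2 * p - f + 1        # same: zero padding of p pixels per side
print(padded.shape, valid_size, same_size)
```

For n=6 and f=3 the padded image is 8*8, the valid output is 4*4 and the same output is 6*6.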

 

4. Strided convolution

With stride s, the output image is [(n+2p-f)/s+1]*[(n+2p-f)/s+1]; if (n+2p-f)/s is not an integer, it is rounded down
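
The output-size formula as a one-line function. The name conv_output_size and the example sizes are assumptions for illustration; integer division implements the rounding down.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output side length: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1

size_a = conv_output_size(7, 3, p=0, s=2)   # 7*7 input, 3*3 filter, stride 2
size_b = conv_output_size(6, 3, p=1, s=1)   # "same" convolution
size_c = conv_output_size(6, 3, p=0, s=2)   # (6 - 3) / 2 is not an integer
print(size_a, size_b, size_c)
```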

 

 

5. Convolution on RGB images

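(1) Convolution with a single filter: an n*n*n_c image convolved with an f*f*n_c filter gives an (n-f+1)*(n-f+1) output; the filter depth must equal the number of image channels

(2) Convolution with multiple filters: n_c' filters give an (n-f+1)*(n-f+1)*n_c' output

n_c is the number of channels, n_c' is the number of filters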
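
A sketch of convolution over a volume with several filters, assuming no padding and stride 1. The function name conv_volume and the random 6*6*3 input are illustrative only.

```python
import numpy as np

def conv_volume(image, filters):
    """Valid, stride-1 convolution of an (n, n, n_c) image with
    (n_f, f, f, n_c) filters; returns an (n-f+1, n-f+1, n_f) volume."""
    n_h, n_w, n_c = image.shape
    n_f, f_h, f_w, f_c = filters.shape
    assert f_c == n_c, "filter depth must equal the number of image channels"
    out = np.zeros((n_h - f_h + 1, n_w - f_w + 1, n_f))
    for k in range(n_f):
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                patch = image[i:i + f_h, j:j + f_w, :]
                out[i, j, k] = np.sum(patch * filters[k])  # sum over all channels
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((6, 6, 3))        # n = 6, n_c = 3 (RGB)
filters = rng.standard_normal((2, 3, 3, 3))   # n_c' = 2 filters of size 3*3*3
output = conv_volume(image, filters)
print(output.shape)
```

Each filter produces one 4*4 slice, so the number of output channels equals the number of filters.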

 

6. Single-layer convolutional network

 

7. Simple Convolutional Network Example

 

8. Pooling layer (pooling):

Max pooling (commonly used hyperparameters: f=2, s=2). Pooling generally uses no padding and has no parameters to learn.
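
A minimal max pooling sketch on a single 4*4 feature map, using f=2 and s=2. The function name max_pool and the input values are chosen for illustration.

```python
import numpy as np

def max_pool(x, f=2, s=2):
    """Max pooling over an (n_h, n_w) feature map with window f and stride s."""
    out_h = (x.shape[0] - f) // s + 1
    out_w = (x.shape[1] - f) // s + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * s:i * s + f, j * s:j * s + f].max()
    return out

x = np.array([[1, 3, 2, 1],
              [2, 9, 1, 1],
              [1, 3, 2, 3],
              [5, 6, 1, 2]], dtype=float)
pooled = max_pool(x)     # [[9, 2], [6, 3]]
print(pooled)
```

The function only has the fixed hyperparameters f and s; there is nothing for gradient descent to update.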

 

9. Convolutional Neural Network Example:

A convolutional layer and the pooling layer after it are counted together as one layer, since pooling has no weights. As the network gets deeper, the height and width decrease (through convolution and pooling) while the number of channels (= number of filters) increases.

 

Fully connected layer (FC): a standard neural network layer in which every input unit is connected to every output unit.

Another common network pattern: several convolutional layers followed by a pooling layer, then again several convolutional layers followed by a pooling layer.
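
The shrinking height and width and the growing channel count can be traced with two small shape helpers. The 32*32*3 input and the layer sizes below follow a LeNet-5-style layout and are assumptions for illustration; these notes do not state them.

```python
def conv_shape(shape, f, n_filters, p=0, s=1):
    """Shape after a convolution: spatial size shrinks, channels = n_filters."""
    h, w, _ = shape
    return ((h + 2 * p - f) // s + 1, (w + 2 * p - f) // s + 1, n_filters)

def pool_shape(shape, f=2, s=2):
    """Shape after pooling: spatial size shrinks, channels unchanged."""
    h, w, c = shape
    return ((h - f) // s + 1, (w - f) // s + 1, c)

shape = (32, 32, 3)                              # assumed input size
shape = conv_shape(shape, f=5, n_filters=6)      # (28, 28, 6)
shape = pool_shape(shape)                        # (14, 14, 6)
shape = conv_shape(shape, f=5, n_filters=16)     # (10, 10, 16)
shape = pool_shape(shape)                        # (5, 5, 16)
flattened = shape[0] * shape[1] * shape[2]       # inputs to the first FC layer
print(shape, flattened)
```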

 

2. Examples of Deep Neural Networks

1. Classic networks: LeNet-5, AlexNet, VGG.

2. ResNet (residual network): skip connections (shortcuts) make it possible to train much deeper networks and ease the vanishing/exploding gradient problem.
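
A skip connection can be sketched with two dense layers in NumPy. This is a simplified illustration under assumptions (dense layers instead of convolutions, hypothetical function names), not the actual ResNet architecture.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def residual_block(a, W1, b1, W2, b2):
    """Two dense layers with a skip connection: a_out = relu(z2 + a)."""
    a1 = relu(W1 @ a + b1)
    z2 = W2 @ a1 + b2
    return relu(z2 + a)          # the input a skips over both layers

a = np.array([1.0, 2.0, 3.0])
zero_W, zero_b = np.zeros((3, 3)), np.zeros(3)
# With all weights at zero the block passes a non-negative input through unchanged.
out = residual_block(a, zero_W, zero_b, zero_W, zero_b)
print(out)
```

Because the block can fall back to the identity this easily, stacking extra residual blocks is usually argued to be harmless for training, which is what allows deeper networks.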

Inception

3. Transfer learning: reuse a network that others have already trained on a large dataset

(1) When you only have a small training set, download someone else's code and pre-trained weights as your initialization, freeze the earlier layers, replace the output layer with your own softmax layer, and train only the weights of that softmax layer

(2) When you have more data, freeze fewer layers and train the later ones, or replace them with your own hidden layers and output layer

 

3. Object detection

1. Object localization (in addition to the class labels, the neural network outputs four more values: bx, by, bh, bw)

The loss function has two cases, y1=1 (p_c=1) and y1=0 (p_c=0). When y1=1, the squared error is summed over every component of the output. When y1=0, only the squared error of y1 is considered, because the remaining components are "don't care".
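
The two cases of the loss can be written as a short function. The output layout y = [p_c, bx, by, bh, bw, c1, c2, c3] with three classes is an assumption for illustration, as is the function name.

```python
import numpy as np

def localization_loss(y_true, y_pred):
    """Squared-error loss for y = [p_c, bx, by, bh, bw, c1, c2, c3]."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true[0] == 1:                            # object present: all components count
        return float(np.sum((y_pred - y_true) ** 2))
    return float((y_pred[0] - y_true[0]) ** 2)    # no object: only p_c counts

with_object = localization_loss([1, 0.5, 0.5, 0.2, 0.3, 0, 1, 0],
                                [1, 0.5, 0.5, 0.2, 0.3, 0, 1, 0])
no_object = localization_loss([0, 0, 0, 0, 0, 0, 0, 0],
                              [0.5, 0.9, 0.9, 0.9, 0.9, 1, 1, 1])
print(with_object, no_object)
```

In the second call the predicted box and class values are far off, but they do not contribute because the label says there is no object.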

 

2. Landmark (feature point) detection

Define several landmarks on the target object and label their coordinates by hand. Train the neural network on the labeled images; its output is whether the object is present in the test image plus the coordinates of each landmark. These coordinates can then be used for facial expression analysis, pose analysis, etc.

 

3. Object Detection

Object detection algorithm based on sliding windows.

Crop the original images into small images that contain only the target and use them to train the network. Then slide windows of different sizes over the test image and check whether each window contains a target.

Trade-off: detection accuracy vs. computational cost
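
Generating the window positions is a simple double loop. The image size, window size and stride below are assumed values for illustration.

```python
def sliding_windows(img_h, img_w, window, stride):
    """Yield (top, left, bottom, right) for every window position."""
    for top in range(0, img_h - window + 1, stride):
        for left in range(0, img_w - window + 1, stride):
            yield top, left, top + window, left + window

# Assumed sizes: a 16*16 image, a 14*14 window, stride 2.
boxes = list(sliding_windows(16, 16, window=14, stride=2))
print(len(boxes), boxes)
```

A smaller stride or more window sizes gives more candidate crops, each of which has to be run through the classifier, which is where the cost comes from.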

 

4. Convolutional implementation of sliding windows

(1) Turn the fully connected layers into convolutional layers

(2) Convolutional implementation of sliding windows

Run the convolutions over the entire image once to get the predictions for all windows at the same time; overlapping windows share their computation.
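
Step (1) can be checked numerically: a fully connected layer on a flattened volume gives the same numbers as filters that cover the whole volume. The 5*5*16 volume and the 400 units are assumed sizes for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 5, 16))           # assumed activation volume
W = rng.standard_normal((400, 5 * 5 * 16))    # FC layer: 400 units on the flattened volume
fc_out = W @ x.reshape(-1)                    # shape (400,)

# The same weights viewed as 400 filters of size 5*5*16. On a 5*5*16 input
# there is exactly one filter position, so the output is 1*1*400.
filters = W.reshape(400, 5, 5, 16)
conv_out = np.array([np.sum(x * filters[k]) for k in range(400)])

print(np.allclose(fc_out, conv_out))
```

On a larger input the same filters simply slide to more positions, and each position corresponds to one sliding window.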

 

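1. Bounding box prediction (to get more accurate bounding boxes)

YOLO (You Only Look Once): the image is divided into a grid, and each grid cell directly predicts p_c, the box (bx, by, bh, bw) and the class of an object whose midpoint falls inside it.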

 

2. Intersection over Union (IoU): area of (predicted bounding box AND actual bounding box) / area of (predicted bounding box OR actual bounding box)

By convention, a prediction is considered correct when IoU>=0.5.
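
A minimal IoU sketch, assuming boxes are given by corner coordinates (x1, y1, x2, y2) rather than by midpoint, height and width.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)   # 0 when the boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

partial = iou((0, 0, 2, 2), (1, 1, 3, 3))    # intersection 1, union 7
disjoint = iou((0, 0, 1, 1), (2, 2, 3, 3))   # no overlap
print(partial, disjoint)
```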

 

3. Non-maximum suppression

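When the grid is fine, several grid cells covered by the same target each predict and localize it (each cell thinks the target is inside it), so one target is detected multiple times. p_c is the probability of each detection. Non-maximum suppression keeps the box with the highest p_c and discards the remaining boxes that have a high IoU with it.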
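
A minimal sketch of non-maximum suppression for a single class. The thresholds 0.6 (on p_c) and 0.5 (on IoU) are assumed default values, and boxes use corner coordinates (x1, y1, x2, y2).

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(min(a[2], b[2]) - max(a[0], b[0]), 0)
    ih = max(min(a[3], b[3]) - max(a[1], b[1]), 0)
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def non_max_suppression(boxes, scores, score_threshold=0.6, iou_threshold=0.5):
    """Return the indices of the kept boxes, highest p_c first."""
    # Discard low-probability detections, then sort the rest by p_c.
    order = sorted((i for i, s in enumerate(scores) if s >= score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the chosen one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
kept = non_max_suppression(boxes, scores)
print(kept)
```

Here the second box is suppressed because it overlaps the first one heavily, the third survives because it is elsewhere in the image, and the fourth is dropped for its low p_c.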

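4. Anchor boxes

Used to handle the case where the midpoints of two targets fall into the same grid cell. Define anchor boxes of different shapes that match the shapes of the two targets, and assign each target to the anchor box it overlaps most.

5. YOLO

With two anchor boxes and a 3*3 grid, the training label and the network output for each image are 3*3*16: each grid cell holds 2 anchor boxes * 8 values (p_c, bx, by, bh, bw, c1, c2, c3).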
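
The 3*3*16 shape follows from 2 anchor boxes times 8 values per box. The sketch below assumes three classes and the per-box layout [p_c, bx, by, bh, bw, c1, c2, c3]; the example label is hypothetical.

```python
import numpy as np

grid, n_anchors, n_classes = 3, 2, 3        # n_classes = 3 is an assumption
per_anchor = 5 + n_classes                  # p_c, bx, by, bh, bw, c1..c3
y = np.zeros((grid, grid, n_anchors * per_anchor))
print(y.shape)

# Hypothetical label: an object of class 2 whose midpoint lies in cell
# (row 1, col 2) and whose shape best matches the second anchor box.
row, col, anchor = 1, 2, 1
start = anchor * per_anchor
y[row, col, start:start + per_anchor] = [1, 0.4, 0.6, 0.3, 0.9, 0, 1, 0]
```

All other entries stay at zero, which means p_c=0 for every other cell and anchor box.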
