A Beginner's Guide To Understanding Convolutional Neural Networks Part One: Notes

Original link: https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks/

These notes use the article to build a preliminary understanding of Convolutional Neural Networks (CNNs).

 

Image Classification

  Image classification is the task of taking an input image and outputting a class (a dog, a cat, etc.) or a probability of classes that best describes the image.

Inputs and Outputs

  When a computer sees an image, it sees an array of pixel values. For a 32*32 color image, for example, the array is 32*32*3: one red, green, and blue (RGB) value for each pixel.

  /**** Supplement ****/

  Single-channel image: commonly known as a grayscale image. Each pixel has only one value to represent its color, and that value is between 0 and 255 (0 is black, 255 is white, and the intermediate values are different levels of gray).

  Three-channel image (RGB): each pixel has three values. The red, green, and blue channel values vary and are superimposed to produce a wide range of colors. A three-channel grayscale image is one in which all three channels hold the same value at every pixel.
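  To make the single-channel and three-channel descriptions concrete, here is a minimal sketch in plain Python. The tiny 2*2 image and the helper name `gray_to_three_channel` are made up for illustration and do not come from the original article.

```python
# A hypothetical 2*2 grayscale image: one value (0-255) per pixel.
gray = [
    [0, 128],
    [200, 255],
]


def gray_to_three_channel(image):
    """Copy each gray value into R, G and B, giving a height*width*3 array."""
    return [[[v, v, v] for v in row] for row in image]


rgb = gray_to_three_channel(gray)

height, width, depth = len(rgb), len(rgb[0]), len(rgb[0][0])
print(height, width, depth)  # 2 2 3
print(rgb[0][1])             # [128, 128, 128]
```

  Every pixel of `rgb` carries the same value in all three channels, which is what makes it a three-channel grayscale image.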

Biological Connection

  Some neurons in the visual cortex respond only to edges of a specific orientation: some fire only for vertical edges, some only for horizontal ones, and so on. These neurons are organized in a columnar architecture and together produce an overall perception of things. This idea of specialized components that each look for a specific feature is the basis of convolutional neural networks.

First Layer - Math Part (Convolutional Layer, aka Conv Layer)

  

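  The filter (also called a neuron or kernel) is an array of numbers, called weights or parameters. The filter convolves over the input: at each position it multiplies its weights elementwise with the pixel values it covers and sums the products into a single number, then moves on by the stride, here 1 unit to the right.

  The depth of the filter has to be the same as the depth of the input, so for a 32*32*3 input a 5*5 filter is actually 5*5*3. With a stride of 1 and no padding, the filter fits in 32 - 5 + 1 = 28 positions along each axis, so one filter yields a 28*28*1 activation map, and two 5*5*3 filters yield a 28*28*2 output.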
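  The 28*28*2 figure can be checked with a small sketch. This is a naive, unoptimized convolution in plain Python, assuming a stride of 1 and no padding; the function names and the random pixel and weight values are hypothetical and only serve to verify the shapes.

```python
import random


def conv_output_size(input_size, filter_size, stride=1):
    """Number of filter positions along one axis when no padding is used."""
    return (input_size - filter_size) // stride + 1


def convolve(image, filt, stride=1):
    """Slide one filter over a height*width*depth image; return a 2-D activation map."""
    f = len(filt)
    depth = len(image[0][0])
    out_h = conv_output_size(len(image), f, stride)
    out_w = conv_output_size(len(image[0]), f, stride)
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # Elementwise multiply the filter with the patch it covers, then sum.
            for di in range(f):
                for dj in range(f):
                    for c in range(depth):
                        total += image[i * stride + di][j * stride + dj][c] * filt[di][dj][c]
            row.append(total)
        out.append(row)
    return out


random.seed(0)
# Hypothetical 32*32*3 input and two 5*5*3 filters filled with random numbers.
image = [[[random.randint(0, 255) for _ in range(3)] for _ in range(32)] for _ in range(32)]
filters = [
    [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(5)] for _ in range(5)]
    for _ in range(2)
]

activation_maps = [convolve(image, f) for f in filters]
print(len(activation_maps[0]), len(activation_maps[0][0]), len(activation_maps))  # 28 28 2
```

  Each filter produces one 28*28 activation map, and stacking the maps from both filters gives the 28*28*2 output.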

First Layer - High Level Perspective

  Each of these filters can be thought of as a feature identifier (straight edges, colors, curves, etc.).

  E.g. a curve detector

  The filter has a pixel structure with higher numerical values along the area that forms the shape of the curve.

  

  Take this image as an example.

  

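  (In the first case, the region under the filter contains a curve similar to the one the filter encodes, so multiplying and summing gives a large value: a high degree of matching. In the second case, the region does not contain such a curve, so the value is small: a low degree of matching.)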
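  The matching idea can be sketched numerically. The 4*4 filter and the two image patches below are invented values, not the ones from the original article; the point is only that the multiply-and-sum is large when the patch has the same shape as the filter.

```python
def response(patch, filt):
    """Elementwise multiply a patch with the filter and sum the products."""
    return sum(p * w for prow, frow in zip(patch, filt) for p, w in zip(prow, frow))


# Hypothetical curve detector: high weights along a curved stroke.
curve_filter = [
    [0, 0, 30, 0],
    [0, 30, 0, 0],
    [30, 0, 0, 0],
    [30, 0, 0, 0],
]

# Patch that contains a stroke of the same shape.
matching_patch = [
    [0, 0, 50, 0],
    [0, 50, 0, 0],
    [50, 0, 0, 0],
    [50, 0, 0, 0],
]

# Patch whose bright pixels lie elsewhere.
other_patch = [
    [0, 0, 0, 0],
    [0, 0, 0, 50],
    [0, 0, 0, 50],
    [0, 50, 50, 50],
]

print(response(matching_patch, curve_filter))  # 6000
print(response(other_patch, curve_filter))     # 0
```

  A large response means the filter has "found" its feature at that location; a response near zero means it has not.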

Going Deeper Through the Network

  A classic CNN architecture would look like this:

  Input -> Conv -> ReLU -> Conv -> ReLU -> Pool -> ReLU -> Conv -> ReLU -> Pool -> Fully Connected Layer

  (ReLU: activation function, Pool: pooling layer)

 

  Other layers are interspersed between these conv layers. They provide nonlinearities (ReLU) and manage the dimensions of the activation maps (Pool, which downsamples them), which helps to improve the robustness of the network and control overfitting.
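  A minimal sketch of the two interspersed layer types, applied to a made-up 4*4 activation map. Max pooling with a 2*2 window and a stride of 2 is an assumption here; the notes only say "Pool".

```python
def relu(activation_map):
    """Replace every negative activation with 0."""
    return [[max(0, v) for v in row] for row in activation_map]


def max_pool(activation_map, size=2, stride=2):
    """Keep the largest value in each size*size window."""
    out = []
    for i in range(0, len(activation_map) - size + 1, stride):
        row = []
        for j in range(0, len(activation_map[0]) - size + 1, stride):
            window = [
                activation_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(window))
        out.append(row)
    return out


# Hypothetical activation map produced by a conv layer.
activation_map = [
    [1, -2, 3, 0],
    [-1, 5, -3, 2],
    [0, -4, 6, -1],
    [2, 1, -2, -5],
]

rectified = relu(activation_map)
pooled = max_pool(rectified)
print(rectified[0])  # [1, 0, 3, 0]
print(pooled)        # [[5, 3], [2, 6]]
```

  ReLU leaves the map size unchanged, while this pooling step shrinks the 4*4 map to 2*2 and keeps only the strongest response in each window.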

  As you go through more and more conv layers, (i) you get activation maps that represent more and more complex features; (ii) the filters begin to have a larger and larger receptive field, meaning they respond to a larger area of the original input.
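  The growth of the receptive field can be traced with a short calculation. The layer stack below (5*5 conv layers with stride 1 and a 2*2 pooling layer with stride 2) is an assumed example, not a configuration given in the article.

```python
def receptive_fields(layers):
    """layers: list of (kernel_size, stride) pairs, in network order.

    Returns the receptive field, in input pixels, of one unit after each layer.
    """
    size, jump = 1, 1  # jump = distance in input pixels between neighboring units
    sizes = []
    for kernel, stride in layers:
        size += (kernel - 1) * jump
        jump *= stride
        sizes.append(size)
    return sizes


# Hypothetical stack: Conv 5*5 -> Conv 5*5 -> Pool 2*2 (stride 2) -> Conv 5*5
layers = [(5, 1), (5, 1), (2, 2), (5, 1)]
print(receptive_fields(layers))  # [5, 9, 10, 18]
```

  A unit in the first conv layer sees a 5*5 area of the input, while a unit in the last conv layer of this stack sees an 18*18 area.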

Fully Connected Layer(FC)

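  The fully connected layer acts as the classifier of the entire network: it takes the high-level features produced by the preceding layers and outputs a score for each class. It can also be implemented as a convolution whose filter covers the whole input feature map.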

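  Because fully connected layers are parameter-heavy (the FC layers alone can account for roughly 80% of a network's parameters), some recent networks use global average pooling (GAP) instead, which usually gives good predictive performance.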
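  To illustrate why fully connected layers are parameter-heavy and what global average pooling (GAP) does, here is a rough sketch. The 7*7*512 feature volume and the 10 classes are assumed numbers, and placing a single small FC layer after GAP is one common arrangement, not something stated in the notes.

```python
def fc_params(inputs, outputs):
    """Weights plus biases of one fully connected layer."""
    return inputs * outputs + outputs


def global_average_pool(feature_maps):
    """feature_maps: list of 2-D maps, one per channel -> one average per channel."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


# Hypothetical final feature volume and class count.
height, width, channels, classes = 7, 7, 512, 10

flatten_then_fc = fc_params(height * width * channels, classes)
gap_then_fc = fc_params(channels, classes)
print(flatten_then_fc)  # 250890
print(gap_then_fc)      # 5130

# GAP itself has no parameters: it averages each channel's map to one number.
maps = [[[1, 3], [5, 7]], [[0, 0], [0, 4]]]
print(global_average_pool(maps))  # [4.0, 1.0]
```

  In this assumed setup, averaging each feature map before classification cuts the classifier's parameter count by a factor of about 49, the number of spatial positions per map.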

 

 

 

 

  
