How tf.nn.conv2d implements convolution

Reference: xf_mao  thunder faintly

Experimental environment: TensorFlow 1.6.0, Python 3.5
————————————————————————————————————————————————————————

Introduction:

Intuitively, convolution "condenses" the image: a small kernel slides over the input and summarizes each neighborhood into a single value. During this condensation, the thickness (the number of channels) can change.


In the picture above, the black slab is the input image, the orange slab is the image after convolutional "condensation", and the small green block is the convolution kernel, the "cloth" that wipes across the input. If the cloth moves one step at a time (stride 1) and the "SAME" mode is used, the spatial size of the condensed result does not change. Note that the thickness of the result after convolution is specified by the user. One way to understand this thickness: each output layer is analogous to a sine wave of a particular frequency and amplitude, and just as a combination of different sine waves can in theory fit any complex waveform, a stack of different convolution outputs can in theory fit arbitrarily complex feature-extraction schemes.
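To make the "condensation" concrete, here is a minimal NumPy sketch (not TensorFlow itself) of one wipe pattern: a 2x2 kernel with stride 2 condenses a 4x4 single-channel image into a 2x2 output.

```python
import numpy as np

# A 4x4 single-channel "image" and a 2x2 kernel (the "cloth").
image = np.arange(16, dtype=np.float32).reshape(4, 4)
kernel = np.ones((2, 2), dtype=np.float32)
stride = 2

out = np.empty((2, 2), dtype=np.float32)
for i in range(2):
    for j in range(2):
        # One wipe of the cloth: elementwise multiply the patch by the
        # kernel and sum, producing a single condensed value.
        patch = image[i * stride:i * stride + 2, j * stride:j * stride + 2]
        out[i, j] = np.sum(patch * kernel)

print(out.shape)  # → (2, 2): the 4x4 input has been condensed to 2x2
```

With stride 2 the cloth never overlaps its previous position, so the spatial size halves; with stride 1 and SAME padding it would stay 4x4, as the article notes.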

————————————————————————————————————————————————————

Function signature:

conv2d(input,
       filter,
       strides,
       padding,
       use_cudnn_on_gpu=True,
       data_format="NHWC",
       dilations=[1, 1, 1, 1],
       name=None)

Setting aside the name parameter, which simply names the operation, the method takes seven parameters:

  • input: 
    the input image to be convolved. It must be a 4-D Tensor with shape [batch, in_height, in_width, in_channels], i.e. [number of images in a training batch, image height, image width, number of image channels]. Its dtype must be float32 or float64.

  • filter: 
    the convolution kernel of the CNN. It must be a Tensor with shape [filter_height, filter_width, in_channels, out_channels], i.e. [kernel height, kernel width, number of input channels, number of kernels]. Its dtype must match that of input. One point deserves attention: its third dimension, in_channels, must equal the fourth dimension of input.
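To make the shape bookkeeping concrete, here is a small plain-Python sketch (the helper name is hypothetical, not a TensorFlow API) that checks the in_channels agreement between input and filter and reports the depth of the resulting feature map:

```python
def conv2d_channel_check(input_shape, filter_shape):
    """Check that filter's in_channels matches input's, return out_channels.

    input_shape:  [batch, in_height, in_width, in_channels]  (NHWC)
    filter_shape: [filter_height, filter_width, in_channels, out_channels]
    """
    if input_shape[3] != filter_shape[2]:
        raise ValueError(
            "filter in_channels (%d) must equal input in_channels (%d)"
            % (filter_shape[2], input_shape[3]))
    # The fourth filter dimension is the output feature map's channel count.
    return filter_shape[3]

# e.g. a batch of 32 RGB images convolved with 16 kernels of size 5x5:
print(conv2d_channel_check([32, 28, 28, 3], [5, 5, 3, 16]))  # → 16
```

This is the rule the bullet states: input's fourth dimension and filter's third dimension are the same number, while filter's fourth dimension chooses the output "thickness".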

  • strides: the stride of the sliding window in each dimension of the input during convolution; a 1-D vector of length 4. For the default NHWC format this is [1, stride_height, stride_width, 1]: the strides over the batch and channel dimensions must be 1.

  • padding: 
    a string, which can only be "SAME" or "VALID". This value determines the convolution method, i.e. how the kernel is handled at the image border. "VALID" only places the kernel where it fits entirely inside the input, so border pixels the kernel would overhang are simply dropped and the output shrinks. "SAME" pads the border so that, with stride 1, the output keeps the same spatial size as the input. Which of the two works better depends on whether your data carries important information at the edges.
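The output sizes of the two modes follow the formulas TensorFlow documents: SAME gives ceil(in / stride), VALID gives ceil((in - filter + 1) / stride). A quick sketch per spatial dimension:

```python
import math

def conv_output_size(in_size, filter_size, stride, padding):
    """Spatial output size of conv2d along one dimension (height or width)."""
    if padding == "SAME":
        # SAME pads the border so only the stride shrinks the output.
        return math.ceil(in_size / stride)
    elif padding == "VALID":
        # VALID drops positions where the kernel would overhang the edge.
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

print(conv_output_size(28, 5, 1, "SAME"))   # → 28: size preserved
print(conv_output_size(28, 5, 1, "VALID"))  # → 24: border rows/cols dropped
print(conv_output_size(28, 5, 2, "SAME"))   # → 14: stride halves the size
```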

  • use_cudnn_on_gpu: 
    a bool indicating whether to use cuDNN acceleration; defaults to True.

  • data_format: a string, which can only be "NHWC" or "NCHW"; defaults to "NHWC". Specifies the data layout of the input and output. With the default "NHWC", data is stored in the order [batch, height, width, channel]; with "NCHW", the order is [batch, channel, height, width]. (Translated from the function's docstring.)
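Converting between the two layouts is just a transpose of axes; a NumPy sketch:

```python
import numpy as np

nhwc = np.zeros((32, 28, 28, 3))         # [batch, height, width, channel]
# Move the channel axis from position 3 to position 1.
nchw = np.transpose(nhwc, (0, 3, 1, 2))  # [batch, channel, height, width]
print(nchw.shape)  # → (32, 3, 28, 28)
```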

  • dilations: an optional list of `ints`, defaulting to [1, 1, 1, 1]; a 1-D tensor of length 4 giving the dilation factor for each dimension of `input`. If set to k > 1, there will be k - 1 skipped cells between each filter element in that dimension. The dimension order is determined by the value of `data_format`, see above. The batch and depth dilations must be 1. (Translated from the function's docstring.)
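Dilation enlarges the kernel's receptive field without adding weights: with k - 1 skipped cells between adjacent filter elements, the effective span in one dimension is filter + (filter - 1) * (dilation - 1). A one-liner makes this checkable:

```python
def effective_filter_size(filter_size, dilation):
    """Span covered by a dilated kernel along one dimension:
    dilation - 1 cells are skipped between adjacent filter elements."""
    return filter_size + (filter_size - 1) * (dilation - 1)

print(effective_filter_size(3, 1))  # → 3: no dilation, ordinary kernel
print(effective_filter_size(3, 2))  # → 5: a 3-tap kernel now spans 5 cells
```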

The call returns a Tensor; this output is what we commonly call the feature map.
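To tie the seven parameters together, here is a minimal NumPy re-implementation of conv2d's VALID-mode arithmetic in NHWC layout (a sketch for understanding, not TensorFlow's actual kernel; the function name is my own):

```python
import numpy as np

def conv2d_valid(x, w, stride_h=1, stride_w=1):
    """Reference NHWC conv2d with VALID padding and no dilation.

    x: [batch, in_h, in_w, in_c]; w: [f_h, f_w, in_c, out_c]
    Returns the feature map [batch, out_h, out_w, out_c].
    """
    batch, in_h, in_w, in_c = x.shape
    f_h, f_w, w_in_c, out_c = w.shape
    assert in_c == w_in_c, "filter in_channels must match input in_channels"
    # VALID: only positions where the kernel fits entirely inside the input.
    out_h = (in_h - f_h) // stride_h + 1
    out_w = (in_w - f_w) // stride_w + 1
    out = np.zeros((batch, out_h, out_w, out_c), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i * stride_h:i * stride_h + f_h,
                         j * stride_w:j * stride_w + f_w, :]
            # Sum over kernel height, kernel width, and input channels
            # for every output channel at once.
            out[:, i, j, :] = np.tensordot(patch, w,
                                           axes=([1, 2, 3], [0, 1, 2]))
    return out

x = np.ones((1, 4, 4, 1), dtype=np.float32)   # one 4x4 single-channel image
w = np.ones((3, 3, 1, 2), dtype=np.float32)   # two 3x3 kernels
fmap = conv2d_valid(x, w)
print(fmap.shape)  # → (1, 2, 2, 2): each value sums a 3x3 window
```

Each of the two output channels here is one "layer of thickness" from the introduction: one feature map per kernel, stacked along the last axis.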
