03 Convolution Operations on Images

1. Mean filtering

# Convolution operation: tf.nn.conv2d()
# input: the input image, a 4-D tensor of shape (batch size, image height, image width, number of channels)
# filters: the convolution kernels, a 4-D tensor of shape (kernel height, kernel width, number of input channels, number of kernels)
# strides: the step size of the kernel along each dimension of the input, e.g. (1, 1, 1, 1)
# padding: zero padding, either 'VALID' or 'SAME'; 'VALID' means no padding, 'SAME' keeps the output the same spatial size as the input
# data_format: the input data layout, 'NHWC'
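
As a quick check of these shape conventions, here is a minimal sketch (assuming TensorFlow 2.x with eager execution; not part of the original code) that runs tf.nn.conv2d on random data and prints the resulting shapes:

import tensorflow as tf

# A batch of 1 random "image", 8x8 pixels, 1 channel: NHWC = (1, 8, 8, 1)
x = tf.random.normal([1, 8, 8, 1])
# One 3x3 kernel over 1 input channel: (kernel height, kernel width, input channels, number of kernels)
k = tf.random.normal([3, 3, 1, 1])

y_same = tf.nn.conv2d(input=x, filters=k, strides=[1, 1, 1, 1], padding='SAME')
y_valid = tf.nn.conv2d(input=x, filters=k, strides=[1, 1, 1, 1], padding='VALID')
print(y_same.shape)   # (1, 8, 8, 1)  -- 'SAME' keeps the spatial size
print(y_valid.shape)  # (1, 6, 6, 1)  -- 'VALID' shrinks it by kernel size - 1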

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf


# Load the moon-landing image as a 2-D grayscale array
moon = plt.imread('/newdisk/darren_pty/CNN/moonlanding.png')
print(moon.shape)  # (474, 630)

plt.figure(figsize=(10, 8))
plt.imshow(moon, cmap='gray')

plt.show()

plt.imshow is a function used to display images: it converts an array or matrix into an image and draws it on the current figure.

plt.show opens a window and renders the current figure; the image only actually appears on screen once this function is called.

Simply put, plt.imshow turns data into an image, and plt.show displays it.

plt.figure("Image") creates a new figure window.

Mean filtering:

np.array([[1/9, 1/9, 1/9], [1/9, 1/9, 1/9], [1/9, 1/9, 1/9]])  # a 3x3 matrix

Using such a matrix performs a smoothing operation: each pixel's value is replaced by the average of the pixel values around it, which reduces noise and fine detail in the image. This operation is called mean filtering.

np.array([[1/9, 1/9, 1/9], [1/9, 1/9, 1/9], [1/9, 1/9, 1/9]]).reshape(3, 3, 1, 1)

Reshape this matrix into a 4-dimensional array of shape (3, 3, 1, 1), matching the filter shape that tf.nn.conv2d expects.
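
To make "the average of the surrounding pixel values" concrete, here is a small sketch (with made-up data, not from the original post) that applies the same 3x3 mean kernel to a tiny 4x4 array and checks one output value by hand:

import numpy as np
import tensorflow as tf

# A tiny 4x4 "image" so the result can be verified by hand
img = np.arange(16, dtype=np.float32).reshape(4, 4)

mean_kernel = np.full((3, 3), 1/9, dtype=np.float32)      # the 3x3 mean filter
x = tf.constant(img.reshape(1, 4, 4, 1))                  # (batch, height, width, channels)
k = tf.constant(mean_kernel.reshape(3, 3, 1, 1))          # (kernel h, kernel w, in channels, out channels)

out = tf.nn.conv2d(input=x, filters=k, strides=[1, 1, 1, 1], padding='VALID')
print(out.numpy().reshape(2, 2))       # each value is the mean of a 3x3 neighbourhood
print(img[0:3, 0:3].mean())            # 5.0, equal to the top-left output value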

# Mean filtering
input_img = tf.constant(moon.reshape(1, 474, 630, 1), dtype=tf.float32)
filters = tf.constant(np.array([[1/9, 1/9, 1/9], [1/9, 1/9, 1/9], [1/9, 1/9, 1/9]]).reshape(3, 3, 1, 1), dtype=tf.float32)
strides = [1, 1, 1, 1]
conv2d = tf.nn.conv2d(input=input_img, filters=filters, strides=strides, padding='SAME')
plt.figure(figsize=(10, 8))

# Convert the 4-D output back to a 2-D image for display
plt.imshow(conv2d.numpy().reshape(474, 630), cmap='gray')

 

`tf.constant` is a function in TensorFlow that creates a constant tensor. In TensorFlow, a tensor is a multi-dimensional array that can represent scalars, vectors, matrices, and higher-dimensional data.

The basic syntax of `tf.constant` is as follows:

tf.constant(value, dtype=None, shape=None, name='Const')

Parameter description:

- `value`: the value of the constant tensor to be created. It can be a Python scalar, a list, a NumPy array, or another TensorFlow tensor.
- `dtype`: optional; the data type of the constant, e.g. `tf.float32` for 32-bit floating point. If not specified, it is inferred from `value`.
- `shape`: optional; the shape of the constant tensor. If not specified, it is determined automatically from the shape of `value`.
- `name`: optional; a name for the constant tensor.
- In TensorFlow 1.x there was also a `verify_shape` argument (default False) that checked whether `value` matched the specified `shape`; it was removed in TensorFlow 2.x.

Here are some examples:

import tensorflow as tf

# Create a scalar constant
scalar_constant = tf.constant(5)

# Create a constant tensor with shape (2, 3)
matrix_constant = tf.constant([[1, 2, 3], [4, 5, 6]])

# Create a constant tensor with shape (3, 2) and data type float32
float_matrix_constant = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dtype=tf.float32)

`tf.constant` is used to create immutable tensors, i.e. their values cannot be changed after creation. If you need a mutable tensor, use `tf.Variable` instead.
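
For contrast, a minimal sketch of `tf.Variable` (not part of the original post), whose value can be updated in place after creation:

import tensorflow as tf

c = tf.constant([1.0, 2.0, 3.0])   # immutable: has no assign method
v = tf.Variable([1.0, 2.0, 3.0])   # mutable: supports in-place updates

v.assign([4.0, 5.0, 6.0])          # replace the whole value
v.assign_add([1.0, 1.0, 1.0])      # element-wise increment
print(v.numpy())                   # [5. 6. 7.]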

2. Gaussian filter

The convolution kernel of Gaussian filter has the following characteristics:

1. **The center point has the highest weight**: The center of the Gaussian filter kernel has the highest weight, and the weights decrease towards the edges. This follows the Gaussian distribution, which peaks at the center and falls off with distance from it.

2. **Symmetry**: The Gaussian filter kernel is usually symmetrical, that is, with the center point as the axis of symmetry, the weights on the left and right or up and down are equal. This ensures that the smoothing operation is uniform and does not introduce shifting or stretching of the image.

3. **The sum of weights is 1**: The weights of a Gaussian filter kernel sum to 1. This ensures that the overall brightness of the image does not change during filtering, since each output pixel is a weighted average of its neighbourhood.

4. **Standard deviation controls the degree of smoothness**: The smoothness of the Gaussian filter kernel is controlled by the standard deviation (σ, sigma) parameter. A smaller standard deviation produces less smoothing, while a larger standard deviation produces greater smoothing. The larger the standard deviation, the wider the distribution of weights, resulting in a greater degree of smoothness.

5. **Kernel size**: The size of the Gaussian filter kernel is usually an odd number, such as 3x3, 5x5, etc. The size of the kernel determines the degree of smoothing, with larger kernels producing stronger smoothing effects.

The shape and weight distribution of the Gaussian filter kernel enable it to effectively remove high-frequency noise in the image, smooth the image, and maintain the overall structure of the image. This makes it one of the commonly used filtering methods in image processing and computer vision, especially in pre-processing steps to reduce noise to improve the performance of subsequent processing steps.
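
The properties above can be verified by building a kernel directly from the 2-D Gaussian formula. The sketch below is only an illustration (the `gaussian_kernel` helper is not from the original post): it constructs a 3x3 kernel for a given σ and normalizes it so the weights sum to 1.

import numpy as np

def gaussian_kernel(size=3, sigma=1.0):
    # Build a normalized 2-D Gaussian kernel of odd size `size`
    ax = np.arange(size) - size // 2                 # e.g. [-1, 0, 1] for size 3
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()                     # normalize so the weights sum to 1

k = gaussian_kernel(3, sigma=1.0)
print(k.round(3))   # largest weight in the centre, symmetric about it
print(k.sum())      # 1.0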

# Gaussian filtering
input_img = tf.constant(moon.reshape(1, 474, 630, 1), dtype=tf.float32)
# A Gaussian-like kernel: the centre weight is largest and the weights fall off towards the edges.
# Note that these weights sum to 15/9 rather than 1, so the output is slightly brightened;
# a normalized 3x3 Gaussian approximation would be [[1, 2, 1], [2, 4, 2], [1, 2, 1]] / 16.
filters = tf.constant(np.array([[1/9, 2/9, 1/9], [2/9, 3/9, 2/9], [1/9, 2/9, 1/9]]).reshape(3, 3, 1, 1), dtype=tf.float32)
strides = [1, 1, 1, 1]
conv2d = tf.nn.conv2d(input=input_img, filters=filters, strides=strides, padding='SAME')
plt.figure(figsize=(10, 8))
plt.imshow(conv2d.numpy().reshape(474, 630), cmap='gray')

3. Edge detection

np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]]) is the discrete Laplacian kernel, commonly used for image edge detection.

This specific convolution kernel can be used to detect edge features in an image. It works like this:

  • The center weight (-4) is strongly negative, making the output very sensitive to differences between a pixel and its neighbours.
  • The four neighbours above, below, left, and right have weight 1, so they contribute positively.
  • The four diagonal neighbours have weight 0 and do not contribute.
  • Sliding this kernel over the image and performing the convolution highlights edge features, because edges are places where pixel values change sharply; in flat regions the positive and negative contributions cancel and the response is close to zero. Subtracting this response from the original image yields the sharpening kernel used in section 4.
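
A small numeric sketch (made-up data) of why this kernel responds at edges: on a tiny image containing a vertical step, the Laplacian output is zero in the flat regions and nonzero only next to the step.

import numpy as np
import tensorflow as tf

# 5x5 image: left three columns are 0, right two columns are 1 (a vertical step edge)
img = np.zeros((5, 5), dtype=np.float32)
img[:, 3:] = 1.0

laplacian = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float32)

x = tf.constant(img.reshape(1, 5, 5, 1))
k = tf.constant(laplacian.reshape(3, 3, 1, 1))
out = tf.nn.conv2d(input=x, filters=k, strides=[1, 1, 1, 1], padding='VALID')
print(out.numpy().reshape(3, 3))
# Every row is [0, 1, -1]: flat regions give 0, the columns on either side of the step give +1 / -1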
cat = plt.imread('cat.jpg')
plt.figure(figsize=(10, 8))
plt.imshow(cat)

# Convert the cat image to grayscale by averaging the three color channels
cat = cat.mean(axis=2)
plt.figure(figsize=(10, 8))

# Without cmap='gray', matplotlib applies its default colormap and the image appears colorized
plt.imshow(cat, cmap='gray')

`cat.mean(axis=2)` is a NumPy array operation that computes the mean along a specified axis. Suppose `cat` is a NumPy array with shape `(height, width, channels)`, where:

- `height` is the height of the image (number of vertical pixels).
- `width` is the width of the image (number of horizontal pixels).
- `channels` is the number of channels, usually 3 (the red, green, and blue channels).

`axis=2` means the mean is taken along the third dimension, i.e. the channel dimension. In this context, `cat.mean(axis=2)` returns a new NumPy array of shape `(height, width)`, where each element is the mean of the channel values of the pixel at the corresponding position.

This operation is commonly used to convert a color image to a grayscale image: the average of each pixel's color channels becomes that pixel's gray value.
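
A tiny sketch (made-up pixel values) of what `mean(axis=2)` does to individual pixels. Note that a common alternative to the plain mean is a luminance-weighted average (roughly 0.299 R + 0.587 G + 0.114 B), which matches perceived brightness more closely:

import numpy as np

# A 1x2 "image" containing two RGB pixels
rgb = np.array([[[255, 0, 0],        # pure red
                 [10, 200, 30]]], dtype=np.float64)
print(rgb.shape)                     # (1, 2, 3)

gray = rgb.mean(axis=2)              # average the 3 channel values of each pixel
print(gray)                          # [[85. 80.]]
print(gray.shape)                    # (1, 2)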


# Edge detection
input_img = tf.constant(cat.reshape(1, 456, 730, 1), dtype=tf.float32)
filters = tf.constant(np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]]).reshape(3, 3, 1, 1), dtype=tf.float32)
strides = [1, 1, 1, 1]
conv2d = tf.nn.conv2d(input=input_img, filters=filters, strides=strides, padding='SAME')
plt.figure(figsize=(10, 8))
plt.imshow(conv2d.numpy().reshape(456, 730), cmap='gray')
plt.show()

In neural networks, convolution kernel values are usually not designed by hand like this; they are learned through backpropagation.
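
As a sketch of what that means in practice (assuming TensorFlow 2.x with Keras), a `tf.keras.layers.Conv2D` layer owns a 3x3 kernel that starts from a random initialization and is adjusted by backpropagation during training rather than being hand-crafted:

import tensorflow as tf

# One trainable 3x3 kernel over a single-channel input
layer = tf.keras.layers.Conv2D(filters=1, kernel_size=3, padding='same')

x = tf.random.normal([1, 32, 32, 1])
y = layer(x)                          # the first call builds the layer's weights
kernel, bias = layer.weights
print(kernel.shape, y.shape)          # (3, 3, 1, 1) (1, 32, 32, 1)
# kernel is a trainable variable: an optimizer would update it via backpropagation during fit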

4. Sharpening

# Sharpening
input_img = tf.constant(cat.reshape(1, 456, 730, 1), dtype=tf.float32)
filters = tf.constant(np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]]).reshape(3, 3, 1, 1), dtype=tf.float32)
strides = [1, 1, 1, 1]
conv2d = tf.nn.conv2d(input=input_img, filters=filters, strides=strides, padding='SAME')
plt.figure(figsize=(10, 8))
plt.imshow(conv2d.numpy().reshape(456, 730), cmap='gray')
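
The sharpening kernel is simply the identity kernel minus the Laplacian kernel from the previous section, so the sharpened image is the original image minus its Laplacian response. A quick NumPy check:

import numpy as np

identity = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]])
laplacian = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]])

print(identity - laplacian)
# [[ 0 -1  0]
#  [-1  5 -1]
#  [ 0 -1  0]]   -- exactly the sharpening kernel used above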

5. Convolution of color pictures

A color image has three channels; here each channel is treated as a separate single-channel image.

euro.reshape(1, 582, 1024, 3).transpose([3, 1, 2, 0])  # rearrange the dimensions without changing the pixel values
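
A shape-only sketch (using a zero-filled stand-in array) of what this reshape and transpose do: the three color channels move into the batch dimension, so tf.nn.conv2d sees three single-channel images.

import numpy as np

fake = np.zeros((582, 1024, 3))                 # stand-in for the color image
batched = fake.reshape(1, 582, 1024, 3)         # (batch, height, width, channels)
per_channel = batched.transpose([3, 1, 2, 0])   # channels become the batch dimension
print(per_channel.shape)                        # (3, 582, 1024, 1)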

euro = plt.imread('./欧式.jpg')
plt.figure(figsize=(10, 8))
plt.imshow(euro)
print(euro.shape)  # shape: (582, 1024, 3)

# Convolution on a color image.
# Treat each channel of the color image as a separate single-channel image.
input_img = tf.constant(euro.reshape(1, 582, 1024, 3).transpose([3, 1, 2, 0]), dtype=tf.float32)
filters = tf.constant(np.array([[1/9, 1/9, 1/9], [1/9, 1/9, 1/9], [1/9, 1/9, 1/9]]).reshape(3, 3, 1, 1), dtype=tf.float32)
strides = [1, 1, 1, 1]
conv2d = tf.nn.conv2d(input=input_img, filters=filters, strides=strides, padding='SAME')
plt.figure(figsize=(10, 8))
# Move the channels back to the last axis and scale to [0, 1] for display
plt.imshow(conv2d.numpy().reshape(3, 582, 1024).transpose([1, 2, 0]) / 255.0)
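
An alternative sketch that avoids the reshape/transpose round trip is `tf.nn.depthwise_conv2d`, which applies one kernel per input channel; this reuses the `euro` array and the imports from above and blurs all three channels in a single call:

# Depthwise filter shape: (kernel height, kernel width, input channels, channel multiplier)
mean_k = tf.constant(np.full((3, 3, 3, 1), 1/9), dtype=tf.float32)

x = tf.constant(euro.reshape(1, 582, 1024, 3), dtype=tf.float32)
blurred = tf.nn.depthwise_conv2d(x, mean_k, strides=[1, 1, 1, 1], padding='SAME')

plt.figure(figsize=(10, 8))
plt.imshow(blurred.numpy().reshape(582, 1024, 3) / 255.0)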
