TensorFlow Introduction and Practice Study Notes (13): FCN Image Semantic Segmentation

Table of Contents

1 Image semantic segmentation

1.1 Application scenarios:

1.1.1 Semantic segmentation of street view

1.2 The essence of image semantic segmentation

1.3 Network structure

1.4 Two implementation methods

1.4.1 Using upsampling

Features:

1.4.2 Input and Output

1.4.3 Full convolution

1.5 Upsampling

1.5.1 Reverse operation

1.5.2 Interpolation

1.5.3 Unpooling

1.5.4 Deconvolution (transposed convolution)

2 Skip connection structure of the image semantic segmentation network FCN

2.1 FCN results

2.2 Disadvantages of FCN

3 Image semantic segmentation network FCN code implementation: pre-trained network

3.1 Pre-trained network

3.2 Create a sub-model

3.3 Get the intermediate layers

3.3.1 Upsampling

3.4 Model prediction

3.5 Prediction results

3.6 Plotting the loss

3.7 Images during training


1 Image semantic segmentation

A typical semantic segmentation example: the goal is to predict the class label of each pixel in the image.

Image semantic segmentation is an important part of image processing and image understanding in computer vision.

Semantic segmentation assigns a category to each pixel in the image; it does not distinguish individual object instances (entities).

1.1 Application scenarios:

1. Autonomous driving

2. Medical image diagnosis

3. Determining drone (UAV) landing points

1.1.1 Semantic segmentation of street view

1.2 The essence of image semantic segmentation

The goal of semantic segmentation:

Generally, an RGB image (height × width × 3) or a grayscale image (height × width × 1) is taken as input, and the output is a segmentation map in which each pixel carries its category label (height × width × 1).

For a concrete picture we might have, say, five categories to segment.

At present, most of the more successful algorithms in image segmentation come from the same pioneer: the Fully Convolutional Network (FCN) proposed by Long et al.
FCN converts a classification network into a structure suited to segmentation tasks and demonstrates that end-to-end training is feasible for segmentation. FCN has become the cornerstone of deep-learning approaches to segmentation.

1.3 Network structure

Although a classification network (say, a cats-vs-dogs classifier) can on the surface accept images of any size as input, the fully connected layers at the end of the structure discard the spatial information of the input. These networks therefore cannot be used directly for dense estimation problems such as segmentation.

The fully connected layers must be replaced so that the network structure can adapt to pixel-level dense estimation tasks.

1.4 Two implementation methods

  • Use a pre-trained network and write the convolutional layers yourself
  • U-Net

U-Net is a model born in 2015; it is almost the most widely used model in segmentation projects today.
U-Net can learn from few training images: trained on a biomedical dataset of fewer than 40 images, it still reaches an IOU of 92%.
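As a rough illustration of the U-Net idea, here is a minimal two-level sketch (the published network is deeper and uses different filter counts; all shapes and names here are illustrative):

import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    # Two 3x3 convolutions: the basic U-Net building block
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return x

def build_unet(input_shape=(128, 128, 3), num_classes=2):
    inputs = tf.keras.Input(shape=input_shape)
    # Encoder: conv blocks followed by max pooling
    c1 = conv_block(inputs, 32)
    p1 = layers.MaxPooling2D()(c1)
    c2 = conv_block(p1, 64)
    p2 = layers.MaxPooling2D()(c2)
    # Bottleneck
    b = conv_block(p2, 128)
    # Decoder: transposed convolutions plus skip concatenations
    u2 = layers.Conv2DTranspose(64, 2, strides=2, padding='same')(b)
    c3 = conv_block(layers.concatenate([u2, c2]), 64)
    u1 = layers.Conv2DTranspose(32, 2, strides=2, padding='same')(c3)
    c4 = conv_block(layers.concatenate([u1, c1]), 32)
    # One output channel per class, softmax over classes at each pixel
    outputs = layers.Conv2D(num_classes, 1, activation='softmax')(c4)
    return tf.keras.Model(inputs, outputs)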
 

1.4.1 Using upsampling

Features:

  • Essentially: classification network + upsampling back to the original image size
  • During upsampling, information from earlier layers is combined in
  • It is a fully convolutional network, so it can accept input of any size

1.4.2 Input and Output

The input of the network can be a color image of any size; the output has the same height and width as the input, and the number of channels is n (number of target categories) + 1 (background).
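For example, the segmentation head of such a network can end in a 1×1 convolution with n + 1 channels; a minimal sketch (n here is a made-up number of categories):

import tensorflow as tf

n = 3  # hypothetical number of target categories
# One channel per category plus one for background, softmax per pixel
head = tf.keras.layers.Conv2D(n + 1, kernel_size=1, activation='softmax')
features = tf.random.normal([1, 224, 224, 64])  # feature map at input resolution
print(head(features).shape)  # (1, 224, 224, 4)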

1.4.3 Full convolution

In the convolutional part of the CNN, the fully connected layers are replaced with convolutions so that the input image can be of any size above a certain minimum.
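To see why this works, note that a fully connected layer over a 7×7×512 feature map is equivalent to a 7×7 convolution with the same number of outputs; unlike the dense layer, the convolution still accepts larger inputs. A small sketch (sizes are illustrative):

import tensorflow as tf

# Dense(4096) applied to a flattened 7x7x512 map == Conv2D(4096, 7) applied to it
fc_as_conv = tf.keras.layers.Conv2D(4096, kernel_size=7)
small = tf.random.normal([1, 7, 7, 512])
large = tf.random.normal([1, 14, 14, 512])
print(fc_as_conv(small).shape)  # (1, 1, 1, 4096)
print(fc_as_conv(large).shape)  # (1, 8, 8, 4096) - a larger input still works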

1.5 Upsampling

Since during convolution and pooling the feature map becomes very small (for example, its height and width shrink to 1/32 of the original image), we need to upsample in order to obtain a dense pixel-level prediction at the original image size.

1.5.1 Reverse operation

Compared with downsampling, three approaches come to mind, corresponding to reversing max pooling, average pooling, and the convolution operation.

This gives us our upsampling options:

  • 1. Interpolation
  • 2. Unpooling
  • 3. Deconvolution (transposed convolution)

1.5.2 Interpolation

Linear interpolation inserts a new value between two neighbors as their weighted average; for the midpoint this is simply (a + b) / 2. Bilinear interpolation applies the same idea along both spatial axes.
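In TensorFlow this can be done, for instance, with UpSampling2D in bilinear mode (a minimal sketch):

import tensorflow as tf

x = tf.constant([[1., 3.],
                 [5., 7.]])[tf.newaxis, :, :, tf.newaxis]  # shape (1, 2, 2, 1)
# Bilinear interpolation: new pixels are weighted averages of their neighbors
up = tf.keras.layers.UpSampling2D(size=2, interpolation='bilinear')(x)
print(up.shape)  # (1, 4, 4, 1)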

1.5.3 Unpooling

Unpooling reverses a pooling operation. Max unpooling places each pooled value back at the position where the maximum was originally taken (the positions are recorded during pooling) and fills the other positions with zeros; average unpooling spreads each value evenly over its pooling window.
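A possible sketch of max unpooling in TensorFlow, using the indices returned by tf.nn.max_pool_with_argmax (the helper max_unpool2d is our own illustration, not a library function):

import tensorflow as tf

def max_unpool2d(pooled, argmax, output_shape):
    # Scatter each pooled value back to the position recorded in argmax;
    # every other position stays zero - the "reverse" of max pooling.
    flat_size = tf.reduce_prod(tf.cast(output_shape, tf.int64))
    flat = tf.scatter_nd(tf.reshape(argmax, [-1, 1]),
                         tf.reshape(pooled, [-1]),
                         tf.expand_dims(flat_size, 0))
    return tf.reshape(flat, output_shape)

x = tf.random.normal([1, 4, 4, 1])
pooled, argmax = tf.nn.max_pool_with_argmax(
    x, ksize=2, strides=2, padding='SAME', include_batch_in_index=True)
up = max_unpool2d(pooled, argmax, tf.shape(x))
print(up.shape)  # (1, 4, 4, 1)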


1.5.4 Deconvolution (transposed convolution)

Here we use deconvolution, which is very, very important.

Keras has a built-in layer for this; by default the stride is 1 and nothing is enlarged, while strides=2 doubles the spatial size.
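A minimal sketch of that built-in layer, tf.keras.layers.Conv2DTranspose, doubling a feature map with strides=2:

import tensorflow as tf

# strides=2 doubles the spatial size of the feature map
deconv = tf.keras.layers.Conv2DTranspose(filters=64, kernel_size=3,
                                         strides=2, padding='same')
x = tf.random.normal([1, 7, 7, 256])
print(deconv(x).shape)  # (1, 14, 14, 64)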

Autoencoder-like structure

If we use an autoencoder-like structure, directly upsampling the last-layer feature map to the original image size, we lose a lot of detail.

2 Skip connection structure of the image semantic segmentation network FCN

2.1 FCN results

The prediction of the deepest layer (stride 32, FCN-32s) is upsampled 2× and fused (added) with the prediction from the pool4 layer (stride 16); this part of the network is called FCN-16s. That prediction is in turn upsampled 2× and fused with the prediction from the pool3 layer; this part of the network is called FCN-8s.

The more shallow-layer predictions are fused in before the final upsampling, the finer the resulting segmentation: FCN-8s recovers noticeably more detail than FCN-32s.

The skip structure fuses the prediction of the last layer (richer global information) with the predictions of shallower layers (more local detail),
so that local predictions are made while keeping the global picture in view.
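A sketch of this fusion logic in Keras, with hypothetical feature-map shapes standing in for a real backbone (the original FCN also crops feature maps before adding; that detail is omitted here):

import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical backbone feature maps at strides 8, 16 and 32 (input 224x224)
pool3 = tf.keras.Input(shape=(28, 28, 256))
pool4 = tf.keras.Input(shape=(14, 14, 512))
pool5 = tf.keras.Input(shape=(7, 7, 512))
num_classes = 21  # e.g. 20 categories + background

def score(x):
    # 1x1 convolution producing one score map per class
    return layers.Conv2D(num_classes, 1)(x)

def up2(x):
    # Learned 2x upsampling of the score maps
    return layers.Conv2DTranspose(num_classes, 4, strides=2, padding='same')(x)

fuse16 = layers.add([up2(score(pool5)), score(pool4)])  # FCN-16s fusion
fuse8 = layers.add([up2(fuse16), score(pool3)])         # FCN-8s fusion
# Final 8x upsampling back to the input resolution
out = layers.Conv2DTranspose(num_classes, 16, strides=8, padding='same',
                             activation='softmax')(fuse8)
model = tf.keras.Model([pool3, pool4, pool5], out)
model.summary()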

2.2 Disadvantages of FCN

  • The results are not fine-grained enough and are insensitive to details;

  • Relationships between pixels are not considered, so spatial consistency is lacking.

3 Image semantic segmentation network FCN code implementation: pre-trained network

3.1 Pre-trained network

Using VGG16, we will upsample the final output.
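Loading the convolutional base of VGG16 with Keras might look like this (a sketch; the input size and freezing policy are choices, not requirements):

import tensorflow as tf

# VGG16 without its fully connected head: only the convolutional base remains
base = tf.keras.applications.VGG16(include_top=False,
                                   weights='imagenet',
                                   input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained weights
base.summary()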

3.2 Create a sub-model

3.3 Get the intermediate layers
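One way to get intermediate layers is a sub-model that outputs several internal feature maps at once; a sketch using VGG16's pooling-layer names (which layers to tap is a design choice):

import tensorflow as tf

base = tf.keras.applications.VGG16(include_top=False,
                                   weights='imagenet',
                                   input_shape=(224, 224, 3))
# Pooling layers at strides 8, 16 and 32 - natural skip-connection points
layer_names = ['block3_pool', 'block4_pool', 'block5_pool']
outputs = [base.get_layer(name).output for name in layer_names]
# A sub-model mapping an image to the three intermediate feature maps
sub_model = tf.keras.Model(inputs=base.input, outputs=outputs)

for feat in sub_model(tf.random.normal([1, 224, 224, 3])):
    print(feat.shape)  # (1, 28, 28, 256), (1, 14, 14, 512), (1, 7, 7, 512)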

3.3.1 Upsampling

3.4 Model prediction

3.5 Prediction results

3.6 Plotting the loss
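The loss curve itself can be plotted from the History object that model.fit returns (a sketch; it assumes validation data was passed to fit):

import matplotlib.pyplot as plt

# history = model.fit(...)  # returned earlier during training
plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()

The following snippet then visualizes a batch of predictions against the ground-truth masks: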

import tensorflow as tf
import matplotlib.pyplot as plt

# model and test_dataset are defined earlier in the notes
num = 3
for image, mask in test_dataset.take(1):
    pred_mask = model.predict(image)
    # Take the predicted class of each pixel (argmax over the class channels)
    pred_mask = tf.argmax(pred_mask, axis=-1)
    # Add the channel dimension back, keeping all preceding dimensions
    pred_mask = pred_mask[..., tf.newaxis]
    plt.figure(figsize=(10, 10))
    for i in range(num):
        # Original image
        plt.subplot(num, 3, i * num + 1)
        plt.imshow(tf.keras.preprocessing.image.array_to_img(image[i]))
        # Ground-truth segmentation
        plt.subplot(num, 3, i * num + 2)
        plt.imshow(tf.keras.preprocessing.image.array_to_img(mask[i]))
        # Predicted segmentation
        plt.subplot(num, 3, i * num + 3)
        plt.imshow(tf.keras.preprocessing.image.array_to_img(pred_mask[i]))
    plt.show()

3.7 Images during training

Origin: blog.csdn.net/qq_37457202/article/details/108020509