Deep Learning Practice 23 (Advanced Edition) - Semantic Segmentation in Practice: Person Image Matting (Computer Vision)

Hello everyone, I am Weixue AI. Today I bring you Deep Learning Practice 23 (Advanced Edition) - Semantic Segmentation in Practice, where we use semantic segmentation to matte (cut out) the person in an image. Semantic segmentation is an important task in computer vision whose goal is to assign a semantic category label to every pixel in an image. Unlike object detection or image classification, semantic segmentation must not only identify which objects are present and where they are, but also classify every individual pixel.
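To make "a label for every pixel" concrete: a segmentation network outputs one score per class for each pixel, and the predicted label map is the per-pixel argmax over the class dimension. The scores below are made-up numbers for a 3-class, 2x3 image, used purely for illustration.

```python
import torch

# Made-up network output: one score per class for each pixel, shape (C, H, W) = (3, 2, 3)
logits = torch.tensor([
    [[2.0, 0.1, 0.1], [0.2, 0.3, 3.0]],  # class 0: background
    [[0.5, 1.5, 0.2], [2.5, 0.1, 0.4]],  # class 1
    [[0.1, 0.2, 2.2], [0.3, 1.9, 0.0]],  # class 2
])

# Pick the highest-scoring class at every pixel, giving a (H, W) label map
label_map = logits.argmax(dim=0)
print(label_map.tolist())  # [[0, 1, 2], [1, 2, 0]]
```

The same `argmax(dim=0)` step appears in the segmentation script later in this article, applied to the 21 class scores that DeepLabV3 produces.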

1. Application of Semantic Segmentation in Computer Vision

Semantic segmentation is used in many applications, such as road perception for autonomous vehicles, tumor segmentation in medical image analysis, and pedestrian tracking in video surveillance. It is typically performed with convolutional neural networks such as U-Net, FCN, and DeepLab. Once trained on pixel-annotated data, these models learn the semantic content of images and can produce efficient and accurate pixel-level segmentations.

2. Semantic Segmentation of People

To implement semantic segmentation, we use a pretrained neural network. I will use the DeepLabV3 model, which is available out of the box in PyTorch's torchvision library.

The original DeepLab method converts the fully connected layers of the classification network into convolutional layers built on dilated (atrous) convolutions, and uses dilated convolutions to avoid excessive downsampling, which eases the problem of upsampling coarse feature maps back to full resolution. Compared with ordinary convolution, dilated convolution enlarges the receptive field of the kernel without adding parameters, so more contextual information is retained. In addition, by replacing strides with dilation, the network's output resolution can be controlled.

In an ordinary convolution, each kernel only processes the pixels immediately around its center. With dilated convolution (also called atrous or "hole" convolution), the kernel taps are spaced apart, so the kernel "sees" more pixels, that is, it has a larger receptive field, and can better capture global information in the image. Because dilated convolution enlarges the receptive field without downsampling, it avoids discarding useful spatial detail while keeping the feature-map resolution.
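The larger receptive field can be measured directly. The sketch below builds a single 3x3 convolution with all-ones weights, backpropagates from the center output pixel, and reports which input pixels receive a non-zero gradient. The helper name `receptive_field` is hypothetical. The measured span matches the effective kernel size k + (k - 1)(d - 1) for kernel size k and dilation d.

```python
import torch
import torch.nn as nn


def receptive_field(dilation, kernel_size=3, size=9):
    """Return (number of input pixels seen, span in pixels) for the center output pixel."""
    # padding = dilation * (kernel_size - 1) // 2 keeps the spatial size unchanged
    conv = nn.Conv2d(1, 1, kernel_size, dilation=dilation,
                     padding=dilation * (kernel_size - 1) // 2, bias=False)
    nn.init.constant_(conv.weight, 1.0)
    x = torch.zeros(1, 1, size, size, requires_grad=True)
    center = size // 2
    conv(x)[0, 0, center, center].backward()
    # Non-zero gradients mark exactly the input pixels the center output depends on
    rows, cols = (x.grad[0, 0] != 0).nonzero(as_tuple=True)
    span = (rows.max() - rows.min() + 1).item()
    return len(rows), span


for d in (1, 2, 3):
    print(d, receptive_field(d))
# 1 (9, 3)
# 2 (9, 5)
# 3 (9, 7)
```

The kernel always touches 9 input pixels, so the parameter count does not change, but they are spread over a 3x3, 5x5 and 7x7 window respectively.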

Example of creating a dilated (atrous) convolution:

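import torch
import torch.nn as nn

# Define a dilated (atrous) convolution layer: 3x3 kernel, dilation rate 2
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, dilation=2)

# Define the input: a batch of 1 image with 3 channels and 32x32 pixels
x = torch.randn(1, 3, 32, 32)

# Apply the dilated convolution
output = conv(x)

# Inspect the output shape: torch.Size([1, 16, 28, 28]), because without padding
# the effective 5x5 kernel trims 2 pixels from each border
print(output.shape)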
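The 28x28 output of the example follows from the output-size formula given in the `torch.nn.Conv2d` documentation. The sketch below implements that formula in a hypothetical helper, `conv_output_size`, and checks it against PyTorch; setting `padding` equal to the dilation rate keeps a 3x3 dilated convolution size-preserving.

```python
import torch
import torch.nn as nn


def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    # floor((H_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


print(conv_output_size(32, 3, dilation=2))             # 28, no padding
print(conv_output_size(32, 3, dilation=2, padding=2))  # 32, padding = dilation
print(conv_output_size(32, 3, stride=2, padding=1))    # 16, a strided conv halves the size

x = torch.randn(1, 3, 32, 32)
conv = nn.Conv2d(3, 16, kernel_size=3, dilation=2, padding=2)
print(conv(x).shape)  # torch.Size([1, 16, 32, 32])
```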

3. DeepLabV3 model

DeepLabV3 uses the ASPP (Atrous Spatial Pyramid Pooling) module. ASPP captures information at multiple scales by running several dilated convolutions with different dilation (sampling) rates in parallel. This multi-scale design helps the model capture the outlines and context of objects of different sizes. The outputs of the parallel branches, together with an image-level feature obtained by global average pooling, are concatenated and fused by a 1x1 convolution, and the result is upsampled to obtain pixel-level segmentation results.
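A simplified ASPP head can be sketched in PyTorch as follows. It is an illustration under stated assumptions, not torchvision's implementation: the class name `SimpleASPP` is hypothetical, BatchNorm and dropout are omitted for brevity, and the default rates (6, 12, 18) are the values the DeepLabV3 paper uses at output stride 16.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleASPP(nn.Module):
    """Hypothetical, simplified ASPP head (no BatchNorm or dropout)."""

    def __init__(self, in_channels, out_channels=256, rates=(6, 12, 18)):
        super().__init__()
        # Branch 1: plain 1x1 convolution
        branches = [nn.Conv2d(in_channels, out_channels, kernel_size=1)]
        # Branches 2-4: 3x3 atrous convolutions; padding=rate keeps H and W unchanged
        for r in rates:
            branches.append(
                nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=r, dilation=r)
            )
        self.branches = nn.ModuleList(branches)
        # Image-level branch: global average pooling followed by a 1x1 convolution
        self.image_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, out_channels, kernel_size=1),
        )
        # Fuse the concatenated branches back to out_channels with a 1x1 convolution
        num_branches = len(rates) + 2
        self.project = nn.Conv2d(num_branches * out_channels, out_channels, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        outs = [F.relu(branch(x)) for branch in self.branches]
        pooled = F.relu(self.image_pool(x))
        # Upsample the 1x1 image-level feature back to the feature-map size
        outs.append(F.interpolate(pooled, size=(h, w), mode="bilinear", align_corners=False))
        return F.relu(self.project(torch.cat(outs, dim=1)))


aspp = SimpleASPP(in_channels=64, out_channels=32)
features = torch.randn(2, 64, 33, 33)  # stand-in for backbone features
out = aspp(features)
print(out.shape)  # torch.Size([2, 32, 33, 33])
```

In a full model, a final 1x1 convolution would map these fused features to one channel per class before the upsampling step.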

Training process for a DeepLabV3 model:
1. Data preparation: prepare an image dataset with pixel-level annotations, such as PASCAL VOC, Cityscapes, or COCO. Every pixel needs a label indicating which category it belongs to.
2. Data augmentation: augment the training data with rotation, scaling, flipping, etc. to increase its diversity and improve the model's ability to generalize. Geometric transforms must be applied identically to an image and its label mask so that the labels stay aligned.
3. Network construction: DeepLabV3 consists of a convolutional neural network for feature extraction (such as ResNet or Xception) and the ASPP (Atrous Spatial Pyramid Pooling) module. ASPP contains several dilated convolution layers with different rates to capture information at different scales; these parallel branches, together with a global average pooling branch, are concatenated and fused by a 1x1 convolution.
4. Loss function: cross-entropy loss is usually used to measure the difference between the model's predictions and the ground truth. It is computed at every pixel between the predicted class probability distribution and the true label, then averaged over pixels; its gradients drive the weight updates.
5. Optimization algorithm: choose an optimizer (such as SGD or Adam) to minimize the loss. For each batch of images, the model runs a forward pass, computes the loss, and then backpropagates to update the weights.
6. Model training: repeat the optimization steps until a stopping condition is met, such as a fixed number of epochs or a loss that no longer decreases.
7. Model evaluation and validation: evaluate the model on the validation and test sets (commonly with mean Intersection over Union, mIoU), and adjust hyperparameters, network structure, etc. as needed.
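The training steps above can be condensed into a toy loop. This is a minimal sketch under loud assumptions: the three-layer network is a hypothetical stand-in for DeepLabV3 (any model mapping (N, 3, H, W) images to (N, C, H, W) logits fits the same loop), the data is synthetic, and `random_hflip` and `mean_iou` are hypothetical helpers. It shows joint image/mask augmentation, per-pixel cross-entropy using the PASCAL VOC convention of marking unlabeled pixels as 255, optimizer steps, and a mean-IoU evaluation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

NUM_CLASSES = 2      # assumption: background vs. person
IGNORE_INDEX = 255   # PASCAL VOC convention for unlabeled (void) pixels


def random_hflip(image, mask, p=0.5):
    """Hypothetical helper: flip image (3, H, W) and mask (H, W) together."""
    if torch.rand(1).item() < p:
        image = torch.flip(image, dims=[-1])
        mask = torch.flip(mask, dims=[-1])
    return image, mask


def mean_iou(pred, target, num_classes, ignore_index=IGNORE_INDEX):
    """Hypothetical helper: IoU per class, averaged over classes that occur."""
    valid = target != ignore_index
    ious = []
    for c in range(num_classes):
        p = (pred == c) & valid
        t = (target == c) & valid
        union = (p | t).sum().item()
        if union > 0:
            ious.append((p & t).sum().item() / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical stand-in for DeepLabV3: (N, 3, H, W) -> (N, NUM_CLASSES, H, W) logits
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=2, dilation=2), nn.ReLU(),
    nn.Conv2d(16, NUM_CLASSES, 1),
)
criterion = nn.CrossEntropyLoss(ignore_index=IGNORE_INDEX)  # per-pixel cross-entropy
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# Synthetic data: images (N, 3, H, W) and integer label maps (N, H, W)
images = torch.randn(4, 3, 32, 32)
masks = (images[:, 0] > 0).long()  # toy labels that depend on each pixel's first channel
masks[:, 0, :] = IGNORE_INDEX      # pretend the top row is unlabeled

losses = []
model.train()
for step in range(30):
    pairs = [random_hflip(img, m) for img, m in zip(images, masks)]
    x = torch.stack([img for img, _ in pairs])
    y = torch.stack([m for _, m in pairs])
    loss = criterion(model(x), y)  # averaged over non-ignored pixels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

model.eval()
with torch.no_grad():
    pred = model(images).argmax(dim=1)  # (N, H, W) predicted class per pixel
miou = mean_iou(pred, masks, NUM_CLASSES)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, mIoU {miou:.3f}")
```

In a real setup the synthetic tensors would come from a dataset such as PASCAL VOC, and mIoU would be computed on a held-out validation set rather than on the training batch.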

4. Code implementation

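import torch
import torchvision
import numpy as np
from PIL import Image
from torchvision import transforms
from torchvision.models.segmentation import DeepLabV3_ResNet101_Weights

# Index of the "person" class in the PASCAL VOC label set used by the
# pretrained DeepLabV3 weights (0 = background, 15 = person)
PERSON_CLASS = 15


def segment_person(image_path, output_path):
    # Load the pretrained DeepLabV3 model; `weights=` replaces the deprecated
    # `pretrained=True` argument in torchvision >= 0.13
    model = torchvision.models.segmentation.deeplabv3_resnet101(
        weights=DeepLabV3_ResNet101_Weights.DEFAULT
    )
    model.eval()

    # Read the image, resize it, and apply ImageNet normalization
    input_image = Image.open(image_path).convert("RGB")
    preprocess = transforms.Compose([
        transforms.Resize((256, 256)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    input_tensor = preprocess(input_image)
    input_batch = input_tensor.unsqueeze(0)  # add a batch dimension: (1, 3, 256, 256)

    # Use the GPU if one is available, otherwise fall back to the CPU
    device = "cuda" if torch.cuda.is_available() else "cpu"
    input_batch = input_batch.to(device)
    model.to(device)

    with torch.no_grad():
        output = model(input_batch)["out"][0]  # (21, 256, 256) class scores
    output = torch.argmax(output, dim=0).byte().cpu().numpy()  # class index per pixel

    # Keep only the pixels classified as "person"
    output_person = (output == PERSON_CLASS)

    # Apply the mask: person pixels come from the image, all others are black
    mask = output_person.astype(np.uint8) * 255
    mask = Image.fromarray(mask)  # 8-bit grayscale ("L") mask
    masked_image = Image.composite(input_image.resize((256, 256)), Image.new("RGB", mask.size), mask)

    masked_image.save(output_path)


# Usage
input_image_path = "111.png"
output_image_path = "222.png"
segment_person(input_image_path, output_image_path)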
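The script above saves a 256x256 image with a black background. For a matting-style result at the original resolution, the mask can be scaled back up and used as an alpha channel. The sketch below is a hypothetical helper, `cutout_from_labels`, that works on any label map such as the `output` array computed inside `segment_person`; it assumes Pillow and NumPy and is demonstrated on a synthetic label map so it runs without the model.

```python
import numpy as np
from PIL import Image

PERSON_CLASS = 15  # PASCAL VOC "person" index, as used in the script above


def cutout_from_labels(image, label_map, class_index=PERSON_CLASS):
    """Hypothetical helper: turn a (possibly low-resolution) label map into an
    RGBA cutout at the image's original size, with a transparent background."""
    mask = Image.fromarray((label_map == class_index).astype(np.uint8) * 255)  # mode "L"
    # NEAREST keeps the mask strictly binary when scaling it to the original size
    mask = mask.resize(image.size, resample=Image.NEAREST)
    rgba = image.convert("RGBA")
    rgba.putalpha(mask)  # alpha 255 = person, 0 = transparent background
    return rgba


# Synthetic example: a 640x480 image and a 256x256 label map with a fake "person" box
image = Image.new("RGB", (640, 480), (200, 150, 100))
labels = np.zeros((256, 256), dtype=np.uint8)
labels[64:192, 96:160] = PERSON_CLASS
cutout = cutout_from_labels(image, labels)
print(cutout.size, cutout.mode)  # (640, 480) RGBA
```

Saving the result as PNG keeps the transparency. An alternative design is to upsample the model's logits to the original size with `torch.nn.functional.interpolate` before taking the argmax, which usually gives smoother edges than enlarging the final mask.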

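Result: running the script on the input image 111.png produces the output image 222.png (resized to 256x256), in which only the person is kept and the background is black.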

The girl in the picture was generated by AI. If you are interested in AI-generated images, see also:
Deep Learning Practice 9 - Text-to-Image Generation (text2img) on a Local Computer.

Past works:

Deep Learning Practical Projects

1. Deep Learning Practice 1 (Keras framework) - Enterprise Data Analysis and Prediction
2. Deep Learning Practice 2 (Keras framework) - Enterprise Credit Rating and Prediction
3. Deep Learning Practice 3 - News Text Classification with a Text Convolutional Neural Network (TextCNN)
4. Deep Learning Practice 4 - Convolutional Neural Network (DenseNet) Mathematical Graphics Recognition + Topic Pattern Recognition
5. Deep Learning Practice 5 - Chinese OCR Recognition with a Convolutional Neural Network (CNN)
6. Deep Learning Practice 6 - Air Quality and Weather Prediction with a Convolutional Neural Network (PyTorch) + Cluster Analysis
7. Deep Learning Practice 7 - Sentiment Analysis of E-commerce Product Reviews
8. Deep Learning Practice 8 - Turning Everyday Photos into Comic-Style Photos
9. Deep Learning Practice 9 - Text-to-Image Generation (text2img) on a Local Computer
10. Deep Learning Practice 10 - Mathematical Formula Recognition: Converting Images to LaTeX (img2Latex)
11. Deep Learning Practice 11 (Advanced Edition) - Fine-tuning BERT: A Text Classification Case
12. Deep Learning Practice 12 (Advanced Edition) - Correcting Text Distortion with Dewarp
13. Deep Learning Practice 13 (Advanced Edition) - Text Error Correction, Handy for Anyone Who Often Makes Typos
14. Deep Learning Practice 14 (Advanced Edition) - Handwritten Text OCR: Even Handwritten Notes Can Be Recognized
15. Deep Learning Practice 15 (Advanced Edition) - Machine Reading Comprehension: You Write the Questions
16. Deep Learning Practice 16 (Advanced Edition) - Recognizing Text in Virtual Screenshots, Including Paper Contracts and Forms
17. Deep Learning Practice 17 (Advanced Edition) - Building and Developing an Intelligent Editing Assistant Platform
18. Deep Learning Practice 18 (Advanced Edition) - An NLP System Combining 15 Tasks, Covering the Common NLP Tasks on the Market
19. Deep Learning Practice 19 (Advanced Edition) - Local Deployment and Testing of SpeakGPT: Implementing SpeakGPT on Your Own Platform Based on ChatGPT
20. Deep Learning Practice 20 (Advanced Edition) - Intelligent File Search System: Quickly Find Files by Searching Their Content for Keywords
21. Deep Learning Practice 21 (Advanced Edition) - AI Entity Encyclopedia Search: Look Up Any Noun
22. Deep Learning Practice 22 (Advanced Edition) - AI Comic Video Generation Model: Make Your Own Comic Video

...(more to come)


Origin blog.csdn.net/weixin_42878111/article/details/130015593