Playing around with Segment Anything

What can Segment Anything do for us?

Preface

Large models have been incredibly popular lately, from ChatGPT to Segment Anything, and as a newbie I find it all a bit overwhelming. I haven't really known what to work on recently, so I started wondering whether I could use these large models to do something fun.

That's when I remembered last year's Pure Land project. It was honestly a bit rough: I just extracted the edges and slapped some RGB colors on top, and called it done. That had been bugging me ever since, until I came across the chicken dance.
What does Segment Anything do? Isn't it just image segmentation? So could I segment out the dancer, swap in a different background, and make the video look however I want?

Content

The general idea: use Segment Anything to segment each frame, pull out the person's mask on its own, then take a whole new background and drop the extracted person into it. The whole process sounds quite simple, so let's look at how to do it in detail.

The first step is to set up the Segment Anything environment. We don't use the official online demo here, because we need to run it on a lot of frames, so we configure it ourselves. It's actually very simple: install the GPU version of PyTorch (if your graphics card is a bit weak, the CPU version also works), pull the project from GitHub, install the libraries it requires, and you're done. There are plenty of blog posts online about setting up Segment Anything, so I won't go into the details here.
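Just as a quick sanity check that the environment is working, here is a minimal sketch of loading the model with the official segment_anything package. The checkpoint file name below is the default ViT-H weights from the official repo; adjust the path to wherever you downloaded yours.

import torch
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# assumed setup: the default ViT-H checkpoint from the official repository
checkpoint = 'sam_vit_h_4b8939.pth'
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# build the model, move it to the GPU (or CPU) and wrap it in the automatic mask generator
sam = sam_model_registry['vit_h'](checkpoint=checkpoint)
sam.to(device)
mask_generator = SamAutomaticMaskGenerator(sam)
print('Segment Anything loaded on', device)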

Next we turn the video into individual frames, which we can do directly with OpenCV (the detailed code is in the next section). Then we segment each frame with Segment Anything and get a mask like this:
(image: the segmentation masks produced by Segment Anything)
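For reference, here is a rough sketch of that step: dumping the video into frames with OpenCV and then running the automatic mask generator on one of them. The video path and frame folder below are just placeholder names, and mask_generator is the one created in the sketch above.

import os

import cv2

# hypothetical paths, adjust them to your own project layout
video_path = 'dance.mp4'
frame_dir = '../video2img2'
os.makedirs(frame_dir, exist_ok=True)

# 1. dump the video into individual frames
cap = cv2.VideoCapture(video_path)
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imwrite('%s/%06d.jpg' % (frame_dir, idx), frame)
    idx += 1
cap.release()

# 2. run the automatic mask generator on one frame (SAM expects RGB, OpenCV loads BGR)
frame = cv2.cvtColor(cv2.imread('%s/%06d.jpg' % (frame_dir, 0)), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(frame)  # list of dicts with 'segmentation', 'area', 'bbox', ...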
We can see that the segmentation result is very nice, but there is one question: how do I get the person's mask on its own???

I searched around, and it seems someone has already built classification on top of Segment Anything, but that looked like a fair amount of work. In the spirit of never standing when you can sit and never sitting when you can lie down, I thought about it a bit and found a trick that really saves the trouble.

You know, we have YOLO, which is excellent at object detection.
(image: YOLO detection result)
Then we can simply take the largest mask inside the detection box. That gives us the person's mask.
(image: the extracted person mask)
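In case it helps, here is a rough sketch of that trick, assuming YOLOv5 is loaded through torch.hub and that frame and masks come from the sketch above. Taking the most confident person box and scoring masks by their area inside the box are my own simplifications, not necessarily exactly what the original code does.

import torch
import numpy as np

# detect people with YOLOv5 (COCO class 0 = person)
yolo = torch.hub.load('ultralytics/yolov5', 'yolov5s')
det = yolo(frame).xyxy[0].cpu().numpy()      # rows: x1, y1, x2, y2, conf, cls
persons = det[det[:, 5] == 0]
x1, y1, x2, y2 = persons[0][:4].astype(int)  # detections are sorted by confidence, take the top one

# among all SAM masks, keep the one covering the largest area inside the box
best, best_area = None, 0
for m in masks:
    seg = m['segmentation']                  # HxW boolean array
    area_in_box = seg[y1:y2, x1:x2].sum()
    if area_in_box > best_area:
        best, best_area = seg, area_in_box

person_mask = best.astype(np.uint8)            # a 0/1 mask (what goes into the mask2 folder)
person_img = frame * person_mask[:, :, None]   # the person on a black background (the mask folder)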
With the person extracted, we just need to find some background and composite the two together. Let's look at the concrete implementation.

Implementation

Since Segment Anything and YOLOv5 both come with ready-made code, I won't go over them here; only the compositing part is covered below.

Compositing code

# -*- coding: utf-8 -*-
# @Time : 2023/7/5 19:30
# @Author : xiaow
# @File : test.py
# @Software : PyCharm
import os

import cv2

import numpy as np


def mix():
    # directory containing the background frames (sorted so the frame order is stable)
    backs = sorted(os.listdir('../video2img2'))
    # directory containing the extracted person images (person on a black background)
    masks = sorted(os.listdir('mask'))
    # directory containing the 0/1 masks
    mask2s = sorted(os.listdir('mask2'))
    # frame rate of the output video
    fps = 15
    videopath = 'test10.avi'  # output video path and format
    size = (1000, 666)  # (width, height)
    out1 = cv2.VideoWriter(videopath, cv2.VideoWriter_fourcc(*'DIVX'), fps, size)

    for i in range(len(backs)):
        print(masks[i])

        back = cv2.imread('../video2img2/' + backs[i])
        # resize the background to the output size
        back = cv2.resize(back, size, interpolation=cv2.INTER_AREA)
        human = cv2.imread('mask/' + masks[i])
        mask2 = cv2.imread('mask2/' + mask2s[i], 0)

        # pad mask and human to the same size as the background  start
        h = back.shape[0]
        w = back.shape[1]
        h_diff = h - human.shape[0]
        w_diff = w - human.shape[1]
        human = np.pad(human, ((h_diff // 2, h_diff - h_diff // 2), (w_diff // 2, w_diff - w_diff // 2), (0, 0)))
        mask2 = np.pad(mask2, ((h_diff // 2, h_diff - h_diff // 2), (w_diff // 2, w_diff - w_diff // 2)))
        # pad mask and human to the same size as the background  end

        # broadcast the single-channel 0/1 mask over the colour channels:
        # keep the background where the mask is 0 and paste the person where it is 1
        mask2 = np.expand_dims(mask2, 2)
        out1.write((1 - mask2) * back + human)

    out1.release()


if __name__ == '__main__':
    mix()
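
One note on the last write: it assumes the images in the mask folder are the person already cut out onto a black background, so (1 - mask2) * back blanks out a person-shaped hole in the background and adding human fills that hole back in with the person's pixels. If your cutouts are stored differently, that blending line needs to change accordingly.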

Results

(GIF of the final composited video)
Part of the result is shown here as a GIF; you can watch the full video on Bilibili.

Segment Anything meets the chicken dance

That's it, slip away, slip away



Origin blog.csdn.net/qq_43627076/article/details/131563586