OpenCV - video fusion and human image detection


Foreword

[Knowledge Points]
Image processing techniques combined with pattern recognition methods.

[Material preparation]
Prepare video A and video B.
Video A is a recording of a scenic spot (such as Zhangjiajie, Wuyi Mountain, or one of the Five Great Mountains) downloaded from the Internet.
Video B is a video of a person walking in front of a white wall or other solid-color background.

[Requirements]
(i) Using thresholding combined with OpenCV's bitwise OR operation, composite the person from video B (after removing the background) into video A to obtain video C. (50%)
(ii) Design a feature such that the person can be detected in the image. (20%)
(iii) Use the feature computation from (ii) to find the position of the person in each frame of video C, and draw the approximate position with a rectangle. (30%)


1. Video preparation and function definition

import os
import cv2
import matplotlib.pyplot as plt

os.chdir('C:/Users/Bert/PycharmProjects/模式识别与计算机视觉/实验三/video/')

# Define video paths
A_video = "A_3.mp4"  # background video
B_video = "B_5.mp4"  # person video
result_video = "A3_mingle_B5.mp4"  # output video

# Define the codec and create VideoWriter object
cap_A = cv2.VideoCapture(A_video)  # open video A
cap_B = cv2.VideoCapture(B_video)  # open video B

fps_video_A = cap_A.get(cv2.CAP_PROP_FPS)  # frame rate of video A
fps_video_B = cap_B.get(cv2.CAP_PROP_FPS)  # frame rate of video B

fourcc = cv2.VideoWriter_fourcc(*"mp4v")  # codec for the output video

width_A = int(cap_A.get(cv2.CAP_PROP_FRAME_WIDTH))  # width of video A
width_B = int(cap_B.get(cv2.CAP_PROP_FRAME_WIDTH))  # width of video B

height_A = int(cap_A.get(cv2.CAP_PROP_FRAME_HEIGHT))  # height of video A
height_B = int(cap_B.get(cv2.CAP_PROP_FRAME_HEIGHT))  # height of video B

roi_width = int((width_A - width_B) / 2)  # x offset of the person video inside the background
roi_height = int((height_A - height_B) / 2)  # y offset of the person video inside the background

videoWriter = cv2.VideoWriter(result_video, fourcc, fps_video_A, (width_A, height_A))  # writer for the output video

print("Video A width: {}  height: {}  fps: {}".format(width_A, height_A, fps_video_A))
print("Video B width: {}  height: {}  fps: {}".format(width_B, height_B, fps_video_B))


## Remove the watermark from a frame of video B
def process_watermarkn(image):
    # NumPy slicing is [y1:y2, x1:x2]: the first range indexes rows (y), the second columns (x).
    # Cover the watermark region (rows 140-220, columns 0-255) with the pixel block 80 rows
    # above it; these coordinates are specific to the watermark position in this B video.
    image[140:220, 0:255] = image[140 - 80:220 - 80, 0:255]
    return image

## Fuse a person frame into a background frame
def video_mingle(frame_g, frame_m):
    ## 1. Extract the region of interest (ROI) from the background
    # Place the person at the centre of the background frame: take the ROI where the person will go
    rows, cols = frame_m.shape[:2]
    roi = frame_g[roi_height:rows + roi_height, roi_width:cols + roi_width]
    ## 2. Create the mask: a binary image used to occlude part of another image
    img2gray = cv2.cvtColor(frame_m, cv2.COLOR_BGR2GRAY)  # grayscale conversion (skippable if the person video is read as grayscale)
    # cv2.THRESH_BINARY: pixels above the threshold (190) become 255 (white), the rest become 0 (black).
    # The white background therefore stays white while the person becomes black.
    ret, mask = cv2.threshold(img2gray, 190, 255, cv2.THRESH_BINARY)  # threshold operation
    mask_inv = cv2.bitwise_not(mask)  # inverted mask: white becomes black, black becomes white
    ## 3. Blend the person into the ROI
    img1_bg = cv2.bitwise_and(roi, roi, mask=mask)  # keep the background everywhere except the person
    img2_fg = cv2.bitwise_and(frame_m, frame_m, mask=mask_inv)  # keep only the person from frame_m
    dst = cv2.add(img1_bg, img2_fg)  # combine the person with the ROI
    frame_g[roi_height:rows + roi_height, roi_width:cols + roi_width] = dst  # put the blended region back into the frame
    img_new = frame_g.copy()  # copy the processed frame
    return img2gray, mask, mask_inv, roi, img1_bg, img2_fg, dst, img_new


## Convert between cv2 (BGR) and matplotlib (RGB) colour ordering
def img_convert(cv2_img):
    # grayscale image: return unchanged
    if len(cv2_img.shape) == 2:
        return cv2_img
    # 3-channel BGR image
    elif len(cv2_img.shape) == 3 and cv2_img.shape[2] == 3:
        b, g, r = cv2.split(cv2_img)  # split the channels
        return cv2.merge((r, g, b))  # merge them in RGB order
    # 4-channel BGRA image
    elif len(cv2_img.shape) == 3 and cv2_img.shape[2] == 4:
        b, g, r, a = cv2.split(cv2_img)
        return cv2.merge((r, g, b, a))
    # unknown format
    else:
        return cv2_img
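
As a side note, the same conversion can be done in a single call with cv2.cvtColor; a minimal equivalent for the 3-channel case, where cv2_img stands for any BGR image:

# One-line BGR-to-RGB conversion (equivalent to the split/merge above for 3 channels)
rgb_img = cv2.cvtColor(cv2_img, cv2.COLOR_BGR2RGB)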

2. Video fusion and the frame transformation process

Extract the first frame to visualize the intermediate images produced during fusion, then fuse the two videos for as long as both cap_A.isOpened() and cap_B.isOpened() are True.

frame_id = 0
while cap_A.isOpened() and cap_B.isOpened():
    ret_A, frame_A = cap_A.read()  # background video
    ret_B, frame_B = cap_B.read()  # person video
    if ret_A and ret_B:
        frame_id += 1
        ## Visualize the intermediate frames of the fusion on the first frame
        if frame_id == 1:
            frame_B = process_watermarkn(frame_B)
            capture_new = video_mingle(frame_A, frame_B)
            titles = ['B', 'B_gray', 'B_mask', 'B_mask_inv', 'roi', 'img1_bg', 'img2_fg', 'dst']
            imgs = [frame_B, capture_new[0], capture_new[1], capture_new[2], capture_new[3], capture_new[4],
                    capture_new[5], capture_new[6]]
            for i in range(len(imgs)):
                plt.subplot(2, 4, i + 1), plt.imshow(img_convert(imgs[i]), 'gray')
                plt.title(titles[i])
                plt.xticks([]), plt.yticks([])
            plt.show()
            plt.close()
            continue
        ## Fuse each frame of video A with the corresponding frame of video B
        else:
            frame_B = process_watermarkn(frame_B)
            img_new_add = video_mingle(frame_A, frame_B)[-1]
            videoWriter.write(img_new_add)
    else:
        break

# Release everything if job is finished
cap_A.release()
cap_B.release()
videoWriter.release()
cv2.destroyAllWindows()

The result is as follows:

Original A.mp4:
[image: the original background video A]

Original B.mp4:
[image: the original person video B]

A_mingle_B.mp4:
[image: the fused video C]

Video frame change process:
[image: the eight intermediate images (B, B_gray, B_mask, B_mask_inv, roi, img1_bg, img2_fg, dst)]

Reference link:
OpenCV-Python official documentation: add a logo to an image by bitwise operations

3. Human Image Detection

Perform feature-based detection on the fused video obtained above so that the person can be detected in each image. Find the location of the person in every frame of the video and draw the approximate location with a rectangle.

import cv2

# Define video paths
org_video = "./video/A3_mingle_B5.mp4"
sub_video = "./video/A3_feature_extract_B5.mp4"

# Define the codec and create VideoWriter object
cap = cv2.VideoCapture(org_video)  # open the fused video
fps_video = cap.get(cv2.CAP_PROP_FPS)  # frame rate
fourcc = cv2.VideoWriter_fourcc(*"mp4v")  # codec for the output video
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))  # frame width
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))  # frame height
videoWriter = cv2.VideoWriter(sub_video, fourcc, fps_video, (width, height))  # writer for the output video
print(width, height)


def is_inside(o, i):
    '''
    Check whether rectangle o lies strictly inside rectangle i.

    args:
        o: rectangle o  (x, y, w, h)
        i: rectangle i  (x, y, w, h)
    '''
    ox, oy, ow, oh = o
    ix, iy, iw, ih = i
    return ox > ix and oy > iy and ox + ow < ix + iw and oy + oh < iy + ih
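
For example (with hypothetical coordinates), a box strictly contained in a larger one passes the test:

print(is_inside((10, 10, 20, 20), (0, 0, 100, 100)))  # True: o lies strictly inside i
print(is_inside((0, 0, 50, 50), (10, 10, 20, 20)))    # False: o extends outside i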


def draw_person(img, person):
    '''
    Draw a green rectangle around a detected person on img.

    args:
        img: the image to draw on
        person: the person's bounding box  (x, y, w, h)
    '''
    x, y, w, h = person
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)


def detect_test(img):
    hog = cv2.HOGDescriptor()
    detector = cv2.HOGDescriptor_getDefaultPeopleDetector()
    hog.setSVMDetector(detector)

    # Multi-scale detection: found is an array of rectangles, one per detected target
    found, w = hog.detectMultiScale(img)

    # Filter nested rectangles: if rectangle o lies inside rectangle i, drop o
    found_filtered = []
    for ri, r in enumerate(found):
        for qi, q in enumerate(found):
            if ri != qi and is_inside(r, q):
                break
        else:
            found_filtered.append(r)

    for person in found_filtered:
        draw_person(img, person)
    return img


while cap.isOpened():
    ret, frame = cap.read()
    if ret:
        # HOG is a common feature-extraction algorithm for object detection and pattern
        # matching: it computes histograms of oriented gradients over local pixel blocks
        # and feeds the resulting feature vector to an SVM.
        frame = detect_test(frame)
        videoWriter.write(frame)
    else:
        break

# Release everything if job is finished
cap.release()
videoWriter.release()
cv2.destroyAllWindows()

The results are as follows:
[image: detection result with the person outlined by a green rectangle]
Reference link: Detailed explanation of HOG features and pedestrian detection

4. Suggestions

1. The quality of the final video depends on video B being shot in front of a white wall or other solid-color background. It is best to choose a video with a clean white background in which the whole person (skin tone, clothes, etc.) is clearly distinguished from that background.

2. The result can also be improved by adjusting the threshold, which appears in this step:

ret, mask = cv2.threshold(img2gray, 215, 255, cv2.THRESH_BINARY)  # threshold operation

This line sets pixels of the grayscale image whose value is greater than 215 to 255 and the rest to 0. You can therefore change this value (190 in the code above) to improve the fused result.
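
To pick a good value quickly, one option is to preview the mask for several candidate thresholds on a single frame (a rough sketch; frame_B stands for any frame read from video B, and the candidate values are only examples):

# Preview binary masks for a few candidate thresholds on one sample frame
gray = cv2.cvtColor(frame_B, cv2.COLOR_BGR2GRAY)
for i, t in enumerate([170, 190, 215, 235]):
    ret, mask = cv2.threshold(gray, t, 255, cv2.THRESH_BINARY)
    plt.subplot(1, 4, i + 1), plt.imshow(mask, 'gray')
    plt.title('thresh={}'.format(t))
    plt.xticks([]), plt.yticks([])
plt.show()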

3. Pay attention to the video sizes: the background video must be at least as large as the person video in both width and height.
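
A simple assertion before the fusion loop catches this early (a small sketch using the dimensions read in step 1):

# The ROI arithmetic assumes the background frame fully contains the person frame
assert width_A >= width_B and height_A >= height_B, \
    "Video A must be at least as large as video B in both width and height"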

4. For human image detection, better results can be obtained by passing extra parameter values to hog.detectMultiScale(), at the cost of extra detection delay:

found, w = hog.detectMultiScale(img, winStride=(4, 4), padding=(8, 8), scale=1.25, useMeanshiftGrouping=False)

For the specific meaning of each parameter, search for the usage of hog.detectMultiScale().

Video file and code link:
https://pan.baidu.com/s/1DFDbAK9nnZ3oHn4mLaq7SA
Extraction code: dvei
