Convert video to slideshow images: A guide to video material conversion with OpenCV

Video has become one of the most important media for disseminating knowledge and information. Sometimes, however, we need to preserve video content in a static form, such as converting a video lecture into slides or images for easy sharing, archiving, or printing. Fortunately, OpenCV, a powerful computer vision library, provides techniques and tools that make it possible to convert video material into a set of slide images collected in a PDF.

In this article, we will explore how to use OpenCV's basic frame difference and statistical background subtraction models, such as KNN (K-Nearest Neighbors) and GMG (Godbehere-Matsukawa-Goldberg), to convert video material into corresponding slides. These models can extract the keyframes of a video and save them as images, so that each keyframe becomes one page of a PDF slideshow.


By reading this guide, you will learn about:

  • What background subtraction is and how it works
  • How to perform background subtraction with frame differencing in OpenCV
  • OpenCV's statistical background subtraction models: KNN, MOG v2, and GMG
  • How to combine these techniques to turn a video into a PDF slideshow

What is Background Subtraction

Background subtraction is a commonly used computer vision technique for extracting foreground objects and removing the background from videos. It determines foreground regions by building a model of the background, comparing video frames to this model, and detecting pixels that differ from the background. Background subtraction has a wide range of applications in fields such as motion detection, object tracking, and video analysis.


Process steps of background subtraction

  1. Build a background model : First, build a model that represents the static background of the video. It can be obtained from an initial set of frames or with an adaptive method.
  2. Frame difference calculation : Once the background model is established, compare each video frame against the model and compute the pixel-level difference. Common difference measures include the absolute difference, the squared difference, and thresholded differences.
  3. Foreground detection : Based on the difference image, detect the foreground. This usually means binarizing the difference image with an appropriate threshold to separate foreground from background.
  4. Foreground optimization : To refine the detection result, apply image processing techniques such as morphological operations (dilation, erosion) to fill holes or remove noise.
  5. Target extraction : On top of the foreground mask, use methods such as contour detection and connected-component analysis to extract the target objects.
  6. Background update : To adapt to dynamic changes in the scene, keep updating the background model, for example with a moving average or adaptive background modeling; a code sketch mapping these steps to OpenCV calls follows this list.
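
The following is a minimal sketch of these six steps (OpenCV 4.x API; "video.mp4" is a placeholder path, not from the original article): it maintains a running-average background model with cv2.accumulateWeighted, thresholds the per-frame difference, cleans the mask with a morphological opening, and extracts foreground contours.

import cv2

cap = cv2.VideoCapture("video.mp4")  # placeholder input path
ret, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Step 1: initialize the background model from the first frame
background = gray.astype("float")

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Step 2: absolute difference between the frame and the background model
    diff = cv2.absdiff(gray, cv2.convertScaleAbs(background))

    # Step 3: foreground detection by thresholding the difference image
    _, fg_mask = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)

    # Step 4: foreground optimization with a morphological opening
    fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, kernel)

    # Step 5: target extraction via contour detection
    contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # Step 6: background update with a moving average
    cv2.accumulateWeighted(gray, background, 0.05)

cap.release()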

When using OpenCV for background subtraction, algorithms such as KNN or GMG implement the background modeling and foreground extraction steps for you. The library exposes them through ready-made functions and methods, so a background subtraction pipeline can be set up quickly.

Using frame difference for background subtraction in OpenCV

Background subtraction is a common computer vision technique for detecting foreground objects in videos. OpenCV provides several ways to implement it, one of which is frame differencing (Frame Difference).

Frame differencing detects the foreground based on the difference between each frame and the previous one. The general steps for frame-differencing background subtraction with OpenCV are:

  1. Read video : Load a video file with OpenCV and fetch each frame.
  2. Grayscale conversion : Convert each frame to grayscale to simplify processing.
  3. Frame difference calculation : Subtract the previous frame from the current frame to obtain a difference image, which highlights foreground objects.
  4. Thresholding : Threshold the difference image so that pixels with large differences are marked as foreground (white) and pixels with small differences as background (black).
  5. Foreground optimization : Improve the detection result by applying morphological operations such as dilation and erosion to fill in foreground regions or remove noise.
  6. Target extraction : From the foreground mask, extract the target objects with techniques such as contour detection and connected-component analysis.

Through the above steps, we can use OpenCV's frame difference method to implement background subtraction and extract foreground objects from the video.
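
As a compact illustration, here is a minimal sketch of these steps (again assuming a placeholder file named video.mp4); unlike the earlier sketch, it differences consecutive frames rather than a background model:

import cv2

cap = cv2.VideoCapture("video.mp4")  # placeholder input path
ret, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Difference between the current frame and the previous frame
    diff = cv2.absdiff(gray, prev_gray)

    # Large differences become foreground (white), small ones background (black)
    _, fg_mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)

    # Dilate to fill small holes in the foreground regions
    fg_mask = cv2.dilate(fg_mask, None, iterations=2)

    prev_gray = gray

cap.release()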

OpenCV background subtraction techniques

OpenCV provides a variety of background subtraction techniques, including KNN, Mixture of Gaussians (MOG v2), and GMG. These algorithms build their background models and extract the foreground according to different principles and formulas.

KNN-based background subtraction

The KNN background subtraction algorithm is based on nearest-neighbor classification and decides whether each pixel belongs to the background or the foreground. For each pixel, it computes the difference from the pixel's nearest neighbors and classifies the pixel based on that difference. The OpenCV function prototype for creating a KNN background subtractor is:

cv2.createBackgroundSubtractorKNN([, history[, dist2Threshold[, detectShadows]]])

Assuming that the original image is I, and the position of each pixel is (x, y), the principle of KNN background subtraction can be expressed as the following formula:

  1. Compute the difference between a pixel and its nearest neighbors:

$$\Delta I(x,y) = \sum_{i=1}^{N} \left| I(x,y) - I_i(x_i, y_i) \right|$$

  2. Determine whether a pixel belongs to the background or foreground:

$$M(x,y) = \begin{cases} 1, & \text{if } \Delta I(x,y) \le T \\ 0, & \text{otherwise} \end{cases}$$

where $M(x,y)$ is the classification result for pixel $(x,y)$ and $T$ is the threshold.
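
A minimal usage sketch of the KNN subtractor follows. The parameter values shown are OpenCV's documented defaults, written out here for illustration, and video.mp4 is a placeholder path:

import cv2

# history: number of frames used to build the background model
# dist2Threshold: squared distance threshold, roughly playing the role of T above
# detectShadows: if True, shadows are marked as gray (127) in the mask
knn = cv2.createBackgroundSubtractorKNN(history=500, dist2Threshold=400.0, detectShadows=True)

cap = cv2.VideoCapture("video.mp4")  # placeholder input path
while True:
    ret, frame = cap.read()
    if not ret:
        break
    fg_mask = knn.apply(frame)  # 255 = foreground, 0 = background, 127 = shadow
cap.release()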

Mixture of Gaussians (MOG v2)

Mixture of Gaussians (MOG v2) is a background subtraction method based on a Gaussian mixture model. For each pixel, it models the background value with multiple Gaussian distributions and uses their weighted sum to decide whether the pixel belongs to the background or the foreground. Its function prototype is:

cv2.createBackgroundSubtractorMOG2([, history[, varThreshold[, detectShadows]]])

Assuming that the original image is I, and the position of each pixel is (x, y), the principle of Gaussian mixture (MOG v2) can be expressed as the following formula:

  1. Build the background model:

$$B(x,y) = \sum_{i=1}^{K} w_i(x,y) \cdot N\big(I(x,y);\, \mu_i(x,y), \sigma_i(x,y)\big)$$

where $B(x,y)$ is the background model value at pixel $(x,y)$, $K$ is the number of Gaussian distributions, and $w_i(x,y)$, $\mu_i(x,y)$, and $\sigma_i(x,y)$ are the weight, mean, and variance of the i-th Gaussian distribution, respectively.

  2. Determine whether a pixel belongs to the background or foreground:

$$M(x,y) = \begin{cases} 1, & \text{if } \frac{1}{K}\sum_{i=1}^{K} w_i(x,y) \cdot N\big(I(x,y);\, \mu_i(x,y), \sigma_i(x,y)\big) \le T \\ 0, & \text{otherwise} \end{cases}$$

where $M(x,y)$ is the classification result for pixel $(x,y)$ and $T$ is the threshold.
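
A matching usage sketch for MOG v2 (the parameter values are OpenCV's documented defaults; video.mp4 is a placeholder path):

import cv2

# varThreshold: threshold on the squared distance between a pixel and the model,
# controlling how readily a pixel is declared foreground
mog2 = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=True)

cap = cv2.VideoCapture("video.mp4")  # placeholder input path
while True:
    ret, frame = cap.read()
    if not ret:
        break
    fg_mask = mog2.apply(frame)  # shadows appear as gray (127) when detectShadows=True
cap.release()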

GMG background subtraction

GMG (named after its authors Godbehere, Matsukawa, and Goldberg) is a background subtraction algorithm that works by modeling a sequence of pixel values and using Bayesian inference to compute the probability that a pixel belongs to the background or the foreground. Its function prototype is:

cv2.bgsegm.createBackgroundSubtractorGMG([, initializationFrames[, decisionThreshold]])

Assuming that the original image is I, and the position of each pixel is (x, y), the principle of GMG background subtraction can be expressed as the following formula:

  1. Build the background model:

$$P(x,y)(I_t) = \sum_{c=1}^{C} w_c(x,y) \cdot N\big(I_t;\, \mu_c(x,y), \sigma_c(x,y)\big)$$

where $P(x,y)(I_t)$ is the model of the pixel value $I_t$ at pixel $(x,y)$ at time $t$, $C$ is the number of Gaussian distributions, and $w_c(x,y)$, $\mu_c(x,y)$, and $\sigma_c(x,y)$ are the weight, mean, and variance of the c-th Gaussian distribution, respectively.

  2. Compute the probability that a pixel belongs to the foreground:

$$P(x,y)(\text{foreground}) = 1 - \frac{P(x,y)(I_t)(\text{background})}{P(x,y)(I_t)(\text{background}) + P(x,y)(I_t)(\text{foreground})}$$

where $P(x,y)(\text{foreground})$ is the probability that pixel $(x,y)$ belongs to the foreground and $P(x,y)(\text{background})$ is the probability that it belongs to the background.

  3. Determine whether a pixel belongs to the background or foreground:

$$M(x,y) = \begin{cases} 1, & \text{if } P(x,y)(\text{foreground}) \ge T \\ 0, & \text{otherwise} \end{cases}$$

where $M(x,y)$ is the classification result for pixel $(x,y)$ and $T$ is the threshold.
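
GMG ships in OpenCV's contrib modules, so it requires the opencv-contrib-python package. A minimal usage sketch (the parameter values are the documented defaults; video.mp4 is a placeholder path) that also applies the morphological opening the OpenCV tutorials recommend for GMG's noisy raw masks:

import cv2

# Requires the contrib build: pip install opencv-contrib-python
# initializationFrames: number of frames used to initialize the model
# decisionThreshold: probability threshold T above which a pixel is foreground
gmg = cv2.bgsegm.createBackgroundSubtractorGMG(initializationFrames=120, decisionThreshold=0.8)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))

cap = cv2.VideoCapture("video.mp4")  # placeholder input path
while True:
    ret, frame = cap.read()
    if not ret:
        break
    fg_mask = gmg.apply(frame)
    # The raw GMG mask is noisy, especially during initialization; opening removes speckles
    fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, kernel)
cap.release()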

Video to Slideshow App Workflow

Start → Read video frame → Apply frame difference method → Foreground percentage > threshold → Save frame → Apply probabilistic background subtraction → Foreground percentage > threshold T1 → Wait for motion to stabilize → Foreground percentage < threshold T2 → Save frame → Generate slideshow → Save as PDF images → End

Flow Description:

  1. Start.
  2. Read video frames: use OpenCV to read the video file and obtain each frame.
  3. Apply the frame difference method: for each frame, perform background subtraction with the frame-difference technique, compute the foreground mask, and calculate the foreground percentage.
  4. Determine whether the foreground percentage exceeds a threshold:
    • If yes, save the frame (it captures a significant change).
    • If not, continue to the next step.
  5. Apply probabilistic background subtraction: for each frame, perform background subtraction with a probabilistic model, compute the foreground mask, and calculate the foreground percentage.
  6. Determine whether the foreground percentage exceeds a threshold:
    • If yes, wait for the motion to stabilize and return to step 5.
    • If not, save the frame (the motion has stabilized).
  7. Generate slideshow: convert the saved frames into a slideshow, adding titles, text, and so on.
  8. Save as PDF images: save the generated slideshow as images in a PDF file.
  9. Finish.

Implementation code

import cv2
import numpy as np
from PIL import Image
from fpdf import FPDF

# Compute a perceptual hash of an image
def calculate_hash(image):
    # Convert to grayscale and resize to 8x8 pixels
    if len(image.shape) == 3:
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    image = cv2.resize(image, (8, 8))

    # Binarize the image around its mean value
    mean = np.mean(image)
    _, image = cv2.threshold(image, mean, 255, cv2.THRESH_BINARY)

    # Flatten the image into a one-dimensional array
    image = image.flatten()

    return image

# Compute the Hamming distance between two hashes
def calculate_hamming_distance(hash1, hash2):
    return np.count_nonzero(hash1 != hash2)

# Decide whether two images are similar
def is_duplicate(frame1, frame2, threshold):
    hash1 = calculate_hash(frame1)
    hash2 = calculate_hash(frame2)
    distance = calculate_hamming_distance(hash1, hash2)
    print(distance)  # debug output: show the distance
    # A Hamming distance below the threshold means a duplicate frame
    return distance < threshold

# Background subtraction by frame difference
def frame_difference(frame, prev_frame):
    # Convert the frames to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if len(prev_frame.shape) == 2:
        prev_gray = prev_frame
    else:
        prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)

    # Absolute difference between the frames
    diff = cv2.absdiff(gray, prev_gray)

    # Morphological opening on the difference image to suppress noise
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    diff = cv2.morphologyEx(diff, cv2.MORPH_OPEN, kernel)

    # Compute the foreground mask
    _, thresh = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)

    return thresh


# Statistical modeling of background pixels with OpenCV
def background_subtraction(frame, bg_model):
    # Feed the frame through the background subtraction model
    fg_mask = bg_model.apply(frame, learningRate=-1)

    # Compute the foreground percentage
    height, width = fg_mask.shape[:2]
    foreground_pixels = cv2.countNonZero(fg_mask)
    foreground_percent = (foreground_pixels / (width * height)) * 100

    return fg_mask, foreground_percent


# Video-to-slides conversion (including duplicate-frame detection)
def video_to_slides(video_path, output_path, frame_difference_threshold, bg_subtraction_threshold, duplicate_threshold):
    # Open the video
    video = cv2.VideoCapture(video_path)

    # Read the first frame
    _, prev_frame = video.read()
    prev_frame_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)

    # Create the probabilistic background subtraction model
    bg_model = cv2.createBackgroundSubtractorMOG2()

    # Slide counter
    slide_counter = 0

    # List of non-duplicate frames
    unique_frames = [prev_frame_gray]

    while True:
        # Read the current frame
        ret, frame = video.read()
        if not ret:
            break

        # Frame-difference background subtraction
        diff_frame = frame_difference(frame, prev_frame)

        # Decide whether to save the frame
        if cv2.countNonZero(diff_frame) > frame_difference_threshold:
            cv2.imwrite(f"slide_{slide_counter}.jpg", frame)
            slide_counter += 1

        # Probabilistic background subtraction
        fg_mask, foreground_percent = background_subtraction(frame, bg_model)

        # Decide whether to save the frame
        if foreground_percent > bg_subtraction_threshold:
            cv2.imwrite(f"slide_{slide_counter}.jpg", frame)
            slide_counter += 1

        # Update the previous frame
        prev_frame = frame.copy()
        prev_frame_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)

        # Check whether the frame duplicates one already seen
        is_duplicate_frame = False
        for unique_frame in unique_frames:
            if is_duplicate(prev_frame_gray, unique_frame, duplicate_threshold):
                is_duplicate_frame = True
                break

        if not is_duplicate_frame:
            unique_frames.append(prev_frame_gray)

    # Generate the slideshow and save it as a PDF (using the fpdf library)
    pdf = FPDF()

    # FPDF defaults to A4; enable automatic page breaks with a 15 mm bottom margin
    pdf.set_auto_page_break(auto=True, margin=15)

    for i in range(slide_counter):
        pdf.add_page()
        pdf.image(f"slide_{i}.jpg", x=15, y=15, w=pdf.w - 30, h=pdf.h - 100)

    # Save the PDF file
    pdf.output(output_path)

    # Release the video
    video.release()

# Run the video-to-slides conversion
# NOTE: the background-subtraction threshold is a percentage (0-100);
# the original value of 5000 could never be reached, so 50 is used here
video_to_slides("input_video.mp4", "output_slides.pdf", 5000, 50, 7)
