Quick Start with OpenCV: Moving Object Detection and Target Tracking

Preface

In today's digital world, computer vision technology is developing rapidly and is widely used in various situations. Especially in the field of moving object detection and target tracking, this technology is not only crucial for safety monitoring systems, but also plays an important role in many fields such as autonomous driving, interactive media, and robotics.

This article will introduce the basic knowledge of moving object detection and target tracking using OpenCV, including the principles, formulas and actual code implementation of various algorithms. We will start with the basic concepts of moving object detection and dive into different types of object tracking methods, such as template, feature, density, model and learning based tracking techniques. Through this article, we can not only understand the theoretical basis of these technologies, but also learn how to apply these technologies in actual projects through the provided code examples.


1. Introduction to moving object detection and target tracking

1.1 Basic concepts of moving object detection

Moving object detection refers to identifying and locating dynamically changing objects in video sequences. This process usually includes the following steps:

  1. Background Modeling: Identify static backgrounds in videos. This is done by analyzing a series of frames, aiming to find out which parts are static.

  2. Foreground Detection: The algorithm identifies the parts of the image that do not match the background model; these are usually moving objects.

  3. Data processing: Remove noise through filtering and threshold processing to accurately extract information about moving objects.
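
For reference, the sketch below walks through these three steps with OpenCV's built-in MOG2 background subtractor; the video path 'video.mp4' and all parameter values are illustrative placeholders, and the rest of this article implements detection with the frame-difference method instead.

import cv2

cap = cv2.VideoCapture('video.mp4')  # placeholder path
# Step 1: background modeling with a Gaussian-mixture model
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=25, detectShadows=False)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Step 2: foreground detection (pixels that do not fit the background model)
    fg_mask = subtractor.apply(frame)
    # Step 3: data processing - remove noise and binarize the mask
    fg_mask = cv2.medianBlur(fg_mask, 5)
    _, fg_mask = cv2.threshold(fg_mask, 127, 255, cv2.THRESH_BINARY)
    cv2.imshow('Foreground', fg_mask)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()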

1.2 Types of moving object detection algorithms

  1. Background subtraction based method: This is the most intuitive method that detects moving objects by subtracting the background frame from the current frame. This requires the background to be static or have a good background update mechanism.

  2. Optical flow method: Optical flow refers to the movement pattern of the object surface in the image sequence. By analyzing these pattern changes, the movement of the object can be inferred.

  3. Frame difference based method: This method detects motion by comparing the differences between consecutive frames. It is particularly effective for fast-moving objects, but may not detect slow-moving objects.

  4. Machine learning-based methods: These methods use training data to identify moving objects. For example, models trained using deep learning algorithms can effectively detect and classify objects in complex environments.

The main task of object tracking is to identify and track specific objects in consecutive video frames. This technology is widely used in security monitoring, human-computer interaction, autonomous driving and other fields. In this part, we will explore the basic concepts of object tracking and different types of tracking algorithms.

1.3 Basic concepts of target tracking

The target tracking process usually includes two main steps: target detection and target localization. First, the object of interest is identified in the first frame or initial frames of the video; this step is called target detection. Next, the system must locate the target in subsequent video frames, even as it moves or changes appearance; this step is called target localization.

During the target tracking process, the algorithm needs to deal with various challenges, such as rapid movement of the target, occlusion, illumination changes, scale changes, etc. Effective target tracking algorithms can track targets stably despite these challenges.

1.4 Types of target tracking algorithms

  1. Template-based tracking: This type of algorithm uses the initial appearance of the object as a template and searches for the best matching region in subsequent frames. This method is simple and intuitive, but does not work well when the target appearance changes significantly.

  2. Feature-based tracking: This method relies on detecting and tracking key features of the target (such as edges, corners, etc.). Feature-based tracking can handle certain appearance changes, but is sensitive to occlusion and illumination changes in complex scenes.

  3. Density-based tracking: This method tracks objects by estimating motion at the pixel level (such as optical flow methods). It has good adaptability to fast motion and local occlusion, but the computational cost is high.

  4. Model-based tracking: This type of algorithm builds a 3D model of the target and attempts to match that model in every frame. It is very effective in handling complex shapes and motions, but requires high computational resources and accurate initial models.

  5. Learning-based tracking: In recent years, with the development of machine learning, especially deep learning, learning-based tracking methods have made significant progress. This type of algorithm automatically learns how to effectively track targets by training neural networks and can handle a variety of complex scenarios and challenges.

Each algorithm has its advantages and limitations, and in practical applications, the choice of which tracking algorithm is usually determined by the needs of the specific task and the available computing resources. Through OpenCV, we can implement these different types of tracking algorithms and apply them to actual target tracking tasks.

2. Difference method to detect moving objects

The difference method is a simple and effective moving object detection technology suitable for monitoring and real-time tracking systems. The core idea is to identify moving objects by comparing the differences between consecutive video frames.

2.1 Principle of difference method

The basic principle of the difference method is to compare the pixel differences between two or more consecutive images. For static backgrounds, the difference between adjacent frames is small, while for moving objects, the pixel values between adjacent frames differ more because their positions change.

2.2 Difference method formula

Let I(x, y, t) denote the pixel value of the image at position (x, y) at time t. The difference method detects moving objects by computing the absolute difference between two adjacent frames:

D(x, y, t) = |I(x, y, t) - I(x, y, t-1)|

where D(x, y, t) is the pixel difference at position (x, y) between time t and time t-1.

2.3 Code implementation

2.3.1 Video or camera detection of moving objects

Here is a simple example of implementing the difference method:

import cv2

# # Initialize the camera
# cap = cv2.VideoCapture(0)
# Read the video
cap = cv2.VideoCapture('video.mp4')

# Read the first frame
ret, frame1 = cap.read()
gray1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)

# Define a rectangular structuring element
rectangle_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))

while True:
    # Read the next frame
    ret, frame2 = cap.read()
    if not ret:
        break  # End of video, exit the loop

    gray2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

    # Compute the difference between the two frames
    diff = cv2.absdiff(gray1, gray2)

    # Threshold to highlight the differences
    _, thresh = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
    thresh = cv2.dilate(thresh, rectangle_kernel, iterations=2)  # Dilate to make contours clearer

    # Find contours (OpenCV 4.x returns two values)
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # Identify the contour with the largest area
    if contours:
        largest_contour = max(contours, key=cv2.contourArea)
        x, y, w, h = cv2.boundingRect(largest_contour)
        cv2.rectangle(frame2, (x, y), (x + w, y + h), (0, 255, 0), 2)  # Draw a green bounding box

    # Display the result
    thresh_img = cv2.merge([thresh, thresh, thresh])
    cv2.imshow('Difference', cv2.hconcat([frame2, thresh_img]))
    # Prepare for the next iteration
    gray1 = gray2

    # Press 'q' to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release resources
cap.release()
cv2.destroyAllWindows()

This code first opens the video source (a video file, or optionally the camera) and then reads frames in a loop. Moving objects are detected by computing the grayscale difference between consecutive frames and highlighting these differences with thresholding.


2.3.2 Moving object detection generated by random animation

Animation generation code (Animation.py):

import cv2
import numpy as np
import random

class Animation:
    def __init__(self, width=800, height=800, num_shapes=10):
        self.width, self.height = width, height
        self.canvas = np.zeros((height, width, 3), dtype=np.uint8)
        self.shapes = [self.Shape(width, height) for _ in range(num_shapes)]
        self.running = False

    class Shape:
        def __init__(self, width, height):
            self.type = random.choice(["rectangle", "circle", "ellipse"])
            self.color = tuple(np.random.randint(0, 255, (3,)).tolist())
            self.center = np.random.randint(0, min(width, height), (2,))
            self.size = np.random.randint(10, 50)
            self.velocity = np.random.randint(-5, 5, (2,))
            self.width = width
            self.height = height

        def move(self):
            self.center += self.velocity
            for i in range(2):
                if self.center[i] < 0 or self.center[i] > (self.width if i == 0 else self.height):
                    self.velocity[i] *= -1
                    self.center[i] += self.velocity[i]

        def draw(self, canvas):
            if self.type == "rectangle":
                top_left = (self.center - self.size).astype(int)
                bottom_right = (self.center + self.size).astype(int)
                cv2.rectangle(canvas, tuple(top_left), tuple(bottom_right), self.color, -1)
            elif self.type == "circle":
                cv2.circle(canvas, tuple(self.center), self.size, self.color, -1)
            else:  # ellipse
                axes = (self.size, self.size // 2)
                cv2.ellipse(canvas, tuple(self.center), axes, 0, 0, 360, self.color, -1)

    def start(self):
        self.running = True
        while self.running:
            self.canvas[:] = 0
            for shape in self.shapes:
                shape.move()
                shape.draw(self.canvas)
            cv2.imshow("Animation", self.canvas)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                self.stop()

    def stop(self):
        self.running = False
        cv2.destroyAllWindows()

    def get_frame(self):
        self.canvas[:] = 0
        for shape in self.shapes:
            shape.move()
            shape.draw(self.canvas)
        return self.canvas.copy()

# Usage:
# animation = Animation()
# animation.start()         # Start the animation
# animation.get_frame()     # Get a single frame

Moving object detection code

import cv2
import Animation

animation = Animation.Animation(500, 400, 10)
frame1 = animation.get_frame()
gray1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)

# Define a rectangular structuring element
rectangle_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))

while True:
    # Read the next frame
    frame2 = animation.get_frame()

    gray2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

    # Compute the difference between the two frames
    diff = cv2.absdiff(gray1, gray2)

    # Threshold to highlight the differences
    _, thresh = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
    # Closing operation to fill small gaps
    thresh = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, rectangle_kernel, iterations=2)

    # Find contours (OpenCV 4.x returns two values)
    contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # Draw a bounding box around each detected contour
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        cv2.rectangle(frame2, (x, y), (x + w, y + h), (0, 255, 0), 2)  # Draw a green bounding box

    # Display the result
    thresh_img = cv2.merge([thresh, thresh, thresh])
    cv2.imshow('Difference', cv2.hconcat([frame2, thresh_img]))
    # Prepare for the next iteration
    gray1 = gray2

    # Press 'q' to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cv2.destroyAllWindows()


3. Template-based tracking

Template-based tracking is a simple and intuitive method of object tracking. In this method, we use the initial appearance of the object as a template and then search for the region that best matches this template in subsequent frames of the video. The key to this approach is how to define and use a template, and how to search for that template in new frames.

3.1 Template tracking principle

Template-based tracking typically involves the following steps:

  1. Template selection: Select an area in the first frame of the video or a specific frame as the target template for tracking.
  2. Similarity measure: Define a metric to calculate the similarity between the template and the candidate region in the new frame. Common metrics include squared differences, correlation coefficients, etc.
  3. Search for matches: Search for the region most similar to the template in subsequent frames. This can be achieved through sliding windows and similarity measures.

3.2 Template tracking formula

A common similarity measure is the normalized cross-correlation coefficient, whose formula is:

R(x, y) = \frac{\sum_{x', y'}\left[T(x', y') \cdot I(x + x', y + y')\right]}{\sqrt{\sum_{x', y'} T(x', y')^2 \cdot \sum_{x', y'} I(x + x', y + y')^2}}

where R(x, y) is the correlation coefficient at position (x, y), T is the template image, and I is the search region in the current frame.
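
To make the formula concrete, here is a small NumPy sketch (not part of the tracking code) that evaluates this measure at a single offset; the function name and the grayscale-array assumption are illustrative. Note that this expression corresponds to OpenCV's TM_CCORR_NORMED measure, while the example below uses the closely related TM_CCOEFF_NORMED, which subtracts the mean first.

import numpy as np

def ncc_at(I, T, x, y):
    """Normalized cross-correlation of template T with image I at offset (x, y)."""
    h, w = T.shape
    patch = I[y:y + h, x:x + w].astype(np.float64)
    T = T.astype(np.float64)
    numerator = np.sum(T * patch)
    denominator = np.sqrt(np.sum(T ** 2) * np.sum(patch ** 2))
    return numerator / denominator if denominator != 0 else 0.0

# cv2.matchTemplate evaluates a measure like this at every offset in the image.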

3.3 Code implementation

3.3.1 Target tracking in video or camera

The following is a simple example of template-based tracking:
import cv2

# # Initialize the camera
# cap = cv2.VideoCapture(0)
# Read the video
cap = cv2.VideoCapture('video.mp4')

# Read the first frame and select the template
ret, frame = cap.read()
template = cv2.selectROI("Select Template", frame, fromCenter=False)
template_img = frame[int(template[1]):int(template[1] + template[3]), int(template[0]):int(template[0] + template[2])]
h, w = template_img.shape[:2]

# Start tracking
while True:
    ret, frame = cap.read()
    if not ret:
        break

    # Match the template
    res = cv2.matchTemplate(frame, template_img, cv2.TM_CCOEFF_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(res)

    # Draw the tracking result
    top_left = max_loc
    bottom_right = (top_left[0] + w, top_left[1] + h)
    cv2.rectangle(frame, top_left, bottom_right, (0, 255, 0), 2)

    cv2.imshow("Tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release resources
cap.release()
cv2.destroyAllWindows()


In this example, we first select a template region from the first frame of the video. Then, use the cv2.matchTemplate function to find the region in each frame that best matches that template. This method may not work well when the target's appearance changes significantly, but it can work effectively when the target's appearance remains relatively stable.

In actual tasks, problems such as target loss or incorrect localization can occur; the original figures show one example of correct localization and one of incorrect localization.

3.3.2 Target tracking in random animation

import cv2
import Animation

animation = Animation.Animation(500, 400, 10)
frame = animation.get_frame()

template = cv2.selectROI("Select Template", frame, fromCenter=False)
template_img = frame[int(template[1]):int(template[1] + template[3]), int(template[0]):int(template[0] + template[2])]
h, w = template_img.shape[:2]

# Start tracking
while True:
    frame = animation.get_frame()

    # Match the template
    res = cv2.matchTemplate(frame, template_img, cv2.TM_CCOEFF_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(res)

    # Draw the tracking result
    top_left = max_loc
    bottom_right = (top_left[0] + w, top_left[1] + h)
    cv2.rectangle(frame, top_left, bottom_right, (0, 255, 0), 2)

    cv2.imshow("Tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cv2.destroyAllWindows()


4. Feature-based tracking

In computer vision, feature-based tracking focuses on identifying and tracking key feature points of objects in video sequences.

4.1 Feature tracking principle

Feature-based tracking usually consists of two main steps: feature point detection and feature point matching.

  1. Feature point detection: First, the algorithm identifies key feature points in the first frame. These points are unique areas in the image, such as corners, edges, etc.

  2. Feature point matching: These feature points are then tracked in subsequent frames. This is done by comparing the appearance and location of feature points in adjacent frames.

4.2 Feature tracking formula

A commonly used feature point detection algorithm is the Shi-Tomasi corner detector, and its calculation formula is as follows:

R = \min(\lambda_1, \lambda_2)

where \lambda_1 and \lambda_2 are the eigenvalues of the gradient covariance matrix in the neighborhood of a point in the image. A larger R value indicates that the point is a strong corner.
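
As a bridge between the formula and OpenCV, the sketch below computes this response map directly with cv2.cornerMinEigenVal, which returns min(λ1, λ2) for every pixel and is the same quantity that cv2.goodFeaturesToTrack thresholds internally; the image path and the threshold factor are illustrative placeholders.

import cv2
import numpy as np

img = cv2.imread('frame.png', cv2.IMREAD_GRAYSCALE)  # placeholder path
assert img is not None, "replace 'frame.png' with a real image"
# R = min(lambda1, lambda2) of the gradient covariance matrix in a 3x3 neighborhood
response = cv2.cornerMinEigenVal(img, blockSize=3, ksize=3)
# Keep points whose response exceeds a fraction of the strongest corner
threshold = 0.3 * response.max()
corners = np.argwhere(response > threshold)  # (row, col) coordinates of strong corners
print(f"{len(corners)} candidate corner pixels found")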

4.3 Code implementation

4.3.1 Object tracking in video or camera

Here is sample code for feature-based tracking:

import numpy as np
import cv2

# # Initialize the camera
# cap = cv2.VideoCapture(0)
# Read the video
cap = cv2.VideoCapture('video.mp4')

# Shi-Tomasi corner detection parameters
feature_params = dict(maxCorners=100, qualityLevel=0.3, minDistance=7, blockSize=7)

# Lucas-Kanade optical flow parameters
lk_params = dict(winSize=(15, 15), maxLevel=2, criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

# Random colors for the trajectories
color = np.random.randint(0, 255, (100, 3))

# Read the first frame
ret, old_frame = cap.read()
old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY)
p0 = cv2.goodFeaturesToTrack(old_gray, mask=None, **feature_params)

# Create a mask image for drawing the trajectories
mask = np.zeros_like(old_frame)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Compute the optical flow to get the new feature point positions
    p1, st, err = cv2.calcOpticalFlowPyrLK(old_gray, frame_gray, p0, None, **lk_params)
    # If p1 is None, re-detect the feature points
    if p1 is None:
        p0 = cv2.goodFeaturesToTrack(old_gray, mask=None, **feature_params)
        continue
    # Keep the good feature points
    good_new = p1[st == 1]
    good_old = p0[st == 1]

    # Draw the trajectories (coordinates are cast to int for the drawing functions)
    for i, (new, old) in enumerate(zip(good_new, good_old)):
        a, b = new.ravel()
        c, d = old.ravel()
        a, b, c, d = int(a), int(b), int(c), int(d)
        mask = cv2.line(mask, (a, b), (c, d), color[i].tolist(), 2)
        frame = cv2.circle(frame, (a, b), 5, color[i].tolist(), -1)
    img = cv2.add(frame, mask)

    cv2.imshow('Frame', img)

    # Update the previous frame and feature point positions
    old_gray = frame_gray.copy()
    p0 = good_new.reshape(-1, 1, 2)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release resources and close windows
cap.release()
cv2.destroyAllWindows()


4.3.2 Target tracking in random animation

import numpy as np
import cv2
import Animation

animation = Animation.Animation(500, 400, 2)

# Shi-Tomasi corner detection parameters
feature_params = dict(maxCorners=100, qualityLevel=0.3, minDistance=7, blockSize=7)

# Lucas-Kanade optical flow parameters
lk_params = dict(winSize=(15, 15), maxLevel=2, criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

# Random colors for the trajectories
color = np.random.randint(0, 255, (100, 3))

# Read the first frame
old_frame = animation.get_frame()
old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY)
p0 = cv2.goodFeaturesToTrack(old_gray, mask=None, **feature_params)

# Create a mask image for drawing the trajectories
mask = np.zeros_like(old_frame)

while True:
    frame = animation.get_frame()
    frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Compute the optical flow to get the new feature point positions
    p1, st, err = cv2.calcOpticalFlowPyrLK(old_gray, frame_gray, p0, None, **lk_params)
    # If p1 is None, re-detect the feature points
    if p1 is None:
        p0 = cv2.goodFeaturesToTrack(old_gray, mask=None, **feature_params)
        continue
    # Keep the good feature points
    good_new = p1[st == 1]
    good_old = p0[st == 1]

    # Draw the trajectories (coordinates are cast to int for the drawing functions)
    for i, (new, old) in enumerate(zip(good_new, good_old)):
        a, b = new.ravel()
        c, d = old.ravel()
        a, b, c, d = int(a), int(b), int(c), int(d)
        mask = cv2.line(mask, (a, b), (c, d), color[i].tolist(), 2)
        frame = cv2.circle(frame, (a, b), 5, color[i].tolist(), -1)
    img = cv2.add(frame, mask)

    cv2.imshow('Frame', img)

    # Update the previous frame and feature point positions
    old_gray = frame_gray.copy()
    p0 = good_new.reshape(-1, 1, 2)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Close windows
cv2.destroyAllWindows()


5. Density-based tracking

5.1 Mean shift method target tracking

5.1.1 Principle of mean shift method

The basic idea of the mean shift method is to use the density distribution of sample points to perform clustering. During the algorithm, each sample point moves toward the density center of its neighborhood, and this process iterates until the point of maximum local density is reached. In this way, sample points with similar characteristics gradually gather together to form clusters. The key to the mean shift algorithm is how to determine the neighborhood of each point and its density center.

5.1.2 Mean shift method formula

The core of the mean shift method is to compute, for each point, the mean of the sample points in its neighborhood and use it as the shift direction. Specifically:

Let x_1, x_2, \ldots, x_n be the sample points. For each sample point x_i, the mean shift algorithm updates its position through the following steps:

  1. Select a window size: first choose a "window" or "kernel" (usually a Gaussian kernel or a uniform kernel) and the corresponding bandwidth parameter h.

  2. Compute the mean within the window: for each data point x_i, compute the weighted mean of all sample points within bandwidth h around it, where the weights are usually given by the kernel function. The mean is calculated as:
     m(x_i) = \frac{\sum_{x_j \in N(x_i)} K(x_i - x_j)\, x_j}{\sum_{x_j \in N(x_i)} K(x_i - x_j)}
     where N(x_i) denotes the neighborhood around x_i and K is the kernel function.

  3. Update the data point position: move each data point x_i to the computed mean m(x_i).

  4. Iterate: repeat steps 2 and 3 until the shift of every point is smaller than a threshold or a preset number of iterations is reached.

The key choices in the mean shift algorithm are the kernel function and the bandwidth parameter h. The kernel determines the weight distribution of the sample points, and the bandwidth h determines the size of the local neighborhood. In this way, mean shift finds the density peaks of the data and thereby clusters it.
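
To make the update rule concrete, here is a small NumPy sketch of the mean shift iteration with a flat (uniform) kernel on 2D points; the sample data and the bandwidth are illustrative, and the tracking code below relies on cv2.meanShift instead.

import numpy as np

def mean_shift(points, x, h=1.0, max_iter=50, tol=1e-3):
    """Shift point x toward the local density peak of `points` using a flat kernel of bandwidth h."""
    for _ in range(max_iter):
        # Neighborhood N(x): all samples within bandwidth h (uniform kernel K)
        neighbors = points[np.linalg.norm(points - x, axis=1) < h]
        if len(neighbors) == 0:
            break
        m = neighbors.mean(axis=0)        # m(x): the weighted mean (all weights equal here)
        if np.linalg.norm(m - x) < tol:   # converged: the shift is below the threshold
            return m
        x = m
    return x

# Example: two clusters of 2D samples; the start point drifts to the nearer density peak
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
print(mean_shift(points, np.array([2.0, 2.0]), h=1.0))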

5.1.3 Code implementation

The following is an example of implementing mean shift object tracking:
Object tracking in video or camera:

import numpy as np
import cv2

# # Initialize the camera
# cap = cv2.VideoCapture(0)
# Read the video
cap = cv2.VideoCapture('video.mp4')

# Read the first frame and select the tracking target
ret, frame = cap.read()
roi = cv2.selectROI(frame, False)
x, y, w, h = roi
track_window = (x, y, w, h)

# Histogram of the ROI
roi_img = frame[y:y+h, x:x+w]
hsv_roi = cv2.cvtColor(roi_img, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv_roi, np.array((0., 60., 32.)), np.array((180., 255., 255.)))
roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# Mean shift termination criteria
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)

    # Apply mean shift to get the new window position
    ret, track_window = cv2.meanShift(dst, track_window, term_crit)

    # Draw the window
    x, y, w, h = track_window
    final_img = cv2.rectangle(frame, (x, y), (x+w, y+h), 255, 2)

    cv2.imshow('Mean Shift Tracking', final_img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Target tracking in random animation:

import numpy as np
import cv2
import Animation

animation = Animation.Animation(500, 400, 2)

# Read the first frame and select the tracking target
frame = animation.get_frame()
roi = cv2.selectROI(frame, False)
x, y, w, h = roi
track_window = (x, y, w, h)

# Histogram of the ROI
roi_img = frame[y:y+h, x:x+w]
hsv_roi = cv2.cvtColor(roi_img, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv_roi, np.array((0., 60., 32.)), np.array((180., 255., 255.)))
roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# Mean shift termination criteria
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    frame = animation.get_frame()

    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)

    # Apply mean shift to get the new window position
    ret, track_window = cv2.meanShift(dst, track_window, term_crit)

    # Draw the window
    x, y, w, h = track_window
    final_img = cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)

    cv2.imshow('Mean Shift Tracking', final_img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cv2.destroyAllWindows()


5.2 Optical flow method target tracking

Optical flow method is a technology for analyzing and tracking target motion in continuous dynamic images. It is widely used in the fields of computer vision and video processing, especially in target tracking.

5.2.1 Principle of optical flow method

The optical flow method is based on the assumption that the movement of an object in an image sequence will cause changes in image brightness over time. Therefore, by analyzing these brightness changes, it is possible to infer the object's motion between two consecutive frames.

Optical flow is essentially the vector field of the motion speed and direction of each pixel in the image. It is not the speed of the actual object, but the projection of the object's motion on the image plane. By analyzing these vectors, the trajectory, speed, and direction of an object can be estimated.

5.2.2 Optical flow method formula

The core formula of the optical flow method is based on the brightness constancy assumption: the brightness of a point remains unchanged between two consecutive frames. Let I(x, y, t) denote the image brightness at position (x, y) and time t. The optical flow equation can then be written as:

\frac{\partial I}{\partial x} v_x + \frac{\partial I}{\partial y} v_y + \frac{\partial I}{\partial t} = 0

where \frac{\partial I}{\partial x} and \frac{\partial I}{\partial y} are the spatial brightness gradients of the image, \frac{\partial I}{\partial t} is the brightness change over time, and v_x and v_y are the pixel velocities in the x and y directions.

The challenge of the optical flow method is that this single equation has two unknowns (v_x and v_y), so the problem is ill-posed. To solve it, additional constraints such as smoothness are usually introduced, or the solution is approximated with various techniques and algorithms.

In practical applications, the optical flow method needs to take into account the effects of noise, illumination changes, occlusion and other factors, so it is usually combined with other algorithms and technologies to improve accuracy and robustness. The optical flow method is widely used in many fields such as target tracking, scene analysis, and 3D structure reconstruction.
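
Besides the sparse Lucas-Kanade method used in the code below, OpenCV can also estimate this per-pixel velocity field densely. The following sketch uses the Farneback algorithm; the video path and parameter values are illustrative.

import cv2
import numpy as np

cap = cv2.VideoCapture('video.mp4')  # placeholder path
ret, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # flow[..., 0] and flow[..., 1] are the per-pixel velocities v_x and v_y
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    # Visualize direction as hue and speed as brightness
    hsv = np.zeros_like(frame)
    hsv[..., 0] = angle * 180 / np.pi / 2
    hsv[..., 1] = 255
    hsv[..., 2] = cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX)
    cv2.imshow('Dense Optical Flow', cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR))
    prev_gray = gray
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()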

5.2.3 Code implementation

The code here is essentially the same as in 4.3.1 Object tracking in video or camera.

Object tracking in video or camera:

import cv2

# # Initialize the camera
# cap = cv2.VideoCapture(0)
# Read the video
cap = cv2.VideoCapture('video.mp4')

# Shi-Tomasi corner detection parameters
feature_params = dict(maxCorners=100, qualityLevel=0.3, minDistance=7, blockSize=7)

# Read the first frame
ret, old_frame = cap.read()
old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY)

# Lucas-Kanade optical flow parameters
lk_params = dict(winSize=(15, 15), maxLevel=2, criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

# Detect corners with the Shi-Tomasi method
p0 = cv2.goodFeaturesToTrack(old_gray, mask=None, **feature_params)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Compute the optical flow
    p1, st, err = cv2.calcOpticalFlowPyrLK(old_gray, frame_gray, p0, None, **lk_params)
    # If p1 is None, re-detect the feature points
    if p1 is None:
        p0 = cv2.goodFeaturesToTrack(old_gray, mask=None, **feature_params)
        continue

    # Keep the good feature points
    good_new = p1[st == 1]
    good_old = p0[st == 1]

    # Draw the feature points (coordinates are cast to int for the drawing functions)
    for i, (new, old) in enumerate(zip(good_new, good_old)):
        a, b = new.ravel()
        c, d = old.ravel()
        a, b, c, d = int(a), int(b), int(c), int(d)
        frame = cv2.line(frame, (a, b), (c, d), (0, 255, 0), 2)
        frame = cv2.circle(frame, (a, b), 5, (0, 255, 0), -1)

    cv2.imshow('Optical Flow Tracking', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

    old_gray = frame_gray.copy()
    p0 = good_new.reshape(-1, 1, 2)

cap.release()
cv2.destroyAllWindows()


The code here is essentially the same as in 4.3.2 Target tracking in random animation.

Target tracking in random animation:

import cv2
import Animation

animation = Animation.Animation(500, 400, 2)

# Shi-Tomasi corner detection parameters
feature_params = dict(maxCorners=100, qualityLevel=0.3, minDistance=7, blockSize=7)

# Read the first frame
old_frame = animation.get_frame()
old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY)

# Lucas-Kanade optical flow parameters
lk_params = dict(winSize=(15, 15), maxLevel=2, criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

# Detect corners with the Shi-Tomasi method
p0 = cv2.goodFeaturesToTrack(old_gray, mask=None, **feature_params)

while True:
    frame = animation.get_frame()
    frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Compute the optical flow
    p1, st, err = cv2.calcOpticalFlowPyrLK(old_gray, frame_gray, p0, None, **lk_params)
    # If p1 is None, re-detect the feature points
    if p1 is None:
        p0 = cv2.goodFeaturesToTrack(old_gray, mask=None, **feature_params)
        continue
    # Keep the good feature points
    good_new = p1[st == 1]
    good_old = p0[st == 1]

    # Draw the feature points (coordinates are cast to int for the drawing functions)
    for i, (new, old) in enumerate(zip(good_new, good_old)):
        a, b = new.ravel()
        c, d = old.ravel()
        a, b, c, d = int(a), int(b), int(c), int(d)
        frame = cv2.line(frame, (a, b), (c, d), (0, 255, 0), 2)
        frame = cv2.circle(frame, (a, b), 5, (0, 255, 0), -1)

    cv2.imshow('Optical Flow Tracking', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

    old_gray = frame_gray.copy()
    p0 = good_new.reshape(-1, 1, 2)

cv2.destroyAllWindows()


6. Model-based tracking

6.1 Principle of model tracking

Model-based tracking is a method that uses mathematical models to represent and track targets. This tracking technology usually relies on predefined target models, which can be geometric shapes, 3D models of objects, or models with specific characteristics. The tracking process involves continuously adjusting model parameters to ensure the best fit between the model and the observed data.

6.2 Model tracking formula

In model-based tracking, the core is an optimization problem: find the best model parameters \theta so that the model predictions are as close as possible to the actual observations. This is typically achieved by minimizing a loss function, which measures the difference between the predicted values and the observed values.

Let \mathbf{y} be the observed data (for example, the position of the target in the image), f(\theta) the model prediction, and \theta the model parameters. The loss function L(\theta) (often a least-squares function) can be written as:

L(\theta) = \sum_{i} \left( y_i - f(\theta)_i \right)^2

Here, L(\theta) is the sum of squared differences between each observation y_i and the corresponding model prediction f(\theta)_i. The goal is to find the parameters \theta that minimize L(\theta).

Optimization methods:

  1. Gradient descent: a commonly used optimization technique for updating the parameters \theta to minimize the loss function. The update rule is:

     \theta := \theta - \alpha \nabla_\theta L(\theta)

     where \alpha is the learning rate and \nabla_\theta L(\theta) is the gradient of the loss function with respect to \theta.

  2. Iteration: in practice, gradient descent iterates many times, updating \theta along the negative gradient direction at each step, until the loss reaches a minimum or a stopping condition is met.

In this way, a model-based tracking method adjusts the model parameters \theta in every frame so that the model's description of the target stays as close as possible to the observed data, achieving effective tracking.
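
As a toy illustration of this gradient-descent update (not the CamShift method used in the code below), the sketch fits a constant-position model f(\theta) = \theta to synthetic noisy observations of a target position; the data, learning rate, and iteration count are all illustrative assumptions.

import numpy as np

# Noisy observations y_i of a target position (illustrative data)
rng = np.random.default_rng(1)
observations = np.array([120.0, 80.0]) + rng.normal(0, 2.0, (20, 2))

theta = np.zeros(2)   # model parameters: here simply the estimated (x, y) position
alpha = 0.01          # learning rate

for _ in range(200):
    residuals = observations - theta      # y_i - f(theta)_i with f(theta) = theta
    grad = -2.0 * residuals.sum(axis=0)   # gradient of L(theta) = sum of squared residuals
    theta = theta - alpha * grad          # theta := theta - alpha * grad
print(theta)  # converges toward the mean of the observations (the least-squares solution)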

6.3 Code implementation

Note: the following method is only suitable for simple shapes.

Object tracking in video or camera:

import cv2
import numpy as np

# # Initialize the camera
# cap = cv2.VideoCapture(0)
# Read the video
cap = cv2.VideoCapture('video2.mp4')

# Read the first frame and define the initial rectangle position
ret, frame = cap.read()
init_pos = cv2.selectROI("Frame", frame, False)
cv2.destroyWindow("Frame")  # Close the selection window
x, y, w, h = init_pos
track_window = (x, y, w, h)

# Set up the ROI and compute its histogram
roi = frame[y:y+h, x:x+w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv_roi, np.array((0., 60., 32.)), np.array((180., 255., 255.)))
roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# Set the tracking termination criteria
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    ret, track_window = cv2.CamShift(dst, track_window, term_crit)

    # Draw the tracking result
    pts = cv2.boxPoints(ret)
    pts = np.intp(pts)  # np.intp replaces the np.int0 alias removed in newer NumPy
    img2 = cv2.polylines(frame, [pts], True, 255, 2)

    cv2.imshow('Tracking', img2)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release resources
cap.release()
cv2.destroyAllWindows()

In this example, we use the CamShift algorithm for model-based tracking. CamShift is an adaptive tracking method that can handle changes in target size. When tracking starts, the user needs to select an ROI (region of interest), and then the algorithm will find the best match in subsequent frames based on the color information in the ROI.

Target tracking in random animation:

import cv2
import numpy as np
import Animation

animation = Animation.Animation(500, 400, 2)
# Read the first frame and define the initial rectangle position
frame = animation.get_frame()
init_pos = cv2.selectROI("Frame", frame, False)
cv2.destroyWindow("Frame")  # Close the selection window
x, y, w, h = init_pos
track_window = (x, y, w, h)

# Set up the ROI and compute its histogram
roi = frame[y:y + h, x:x + w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv_roi, np.array((0., 60., 32.)), np.array((180., 255., 255.)))
roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# Set the tracking termination criteria
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    frame = animation.get_frame()

    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    dst = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    ret, track_window = cv2.CamShift(dst, track_window, term_crit)

    # Draw the tracking result
    pts = cv2.boxPoints(ret)
    pts = np.intp(pts)  # np.intp replaces the np.int0 alias removed in newer NumPy
    img2 = cv2.polylines(frame, [pts], True, (0, 255, 0), 2)

    cv2.imshow('Tracking', img2)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release resources
cv2.destroyAllWindows()


6.4 Reasons for inaccurate tracking and positioning

  1. Initial ROI selection: Use the cv2.selectROI function to manually select a region in the first frame of the video. The color information in this area is used to initialize tracking.

  2. Color histogram: The code calculates the color histogram of the HSV color space of the selected ROI. This histogram is used to search for areas with the same or similar color distribution in subsequent frames.

  3. Color distribution dependence: The CamShift algorithm's ability to track objects strongly depends on the color distribution of the initial ROI. If there is no similar color distribution in other frames in the video, or the color of the tracked target changes significantly, the tracking effect will be greatly reduced.

  4. Tracking window update: In each frame, the CamShift algorithm updates the position of the tracking window, trying to match areas similar to the initial histogram. If the target moves to an area with a different color distribution than the initial ROI, tracking may fail.

  5. Environmental factors: Factors such as lighting changes, occlusion, and backgrounds of similar colors may affect the accuracy of tracking.

7. Learning-based tracking

7.1 Learning tracking principles

Learning-based tracking methods involve using machine learning algorithms to train models to identify and track objects in videos. These methods typically include feature extraction, model training, and online tracking.

  1. Feature extraction: Extract effective features from video frames, which can represent important attributes of the target.

  2. Model training: Use the extracted features to train a classifier or regression model to distinguish the target from the background.

  3. Online tracking: Apply the trained model in the video stream and update the model parameters in real time to adapt to changes in the target.

OpenCV provides some built-in learning-based trackers, such as KCF (Kernelized Correlation Filters) and CSRT (Channel and Spatial Reliability Tracker).
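
Which of these constructors is available depends on the installed OpenCV build: the contrib modules are required, and some 4.x builds expose certain trackers under cv2.legacy. A small defensive factory like the sketch below (the helper name is our own) can smooth over these differences:

import cv2

def create_tracker(name="KCF"):
    """Return a KCF or CSRT tracker instance, using whichever constructor this OpenCV build exposes."""
    legacy = getattr(cv2, "legacy", None)
    for module in (cv2, legacy):
        factory = getattr(module, f"Tracker{name}_create", None) if module is not None else None
        if factory is not None:
            return factory()
    raise RuntimeError(f"Tracker{name} is not available in this OpenCV build")

# Usage:
# tracker = create_tracker("CSRT")
# tracker.init(frame, bbox)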

7.2 KCF Tracker

7.2.1 KCF tracker principle and formula

The KCF tracker is based on the concept of correlation filters and achieves target tracking efficiently by using circulant matrices and fast Fourier transform (FFT).

1. Circulant matrix and correlation
The core of the KCF tracker is the construction of a circulant matrix, obtained by converting the training samples (i.e., image patches around the target) into a cyclic structure. Such a circulant matrix allows the correlation between samples to be computed efficiently with the fast Fourier transform (FFT), which greatly improves the computation speed.

2. Objective function
The KCF tracker learns a filter whose response peaks on the target in new image frames; the filter is obtained by minimizing the following regularized least-squares objective:

f(\mathbf{w}) = \sum_{i=1}^{n} \left( y_i - \mathbf{w}^T \phi(\mathbf{x}_i) \right)^2 + \lambda \|\mathbf{w}\|^2

Here, \mathbf{w} is the filter weight vector, \phi(\mathbf{x}_i) is the kernel-mapped feature of sample \mathbf{x}_i, y_i is the target response value, and \lambda is a regularization parameter that prevents overfitting.

3. Kernel correlation
KCF uses kernel techniques to map data into a higher-dimensional feature space, thereby being able to capture more complex feature relationships. The kernel correlation function can be defined as:
K(\mathbf{x}, \mathbf{z}) = \phi(\mathbf{x})^T \phi(\mathbf{z})
Here, \mathbf{x} and \mathbf{z} are feature vectors and \phi is the kernel feature mapping.

4. Filter training
Filter training involves solving for the optimal solution of the above objective function. Using the Fourier transform and kernel techniques, this process can be carried out efficiently.

5. Target localization
In new video frames, the learned filters are used to calculate relevant responses to locate the target. The target position usually corresponds to the maximum value in the response plot.

6. Update mechanism
In order to adapt to changes in the appearance of the target, the KCF tracker includes a mechanism for gradually updating the filter based on new tracking results.

KCF tracker is popular for its good balance between speed and performance. By using FFT and kernel techniques, it can effectively track targets in real-time video streams, especially suitable for application scenarios that require fast tracking processing.
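
To connect these formulas with code, the sketch below implements a heavily simplified, single-channel, linear-kernel correlation filter (closer to the MOSSE filter than to the full kernelized, multi-channel KCF): it trains the filter on one grayscale patch in the Fourier domain and then locates the response peak on a shifted copy. All names and parameters here are illustrative; the practical code in 7.2.2 simply uses OpenCV's built-in KCF tracker.

import numpy as np

def gaussian_response(shape, sigma=2.0):
    """Desired response map: a Gaussian peak centered on the target patch."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - w // 2) ** 2 + (ys - h // 2) ** 2) / (2 * sigma ** 2))

def train_filter(patch, lam=1e-2):
    """Ridge-regression solution in the Fourier domain for a linear kernel."""
    X = np.fft.fft2(patch)
    Y = np.fft.fft2(gaussian_response(patch.shape))
    return (np.conj(X) * Y) / (np.conj(X) * X + lam)

def detect(W, new_patch):
    """Correlate the learned filter with a new patch and return the (row, col) of the peak."""
    response = np.real(np.fft.ifft2(W * np.fft.fft2(new_patch)))
    return np.unravel_index(np.argmax(response), response.shape)

# Example with a synthetic patch: the detected peak shifts along with the target
rng = np.random.default_rng(0)
patch = rng.random((64, 64))
W = train_filter(patch)
print(detect(W, np.roll(patch, (5, 3), axis=(0, 1))))  # peak moves by roughly (5, 3)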

7.2.2 Code implementation

Object tracking in video or camera:

import cv2

# Create an instance of the KCF tracker
tracker = cv2.TrackerKCF_create()

# Read the video
cap = cv2.VideoCapture('video.mp4')

# Read the first frame of the video
ret, frame = cap.read()

# Select the target to track
bbox = cv2.selectROI(frame, False)

# Initialize the tracker
ok = tracker.init(frame, bbox)

while True:
    # Read a new frame
    ret, frame = cap.read()
    if not ret:
        break

    # Update the tracker
    ok, bbox = tracker.update(frame)

    # Draw the tracking box
    if ok:
        (x, y, w, h) = [int(v) for v in bbox]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2, 1)

    # Display the result
    cv2.imshow("Tracking", frame)

    # Exit condition
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release resources
cap.release()
cv2.destroyAllWindows()

Target tracking in random animation:

import cv2
import Animation

animation = Animation.Animation(500, 400, 10)

# Create an instance of the KCF tracker
tracker = cv2.TrackerKCF_create()

# Read the first frame
frame = animation.get_frame()

# Select the target to track
bbox = cv2.selectROI(frame, False)

# Initialize the tracker
ok = tracker.init(frame, bbox)

while True:
    # Read a new frame
    frame = animation.get_frame()

    # Update the tracker
    ok, bbox = tracker.update(frame)

    # Draw the tracking box
    if ok:
        (x, y, w, h) = [int(v) for v in bbox]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2, 1)

    # Display the result
    cv2.imshow("Tracking", frame)

    # Exit condition
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release resources
cv2.destroyAllWindows()



Summary

Through the study of this article, we have a comprehensive understanding of the application of OpenCV in the field of moving object detection and target tracking. From basic difference methods to complex learning trackers, each method has its unique advantages and application scenarios. Although the difference method is simple, it can be very effective in certain situations. Template, feature, and density-based methods provide more flexibility and accuracy and are suitable for more complex scenarios. Model- and learning-based methods represent the latest progress in target tracking technology and can handle extremely complex tracking environments.

Different tracking technologies have their own advantages and are suitable for solving different types of problems. As a dynamically developing field, computer vision and target tracking technology still have a lot of room for development, and will surely bring more innovations and breakthroughs in the future.

Source: blog.csdn.net/qq_31463571/article/details/134646806