Traditional target tracking - MeanShift algorithm

Table of contents

1. Mean Shift (MeanShift)

2. Process

3. Code

3.1 code for meanshift + fixed frame

3.2 Optimization: meanshift+mouse selection

3.3 meanshift + self-implementation function

4. Supplementary knowledge

4.1 Histogram

4.2 Normalization

4.3 Histogram Backprojection


1. Mean Shift (MeanShift)

Mean shift locates the point of maximum density in a set of discrete samples. For tracking, the search is repeated on every new frame, starting from the previous result. A useful property of the algorithm is that each step also gives the direction in which the target has moved.

The principle of the meanshift algorithm is very simple. Suppose you have a set of points and a small window, for example a circular one. The goal is to move this window to the area where the points are densest.

The initial window is the blue circle, named C1. Its center is marked with a blue rectangle, named C1_o.
The centroid of all the points inside the window is the blue dot C1_r. Clearly the window center and the centroid do not coincide, so move the window until its center lies on the centroid. Then compute the centroid of the points enclosed by the moved window and move again; usually the new center and the new centroid still do not coincide. Repeat this until they roughly coincide. The window then sits where the point density is highest, which is the green circle in the figure, named C2.
Besides video tracking, the meanshift algorithm is widely used in data analysis and unsupervised learning, for example in clustering and smoothing.
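The window-moving procedure described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `mean_shift_points`, the synthetic point set, the radius and the starting position are all made up for the example and do not come from the original article.

```python
import numpy as np

def mean_shift_points(points, start, radius, max_iter=100, eps=1e-3):
    """Move a circular window to a local density maximum of `points`."""
    center = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        # Points that fall inside the current circular window.
        inside = points[np.linalg.norm(points - center, axis=1) <= radius]
        if len(inside) == 0:
            break  # empty window: there is no centroid to move toward
        centroid = inside.mean(axis=0)
        shift = np.linalg.norm(centroid - center)
        center = centroid  # move the window center onto the centroid
        if shift < eps:
            break  # center and centroid (roughly) coincide
    return center

rng = np.random.default_rng(0)
# Hypothetical data: a dense cluster around (5, 5) plus sparse background noise.
cluster = rng.normal(loc=(5.0, 5.0), scale=0.5, size=(300, 2))
background = rng.uniform(0.0, 10.0, size=(60, 2))
points = np.vstack([cluster, background])

final_center = mean_shift_points(points, start=(4.0, 4.0), radius=2.0)
print(final_center)
```

The window starts off-center and ends up close to the dense cluster. Note the early exit for an empty window: with no points inside, there is nothing to pull the window anywhere.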

If the target is not known in advance, a detection condition can be added so that tracking of a region of the video starts (and stops) dynamically. For example, a pre-trained SVM can detect the target, and MeanShift then tracks the detected region.

Tracking is therefore generally divided into two steps: 1. mark the region of interest; 2. track the region.

2. Process

An image is a matrix of values. How can the meanshift algorithm be used to track a moving object in a video? The general process is as follows:

meanshift process
1. Select a target area on the first frame.
2. Calculate the histogram of the selected area, generally in the HSV color space.
3. Calculate the same kind of histogram distribution for the next frame, image b.
4. Find the area in image b whose histogram is most similar to that of the selected area: use the meanshift algorithm to move the window toward the most similar part until it settles. This completes the tracking for image b.
5. Repeat steps 3 and 4 for every following frame to track the target through the whole video.
In practice the inputs are the histogram backprojection of each frame and the position of the target in the first frame. When the target moves, the movement shows up in the backprojection, and the meanshift algorithm moves the window to the area of the backprojection image with the highest gray-level density.
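To make the last step concrete, here is a simplified NumPy sketch of moving a rectangular window over a backprojection image. It is not OpenCV's implementation; the function name `mean_shift_window`, the integer rounding and the synthetic image are assumptions made for this example.

```python
import numpy as np

def mean_shift_window(prob, window, max_iter=10, eps=1.0):
    """Shift an (x, y, w, h) window toward the weight centroid of `prob`."""
    x, y, w, h = window
    rows, cols = prob.shape
    for _ in range(max_iter):
        patch = prob[y:y + h, x:x + w].astype(float)
        total = patch.sum()
        if total == 0:
            break  # no evidence inside the window: it stays where it is
        ys, xs = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
        # Centroid of the weights, relative to the window's top-left corner.
        cx = (xs * patch).sum() / total
        cy = (ys * patch).sum() / total
        # Offset between that centroid and the window center.
        dx = int(round(cx - (w - 1) / 2))
        dy = int(round(cy - (h - 1) / 2))
        # Keep the window inside the image.
        new_x = min(max(x + dx, 0), cols - w)
        new_y = min(max(y + dy, 0), rows - h)
        moved = abs(new_x - x) + abs(new_y - y)
        x, y = new_x, new_y
        if moved < eps:
            break  # the window has stopped moving
    return x, y, w, h

# Synthetic back projection: a bright 20x20 blob on a black background.
prob = np.zeros((100, 100), dtype=np.uint8)
prob[50:70, 60:80] = 255

# The initial window only partly overlaps the blob.
tracked = mean_shift_window(prob, (50, 40, 20, 20))
print(tracked)
```

Starting from a window that overlaps only a corner of the blob, the window drifts until it sits on the blob, give or take a pixel of rounding.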

                Histogram Backprojection

The main process of implementing Meanshift is:

  1. Read the video file: cv.VideoCapture()
  2. Set the region of interest: get the first frame and mark the target area, that is, the region of interest
  3. Calculate the histogram: compute the HSV histogram of the region of interest and normalize it
  4. Track the target: set the stop criteria for the window search, compute the histogram back projection, run meanshift, and draw a rectangle at the target position

3. Code

opencv API

cv2.meanShift(probImage, window, criteria)

parameters:

  • probImage: the back projection of the target histogram, computed on the frame being searched
  • window: the initial search window, that is, the rect (x, y, w, h) that defines the ROI
  • criteria: when to stop the window search: the number of iterations reaches the set maximum, or the window center shifts by less than a set limit

The function returns the number of iterations performed and the updated window.

(Python) From scratch, simply and quickly learn machine humanoid vision Opencv --- application five: object motion tracking - Gu Yueju

3.1 code for meanshift + fixed frame

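import cv2 as cv

# Create the video capture object (raw string so the backslash is kept literally)
cap = cv.VideoCapture(r"E:\Python-Code/videodataset/enn.mp4")

# Read the first frame and specify the target position
ret, frame = cap.read()
if not ret:
    raise SystemExit("Could not read the first frame of the video")
# c, r: column and row of the top-left corner; w, h: width and height of the box
c, r, w, h = 530, 160, 320, 300
track_window = (c, r, w, h)  # cv.meanShift expects (x, y, width, height)
# Crop the region of interest (rows first, then columns)
roi = frame[r:r + h, c:c + w]

# Calculate the histogram
# Convert the color space
hsv_roi = cv.cvtColor(roi, cv.COLOR_BGR2HSV)
# Histogram of the hue channel: 180 bins over [0, 180)
roi_hist = cv.calcHist([hsv_roi], [0], None, [180], [0, 180])
# Normalize to [0, 255]
cv.normalize(roi_hist, roi_hist, 0, 255, cv.NORM_MINMAX)

# Target tracking
# Stop criteria for the window search: at most 10 iterations, or a center shift below 1 pixel
term_crit = (cv.TERM_CRITERIA_EPS | cv.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ret, frame = cap.read()
    if not ret:
        break  # end of the video

    # Back projection of the target histogram onto the current frame
    hsv = cv.cvtColor(frame, cv.COLOR_BGR2HSV)
    dst = cv.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)

    # Run meanshift; it returns the iteration count and the updated window
    _, track_window = cv.meanShift(dst, track_window, term_crit)

    # Draw the tracked position on the frame and display it
    x, y, w, h = track_window
    img = cv.rectangle(frame, (x, y), (x + w, y + h), 255, 2)
    cv.imshow("frame", img)

    if cv.waitKey(20) & 0xFF == ord('q'):
        break

# Release resources
cap.release()
cv.destroyAllWindows()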

Disadvantage: the size and position of the box are fixed in the code at the start and cannot adapt to the actual situation.

The position of the initial box is important. Meanshift works by moving toward the densest area of the probability image. If the tracking box is initially placed in a completely black area of the histogram back projection (density 0), there is nothing to pull it toward the object, and it gets stuck there.
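One way to catch this situation early is to sum the back projection inside the initial box before tracking starts. The helper `window_mass` and the synthetic back projection below are hypothetical and only illustrate the check.

```python
import numpy as np

def window_mass(back_projection, window):
    """Sum of the back-projection values inside an (x, y, w, h) window."""
    x, y, w, h = window
    return int(back_projection[y:y + h, x:x + w].sum())

# Hypothetical back projection with a single bright target response.
back_projection = np.zeros((120, 160), dtype=np.uint8)
back_projection[40:80, 90:130] = 200

good_window = (80, 30, 40, 40)  # overlaps the bright region
bad_window = (0, 0, 40, 40)     # lies entirely in a black area

for name, win in (("good", good_window), ("bad", bad_window)):
    mass = window_mass(back_projection, win)
    status = "can move" if mass > 0 else "stuck: choose another initial box"
    print(name, mass, status)
```

A mass of zero means the window has no centroid to move toward, so the initial box should be chosen again.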

We tracked from the initial frame (first frame) of the video. To track a particular object, the tracking box has to be placed on or around that object for the program to work, but in practice it is difficult to know the exact position of an object in an image. For example, suppose we want to track an object in the lower right corner of the picture but do not know its coordinate range. We can only try again and again (modify the initial box position in the code, then check how the program runs) until it works. Such code generalizes poorly: every time the tracked object changes, the box has to be adjusted by hand again, which is troublesome.

3.2 Optimization: meanshift+mouse selection

OpenCV provides a function named cv2.selectROI. With it, we can draw the tracking box by hand with the mouse. The function syntax is as follows:

track_window=cv2.selectROI('frameName', frame)

parameters:

  • frameName: the name of the display window
  • frame: the frame to display and select on

It returns the selected box as (x, y, w, h).

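import cv2
import numpy as np

# Open the video (raw string so the backslash is kept literally)
cap = cv2.VideoCapture(r'E:\Python-Code/videodataset/enn.mp4')
# Read the first frame: ret is True or False (whether a frame was read), frame is the image
ret, frame = cap.read()
# My frame is too large, so I wanted to shrink it, but shrinking raised an error for me.
# A likely cause: the resize in the loop ran before the ret check and received
# frame=None once the video ended.
# frame = cv2.resize(frame, None, None, fx=1/2, fy=1/2, interpolation=cv2.INTER_CUBIC)

# Tracking box drawn with the mouse, returned as (x, y, w, h)
track_window = cv2.selectROI('img', frame)
x, y, w, h = track_window

# Histogram of the green pixels inside the selected box
# (adjust the inRange bounds for a target of another color)
# Convert the color space
hsv_roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv_roi, np.array((35, 43, 46)), np.array((77, 255, 255)))
# Calculate the histogram: hue is 0..179, so 180 bins over [0, 180)
hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
# Normalize to [0, 255]
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

# Target tracking
# Stop criteria for the window search: at most 10 iterations, or a center shift below 1 pixel
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ret, frame = cap.read()
    if not ret:
        break  # end of the video
    # If the first frame was resized, resize here too (after the ret check)
    # frame = cv2.resize(frame, None, None, fx=1/2, fy=1/2, interpolation=cv2.INTER_CUBIC)

    # Back projection of the target histogram onto the current frame
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    dst = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)

    # Run meanshift; it returns the iteration count and the updated window
    _, track_window = cv2.meanShift(dst, track_window, term_crit)

    # Draw the tracked position on the frame and display it
    x, y, w, h = track_window
    img = cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow('img', img)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release resources
cap.release()
cv2.destroyAllWindows()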

Run the above code and a window pops up showing the first frame of the loaded video. Drag with the mouse to draw a box around the object to track, then press Enter or Space to confirm.
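A box drawn by hand can be empty or can run past the frame border, so it helps to validate it before cropping. The helper below is a hypothetical sketch (the name `crop_roi` and the stand-in frame are not from the original code); it takes a box in the same (x, y, w, h) layout.

```python
import numpy as np

def crop_roi(frame, window):
    """Return the pixels inside an (x, y, w, h) window, clipped to the frame."""
    x, y, w, h = (int(v) for v in window)
    if w <= 0 or h <= 0:
        raise ValueError("empty selection: draw a box before confirming")
    rows, cols = frame.shape[:2]
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, cols), min(y + h, rows)
    if x0 >= x1 or y0 >= y1:
        raise ValueError("selection lies outside the frame")
    return frame[y0:y1, x0:x1]

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a video frame
roi = crop_roi(frame, (530, 160, 320, 300))      # box runs past the right edge
print(roi.shape)
```

Here the box is 320 pixels wide but only 110 columns remain to the right of x=530, so the crop is clipped instead of silently returning a region of the wrong size.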

Note that cv2.selectROI draws only one rectangle per call (the C++ version of OpenCV can draw several at a time). To draw several rectangles, call it in a while loop:

from random import randint

bboxes = []
colors = []
while True:
    bbox = cv2.selectROI('MultiTracker', frame)
    bboxes.append(bbox)
    colors.append((randint(0, 255), randint(0, 255), randint(0, 255)))
    print("Press q to quit, or any other key to draw the next box")
    if cv2.waitKey(0) & 0xFF == ord('q'):
        break
print('Selected boxes: {}'.format(bboxes))

However, the complete code built on this loop did not work for me.

3.3 meanshift + self-implementation function

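This version works best for me, although I am not sure why the library-based versions above are less effective. One likely factor is that they track on a back projection of the hue channel only, while the code below compares kernel-weighted histograms over all three color channels.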

import math
import numpy as np
import cv2


def get_tr(img):
    # Parameters to return
    mouse_params = {'x': None, 'width': None, 'height': None,
                    'y': None, 'temp': None}
    cv2.namedWindow('image')
    # Mouse callback that handles the box selection
    cv2.setMouseCallback('image', on_mouse, mouse_params)
    cv2.imshow('image', img)
    cv2.waitKey(0)
    return [mouse_params['x'], mouse_params['y'], mouse_params['width'],
            mouse_params['height']], mouse_params['temp']


def on_mouse(event, x, y, flags, param):
    global img, point1
    img2 = img.copy()
    if event == cv2.EVENT_LBUTTONDOWN:  # left button pressed
        point1 = (x, y)
        cv2.circle(img2, point1, 10, (0, 255, 0), 5)
        cv2.imshow('image', img2)
    elif event == cv2.EVENT_MOUSEMOVE and (flags & cv2.EVENT_FLAG_LBUTTON):  # dragging with the left button held
        cv2.rectangle(img2, point1, (x, y), (255, 0, 0), 5)
        cv2.imshow('image', img2)
    elif event == cv2.EVENT_LBUTTONUP:  # left button released
        point2 = (x, y)
        cv2.rectangle(img2, point1, point2, (0, 0, 255), 5)
        cv2.imshow('image', img2)
        # Store the top-left corner, width and height of the selected box, and the image it contains
        param['x'] = min(point1[0], point2[0])
        param['y'] = min(point1[1], point2[1])
        param['width'] = abs(point1[0] - point2[0])
        param['height'] = abs(point1[1] - point2[1])
        param['temp'] = img[param['y']:param['y'] + param['height'],
                        param['x']:param['x'] + param['width']]


def main():
    global img
    cap = cv2.VideoCapture(r"E:\Python-Code/videodataset/enn.mp4")
    # Read the first frame of the video
    ret, frame = cap.read()
    img = frame
    # Select the target and return its details: rect is [x, y, width, height], temp is the selected image
    rect, temp = get_tr(img)
    print(temp)
    (a, b, c) = temp.shape
    y = [a / 2, b / 2]

    # Weight matrix of the target image (largest at the center, falling toward the corners)
    m_wei = np.zeros((a, b))
    for i in range(a):
        for j in range(b):
            z = (i - y[0]) ** 2 + (j - y[1]) ** 2
            m_wei[i, j] = 1 - z / (y[0] ** 2 + y[1] ** 2)

    # Weighted color histogram of the target (16 bins per channel)
    C = 1 / sum(sum(m_wei))
    hist1 = np.zeros(16 ** 3)
    for i in range(a):
        for j in range(b):
            q_b = math.floor(float(temp[i, j, 0]) / 16)
            q_g = math.floor(float(temp[i, j, 1]) / 16)
            q_r = math.floor(float(temp[i, j, 2]) / 16)
            q_temp1 = q_r * 256 + q_g * 16 + q_b
            hist1[int(q_temp1)] = hist1[int(q_temp1)] + m_wei[i, j]
    hist1 = hist1 * C

    # Keep reading the video and track the target
    while True:
        ret, frame = cap.read()

        if ret == True:
            Img = frame
            num = 0
            Y = [1, 1]

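            # mean shift iterations
            while np.sqrt(Y[0] ** 2 + Y[1] ** 2) > 0.5 and num < 20:
                num = num + 1

                # Histogram of the candidate region
                temp2 = Img[int(rect[1]):int(rect[1] + rect[3]), int(rect[0]):int(rect[0] + rect[2])]
                if temp2.shape[:2] != (a, b):
                    break  # the window has left the frame, so stop iterating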
                hist2 = np.zeros(16 ** 3)
                q_temp2 = np.zeros((a, b))
                for i in range(a):
                    for j in range(b):
                        q_b = math.floor(float(temp2[i, j, 0]) / 16)
                        q_g = math.floor(float(temp2[i, j, 1]) / 16)
                        q_r = math.floor(float(temp2[i, j, 2]) / 16)
                        q_temp2[i, j] = q_r * 256 + q_g * 16 + q_b
                        hist2[int(q_temp2[i, j])] = hist2[int(q_temp2[i, j])] + m_wei[i, j]
                hist2 = hist2 * C

                w = np.zeros(16 ** 3)
                for i in range(16 ** 3):
                    if hist2[i] != 0:
                        w[i] = math.sqrt(hist1[i] / hist2[i])
                    else:
                        w[i] = 0

                sum_w = 0
                sum_xw = [0, 0]
                for i in range(a):
                    for j in range(b):
                        sum_w = sum_w + w[int(q_temp2[i, j])]
                        sum_xw = sum_xw + w[int(q_temp2[i, j])] * np.array([i - y[0], j - y[1]])
                if sum_w == 0:
                    break  # no target colors inside the window, nothing to move toward
                Y = sum_xw / sum_w

                # Update the position
                rect[0] = rect[0] + Y[1]
                rect[1] = rect[1] + Y[0]

            v0 = int(rect[0])
            v1 = int(rect[1])
            v2 = int(rect[2])
            v3 = int(rect[3])
            pt1 = (v0, v1)
            pt2 = (v0 + v2, v1 + v3)

            # Draw the rectangle
            IMG = cv2.rectangle(Img, pt1, pt2, (0, 0, 255), 2)
            cv2.imshow('IMG', IMG)
            k = cv2.waitKey(60) & 0xff
            if k == 27:
                break
        else:
            break


if __name__ == '__main__':
    main()
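The nested Python loops above visit every pixel one by one, which is slow on large boxes. As a sketch only, the same kernel-weighted histogram can be computed with NumPy array operations. The function names `kernel_weights` and `weighted_color_hist` are made up for this example; the bin layout (q_r * 256 + q_g * 16 + q_b) follows the code above.

```python
import numpy as np

def kernel_weights(height, width):
    """Weights that fall from 1 at the patch center toward 0 at the corners."""
    cy, cx = height / 2, width / 2
    ii, jj = np.mgrid[0:height, 0:width]
    return 1 - ((ii - cy) ** 2 + (jj - cx) ** 2) / (cy ** 2 + cx ** 2)

def weighted_color_hist(patch, weights, bins_per_channel=16):
    """Kernel-weighted histogram of a BGR patch, normalized to sum to 1."""
    step = 256 // bins_per_channel
    q = patch.astype(np.int64) // step  # per-channel bin index
    # Combine the three channel indices into one bin number: r, then g, then b.
    idx = (q[..., 2] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 0]
    hist = np.bincount(idx.ravel(), weights=weights.ravel(),
                       minlength=bins_per_channel ** 3)
    return hist / weights.sum(), idx

# Hypothetical patch filled with a single BGR color.
patch = np.zeros((8, 10, 3), dtype=np.uint8)
patch[...] = (32, 64, 200)

weights = kernel_weights(*patch.shape[:2])
hist, idx = weighted_color_hist(patch, weights)
print(int(np.argmax(hist)), hist.sum())
```

The returned `idx` array plays the role of `q_temp2` in the loop version, so the per-pixel weights can later be looked up in one step instead of pixel by pixel.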

4. Supplementary knowledge

4.1 Histogram

The function cv2.calcHist builds the histogram of an image. Its common syntax is as follows:

hist=cv2.calcHist(images, channels, mask, histSize, ranges)

images: the input image(s)
channels: which channel to use; for a three-channel image this can be [0], [1] or [2]
mask: a uint8 array of the same size as the image, nonzero where pixels should be counted and 0 where they should be ignored; usually None. If a mask is given, only the masked pixels are counted
histSize: the number of bins, usually 256; for the H channel use 180
ranges: the range of pixel values, usually [0,256] for values 0~255 (the upper bound is exclusive); for the H channel it is [0,180]

Note that every parameter except the mask must be wrapped in [], as shown below:

hist=cv2.calcHist([img],[0],mask,[180],[0,180])
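To see what such a call counts, the same numbers can be reproduced with plain NumPy. This is an illustration with a made-up hue channel and mask; cv2.calcHist itself returns the counts as a float32 array of shape (180, 1) rather than a flat array.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical hue channel: OpenCV stores 8-bit hue values as 0..179.
hue = rng.integers(0, 180, size=(60, 80), dtype=np.uint8)

# Mask: nonzero marks the pixels that should be counted.
mask = np.zeros_like(hue)
mask[10:50, 20:60] = 1

# One bin per hue value, counting only the masked pixels.
hist = np.bincount(hue[mask > 0], minlength=180).astype(np.float32)
print(hist.shape, hist.sum())
```

With 180 bins over the range [0, 180), each bin corresponds to exactly one hue value, and the counts add up to the number of masked pixels.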

4.2 Normalization

Next, the counts in the color histogram need to be normalized. Each value in the histogram is a number of pixels. Call the largest count (the height of the tallest column) max; the y values of the histogram then lie in [0,max], and we need to rescale this range to [0,255].

This is done with the cv2.normalize function, whose main syntax is as follows:

cv2.normalize(src,dst, alpha,beta, norm_type)
·src: the input array.
·dst: the output array, the same size as src.
·alpha: the norm value to normalize to, or the lower range boundary in the case of range normalization.
·beta: the upper range boundary in the case of range normalization; it is not used for norm normalization.
·norm_type: the normalization type (described in detail below).

The parameter to pay attention to here is norm_type, the normalization type. It has the following options.

NORM_MINMAX: the values are shifted and scaled linearly into a specified range.
NORM_INF: normalizes the L∞ norm (Chebyshev distance, the maximum absolute value) of the array
NORM_L1: normalizes the L1 norm (Manhattan distance, the sum of absolute values) of the array
NORM_L2: normalizes the L2 norm (Euclidean distance) of the array

These terms may look grand, but they are actually simple. Let's go through them one by one. (If you are not very interested, read only the first, NORM_MINMAX, and ignore the remaining three.)

The first is NORM_MINMAX, the most commonly used normalization method. Take the tallest column mentioned above, whose height is max. With this method and a target range of [0,255], max is mapped to 255, the smallest count is mapped to 0, and the other columns are scaled linearly in between (much like scaling similar triangles). That is all there is to it. Readers who do not need the other three can skip the rest of this section, because they are not commonly used.
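The linear rescaling can be written out directly. The sketch below mirrors the min-max idea in NumPy; the function name `minmax_normalize` is hypothetical, and returning the lower bound for a constant array is an assumption of this sketch, not documented OpenCV behavior.

```python
import numpy as np

def minmax_normalize(values, low=0.0, high=255.0):
    """Linearly map the smallest value to `low` and the largest to `high`."""
    values = np.asarray(values, dtype=np.float32)
    vmin, vmax = values.min(), values.max()
    if vmax == vmin:
        # Assumption of this sketch: a constant array maps to the lower bound.
        return np.full_like(values, low)
    return (values - vmin) * (high - low) / (vmax - vmin) + low

counts = np.array([0, 10, 40, 20], dtype=np.float32)  # made-up bin counts
scaled = minmax_normalize(counts)
print(scaled)
```

The tallest bin (40) becomes 255, the empty bin stays 0, and the others keep their proportions.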

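Here are the remaining three. Each divides the array by one of its norms and multiplies by alpha, so that the chosen norm of the result equals alpha: NORM_INF uses the maximum absolute value, NORM_L1 the sum of absolute values, and NORM_L2 the square root of the sum of squares.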
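For the three norm-based types listed above, a short NumPy sketch shows the arithmetic. The function `norm_normalize` and its `kind` argument are invented for this example and are not part of OpenCV; leaving an all-zero array unchanged is likewise an assumption.

```python
import numpy as np

def norm_normalize(values, alpha=1.0, kind="l2"):
    """Scale `values` so that the chosen norm of the result equals `alpha`."""
    values = np.asarray(values, dtype=np.float64)
    if kind == "inf":
        norm = np.abs(values).max()          # largest absolute value
    elif kind == "l1":
        norm = np.abs(values).sum()          # sum of absolute values
    elif kind == "l2":
        norm = np.sqrt((values ** 2).sum())  # Euclidean length
    else:
        raise ValueError("unknown norm kind: " + kind)
    # Assumption of this sketch: an all-zero array is returned unchanged.
    return values * alpha / norm if norm > 0 else values.copy()

counts = [3.0, 4.0]
for kind in ("inf", "l1", "l2"):
    print(kind, norm_normalize(counts, kind=kind))
```

For the array [3, 4] the three norms are 4, 7 and 5, so the same data ends up scaled three different ways.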

4.3 Histogram Backprojection

In simple terms, back projection outputs an image of the same size as the input image (the image to be searched), in which each pixel value represents the probability that the corresponding point of the input image belongs to the target object (the target we want to track). Put even more simply, the brighter (whiter) a point in the output image, the more likely it is part of the target. The output is a single-channel grayscale image with values from 0 to 255, which is why the histogram values were compressed to the range 0-255 during normalization: they can then be assigned to the output pixels directly.

Histogram Backprojection 1     Histogram Backprojection 2

The darker an area, the less likely it belongs to the tracked target; the brighter, the more likely. The function used here is cv2.calcBackProject, and its syntax is as follows:

dst=cv2.calcBackProject(images,channels,hist,ranges,scale)

images: the input image(s)
channels: the channels used to compute the back projection; they must match the channels used to build the histogram
hist: the input histogram
ranges: the value range of the histogram
scale: a scale factor applied to the output values, usually 1 (the output always has the same size as the input image)
dst: the output image

Note: every parameter except hist and scale must be wrapped in []

  For example:  

dst=cv2.calcBackProject([hsv],[0],hist,[0,180],1)
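With one bin per hue value, back projection amounts to a table lookup: every pixel is replaced by the histogram value of its own hue. The NumPy sketch below uses a made-up histogram and a tiny hue image to show the idea; it is not how OpenCV implements the function internally.

```python
import numpy as np

# Hypothetical normalized hue histogram: only hues 60..69 belong to the target.
hist = np.zeros(180, dtype=np.float32)
hist[60:70] = 255

# Hypothetical hue channel of the frame being searched.
hue = np.full((4, 6), 10, dtype=np.uint8)
hue[1:3, 2:5] = 65  # a small patch with the target's hue

# Back projection: look up each pixel's hue in the histogram.
back_projection = hist[hue].astype(np.uint8)
print(back_projection)
```

The patch with the target's hue comes out white (255) and everything else black (0), which is exactly the kind of image that meanshift then searches for its densest region.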


Origin blog.csdn.net/weixin_45823221/article/details/128478105