Construction of an epidemic crowd density detection system based on the histogram of oriented gradients (HOG)

1 Mission objectives

  1. Understand the importance of crowd density detection
  2. Understand the HOG-based crowd density detection system
  3. Master the principles of HOG
  4. Master the application of OpenCV in image processing
  5. Use OpenCV and HOG to build a simple human flow detection system

2 Task description

2.1 Crowd density detection

COVID-19 is an infectious disease caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and it spreads from person to person. To slow the speed and limit the scope of its spread, regions across the country have introduced policies that limit crowd density.


The traditional method relies on an administrator's experience to judge qualitatively how dense the crowd in an area is, but with the rapid development of computer image processing, this work can now be done by a computer through a camera.

2.2 History of crowd density detection

In 2001, Paul Viola and Michael Jones proposed the Haar cascade detector in their paper, and the method was soon widely used in crowd density detection.

In 2005, the appearance of the histogram of oriented gradients (HOG) greatly improved the accuracy of crowd density detection.

In 2012, convolutional neural networks achieved remarkable results in image classification tasks and began to be widely used across computer vision. A large number of excellent convolutional neural networks, such as the R-CNN series and the YOLO series, have also appeared in object detection tasks, including crowd density detection.

Crowd density detection methods based on convolutional neural networks can often achieve high accuracy, but most of these algorithms require far more computing power than the traditional Haar and HOG approaches, so the two traditional methods are still widely used today.

3 Knowledge preparation

In order to better complete the task, we need to master some basic knowledge.

3.1 Pedestrian detection

In computer vision, crowd density detection is built on pedestrian detection methods.

The task of a pedestrian detection algorithm is to find all pedestrians in an image or video, determine their position and size, and mark each one with a rectangular box.

3.2 Directional gradient histogram

The histogram of oriented gradients (HOG) converts a three-channel color image into a feature vector by extracting useful information from the image. We use this feature vector to complete the pedestrian detection task.

Before computing features, we must first perform the necessary preprocessing on the image to enhance its characteristics. The most common steps are resizing the image and converting it to grayscale; both speed up the computation while strengthening the features that HOG relies on.

After preprocessing, we obtain the gradient maps of the image by convolving it with two simple kernels: the 1-D kernel [-1, 0, 1] for horizontal gradients, and its transpose for vertical gradients.

Then the entire image is divided into 8×8 cells (each cell contains 128 values: 64 gradient magnitudes and 64 gradient directions).

In each cell, all gradient values are mapped into a vector of length 9: the gradient histogram. Each of the 9 bins covers a 20° range of unsigned orientation (0° to 180°), and each pixel votes its gradient magnitude into the bins.

Finally, block normalization is used to reduce the sensitivity of the features to lighting. The usual practice is a 16×16 block that combines four neighboring cells, whose histograms are concatenated and normalized into one 36×1 vector.

The last step is to concatenate all these 36×1 vectors into one long feature vector, which serves as the input for subsequent detection.
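To make the binning step concrete, here is a minimal NumPy sketch (not OpenCV's implementation) that computes gradients with the [-1, 0, 1] kernels and accumulates one 9-bin histogram for a single 8×8 cell. For simplicity each pixel votes its full magnitude into the nearest bin, whereas real HOG interpolates between neighboring bins.

```python
import numpy as np

def cell_histogram(cell, bins=9):
    """Build a 9-bin gradient histogram for one 8x8 grayscale cell.

    Simplified sketch: gradients come from the [-1, 0, 1] kernels, and
    each pixel votes its whole magnitude into a single bin (real HOG
    interpolates between the two nearest bins).
    """
    cell = cell.astype(np.float64)
    # horizontal and vertical gradients via central differences
    gx = np.zeros_like(cell)
    gy = np.zeros_like(cell)
    gx[:, 1:-1] = cell[:, 2:] - cell[:, :-2]
    gy[1:-1, :] = cell[2:, :] - cell[:-2, :]

    magnitude = np.hypot(gx, gy)
    # unsigned orientation in [0, 180) degrees
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0

    hist = np.zeros(bins)
    bin_width = 180.0 / bins  # 20 degrees per bin
    for mag, ang in zip(magnitude.ravel(), angle.ravel()):
        hist[int(ang // bin_width) % bins] += mag
    return hist

# a cell with a purely horizontal intensity ramp: all gradients point along x
ramp = np.tile(np.arange(8, dtype=np.float64), (8, 1))
h = cell_histogram(ramp)
print(h)  # all the gradient energy lands in bin 0 (0-20 degrees)
```

For this ramp every nonzero gradient has orientation 0°, so the entire histogram mass falls into the first bin.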

All of the above steps are integrated in OpenCV and can be performed directly through the HOGDescriptor() class.

3.3 Support Vector Machine

In 2005, Navneet Dalal and Bill Triggs found through extensive testing that, with HOG as the feature extractor, a linear support vector machine (SVM) classifier gives the best overall balance of speed and accuracy. Our task also uses a support vector machine as the classifier.

The support vector machine is a binary classification model. Its basic form is a linear classifier with the maximum margin in the feature space: it solves for the separating hyperplane that correctly divides the training data and has the largest geometric margin.

In layman's terms, it finds a hyperplane that separates the samples as widely as possible.

This margin-maximization strategy is what distinguishes the support vector machine from the perceptron (which can be understood as a single-layer neural network), and it also turns training a support vector machine into a quadratic programming problem.
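To make the "largest geometric margin" idea concrete, the sketch below (NumPy only, on a hypothetical 2-D toy set, not OpenCV's trained model) measures the geometric margin of a given hyperplane w·x + b = 0: the smallest value of y_i(w·x_i + b)/‖w‖ over all samples. The SVM is the classifier that picks the w and b maximizing this minimum.

```python
import numpy as np

def geometric_margin(w, b, X, y):
    """Smallest signed distance y_i * (w . x_i + b) / ||w|| over all samples.

    It is positive for every sample only when the hyperplane separates
    the two classes; the SVM chooses w, b to maximize this minimum.
    """
    distances = y * (X @ w + b) / np.linalg.norm(w)
    return distances.min()

# toy 2-D data: class +1 on the right, class -1 on the left
X = np.array([[2.0, 1.0], [3.0, 2.0], [-2.0, -1.0], [-2.0, 3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# two candidate hyperplanes through the origin
m_good = geometric_margin(np.array([1.0, 0.0]), 0.0, X, y)    # vertical line x1 = 0
m_tilted = geometric_margin(np.array([1.0, 1.0]), 0.0, X, y)  # line x1 + x2 = 0

print(m_good)    # 2.0: every sample is at least 2 units from the line
print(m_tilted)  # negative: this line fails to separate the classes
```

Here the vertical line separates the data with margin 2, while the tilted line puts one sample on the wrong side, so its minimum is negative; an SVM solver would return the separating hyperplane with the largest such minimum.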

The HOG+SVM detector has been integrated in OpenCV, so we only need to call it directly.

3.4 OpenCV

The full name of OpenCV is the Open Source Computer Vision Library. It is a cross-platform computer vision library that is free to use in both commercial and research settings.

OpenCV can be used to develop real-time image processing, computer vision and pattern recognition programs, so it has become one of the most widely used computer vision libraries. Many deep learning programs use OpenCV for image preprocessing.

3.5 Video reading based on OpenCV

OpenCV has a built-in video reading object, and we can read the video frame by frame by constructing this object.

The function to create the camera object is

cap = cv2.VideoCapture(path)

Here path is the video path, and the returned cap is the camera object. With this object, we can read one frame of the video as follows.

ret,frame = cap.read()

This method returns two values: the first indicates whether a frame was successfully read, and the second is the frame image as an array (if one exists). Each call to read() takes out one frame; calling it again takes out the next frame (if any).

Usually, we use a loop function to continuously extract the frame images of the video.

while True:
    ret, frame = cap.read()

    # exit once the video has been fully read
    if not ret:
        break

    ### arbitrary processing code ###

Finally, we release the camera object by calling its release() method, to avoid problems such as the program hanging or the resource staying locked.

cap.release()

4 Project implementation

Based on the task description and the knowledge preparation, we are now ready to complete the task. Below we will use OpenCV to do so.

4.1 Implementation ideas

For crowd density statistics, we use the following scheme: when the number of pedestrians in the area is below a threshold, every detected person is marked with a green box; when the threshold is exceeded, the boxes turn red.

Implementation steps:

  1. Import libraries required for experiment
  2. Read video resources and initialize
  3. Picture preprocessing
  4. Build the overall system

4.2 Implementation steps of data preprocessing

Step 1: Import the libraries needed for the experiment

The libraries required for this experiment are:

import cv2

OpenCV includes the HOG and SVM classifiers we need for this task.

Step 2: Read the video resource and initialize

First of all, as an epidemic crowd density statistics system, we need to set a warning threshold on the number of people. When the number of people in the image exceeds this threshold, a warning is triggered (in this task, the boxes turn red).

# number of people that triggers the epidemic warning
LIMIT = 10

Subsequently, in order to analyze the effect of the system, we use the OXFORD TOWN CENTRE dataset as test data, and we construct a camera object for subsequent calls.

# create the camera object
cap = cv2.VideoCapture('./data/data.flv')

Then we use OpenCV to create the HOG+SVM detector directly, following the method described above.

# initialize the HOG+SVM detector
hog = cv2.HOGDescriptor()

Here we use the SVM classification model already trained by OpenCV, which can detect pedestrians. It is loaded directly as follows.

hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

At this point, the initialization work has been completed, and the next step is to process the input image.

Step 3: Image preprocessing

The effect of using the raw image directly is often unsatisfactory, so we process the image appropriately to improve HOG's feature extraction, allowing the SVM classifier to locate pedestrians more reliably.

First, we need to reduce the amount of calculation by correcting the size of the image.

img = cv2.resize(img,(1280, 720)) 

Subsequently, since HOG works better on grayscale images, we convert the image to grayscale with the cvtColor() function. Note that OpenCV reads frames in BGR channel order, so the conversion code is COLOR_BGR2GRAY.

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

Finally, we wrap these two operations in a single function, which makes the program clearer and easier to maintain.

def preProcessing(img):
    """Preprocess one frame."""

    # resize the image
    img = cv2.resize(img, (1280, 720))

    # convert to grayscale (OpenCV frames are BGR)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    return img, gray

It is worth noting that, in addition to the processed image, the return value of the function also contains the original image with the corrected size, which can facilitate subsequent calls.

Step 4: Build the overall system

The overall idea of the system is one large loop that continuously extracts frames from the video, preprocesses each frame, and runs the HOG+SVM model to detect pedestrians.

To do this, we first create a loop that reads the video, following the method described in section 3.5.

# enter the main loop
while True:
    # read the next frame of the video
    ret, frame = cap.read()

    ### frame-processing code goes here ###

Subsequently, we run the HOG+SVM model through OpenCV's detectMultiScale() method. This method returns two arrays: the bounding box of each detected pedestrian (x, y, width, height) and the weight of each box (i.e., its confidence; the higher it is, the more likely the box contains a pedestrian).

# enter the main loop
while True:
    # read the next frame of the video
    ret, frame = cap.read()

    frame, gray = preProcessing(frame)

    # run the detector
    rects, weights = hog.detectMultiScale(gray)

The detection results alone cannot show the system's effect, so we use the rectangle() function to draw the boxes on the image.

# enter the main loop
while True:
    # read the next frame of the video
    ret, frame = cap.read()

    frame, gray = preProcessing(frame)

    # run the detector
    rects, weights = hog.detectMultiScale(gray)

    # drawing: the arguments are the input image, top-left corner,
    # bottom-right corner, BGR color, and border thickness
    for i, (x, y, w, h) in enumerate(rects):
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 255, 255), 2)

    # show the frame
    cv2.imshow("process", frame)

The task requires choosing the box color based on a threshold, so we set the color according to this condition.

# enter the main loop
while True:
    # read the next frame of the video
    ret, frame = cap.read()

    frame, gray = preProcessing(frame)

    # run the detector
    rects, weights = hog.detectMultiScale(gray)

    # box color: red (BGR) once the count exceeds the limit, else green
    color = (0, 0, 255) if len(rects) > LIMIT else (0, 255, 0)

    # drawing
    for i, (x, y, w, h) in enumerate(rects):
        cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)

    # show the frame
    cv2.imshow("process", frame)

At this point the video can be drawn normally, but an error occurs when the video ends. To solve this problem, we need to check whether the video has finished or an abnormal situation has occurred.

At the same time, when the main loop terminates, release() frees the camera object and destroyAllWindows() closes the open windows.

# enter the main loop
while True:
    # read the next frame of the video
    ret, frame = cap.read()

    # check whether a next frame exists
    if ret:
        # preprocess the frame
        frame, gray = preProcessing(frame)

        # run the detector
        rects, weights = hog.detectMultiScale(gray)

        # box color: red (BGR) once the count exceeds the limit, else green
        color = (0, 0, 255) if len(rects) > LIMIT else (0, 255, 0)

        # drawing
        for i, (x, y, w, h) in enumerate(rects):
            cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)

        # show the frame
        cv2.imshow("process", frame)
    else:
        print("Video finished or an unknown error occurred")
        break

# release the capture and close the windows
cap.release()
cv2.destroyAllWindows()

The basic part is now complete, but while the video plays we find that some boxes overlap. We can decide whether each box should be drawn by adding a weight check.

# enter the main loop
while True:
    # read the next frame of the video
    ret, frame = cap.read()

    # check whether a next frame exists
    if ret:
        # preprocess the frame
        frame, gray = preProcessing(frame)

        # run the detector
        rects, weights = hog.detectMultiScale(gray)

        # box color: red (BGR) once the count exceeds the limit, else green
        color = (0, 0, 255) if len(rects) > LIMIT else (0, 255, 0)

        # drawing
        for i, (x, y, w, h) in enumerate(rects):
            # skip low-weight (low-confidence) boxes
            if weights[i] < 0.7:
                continue
            cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)

        # show the frame
        cv2.imshow("process", frame)
    else:
        print("Video finished or an unknown error occurred")
        break

# release the capture and close the windows
cap.release()
cv2.destroyAllWindows()

Finally, to allow the operator to exit the loop early, we add a key-press check that determines whether to end early.

# enter the main loop
while True:
    # read the next frame of the video
    ret, frame = cap.read()

    # check whether a next frame exists
    if ret:
        # preprocess the frame
        frame, gray = preProcessing(frame)

        # run the detector
        rects, weights = hog.detectMultiScale(gray)

        # box color: red (BGR) once the count exceeds the limit, else green
        color = (0, 0, 255) if len(rects) > LIMIT else (0, 255, 0)

        # drawing
        for i, (x, y, w, h) in enumerate(rects):
            # skip low-weight (low-confidence) boxes
            if weights[i] < 0.7:
                continue
            cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)

        # show the frame
        cv2.imshow("process", frame)
    else:
        print("Video finished or an unknown error occurred")
        break

    # exit the loop when the Q key is pressed
    if cv2.waitKey(100) & 0xFF == ord("q"):
        break

# release the capture and close the windows
cap.release()
cv2.destroyAllWindows()

Running a simple test shows the effect.

It can be seen that the system recognizes pedestrians normally, but its accuracy is limited.

5 Knowledge development

Pedestrian detection is essentially an object detection problem. Object detection integrates object recognition and object localization: it must not only identify an object's class but also obtain its specific location.

Building on the system above, we can feed the detected pedestrians into other systems for further analysis. For example, a convolutional neural network could analyze whether each person is wearing a mask.

In addition, we can use neural networks to segment the pedestrians. This yields a pixel-accurate pedestrian mask (rather than a box), which can greatly improve detection accuracy and avoid many kinds of misjudgment.

Origin blog.csdn.net/NikkiElwin/article/details/107601133