Traditional crowd counting methods: object detection and regression

Data labeling methods:

(1) When there are few people and each appears large, a bounding box frames each person from head to toe as a rectangle. The box records only the coordinates of three points: bottom left, top left, and bottom right. At test time, the model outputs, in addition to these coordinates, a confidence that the box contains a person.

(2) When people are many, dense, and small, their bodies often overlap and occlude each other, so a red dot is placed on each head to mark one person.

 

Evaluation metrics

MAE (mean absolute error) is the average absolute difference between the predicted count and the actual count, and MSE (mean squared error) is the average squared difference between them.
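A minimal NumPy sketch of the two metrics over per-image counts; the function names `mae` and `mse` are illustrative. Note that many deep-learning crowd-counting papers report the square root of the mean squared error under the name "MSE", so it is worth checking which convention a paper uses before comparing numbers.

```python
import numpy as np


def mae(true_counts, pred_counts):
    """Mean absolute error between ground-truth and predicted per-image counts."""
    z = np.asarray(true_counts, dtype=float)
    z_hat = np.asarray(pred_counts, dtype=float)
    return float(np.mean(np.abs(z - z_hat)))


def mse(true_counts, pred_counts):
    """Mean squared error between ground-truth and predicted per-image counts."""
    z = np.asarray(true_counts, dtype=float)
    z_hat = np.asarray(pred_counts, dtype=float)
    return float(np.mean((z - z_hat) ** 2))


truth = [10, 20, 30]
pred = [12, 18, 33]
print(mae(truth, pred))  # (2 + 2 + 3) / 3
print(mse(truth, pred))  # (4 + 4 + 9) / 3
```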

 

Crowd counting algorithms fall into three main categories: detection-based, regression-based, and density-estimation-based.

Classification of methods

 

The oldest method is Jacobs' method: divide the area occupied by a crowd into sections (units), determine the average number of people in each section, and multiply by the number of sections occupied.
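Jacobs' method is plain arithmetic. The sketch below assumes the per-section average comes from counting a few representative sections by hand; that sampling step and the name `jacobs_estimate` are illustrative assumptions, not part of the original description.

```python
def jacobs_estimate(sampled_counts, sections_occupied):
    """Jacobs' method: average people per section times number of occupied sections.

    sampled_counts: head counts from a few representative sections (assumed
    sampling scheme; the method only needs an average per section).
    """
    if not sampled_counts:
        raise ValueError("need at least one sampled section")
    average_per_section = sum(sampled_counts) / len(sampled_counts)
    return average_per_section * sections_occupied


# 3 sampled sections averaging 10 people; the crowd covers 50 sections
print(jacobs_estimate([8, 12, 10], 50))  # 500.0
```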

(1) Object detection + object tracking (the earliest approach)

Pedestrian detector

A window-like (sliding-window) detector, built on one of several classifiers, identifies people in an image or video, and the detections are counted.

Overview

 

Detection-based approaches [1] fall into two categories: whole-body detection and part-body detection. (1) Whole-body detection methods, such as [2,3,4,5], are the typical traditional methods: a sliding window scans the scene, and a classifier trained on features extracted from the whole pedestrian body, such as wavelets, HOG, and edges, detects pedestrians. Learning algorithms mainly include SVM, boosting, and incremental random forests. Whole-body detection is mainly suited to sparse crowds; as crowd density increases, occlusion between people becomes more and more severe. (2) Part-body detection methods were therefore introduced for crowd counting. They train classifiers on specific body parts: [6,7] count people by detecting parts such as the head and shoulders. A shape-learning variant models humans with ellipsoidal 3D shapes and uses a stochastic process to estimate the number of people and their shape parameters from the foreground mask. Compared with whole-body detection, this gives a slight improvement.

[2] Histograms of oriented gradients for human detection. [3] Pedestrian detection in crowded scenes. [4] Monocular pedestrian detection: Survey and experiments. [5] Pedestrian detection via classification on Riemannian manifolds. [6] Object detection with discriminatively trained part-based models. [7] Detection and tracking of multiple, partially occluded humans by Bayesian combination of edgelet based part detectors.

(S1) First run object detection: a sliding-window detector slides from the top-left corner to the bottom-right corner of the image (I don't think this is quite right, since the box size should differ at different positions), identifies all the heads, and puts a bounding box on each one. (S2) Then count the boxes to get the number of people.
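A toy sketch of (S1) and (S2): scan windows at one or more sizes (several sizes address the note that boxes should differ by position), keep windows whose score passes a threshold, merge duplicates with greedy non-maximum suppression, and count what survives. The scorer passed as `score_fn` is a hypothetical stand-in for a trained head classifier; all function names and thresholds are assumptions for illustration.

```python
import numpy as np


def sliding_windows(img_h, img_w, win, stride):
    """Yield (x, y, w, h) for square windows scanned top-left to bottom-right."""
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield (x, y, win, win)


def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thr):
    """Greedy non-maximum suppression: keep the best box, drop heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thr for k in kept):
            kept.append(i)
    return [boxes[i] for i in kept]


def count_by_detection(image, score_fn, win_sizes, stride, score_thr=0.5, iou_thr=0.3):
    """(S1) scan and score windows at several sizes; (S2) count surviving boxes."""
    h, w = image.shape[:2]
    boxes, scores = [], []
    for win in win_sizes:
        for box in sliding_windows(h, w, win, stride):
            x, y, bw, bh = box
            s = score_fn(image[y:y + bh, x:x + bw])
            if s >= score_thr:
                boxes.append(box)
                scores.append(s)
    kept = nms(boxes, scores, iou_thr)
    return len(kept), kept


# Toy image with two bright 10x10 "heads"; mean brightness stands in for a classifier
img = np.zeros((60, 60))
img[5:15, 5:15] = 1.0
img[35:45, 35:45] = 1.0
n, boxes = count_by_detection(img, lambda p: float(p.mean()), win_sizes=(10,), stride=5)
print(n, boxes)
```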

Traditional machine learning methods: HOG-based detectors (this list is not comprehensive, and an incomplete list is not very useful)

Deep learning methods: YOLOs or RCNNs

Requirements for this method:

Because the detector must decide whether a region contains a head or not, which is a binary classification, it requires a well-trained classifier that can extract low-level features. (I think this is wrong: classification relies on the ability to extract features, but what gets extracted need not be low-level features.)

Advantages: (1) In sparse scenes, detection accuracy can be very high. (2) It can locate each person's specific position. (3) The model generalizes well and suits a variety of scenes.

Shortcomings:

(1) It performs poorly in dense scenes, with many missed detections: results are unsatisfactory when occlusion and background clutter occur in extremely dense crowds. Crowds of more than 2,000 people cannot be counted, because A. occlusion (overlap) in such scenes is often severe; B. target features are not clearly distinguishable or visible in a dense crowd; C. the randomness or clutter in the background is high.

For example, the figure below compares detection-based and regression-based methods: once the number of people in an image exceeds 20, the detection score of the detection-based method falls below 0.3, whereas the count produced by the regression method keeps increasing as the number of people increases.

 

(2) It places high demands on training data: the specific location of every target must be annotated.

Annotation examples: (1) a rectangle drawn from head to toe; (2) a square box on the person's head only; (3) no box, with a number written above each person's head indicating the count of people passing by.

Surveillance video: traditional crowd counting via motion-based crowd segmentation

Blog URL: https://mp.weixin.qq.com/s?__biz=MzI1

Compared with still images, video contains motion information, so most video-based crowd counting algorithms follow three steps: 1) foreground segmentation; 2) feature extraction; 3) count regression. Each step is described below.

1) Foreground segmentation. The purpose of foreground (crowd) segmentation is to separate the crowd from the image to facilitate subsequent feature extraction. Segmentation quality directly affects the final counting accuracy, so it is an important factor limiting the performance of traditional algorithms. Commonly used segmentation algorithms include optical flow, mixture of dynamic textures, wavelets, etc. The disadvantage of motion-based foreground segmentation is obvious: if a person in the video stands still, that person is classified as background, which hurts counting performance. (Foreground extraction is also a kind of feature engineering.)

2) Feature extraction. After foreground segmentation, various low-level features are extracted from the segmented foreground (crowd). Commonly used features include the area and perimeter of the crowd mask, edge count, edge orientation, texture features, Minkowski dimension, etc.

3) Count regression. This step regresses the features extracted in the previous step to the number of people in the image. The regression can be simple linear regression or complex nonlinear regression; commonly used methods include linear regression, piecewise linear regression, ridge regression, Gaussian process regression, etc.

Next, let us walk through the whole pipeline using a representative paper, Privacy Preserving Crowd Monitoring: Counting People without People Models or Tracking (CVPR 2008). Figure 3 shows a classic video crowd counting system: first, the moving crowd is segmented using a mixture of dynamic textures.
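For the feature-extraction step, a small NumPy sketch of two of the listed mask features, area and perimeter. It assumes the segmentation step already produced a binary mask, and it approximates the perimeter as the number of foreground pixels touching background through a 4-neighbour; the helper name `blob_features` is hypothetical.

```python
import numpy as np


def blob_features(mask):
    """Area and perimeter of a binary foreground (crowd) mask.

    Perimeter is approximated as the number of foreground pixels that have
    at least one 4-neighbour in the background.
    """
    m = np.asarray(mask, dtype=bool)
    area = int(m.sum())
    p = np.pad(m, 1, mode="constant", constant_values=False)
    # a pixel is interior only if it and its up/down/left/right neighbours are foreground
    interior = m & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    perimeter = int((m & ~interior).sum())
    return {"area": area, "perimeter": perimeter}


mask = np.zeros((20, 20), dtype=bool)
mask[5:15, 5:15] = True  # a 10x10 blob
print(blob_features(mask))  # {'area': 100, 'perimeter': 36}
```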

 

Because of perspective, people close to the camera occupy more pixels in the image than people far from it, so the authors apply perspective normalization to the crowd.

 

 

Calibrate a ground plane and measure the height $$h_{1}$$ of a person standing on the near horizontal line $$ab$$, then the height $$h_{2}$$ of a person on the far line $$cd$$. Pixels on $$ab$$ get weight 1 and pixels on $$cd$$ get weight $$\frac{h_{1}|\overline{ab}|}{h_{2}|\overline{cd}|}$$; the weight of each pixel in between is obtained by linear interpolation between these two lines. Next, various features (area, perimeter, edge orientation, texture) are extracted from the normalized crowd blobs. Finally, Gaussian process regression maps the extracted features to the number of people in the image.

Mean absolute error (MAE) and mean squared error (MSE) are the standard criteria for measuring algorithm performance: the former characterizes the accuracy of the algorithm, the latter its stability. They are defined as follows:

$$MAE=\frac{1}{N}\sum_{i=1}^{N}\left|z_{i}-\hat{z_{i}}\right|,\qquad MSE=\frac{1}{N}\sum_{i=1}^{N}\left(z_{i}-\hat{z_{i}}\right)^{2}$$

where $$N$$ is the number of test images (video frames), $$z_{i}$$ is the actual number of people in the $$i$$-th image, and $$\hat{z_{i}}$$ is the number of people estimated by the algorithm.
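To make the perspective-normalization weights described earlier concrete, here is a NumPy sketch that builds a per-pixel weight map from the two calibration lines and uses it to compute a normalized blob area. It assumes both lines are horizontal image rows and clamps rows outside the two lines to the nearest line's weight; the clamping and the function name `perspective_weight_map` are assumptions, not details from the paper.

```python
import numpy as np


def perspective_weight_map(height, width, y_ab, y_cd, h1, h2, len_ab, len_cd):
    """Per-pixel weights for perspective normalization (sketch).

    y_ab, y_cd: image rows of the near line ab and the far line cd.
    h1, h2: pixel heights of a reference person standing on ab and on cd.
    len_ab, len_cd: pixel lengths of the segments ab and cd.
    """
    w_ab = 1.0
    w_cd = (h1 * len_ab) / (h2 * len_cd)
    # np.interp needs increasing x-coordinates, so order the two rows first
    (y0, w0), (y1, w1) = sorted([(y_ab, w_ab), (y_cd, w_cd)])
    rows = np.arange(height)
    # linear interpolation between the lines; rows outside them are clamped
    row_weights = np.interp(rows, [y0, y1], [w0, w1])
    return np.repeat(row_weights[:, None], width, axis=1)


weights = perspective_weight_map(240, 320, y_ab=200, y_cd=50,
                                 h1=80, h2=40, len_ab=300, len_cd=150)
mask = np.zeros((240, 320), dtype=bool)
mask[40:60, 100:110] = True            # a small blob near the far line
normalized_area = float((weights * mask).sum())
print(weights[200, 0], weights[50, 0], normalized_area)
```

Weighting pixel counts this way lets a far-away blob contribute roughly as much "area" per person as a nearby one before regression.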

Advantages: traditional motion-based crowd segmentation has an advantage of its own: it can count the number of people moving in different directions in the video, which is difficult for a convolutional neural network that takes a single image as input.

FairMOT model in Baidu PaddlePaddle (I haven't run it yet)

code:

  • Very detailed walkthrough of running the code (Jupyter notebook): https://aistudio.baidu.com/aistudio/projectdetail/2421822

  • Written in more detail than the following: https://aistudio.baidu.com/aistudio/projectdetail/4171185

  • PaddlePaddle/PaddleDetection: https://github.com/PaddlePaddle/paddledetection#%E4%BA%

  • PP-Human real-time pedestrian analysis, full walkthrough (very detailed; installation and running are well explained): https://aistudio.baidu.com/aistudio/projectdetail/3842982

  • Real-time pedestrian analysis PP-Human: https://toscode.gitee.com/zyt111/PaddleDetection/tree/release/2.4/deploy/pphuman

  • Using Baidu AI for video traffic statistics (static + dynamic): code and demo: https://blog.csdn.net/weixin_419

  • Based on PaddleDetection to realize human flow statistics and human detection https://blog.csdn.net/m0_63642362/article/details/121434604

(The article is very detailed, and all parts are perfect, very good!)

A people-counting task needs to:

(1) detect the category and location of each target, and

(2) identify associations between frames, so that the same person in the video is not detected and counted multiple times. For this reason, the FairMOT model from PaddleDetection's multi-object tracking algorithms is chosen for people-flow statistics.

(3) FairMOT is built on the anchor-free CenterNet detector.

(4) Fusing deep and shallow features lets the detection and ReID tasks each obtain the features they need, achieving fairness between the two tasks and higher real-time multi-object tracking accuracy.

(5) Depending on the shooting angle (eye-level or overhead) and the crowd density, this case designs different training schemes:

  • For relatively sparse scenes: trained on the Caltech Pedestrian, CityPersons, CUHK-SYSU, PRW, ETHZ, MOT16 and MOT17 datasets, the model performs full-body detection and tracking of pedestrians. As shown in Figure 2, it marks the detected pedestrians and displays their number in the upper-left corner of the frame, producing pedestrian-flow statistics.

 

  • For relatively dense scenes: occlusion between people becomes very severe, and detecting whole pedestrians would raise the missed-detection rate. Head tracking is therefore used instead: trained on the HT-21 dataset, the model detects and tracks pedestrians' heads and computes people-flow statistics from the detected heads, as shown in Figure 3.

 

Model selection

PaddleDetection mainly provides three multi-object tracking models: DeepSORT, JDE and FairMOT.

  • DeepSORT (Deep Cosine Metric Learning SORT) (1) extends the original SORT (Simple Online and Realtime Tracking) algorithm and (2) adds a CNN that extracts features from the human-body image region cropped by the detector. Building on these deep appearance descriptors, it integrates appearance information to assign detected targets to existing tracks and update them, which amounts to a ReID (re-identification) task. The detection boxes DeepSORT needs can come from any detector; the saved detection results and video frames are then read in for tracking prediction. For the ReID model, the PCB+Pyramid ResNet101 model provided by PaddleClas is chosen here.

  • JDE (Joint Detection and Embedding) (1) learns the object detection task (an anchor-based YOLOv3 detector) and the embedding task (a ReID branch that learns embeddings) simultaneously in a single shared neural network, and (2) outputs detection results together with their matching appearance embeddings. Because one model has two outputs, training is framed as a multi-task joint learning problem. The benefit is that both accuracy and speed are taken into account.

  • FairMOT [the one finally used] (1) is based on the anchor-free CenterNet detector; (2) overcomes the misalignment between anchors and features in anchor-based detection frameworks; (3) fuses deep and shallow features so that the detection and ReID tasks each obtain the features they need; (4) uses low-dimensional ReID features; and (5) proposes a simple baseline of two homogeneous branches that predict pixel-level object scores and ReID features, achieving fairness between the two tasks and higher real-time multi-object tracking accuracy. Weighing accuracy and speed, FairMOT is chosen here for people counting/human detection.

Faster R-CNN

A SOTA model in object detection

It could be used for this, but I didn't find any code that does it

SSD

YOLOv5+deepsort+Fast-ReID

SOTA model in the Detection field

https://blog.csdn.net/zengwubbb/article/details/113422048

The following two articles have the same content; the code is extensive and can be used directly.

YOLOv5 pedestrian detection + DeepSORT tracking + Fast-ReID, which better prevents ReID misidentification under occlusion

——Task: count the total number of people who have appeared in the camera view, and count the pedestrians crossing a custom yellow line

——Model: YOLOv5 for pedestrian detection + DeepSORT for tracking

——Code (reportedly runs as-is):

Tracking and identification only: https://github.com/zengwb-lx/Yolov5-Deepsort-Fastreid

Counting after tracking and identification: https://github.com/zengwb-lx/yolov5-deepsort-pedestrian-counting

  How to solve the two errors above: https://blog.csdn.net/qq_35054151/article/details/118815485

——Example:

 

Traditional machine learning with OpenCV for people-flow statistics (no deep learning)

There is code: https://blog.csdn.net/qq_35054151/article/details/118815485

Haar cascade people detection algorithm

An ML-based approach in which a cascade function is trained from many positive and negative images.

Computer Vision — Detecting objects using Haar Cascade Classifier https://towardsdatascience.com/computer-vision-detecting-objects-using-haar-cascade-classifier-4585472829a9

OpenCV background subtraction

In all these cases, you first need to extract the people or vehicles alone; technically, you need to extract the moving foreground from the static background. Background subtraction is relatively fast, which makes it suitable for real-time people detection.

OpenCV implements three such algorithms:

  1. BackgroundSubtractorMOG

  2. BackgroundSubtractorMOG2

  3. BackgroundSubtractorGMG

How to Use Background Subtraction Methods: https://docs.opencv.org/3.4/d1/dc5/tutorial_background_subtraction.html
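OpenCV's own subtractors are created with factories such as `cv2.createBackgroundSubtractorMOG2()` and applied per frame with `.apply(frame)`. To show the underlying idea without that dependency, below is a much simpler running-average background model in NumPy; it is NOT MOG/MOG2/GMG, and the class name and parameter values are illustrative assumptions. It also reproduces the drawback noted earlier: a person who stands still is gradually absorbed into the background.

```python
import numpy as np


class RunningAverageBackground:
    """Simple background model: exponential running average plus a threshold."""

    def __init__(self, alpha=0.05, threshold=25.0):
        self.alpha = alpha          # how fast the background adapts to new frames
        self.threshold = threshold  # min intensity difference to call a pixel foreground
        self.background = None

    def apply(self, frame):
        f = np.asarray(frame, dtype=float)
        if self.background is None:
            self.background = f.copy()           # first frame initializes the model
            return np.zeros(f.shape, dtype=bool)
        mask = np.abs(f - self.background) > self.threshold
        # blend every pixel into the background, so still objects fade in over time
        self.background = (1 - self.alpha) * self.background + self.alpha * f
        return mask


bgs = RunningAverageBackground()
empty = np.zeros((20, 20))
person = empty.copy()
person[5:15, 8:12] = 200.0
bgs.apply(empty)
first = bgs.apply(person)       # person just walked in
for _ in range(100):
    last = bgs.apply(person)    # person stands still
print(int(first.sum()), int(last.sum()))  # 40 0
```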

Simple HOG detection

HOG (Histogram of Oriented Gradients), proposed in 2005, is a feature descriptor: it counts occurrences of gradient orientations in localized portions of an image, and therefore of each frame of a video.

Histogram of Oriented Gradients explained using OpenCV: https://learnopencv.com/histogram-of-oriented-gradients/
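A simplified NumPy sketch of HOG's core step for a single cell: compute image gradients, then let each pixel vote its gradient magnitude into one of 9 unsigned orientation bins (0 to 180 degrees). Full HOG additionally interpolates votes between neighbouring bins and normalizes histograms over blocks of cells; this sketch omits both, and the function name is hypothetical.

```python
import numpy as np


def cell_orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell.

    Simplified: hard binning, no bilinear vote interpolation or block normalization.
    """
    cell = np.asarray(cell, dtype=float)
    gy, gx = np.gradient(cell)                      # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation in [0, 180)
    bins = np.minimum((angle // (180.0 / n_bins)).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), magnitude.ravel())  # each pixel votes its magnitude
    return hist


cell = np.zeros((8, 8))
cell[:, 4:] = 1.0   # a vertical edge: the gradient points along +x
print(cell_orientation_histogram(cell))  # all weight lands in the 0-20 degree bin
```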

HOG with linear SVM algorithm

The accuracy of simple HOG detection can be further improved by training an SVM classifier to separate positive and negative samples based on the HOG features of sample images.

 

HOG+Fourier+POI

CVPR 2013 paper: Multi-Source Multi-Scale Counting in Extremely Dense Crowd Images

A method that estimates the number of people in a single image of an extremely dense crowd using multiple sources of information. The authors first divide the image into many smaller regions (patches).

 

The number of people in each region is then estimated from three different, complementary sources: 1) HOG-based head detections; 2) Fourier analysis; 3) interest-point-based counting. Combining the three gives the count for each region, and the count for the whole image is the sum over all regions.

Because counts in adjacent regions of a real image do not differ much, the authors use a Markov Random Field (MRF) to smooth the estimated counts across adjacent regions. Figure 6 shows the per-region counts before and after the MRF.
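A simplified sketch of the smoothing idea, assuming the three sources have already been fused into one count per patch. It uses a quadratic (Gaussian) MRF energy, $$\sum_i (c_i - o_i)^2 + \lambda \sum_{i \sim j} (c_i - c_j)^2$$ over 4-neighbouring patches, minimized by Jacobi iteration. This is a stand-in chosen for clarity, not the paper's exact MRF formulation, and `mrf_smooth` and `lam` are hypothetical names.

```python
import numpy as np


def mrf_smooth(observed, lam=1.0, iters=500):
    """Smooth a grid of per-patch counts with a quadratic MRF (Jacobi iteration)."""
    o = np.asarray(observed, dtype=float)
    c = o.copy()
    h, w = o.shape
    deg = np.full((h, w), 4.0)      # number of 4-neighbours of each patch
    deg[0, :] -= 1
    deg[-1, :] -= 1
    deg[:, 0] -= 1
    deg[:, -1] -= 1
    for _ in range(iters):
        nb = np.zeros_like(c)       # sum of the neighbouring estimates
        nb[1:, :] += c[:-1, :]
        nb[:-1, :] += c[1:, :]
        nb[:, 1:] += c[:, :-1]
        nb[:, :-1] += c[:, 1:]
        # zero-gradient condition: (1 + lam * deg) * c_i = o_i + lam * sum of neighbours
        c = (o + lam * nb) / (1.0 + lam * deg)
    return c


patch_counts = np.array([[10, 10, 10],
                         [10, 40, 10],
                         [10, 10, 10]], dtype=float)
smoothed = mrf_smooth(patch_counts)
total = float(smoothed.sum())   # image count = sum over patches
print(np.round(smoothed, 2), round(total, 6))
```

With this particular energy, the optimum keeps the total equal to the sum of the raw patch counts, so smoothing redistributes the estimate between neighbouring patches rather than changing the image-level count.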

SIFT

I don't know the details.

Centroid tracking algorithm (object tracking for video rather than still images)

(The author provides code; I haven't read it in detail and don't know whether it runs. He sells paid courses, and first-time registrants may be able to view the content for free, so if you're not going to read the code right away, don't register yet, or you'll waste your one free chance.)

https://pyimagesearch.com/2018/07/23/simple-object-tracking-with-opencv/

The importance of object tracking for people counters:

Once object tracking is done, people can be counted: object tracking assigns a unique ID to each tracked object, which makes it possible to count unique objects in a video.
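Once each person has a stable ID, counting reduces to bookkeeping over trajectories. The sketch below counts unique IDs and crossings of a virtual counting line (like the yellow line mentioned earlier). It assumes trajectories are given as `{track_id: [(x, y), ...]}` in frame order and that the line is horizontal; the function name and data format are hypothetical.

```python
def count_people(tracks, line_y):
    """Count unique track IDs and crossings of a horizontal virtual line.

    tracks: {track_id: [(x, y), ...]} centroids in frame order (e.g. from a tracker).
    line_y: image row of the counting line.
    """
    down = up = 0
    for points in tracks.values():
        for (_, y0), (_, y1) in zip(points, points[1:]):
            if y0 < line_y <= y1:
                down += 1      # moved from above the line to on/below it
            elif y1 < line_y <= y0:
                up += 1        # moved from on/below the line to above it
    return {"unique_people": len(tracks), "down": down, "up": up}


tracks = {
    0: [(5, 10), (5, 20), (5, 30)],     # walks down across the line
    1: [(50, 40), (50, 25), (50, 10)],  # walks up across the line
    2: [(80, 5), (80, 8)],              # never reaches the line
}
print(count_people(tracks, line_y=24))  # {'unique_people': 3, 'down': 1, 'up': 1}
```

Counting per direction is what lets tracking-based systems report people flow in each direction rather than just a total.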

An ideal object tracking algorithm will:

  • Require the object detection phase only once, i.e., when the object is first detected (??? Why only once? Don't new people need to be detected when they enter the camera's field of view?)

  • Be extremely fast, much faster than running the actual object detector itself

  • Be able to handle a tracked object that “disappears” or moves outside the boundaries of the video frame

  • Be robust to occlusion

  • Be able to pick up objects it has “lost” in between frames

How the centroid tracking algorithm works

It is called centroid tracking because it relies on the Euclidean distance between (1) existing object centroids (i.e., objects the centroid tracker has already seen) and (2) new object centroids in subsequent frames of a video.

Step #1: Accept bounding box coordinates and compute centroids

(1) Put a box on each object, so the coordinates of the box corners are known (object detection methods include color thresholding + contour extraction, Haar cascades, HOG + Linear SVM, SSDs, Faster R-CNNs).

(2) From the corner coordinates of each box, compute its centroid, i.e., the center of the box (the centers of the two boxes in the figure below).

 

Step #2: Compute the Euclidean distance between new bounding boxes and existing objects, to decide which pairs of points belong together after the movement

Calculate the distance between the centroid of each new bounding box and the centroid of each old bounding box.

In the figure below, purple marks the centroids of the two objects already seen in Step #1, and yellow marks the centroids of the bounding boxes detected in the second frame.

The upper two yellow dots are the new positions of the two purple dots; the lower yellow dot is a new object.

Taking each of the two purple points from Step #1 as a center, compute its distance to each of the three yellow points; these distances are drawn in green and red.

 

To decide whether each new object centroid (yellow) is related to an old object centroid (purple), we take each existing centroid as a center and compute the Euclidean distance from it to every new centroid. The distances from the upper-left purple point to the three yellow points are shown in red, and those from the right purple point in green.

 

Step #3: Update (x, y)-coordinates of existing objects: for each ID, discard the old coordinates and store the new ones

How do we decide that an old point and a new point are associated, i.e., which new point corresponds to which previous point? For each old point (purple), find the yellow point with the smallest Euclidean distance: that nearest yellow point is associated with the purple point, meaning it is the purple point's next position along its trajectory.

 

Step #4: Register new objects. After the minimum Euclidean distances are used to associate existing object IDs, any remaining new centroid is registered as a new object.

The bottom-left point did not appear in the first frame but appears in the second, so we (1) assign it a new object ID and (2) store the centroid of its bounding box.

 

Step #5: Deregister old objects that no longer appear

Objects present in the first frame but missing from the second need to be removed.

Any reasonable object tracking algorithm needs to be able to handle when an object has been lost, disappeared, or left the field of view.
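The five steps can be sketched as a small tracker class. This is an illustrative NumPy implementation of the procedure described above, not the tutorial's exact code: matching is greedy (closest pairs first), and `max_disappeared`, the number of consecutive unmatched frames tolerated before an ID is deregistered, is an assumed refinement of Step #5.

```python
from collections import OrderedDict

import numpy as np


class CentroidTracker:
    """Minimal centroid tracker following Steps #1-#5 above (illustrative sketch)."""

    def __init__(self, max_disappeared=50):
        self.next_id = 0
        self.objects = OrderedDict()      # object ID -> (x, y) centroid
        self.disappeared = OrderedDict()  # object ID -> consecutive frames without a match
        self.max_disappeared = max_disappeared

    def register(self, centroid):
        self.objects[self.next_id] = centroid
        self.disappeared[self.next_id] = 0
        self.next_id += 1

    def deregister(self, object_id):
        del self.objects[object_id]
        del self.disappeared[object_id]

    def _mark_missing(self, object_id):
        self.disappeared[object_id] += 1
        if self.disappeared[object_id] > self.max_disappeared:
            self.deregister(object_id)

    def update(self, boxes):
        """boxes: list of (x1, y1, x2, y2) detections for the current frame."""
        if len(boxes) == 0:
            for object_id in list(self.objects):
                self._mark_missing(object_id)
            return self.objects
        # Step 1: centroid of each bounding box
        new = [((x1 + x2) / 2.0, (y1 + y2) / 2.0) for x1, y1, x2, y2 in boxes]
        if not self.objects:
            for c in new:
                self.register(c)
            return self.objects
        ids = list(self.objects)
        old = np.array(list(self.objects.values()))
        # Step 2: Euclidean distance between every existing and every new centroid
        dist = np.linalg.norm(old[:, None, :] - np.array(new)[None, :, :], axis=2)
        # Step 3: match closest pairs first and update each matched ID's coordinates
        rows = dist.min(axis=1).argsort()
        cols = dist.argmin(axis=1)[rows]
        used_rows, used_cols = set(), set()
        for r, c in zip(rows.tolist(), cols.tolist()):
            if r in used_rows or c in used_cols:
                continue
            self.objects[ids[r]] = new[c]
            self.disappeared[ids[r]] = 0
            used_rows.add(r)
            used_cols.add(c)
        # Step 4: unmatched new centroids are registered as new objects
        for c in range(len(new)):
            if c not in used_cols:
                self.register(new[c])
        # Step 5: unmatched existing objects are deregistered after too many misses
        for r in range(len(ids)):
            if r not in used_rows:
                self._mark_missing(ids[r])
        return self.objects


tracker = CentroidTracker(max_disappeared=1)
tracker.update([(0, 0, 10, 10), (100, 100, 110, 110)])   # registers IDs 0 and 1
objects = tracker.update([(2, 2, 12, 12), (102, 100, 112, 110), (0, 200, 10, 210)])
print(dict(objects))  # IDs 0 and 1 moved slightly; ID 2 is a new person
```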

Limitations and drawbacks

(1) It requires the object detection step to run on every frame of the input video, which can be computationally expensive.

For very fast object detectors, such as color thresholding and Haar cascades, running the detector on every frame is likely not an issue. But with a significantly more computationally expensive detector, such as HOG + Linear SVM or a deep learning-based detector, especially on a resource-constrained device such as a Raspberry Pi, the frame-processing pipeline will slow down enormously, which may prevent the computer vision pipeline from running in real time.

(2) It does not handle overlapping objects well, because of the underlying assumption of the centroid tracking algorithm itself: the centroids of the same object must lie close together between subsequent frames.

Consider two objects, one close to the camera and one far from it: they may look close together in the image although they are actually far apart and completely different things. What happens when one object overlaps another?

Object ID switching can occur: if two or more objects overlap to the point where their centroids intersect and each instead has the minimum distance to the other object, the algorithm may (unknowingly) swap their object IDs.

Other tracker recommendations (detection + tracker):

https://pyimagesearch.com/2018/07/30/opencv-object-tracking/

  1. BOOSTING tracker

    1. Based on AdaBoost, the same principle that underlies the Haar cascade algorithm. It is an old algorithm (more than ten years old) and relatively slow.

  2. MIL tracker

    1. More accurate than the BOOSTING tracker, but does a poor job of reporting failure.

  3. KCF tracker

    1. Full name: Kernelized Correlation Filters. Faster than BOOSTING and MIL, but, like MIL, it does not handle full occlusion or overlap between objects well.

  4. CSRT tracker

    1. Full name: Discriminative Correlation Filter with Channel and Spatial Reliability. Tends to be more accurate than KCF but slightly slower.

  5. MedianFlow tracker

    1. Does a nice job reporting failures; however, if there is too large a jump in motion, such as fast-moving objects or objects whose appearance changes quickly, the model will fail.

  6. TLD tracker

    1. Not recommended

  7. MOSSE tracker

    1. Very, very fast, but not as accurate as CSRT or KCF; a good choice if you need pure speed.

  8. GOTURN tracker

    1. The only deep learning-based tracker among those provided by OpenCV

My personal suggestion is to:

  • Use CSRT when you need higher object tracking accuracy and can tolerate slower FPS (frames per second) throughput

  • Use KCF when you need faster FPS throughput but can handle slightly lower object tracking accuracy

  • Use MOSSE when you need pure speed

06-MobileNet-ssd

I can't read it because it's a paid article, but more than 10,000 people have bought it: https://blog.csdn.net/xiao__run/article/details/93196347

YOLO Detector or RCNN

Traditional machine learning methods: HOG-based detectors, or

Deep learning methods: deep-learning-based detectors such as YOLOs or RCNNs

(2) Regression-based methods (more recent)

Count regression

Mostly used from 2008 to 2014

Why it appeared: detection-based methods perform poorly when (1) the crowd is dense and occlusion is severe, and (2) the randomness or clutter in the background is high. Regression-based methods were designed to solve these two problems, because they extract low-level features such as edge values and foreground pixels.

What it is: predict the number of people directly from the image by regression

(S1) Crop the image into small patches.

(S2) Extract features from these patches of the original image: [1] global features (texture, gradient, and edge features); [2] local features (SIFT, Local Binary Pattern (LBP), Histogram of Oriented Gradients (HOG), Gray Level Co-occurrence Matrix (GLCM)); [3] foreground features extracted from foreground segments by background subtraction; [4] various manually designed and extracted features (hand-crafted features).

——I don't know whether these features are local or global, but they are also features, so they are listed here: foreground pixels (foreground features, i.e., the part of the scene that appears closest to the viewer), contour representations, shape descriptions, and gradient characteristics.

——Feature construction is especially important for regression-based methods; most researchers' work went into finding more effective features.

(S3) Use machine learning models to learn the mapping from these features to the number of people, i.e., map the image matrix to a scalar count; regression methods can map input images directly to scalar values. Models used include linear regression, piecewise linear regression, ridge regression, Gaussian process regression, and neural networks.
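A sketch of (S3) using closed-form ridge regression, one of the listed models, from hand-crafted per-image features to counts. The training data here is synthetic (generated from count = 0.05 × area + 0.1 × perimeter + 2) purely to show the mechanics; the feature choice and the names `fit_ridge` and `predict_counts` are assumptions.

```python
import numpy as np


def fit_ridge(features, counts, lam=1.0):
    """Closed-form ridge regression from per-image features to people counts."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(counts, dtype=float)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    penalty = lam * np.eye(Xb.shape[1])
    penalty[-1, -1] = 0.0                            # do not shrink the bias term
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict_counts(weights, features):
    X = np.asarray(features, dtype=float)
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ weights


# Synthetic training data: [foreground area, perimeter] -> count
X_train = [[100, 36], [200, 56], [400, 80], [50, 28], [300, 70]]
y_train = [10.6, 17.6, 30.0, 7.3, 24.0]
w = fit_ridge(X_train, y_train, lam=1e-6)
print(predict_counts(w, [[250, 60]]))
```

A larger `lam` trades a small bias for stability when features are strongly correlated, which is common with area, perimeter, and edge counts from the same mask.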

———Note that steps (S2) and (S3), extracting features and building the mapping, can be implemented together with a CNN; this is called an end-to-end regression method. The first-place solution of a Kaggle counting competition (https://www.kaggle.com/competitions/noaa-fisheries-steller-sea-lion-population-count/overview) implements this idea: it used VGG-16 as the feature extractor and a fully connected architecture whose last layer is a regression layer.

There are two CNN-based crowd counting methods:

——For relatively sparse crowds, direct count regression: input (image) -> CNN -> output (number of people).

——For relatively dense crowds, density map estimation: input (image) -> output (density map) -> estimated number of people. The prediction is only as good as the quality of the density map.
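How a density map encodes a count can be sketched from dot (head) annotations: place one Gaussian per head, normalized so each head contributes exactly 1, then the count is the sum of the map, and summing over a sub-region counts people in that region. The fixed `sigma` is an assumption for simplicity (adaptive kernel widths are also used in the literature), and `density_map` is a hypothetical name.

```python
import numpy as np


def density_map(head_points, shape, sigma=4.0):
    """Ground-truth density map: one normalized Gaussian per annotated head.

    head_points: [(x, y), ...] dot annotations; sigma is a fixed kernel width.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dmap = np.zeros((h, w))
    for x, y in head_points:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        dmap += g / g.sum()   # renormalize so each head integrates to exactly 1
    return dmap


heads = [(10, 12), (30, 40), (31, 42)]
dmap = density_map(heads, shape=(64, 64))
print(round(float(dmap.sum()), 6))            # estimated count = integral of the map
upper_left = float(dmap[:32, :32].sum())      # roughly 1: mostly one head's mass
```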

Advantages: successful in dealing with (1) occlusion and (2) background clutter (I don't know exactly what this refers to); (3) compared with detection-based methods, avoids depending on learned detectors.

Disadvantages:

(1) Unable to correctly understand crowd distributions: it ignores spatial information and cannot localize individuals in the crowd. It can only tell you roughly how many people are in the image, not how many are in the upper-left corner versus the lower-right corner. Density-based methods solve this problem because they perform pixel-wise regression, which also improves model performance: the density map assigns a value (shown as a color) to every coordinate, so you can locate where the crowd is.

(2) Early-stage feature engineering sets the upper limit of model performance, so the method relies heavily on strong feature construction. Deep learning, by contrast, learns the mapping from the raw image to the count directly and works much better than hand-crafted features, so regression-based methods have gradually been abandoned.

(3) The method is very sensitive to image resolution: with low-resolution images, the regression performs poorly.

(4) Robustness may be poor, because the features the network learns are not strongly correlated with people; when the test environment changes, performance can easily break down, and interpretability is also poor.

Comparison of the three methods

 

(Supplementary notes; I didn't understand these either.)

The article notes that no single method can handle the counting problems caused by low resolution, severe occlusion, foreshortening, and perspective, and that there is a spatial relationship that can constrain count estimates in adjacent regions. Observing that dense crowds form irregular, non-uniform textures, it uses Fourier analysis and head detection, filters interest points in the vicinity, combines Fourier analysis, interest points, and head detection to compute counts in local patches, and applies global constraints within a multiscale MRF framework.

For sparse and imbalanced data, cumulative attribute representations for learning regression models have been proposed, mapping features into a cumulative attribute space.


Origin blog.csdn.net/Albert233333/article/details/130433437