A summary and detailed introduction of crowd counting datasets.

Crowd Counting dataset summary


Dataset list: https://github.com/gjy3035/Awesome-Crowd-Counting/blob/master/src/Datasets.md | Dense crowd distribution detection and counting: https://www.sohu.com/a/338406719_823210

Free-view

2022_Pedestrian Attribute Recognition

https://github.com/wangxiao5791509/Pedestrian-Attribute-Recognition-Paper-List ; https://paperswithcode.com/task/pedestrian-attribute-recognition/

Released on September 15, 2022

The original task: Pedestrian Attribute Recognition (in effect a downstream task of crowd counting, useful for profiling foot traffic in commercial districts). Each image is annotated with rich pedestrian-related information, such as hair length, whether the person wears glasses, the color of their socks, and so on.

This dataset could be repurposed for crowd counting.

2022_FUDAN-UCC

2022, Congested, Unlabeled

Note: these are unrelated photos, not frames from a continuous video.

An open question: how are counts obtained? Must users count the images themselves, or do the dataset authors provide the counts?

press photos

sample

some black and white old photos

Some color photos, this one is very crowded

Color photos, not many people

Some news illustrations: hand-drawn pictures containing many people

Manually drawn illustration with few people

2021_RGBT-CC

Congested; RGB images paired with thermal maps

data sample

http://lingboliu.com/RGBT_Crowd_Counting.html

2020_NWPU-Crowd

(The leaderboard rankings are still being refreshed as newer datasets and results appear)

https://gjy3035.github.io/NWPU-Crowd-Sample-Code/

Congested images with many people; supports localization

[146] proposed NWPU-Crowd, currently the largest dataset for crowd counting, with 5,109 images and 2,133,238 labeled instances. The dataset also contains negative samples, which help improve model robustness, and it covers varied lighting conditions with the largest density range, [0, 20,033].

For the leaderboard: there are 5,109 images crawled from the Internet; you may submit up to 6 times per month, and the model performing best on the validation set is taken as the final result; the leaderboard is updated weekly with the performance of newly submitted models.

Every image is annotated, some with boxes and some with points. Annotations are not drawn onto the images themselves but stored in separate files (JSON, .mat, etc.).
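As a sketch of how such point annotations might be consumed (the field names `human_num` and `points` here are assumptions for illustration and should be checked against the official sample code):

```python
import json

# Hypothetical NWPU-style JSON record: the field names ("human_num",
# "points") are assumed for illustration, not a verified schema.
record = json.loads(
    '{"img_id": "3174.jpg", "human_num": 3, '
    '"points": [[10.0, 20.0], [35.5, 40.2], [60.1, 80.9]]}'
)

points = record["points"]  # one [x, y] point per annotated head
count = len(points)        # the ground-truth count is simply the number of points
print(count)               # 3
```

The same idea applies to the .mat variants: the count is the number of annotated points, not a separately stored label.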


data sample

2020_JHU-CROWD++

http://www.crowd-counting.com/

Congested


2018_UCF-QNRF

https://www.crcv.ucf.edu/data/ucf-qnrf/

Congested

Very dense crowds; the task is to count them

(Discrete pictures) 2016_ShanghaiTech Part A

Other codes using this dataset: https://paperswithcode.com/dataset/shanghaitech

First paper: Single Image Crowd Counting via Multi-Column Convolutional Neural Network

Part A has 482 images randomly crawled from the Internet; the viewpoint is free-view, and the images are unrelated to one another (not continuous frames)

Crowd counting/pedestrian counting

The ShanghaiTech A+B dataset contains 1,198 labeled images with 330,165 labeled heads, divided into two parts, A and B. Part A contains 482 images (300 training, 182 testing) collected from the Internet. Part B contains 400 training images and 316 testing images taken on urban streets in Shanghai. Compared with Part A, Part B's crowd density is relatively low. The dataset covers multiple scenes and different density levels, and it is a very challenging benchmark that recent crowd counting work uses for comparison.

——What are the advantages?

  ——More images: 1,198 images

  ——Point annotations. The crowds are too dense for bounding boxes, which would overlap into an unreadable pile; instead, each person in a crowd image is annotated with a single point close to the center of the head.

  ——Many annotated people: the dataset contains 330,165 annotated individuals

  ——Shooting angle: the images vary in perspective and resolution

  ——The ground truth gives the number of people; your model's count is compared against it
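Point annotations like these are usually converted into a density map whose integral equals the head count; models then regress the map. Below is a minimal sketch of the common fixed-width Gaussian recipe (the MCNN paper itself uses geometry-adaptive kernel widths), assuming NumPy is available:

```python
import numpy as np

def density_map(points, height, width, sigma=4.0, truncate=4.0):
    """Turn head-center point annotations into a density map whose
    integral equals the ground-truth count (fixed-kernel sketch)."""
    dm = np.zeros((height, width), dtype=np.float64)
    r = int(truncate * sigma)
    ax = np.arange(-r, r + 1)
    kernel = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()  # normalize so each person contributes exactly 1
    for x, y in points:
        xi, yi = int(round(x)), int(round(y))
        # clip the stamp to the image bounds
        y0, y1 = max(0, yi - r), min(height, yi + r + 1)
        x0, x1 = max(0, xi - r), min(width, xi + r + 1)
        dm[y0:y1, x0:x1] += kernel[y0 - yi + r:y1 - yi + r,
                                   x0 - xi + r:x1 - xi + r]
    return dm

pts = [(30.0, 40.0), (100.0, 50.0), (200.0, 120.0)]
dm = density_map(pts, 240, 320)
print(round(dm.sum()))  # integrates back to the count: 3
```

For heads near the image border the clipped kernel sums to slightly less than 1, which is one reason published pipelines differ in the exact recipe.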

- sample data set

The counted number of people is shown.

(Discrete images) 2013_UCF_CC_50

https://www.crcv.ucf.edu/data/ucf-cc-50/

Congested, extremely dense crowds.

First paper: Multi-Source Multi-Scale Counting in Extremely Dense Crowd Images

The UCF_CC_50 dataset consists of 50 images of different resolutions, containing an average of 1,280 people per image (extremely dense) and 63,075 people in total. The number of individuals per image ranges from 94 to 4,543, and some images contain very dense crowds. The dataset covers a variety of scenes such as concert halls, demonstrations, and stadiums.

Surveillance-view

2022_VSCrowd

(New dataset released in September 2022)

https://github.com/HopLee6/VSCrowd-Dataset

First paper: "Video Crowd Localization with Multi-focus Gaussian Neighborhood Attention and a Large-Scale Benchmark"

634 videos, Localization

For each video, the annotations are saved in the corresponding TXT file (train_XXX.txt or test_XXX.txt), with the following format:

FrameID HeadID x1 y1 x2 y2 p1 p2 HeadID x1 y1 x2 y2 p1 p2 …
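Given that format, one annotation line can be parsed into per-head records like this (a sketch; what p1/p2 encode should be confirmed against the VSCrowd README, so they are only carried along here):

```python
def parse_line(line):
    """Parse one VSCrowd annotation line: a frame ID followed by
    repeating groups of 7 fields (HeadID x1 y1 x2 y2 p1 p2)."""
    tokens = line.split()
    frame_id, rest = tokens[0], tokens[1:]
    heads = []
    for i in range(0, len(rest), 7):
        head_id, x1, y1, x2, y2, p1, p2 = rest[i:i + 7]
        heads.append({
            "id": head_id,
            "box": tuple(map(float, (x1, y1, x2, y2))),
            "p": (p1, p2),  # meaning unverified; kept as-is
        })
    return frame_id, heads

frame, heads = parse_line("0001 1 10 20 30 40 0 1 2 50 60 70 80 0 1")
print(frame, len(heads))  # frame "0001" with 2 annotated heads
```

The per-frame head count then falls out as `len(heads)`.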

2020_DISCO

https://zenodo.org/record/3828468

Audiovisual (with sound), extreme conditions

Each image contains at most 709 people and at least 88.

Audio is included as an extra modality based on the intuition that the louder the ambient sound at a location, the more people are there.

Original paper: https://arxiv.org/abs/2005.07097

Photo sample --

The dataset is relatively large (about 4 GB) and takes a while to download.

2019_Crowd Surveillance

Fill in your personal email address to download: https://ai.baidu.com/broad/download

Free scenes

The image resolution is much higher than in other datasets; it is described as the largest crowd counting dataset with the highest average resolution to date. Every person is annotated. Some of the data comes from partners' security surveillance video, and some was crawled from the Internet.

2019_ShanghaiTechRGBD

Download address: https://github.com/svip-lab/RGBD-Counting

RGB-D data: each RGB image comes with a depth map.

Annotations are stored in .mat files.

ShanghaiTechRGBD/
├── train_data/
│   ├── train_img/*.png
│   ├── train_depth/*.mat
│   └── train_gt/*.mat
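A sketch of pairing each RGB image with its depth and ground-truth `.mat` files by filename stem, following the layout above (whether `train_img/` actually sits inside `train_data/` should be confirmed against the download; the demo builds a throwaway fake layout just so the sketch runs end to end):

```python
import tempfile
from pathlib import Path

def pair_samples(root):
    """Pair each RGB image with its depth map and ground-truth .mat file,
    assuming the train_data/{train_img,train_depth,train_gt} layout."""
    root = Path(root)
    pairs = []
    for img in sorted((root / "train_data" / "train_img").glob("*.png")):
        pairs.append((img,
                      root / "train_data" / "train_depth" / (img.stem + ".mat"),
                      root / "train_data" / "train_gt" / (img.stem + ".mat")))
    return pairs

# Demo on a fake layout (no real dataset needed).
with tempfile.TemporaryDirectory() as tmp:
    for sub in ("train_img", "train_depth", "train_gt"):
        (Path(tmp) / "train_data" / sub).mkdir(parents=True)
    (Path(tmp) / "train_data" / "train_img" / "0001.png").touch()
    pairs = pair_samples(tmp)
    print(len(pairs), pairs[0][2].name)
```

The `.mat` files themselves would then be loaded with something like `scipy.io.loadmat`.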

Sample + Population Characteristics

Both the denser and the sparser samples are shot from above, looking down.

2019_Fudan-ShanghaiTech

Download addresses: https://github.com/sweetyy83/Lstn_fdst_dataset; https://drive.google.com/drive/folders/19c2X529VTNjl3YL1EYweBg60G70G2D-w

in video form

100 videos captured from 13 different scenes,

Dataset Features:

Each frame is extracted from a video, so the images are continuous; they are shot from above.

2019_GCC

400 Fixed Scenes, Synthetic

https://gjy3035.github.io/GCC-CL/

The GCC dataset is built from scenes in the game GTA5; it has rich scenes and a large number of images, all synthetically generated.

The images in GCC are taken from GTA V. Wang et al. designed a data collector and labeler to capture stable in-game images and their head labels. The dataset covers 400 scenes, and the individuals in them vary in skin color, gender, appearance, and so on. The authors also used a step-by-step approach to break the game's limit on the maximum number of individuals per image. This is the largest dataset for crowd counting in both sample size and scene coverage. A common practice is to pre-train a model on GCC and then fine-tune it on real data, which usually yields better counting performance.

(1) Selection of scenes

Based on GTA5, 100 locations are selected; varying the camera pose at each location yields 400 scenes, and different images are then obtained through different ROIs.

(2) Design of human model

265 person models are selected, each varying in skin color, gender, and body shape, with 6 further appearance parameters such as clothes and hairstyle; the models perform random actions in the scene.

(3) Scene synthesis

In the original GTA5, each scene contains at most 256 people. To generate images with larger crowds, scene synthesis is required: crowds captured separately are composited into a single image.

2019_Venice

Results of various models on this dataset: https://paperswithcode.com/sota/crowd-counting-on-venice

I could not find sample images or a description.

Video shot by a camera above a square in Venice (a single fixed location); the photos are frames extracted from the video, so they are continuous.

Annotations (where the people are and how many there are) are stored by the author in .mat files.

2019_CityStreet

Multi-View Crowd Counting Dataset

http://visal.cs.cityu.edu.hk/research/citystreet/

The same place at the same time, photographed by cameras at different positions; the resulting images form a multi-view set.

2019_Beijing-BRT

The ground truth gives the location of the i-th person in the image; each row is a location [x, y].

Bus rapid transit (BRT) waiting areas; the task is counting people.

2018_SmartCity

https://paperswithcode.com/dataset/smartcity

First paper: Crowd counting via scale-adaptive convolutional neural network

50 images in total, collected from ten city scenes including office entrances, sidewalks, atriums, shopping malls, etc. Its defining feature is sparse crowds: the average number of pedestrians is only 7.4, with a minimum of 1 and a maximum of 14.

Discrete, unrelated images, not continuous

 

2017_CityUHK-X

http://visal.cs.cityu.edu.hk/downloads/#cityuhk-x — CityUHK-X: crowd dataset with extrinsic camera parameters

First paper: 2017_Cite=57_Incorporating Side Information by Adaptive Convolution.

The same scene is shot at the same time from different heights and angles, and counting is then performed.

The camera's angle and height are shown in the upper-left corner of each image.

2016_ShanghaiTech Part B

https://www.datafountain.cn/datasets/5670

Part B has 716 images from surveillance video taken on the bustling streets of metropolitan Shanghai.

The images are similar to Part A above.

2016_AHU-Crowd

http://cs-chan.com/downloads_crowd_dataset.html;https://drive.google.com/file/d/1pN35I5MmJA4Ase2dZRdcwsFiOM286fXc/view

Crowds in multiple scenarios: pilgrimage, station, marathon, rallies and stadium

Ground truth is a txt file

old crowd pictures

(Video) 2015_WorldExpo'10

http://www.ee.cuhk.edu.hk/~xgwang/expo.html

First paper: 2015_Zhang_Cite=1148_Cross-scene Crowd Counting via Deep Convolutional Neural Networks

The WorldExpo'10 dataset [14] consists of 3,980 video frames of size 576 × 720 with a total of 199,923 labeled pedestrians. Its training set comes from 1,127 one-minute video sequences across 103 scenes, and its test set comes from five one-hour video sequences in 5 different scenes. Each test scene contributes 120 frames, and the number of individuals per frame is between 1 and 220.

——What kind of "people counting" task did you do ? What are the characteristics?

——Crowded regions are marked with colored polygons; the exact number of people inside a polygon is not marked. In total, 53,637 crowd segments with polygon boundaries, at different crowding levels, were annotated.

——Many scenes (cross-scene): surveillance videos captured by 108 cameras in 108 fixed scenes, all from the Shanghai 2010 World Expo.

——What are the advantages ?

——Rich scenes: crowds, roads, queues, squares, and mixtures of these

——Many annotated frames: 3,980

(1) Are the images continuous frame-by-frame video screenshots? Unknown; I have not inspected the original dataset

(2) Is the number of people per photo provided? No

(3) Since counts are not provided, can they be counted manually? Not realistically: the crowds are too dense. It could be done, but it would take a long time and risk over- or under-counting

(4) Are bounding boxes provided? No: a dot marks each individual person, and a polygon is drawn over each crowded region to indicate that it contains many people.

 

(Video) 2012_Mall

http://personal.ie.cuhk.edu.hk/~ccloy/downloads_mall_dataset.html

collected from a publicly accessible webcam

Each person's head is marked with a cross

The Mall dataset consists of 2,000 video frames of size 320 × 230 containing 6,000 labeled pedestrians, annotated by marking pedestrian heads in every frame. Compared with the UCSD dataset, the Mall dataset has higher crowd density and richer scenes.

(Video) 2008_UCSD

http://www.svcl.ucsd.edu/projects/peoplecnt/; http://www.svcl.ucsd.edu/projects/peoplecnt/demo.htm

People entering and leaving are counted, and the deviation between the predicted and actual counts is measured. Red, yellow, and green indicate the accuracy of the estimate.

The UCSD dataset contains 2,000 frames from a video sequence, with ground truth for every 5th frame. It was filmed by stationary cameras overlooking sidewalks, so scene variety is limited, though the pedestrian density ranges from sparse to crowded. This was the first dataset created for crowd counting. Because of its early release it has many limitations, such as a single collection location and a single scene; its data distribution does not match many real-world settings, making it unsuitable for more general applications.
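Across these datasets, counting performance is usually reported as MAE and RMSE between predicted and ground-truth per-image counts; a minimal sketch:

```python
import math

# Standard crowd counting metrics: Mean Absolute Error and Root Mean
# Squared Error between predicted and ground-truth counts (what
# leaderboards such as NWPU-Crowd report).
def mae(pred, gt):
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(gt)

def rmse(pred, gt):
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(gt))

pred = [23, 45, 88]  # model's per-image counts
gt = [25, 40, 90]    # ground-truth per-image counts
print(mae(pred, gt), round(rmse(pred, gt), 3))  # 3.0 3.317
```

RMSE penalizes large per-image errors more heavily than MAE, which is why both are usually reported together.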

Drone-view

https://github.com/VisDrone/DroneVehicle

2020_DroneVehicle

High-altitude imagery of vehicles

2019_DroneCrowd

The VisDrone2019 dataset was collected by the AISKYEYE team at the Lab of Machine Learning and Data Mining, Tianjin University, China.

The images are not continuous video frames but unrelated pictures.

People are photographed from high altitude, so they appear very small.

There are also larger sizes

2019_DLR-ACD

https://www.dlr.de/eoc/en/desktopdefault.aspx/tabid-12760/22294_read-58354/

It contains 33 large aerial images acquired through 16 different flight campaigns at various mass events and over urban scenes involving crowds, such as sport events, city centers, open-air fairs and festivals.

Compare pictures taken from above.

Datasets from other tasks that can be used for counting

Caltech Pedestrian Identification

Caltech Pedestrian Detection Benchmark: http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/

The Caltech Pedestrian dataset is provided by Caltech, with video captured by cameras mounted on vehicles driving routinely through urban environments. It contains about 10 hours of 640x480 30Hz video with 350,000 bounding boxes and 2,300 pedestrians annotated across about 250,000 frames (about 137 minutes of clips). For more information, please refer to: Caltech Pedestrian Detection Benchmark

The data set is a video, and pedestrians are marked with boxes

Image credit: Caltech Pedestrian Detection Benchmark

CityPersons - Identification

CityPersons: https://github.com/cvgroup-njust/CityPersons

The CityPersons dataset is a dataset specially established in the field of pedestrian detection based on the CityScapes dataset. It selects 5,000 finely-labeled pictures in CityScapes and marks the pedestrians in them with bounding boxes. The training set contains 2975 images, the validation set contains 500 images, and the test set contains 1575 images. The average number of pedestrians in the picture is 7, and the annotations provide full-body annotations and visible area annotations. For more information, please refer to: CityPersons

CUHK-SYSU identification

CUHK-SYSU: http://www.ee.cuhk.edu.hk/~xgwang/PS/dataset.html

CUHK-SYSU is a large-scale people search benchmark dataset, which contains 18184 images with 8432 pedestrians, and 99,809 annotated bounding boxes. According to the source of the image, the data set can be divided into two parts: the collection in the street scene and the collection in the film and television drama. In the street scene, the images are captured by hand-held cameras and contain hundreds of scenes, trying to include as many different perspectives, lighting, resolutions, occlusions and backgrounds as possible. Another part of the data set is collected from film and television dramas, because they can provide more diverse scenes and more challenging perspectives.

This dataset provides annotations for pedestrian detection and person re-identification. Each query person appears in at least two images, and each image can contain multiple query persons as well as many other people. The dataset is divided into training and test sets: the training set contains 11,206 images and 5,532 query persons, and the test set contains 6,978 images and 2,900 query persons. For more information, please refer to: End-to-End Deep Learning for Person Search

PRW identification+counting

PRW: https://github.com/liangzheng06/PRW-baseline

PRW (Person Re-identification in the Wild) is a person re-identification dataset. The data set was collected at Tsinghua University, and a total of 10 hours of video were collected through six cameras . The dataset is divided into training, validation and test sets. The training set contains 5134 frames and 482 IDs, the validation set contains 570 frames and 482 IDs, and the test set contains 6112 frames and 450 IDs. Counting is done: all pedestrians appearing in each frame will be marked with a bounding box and assigned an ID. For more information, please refer to: PRW

ETHZ-Identification

ETHZ: https://data.vision.ee.ethz.ch/cvl/aess/dataset/

The ETHZ dataset (video) was captured by a pair of onboard AVT Marlin F033C cameras at 640x480 resolution at 13-14 fps. The dataset provides the original images, calibration information, and pedestrian annotations. For more information, please refer to: ETHZ

MOT16-Recognition and Tracking of Pedestrians

MOT16: https://motchallenge.net/data/MOT16/

MOT17: https://motchallenge.net/data/MOT17/

The MOT16 dataset was proposed in 2016 as a benchmark for multi-target detection and tracking methods, focused specifically on pedestrian tracking. Its main annotation targets are moving or stationary pedestrians and moving vehicles. Building on MOT15, MOT16 adds more detailed annotations and more bounding boxes, with richer imagery, different shooting angles, and different weather conditions. The MOT16 dataset has 14 videos, 7 of which are labeled training sets and 7 test sets. Because it provides annotated detection results, the object detection stage can be skipped, letting work focus on the tracking part. For more information, please refer to: MOT16

Head Tracking 21

  1. Head Tracking 21: https://motchallenge.net/data/Head_Tracking_21

CrowdHuman

http://www.crowdhuman.org/

useless things

All of these are high-quality papers on identification, tracking, and counting; datasets are available, and their problem-solving ideas can be used as reference

(2) UAV shooting, target detection and recognition of people and vehicles

VisDrone2018-test-dev

2018_Zhu_Cite=94_VisDrone-DET2018: The Vision Meets Drone Object Detection in Image Challenge Results (This article summarizes many existing data sets)

http://aiskyeye.com/

——What kind of "people counting" task did you do ? What are the characteristics?

——Detect pedestrians, cars, motorcycles, and tricycles, and draw a box around each object

- What is the name of the dataset used (if open source)? What are the advantages ?

——Scenes: traffic scenarios, sports fields, and both high-altitude and low-altitude imagery

- What do the photos of the dataset look like? give an example

Objects are already marked with boxes, but the number of people must be counted yourself; whether the images are continuous is unknown

(1) Counting the number of objects (object/pedestrian/crowd counting) (2) Tracking (3) Deduplication

(1) UAV tracking cars and people

2020_Li_Cite=173_AutoTrack: Towards High-Performance Visual Tracking for UAV With Automatic Spatio-Temporal Regularization

——What kind of "people counting" task did you do ?

——Tracking drones, people, and cars

- What is the name of the dataset used (if open source)? What are the advantages ?

——DTB70

   ——various cluttered scenes and objects with different sizes as well as aspect ratios are included.

——The photos were taken during high-speed motion, so the images are very blurry and hard to identify and track in; the benchmark primarily addresses severe UAV motion.

——Video clips composed of 70 difficult UAV image sequences,

——Many annotations: ground-truth bounding boxes are manually annotated in all video frames.

(1) Are the images continuous frame-by-frame video screenshots? Probably yes

(2) Is the number of people per photo provided? No

(3) Since counts are not provided, can they be counted manually? Yes, but the images are quite blurry, which makes it a little difficult

(4) Are bounding boxes provided? In the given example, one box is drawn per image; how many boxes real dataset images carry is unclear

(5) Similarities with the "people flow counting" task? Sequential photos in which the same objects reappear

(6) Differences from the "people flow counting" task? The images are somewhat blurry; otherwise it is similar

- What do the photos of the dataset look like? give an example

Tracking people: only a single target is tagged

tracking drones

Agricultural scenes: counting cattle and sheep

Tracking people surfing in the ocean and counting how many there are

Counting how many people are playing on a basketball court

(1) UAV target tracking

UAV123@10fps

2016_Mueller_Cite=1100_A Benchmark and Simulator for UAV Tracking

The data set of pictures taken by drones is well summarized by one person: https://zhuanlan.zhihu.com/p/421968291 (There are also these target tracking data sets: OTB50, OTB100, VOT2014, VOT2015, TC128, and ALOV300++)

——What kind of "people counting" task did you do ? What are the characteristics?

——Object detection: a box is drawn around each object

- What is the name of the dataset used (if open source)? What are the advantages ?

  ——Scene: There are various scenes

——Task: object detection, drawing boxes around bikes, buildings, cars, trucks, people, wakeboards, UAVs, and boats

(1) Are the images continuous frame-by-frame video screenshots? I have not read the original text, so I don't know

(2) Is the number of people per photo provided? No

(3) Since counts are not provided, can they be counted manually? Yes

(4) Are bounding boxes provided? Not sure

- What do the photos of the dataset look like? give an example

(-1) Use drones to count vehicles

2020_Yang_Cite=77_Reverse Perspective Network for Perspective-Aware Object Counting

UAV-based Vehicle Counting Dataset

——What kind of "people counting" task did you do ? What are the characteristics?

The Vehicle Counting task counts vehicles from drone imagery; I don't know the further details.

- What is the name of the dataset used (if open source)? What are the advantages ?

The data was collected by the authors themselves rather than taken from a public dataset: 800 photos

  Rich scenarios: 50 different scenarios for vehicle counting

Precise annotations: 32,770 manually annotated bounding boxes, rather than the rough point annotations used traditionally

Many shooting perspectives: front view at high altitude, front view at low altitude, side view, and top view.

Images collected in different weather conditions: sunny, rainy, foggy, night, and rainy night (the purpose of all the above: to increase the diversity of the dataset and bring it closer to real traffic circumstances).

"Frame" here refers to the bounding box

 

(1) Are the images continuous frame-by-frame video screenshots? Hard to say; very likely not (two consecutive, nearly identical images would add little)

(2) Is the number of people per photo provided? No

(3) Since counts are not provided, can they be counted manually? Manual counting is difficult because the vehicles are dense

(4) Are bounding boxes provided? Yes

(6) Differences from the "people flow counting" task? (1) These are high-altitude images, whereas people-flow statistics come from ground cameras. (2) People on the ground appear relatively large, while the vehicles here are very small

(-1) Identify drones from photos with drones

2021_Ashraf_Cite=11_Dogfight: Detecting Drones from Drones Videos

——What kind of "people counting" task did you do ? What are the characteristics?

——Find the drone in the image and mark it with a box

- What is the name of the dataset used (if open source)? What are the advantages ?

- NPS-Drones, published by the Naval Postgraduate School (NPS)

——Even with the image at full screen, the naked eye struggles to find the drone's location, which makes this application scenario valuable: people cannot do this task.

- Difficulty: The erratic motion, small size, arbitrary shape, large intensity variation, and occlusion of source and target drones make this task quite challenging

——Many videos: contains 50 videos

——High definition: recorded at HD resolutions (1920×1080 and 1280×760)

——Object size varies greatly: the minimum, average, and maximum drone sizes are 10 × 8, 16.2 × 11.6, and 65 × 21 pixels, respectively.

——Many annotations: the dataset contains 70,250 frames in total.

- What model are you using ? What is the execution process ? What are the components and what does each component do?

- What do the photos of the dataset look like? give an example

(1) Are the images continuous frame-by-frame video screenshots? Very likely yes

(2) Is the number of people per photo provided? No, but the number of bounding boxes can be counted

(3) Since counts are not provided, can they be counted manually? Yes; there are fewer than 10 bounding boxes per image

(4) Are bounding boxes provided? Yes

(5) Similarities with the "people flow counting" task? Continuous video screenshots in which the same drones reappear across frames (deduplication is needed), and the objects can be counted

(6) Differences from the "people flow counting" task? (1) The objects are too small: in people counting, people appear relatively large, sometimes with visible faces, but drones look essentially identical and show no distinguishing details, which makes deduplication difficult. (2) The camera flies along with the drone carrying it, rather than being fixed in one place to count how many aircraft pass by

(-1) Use drones for target detection and tracking

UAVDT

"The Unmanned Aerial Vehicle Benchmark: Object Detection and Tracking", European Conference on Computer Vision (ECCV)

https://sites.google.com/view/grli-uavdt/%E9%A6%96%E9%A1%B5

——What kind of "people counting" task did you do ? What are the characteristics?

——Object detection of people and vehicles in video, marking each object with a box

- What is the name of the dataset used (if open source)? What are the advantages ?

——Many, complex scenes, emphasizing vehicle tracking in varied scenarios. The purpose is to improve model generalization, so the model works not only in main-road traffic scenarios but also in forest parks and at the beach

——Up to 14 scene attributes vary (e.g., weather condition, flying altitude, camera view, vehicle category, and occlusion)

——Photos taken during high-speed motion are very blurry (camera motion)

——Objects are very dense (high density)

——Objects are very small

——Selected from 10 hours of raw video

——About 80,000 representative frames are fully annotated with bounding boxes

(1) Are the images continuous frame-by-frame video screenshots? Probably yes

(2) Is the number of people per photo provided? It appears not

(3) Since counts are not provided, can they be counted manually? It can be done, but it takes time and strains the eyes

(4) Are bounding boxes provided? Yes

(5) Similarities with the "people flow counting" task? I won't elaborate

(6) Differences from the "people flow counting" task? I won't elaborate

- What do the photos of the dataset look like? give an example

The scenario attributes are marked in the lower right corner: day or night, the weather (is there fog, is it raining), whether the viewing angle is front view or side view, and whether the altitude is particularly high, medium, or low

Object detection: cars and buses are marked

(-1) UAVs are used for identification, tracking and counting

2020_Wen_Cite=21_Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark

https://github.com/VisDrone/DroneCrowd

(combining existing datasets for crowd recognition and counting); the authors say this is the largest dataset to date, containing head-annotated trajectories for density map estimation, crowd localization, and tracking.

——What kind of "people counting" task did you do ? What are the characteristics?

——(1) Crowd counting and estimated density density map estimation

——(2) Crowd localization and tracking.

- What is the name of the dataset used (if open source)? What are the advantages ?

——wide range of scenarios, e.g., campus, street cameras,park, parking lot, playground and plaza

——Collected by the authors themselves, using drone-mounted cameras.

——Many annotations, high definition: videos are recorded at 25 frames per second (FPS) with a resolution of 1920×1080 pixels.

——Crowd sizes range from few to many: the maximum and minimum numbers of people per video frame are 455 and 25, respectively, and the average number of objects is 144.8.

——Huge annotation volume: more than 20 thousand head trajectories are annotated with more than 4.8 million head points across the individual frames of 112 video clips

——High quality, high cost: over 20 domain experts annotated and double-checked the dataset using the VATIC software for more than two months. A large amount of data, at high quality.

(1) Are the images continuous frame-by-frame video screenshots? They should be

(2) Is the number of people per photo provided? Possibly

(3) If counts are not provided, can they be counted manually? Too many people to count

(4) Are bounding boxes provided? No

(5) Similarities with the "people flow counting" task? I won't elaborate

(6) Differences from the "people flow counting" task? The people are too dense, and the images are taken from high altitude

- What do the photos of the dataset look like? give an example

(1) UAVs recognize human behavior

2021_Li_Cite=47_UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles

——What kind of "people counting" task did you do ? What are the characteristics?

——Recognize and classify human behavior

- What is the name of the dataset used (if open source)? What are the advantages ?

——UAV-Human Dataset

- Scenario: collected by a flying drone over multiple urban and rural areas, day and night, over a three-month period

——Tasks: action recognition, pose estimation, and understanding and analysis of human behavior in drone imagery; classifying what the people in the image are doing (shaking hands, locking a car, ...)

   - Contains 67,428 multimodal video sequences and 119 subjects for action recognition, 22,476 frames for pose estimation, 41,290 frames and 1,144 identities for person re-identification, and 22,263 frames for attribute recognition

   - Covers a wide variety of objects, backgrounds, lighting, weather, occlusions, camera motion and drone attitude

(1) Are the images continuous frame-by-frame video screenshots? Most likely not

(2) Whether the number of people in the photo is counted - no

(3) Since counts are not provided, can they be counted manually? Yes; after all, there are very few people

(4) Whether to hit the bounding box - no

- What do the photos of the dataset look like? give an example


Origin blog.csdn.net/Albert233333/article/details/130431598