FLIR dataset explained in detail

Free FLIR thermal datasets for algorithm training

Preface

Released in July 2018, the FLIR Thermal Dataset helps developers train convolutional neural networks, enabling the automotive industry to use FLIR's cost-effective thermal cameras to build a new generation of safer and more efficient ADAS and autonomous vehicle systems.

Why does ADAS use FLIR thermal sensing technology?

In ADAS environments, the ability to sense thermal infrared radiation, or heat, provides unique and complementary advantages to existing sensor technologies such as visible light cameras, lidar and radar systems:

With more than 15 years of experience serving the automotive field, FLIR has created the only automotive-qualified thermal sensor, already deployed in the driver warning systems of more than 500,000 vehicles.
FLIR thermal sensors detect and distinguish pedestrians, cyclists, animals, and motor vehicles in challenging conditions such as total darkness, smoke, inclement weather, and glare, providing a data set complementary to lidar, radar, and visible light cameras. Their detection range is four times that of ordinary headlights.
When combined with visible light data and the range data obtained from lidar and radar, thermal data paired with machine learning enables a more comprehensive detection and classification system.

Dataset details

Content: Synchronized annotated thermal images and unannotated RGB images for reference. The thermal and RGB camera centerlines are approximately 2 inches apart and calibrated to minimize parallax.

Images: 14,000 images in total: 10,000 from short video clips and 4,000 bonus images from a 140-second video.

Capture refresh rate: Recorded at a 30 Hz frame rate. The dataset sequences were sampled at 2 frames/second or 1 frame/second; video annotations were recorded at 30 frames/second.

Frame annotation labels: 10,228 frames in total, 9,214 of which have bounding boxes:
1. Person (28,151)
2. Car (46,692)
3. Bicycle (4,457)
4. Dog (240)
5. Other vehicle (2,228)

Video annotation labels: 4,224 frames in total, 4,183 of which have bounding boxes:
1. Person (21,965)
2. Car (14,013)
3. Bicycle (1,205)
4. Dog (0)
5. Other vehicle (540)

Driving environment: Streets and highways in Santa Barbara, California, from November to May; daytime (60%) and nighttime (40%), clear to cloudy weather.

Camera specifications: Thermal: FLIR Tau2 640 × 512, 13 mm f/1.0 lens (HFOV 45°, VFOV 37°). RGB: FLIR BlackFly (BFS-U3-51S5C-C) 1280 × 1024 with a Computar 4-8 mm f/1.4 megapixel lens (field of view set to match the Tau2).

Dataset file formats:
1. 14-bit TIFF thermal images (no AGC)
2. 8-bit JPEG thermal images (AGC applied), without bounding boxes embedded in the image
3. 8-bit JPEG thermal images (AGC applied), with bounding boxes embedded in the image for easy viewing
4. 8-bit JPEG RGB images
5. Annotations: JSON (MSCOCO format)

Example results: Fine-tuning RefineDet512 on this dataset yields an mAP of 0.587 (at 50% IoU), evaluated on a held-out validation set.
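Because the raw thermal frames ship as 14-bit TIFFs with no AGC applied, they look almost uniformly dark in an ordinary image viewer. Below is a minimal sketch of a stand-in for AGC: a simple percentile contrast stretch to 8 bits with OpenCV and NumPy. To be clear, this is not FLIR's AGC algorithm, and the file path and name are only illustrative.

```python
import cv2
import numpy as np

def thermal_to_8bit(path, p_lo=1.0, p_hi=99.0):
    """Percentile-stretch a 14-bit thermal TIFF down to 8 bits.

    Only a rough stand-in for AGC, not FLIR's actual algorithm.
    """
    # The TIFFs store 14-bit data in a 16-bit container.
    raw = cv2.imread(path, cv2.IMREAD_UNCHANGED)
    lo, hi = np.percentile(raw, (p_lo, p_hi))
    stretched = np.clip((raw.astype(np.float32) - lo) / max(hi - lo, 1.0), 0.0, 1.0)
    return (stretched * 255.0).astype(np.uint8)

# Path/filename are illustrative; point this at any TIFF from the dataset.
img8 = thermal_to_8bit("FLIR_ADAS/train/thermal_16_bit/FLIR_00001.tiff")
cv2.imwrite("FLIR_00001_agc.jpeg", img8)
```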

Annotation notes:

• Annotators were asked to draw bounding boxes as tight as possible; a tight box that omits small parts of an object (such as limbs) is preferred over a loose one.
• Personal items are not included in a person's bounding box.
• When occlusion occurs, only the non-occluded parts of an object are annotated. If occlusion leaves only a limb or other minor part of an object visible, the object is not annotated at all.
• For people and dogs, the head and shoulders take priority for inclusion in the box over other body parts.
• Wheels are the defining part of the bicycle category; parts of a bicycle typically hidden by the rider, such as the handlebars, are not included in the box. Riders and bicycles are labeled separately.
• When an object is split in two by an occlusion, the two visible parts receive two separate annotations.

Annotations are created for the thermal images only. The thermal and RGB cameras are not mounted at the same position on the vehicle and therefore have different viewing geometries, so the thermal annotations do not represent object locations in the RGB images. The RGB images in this dataset cannot be used for training because they have no ground-truth labels.

File structure (a minimal annotation-loading sketch follows this list):

  • train (8,862 images), serial numbers 1-8862
    • Annotated_thermal_8_bit: 8-bit thermal images with the ground-truth bounding boxes overlaid on them.
    • thermal_annotations.json: the ground-truth labels, formatted in MSCOCO annotation style.
    • thermal_8_bit: 8-bit thermal images in .jpeg format with AGC applied; otherwise the same images as in the thermal_16_bit folder.
    • RGB: 8-bit RGB (three-channel) images. Note that 499 images in train, 109 in val, and 29 in video have no corresponding RGB image. The resolution is generally 1600 × 1800, but some images have other resolutions such as 480 × 720 and 1536 × …
  • val (1,366 images), serial numbers 8863-10228
  • video (4,224 images), serial numbers 1-4224
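To sanity-check a download, the MSCOCO-style JSON can be inspected with nothing but the standard library. A minimal sketch, assuming the dataset was extracted to FLIR_ADAS/ next to the script (the path follows the layout above):

```python
import json
from collections import Counter

# Path assumes the dataset was extracted to FLIR_ADAS/; adjust as needed.
with open("FLIR_ADAS/train/thermal_annotations.json") as f:
    coco = json.load(f)

print(len(coco["images"]), "images,", len(coco["annotations"]), "boxes")

# Map MSCOCO category ids to names and count the boxes per class.
id2name = {c["id"]: c["name"] for c in coco["categories"]}
counts = Counter(id2name[a["category_id"]] for a in coco["annotations"])
for name, n in counts.most_common():
    print(f"{name}: {n}")

# Each annotation stores its box as [x, y, width, height] in pixels.
first = coco["annotations"][0]
print("example bbox:", first["bbox"], "on image id", first["image_id"])
```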

Dataset download

Official website: https://www.flir.com/oem/adas/adas-dataset-form/
Baidu Pan mirror: https://link.zhihu.com/?target=https%3A//pan.baidu.com/s/1yJ5LDpvuj4M27MLmD9OsJw
Extraction code: 9il7

Object detection network

From what I can see, most people currently train this dataset on YOLOv3, YOLOv4, or YOLOv5. I also trained the YOLO series on YOLOv5, and the whole debugging process was fairly straightforward. If needed, I can write a separate post walking through the YOLOv5 training process; let me know in the comments =_=||. A label-conversion sketch follows this paragraph, and a dataset-config sketch appears after the results below.
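YOLOv5 does not read MSCOCO JSON directly; it expects one .txt label file per image containing a class id plus normalized center-x, center-y, width, and height. Here is a minimal conversion sketch; the input and output paths are assumptions, and the remapping simply sorts the COCO category ids into contiguous 0-based class ids.

```python
import json
from pathlib import Path

# Paths are assumptions: adjust SRC/OUT to your local copies.
SRC = Path("FLIR_ADAS/train")
OUT = Path("FLIR_yolo/labels/train")
OUT.mkdir(parents=True, exist_ok=True)

coco = json.loads((SRC / "thermal_annotations.json").read_text())
images = {im["id"]: im for im in coco["images"]}

# YOLO wants contiguous 0-based class ids; remap the COCO category ids.
cat_ids = sorted(c["id"] for c in coco["categories"])
remap = {cid: i for i, cid in enumerate(cat_ids)}

for ann in coco["annotations"]:
    im = images[ann["image_id"]]
    w, h = im["width"], im["height"]
    x, y, bw, bh = ann["bbox"]  # COCO boxes: top-left x, y, width, height
    # YOLO format: class, normalized center-x, center-y, width, height.
    line = (f"{remap[ann['category_id']]} "
            f"{(x + bw / 2) / w:.6f} {(y + bh / 2) / h:.6f} "
            f"{bw / w:.6f} {bh / h:.6f}\n")
    with open(OUT / f"{Path(im['file_name']).stem}.txt", "a") as f:
        f.write(line)
```

Note that the label files are opened in append mode, so delete the output folder before re-running the conversion.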

Here is a plot of the training results:
[Figure: training result curves]
The plot is a bit small; val loss is in the lower left corner. You can see some overfitting in the later stages, and after about 150 epochs training is essentially done. Two detection results are also shown below for reference:
[Figure: example detection result 1]
[Figure: example detection result 2]
I would also like to emphasize that these detection results genuinely exceeded my expectations. Detection at night works well, a large improvement over RGB images, and the results also dispelled my concerns about how well a deep network can learn from single-channel images.
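For anyone who wants to reproduce a similar run: beyond the converted labels, all that is needed is a dataset config and the standard YOLOv5 train.py invocation. A minimal sketch is below, assuming the thermal 8-bit JPEGs have been copied into FLIR_yolo/images/train and FLIR_yolo/images/val alongside the converted labels; the class names and their order are assumptions that must match the remapping used during conversion.

```python
from pathlib import Path

# Class names/order are assumptions -- they must match the id remapping used
# during conversion; check coco["categories"] in thermal_annotations.json.
Path("flir.yaml").write_text("""\
path: FLIR_yolo
train: images/train
val: images/val
names:
  0: person
  1: bicycle
  2: car
  3: dog
  4: other_vehicle
""")
# Then train as usual, e.g.:
#   python train.py --img 640 --epochs 150 --data flir.yaml --weights yolov5s.pt
```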

Training Faster R-CNN on the FLIR dataset

Tutorial link

Training YOLOv5 on the FLIR dataset

Tutorial link

That write-up is fairly brief and covers only the key steps; for some of the details you can refer to the Faster R-CNN post.
