[Computer Vision | Image Classification] Commonly used datasets for image classification and their introduction (12)

1. SIDD-Image (Segmented Intrusion Detection Dataset)

This is the first image-based network intrusion detection dataset. It is a large-scale collection of images generated from protocol-based network traffic captured at 15 observation sites in different Asian countries. The dataset is used to distinguish two different types of anomalies from benign network traffic. Each 48×48 image encodes 128 seconds of multi-protocol communication. SIDD can be applied to a wide range of tasks, such as machine-learning-based network intrusion detection and non-IID (non-independently and identically distributed) federated learning.
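The exact byte-to-pixel encoding is defined by the dataset authors, but the general idea of packing a fixed window of traffic into a 48×48 grayscale array can be sketched as follows. The `bytes_to_image` helper is purely illustrative, not part of the SIDD tooling:

```python
import numpy as np

def bytes_to_image(payload: bytes, size: int = 48) -> np.ndarray:
    """Pack a raw byte sequence into a size x size grayscale image,
    truncating or zero-padding to exactly size*size bytes."""
    buf = np.frombuffer(payload, dtype=np.uint8)[: size * size]
    buf = np.pad(buf, (0, size * size - len(buf)))
    return buf.reshape(size, size)

# 128 seconds of (synthetic) traffic bytes -> one 48x48 training image
traffic = bytes(range(256)) * 10          # 2560 bytes, stand-in for a capture
img = bytes_to_image(traffic)
print(img.shape)                          # (48, 48)
```

An image in this form can be fed directly to any standard image classifier for the benign-vs-anomaly task.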


2. Sports10

The Sports10 dataset contains 100,000 game images covering 175 video games across 10 sports categories: American football, basketball, bike racing, car racing, fighting, hockey, soccer, table tennis, tennis, and volleyball.

The images are hand-curated to remove menu and transition frames, so they contain only gameplay sequences.

Games are divided into three visual style categories: Retro (arcade style, 1990s and earlier), Modern (circa 2000s), and Photoreal (circa late 2010s).


3. Stream-51

Stream-51 is a new dataset for streaming classification, consisting of temporally correlated images from 51 distinct object classes plus additional evaluation classes outside the training distribution, used to test novelty detection.

4. ASIRRA (Animal Species Image Recognition for Restricting Access)

Web services are often protected by challenges that are easy for humans to solve but difficult for computers. Such a challenge is called a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) or HIP (Human Interactive Proof). HIPs are used for a variety of purposes, such as reducing email and blog spam and preventing brute-force attacks on website passwords.

Asirra (Animal Species Image Recognition for Restricting Access) is a HIP that works by asking users to identify photographs of cats and dogs. This task is difficult for computers, but research shows that people can complete it quickly and accurately. Many even find it fun!

Asirra is unique in its partnership with Petfinder.com, the world's largest website dedicated to finding homes for homeless pets. Petfinder provided Microsoft Research with more than three million images of cats and dogs that had been manually classified by people at thousands of animal shelters across the United States. Fortunately, Kaggle provides a subset of this data for fun and research purposes.

5. AdvNet

AdvNet is a dataset of traffic sign images. Specifically, it includes adversarial traffic sign images (i.e., traffic sign images with stickers on their surfaces) that can fool state-of-the-art neural network-based perception systems, as well as clean traffic sign images without any stickers.

If you use AdvNet, please cite the following paper:

Y. Kantaros, T. Carpenter, K. Sridhar, I. Lee, J. Weimer: “Real-time detector of adversarial digital and physical inputs for perceptual systems”, 12th ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS), 2021


6. BCNB (Early Breast Cancer Core-Needle Biopsy WSI)

Breast cancer (BC) has become the biggest threat to women's health worldwide. Clinically, identifying axillary lymph node (ALN) metastasis and other clinical tumor characteristics, such as estrogen receptor (ER) and progesterone receptor (PR) status, is of great significance for assessing prognosis and guiding treatment of BC patients.

Several studies have aimed to predict ALN status and other tumor clinical characteristics using clinicopathological data and genetic testing scores. However, these methods are often limited by the relatively poor predictive value and high cost of genetic testing. In recent years, deep learning (DL) has enabled the rapid development of computational pathology. DL can perform high-throughput feature extraction on medical images and analyze the correlation between primary tumor features and the above-mentioned states. To date, there are no relevant studies on preoperative prediction of ALN metastasis and other tumor clinical characteristics based on WSI of primary BC samples.

Our paper introduces a new dataset, the Early Breast Cancer Core-Needle Biopsy WSI (BCNB) dataset, which includes core-needle biopsy whole-slide images (WSIs) and the corresponding clinical data of early breast cancer patients. The WSIs were examined and annotated by two independent, experienced pathologists who were blinded to all patient-related information.

Based on this dataset, we studied a deep learning algorithm for preoperative prediction of ALN metastasis status using multiple instance learning (MIL) and achieved a best AUC of 0.831 in an independent test cohort. For more details, see our paper.

There are WSIs for 1,058 patients, and only some tumor regions are annotated in the WSIs. In addition to the WSIs, we also provide the clinical characteristics of each patient, including age, tumor size, tumor type, ER, PR, HER2, HER2 expression, histological grading, surgery, Ki-67, molecular subtype, number of lymph node metastases, and metastatic status of the axillary lymph nodes (ALN). The dataset has been de-identified and does not contain private patient information.

Based on this dataset, in our paper we study the prediction of axillary lymph node (ALN) metastasis status, which is a weakly supervised classification task. However, other studies based on our dataset are also feasible, such as prediction of histological grade, molecular subtype, HER2, ER, and PR. We do not limit the specific content of your research and welcome any research based on our datasets.
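The core idea of MIL on a WSI, treating a slide as a bag of patches and pooling patch-level evidence into one slide-level prediction, can be illustrated with a minimal sketch. This uses simple mean pooling and a linear scorer; the model in the paper is more sophisticated, and all names here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def mil_predict(patch_features: np.ndarray, w: np.ndarray, b: float) -> float:
    """Score each patch with a linear classifier, then mean-pool the
    patch scores into a single slide-level probability."""
    scores = patch_features @ w + b          # one score per patch
    pooled = scores.mean()                   # slide-level aggregation
    return 1.0 / (1.0 + np.exp(-pooled))     # sigmoid -> probability

# A WSI represented as a bag of 200 patch feature vectors (dim 64)
bag = rng.normal(size=(200, 64))
w = rng.normal(size=64)
prob = mil_predict(bag, w, b=0.0)
print(0.0 <= prob <= 1.0)  # True
```

Only the slide-level label (e.g. ALN status) is needed for training, which is what makes the task weakly supervised.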

Please note that this dataset is for educational and research purposes only and commercial and clinical applications are not permitted. Use of this dataset must be in accordance with the license agreement.


7. DeepPCB (Deep Printed Circuit Board)

The dataset contains 1,500 image pairs, each consisting of a defect-free template image and an aligned test image annotated with the locations of the six most common PCB defect types: open, short, mousebite, spur, pin-hole, and spurious copper.
Dataset Description
Image Set
All images in this dataset were acquired with a line-scan CCD at a resolution of about 48 pixels per millimetre.
Defect-free template images were manually checked and cleaned from the sampled images.
The original template and test images are roughly 16k × 16k pixels.
They are then cropped into many 640 × 640 sub-images and aligned using template-matching techniques.
Next, a threshold is carefully chosen for binarization to avoid lighting interference.
Note that the preprocessing can vary with the specific PCB defect-detection algorithm; however, image registration and thresholding are common steps for high-precision PCB defect localization and classification.
The figure below shows a pair of examples from the DeepPCB dataset, where the right side is a defect-free template image and the left side is a defective test image with ground-truth annotations.
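The binarization step described above can be sketched with a simple global threshold. The actual threshold selection in the dataset pipeline is more careful; this is just an illustration:

```python
import numpy as np

def binarize(img: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Global threshold: pixels at or above the threshold (copper) -> 1,
    pixels below it (substrate) -> 0. A well-chosen fixed threshold
    suppresses moderate lighting variation."""
    return (img >= threshold).astype(np.uint8)

patch = np.array([[10, 200], [130, 90]], dtype=np.uint8)
print(binarize(patch))  # [[0 1] [1 0]]
```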
Image annotation
We use an axis-aligned bounding box with a class ID for each defect in the test image. As shown in the figure above, six common types of PCB defect are annotated: open, short, mousebite, spur, spurious copper, and pin-hole. Since real test images contain only a few defects, we manually add artificial defects to each test image based on common PCB defect patterns, which yields roughly 3 to 12 defects per 640 × 640 image. The distribution of defect counts is shown in the figure below.

We separate 1,000 image pairs as the training set and use the rest as the test set. Each test image has an annotation file with the same name; for example, 00041000_test.jpg, 00041000_temp.jpg, and 00041000.txt are the test image, template image, and corresponding annotation file, respectively. Each defect on the test image is annotated in the format x1,y1,x2,y2,type, where (x1,y1) and (x2,y2) are the upper-left and lower-right corners of the defect bounding box, and type is an integer ID with the following mapping: 0-background (not used), 1-open, 2-short, 3-mousebite, 4-spur, 5-copper, 6-pin-hole.
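Following the x1,y1,x2,y2,type annotation format above, a minimal parser might look like this. The `Defect` class and `parse_annotation` helper are illustrative, not part of the dataset's own tooling:

```python
from dataclasses import dataclass

# Integer IDs as listed in the annotation files
DEFECT_TYPES = {1: "open", 2: "short", 3: "mousebite",
                4: "spur", 5: "copper", 6: "pin-hole"}

@dataclass
class Defect:
    x1: int
    y1: int
    x2: int
    y2: int
    type_name: str

def parse_annotation(text: str) -> list[Defect]:
    """Parse one annotation file: one 'x1,y1,x2,y2,type' line per defect."""
    defects = []
    for line in text.strip().splitlines():
        x1, y1, x2, y2, t = (int(v) for v in line.split(","))
        defects.append(Defect(x1, y1, x2, y2, DEFECT_TYPES[t]))
    return defects

sample = "100,120,140,160,1\n300,310,330,360,4"
print([d.type_name for d in parse_annotation(sample)])  # ['open', 'spur']
```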

The source code for the annotation tools is now located in the ./tools directory.

Benchmarks
Evaluation uses mean average precision (mAP) and F-score. A detection is counted as correct only if the Intersection over Union (IoU) between the detected bounding box and a ground-truth box of the same category is greater than 0.33. The F-score is computed as F-score = 2PR/(P+R), where P and R are precision and recall. Note that the F-score is threshold-sensitive, meaning you can adjust the score threshold to get better results. Although the F-score is not as fair a criterion as mAP, it is more practical, since a threshold must always be chosen when deploying a model and not all algorithms produce a confidence score for each detection. Therefore, both F-score and mAP are considered in the benchmark.
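The matching criterion and the F-score formula above can be computed directly; a minimal sketch:

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def f_score(tp, fp, fn):
    """F-score = 2PR/(P+R) from true-positive, false-positive,
    and false-negative counts."""
    p = tp / (tp + fp)   # precision
    r = tp / (tp + fn)   # recall
    return 2 * p * r / (p + r)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25/175 ≈ 0.143 -> below 0.33, no match
print(f_score(tp=90, fp=10, fn=20))
```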

The mAP and F-score evaluation scripts are based on the ICDAR 2015 evaluation script, with slight modifications (you may need to register an account first). We provide the modified evaluation script and the ground-truth gt.zip file for the test set in the evaluation/ directory. You can evaluate your own method as follows:
* Run your algorithm and save the detection results for each image as image_name.txt, where image_name should match the names in gt.zip. Follow the format of evaluation/gt.zip, except that each detected defect should be described as x1,y1,x2,y2,confidence,type, where (x1,y1) and (x2,y2) are the upper-left and lower-right corners of the defect bounding box, confidence is a floating-point number representing how confident you are in the detection, and type is a string that should be one of: open, short, mousebite, spur, copper, pin-hole. Note that there are no spaces other than the commas.
* Compress the .txt files into res.zip. (res.zip should not contain any subdirectories.)
* Run the evaluation script: python script.py -s=res.zip -g=gt.zip
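The steps above can be sketched in code. The `pack_results` helper is illustrative; it writes one text file per image in the expected x1,y1,x2,y2,confidence,type format and zips them with no subdirectories:

```python
import io
import zipfile

def pack_results(detections: dict) -> bytes:
    """Build res.zip in memory: one <image_name>.txt per image,
    one 'x1,y1,x2,y2,confidence,type' line per detection, no subdirs."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for image_name, dets in detections.items():
            lines = [f"{x1},{y1},{x2},{y2},{conf},{t}"
                     for x1, y1, x2, y2, conf, t in dets]
            zf.writestr(f"{image_name}.txt", "\n".join(lines))
    return buf.getvalue()

res = pack_results({"00041000": [(100, 120, 140, 160, 0.97, "open")]})
with zipfile.ZipFile(io.BytesIO(res)) as zf:
    print(zf.namelist())  # ['00041000.txt']
```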

Methods
The source code for this section will be made public upon acceptance of the paper.

Experimental results
Here we show some results of our deep-neural-network-based model. It achieves 98.6% mAP and a 98.2% F-score at 62 FPS. More statistical analyses will be made public upon acceptance of the paper. The green bounding boxes are the predicted locations of PCB defects, with the confidence shown at the top of each bounding box.


8. EndoTect Polyp Segmentation Challenge Dataset

The challenge consists of three tasks, each targeting different requirements for clinical use. The first task is to classify gastrointestinal tract images into 23 different categories. The second task focuses on efficient classification measured by the time spent processing each image. The final task involves automatic segmentation of polyps.
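The second task scores efficiency by the time spent per image, which can be measured with a simple wall-clock harness. This sketch is illustrative and not the challenge's official timing protocol; the classifier here is a stand-in:

```python
import time

def time_per_image(classify, images):
    """Average wall-clock seconds spent classifying each image."""
    start = time.perf_counter()
    for img in images:
        classify(img)
    return (time.perf_counter() - start) / len(images)

# Stand-in classifier (maps an image to one of 23 class IDs) and dummy images,
# purely to exercise the timer
avg = time_per_image(lambda img: sum(img) % 23, [bytes(100)] * 50)
print(avg >= 0.0)  # True
```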

If you use this dataset, please cite "The EndoTect 2020 Challenge: Evaluation and Comparison of Classification, Segmentation and Inference Time in Endoscopy."


9. FMD (Flickr Material Dataset)

Sharan, Lavanya, Ruth Rosenholtz, and Edward H. Adelson. "Material perception: What can you see in a brief glance?" Journal of Vision 9.8 (2009): 784-784.
http://people.csail.mit.edu/celiu/CVPR2010/FMD/FMD.zip

10. Image and Video Advertisements

The image and video ads collection consists of an image dataset containing 64,832 image ads and a video dataset containing 3,477 ads. The data contains rich annotations covering the topic and sentiment of each ad, questions and answers describing what actions the viewer is prompted to take and the reasoning the ad presents to persuade the viewer ("What should I do based on this ad, and why should I do it?"), as well as symbolic references in advertising (e.g., a dove symbolizing peace).


Origin blog.csdn.net/wzk4869/article/details/133132635