20 open source image datasets for computer vision

What is computer vision?

Computer vision enables computers to understand the content of images and videos. The goal of computer vision is to automate tasks that the human visual system can complete. Computer vision tasks include image acquisition, image processing and image analysis. Image data can take different forms, such as video sequences, images viewed from multiple cameras at different angles, or multi-dimensional data from medical scanners.

AI from entry to proficiency: 20 open source image datasets for computer vision


Image datasets for computer vision training

  • Labelme: A large data set created by the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), containing 187,240 images, 62,197 annotated images and 658,992 labeled objects.
  • Lego Bricks: Approximately 12,700 computer-rendered images of 16 different Lego bricks, created with Blender and sorted into one folder per brick type.
  • ImageNet: The de facto standard image dataset for benchmarking new algorithms. It is organized according to the WordNet hierarchy, in which each node is depicted by hundreds to thousands of images.
  • LSUN: A large-scale scene understanding dataset with many auxiliary tasks (room layout estimation, saliency prediction, etc.).
  • MS COCO: COCO is a large-scale object detection, segmentation and captioning dataset containing more than 200,000 labeled images. It can be used for object segmentation, context recognition, and many other use cases.
  • Columbia University Image Library: COIL-100 contains images of 100 different objects, each photographed at 5-degree intervals through a full 360-degree rotation.
  • Visual Genome: A dataset and knowledge base designed to connect structured image concepts to language. It contains 108,077 images with detailed annotations that make up a visual knowledge base.
  • Google's Open Images: A collection of about 9 million URLs to images that carry a Creative Commons license, annotated with labels spanning more than 6,000 categories.
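Several of these datasets, such as Lego Bricks, are distributed with one folder per class. The following is a minimal sketch of turning such a layout into (path, label) pairs for training. It assumes a `root/<class_name>/<image file>` structure and a fixed set of image extensions; the function name `index_folder_dataset` is hypothetical, and any given download may be organized differently.

```python
from pathlib import Path

# File extensions treated as images (an assumption; adjust for the dataset at hand).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def index_folder_dataset(root):
    """Index a root/<class_name>/<image> layout.

    Returns (class_names, samples): class_names is sorted so that label
    indices are stable across runs, and samples is a list of
    (path, label_index) pairs. Non-image files are skipped.
    """
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    label_of = {name: i for i, name in enumerate(class_names)}
    samples = []
    for name in class_names:
        for path in sorted((root / name).iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label_of[name]))
    return class_names, samples
```

Sorting the class folder names keeps label indices stable between runs, which matters when a trained model's output index has to be mapped back to a class name.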



  • YouTube-8M: A large-scale labeled dataset of millions of YouTube video IDs, annotated with more than 3,800 visual entities.
  • Labelled Faces in the Wild: 13,000 labeled face images used to develop applications involving facial recognition.
  • Stanford Dogs Dataset: Contains 20,580 images in 120 dog breed categories, an average of about 170 images per category.
  • Places: A scene-centric database, which contains 205 scene categories and 2.5 million images with category tags.
  • CelebFaces: A face dataset with more than 200,000 celebrity images, each with 40 attribute annotations.



  • Flowers: A dataset of flowers commonly found in the UK, covering 102 categories. Each category contains 40 to 258 images, with large variations in pose and lighting.
  • Plant Image Analysis: A collection of more than one million plant images covering 11 plant species.
  • Home Objects: Random household objects, mostly from kitchens, bathrooms and living rooms, split into training and test sets.
  • CIFAR-10: 60,000 32×32 color images in 10 classes, split into five training batches and one test batch of 10,000 images each.
  • CompCars: Covers 163 car makes with 1,716 car models. Each car model is labeled with five attributes: maximum speed, displacement, number of doors, number of seats, and car type.
  • Indoor Scene Recognition: A highly specific dataset, and a useful one, because most scene recognition models perform better outdoors. It contains 67 indoor categories and 15,620 images in total.
  • VisualQA: VQA contains open-ended questions about 265,016 images. Answering them requires an understanding of both vision and language. Each image has at least 3 questions, and each question has 10 ground-truth answers.
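The CIFAR-10 batch structure maps directly onto the files in the dataset's "python version" archive. Each batch file is a pickled dictionary whose `b"data"` entry holds 10,000 rows of 3,072 values (the red plane, then green, then blue, each 32×32 in row-major order) and whose `b"labels"` entry holds the 10,000 class indices (0 to 9). The sketch below decodes one row using only the standard library; the helper names are hypothetical, and it assumes the archive has already been downloaded and extracted.

```python
import pickle

IMAGE_SIDE = 32
CHANNEL_SIZE = IMAGE_SIDE * IMAGE_SIDE  # 1,024 values per color channel
ROW_LENGTH = 3 * CHANNEL_SIZE           # 3,072 values per image (R, G, B planes)
BATCH_SIZE = 10_000
TRAIN_IMAGES = 5 * BATCH_SIZE           # five training batches -> 50,000 images
TEST_IMAGES = 1 * BATCH_SIZE            # one test batch -> 10,000 images


def unpickle(path):
    """Load one batch file (e.g. data_batch_1 or test_batch) as a dict with bytes keys."""
    with open(path, "rb") as f:
        # The files were written by Python 2, so encoding="bytes" is needed;
        # keys come back as b"data", b"labels", and so on.
        return pickle.load(f, encoding="bytes")


def pixel_rgb(row, r, c):
    """Return (R, G, B) of pixel (r, c) from one 3,072-value row of b"data"."""
    if not (0 <= r < IMAGE_SIDE and 0 <= c < IMAGE_SIDE):
        raise IndexError("pixel coordinates out of range")
    offset = r * IMAGE_SIDE + c  # row-major position within one color plane
    return (
        row[offset],                     # red plane comes first
        row[CHANNEL_SIZE + offset],      # then green
        row[2 * CHANNEL_SIZE + offset],  # then blue
    )
```

For example, `unpickle("cifar-10-batches-py/data_batch_1")[b"data"][0]` is the first image of training batch 1. Unpickling the real files requires NumPy to be installed, because `b"data"` is stored as a NumPy array.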

Artificial intelligence will disrupt many scenarios, and even entire jobs. We are most excited about the many ways it can enable video content analysis in the enterprise. Compared with still images, which have dominated deep learning so far, video carries more valuable information, and computer vision algorithms can extract much of that value.



Object recognition

After a machine learning algorithm processes the video frames, object recognition identifies the various objects in them. In AI, object recognition is a collection of related tasks rather than the single step it seems to be in human visual perception. Its key elements are image classification, object localization and, finally, object detection. Combining object recognition with motion detection enables intelligent analysis and prediction.
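The three tasks differ mainly in what they output. Below is a minimal sketch of those outputs as Python types, plus intersection over union (IoU), a standard score for how well a predicted box overlaps a ground-truth box. IoU is not mentioned in the article and is added here for illustration; all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: (x1, y1) is the top-left corner, (x2, y2) the bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Clamp at zero so an "inverted" box (no overlap) has no area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


@dataclass(frozen=True)
class Detection:
    label: str    # what the object is, e.g. "car"
    score: float  # model confidence in [0, 1]
    box: Box      # where the object is


# Image classification: one label for the whole image.
ClassificationResult = str
# Object localization: one label plus the box of that single object.
LocalizationResult = Detection
# Object detection: every object in the image, each with its own label and box.
DetectionResult = List[Detection]


def iou(a: Box, b: Box) -> float:
    """Intersection over union: 1.0 for identical boxes, 0.0 for disjoint ones."""
    overlap = Box(max(a.x1, b.x1), max(a.y1, b.y1), min(a.x2, b.x2), min(a.y2, b.y2))
    inter = overlap.area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0
```

A localization or detection result is usually counted as correct when its IoU with the ground-truth box exceeds a chosen threshold; 0.5 is a common choice.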

Video structuring

After images, video structuring has become another hot spot in deep learning. Video content is considerably more complex than a still image. Video structuring is a technology for extracting information from video content: it uses temporal and spatial segmentation, feature extraction, object recognition and other processing methods to organize video content, according to its semantic relationships, into text that both computers and humans can understand. From a data-processing perspective, it is the step that turns raw surveillance video into usable information.
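One small piece of video structuring, the temporal part, can be sketched as follows: given the object labels found in each frame, merge consecutive frames into segments and emit a text record of when each object is visible. This is a simplified sketch under stated assumptions, not how any particular platform does it. The per-frame labels are assumed to come from some object detector that is not shown, the frame rate is assumed to be known, and `structure_video` is a hypothetical name.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Segment:
    label: str
    start_frame: int
    end_frame: int  # inclusive

    def to_record(self, fps: float) -> dict:
        # Express frame indices as seconds so people, not only programs, can read the record.
        return {
            "object": self.label,
            "start_s": round(self.start_frame / fps, 2),
            "end_s": round((self.end_frame + 1) / fps, 2),
        }


def structure_video(frame_labels: List[Set[str]], fps: float) -> str:
    """Merge per-frame labels into time segments and return them as JSON text.

    frame_labels[i] holds the labels a detector (not shown) reported for frame i.
    """
    open_segments: Dict[str, Segment] = {}
    closed: List[Segment] = []
    for i, labels in enumerate(frame_labels):
        # Close the segment of any object that is no longer visible.
        for label in list(open_segments):
            if label not in labels:
                closed.append(open_segments.pop(label))
        # Extend the segment of each visible object, or start a new one.
        for label in labels:
            if label in open_segments:
                open_segments[label].end_frame = i
            else:
                open_segments[label] = Segment(label, i, i)
    # Objects still visible in the last frame end there.
    closed.extend(open_segments.values())
    closed.sort(key=lambda s: (s.start_frame, s.label))
    return json.dumps([s.to_record(fps) for s in closed], indent=2)
```

A real system would also apply spatial segmentation and attach attributes (color, direction of travel, and so on) to each record; this sketch only covers merging detections over time.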

Whether video can be structured through intelligent analysis is the key to putting video big data to work in the security field.



TSINGSEE Qingxi Video's intelligent video analysis platform, EasyCVR, can automatically analyze live surveillance video using AI, performing target detection, target recognition, target tracking, face recognition, scene segmentation, and person and vehicle attribute analysis. Through video structuring, it can understand and describe target behavior in the surveillance scene.

TSINGSEE Qingxi Video will also incorporate more emerging technologies, such as AI algorithms, deep learning, big data analytics, edge computing and 5G, to enable more application scenarios and accelerate the adoption of video AI across more industries.


Source: blog.csdn.net/TsingSee/article/details/115175280