Introduction to Visual Genome Dataset

References

Two Zhihu articles:

  • Visual Genome dataset overview
  • Visual Genome dataset introduction

VG Storyline

Based on my understanding after reading the reference articles, here is the storyline of VG.

Visual Genome (VG) is a large-scale image semantic understanding dataset released in 2016 by Fei-Fei Li's group at Stanford University. Their hope is that this dataset can promote research on high-level semantic understanding of images, much as ImageNet did.

The dataset contains a large number of images, and each image carries four kinds of annotations: region descriptions, region graphs, a scene graph, and question-answer (QA) pairs. Among them, the scene graph is illustrated in the figure below:

[Figure: example scene graph]

However, statistics show that both the object categories and the relationship categories in the dataset follow a long-tail distribution. In other words, the dataset is biased.

Therefore, Fei-Fei Li's later work Scene Graph Generation by Iterative Message Passing proposed VG150, which keeps only the 150 most frequent object categories and the 50 most frequent relationship (predicate) categories.
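
As a rough illustration of this filtering step (not the actual VG150 construction code), the sketch below counts label frequencies and keeps the top-k categories; it assumes annotations have been flattened into (subject, predicate, object) label triples, which is a simplification of the real VG annotation format:

```python
from collections import Counter

def build_vg150_style_vocab(triples, num_objects=150, num_predicates=50):
    """Keep the most frequent object and predicate labels, VG150-style.

    `triples` is assumed to be an iterable of (subject, predicate, object)
    label strings -- an illustrative simplification of the VG format.
    """
    triples = list(triples)
    obj_counts, pred_counts = Counter(), Counter()
    for subj, pred, obj in triples:
        obj_counts.update([subj, obj])
        pred_counts[pred] += 1
    keep_objs = {lbl for lbl, _ in obj_counts.most_common(num_objects)}
    keep_preds = {lbl for lbl, _ in pred_counts.most_common(num_predicates)}
    # Drop any triple whose labels fall outside the kept vocabularies.
    kept = [(s, p, o) for s, p, o in triples
            if s in keep_objs and o in keep_objs and p in keep_preds]
    return keep_objs, keep_preds, kept
```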

However, this does not alleviate the bias problem. In the paper Neural Motifs: Scene Graph Parsing with Global Context, the authors propose a simple, even crude, baseline: run an object detector to find the objects in the image, and for each pair of detected objects, predict the predicate that most frequently connects that object pair in the training set. Ironically, this very simple baseline already outperformed many models of the time.
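
A minimal sketch of such a frequency baseline follows. It assumes training annotations are available as (subject, predicate, object) label triples and that an object detector has already produced labels for the test image, so it covers only the statistics lookup, not the detection step:

```python
from collections import Counter, defaultdict
from itertools import permutations

def fit_frequency_baseline(train_triples):
    """For each (subject, object) label pair, count how often each predicate occurs."""
    stats = defaultdict(Counter)
    for subj, pred, obj in train_triples:
        stats[(subj, obj)][pred] += 1
    return stats

def predict_relations(detected_labels, stats):
    """For every ordered pair of detected objects, predict the most frequent predicate."""
    predictions = []
    for subj, obj in permutations(detected_labels, 2):
        counts = stats.get((subj, obj))
        if counts:  # only predict for pairs seen in training
            pred, _ = counts.most_common(1)[0]
            predictions.append((subj, pred, obj))
    return predictions

# Toy usage: fit on three training triples, then predict for a detected pair.
stats = fit_frequency_baseline([("man", "riding", "horse"),
                                ("man", "feeding", "horse"),
                                ("man", "riding", "horse")])
print(predict_relations(["man", "horse"], stats))  # [('man', 'riding', 'horse')]
```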

Later, the VrR-VG dataset was proposed; its authors deliberately mitigate the bias problem in several ways and use data visualizations to check how uniform the resulting label distribution is.

Neural Motifs: Scene Graph Parsing with Global Context

Abstract:
We investigate the problem of producing structured graph representations of visual scenes. Our work analyzes the role of motifs, i.e., regularly appearing substructures, in scene graphs. We present new quantitative insights on such repeated structures in the Visual Genome dataset. Our analysis shows that object labels are highly predictive of relation labels, but not vice versa. We also find that recurring patterns are present even in larger subgraphs: more than 50% of graphs contain motifs involving at least two relations. This motivates a new baseline built on object detection: given object detections, predict the most frequent relation between object pairs with the given labels, as seen in the training set. This baseline improves on the previous state of the art by an average of 3.6% relative improvement across evaluation settings. We then introduce Stacked Motif Networks, a new architecture designed to capture higher-order motifs in scene graphs, which further improves over our strong baseline by an average relative gain of 7.1%. Our code is available at github.com/rowanz/neural-motifs.

VrR-VG

See the Zhihu article Visual Genome dataset overview in the references above.

How to measure the similarity between two scene graphs?

A scene graph is a structured representation of image content: nodes represent objects or entities, and edges represent the relationships between them. Measuring the similarity between two scene graphs usually amounts to comparing their nodes and edges.

Here are some metrics that might be used to measure the similarity of two scene graphs (a code sketch of the overlap metrics follows the list):

  • Node overlap: the ratio of the intersection to the union of the two graphs' node sets (Jaccard similarity); the higher the ratio, the more similar the graphs.
  • Edge overlap: the same intersection-over-union ratio, computed on the two graphs' edge sets.
  • Consistency: the number of matched nodes and edges between the two graphs; more matches indicate greater similarity.
  • Shared object count: the number of objects that appear in both graphs; a higher count indicates greater similarity.
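
A minimal sketch of the node- and edge-overlap metrics, assuming each scene graph is represented simply as a set of object labels plus a set of (subject, predicate, object) triples (this representation is an assumption for illustration, not a standard API):

```python
def jaccard(a, b):
    """Intersection-over-union of two sets; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Each scene graph as a node-label set plus an edge-triple set (assumed form).
g1_nodes = {"man", "horse", "hat"}
g1_edges = {("man", "riding", "horse"), ("man", "wearing", "hat")}
g2_nodes = {"man", "horse", "field"}
g2_edges = {("man", "riding", "horse"), ("horse", "standing in", "field")}

print(jaccard(g1_nodes, g2_nodes))  # node overlap: 2/4 = 0.5
print(jaccard(g1_edges, g2_edges))  # edge overlap: 1/3 ~= 0.33
```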

Alternatively, precision and recall (P/R) can be used: treat one graph's relationship triples as ground truth and the other's as predictions.
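
Continuing with the same assumed triple-set representation, a hedged sketch of triple-level precision and recall:

```python
def precision_recall(predicted, ground_truth):
    """P = fraction of predicted triples that are correct;
    R = fraction of ground-truth triples that are recovered."""
    hits = len(predicted & ground_truth)
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(ground_truth) if ground_truth else 0.0
    return p, r

truth = {("man", "riding", "horse"), ("man", "wearing", "hat")}
pred = {("man", "riding", "horse"), ("horse", "standing in", "field")}
print(precision_recall(pred, truth))  # (0.5, 0.5)
```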

Source: blog.csdn.net/duoyasong5907/article/details/129841088