IMGpedia: Enriching the Web of Data with Image Content Analysis
1. Summary
Linked Data rarely takes into account the multimedia content that forms a core part of the Web. To explore the combination of Linked Data and multimedia, the authors are developing IMGpedia. They compute content-based descriptors for the images used in Wikipedia articles and then propose to link these descriptions with traditional encyclopedic knowledge bases such as DBpedia and Wikidata. On top of this extended knowledge base, the goal is a unified query system that accesses both the encyclopedic data and the image data. The authors also consider enriching the encyclopedic knowledge, either through rules applied to entities that co-occur in images or through content-based analysis. This short paper therefore describes work in progress on IMGpedia, with a focus on image descriptors.
2. Structure of the paper
3. How the key concepts in the paper relate to one another
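So, with all that said, what is IMGpedia? The paper defines it as follows: IMGpedia is a new knowledge base that uses metadata from DBpedia Commons and enriches it with visual descriptors for all images in the Wikimedia Commons dataset. Two of the concepts in this definition need clarifying first, otherwise it is very confusing:
- What is DBpedia Commons?
It is a DBpedia dataset extracted from Wikimedia Commons that publishes metadata about its media files as Linked Data.
- And what is Wikimedia Commons?
It is the Wikimedia repository of freely licensed images and other media files; most of the images used in Wikipedia articles are hosted there.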
Recall the abstract: IMGpedia computes content-based descriptors for the images used in Wikipedia articles (read this way, these are the images of the Wikimedia Commons dataset) and then proposes to link these descriptions with traditional encyclopedic knowledge bases such as DBpedia and Wikidata.
With these relationships clear, we can move on to the next question: how are the descriptors of the images in the Wikimedia Commons dataset computed?
4. Computing the visual descriptors of the images (the focus of the paper) and the resulting conclusions
The descriptors to be computed are:
- Gray Histogram Descriptor
- Oriented Gradients Descriptor
- Edge Histogram Descriptor
- DeCAF7
Before computing the four descriptors, a few general questions about visual descriptors need answering:
- Where does a descriptor come from?
A descriptor is a vector generated by applying some operation to the image's pixel matrix.
- What is a descriptor for?
It captures particular characteristics of the image, such as color distribution, shape, or texture. The vectors are then stored as points in a metric space in which similarity search can be performed.
- What data preparation is needed before computing the descriptors?
The data preparation is summarized in the figure below. Note in particular that the first three descriptors require a preprocessing step: the image is converted to grayscale using the intensity $Y = 0.299R + 0.587G + 0.114B$.
- How are the descriptors extracted and computed?
Each image is read from a local copy of the Wikimedia Commons image dataset (14 million of its 16 million images were downloaded, 19 TB in total) and loaded into memory; the descriptor is then extracted and the resulting vector is saved to disk.
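The grayscale conversion and the read, extract, and save loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: images are assumed to be already decoded into rows of (R, G, B) tuples, the JSON-lines output format is an arbitrary choice, and `extract_and_save` and `mean_intensity` are hypothetical names (the latter is only a toy stand-in for a real descriptor).

```python
import json

def to_gray(rgb):
    """Convert an RGB image (rows of (R, G, B) tuples, values 0-255) to gray
    intensities with Y = 0.299*R + 0.587*G + 0.114*B."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]

def extract_and_save(images, descriptor, out_path):
    """Hypothetical pipeline: for each (image_id, rgb) pair already loaded in
    memory, compute a descriptor on the grayscale image and write the vector to
    disk, one JSON object per line. Returns the number of images processed."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for image_id, rgb in images:
            vector = descriptor(to_gray(rgb))
            out.write(json.dumps({"id": image_id, "vector": vector}) + "\n")
            count += 1
    return count

def mean_intensity(gray):
    """Toy stand-in descriptor (hypothetical): the mean gray level as a 1-D vector."""
    values = [v for row in gray for v in row]
    return [sum(values) / len(values)]
```

For example, `extract_and_save([("img1", [[(255, 0, 0), (0, 0, 255)]])], mean_intensity, "vectors.jsonl")` writes one line for `img1`. If images are decoded with Pillow, note that its `Image.convert("L")` documents the same ITU-R 601-2 luma weights (299/1000, 587/1000, 114/1000), so it can replace `to_gray` at the cost of integer rounding.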
How each of the four visual descriptors is computed (supplementary notes):
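The three hand-crafted descriptors can be sketched in pure Python. The sketch is simplified, and its parameters are assumptions rather than values taken from the paper: the 4×4 grid, the 8 gray bins, the 9 orientation bins, and the edge threshold of 11 are illustrative. `oriented_gradients` builds a single global histogram instead of the cell and block structure of full HOG, and `edge_histogram` applies MPEG-7-style 2×2 edge filters directly to pixel pairs rather than to averaged image blocks. All function names are hypothetical. DeCAF7 is omitted: it consists of the activations of the seventh layer of a pretrained convolutional neural network and cannot be reproduced in a few lines.

```python
import math

def _sub_images(gray, grid):
    """Split a gray image (list of equal-length rows) into grid x grid sub-images, row-major."""
    h, w = len(gray), len(gray[0])
    for i in range(grid):
        for j in range(grid):
            yield [row[j * w // grid:(j + 1) * w // grid]
                   for row in gray[i * h // grid:(i + 1) * h // grid]]

def gray_histogram(gray, grid=4, bins=8):
    """Concatenated per-block histograms of gray levels in [0, 256), each summing to 1."""
    feats = []
    for block in _sub_images(gray, grid):
        hist = [0.0] * bins
        values = [v for row in block for v in row]
        for v in values:
            hist[min(int(v * bins / 256), bins - 1)] += 1
        n = len(values) or 1  # avoid dividing by zero for empty blocks
        feats.extend(c / n for c in hist)
    return feats

def oriented_gradients(gray, bins=9):
    """Global histogram of unsigned gradient orientations, weighted by gradient magnitude."""
    hist = [0.0] * bins
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]  # central differences
            gy = gray[y + 1][x] - gray[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # orientation in [0, 180)
            hist[min(int(angle * bins / 180.0), bins - 1)] += mag
    total = sum(hist)
    return [v / total for v in hist] if total else hist

S2 = math.sqrt(2.0)
# MPEG-7-style 2x2 edge filters; coefficients apply to (top-left, top-right,
# bottom-left, bottom-right): vertical, horizontal, 45-degree, 135-degree, non-directional.
EDGE_FILTERS = [
    (1.0, -1.0, 1.0, -1.0),
    (1.0, 1.0, -1.0, -1.0),
    (S2, 0.0, 0.0, -S2),
    (0.0, S2, -S2, 0.0),
    (2.0, -2.0, -2.0, 2.0),
]

def edge_histogram(gray, grid=4, threshold=11.0):
    """Per sub-image, the fraction of 2x2 blocks whose strongest edge is each of the 5 types."""
    feats = []
    for sub in _sub_images(gray, grid):
        counts = [0.0] * 5
        n_blocks = 0
        for y in range(0, len(sub) - 1, 2):
            for x in range(0, len(sub[0]) - 1, 2):
                block = (sub[y][x], sub[y][x + 1], sub[y + 1][x], sub[y + 1][x + 1])
                strengths = [abs(sum(f * p for f, p in zip(filt, block))) for filt in EDGE_FILTERS]
                best = max(range(5), key=lambda k: strengths[k])
                n_blocks += 1
                if strengths[best] >= threshold:  # weak responses count as "no edge"
                    counts[best] += 1
        n = n_blocks or 1
        feats.extend(c / n for c in counts)
    return feats
```

Each function returns a plain list of floats, which matches the idea of a descriptor as a vector that can later be placed in a metric space; with the defaults, the gray histogram has 4 × 4 × 8 = 128 dimensions and the edge histogram 4 × 4 × 5 = 80.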
Conclusions from benchmarking the descriptors:
Conclusion 1: DeCAF7 is the most expensive descriptor to compute.
Conclusion 2: The computation time of the first three descriptors improves by an order of magnitude, whereas DeCAF7 gains no such benefit.
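Both conclusions rest on measuring how long each descriptor takes per image. A minimal timing harness could look like the sketch below. It assumes that descriptors are plain Python functions over in-memory images; `benchmark` and `rank_by_cost` are hypothetical names, and this is not the paper's benchmarking setup.

```python
import time

def benchmark(descriptors, images, repeats=1):
    """Return the average wall-clock seconds per image for each named descriptor."""
    results = {}
    for name, extract in descriptors.items():
        start = time.perf_counter()
        for _ in range(repeats):
            for image in images:
                extract(image)
        elapsed = time.perf_counter() - start
        runs = repeats * len(images)
        results[name] = elapsed / runs if runs else 0.0
    return results

def rank_by_cost(results):
    """Descriptor names ordered from most to least expensive."""
    return sorted(results, key=results.get, reverse=True)
```

With real descriptor functions plugged in, `rank_by_cost(benchmark(...))[0]` names the most expensive one, which the paper reports to be DeCAF7. Using `time.perf_counter` rather than `time.time` gives a monotonic, high-resolution clock suited to short measurements.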
5. Main progress reported in the paper
6. Future work
- Linking IMGpedia with DBpedia, Wikidata, etc.
- Efficient algorithms for finding relations between similar images
To make IMGpedia easier to query, the authors propose precomputing static relations between similar images.
They intend to explore building an index structure for similarity search over the dataset, approximate similarity search algorithms, and self-similarity join techniques.
- Labeling images for multimedia applications
By linking IMGpedia to existing knowledge bases, the authors hope to label images with the categories, types, and entities of the articles in which they appear; DBpedia and Wikidata could also be used to label images with specific entities.
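The precomputed static relations between similar images mentioned under future work can be illustrated with the simplest possible self-similarity join: a brute-force k-nearest-neighbour join under Euclidean distance. This is a hypothetical sketch; `knn_self_join` and the choice of Euclidean distance are assumptions, not the paper's method. It performs O(n²) distance computations, which is exactly why index structures and approximate algorithms are needed at the scale of millions of images.

```python
import heapq
import math

def knn_self_join(vectors, k=2):
    """For every image id, the ids of its k nearest other images
    (Euclidean distance over descriptor vectors), closest first."""
    relations = {}
    for a, va in vectors.items():
        # Compare against every other vector: quadratic in the number of images.
        candidates = ((math.dist(va, vb), b) for b, vb in vectors.items() if b != a)
        relations[a] = [b for _, b in heapq.nsmallest(k, candidates)]
    return relations
```

Each resulting pair (image, neighbour) could then be stored as a static relation and queried directly instead of running a similarity search at query time. `math.dist` requires Python 3.8 or later.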