Literature reading notes (six)

Summary notes on "An Empirical Study of Instance-Based Ontology Matching"

First, an outline of the paper's line of reasoning

1.1 Research background of the paper

  1. A pressing problem at present is how to match entities (ontology concepts) automatically.
  2. Current entity matching techniques fall into four categories: matching based on lexical similarity, matching based on entity structure, matching based on background knowledge, and matching based on entity instances.
  3. The paper studies instance-based matching. The main idea is that the more the instances of two concepts (entities) overlap, the more closely the two concepts are related. The difficulty lies in how to define and measure the degree of overlap.
  4. The paper's systematic approach considers three dimensions: the measure (how the overlap between two concepts is computed), the threshold (a cutoff applied before the measure is trusted), and the hierarchy (whether the instances of a concept's descendants count as part of its extension).
  5. The system in the paper computes a degree of match for each pair of concepts.
  6. The paper answers the question of how to choose the measure, the threshold and the use of hierarchy so that the system performs best.
  7. The use case described in the paper involves two book collections: one containing all Dutch printed publications, and a scientific collection on the history, language and culture of the Netherlands.
  8. The instance-based matching framework proposed in the paper is shown in a figure (not reproduced in these notes).

1.2 Problem addressed by the paper

1.3 How the paper solves the problem

1) Given two entities S (source) and T (target), the goal is to find triples (S, T, R), where R is the relation between S and T. R is one of {≡, ⊑, ⊓, ⊥}: equivalence, subsumption, overlap, and disjointness.

2) Instance-based matching mainly considers the overlap between the instance sets of the two entities.

3) Because instance-based matching depends on the extension of a concept, the different ways of defining that extension must be considered. The main choice is whether a concept's instance set contains only the instances annotated directly with that concept, or also the instances annotated with its descendants (the extension expanded through the hierarchy).

4) In practice, instances are mislabeled, data is sparse, and concepts are ambiguous, so the degree of overlap is hard to compute reliably. The paper therefore first uses other measures to compute how strongly two instance sets are associated, and then applies a statistically motivated threshold to exclude unreliable evidence.
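The two ways of defining a concept's instance set mentioned in item 3) can be sketched in Python as follows. This is only a minimal illustration under assumptions, not the paper's implementation: the names `annotations`, `children`, `direct_extension` and `hierarchical_extension` are hypothetical, and the toy data is made up.

```python
from collections import defaultdict


def direct_extension(annotations):
    """Map each concept to the set of instances annotated directly with it.

    `annotations` (hypothetical input) maps an instance id, e.g. a book,
    to the set of concepts it is annotated with.
    """
    ext = defaultdict(set)
    for instance, concepts in annotations.items():
        for concept in concepts:
            ext[concept].add(instance)
    return dict(ext)


def hierarchical_extension(concept, ext, children):
    """Instances of `concept` plus the instances of all its descendants.

    `children` (hypothetical input) maps a concept to its direct narrower
    concepts. A `seen` set guards against visiting a concept twice.
    """
    result = set(ext.get(concept, set()))
    stack = list(children.get(concept, []))
    seen = set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        result |= ext.get(current, set())
        stack.extend(children.get(current, []))
    return result


# Made-up toy data: three books and a two-level hierarchy.
annotations = {
    "book1": {"history"},
    "book2": {"dutch_history"},
    "book3": {"art"},
}
children = {"history": ["dutch_history"]}

ext = direct_extension(annotations)
history_with_descendants = hierarchical_extension("history", ext, children)
```

With this toy data the direct extension of "history" holds only book1, while the hierarchical extension also picks up book2 through the narrower concept.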

  1. The paper then gives the specific measures it uses:

1) Jaccard: the number of instances annotated with both concepts, divided by the number of instances annotated with at least one of them. It is the ratio of the overlap of the two concepts' instance sets to their union: JC(S, T) = |S ∩ T| / |S ∪ T|, where S and T here stand for the two instance sets.

2) Corrected Jaccard: builds on Jaccard, but assigns lower scores to pairs whose overlap rests on only a small number of co-annotated instances.

3) PMI: Pointwise Mutual Information. It measures how much knowing that an instance is annotated with one concept reduces the uncertainty about whether it is annotated with the other: PMI(S, T) = log(|S ∩ T| × N / (|S| × |T|)), N being the total number of annotated instances.

4) Log-likelihood ratio (LLR)

5) Information gain
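The first three measures above can be sketched on plain Python sets. This is a hedged illustration, not the paper's code: the function names are hypothetical, the square-root form and the 0.8 correction constant in `corrected_jaccard` are recalled from the paper and should be checked against it, and the base-2 logarithm in `pmi` is an assumption.

```python
import math


def jaccard(s, t):
    """Overlap of two instance sets relative to their union."""
    union = len(s | t)
    return len(s & t) / union if union else 0.0


def corrected_jaccard(s, t, correction=0.8):
    """Jaccard variant that penalizes very small overlaps.

    Assumed form: sqrt(|S∩T| * (|S∩T| - correction)) / |S∪T|.
    The constant 0.8 is an assumption to verify against the paper.
    """
    inter = len(s & t)
    union = len(s | t)
    if inter == 0 or union == 0:
        return 0.0
    return math.sqrt(inter * (inter - correction)) / union


def pmi(s, t, n):
    """Pointwise mutual information from co-occurrence counts.

    `n` is the total number of annotated instances. Returns -inf when
    the two sets share no instance (log of zero).
    """
    inter = len(s & t)
    if inter == 0:
        return float("-inf")
    return math.log2(inter * n / (len(s) * len(t)))


# Made-up instance sets (integers stand for book ids).
s = {1, 2, 3, 4}
t = {3, 4, 5}
jc = jaccard(s, t)
jc_corr = corrected_jaccard(s, t)
pmi_st = pmi(s, t, n=10)
```

Under these assumptions, two concepts that share a single instance and nothing else get a Jaccard score of 1.0 but a corrected score of about 0.45, which is the effect the correction is meant to have.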

  1. Because the measures above need a large number of annotated instances to be statistically reliable, a threshold on the number of instances is set, and concepts whose extension is too small are discarded. In addition, ranking the candidate mappings makes it possible to find more than one related entity, or more than one relation, for a given entity.
  2. Finally, through experiments, the paper examines which of Jaccard, corrected Jaccard, PMI, LLR and IGB, with threshold values from 1 to 10, is most suitable.
  3. Data set: 243,886 books, each annotated with concepts from both Brinkman and GTT. The paper treats every annotated book as an instance of the concepts it is annotated with.
  4. Experimental method
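The threshold and ranking step from item 1 of this list can be sketched as follows. This is an illustration under assumptions rather than the paper's procedure: the threshold is interpreted here as a minimum number of shared instances (a per-concept extension size would be another reading), `rank_candidates` and `min_shared` are hypothetical names, plain Jaccard stands in for whichever measure is chosen, and the concept labels and data are made up.

```python
def jaccard(s, t):
    """Overlap of two instance sets relative to their union."""
    union = len(s | t)
    return len(s & t) / union if union else 0.0


def rank_candidates(ext_source, ext_target, measure, min_shared=1):
    """Score every source/target concept pair and rank by score.

    Pairs with fewer than `min_shared` common instances are dropped
    before scoring (assumed interpretation of the threshold).
    """
    ranked = []
    for s_concept, s_instances in ext_source.items():
        for t_concept, t_instances in ext_target.items():
            if len(s_instances & t_instances) < min_shared:
                continue
            score = measure(s_instances, t_instances)
            ranked.append((s_concept, t_concept, score))
    # Highest score first; each source concept may appear several times.
    ranked.sort(key=lambda mapping: mapping[2], reverse=True)
    return ranked


# Made-up extensions: concept -> set of book ids.
source = {"S:history": {1, 2, 3}, "S:art": {4}}
target = {"T:geschiedenis": {1, 2, 3, 5}, "T:kunst": {4}, "T:muziek": {3}}

loose = rank_candidates(source, target, jaccard, min_shared=1)
strict = rank_candidates(source, target, jaccard, min_shared=2)
```

In this toy run the loose setting keeps three candidate mappings, two of them for "S:history", while the stricter setting keeps only the pair backed by three shared books.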

1.4 Experimental methods used in the paper

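1) Gold standard: the gold standard is obtained through manual annotation.

2) Average precision: P_i = N_good,i / N_i, where N_i is the number of evaluated mappings among the top i ranked mappings, and N_good,i is the number of those that are correct.

3) Approximate recall

4) F-measure: combines precision and recall, in its standard form as their harmonic mean, F = 2PR / (P + R).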
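The precision and F-measure definitions above can be sketched as follows. This is a minimal sketch under assumptions: `precision_at_rank` and `f_measure` are hypothetical names, unevaluated mappings are represented as `None`, the F-measure is taken to be the standard harmonic mean, and the judgement list is made up. Approximate recall is not sketched because these notes do not describe how it is computed.

```python
def precision_at_rank(judgements, i):
    """Precision over the evaluated mappings among the top i.

    `judgements` is aligned with the ranked mappings: True or False for
    a manually evaluated mapping, None for one that was not evaluated.
    Returns None when nothing in the top i was evaluated.
    """
    evaluated = [j for j in judgements[:i] if j is not None]
    if not evaluated:
        return None
    n_good = sum(1 for j in evaluated if j)
    return n_good / len(evaluated)


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (standard F1, assumed)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up judgements for the five best-ranked mappings.
judgements = [True, None, True, False, None]
p4 = precision_at_rank(judgements, 4)  # 3 evaluated, 2 of them correct
f = f_measure(0.5, 0.5)
```

Skipping unevaluated mappings keeps the denominator equal to N_i, the number of mappings that were actually judged, rather than to the rank i itself.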

 

  1. The ultimate goal of the experiments is to find out what role the different measures and settings play in matching, and whether there is a best combination of measure and threshold. In short, the experiments are designed to answer:

1) How does the nature of the mappings affect the final matching results?

2) What is the influence of the choice of threshold?

3) What is the influence of using the extended concept information (the hierarchy)?

4) Which measure is the best choice for instance-based matching?

1.5 Final evaluation of the experimental results

  1. Influence of the nature of the mappings on the results: three settings are considered: ONLYEQ (only equivalence relations are included), NOTREL (the three kinds of relations other than "related" are included), and ALL (all relations are taken into account except "no link"). Whatever combination of measure and settings is used, performance is best under ONLYEQ. (For this reason the later experiments in the paper consider only ONLYEQ.)
  2. Influence of the threshold: using a threshold improves precision, but at the cost of some recall.
  3. Best measure: JC and JCcorr have the highest F-measure, precision and recall across all matches.

Second, the paper's contributions

    Based on a large number of experiments on an application at the Dutch National Library, the paper presents an empirical study of instance-based matching.

It compares five common similarity measures, together with the use of thresholds and hierarchy information, and uses combined experiments to find the measure best suited to instance-based entity matching.

 

Origin www.cnblogs.com/hwx1997/p/12444121.html