Literature reading notes (XII)

Summary notes on "Instance-Based Ontology Mapping"

 

 

First, an outline of the paper's line of thought

1.1 Related research

  1. CIM (Common Information Model): used in the computer industry to describe devices and applications in a standard way, so that system administrators and management programs can control devices and applications from different vendors in the same manner. For example, a company that has bought different types of storage devices from different vendors can see the same kind of information for each device (for example device name and model, serial number, capacity, network address, and relationships to other devices or applications), or obtain that information from an application.
  2. CATO: an ontology integration engine that matches the entities represented in different ontologies, in order to achieve tight integration and avoid ambiguous interpretations of the ontologies. Its drawback is that when the data making up a source ontology carries little detail, the resulting entity descriptions are incomplete.
  3. Database schema matching: given two schemas A and B, if there is a matching μ such that a concept a in A and a concept b in B satisfy a = μ(b), then a and b are considered to have the same meaning.
  4. The paper presents a method that complements CATO and can improve its results when the original resource representations are incomplete or incorrectly defined.
  5. CATO performs the integration of knowledge management elements automatically.
  6. The article first summarizes the tasks that the CATO ontology integration engine has to carry out. Integrating heterogeneous resources on the basis of CIM can be likened to schema matching in databases. In the CATO engine, if each resource is assumed to be described by an independent ontology, there must be a mechanism that supports negotiating associations between ontologies. The goal of this negotiation is to find an intermediate representation that links the ontologies.
  7. The CATO engine combines several techniques, such as NLP, similarity measures and tree comparison. The initial version of CATO is written in Java and uses the Jena Java API. CATO works by mixing syntactic and semantic analysis of the ontology content, with specific measures for the lexical level and the structural level applied across the ontology concepts.
  8. CATO alignment process: first, the textual similarity of two concepts is compared, since concepts that are textually similar should also be semantically similar. Because some concepts may be ambiguous, a structural comparison is then introduced, specifically a tree comparison that compares the parent classes and child classes of the concepts. The process finally derives equivalence relations between concepts, and the result is an OWL document containing those equivalence relations.
  9. CATO's success depends on the quantity and quality of the information encoded in the input ontologies. The richer and more complete the information, the better the matching results. Conversely, if an input ontology is poorly defined, incomplete or missing data, the integration engine has almost nothing to work with and cannot provide sufficiently reliable results.
  10. To solve this problem with CATO, the article proposes an instance-based approach, which matches not on the classes of the ontologies but on their instances. An ontology stores not only classes but also instances. The instance-based method finds matches between the classifications used by different ontologies. The central idea of the article is to process the results of queries posed to two different ontologies and use them as a reliable way of estimating a model that gives the matching rate between aligned pairs of concepts. To this end, the article assumes that each instance carries a GTIN (Global Trade Item Number) identifier.
  11. The article then describes the proposed instance-based method in detail.
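
The two-step comparison described in item 8 (text similarity first, then a check of parent and child classes) can be sketched roughly as follows. This is only an illustration in Python, not CATO's actual Java/Jena implementation: the toy ontologies, the 0.8 threshold, and the helper names `text_sim`, `structure_agrees` and `align` are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical toy ontologies: concept -> (parent, set of child concepts).
ONTO_A = {
    "Device": (None, {"StorageDevice"}),
    "StorageDevice": ("Device", {"Disk", "Tape"}),
    "Disk": ("StorageDevice", set()),
    "Tape": ("StorageDevice", set()),
}
ONTO_B = {
    "Device": (None, {"Storage_Device"}),
    "Storage_Device": ("Device", {"Disk", "TapeDrive"}),
    "Disk": ("Storage_Device", set()),
    "TapeDrive": ("Storage_Device", set()),
}


def text_sim(a, b):
    """Lexical similarity in [0, 1], ignoring case and separators."""
    def norm(s):
        return s.lower().replace("_", "").replace("-", "")
    return SequenceMatcher(None, norm(a), norm(b)).ratio()


def structure_agrees(ca, cb, onto_a, onto_b, threshold):
    """Structural step: parents must look alike, and if the concepts
    have children, at least one pair of children must look alike."""
    parent_a, kids_a = onto_a[ca]
    parent_b, kids_b = onto_b[cb]
    if (parent_a is None) != (parent_b is None):
        return False
    if parent_a is not None and text_sim(parent_a, parent_b) < threshold:
        return False
    if not kids_a and not kids_b:
        return True
    return any(text_sim(x, y) >= threshold for x in kids_a for y in kids_b)


def align(onto_a, onto_b, threshold=0.8):
    """Return concept pairs that pass both the lexical and structural step."""
    return sorted(
        (ca, cb)
        for ca in onto_a
        for cb in onto_b
        if text_sim(ca, cb) >= threshold
        and structure_agrees(ca, cb, onto_a, onto_b, threshold)
    )


pairs = align(ONTO_A, ONTO_B)
for ca, cb in pairs:
    print(f"{ca} is treated as equivalent to {cb}")
```

In this toy data `Tape` and `TapeDrive` are not aligned because their names are not similar enough, which also hints at the weakness noted in item 9: when names and structure carry little information, a class-level comparison has little to go on.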

1.2 Problem addressed by the paper

1.3 Problem-solving process of the paper

1) Query stage: let O denote an ontology and C its set of classes [c1 ... cn]. Each class ck may have several instances in O, written [rk1 ... rkn]. When an instance ra of Oa and an instance rb of Ob represent the same real-world object, the two instances are considered equivalent (ra ≡ rb), which is also evidence that their classes ca and cb are equivalent. In this stage, for each pair of classes (ca, cb), the method computes n(ca, cb), the total number of instance pairs (ra, rb) with ra ≡ rb where ca and cb are the classes of ra and rb. The frequency with which class ca is mapped to class cb is then estimated as P(ca, cb) = n(ca, cb) / n(ca).
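
A minimal sketch of this counting step, assuming that two instances are equivalent exactly when they share the same GTIN, and that n(ca) counts the instances of ca that have an equivalent instance in Ob. The sample data and the function name `estimate_mapping` are made up for illustration.

```python
from collections import Counter

# Hypothetical instance data: GTIN -> class of that instance in each ontology.
INSTANCES_A = {"0001": "Laptop", "0002": "Laptop", "0003": "Phone", "0004": "Laptop"}
INSTANCES_B = {"0001": "Notebook", "0002": "Notebook", "0003": "Mobile",
               "0004": "Tablet", "0005": "Mobile"}


def estimate_mapping(instances_a, instances_b):
    """Return {(ca, cb): P(ca, cb)} with P(ca, cb) = n(ca, cb) / n(ca)."""
    n_pair = Counter()  # n(ca, cb): equivalent instance pairs per class pair
    n_a = Counter()     # n(ca): matched instances per class of Oa (assumption)
    for gtin, ca in instances_a.items():
        cb = instances_b.get(gtin)
        if cb is None:
            continue  # no instance with the same GTIN in Ob
        n_pair[(ca, cb)] += 1
        n_a[ca] += 1
    return {pair: count / n_a[pair[0]] for pair, count in n_pair.items()}


p = estimate_mapping(INSTANCES_A, INSTANCES_B)
for (ca, cb), value in sorted(p.items()):
    print(f"P({ca}, {cb}) = {value:.2f}")
```

With this data, two of the three matched `Laptop` instances fall under `Notebook` in the other ontology, so P(Laptop, Notebook) is 2/3 and P(Laptop, Tablet) is 1/3.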

2) Analysis stage: once n(ca, cb) and P(ca, cb) have been calculated, the result set of each query is used to recalculate n(ca), n(ca, cb) and P(ca, cb).

 

α is a calibration coefficient, with values taken from {0, 0.01, 0.1, 1, 10, 100}.

Δ(ca, cb) is the number of times ca and cb occur together in the current result set.

Δ(ca) is the number of times ca occurs in the current result set.

n(ca, cb) is the accumulated number of times ca and cb have occurred together.

n(ca) is the accumulated number of times ca has occurred.

Ψ is a smoothing coefficient.
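
A rough sketch of the bookkeeping in the analysis stage, assuming the simplest possible update: the per-query counts Δ are added to the running totals n, and P is re-derived as n(ca, cb) / n(ca). The weighting by α and the smoothing by Ψ are deliberately left out, because their exact roles are not spelled out here; the class name `MappingEstimator` is hypothetical.

```python
from collections import Counter


class MappingEstimator:
    """Keeps running totals n(ca, cb) and n(ca) across query result sets."""

    def __init__(self):
        self.n_pair = Counter()  # n(ca, cb)
        self.n_a = Counter()     # n(ca)

    def update(self, result_set):
        """result_set: iterable of (ca, cb), one entry per pair of
        equivalent instances returned by the current query."""
        pairs = list(result_set)
        delta_pair = Counter(pairs)               # Δ(ca, cb)
        delta_a = Counter(ca for ca, _ in pairs)  # Δ(ca)
        # Assumption: plain accumulation, without α weighting or Ψ smoothing.
        self.n_pair.update(delta_pair)
        self.n_a.update(delta_a)

    def p(self, ca, cb):
        """Current estimate P(ca, cb); 0.0 if ca has not been seen yet."""
        if self.n_a[ca] == 0:
            return 0.0
        return self.n_pair[(ca, cb)] / self.n_a[ca]


est = MappingEstimator()
est.update([("Laptop", "Notebook"), ("Laptop", "Notebook")])
first = est.p("Laptop", "Notebook")
est.update([("Laptop", "Tablet"), ("Phone", "Mobile")])
second = est.p("Laptop", "Notebook")
print(first, second)
```

Each new result set shifts the estimate: after the second query the share of `Laptop` instances mapped to `Notebook` drops from 1 to 2/3.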

1.4 Experimental methods used in the paper

  1. 6-fold cross-validation is used; the validation data set was labeled manually.
  2. Comparative experiments are run on the validation set with different combinations of coefficients.
  3. The final system achieved an accuracy of 89.7% and a recall of 81.3%.
  4. Coefficient settings: the data set performed best with α = 1 and a matching rate of 0.4.
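
For reference, the evaluation procedure can be sketched as below. This is a generic illustration of 6-fold splitting and of scoring proposed matches against manually labeled ones, not the authors' evaluation code. Scoring with precision and recall is an assumption about how the figures above were obtained, and the function names and sample pairs are hypothetical.

```python
def k_fold_splits(items, k=6):
    """Yield (train, test) lists; fold i holds every k-th item starting at i."""
    for i in range(k):
        test = items[i::k]
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, test


def precision_recall(predicted, gold):
    """Precision and recall of predicted matches against labeled matches."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical labeled data: 12 items split into 6 folds of 2 items each.
items = list(range(12))
folds = list(k_fold_splits(items, k=6))

# Hypothetical matcher output versus manual labels.
predicted = [("Laptop", "Notebook"), ("Phone", "Mobile"), ("Tape", "Disk")]
gold = [("Laptop", "Notebook"), ("Phone", "Mobile"),
        ("Tape", "TapeDrive"), ("Camera", "DigitalCamera")]
precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In a full run, the coefficients would be chosen on each training part and scored on the held-out part, and the per-fold scores averaged.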

1.5 Final evaluation of the experimental results

Second, the innovation of the paper

The paper presents a method that complements CATO: when the original resource representations are incomplete or incorrectly defined, it can improve the results CATO obtains when automatically integrating knowledge management elements.

 

Origin www.cnblogs.com/hwx1997/p/12444177.html