Knowledge graph - knowledge extraction - entity extraction (named entity recognition)

Knowledge Graph

A knowledge graph is a semantic network that reveals the relationships between entities; it formally describes real-world things and the relationships among them. The term is now used broadly to refer to various large-scale knowledge bases.

The triple is a general way of representing a knowledge graph, i.e. G = (E, R, S). The basic forms of a triple include (entity 1, relation, entity 2) and (concept, attribute, attribute value). Entities are the most basic elements of a knowledge graph, and different relations hold between different entities.
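As a minimal sketch of the triple representation, the snippet below stores a few facts as (head, relation, tail) tuples. It assumes E is the set of entities, R the set of relation and attribute names, and S the set of triples, which the text does not spell out. The `Triple` class, the `ATTRIBUTES` set, and the `neighbors` helper are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    head: str      # entity 1, or a concept
    relation: str  # relation name, or attribute name
    tail: str      # entity 2, or an attribute value


# S: a small illustrative set of triples (assumed reading of S)
S = {
    Triple("Paris", "capital_of", "France"),    # (entity 1, relation, entity 2)
    Triple("France", "located_in", "Europe"),   # (entity 1, relation, entity 2)
    Triple("France", "currency", "euro"),       # (concept, attribute, attribute value)
}

# Hypothetical: names that denote attributes rather than entity-to-entity relations
ATTRIBUTES = {"currency"}

# E: all heads, plus tails of triples that link two entities
E = {t.head for t in S} | {t.tail for t in S if t.relation not in ATTRIBUTES}
# R: every relation or attribute name that occurs in S
R = {t.relation for t in S}


def neighbors(entity):
    """Return the sorted (relation, tail) pairs whose head is `entity`."""
    return sorted((t.relation, t.tail) for t in S if t.head == entity)


print(sorted(E))
print(neighbors("France"))
```

Keeping attributes and relations in the same triple shape makes the store uniform; separating them, as `ATTRIBUTES` does here, is only one possible design choice.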

Knowledge Extraction

Knowledge extraction mainly targets open linked data. It uses automated techniques to extract usable knowledge units, which comprise three knowledge elements: entities (the extension of concepts), relations, and attributes. On this basis it forms a series of high-quality fact expressions, laying the foundation for building the schema layer above.

Entity extraction (NER)

Entity extraction, known in its early days as named entity learning or named entity recognition (NER), means automatically recognizing named entities in a raw corpus. Because entities are the most basic elements of a knowledge graph, the completeness, precision, and recall of their extraction directly affect the quality of the knowledge base. Entity extraction is therefore the most basic and most critical step in knowledge extraction.
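To make the precision and recall mentioned above concrete, here is a small sketch using the standard definitions: precision is the share of predicted entities that are correct, and recall is the share of gold entities that were found. Counting a prediction as correct only when both the text and the type match exactly is an assumption; the function name and the sample entities are illustrative.

```python
def precision_recall(predicted, gold):
    """Compute exact-match precision and recall over (text, type) pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Illustrative annotations, not real evaluation data
gold = {("Marie Curie", "PER"), ("Warsaw", "LOC"), ("Sorbonne", "ORG")}
predicted = {("Marie Curie", "PER"), ("Warsaw", "ORG")}  # second type is wrong

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here one of two predictions is correct and one of three gold entities is found, so a mistyped entity hurts both scores at once.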

Entity extraction methods fall into three kinds: rule-and-dictionary-based methods, which usually require writing templates for the target entities and then matching them against the raw corpus; statistical machine learning methods, which train a model on the raw corpus and then use the trained model to recognize entities; and open-domain extraction methods, which target massive Web corpora.

Rule-based and dictionary-based methods

Early entity extraction was carried out within a restricted text domain and for restricted types of semantic units. It relied mainly on rules and dictionaries, for example using predefined rules to extract person names, place names, organization names, specific times, and other entities from text. However, the rule-template approach depends on a large number of experts to write the rules or templates, covers only a limited range of domains, and adapts poorly to new requirements as the data changes.
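A rule-and-dictionary extractor of the kind described above might look like the following sketch. The gazetteer entries, the two regular-expression rules, and the labels (PER, LOC, ORG, DATE) are all hypothetical examples, not rules from any particular system.

```python
import re

# Hypothetical dictionary (gazetteer) of known names; a real one would be far larger
GAZETTEER = {
    "Beijing": "LOC",
    "Tsinghua University": "ORG",
}

# Hand-written rules as (entity type, pattern) pairs
RULES = [
    ("DATE", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),           # e.g. 2020-01-07
    ("PER", re.compile(r"\b(?:Mr|Ms|Dr)\. [A-Z][a-z]+\b")),   # title + capitalized name
]


def extract(text):
    """Return (span, type) pairs found by the dictionary and the rules, in text order."""
    found = []
    for name, label in GAZETTEER.items():
        for m in re.finditer(re.escape(name), text):
            found.append((m.start(), m.group(), label))
    for label, pattern in RULES:
        for m in pattern.finditer(text):
            found.append((m.start(), m.group(), label))
    return [(span, label) for _, span, label in sorted(found)]


text = "Dr. Wang visited Tsinghua University in Beijing on 2020-01-07."
print(extract(text))
```

The limitations named in the text show up quickly: a person without a title, or a date written as "7 January 2020", is missed until someone writes another rule.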

Statistical machine learning methods

Subsequently, researchers tried applying supervised learning algorithms from machine learning to the problem of named entity extraction. Purely supervised algorithms are limited in performance by the training set, and their precision and recall are also less than ideal. Having recognized these constraints, researchers tried combining supervised learning algorithms with rules and achieved some success.
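The sketch below illustrates both points in miniature: a model learned from labeled data cannot tag what it never saw, and a rule can cover part of that gap. The "model" is a deliberately simple stand-in (a most-frequent-tag lookup per token), not an algorithm named in the source; the training sentences, tags, and the year rule are hypothetical.

```python
import re
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Learn the most frequent tag for every token seen in the training set."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for token, tag in sentence:
            counts[token][tag] += 1
    return {token: c.most_common(1)[0][0] for token, c in counts.items()}


YEAR = re.compile(r"\d{4}")


def rule_tag(token):
    """Hand-written fallback rule: a four-digit token is treated as a date."""
    return "DATE" if YEAR.fullmatch(token) else "O"


def tag(model, tokens):
    """Use the learned tag when the token was seen in training, else the rule."""
    return [(t, model[t] if t in model else rule_tag(t)) for t in tokens]


# Tiny illustrative training set; "O" marks tokens outside any entity
TRAIN = [
    [("Alice", "PER"), ("joined", "O"), ("Acme", "ORG")],
    [("Bob", "PER"), ("left", "O"), ("Acme", "ORG")],
]

model = train(TRAIN)
result = tag(model, ["Alice", "joined", "Globex", "in", "2019"])
print(result)
```

"2019" is recovered by the rule, while the unseen organization "Globex" is still tagged "O", which is the training-set limitation the paragraph describes.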

Open-domain extraction methods

One line of work addresses how to automatically discover discriminative patterns from a small number of entity instances and then extend them to massive text in order to classify and cluster entities. It proposes expanding the entity corpus iteratively: a feature model is built from a small number of entity instances, and the model is then applied to a new data set to obtain new named entities. Another line of work proposes an open-domain clustering algorithm based on unsupervised learning: it uses the semantic features of known entities to recognize named entities in search logs, and then clusters them.
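The iterative expansion idea can be sketched as a small bootstrapping loop: learn context patterns around the known entities, apply those patterns to the corpus to find new entities, and repeat. Using the (previous word, next word) pair as the "pattern", whitespace tokenization, and the sample sentences are all simplifying assumptions made for illustration; the source does not specify the feature model.

```python
def bootstrap(sentences, seeds, rounds=5):
    """Grow an entity set from seed instances by alternating two steps."""
    entities = set(seeds)
    tokenized = [s.split() for s in sentences]
    for _ in range(rounds):
        # Step 1: collect (previous word, next word) contexts of known entities
        patterns = set()
        for tokens in tokenized:
            for i in range(1, len(tokens) - 1):
                if tokens[i] in entities:
                    patterns.add((tokens[i - 1], tokens[i + 1]))
        # Step 2: any token that appears in a learned context is a candidate entity
        candidates = set()
        for tokens in tokenized:
            for i in range(1, len(tokens) - 1):
                if (tokens[i - 1], tokens[i + 1]) in patterns:
                    candidates.add(tokens[i])
        if candidates <= entities:
            break  # nothing new was found, so stop early
        entities |= candidates
    return entities


# Illustrative corpus
SENTENCES = [
    "flights to Paris depart daily",
    "flights to Berlin depart daily",
    "trains from Berlin arrive hourly",
    "trains from Madrid arrive hourly",
    "people like to read books",
]

print(sorted(bootstrap(SENTENCES, {"Paris"})))
```

Starting from the single seed "Paris", the first round finds "Berlin" through the shared context, and "Berlin" in turn contributes a new context that reaches "Madrid". A practical system would also score patterns and candidates to limit the drift that unfiltered expansion tends to cause.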

Origin blog.csdn.net/qq_39304851/article/details/103870022