Machine Learning Project (2): AI-Assisted Information Extraction (Part 1)

Information extraction basics

Knowledge graphs: concepts, applications, and construction

What is a knowledge graph

In a knowledge graph, entities (Entity) are usually represented as the nodes of the graph, and relations (Relation) as its edges. Each fact is stored as a triple, for example (Audi, German, brand).
The knowledge graph was popularized by Google, which built it mainly to improve its search engine.
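The node/edge view maps naturally onto a small in-memory structure. Below is a minimal Python sketch, not tied to any graph database: the class name `KnowledgeGraph` is hypothetical, and the example facts borrow relation labels from the diabetes schema later in this post purely for illustration.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Minimal triple store: entities are nodes, each relation is a labeled directed edge."""

    def __init__(self):
        self.nodes = set()
        # head entity -> list of (relation, tail entity) outgoing edges
        self.edges = defaultdict(list)

    def add_triple(self, head, relation, tail):
        """Record the fact (head, relation, tail) as an edge head --relation--> tail."""
        self.nodes.update((head, tail))
        self.edges[head].append((relation, tail))

    def triples(self):
        """Yield every stored fact back in (head, relation, tail) form."""
        for head, out_edges in self.edges.items():
            for relation, tail in out_edges:
                yield head, relation, tail

    def outgoing(self, head):
        """Edges leaving `head`; an unknown entity simply has none (nothing is inserted)."""
        return list(self.edges.get(head, []))


kg = KnowledgeGraph()
kg.add_triple("metformin", "Drug_Disease", "type 2 diabetes")
kg.add_triple("HbA1c", "Test_Disease", "type 2 diabetes")
print(sorted(kg.nodes))          # ['HbA1c', 'metformin', 'type 2 diabetes']
print(kg.outgoing("metformin"))  # [('Drug_Disease', 'type 2 diabetes')]
```

Both facts point at the same "type 2 diabetes" node, which is what lets a graph connect information coming from different sentences or documents.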

What is information extraction
Structured and semi-structured data can be processed with relatively simple means, such as defining extraction tables or wrappers.
For unstructured plain-text data, the required information has to be extracted automatically with natural language processing techniques. This process is commonly referred to as Information Extraction.
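To contrast the two cases, here is a minimal sketch of the wrapper approach for semi-structured data, assuming a hypothetical infobox-style record in which every line reads `Field: value`. The function `wrapper_extract` and the record format are illustrative assumptions; a real wrapper is written against one specific page or table layout, and free text without such a layout is exactly where NLP-based extraction is needed instead.

```python
import re

# Hypothetical wrapper for an infobox-like record where each line reads "Field: value".
# A real wrapper is tied to one specific page or table layout.
FIELD_LINE = re.compile(r"^\s*([^:\n]+?)\s*:\s*(.+?)\s*$", re.MULTILINE)


def wrapper_extract(subject, record):
    """Turn each 'Field: value' line of a semi-structured record into a triple."""
    return [(subject, field, value) for field, value in FIELD_LINE.findall(record)]


record = """
Class: biguanide
Route: oral
"""
for triple in wrapper_extract("metformin", record):
    print(triple)
# ('metformin', 'Class', 'biguanide')
# ('metformin', 'Route', 'oral')
```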

The nature of natural language understanding

Natural language understanding is essentially structured prediction.

Many natural language understanding tasks, including but not limited to Chinese word segmentation, part-of-speech (POS) tagging, named entity recognition, coreference resolution, syntactic parsing, and semantic role labeling, amount to predicting the specific semantic structure that lies behind a text sequence.
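One common way to cast a task such as named entity recognition as structured prediction is sequence labeling with BIO tags: each token gets B-X (begins an entity of type X), I-X (inside it) or O (outside any entity). The sketch below assumes this BIO convention and a hypothetical tokenized sentence; it only shows the conversion between entity spans and tag sequences, not a model.

```python
def spans_to_bio(num_tokens, spans):
    """Encode (start, end, label) entity spans, end exclusive, as one BIO tag per token."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags):
    """Decode a BIO tag sequence back into (start, end, label) spans."""
    spans, start, label = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes any open span
        if tag.startswith("I-") and start is not None and tag[2:] == label:
            continue  # the current span keeps growing
        if start is not None:
            spans.append((start, i, label))
            start, label = None, None
        if tag != "O":
            # "B-X", or a stray "I-X" that cannot continue the open span, starts a new span
            start, label = i, tag[2:]
    return spans


tokens = ["Metformin", "lowers", "blood", "glucose", "in", "type", "2", "diabetes"]
gold = [(0, 1, "Drug"), (2, 4, "Test"), (5, 8, "Disease")]
tags = spans_to_bio(len(tokens), gold)
# tags == ['B-Drug', 'O', 'B-Test', 'I-Test', 'O', 'B-Disease', 'I-Disease', 'I-Disease']
assert bio_to_spans(tags) == gold  # lossless for non-overlapping spans
```

A sequence model is then trained to predict the whole tag sequence; constraints such as "I-X may only follow B-X or I-X" are one reason the output is treated as a structure rather than as independent per-token labels.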
The main tasks of information extraction
Named entity recognition (NER)
Relation extraction
Entity resolution (unifying different mentions of the same entity)
Coreference resolution (anaphora resolution)
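As a deliberately simple illustration of entity resolution, the sketch below groups mentions by a normalized surface form plus a hand-built synonym table. This rule-based approach is an assumption for illustration, not the method used in this project; `resolve_entities` and the alias table are hypothetical, and real systems also rely on context and learned similarity.

```python
import unicodedata


def normalize(mention):
    """Canonical surface form: NFKC folds full-width characters, then trim and lowercase."""
    return unicodedata.normalize("NFKC", mention).strip().lower()


def resolve_entities(mentions, aliases):
    """Group mentions that refer to the same entity.

    `aliases` maps a normalized surface form to a canonical name (a hypothetical,
    hand-built synonym table); forms missing from it are their own canonical name.
    """
    clusters = {}
    for mention in mentions:
        key = normalize(mention)
        canonical = aliases.get(key, key)
        clusters.setdefault(canonical, []).append(mention)
    return clusters


mentions = ["Metformin", "metformin ", "Glucophage", "ＨｂＡ１ｃ", "HbA1c"]
aliases = {"glucophage": "metformin"}
print(resolve_entities(mentions, aliases))
```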

Building a knowledge graph

The focus of building a knowledge graph system is not algorithm development. The most important things are to understand the core of the business and to design your own knowledge schema. The main steps are:
1. Define the specific business problem
2. Data collection & data preprocessing
3. Knowledge graph design
4. Store the data in the knowledge graph
5. Develop upper-layer applications and evaluate the system

Building a diabetes knowledge graph

Using diabetes-related textbooks and research papers, we mine the diabetes literature and build a diabetes knowledge graph.
1. Annotate diabetes entities based on clinical guidelines and research papers
2. Annotate relations between diabetes entities based on clinical guidelines and research papers

Entity types

Disease-related

1. Disease name (Disease)
2. Etiology (Reason)
3. Clinical manifestation (Symptom)
4. Examination method (Test)
5. Test indicator value (test_value)

Treatment-related

6. Drug name (Drug)
7. Dosing frequency (Frequency)
8. Dose (Amount)
9. Route of administration (Method)
10. Non-drug treatment (Treatment)
11. Surgery (Operation)
12. Adverse reaction (SideEff)

General entities

13. Anatomical site (Anatomy)
14. Degree (Level)
15. Duration (Duration)
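The entity types above can be written down as a small schema so that annotation labels can be checked mechanically. A minimal sketch: the 15 labels are transcribed from the list (reading the etiology label as `Reason`), while the group keys and helper functions are hypothetical names chosen for illustration.

```python
# Entity labels transcribed from the list above, grouped as in the post.
ENTITY_SCHEMA = {
    "disease": ["Disease", "Reason", "Symptom", "Test", "test_value"],
    "treatment": ["Drug", "Frequency", "Amount", "Method", "Treatment", "Operation", "SideEff"],
    "general": ["Anatomy", "Level", "Duration"],
}

# Flat set of all valid labels, e.g. for checking annotation files.
ENTITY_TYPES = frozenset(t for group in ENTITY_SCHEMA.values() for t in group)


def category_of(entity_type):
    """Return the schema group ('disease', 'treatment' or 'general') of a label."""
    for category, types in ENTITY_SCHEMA.items():
        if entity_type in types:
            return category
    raise ValueError(f"unknown entity type: {entity_type!r}")


def unknown_labels(labels):
    """Labels used in annotations that the schema does not define (typically typos)."""
    return sorted(set(labels) - ENTITY_TYPES)


print(category_of("SideEff"))               # treatment
print(unknown_labels(["Drug", "Symtom"]))   # ['Symtom']
```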

Relation types

Disease-related

1. Examination method -> disease (Test_Disease)
2. Clinical manifestation -> disease (Symptom_Disease)
3. Non-drug treatment -> disease (Treatment_Disease)
4. Drug name -> disease (Drug_Disease)
5. Anatomical site -> disease (Anatomy_Disease)

Drug-related

6. Dosing frequency -> drug name (Frequency_Drug)
7. Duration -> drug name (Duration_Drug)
8. Dose -> drug name (Amount_Drug)
9. Route of administration -> drug name (Method_Drug)
10. Adverse reaction -> drug name (SideEff_Drug)
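Every relation above links an ordered pair of entity types (head -> tail), and each label reads `Head_Tail`. One way to use this, sketched below under that reading, is to prune relation-extraction candidates: only entity pairs whose types match some relation signature need to be scored by a classifier. `candidate_relations` and `candidate_pairs` are hypothetical helpers, not part of brat or any library.

```python
# Relation label -> (head entity type, tail entity type), transcribed from the list above.
RELATION_SCHEMA = {
    "Test_Disease": ("Test", "Disease"),
    "Symptom_Disease": ("Symptom", "Disease"),
    "Treatment_Disease": ("Treatment", "Disease"),
    "Drug_Disease": ("Drug", "Disease"),
    "Anatomy_Disease": ("Anatomy", "Disease"),
    "Frequency_Drug": ("Frequency", "Drug"),
    "Duration_Drug": ("Duration", "Drug"),
    "Amount_Drug": ("Amount", "Drug"),
    "Method_Drug": ("Method", "Drug"),
    "SideEff_Drug": ("SideEff", "Drug"),
}


def candidate_relations(head_type, tail_type):
    """Relation labels allowed from an entity of head_type to one of tail_type."""
    return [name for name, sig in RELATION_SCHEMA.items() if sig == (head_type, tail_type)]


def candidate_pairs(entities):
    """Ordered (head_id, tail_id, allowed_relations) triples worth classifying.

    `entities` is a list of (entity_id, entity_type); pairs whose types admit
    no relation are dropped.
    """
    return [
        (h_id, t_id, candidate_relations(h_type, t_type))
        for h_id, h_type in entities
        for t_id, t_type in entities
        if h_id != t_id and candidate_relations(h_type, t_type)
    ]


entities = [("T1", "Drug"), ("T2", "Disease"), ("T3", "Amount")]
print(candidate_pairs(entities))
# [('T1', 'T2', ['Drug_Disease']), ('T3', 'T1', ['Amount_Drug'])]
```

An explicit table is used instead of splitting the label on `_`: the current labels happen to split cleanly, but an entity type that itself contains an underscore, such as `test_value`, would not.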

Annotation tool: brat

Annotation is done with the brat software (http://brat.nlplab.org/). Each .txt file holds the original document, and the matching .ann file holds the annotations. Entity lines begin with T followed by the entity ID, then the entity type together with the entity's start and end positions in the document, and finally the entity text. To view the annotation results in brat, you also need to add a .conf file.
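The .ann layout described above is brat's standoff format. The sketch below parses its entity (T) and relation (R) lines into Python objects; relation lines look like `R1<TAB>Drug_Disease Arg1:T1 Arg2:T2`. The dataclasses and function names are hypothetical, other annotation kinds are skipped, and the sample document is made up. Offsets are character offsets with an exclusive end, so `text[start:end]` should reproduce the entity text, which `offset_mismatches` checks.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    id: str
    type: str
    spans: list  # [(start, end), ...] character offsets; end is exclusive
    text: str


@dataclass
class Relation:
    id: str
    type: str
    arg1: str  # entity ID of the head argument
    arg2: str  # entity ID of the tail argument


def parse_ann(content):
    """Parse entity (T) and relation (R) lines of a brat standoff .ann file."""
    entities, relations = {}, {}
    for line in content.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            # e.g. "T1<TAB>Drug 0 9<TAB>Metformin"; discontinuous spans look like "0 3;5 9"
            ent_type, _, offsets = fields[1].partition(" ")
            spans = [tuple(int(x) for x in frag.split()) for frag in offsets.split(";")]
            text = fields[2] if len(fields) > 2 else ""
            entities[ann_id] = Entity(ann_id, ent_type, spans, text)
        elif ann_id.startswith("R"):
            # e.g. "R1<TAB>Drug_Disease Arg1:T1 Arg2:T2"
            rel_type, arg1, arg2 = fields[1].split()[:3]
            relations[ann_id] = Relation(ann_id, rel_type, arg1.split(":", 1)[1], arg2.split(":", 1)[1])
        # Other annotation kinds (events, attributes, notes, ...) are ignored in this sketch.
    return entities, relations


def offset_mismatches(doc_text, entities):
    """IDs of single-span entities whose offsets do not reproduce their text in the .txt file."""
    return [
        e.id for e in entities.values()
        if len(e.spans) == 1 and doc_text[e.spans[0][0]:e.spans[0][1]] != e.text
    ]


doc = "Metformin treats diabetes."
ann = "T1\tDrug 0 9\tMetformin\nT2\tDisease 17 25\tdiabetes\nR1\tDrug_Disease Arg1:T1 Arg2:T2\n"
entities, relations = parse_ann(ann)
print(relations["R1"])                   # Relation(id='R1', type='Drug_Disease', arg1='T1', arg2='T2')
print(offset_mismatches(doc, entities))  # []
```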

Source: blog.csdn.net/qq_33357094/article/details/104754121