Knowledge Graph Getting Started Study Notes (6): Relation Extraction

Table of Contents

0 Preface

1 Semantic relations

1.1 Syntactic relations

1.1.1 Substitution relationship

1.1.2 Co-occurrence relationship

2 Uses of relation extraction

2.1 Features in relation extraction

2.1.1 Methods of learning semantic relations

2.1.2 Features

2.1.3 Basic entity features

2.1.4 Relation features

3 Relation extraction datasets

3.1 Annotated data for semantic relation learning

3.2 Template-based entity relation extraction

3.2.1 Template-based approach

4 Supervised entity relation extraction

4.1 Feature-vector-based methods

4.2 Kernel-based classification

5 Weakly supervised entity relation extraction

5.1 Bootstrapping

5.2 Label propagation

5.3 Co-training

6 Distantly supervised entity relation extraction

6.1 Distant supervision

7 Unsupervised entity relation extraction

8 Relation extraction based on deep learning

Word embeddings

8.1 Word embeddings: the neural network NNLM model

Classic paper: Efficient Estimation of Word Representations in Vector Space

8.1.1 CBOW

8.1.2 Skip-gram model

Distributed Representations of Words and Phrases and their Compositionality

Linguistic Regularities in Continuous Space Word Representations [Mikolov et al., 2013]

Semantic Compositionality through Recursive Matrix-Vector Spaces

Relation Classification via Convolutional Deep Neural Network [Zeng et al., 2014]

Classifying Relations by Ranking with Convolutional Neural Networks


0 Preface

  • Introduction to relation extraction
  • Semantic relations
  • Features in relation extraction
  • Relation extraction datasets
  • Template-based relation extraction
  • Supervised entity relation extraction
  • Weakly supervised entity relation extraction
  • Distantly supervised entity relation extraction
  • Unsupervised entity relation extraction
  • Relation extraction based on deep learning

1 Semantic relations

Semantic relations are relationships established by the semantic categories of the words hidden behind the syntactic structure (in effect, what we usually think of as a grammatical matter). The detailed inventory of semantic relations is best viewed alongside the original slides, and the topic has its own development history, so it is not covered here. These notes focus on relation extraction with machine learning.
 

1.1 Syntactic relations

  • Positional relationship (relation of position)
  • Substitution relationship (relation of substitutability)
  • Co-occurrence relationship (relation of co-occurrence)

1.1.1 Substitution relationship

1.1.2 Co-occurrence relationship

2 Uses of relation extraction

  • Building knowledge bases, text analysis, and other NLP applications
  • Information extraction
  • Information retrieval
  • Automatic summarization
  • Machine translation
  • Question answering
  • Paraphrasing
  • Textual entailment reasoning
  • Thesaurus construction and semantic networks
  • Word sense disambiguation
  • Language modeling

2.1 Features in relation extraction

2.1.1 Methods of learning semantic relations

Supervised learning

  • Advantages: very good performance
  • Disadvantages: needs a lot of labeled data and feature engineering

Unsupervised learning

  • Advantages: scalable, suitable for open information extraction
  • Disadvantages: weaker performance

2.1.2 Features

Purpose: map the data into entity features and relation features

2.1.3 Basic entity features

The basic entity features include the string value of each candidate argument and the individual tokens that make up the arguments, possibly after lemmatization or stemming,
for example:
string value \ individual tokens \ lemmas or stems

  • Advantages: in most cases, such features are informative enough to characterize the relation well
  • Disadvantages: such features tend to be sparse
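
As a minimal sketch of such features (assuming NLTK is installed and its WordNet data downloaded; the function name and feature-key format are invented for illustration):

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

# nltk.download("wordnet")  # needed once for the lemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

def basic_entity_features(argument: str) -> dict:
    """Build sparse string/token/lemma/stem features for one candidate argument."""
    tokens = argument.lower().split()
    features = {f"string={argument.lower()}": 1}
    for tok in tokens:
        features[f"token={tok}"] = 1
        features[f"lemma={lemmatizer.lemmatize(tok)}"] = 1
        features[f"stem={stemmer.stem(tok)}"] = 1
    return features

print(basic_entity_features("the New York Times"))
```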

Background entity features

  • Syntactic information, e.g., grammatical roles
  • Semantic information, e.g., semantic categories

Task-specific type inventories may be used (e.g., ACE entity types), as may general lexical resources (e.g., WordNet).

  • Advantages: alleviate the data-sparsity problem
  • Disadvantages: require human effort

For distributional (corpus-derived) features:

  • Advantages: a word's meaning can be captured by aggregating its interactions with all other words in a large text collection
  • Disadvantages: different senses of a word get mixed together
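
For instance, a hedged sketch of pulling a coarse semantic category for a word out of WordNet via NLTK (assuming the wordnet corpus is available; the helper name is invented):

```python
from nltk.corpus import wordnet as wn

# nltk.download("wordnet")  # needed once

def semantic_category(word: str):
    """Return a coarse semantic category: the hypernym just below the
    root ("entity") for the word's first noun sense."""
    synsets = wn.synsets(word, pos=wn.NOUN)
    if not synsets:
        return None
    paths = synsets[0].hypernym_paths()  # root-to-synset hypernym chains
    return paths[0][1].name() if paths and len(paths[0]) > 1 else None

print(semantic_category("dog"))  # e.g., physical_entity.n.01
```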

2.1.4 Relation features

  • Directly characterize the relation, for example by modeling the context of the entities:
  • Words between the two arguments
  • Words in a specific window around, or to either side of, the arguments
  • The dependency path linking the arguments
  • The complete dependency graph
  • The minimal subtree covering both arguments
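
A minimal sketch of the between-words and window-words features, assuming the sentence is already tokenized and the argument token offsets are known (function name and feature keys invented):

```python
def relation_context_features(tokens, arg1_span, arg2_span, window=2):
    """Extract between-words and window-words features for one entity pair.

    arg1_span/arg2_span are (start, end) token offsets, end exclusive;
    arg1 is assumed to precede arg2 in the sentence.
    """
    between = tokens[arg1_span[1]:arg2_span[0]]
    left = tokens[max(0, arg1_span[0] - window):arg1_span[0]]
    right = tokens[arg2_span[1]:arg2_span[1] + window]
    feats = {f"between={w}": 1 for w in between}
    feats.update({f"left={w}": 1 for w in left})
    feats.update({f"right={w}": 1 for w in right})
    return feats

tokens = "Steve Jobs founded Apple in 1976".split()
print(relation_context_features(tokens, (0, 2), (3, 4)))
```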

Background relational features

  • Encode knowledge about how the entities typically interact, not just the current context
  • Represent the relation through paraphrases
  • Placeholder patterns
  • Find similar contexts through clustering

3 Relation extraction datasets

3.1 Annotated data for semantic relation learning

At EMNLP 2018, the Natural Language Processing Laboratory of Tsinghua University, led by Professor Maosong Sun, released FewRel, a large-scale finely-annotated relation extraction dataset. Using Wikipedia as the corpus and Wikidata as the knowledge graph, it contains 100 relation types and 70,000 instances, surpassing all previous datasets with comparable annotation.
Significance:
Beyond classic supervised/distantly supervised relation extraction tasks, it also has great exploratory value and broad application prospects for the emerging few-shot learning setting.
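
A rough sketch of inspecting the dataset. The file name and field layout below are assumptions based on the published FewRel JSON release (check the official repository at github.com/thunlp/FewRel):

```python
import json
from collections import Counter

# Assumed layout: {relation_id: [{"tokens": [...], "h": [...], "t": [...]}, ...]}
with open("train_wiki.json", encoding="utf-8") as f:  # hypothetical file name
    data = json.load(f)

print(len(data), "relation types")
print(Counter(len(insts) for insts in data.values()))  # instances per relation
```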

3.2 Template-based entity relation extraction

3.2.1 Template-based approach

  • Use patterns (rules) to mine relations, based on trigger words/strings, etc.
  • Patterns based on dependency syntax
  • Relation mining models

Hearst's list of patterns
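
Hearst patterns mine hyponymy ("is-a") pairs with lexical templates such as "NP such as NP[, NP]* [and|or] NP". A toy regex sketch over plain text, restricted to single-word noun phrases (real implementations match over NP chunks; the pattern and function names are invented):

```python
import re

# "Y such as X1, X2 and X3" => (Xi, is-a, Y); a lexical approximation only
PATTERN = re.compile(r"(\w+) such as ((?:\w+(?:, | and | or )?)+)")

def hearst_such_as(text):
    pairs = []
    for m in PATTERN.finditer(text):
        hypernym = m.group(1)
        for hyponym in re.split(r", | and | or ", m.group(2)):
            pairs.append((hyponym.strip(), "is-a", hypernym))
    return pairs

print(hearst_such_as("He studies languages such as French, Spanish and Italian."))
```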

Dependency-syntax-based patterns

Advantages:

  • Hand-written rules have high precision
  • Can be tailored to specific domains
  • Easy to implement on small datasets; simple to construct

Disadvantages:

  • Low recall
  • Domain-specific templates must be built by experts, who have to think through every possible pattern
  • Difficult and time-consuming
  • A pattern must be defined for every relation
  • Hard to maintain
  • Poor portability
     

4 Supervised entity relation extraction

  • Feature-vector-based methods --- extract a set of features from contextual information, part of speech, syntax, etc.
  • Kernel-based methods --- relation features may have a complex structure
  • Sequence labeling methods --- the spans of the arguments in a relation are variable
     

4.1 Feature-vector-based methods
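
The notes do not reproduce the slides here; as a stand-in, a minimal sketch of the idea with scikit-learn: feature dicts like the ones above are vectorized and fed to a linear classifier (the training pairs and labels are invented):

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled instances: one feature dict per entity pair, one relation label
X = [
    {"between=founded": 1, "left=CEO": 1},
    {"between=born": 1, "between=in": 1},
    {"between=acquired": 1},
    {"between=lives": 1, "between=in": 1},
]
y = ["founder_of", "born_in", "acquired", "lives_in"]

clf = make_pipeline(DictVectorizer(sparse=True), LogisticRegression(max_iter=1000))
clf.fit(X, y)
print(clf.predict([{"between=founded": 1}]))
```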


4.2 Kernel-based classification
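
Kernel methods compare two relation instances directly (e.g., by tree or subsequence similarity) instead of building explicit feature vectors. A toy sketch with a precomputed kernel in scikit-learn, using word overlap between contexts as a stand-in for a real tree/string kernel (all data invented):

```python
import numpy as np
from sklearn.svm import SVC

contexts = ["founded the company", "was born in", "acquired the startup",
            "founded the firm"]
labels = ["founder_of", "born_in", "acquired", "founder_of"]

def overlap_kernel(a: str, b: str) -> float:
    """Word-overlap similarity; a real system would use a tree/string kernel."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / max(len(sa | sb), 1)

K = np.array([[overlap_kernel(a, b) for b in contexts] for a in contexts])
svm = SVC(kernel="precomputed").fit(K, labels)

test = ["founded the lab"]
K_test = np.array([[overlap_kernel(t, b) for b in contexts] for t in test])
print(svm.predict(K_test))
```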

5 Weakly supervised entity relation extraction

5.1 Bootstrapping

Semantic drift example
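
Bootstrapping starts from a few seed instances (or patterns) and alternates between inducing extraction patterns from the known instances and extracting new instances with those patterns; the semantic drift illustrated above is its characteristic failure mode, as extraction errors compound across iterations. A schematic sketch of one such loop over a toy corpus (all data invented; real systems such as DIPRE or Snowball add pattern scoring to limit drift):

```python
corpus = [
    "Paris is the capital of France",
    "Tokyo is the capital of Japan",
    "Berlin , the capital of Germany",
]
seeds = {("Paris", "France")}
patterns, instances = set(), set(seeds)

for _ in range(2):  # a couple of bootstrapping iterations
    # 1) induce patterns from known instances
    for sent in corpus:
        words = sent.split()
        for e1, e2 in instances:
            if e1 in words and e2 in words:
                i, j = words.index(e1), words.index(e2)
                patterns.add(" ".join(words[i + 1:j]))
    # 2) match patterns to extract new instances
    for sent in corpus:
        for pat in patterns:
            left, sep, right = sent.partition(f" {pat} ")
            if sep and left and right:
                instances.add((left.split()[-1], right.split()[0]))

print(instances)  # Berlin is missed: its sentence uses a different pattern
```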

5.2 Label Propagation

Method: build a graph whose nodes are the labeled and unlabeled instances and whose edge weights reflect instance similarity, then propagate the relation labels from the labeled nodes across the graph.
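
A hedged sketch with scikit-learn's LabelPropagation: unlabeled instances are marked with -1, and the 2-D feature vectors below are invented stand-ins for real relation features:

```python
import numpy as np
from sklearn.semi_supervised import LabelPropagation

# Toy feature vectors for relation instances; -1 marks unlabeled ones
X = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [0.15, 0.95]])
y = np.array([0, -1, 1, -1, -1])  # 0 and 1 are illustrative relation labels

model = LabelPropagation(kernel="rbf", gamma=20).fit(X, y)
print(model.transduction_)  # labels inferred for all instances
```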

5.3 Co-training

Basic process (a sketch follows the list):

  • Choose two different classifiers
  • Train them on two training sets with independent feature views, and have each classify the unlabeled set
  • Add each classifier's high-confidence predictions to the other classifier's training set
  • Iterate in this way, stopping when accuracy reaches a threshold
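
A minimal sketch of that loop with two scikit-learn classifiers over two independent feature views (all arrays invented; a real system would also check a stopping threshold on accuracy):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Two independent feature views of the same 100 instances
X1, X2 = rng.normal(size=(100, 5)), rng.normal(size=(100, 5))
y = np.full(100, -1)           # -1 marks unlabeled instances
y[:5], y[5:10] = 0, 1          # only 10 labeled instances to start

clf1, clf2 = LogisticRegression(), GaussianNB()
for _ in range(5):  # a few co-training rounds
    labeled = y != -1
    clf1.fit(X1[labeled], y[labeled])
    clf2.fit(X2[labeled], y[labeled])
    for clf, X in ((clf1, X1), (clf2, X2)):
        proba = clf.predict_proba(X)
        confident = (proba.max(axis=1) > 0.9) & (y == -1)
        # each classifier labels instances for the other's training set
        y[confident] = clf.classes_[proba[confident].argmax(axis=1)]

print((y != -1).sum(), "instances labeled after co-training")
```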
     

6 Distantly supervised entity relation extraction

6.1 Distant supervision
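
The notes leave this section empty; the core assumption (Mintz et al., 2009) is that if two entities participate in a relation in a knowledge base, then any sentence mentioning both entities is labeled as expressing that relation. A toy sketch of that noisy alignment step (all data invented):

```python
# Knowledge-base triples: (head, relation, tail)
kb = {("Steve Jobs", "founder_of", "Apple"),
      ("Bill Gates", "founder_of", "Microsoft")}

corpus = [
    "Steve Jobs started Apple in a garage",
    "Bill Gates met Steve Jobs in 1984",
]

# Distant supervision: any sentence containing both entities of a KB
# triple is (noisily) labeled with that triple's relation.
training_data = []
for sent in corpus:
    for head, rel, tail in kb:
        if head in sent and tail in sent:
            training_data.append((sent, head, tail, rel))

print(training_data)
```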

7 Unsupervised entity relation extraction

8 Relation extraction based on deep learning

Word embeddings

What are they? A mapping from each word to a real-valued, low-dimensional vector. How are they obtained?

  • Neural networks (e.g., CBOW, Skip-gram) [Mikolov et al., 2013a]
  • Dimensionality reduction (e.g., LSA, LDA, PCA)
  • Explicit representations (words in the context)

Why should we pay attention to them?
     Word embeddings are important for many NLP tasks, including relation extraction.

8.1 Word embeddings: the neural network NNLM model

Classic paper: Efficient Estimation of Word Representations in Vector Space

This article is the first word2vec paper; it proposes the two models CBOW and Skip-gram.

8.1.1 CBOW

Framework:

CBOW uses the words before and after a target word, i.e., its context, to predict the target word.

8.1.2 Skip-gram model

It is the reverse of CBOW: it takes a word as input and predicts the words in its context. As the paper notes, enlarging the context window can improve the quality of the word vectors.
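
As a practical sketch, both models are available in gensim's Word2Vec (parameter names assume gensim 4.x; sg=0 selects CBOW and sg=1 selects Skip-gram; the toy corpus is invented):

```python
from gensim.models import Word2Vec

sentences = [
    ["knowledge", "graphs", "store", "entities", "and", "relations"],
    ["relation", "extraction", "finds", "relations", "between", "entities"],
]

cbow = Word2Vec(sentences, vector_size=50, window=5, sg=0, min_count=1)
skipgram = Word2Vec(sentences, vector_size=50, window=5, sg=1, min_count=1)

print(cbow.wv["entities"].shape)             # (50,)
print(skipgram.wv.most_similar("entities"))  # nearest neighbours (toy corpus)
```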

Distributed Representations of Words and Phrases and their Compositionality

This article is the second word2vec paper; it proposes several tricks for improving Skip-gram (e.g., negative sampling and subsampling of frequent words) and extends the model from words to phrases.


Projecting Skip-gram embeddings with PCA shows that word embeddings have a linear structure, which allows analogies to be solved with vector arithmetic. Because of the training objective, the input and output (before the softmax) are linearly related.
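
The classic demonstration of this linear structure, assuming a sufficiently trained gensim model named `model` (hypothetical here; the toy corpus above is too small for this to work):

```python
# vec(king) - vec(man) + vec(woman) is expected to be close to vec(queen)
result = model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # e.g., [("queen", 0.7...)] with a well-trained model
```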

Linguistic Regularities in Continuous Space Word Representations [Mikolov et al., 2013]

This article mainly shows that word vectors support basic algebraic operations (analogies) and applies this property to the SemEval 2012 task of measuring relational similarity. The word vectors in this paper are learned with an RNN language model (RNNLM), whose hidden state is updated as:

s(t) = f(U w(t) + W s(t-1))

where w(t) is the input word vector at time t, s(t) is the hidden state, U and W are weight matrices, and f is a nonlinearity (a sigmoid in the RNNLM).
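
A numpy sketch of that recurrence (all dimensions invented; the word input here would be a one-hot or learned embedding vector in the real model):

```python
import numpy as np

rng = np.random.default_rng(0)
d_word, d_hidden = 8, 16
U = rng.normal(size=(d_hidden, d_word))    # input-to-hidden weights
W = rng.normal(size=(d_hidden, d_hidden))  # hidden-to-hidden weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rnn_step(w_t, s_prev):
    """One RNNLM hidden-state update: s(t) = f(U w(t) + W s(t-1))."""
    return sigmoid(U @ w_t + W @ s_prev)

s = np.zeros(d_hidden)
for _ in range(3):                 # feed three (random) word vectors
    s = rnn_step(rng.normal(size=d_word), s)
print(s.shape)  # (16,)
```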


Semantic Compositionality through Recursive Matrix-Vector Spaces

MV-RNN: Matrix-Vector RNN (recursive neural network)
 

The model in this article assigns a vector and a matrix to each node in the parse tree:

  • The vector captures the inherent meaning of the constituent
  • The matrix captures how it changes the meaning of neighboring words or phrases
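
A numpy sketch of the paper's composition step: a parent (p, P) is built from children (a, A) and (b, B) as p = g(W·[Ba; Ab]) and P = W_M·[A; B] (dimensions invented; the weights would be learned in practice):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # dimensionality of node vectors; node matrices are n x n

W = rng.normal(size=(n, 2 * n))    # combines the two transformed vectors
W_M = rng.normal(size=(n, 2 * n))  # combines the two child matrices

def compose(a, A, b, B):
    """MV-RNN composition: children (a, A) and (b, B) -> parent (p, P)."""
    p = np.tanh(W @ np.concatenate([B @ a, A @ b]))
    P = W_M @ np.vstack([A, B])    # (n, 2n) @ (2n, n) -> (n, n)
    return p, P

a, b = rng.normal(size=n), rng.normal(size=n)
A, B = rng.normal(size=(n, n)), rng.normal(size=(n, n))
p, P = compose(a, A, b, B)
print(p.shape, P.shape)  # (4,) (4, 4)
```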

Experimental results (using the dataset and evaluation of SemEval-2010 Task 8)

Relation Classification via Convolutional Deep Neural Network [Zeng et al., 2014]
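
This paper classifies relations with a CNN over word embeddings plus position embeddings (each token's distance to the two entity mentions), followed by max-pooling and a softmax output. A compact PyTorch sketch of that sentence-level part (hyperparameters invented; 19 classes matches SemEval-2010 Task 8):

```python
import torch
import torch.nn as nn

class CNNRelationClassifier(nn.Module):
    """Sentence-level features in the style of Zeng et al. 2014:
    word + position embeddings -> convolution -> max-pooling -> logits."""
    def __init__(self, vocab=1000, d_word=50, d_pos=5, max_dist=60,
                 filters=100, n_relations=19):
        super().__init__()
        self.word_emb = nn.Embedding(vocab, d_word)
        self.pos_emb1 = nn.Embedding(2 * max_dist + 1, d_pos)  # dist to entity 1
        self.pos_emb2 = nn.Embedding(2 * max_dist + 1, d_pos)  # dist to entity 2
        self.conv = nn.Conv1d(d_word + 2 * d_pos, filters, kernel_size=3, padding=1)
        self.out = nn.Linear(filters, n_relations)

    def forward(self, words, dist1, dist2):
        x = torch.cat([self.word_emb(words),
                       self.pos_emb1(dist1),
                       self.pos_emb2(dist2)], dim=-1)     # (B, T, d)
        x = torch.tanh(self.conv(x.transpose(1, 2)))      # (B, filters, T)
        x = x.max(dim=2).values                           # max-pool over time
        return self.out(x)                                # relation logits

model = CNNRelationClassifier()
words = torch.randint(0, 1000, (2, 20))   # batch of 2 sentences, 20 tokens
dist = torch.randint(0, 121, (2, 20))     # clipped relative distances
print(model(words, dist, dist).shape)     # torch.Size([2, 19])
```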

Experimental results

Classifying Relations by Ranking with Convolutional Neural Networks

Source: blog.csdn.net/qq_37457202/article/details/108478104