Table of Contents
0 Preface
1 Semantic relationship
1.1 Syntactic relations
1.1.1 Substitution relationship
1.1.3 Relations of co-occurrence
2 Usefulness of relation extraction
2.1 Features in relation extraction
2.1.1 Methods of learning semantic relations
2.1.2 Features
2.1.3 Basic entity features
2.1.4 Relation features
3 Relation extraction data sets
3.1 Annotated data for semantic relation learning
3.2 Template-based entity relation extraction
4 Supervised entity relation extraction
4.1 Feature-vector-based method
4.2 Kernel-based classification
5 Weakly supervised entity relation extraction
5.1 Bootstrapping
5.2 Label propagation
5.3 Co-training method
6 Distantly supervised entity relation extraction
6.1 Distant supervision
7 Unsupervised entity relation extraction
8 Relation extraction based on deep learning
8.1 Word embedding: neural-network language models (NNLM)
Classic papers:
Efficient Estimation of Word Representations in Vector Space
Distributed Representations of Words and Phrases and their Compositionality
Linguistic Regularities in Continuous Space Word Representations [Mikolov & al. 2013]
Semantic Compositionality through Recursive Matrix-Vector Spaces
Relation Classification via Convolutional Deep Neural Network [Zeng & al. 2014]
Classifying Relations by Ranking with Convolutional Neural Networks
0 Preface:
- Introduction to relation extraction
- Semantic relationship
- Features in relation extraction
- Relation extraction data set
- Template-based relation extraction
- Supervised entity relation extraction
- Weakly supervised entity relation extraction
- Distantly supervised entity relation extraction
- Unsupervised entity relation extraction
- Relation extraction based on deep learning
1 Semantic relationship:
A semantic relationship is a relationship established by the semantic categories of words, hidden behind the syntactic structure; the various fine-grained semantic relationships are best viewed alongside the accompanying slides (many of them are what we would ordinarily consider grammatical questions). Since this topic has a long development history of its own, it is not covered here. This article focuses on relation extraction under machine learning.
1.1 Syntactic relations
Positional relationship (relation of position)
Substitution relationship (relation of substitutability)
Co-occurrence relationship (relation of co-occurrence)
1.1.1 Substitution relationship
1.1.3 Relations of co-occurrence
2 Usefulness of relation extraction
- Building knowledge bases and NLP text-analysis applications
- Information extraction
- Information retrieval
- Automatic summarization
- Machine translation
- Question answering
- Paraphrasing
- Textual entailment reasoning
- Thesaurus construction and semantic networks
- Word sense disambiguation
- Language modeling
2.1 Features in relation extraction
2.1.1 Methods of learning semantic relations
Supervised learning
- Advantages: very good performance
- Disadvantages: needs large amounts of labeled data and feature engineering
Unsupervised learning
- Advantages: scalable; suitable for open information extraction
- Disadvantages: poorer performance
2.1.2 Features
Purpose: map the data into entity features and relation features
2.1.3 Basic entity features
The basic entity features include the string value of each candidate argument and the individual words that make up that argument, which may be lemmatized or stemmed,
for example:
string value \ individual words (lemmatized or stemmed)
- Advantages: in most cases such features are informative enough to identify the relation
- Disadvantages: such features tend to be sparse
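A minimal sketch of extracting these basic entity features from one candidate argument. The crude suffix-stripping function is a toy stand-in for a real stemmer or lemmatizer, and the feature-name prefixes are invented for illustration:

```python
def basic_entity_features(mention):
    """Basic entity features for one candidate argument: the full string
    value plus each individual word, also in a normalized ('stemmed') form."""
    def crude_stem(word):
        # Toy stand-in for a real stemmer/lemmatizer.
        for suffix in ("ing", "ies", "es", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word

    words = mention.lower().split()
    features = {"string=" + mention.lower()}
    for w in words:
        features.add("word=" + w)
        features.add("stem=" + crude_stem(w))
    return features

feats = basic_entity_features("Trading Companies")
```

Each mention becomes a set of sparse string features, which is exactly why such features are informative but sparse: every distinct string introduces a new dimension.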
Background entity features
- Syntactic information, e.g. grammatical roles
- Semantic information, e.g. semantic categories
A task-specific inventory may be used, e.g. ACE entity types,
or a general lexical resource, e.g. WordNet
- Advantages: mitigate the data-sparsity problem
- Disadvantages: require manual effort
Distributional features
- Advantages: can capture a word's meaning by aggregating its interactions with all other words in a large text collection
- Disadvantages: mix different word senses together
2.1.4 Relation features
Direct relation features
- Directly characterize the relation, for example by modeling the context of the entities:
- Words between the two arguments
- Words in a fixed window on either side of the arguments
- Dependency path linking the arguments
- The complete dependency graph
- Minimal subtree covering both arguments
Background relational features
- Encode knowledge about how entities usually interact, not just the immediate context
- Relation representation through paraphrases
- Placeholder patterns
- Finding similar contexts through clustering
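The surface-level direct features above can be sketched as follows. The example sentence and argument spans are invented, and the half-open span convention is an assumption:

```python
def relation_surface_features(tokens, arg1_span, arg2_span, window=2):
    """Surface-level relation features from a tokenized sentence:
    words between the two arguments, and words in a fixed-size window
    to the left of arg1 and to the right of arg2."""
    s1, e1 = arg1_span  # [s1, e1) token indices of the first argument
    s2, e2 = arg2_span  # [s2, e2) token indices of the second argument
    feats = []
    feats += ["between=" + w for w in tokens[e1:s2]]
    feats += ["left=" + w for w in tokens[max(0, s1 - window):s1]]
    feats += ["right=" + w for w in tokens[e2:e2 + window]]
    return feats

tokens = "The acquisition of YouTube by Google was announced".split()
# arg1 = YouTube (token 3), arg2 = Google (token 5)
feats = relation_surface_features(tokens, (3, 4), (5, 6))
```

Dependency-path and subtree features would require a parser on top of this; the sketch covers only the word-based features.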
3 Relation extraction data sets
3.1 Annotated data for semantic relation learning
At EMNLP 2018, the Natural Language Processing Laboratory of Tsinghua University, led by Professor Maosong Sun, released FewRel, a large-scale, finely annotated relation extraction data set. Built with Wikipedia as the corpus and Wikidata as the knowledge graph, it contains 100 relation types and 70,000 instances, far surpassing earlier data sets with comparable annotation.
Significance:
It can be applied to classic supervised / distantly supervised relation extraction tasks, and it also has great exploratory value and broad application prospects for emerging few-shot learning tasks.
3.2 Template-based entity relation extraction
3.2.1 Template-based approach
- Use patterns (rules) to mine relations, based on trigger words, strings, etc.
- Patterns over dependency syntax
- Relation mining models
Examples: Hearst's list of patterns; dependency-syntax patterns
Advantages:
- Hand-written rules give high precision
- Can be tailored to specific domains
- Easy to implement on small data sets; simple to construct
Disadvantages:
- Low recall
- Templates for a specific domain must be built by experts, who have to consider every possible pattern
- Difficult and time-consuming
- A pattern must be defined for every relation
- Hard to maintain
- Poor portability
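A minimal sketch of one classic Hearst pattern ("NP such as NP, NP and NP", signaling hyponymy). Restricting noun phrases to single words is a deliberate simplification; real systems match parsed NPs:

```python
import re

# "X such as A, B and C" => A, B, C are kinds of X (hyponymy).
# NPs are simplified to single words here.
PATTERN = re.compile(r"(\w+) such as (\w+(?:, \w+)*(?: (?:and|or) \w+)?)")

def hearst_matches(sentence):
    """Return (hyponym, hypernym) pairs found by the pattern."""
    pairs = []
    for m in PATTERN.finditer(sentence):
        hypernym = m.group(1)
        for hyponym in re.split(r", | and | or ", m.group(2)):
            pairs.append((hyponym, hypernym))
    return pairs

pairs = hearst_matches("Vehicles such as cars, trucks and buses are taxed.")
```

The high-precision / low-recall trade-off is visible even here: the pattern rarely fires wrongly, but it misses every hyponymy relation expressed any other way.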
4 Supervised entity relation extraction
- Feature-vector-based method --- extract a series of features from contextual information, part of speech, syntax, etc.
- Kernel-based classification --- relation features may have a complex structure (e.g. trees), which kernels can compare directly
- Sequence labeling method --- the spans of the arguments in a relation are variable
4.1 Feature-vector-based method
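A minimal sketch of the feature-vector approach: each instance becomes a set of string features and a linear model scores each relation label. The averaged-perceptron-style learner and the toy features are stand-ins for the SVM/max-ent classifiers and rich feature sets used in practice:

```python
from collections import defaultdict

class PerceptronRelationClassifier:
    """Toy feature-vector relation classifier: one weight vector per
    relation label, trained with the multiclass perceptron rule."""
    def __init__(self, labels):
        self.labels = labels
        self.weights = {y: defaultdict(float) for y in labels}

    def score(self, feats, y):
        return sum(self.weights[y][f] for f in feats)

    def predict(self, feats):
        return max(self.labels, key=lambda y: self.score(feats, y))

    def train(self, data, epochs=5):
        for _ in range(epochs):
            for feats, gold in data:
                guess = self.predict(feats)
                if guess != gold:  # promote gold label, demote the guess
                    for f in feats:
                        self.weights[gold][f] += 1.0
                        self.weights[guess][f] -= 1.0

train_data = [
    ({"between=founded", "arg1=PER", "arg2=ORG"}, "founder_of"),
    ({"between=born", "between=in", "arg1=PER", "arg2=LOC"}, "born_in"),
    ({"between=created", "arg1=PER", "arg2=ORG"}, "founder_of"),
]
clf = PerceptronRelationClassifier(["founder_of", "born_in"])
clf.train(train_data)
```

Any feature extractor (context words, POS tags, entity types, dependency paths) can feed such a classifier; only the feature strings change.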
4.2 Kernel-based classification
5 Weakly supervised entity relation extraction
5.1 Bootstrapping
Semantic drift example
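The semantic-drift example can be made concrete with a toy bootstrapping loop (all triples invented): starting from one seed pair, patterns and pairs are harvested alternately, and the ambiguous pattern "is located in" drags in a non-capital pair, which is exactly semantic drift:

```python
def bootstrap(corpus, seeds, rounds=2):
    """Toy bootstrapping: from seed entity pairs, harvest the patterns
    that connect them, use those patterns to harvest new pairs, repeat.
    corpus is a list of (arg1, pattern, arg2) triples from text."""
    pairs, patterns = set(seeds), set()
    for _ in range(rounds):
        # 1) patterns that connect a known pair
        patterns |= {p for a1, p, a2 in corpus if (a1, a2) in pairs}
        # 2) new pairs matched by a known pattern
        pairs |= {(a1, a2) for a1, p, a2 in corpus if p in patterns}
    return pairs, patterns

corpus = [
    ("Paris", "is the capital of", "France"),
    ("Rome", "is the capital of", "Italy"),
    ("Berlin", "is the capital of", "Germany"),
    # ambiguous pattern that causes semantic drift:
    ("Paris", "is located in", "France"),
    ("Nice", "is located in", "France"),
]
pairs, patterns = bootstrap(corpus, {("Paris", "France")})
```

After two rounds, (Nice, France) is accepted as a "capital-of" pair even though Nice is not a capital; real systems score patterns and pairs to slow this drift.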
5.2 Label Propagation
method
5.3 Co-training method
Basic process:
- Choose two different classifiers
- Train them with independent feature views on two training sets, and apply each to the unlabeled set
- Select high-confidence instances to add to the other classifier's training set
- Iterate in this way, stopping when accuracy reaches a threshold
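The steps above can be sketched with two deliberately trivial "classifiers" that each see only one view of an instance; the views, labels, and memorize-a-lookup learners are invented for illustration:

```python
def train_view(examples):
    """Toy one-view classifier: memorize feature -> label."""
    model = {}
    for feat, label in examples:
        model[feat] = label
    return model

def cotrain(labeled, unlabeled, rounds=3):
    """Toy co-training over two independent views (view1, view2):
    a confident prediction from one view labels data for the other."""
    set1 = [(v1, y) for (v1, v2), y in labeled]
    set2 = [(v2, y) for (v1, v2), y in labeled]
    pool = list(unlabeled)
    for _ in range(rounds):
        m1, m2 = train_view(set1), train_view(set2)
        remaining = []
        for v1, v2 in pool:
            if v1 in m1:                      # classifier 1 is confident
                set2.append((v2, m1[v1]))     # extend classifier 2's data
            elif v2 in m2:                    # classifier 2 is confident
                set1.append((v1, m2[v2]))     # extend classifier 1's data
            else:
                remaining.append((v1, v2))
        pool = remaining
    return train_view(set1), train_view(set2)

labeled = [(("ceo_of", "PER-ORG"), "employment")]
unlabeled = [("ceo_of", "PER-COMPANY"), ("works_at", "PER-COMPANY")]
m1, m2 = cotrain(labeled, unlabeled)
```

The key property is that the two views must be (roughly) independent, so each classifier can correct the other's blind spots.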
6 Distantly supervised entity relation extraction
6.1 Distant supervision
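A minimal sketch of the distant-supervision heuristic: if a knowledge base contains the triple (e1, r, e2), every sentence mentioning both e1 and e2 is labeled as expressing r. The KB triple and sentences are invented; the second sentence shows why the resulting labels are noisy:

```python
def distant_label(sentences, kb):
    """Label any sentence that mentions both entities of a KB triple
    with that triple's relation.  Cheap training data, noisy labels."""
    labeled = []
    for sent in sentences:
        for e1, rel, e2 in kb:
            if e1 in sent and e2 in sent:
                labeled.append((sent, e1, rel, e2))
    return labeled

kb = [("Steve Jobs", "founder_of", "Apple")]
sentences = [
    "Steve Jobs founded Apple in 1976.",
    "Steve Jobs ate an Apple for lunch.",   # mentions both, wrong relation
]
data = distant_label(sentences, kb)
```

Both sentences get the `founder_of` label, although only the first actually expresses it; handling this label noise (e.g. with multi-instance learning) is the central problem of distant supervision.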
7 Unsupervised entity relation extraction
8 Relation extraction based on deep learning
Word embedding
What is it?
A mapping from a word to a real-valued, low-dimensional vector. Approaches include:
- neural networks (e.g., CBOW, skip-gram) [Mikolov & al. 2013a]
- dimensionality reduction (e.g., LSA, LDA, PCA)
- explicit representation (words in the context)
Why should we pay attention to it?
Word embeddings are important for many NLP tasks, including relation extraction.
8.1 Word embedding: neural-network language models (NNLM)
Classic paper: Efficient Estimation of Word Representations in Vector Space
This article is the first word2vec paper; it proposes the two main models, CBOW and Skip-gram.
8.1.1 CBOW
Framework:
CBOW uses the words before and after a word (its context) to predict that word.
8.1.2 Skip-gram model
Skip-gram is essentially the opposite of CBOW: it takes a word as input and predicts the words in its context. As the paper notes, enlarging the context window improves the quality of the word vectors.
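The difference between the two models is clearest in the training pairs they generate from a sentence; a minimal sketch (the example sentence is invented):

```python
def training_pairs(tokens, window=2):
    """Generate (input, target) training pairs from one sentence.
    CBOW: predict the center word from its context window.
    Skip-gram: predict each context word from the center word."""
    cbow, skipgram = [], []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        cbow.append((tuple(context), center))
        for c in context:
            skipgram.append((center, c))
    return cbow, skipgram

cbow, sg = training_pairs("the cat sat on the mat".split(), window=1)
```

This also shows why a larger window helps quality at the cost of training time: each center word produces more (and more distant) training pairs.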
Distributed Representations of Words and Phrases and their Compositionality
This article is the second word2vec paper; it proposes several tricks for improving Skip-gram.
A PCA projection of Skip-gram embeddings shows that:
word embeddings have a linear structure, so analogies can be solved with vector arithmetic. Because of the training objective, the input and the output (before the softmax) are linearly related.
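The "analogy via vector arithmetic" claim can be illustrated with hand-made two-dimensional embeddings (all vectors invented; axis 0 loosely encodes gender, axis 1 royalty):

```python
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Hand-made 2-d embeddings for illustration only.
vecs = {
    "man":   (1.0, 0.0),
    "woman": (-1.0, 0.0),
    "king":  (1.0, 1.0),
    "queen": (-1.0, 1.0),
    "apple": (0.2, -1.0),
}

def analogy(a, b, c):
    """Solve a : b :: c : ?  via  vec(b) - vec(a) + vec(c),
    returning the nearest word by cosine (excluding the inputs)."""
    target = tuple(vb - va + vc for va, vb, vc in zip(vecs[a], vecs[b], vecs[c]))
    candidates = [w for w in vecs if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vecs[w], target))

answer = analogy("man", "king", "woman")   # king - man + woman
```

With real word2vec vectors the same arithmetic recovers "queen" from "king - man + woman", which is exactly the linear regularity the paper describes.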
Linguistic Regularities in Continuous Space Word Representations [Mikolov &al.2013]
This article mainly shows that word vectors support basic algebraic operations (analogies), and applies this property to a SemEval 2012 task measuring relational similarity. The word vectors in this paper are learned with an RNN language model (RNNLM):
s(t) = f(Uw(t) + Ws(t-1))
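One step of this recurrence can be sketched directly; the matrices and dimensions are invented, and f is taken to be the elementwise sigmoid, a common choice for RNNLMs:

```python
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def rnn_step(U, W, w_t, s_prev):
    """One hidden-state update: s(t) = f(U w(t) + W s(t-1)).
    w(t) is the one-hot input word, s(t-1) the previous hidden state."""
    z = [u + w for u, w in zip(matvec(U, w_t), matvec(W, s_prev))]
    return [sigmoid(x) for x in z]

# 2-dim hidden state, vocabulary of 3 words (one-hot input):
U = [[0.5, -0.3, 0.1],
     [0.2, 0.4, -0.1]]
W = [[0.1, 0.0],
     [0.0, 0.1]]
s = rnn_step(U, W, [1, 0, 0], [0.0, 0.0])
```

The rows of U act as the word embeddings here: with a one-hot w(t), U w(t) simply selects one column of U per input word.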
Semantic Compositionality through Recursive Matrix-Vector Spaces
MV-RNN: Matrix-Vector RNN
The model in this article assigns a vector and a matrix to each node in the parse tree:
- The vector captures the inherent meaning of the constituent
- The matrix captures how it changes the meaning of neighboring words or phrases
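The composition step for a parent node follows p = g(W [Ba ; Ab]): each child's matrix transforms the other child's vector before the nonlinearity. A minimal sketch with invented 2-d vectors, identity child matrices, and a toy W:

```python
from math import tanh

def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def mv_compose(a, A, b, B, W):
    """MV-RNN parent vector: p = g(W [B a ; A b]), g = tanh.
    a, b are the child vectors; A, B their modifier matrices."""
    Ba = matvec(B, a)        # b's matrix transforms a's vector
    Ab = matvec(A, b)        # a's matrix transforms b's vector
    return [tanh(x) for x in matvec(W, Ba + Ab)]

a, b = [1.0, 0.0], [0.0, 1.0]
I2 = [[1.0, 0.0], [0.0, 1.0]]          # identity matrices: no modification
W = [[0.5, 0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 0.5]]
p = mv_compose(a, I2, b, I2, W)
```

With identity child matrices the composition reduces to a plain recursive NN; the learned matrices are what let operator-like words ("not", "very") reshape their neighbors' meanings.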
Experimental results (using the data set and evaluation of SemEval-2010 Task 8)
Relation Classification via Convolutional Deep Neural Network [Zeng&al., 2014]
Experimental results