NLP: Relation Extraction and Event Extraction

Relation Extraction


Relation extraction, also known as entity relation extraction, builds on entity recognition: once the entities have been recognized, it determines whether any two entities in a given text stand in a predefined relation. It is one of the key supporting technologies for text understanding and is important for applications such as question answering systems, intelligent customer service, and semantic search.
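Because relation extraction runs after entity recognition, its input can be viewed as a set of candidate entity pairs, each of which is then classified into one of the predefined relations (or "no relation"). A minimal Python sketch, assuming the entities are already recognized and that relations are directional, so (A, B) and (B, A) are separate candidates; the entity list is hypothetical:

```python
from itertools import permutations


def candidate_pairs(entities):
    """Return every ordered (head, tail) pair of recognized entities.

    A relation classifier then assigns each pair one of the predefined
    relation types, or a "no relation" class.
    """
    return list(permutations(entities, 2))


# Hypothetical NER output for a sentence mentioning Bob and Canada
entities = ["Bob", "Canada"]
print(candidate_pairs(entities))  # [('Bob', 'Canada'), ('Canada', 'Bob')]
```

With n recognized entities this yields n * (n - 1) candidates, which is why most of them end up in the "no relation" class.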


Deep learning methods have achieved good results on relation extraction because they learn text features automatically instead of relying on hand-crafted ones. There are many deep learning approaches to relation extraction, such as those based on convolutional neural networks (CNNs) and those based on pre-trained models; the CNN-based approach is one of the most typical.

Relation Extraction Algorithm Based on Convolutional Neural Networks

A core CNN algorithm for relation extraction is PCNN (piecewise convolutional neural network). First, each sentence is converted into a vector representation by combining the word embedding and the position embedding of every word; then features are extracted from this representation with the convolution and pooling operations of the CNN; finally, the relation is predicted.

The PCNN algorithm consists of four modules: an input module, a convolution module, a pooling module, and a classification module.
Word vector: computers cannot process raw text directly, so a word-vector tool such as word2vec is used to convert each word in the sentence into a low-dimensional real-valued vector that the model can operate on.

Position vector: the position vector encodes the relative distance from each word to each of the two entities in the sentence. For example, if "Bob" is the word immediately before "was" and "Canada" is three words after it, the relative distances from "was" to "Bob" and "Canada" are 1 and -3, respectively.
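The two input vectors can be sketched in plain Python. This assumes the sign convention implied by the example above (token index minus entity index), uses random toy vectors in place of trained word2vec and position embeddings, and uses a hypothetical sentence in which the distances from "was" come out to 1 and -3:

```python
import random


def relative_positions(num_tokens, e1_index, e2_index):
    """Distance of each token to the two entities: token index minus entity index."""
    return [(i - e1_index, i - e2_index) for i in range(num_tokens)]


def build_inputs(tokens, e1_index, e2_index, word_vec, pos_vec):
    """Concatenate each token's word vector with its two position vectors."""
    rows = []
    for token, (d1, d2) in zip(tokens, relative_positions(len(tokens), e1_index, e2_index)):
        rows.append(word_vec[token] + pos_vec[d1] + pos_vec[d2])
    return rows


tokens = ["Bob", "was", "born", "in", "Canada"]  # hypothetical sentence
rng = random.Random(0)
word_vec = {t: [rng.uniform(-1, 1) for _ in range(4)] for t in tokens}       # toy 4-d word vectors
pos_vec = {d: [rng.uniform(-1, 1) for _ in range(2)] for d in range(-4, 5)}  # toy 2-d position vectors

print(relative_positions(len(tokens), 0, 4)[1])  # (1, -3): "was" relative to Bob and Canada
inputs = build_inputs(tokens, 0, 4, word_vec, pos_vec)
print(len(inputs), len(inputs[0]))  # 5 8
```

Each token row has word-vector size plus twice the position-vector size, so the convolution sees the word itself and where it sits relative to both entities.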
Feature extraction: this step extracts the main features of the sentence to represent its semantics. Because input sentences vary in length and the information about the relation between the head and tail entities can appear anywhere in the sentence, local features must be extracted from all parts of the sentence to predict the relation type of the target entity pair. In a CNN, the convolution operation is the usual way to obtain these local features.
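A minimal sketch of the convolution step in plain Python: each filter slides over windows of consecutive token vectors, and each output value summarizes one local context. It omits the bias and nonlinearity and uses a hand-picked filter for readability; in practice the filters are learned (for example with PyTorch's nn.Conv1d):

```python
def conv1d(inputs, filters, window):
    """Valid 1-D convolution over a sentence matrix.

    inputs:  list of token vectors (n tokens, each of dimension d)
    filters: list of weight vectors, each of length window * d
    Returns one feature sequence of length n - window + 1 per filter.
    """
    n = len(inputs)
    outputs = []
    for w in filters:
        feats = []
        for start in range(n - window + 1):
            # flatten the window of token vectors and take its dot product with the filter
            span = [x for vec in inputs[start:start + window] for x in vec]
            feats.append(sum(a * b for a, b in zip(span, w)))
        outputs.append(feats)
    return outputs


# 5 tokens with 2-d vectors; one width-3 filter that sums the first dimension
sentence = [[1, 0], [2, 0], [3, 0], [4, 0], [5, 0]]
summing_filter = [1, 0, 1, 0, 1, 0]
print(conv1d(sentence, [summing_filter], window=3))  # [[6, 9, 12]]
```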

Piecewise (segmented) pooling: common pooling operations include max pooling, average pooling, and global pooling. Max pooling, for example, keeps the most salient value of each feature vector. Piecewise pooling instead divides each convolutional feature vector into three segments at the positions of the two given entities and applies max pooling to each segment separately, so the pooled output keeps coarse information about where a feature occurred relative to the entities.
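A plain-Python sketch of piecewise max pooling. It assumes the feature sequences are aligned one-to-one with the tokens (as with padded convolution), that each entity position belongs to the segment it closes, and that an empty segment pools to 0.0; implementations differ on these details:

```python
def piecewise_max_pool(feature_map, e1_index, e2_index):
    """Piecewise max pooling for PCNN-style relation extraction.

    feature_map: one feature sequence per convolution filter, aligned with the tokens.
    Each sequence is cut into three segments at the two entity positions (each entity
    closes the segment it falls in) and max-pooled per segment: 3 values per filter.
    """
    first, second = sorted((e1_index, e2_index))
    pooled = []
    for feats in feature_map:
        segments = [feats[:first + 1], feats[first + 1:second + 1], feats[second + 1:]]
        pooled.extend(max(seg) if seg else 0.0 for seg in segments)  # 0.0 for an empty segment
    return pooled


feature_map = [[0.1, 0.7, 0.3, 0.9, 0.2, 0.4]]  # one filter over six tokens (toy values)
print(piecewise_max_pool(feature_map, 1, 3))   # [0.7, 0.9, 0.4]
print(max(feature_map[0]))                      # 0.9: plain max pooling keeps only one value
```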

Relation classification: the feature vector of each sentence is fed into a softmax classifier, which computes the probability of each relation type; the relation with the highest probability is taken as the relation between the entity pair in the sentence.

Optimization by backpropagation: the parameters θ are trained by maximizing the log-likelihood of the T training instances,

$$J(\theta) = \sum_{i=1}^{T} \log p(y_i \mid x_i; \theta),$$

where x_i is the i-th training sentence and y_i is its relation label. Stochastic gradient descent (SGD) is applied to the negative log-likelihood -J(θ), and the gradients for the parameters of every layer of the network are computed with the backpropagation algorithm.
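Relation classification and training can be sketched together in plain Python. The sketch assumes a single linear layer over an already-pooled sentence feature vector (standing in for PCNN's classification module), hypothetical relation labels and toy features, and a hand-derived gradient in place of full backpropagation; each step moves the weights in the direction that increases J(θ), which is the same as a gradient descent step on -J(θ):

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(W, x):
    """Linear layer + softmax: probability of each relation for feature vector x."""
    return softmax([sum(w * xj for w, xj in zip(row, x)) for row in W])


def log_likelihood(W, data):
    """J(theta) = sum over training pairs of log p(y | x; theta)."""
    return sum(math.log(predict(W, x)[y]) for x, y in data)


def sgd_step(W, x, y, lr):
    """One stochastic step on a single example.
    For softmax, the gradient of log p_y w.r.t. row k of W is (1[k == y] - p_k) * x."""
    p = predict(W, x)
    return [[w + lr * ((1.0 if k == y else 0.0) - p[k]) * xj for w, xj in zip(row, x)]
            for k, row in enumerate(W)]


relations = ["born_in", "lives_in", "no_relation"]  # hypothetical label set
W = [[0.0, 0.0] for _ in relations]                 # one weight row per relation, 2-d features
data = [([1.0, 0.5], 0), ([0.2, 1.0], 1)]          # toy (pooled features, label index) pairs

before = log_likelihood(W, data)
for _ in range(50):  # 50 passes of per-example updates
    for x, y in data:
        W = sgd_step(W, x, y, lr=0.5)
after = log_likelihood(W, data)

probs = predict(W, [1.0, 0.5])
print(after > before)                      # True
print(relations[probs.index(max(probs))])  # born_in
```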

Relation Extraction Based on Distant Supervision

Motivation: supervised relation extraction as described above needs labeled data, but in many domains labeled corpora are scarce and manual annotation is costly and time-consuming.
Strategy: annotate the corpus quickly by automatic means.
Method: distant supervision.

The basic assumption of distant supervision is: if the triple R(E1, E2) exists in a knowledge graph (where R is a relation and E1 and E2 are two entities) and E1 and E2 co-occur in a sentence S, then S expresses the relation R between E1 and E2, and S is labeled as a positive training example for R.

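Problem with distant supervision: the assumption effectively treats an entity pair as corresponding to a single relation, but in practice an entity pair can hold several relations at the same time. A sentence that mentions both entities may therefore express a different relation from the one it is labeled with, which introduces noise into the automatically labeled data.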
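The labeling rule and its failure mode can be sketched in a few lines of Python. The knowledge-base triple and sentences are hypothetical, and plain substring matching stands in for real entity linking:

```python
def distant_label(sentences, kb_triples):
    """Label every sentence in which both entities of a KB triple co-occur
    with that triple's relation (the distant-supervision assumption)."""
    labeled = []
    for sentence in sentences:
        for head, relation, tail in kb_triples:
            if head in sentence and tail in sentence:  # naive co-occurrence test
                labeled.append((sentence, head, tail, relation))
    return labeled


kb = [("Bob", "born_in", "Canada")]  # hypothetical knowledge-base triple
corpus = [
    "Bob was born in Canada.",
    "Bob moved back to Canada last year.",  # co-occurrence, but not a birthplace statement
    "Alice visited Paris.",
]
for example in distant_label(corpus, kb):
    print(example)
# ('Bob was born in Canada.', 'Bob', 'Canada', 'born_in')
# ('Bob moved back to Canada last year.', 'Bob', 'Canada', 'born_in')
```

The second example is exactly the noise described above: the pair (Bob, Canada) also holds a residence relation, but the sentence inherits the only label the knowledge base provides.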

Few-Shot (Small-Sample) Relation Extraction

Motivation: in many domains labeled corpora are scarce, and manual annotation is costly and time-consuming.
Strategy: achieve relation extraction that meets performance requirements with only a small amount of labeled data.
Method: few-shot relation extraction.

Few-shot learning task: the training set contains many classes, each with several samples. In the training phase, C classes are randomly selected from the training set and K samples are drawn from each, forming a task whose C*K samples are input to the model as the support set; a batch of samples is then drawn from the remaining data of those C classes as the query set, which the model must predict. In other words, the model must learn to distinguish the C classes from only C*K labeled samples. Such a task is called a C-way K-shot problem.
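The episode construction described above can be sketched as follows. The dataset, relation names, and query batch size are hypothetical, and the query batch is drawn from the pooled leftovers of the C classes, which is one reading of the description; other setups draw a fixed number of queries per class:

```python
import random


def sample_episode(dataset, c_way, k_shot, query_size, rng):
    """Build one C-way K-shot task from dataset: {class label: [examples]}.

    Support set: K examples from each of C randomly chosen classes.
    Query set: a batch drawn from the remaining examples of those C classes.
    """
    classes = rng.sample(sorted(dataset), c_way)
    support, remaining = [], []
    for label in classes:
        examples = list(dataset[label])
        rng.shuffle(examples)
        support += [(x, label) for x in examples[:k_shot]]
        remaining += [(x, label) for x in examples[k_shot:]]
    query = rng.sample(remaining, query_size)
    return support, query


# Hypothetical training set: 4 relation types with 5 sentences each
dataset = {rel: [f"{rel} sentence {i}" for i in range(5)]
           for rel in ["born_in", "lives_in", "works_for", "founded"]}
support, query = sample_episode(dataset, c_way=2, k_shot=3, query_size=2, rng=random.Random(0))
print(len(support), len(query))  # 6 2
```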

Joint Extraction

Broadly speaking, relation extraction is divided into two subtasks: entity extraction and relation classification. Entity extraction is similar to named entity recognition; relation classification determines the relation between the extracted entities and is essentially a classification task. Performing the two subtasks one after the other in this way is called the pipeline method.

Joint extraction, also known as end-to-end extraction, feeds the raw text directly into a single model and outputs the final result without modifying the input text. It combines the two subtask models of the pipeline into one model that extracts entities and their relations at the same time.

Existing joint extraction models generally fall into two categories: joint extraction models with shared parameters and joint extraction models with joint decoding.

Joint Extraction Models with Shared Parameters

The two submodels are joined by sharing parameters, while entities and relations are still obtained with separate decoding methods.

Sharing parameters strengthens the interaction between the submodels; what is shared is mainly the word embedding layer and the encoding layer. On top of the shared layers, each subtask generally uses its own model suited to its characteristics: for entity extraction, LSTM+CRF is often used to produce sequence labels; for relation extraction, a CNN is often used for feature extraction, followed by a softmax layer for relation classification.

Joint Extraction Model for Joint Decoding

A single decoding method obtains entities and relations at the same time. Joint decoding captures the interactions between entities, between entities and relations, and between relations.

One approach uses sequence labeling for joint extraction: entities and their relations are extracted directly rather than identified separately. The core idea is to label entities and relations at the same time. Because the entities of interest are exactly those that take part in relations, each entity's tag can also carry its relation, which turns relation extraction into a sequence labeling task. Common tagging schemes are joint extensions of BIOES, BIES, and BIO tagging.
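The text does not fix a tag format, so the following assumes one common convention: each entity token's tag combines a BIES position tag, the relation type, and the entity's role in the relation (1 = head, 2 = tail). A sketch of how a tagger's output is read back as triples; the sentence and tags are hypothetical, and heads and tails of the same relation are paired in order of appearance, a simplification:

```python
def decode_triples(tokens, tags):
    """Read (head, relation, tail) triples off joint tags of the form POSITION-RELATION-ROLE.

    POSITION is B/I/E/S (O marks tokens outside any entity), RELATION is the relation
    type, and ROLE is "1" for the head entity and "2" for the tail entity.
    """
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "O":
            current = []
            continue
        position, rest = tag.split("-", 1)
        relation, role = rest.rsplit("-", 1)
        current = [token] if position in ("B", "S") else current + [token]
        if position in ("E", "S"):  # the entity ends at this token
            spans.append((" ".join(current), relation, role))
            current = []
    triples = []
    for relation in dict.fromkeys(rel for _, rel, _ in spans):  # relation types in order of appearance
        heads = [text for text, rel, role in spans if rel == relation and role == "1"]
        tails = [text for text, rel, role in spans if rel == relation and role == "2"]
        triples += [(h, relation, t) for h, t in zip(heads, tails)]  # simplification: pair in order
    return triples


tokens = ["Bob", "Smith", "was", "born", "in", "Canada"]  # hypothetical sentence
tags = ["B-born_in-1", "E-born_in-1", "O", "O", "O", "S-born_in-2"]
print(decode_triples(tokens, tags))  # [('Bob Smith', 'born_in', 'Canada')]
```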

Event Extraction

Basic concept: as a form of information, an event is an objective fact in which specific people or things interact at a specific time and place, that is, who did what, when, and where. Event extraction aims to extract structured information that accurately describes the occurrence of events from unstructured natural-language text.
ACE 2005 is currently the most widely used event extraction dataset, with training data in three languages: English, Chinese, and Arabic. It annotates event trigger words, event types, event elements (arguments), and the roles of those elements in each sentence, and defines 8 event types with 33 subtypes in total. Different event types have different element roles.
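The annotation layers listed above (trigger, event type, event elements, and their roles) map naturally onto a small record type. A sketch using Python dataclasses; the sentence is hypothetical and the labels are written in the style of ACE 2005's Conflict.Attack subtype (roles Attacker, Target, Place), so treat them as an illustration rather than an excerpt of the dataset:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Argument:
    text: str  # the event element as it appears in the sentence
    role: str  # the role it plays in the event


@dataclass
class Event:
    trigger: str     # word that signals the event
    event_type: str  # "Type.Subtype"
    arguments: List[Argument] = field(default_factory=list)

    def roles(self) -> Dict[str, str]:
        """Map each role to the text that fills it."""
        return {arg.role: arg.text for arg in self.arguments}


# Hypothetical sentence: "Rebels attacked a convoy in Baghdad."
event = Event(
    trigger="attacked",
    event_type="Conflict.Attack",
    arguments=[Argument("Rebels", "Attacker"), Argument("a convoy", "Target"), Argument("Baghdad", "Place")],
)
print(event.roles())  # {'Attacker': 'Rebels', 'Target': 'a convoy', 'Place': 'Baghdad'}
```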
Subtasks:

1. Event detection

Detect the events mentioned in the text and classify them. The traditional approach first identifies the event trigger words in the text and then classifies each trigger into an event type.
The main research methods for event detection are template (pattern) matching and machine learning.
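To make the two steps concrete, here is a deliberately simple sketch of the pattern-matching family: a hand-written trigger lexicon (hypothetical entries with ACE-style labels) identifies triggers and assigns their types in a single lookup. Machine-learning detectors such as DMCNN replace this lookup with a classifier over each candidate word:

```python
# Hypothetical trigger lexicon: trigger word -> event type (ACE-style labels)
TRIGGER_LEXICON = {
    "attacked": "Conflict.Attack",
    "married": "Life.Marry",
    "born": "Life.Be-Born",
}


def detect_events(tokens, lexicon=TRIGGER_LEXICON):
    """Identify trigger words, then classify each one by its event type.

    Returns (token index, trigger, event type) for every token found in the lexicon.
    """
    return [(i, tok, lexicon[tok.lower()])
            for i, tok in enumerate(tokens) if tok.lower() in lexicon]


print(detect_events("Bob was born in Canada and married Alice".split()))
# [(2, 'born', 'Life.Be-Born'), (6, 'married', 'Life.Marry')]
```

A fixed lexicon cannot resolve ambiguous triggers (for example, "fired" can signal an attack or a dismissal), which is one reason learned detectors classify each candidate trigger in its sentence context.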

The event detection model based on the dynamic multi-pooling convolutional neural network (DMCNN) consists of four modules: word embedding learning, lexical-level feature extraction, sentence-level feature extraction, and event classification.
Input: lexical-level features and sentence-level features. The lexical-level feature representation is formed by concatenating word embedding vectors end to end; the sentence-level features consist of context-word features (CWF) and position features (PF), which encode the relative distance between each word in the sentence and the candidate trigger word.
Convolution operation: convolution kernels are applied to the sentence-level features, capturing the semantics of the whole sentence and compressing them into feature maps.
Dynamic multi-pooling: each feature map produced by the convolution is split into two parts at the candidate trigger word, max pooling is applied to each part separately, and all the pooled values are concatenated into a feature vector.
Classification: the feature vector from dynamic multi-pooling is concatenated with the lexical-level feature representation, and the resulting vector is passed through a fully connected layer and a softmax classifier to obtain the event class.

Implementing event detection with DMCNN:
The following is PyTorch code for DMCNN event detection. The model takes the sentence context, represented by word vectors, as input; obtains sentence-level features through convolution and dynamic multi-pooling; concatenates them with the lexical-level features and feeds the result to the classifier; and finally uses the cross-entropy loss to compute the loss and update the model parameters.

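```python
import torch
import torch.nn as nn


class DMCNN(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        # vocab_size and max_pos are assumed config fields; the embedding tables are not shown in the original excerpt
        self.char_lookup = nn.Embedding(config.vocab_size, config.char_dim)  # word embeddings
        self.pf_lookup = nn.Embedding(config.max_pos, config.pf_t)           # position-feature embeddings
        self.conv = nn.Conv1d(config.char_dim + config.pf_t, config.feature_t, config.window_t, bias=True)  # convolution
        self.L = nn.Linear(2 * config.feature_t + 3 * config.char_dim, config.num_t, bias=True)  # fully connected layer
        self.loss = nn.CrossEntropyLoss()  # cross-entropy loss
        # fixed mask rows: id 0 = padding, ids 1 and 2 = the two segments around the candidate trigger
        self.register_buffer("mask_table", torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]))

    def pooling(self, conv, masks):  # dynamic multi-pooling
        mask = self.mask_table[masks]  # conv: [batch, sen - window_t + 1, feature], mask: [batch, sen - window_t + 1, 2]
        # adding 100 inside one segment makes the max over the sentence pick that segment's maximum
        pooled, _ = torch.max(torch.unsqueeze(mask * 100, dim=2) + torch.unsqueeze(conv, dim=3), dim=1)
        pooled = pooled - 100
        return pooled.reshape(conv.size(0), -1)  # [batch, 2 * feature]

    def forward(self, char_inputs, pf_inputs, lxl_inputs, masks):
        # x: sentence-level features (word embeddings + position features)
        x = torch.cat((self.char_lookup(char_inputs), self.pf_lookup(pf_inputs)), dim=-1)
        # y: lexical-level features (3 concatenated word embeddings, hence 3 * char_dim in self.L)
        y = self.char_lookup(lxl_inputs).view(lxl_inputs.size(0), -1)
        x = torch.tanh(self.conv(x.permute(0, 2, 1)))  # feature maps after convolution
        x = x.permute(0, 2, 1)
        x = self.pooling(x, masks)  # features from dynamic multi-pooling
        return self.L(torch.cat((x, y), dim=-1))  # event-type logits; train with self.loss(logits, labels)
```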
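The pooling function above avoids slicing each sentence at the trigger: it looks up a 0/1 mask row for every position, adds 100 to the feature values inside one segment, takes a single max over the sentence, and subtracts 100 again. The plain-Python sketch below (function names are hypothetical) checks this trick against a direct per-segment max on one toy feature sequence; it does not reproduce the column order of the PyTorch version:

```python
def masked_segment_max(conv, segment_ids, num_segments=2, offset=100.0):
    """Segment-wise max via the offset trick used in the pooling function above.

    segment_ids[t] is 0 for padding or 1..num_segments for the segment of token t.
    Adding offset to one segment's values lets a single max over the whole sequence
    pick that segment's maximum; subtracting offset restores its value (up to float
    rounding). Requires |value| < offset / 2 (tanh outputs lie in [-1, 1]) and a
    non-empty segment.
    """
    pooled = []
    for seg in range(1, num_segments + 1):
        shifted = [v + (offset if s == seg else 0.0) for v, s in zip(conv, segment_ids)]
        pooled.append(max(shifted) - offset)
    return pooled


def direct_segment_max(conv, segment_ids, num_segments=2):
    """Reference version: take the max inside each segment directly."""
    return [max(v for v, s in zip(conv, segment_ids) if s == seg)
            for seg in range(1, num_segments + 1)]


conv = [0.2, -0.5, 0.9, 0.1, -0.3]  # one toy feature sequence after tanh
segment_ids = [1, 1, 2, 2, 0]       # two segments around a trigger; last token is padding
print([round(v, 6) for v in masked_segment_max(conv, segment_ids)])  # [0.2, 0.9]
print(direct_segment_max(conv, segment_ids))                         # [0.2, 0.9]
```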

2. Event element (argument) extraction

For each event trigger found in the text, identify the event elements (arguments) and determine the role each one plays. Event element extraction methods can be divided into pattern-matching-based and machine-learning-based methods.
Event trigger word: the core word that signals the occurrence of an event, usually a verb or a noun;
Event element: a participant or attribute of the event, such as a person, time, or place.

The sequence-labeling-based method for event element extraction first passes each word in the sentence through a word embedding module to obtain its word vector, then feeds the vectors into a bidirectional long short-term memory network (Bi-LSTM), which outputs a score for each word on each label; finally, a conditional random field (CRF) layer produces the final predicted labels. The model consists of the following modules:
Word embedding vector: each word in the sentence is converted into a word vector, which is the input to the next layer.
Bi-LSTM module: the word embedding of each word in the sentence is fed into the bidirectional LSTM, and the hidden states output by the forward and backward LSTMs are concatenated to obtain a complete hidden-state sequence, which serves as the feature representation of the sentence.
CRF module: the CRF layer adds constraints between adjacent labels, which lowers the probability of invalid predictions. For example, the label of the first word in a sentence must be B-X or O, never I-X, because the first word of a sentence cannot be the continuation of an element span that started earlier.
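The start-of-sentence rule above can be made concrete with Viterbi decoding. In a trained CRF the transition scores are learned; this plain-Python sketch hard-codes only the BIO validity constraint (a simplifying assumption), with toy emission scores standing in for Bi-LSTM output and hypothetical role labels:

```python
NEG_INF = float("-inf")


def transition(prev, cur):
    """Hand-set constraint (a trained CRF learns these scores): I-X may only
    follow B-X or I-X of the same type X. prev is None at the sentence start."""
    if cur.startswith("I-"):
        if prev is None or prev[:2] not in ("B-", "I-") or prev[2:] != cur[2:]:
            return NEG_INF
    return 0.0


def viterbi(emissions, labels):
    """Best label sequence given per-token label scores (e.g. from a Bi-LSTM)."""
    n = len(labels)
    score = [emissions[0][j] + transition(None, labels[j]) for j in range(n)]
    backpointers = []
    for row in emissions[1:]:
        new_score, pointers = [], []
        for j, cur in enumerate(labels):
            cand = [score[i] + transition(labels[i], cur) for i in range(n)]
            best_i = max(range(n), key=cand.__getitem__)
            new_score.append(cand[best_i] + row[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    path = [max(range(n), key=score.__getitem__)]
    for pointers in reversed(backpointers):  # follow the back-pointers to the start
        path.append(pointers[path[-1]])
    return [labels[j] for j in reversed(path)]


labels = ["O", "B-Attacker", "I-Attacker"]
emissions = [[0.1, 0.5, 0.9],  # token 1: a greedy choice would be the invalid I-Attacker
             [0.2, 0.1, 0.8]]  # token 2
print(viterbi(emissions, labels))  # ['B-Attacker', 'I-Attacker']
```

Picking the highest score per token would output I-Attacker for the first word; the constraint forces the decoder to start the span with B-Attacker instead.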


Source: blog.csdn.net/zag666/article/details/128211934