Relation-Aware Collaborative Learning for Unified Aspect-Based Sentiment Analysis (ACL 2020)

Table of contents

Title Translation: Relation-Aware Collaborative Learning for Unified Aspect-Based Sentiment Analysis

Original link: https://aclanthology.org/2020.acl-main.340.pdf

Abstract

1 Introduction

2 Related Work

3 Method

3.1 Task Definition

3.2 Model Architecture

3.3 Relation-Aware Collaborative Learning

3.4 Stacking RACL to Multiple Layers

3.5 Training process

4 Experiments

4.1 Datasets and settings

4.2 Comparison results

5 Analysis

5.1 Ablation Study

5.2 Effect of hyperparameters

5.3 Case studies

5.4 Computational Cost Analysis

6 Conclusion

7 Acknowledgments


Abstract

Aspect-based sentiment analysis (ABSA) consists of three subtasks: aspect term extraction, opinion term extraction, and aspect-level sentiment classification. Most existing studies focus on only one of these subtasks. Several recent works attempt to solve the complete ABSA problem with a unified framework, but the interactive relations among the three subtasks remain under-explored. We argue that such relations encode collaborative signals between different subtasks. For example, when the opinion term is "delicious", the aspect term must be "food" rather than "place". To fully exploit these relations, we propose a Relation-Aware Collaborative Learning (RACL) framework, which lets the subtasks work in coordination through multi-task learning and relation propagation mechanisms in a stacked multi-layer network. Extensive experiments on three real-world datasets demonstrate that RACL significantly outperforms state-of-the-art methods on the complete ABSA task.

1 Introduction

Aspect-based sentiment analysis (ABSA) is a fine-grained task that aims to summarize users' opinions on specific aspects in a sentence. ABSA usually includes three subtasks: aspect term extraction (AE), opinion term extraction (OE), and aspect-level sentiment classification (SC). For example, given the review "This place is small and cramped, but the food is delicious.", AE aims to extract the set of aspect terms {"place", "food"}, and OE aims to extract the set of opinion terms {"small", "cramped", "delicious"}. Meanwhile, SC is expected to assign the sentiment polarities "negative" and "positive" to "place" and "food", respectively.

Most existing works treat ABSA as a two-step task containing AE and SC. They either develop a separate method for each subtask (Tang et al., 2016; Xu et al., 2018; Li et al., 2018a; Hu et al., 2019) or use OE as an auxiliary task for AE (Wang et al., 2017; Li et al., 2018b). To perform ABSA in practical applications, the separate methods need to be pipelined together. Recently, several studies have attempted to solve ABSA in a unified framework (Wang et al., 2018a; Li et al., 2019; He et al., 2019; Luo et al., 2019).

Despite their effectiveness, we argue that these methods are not sufficient to yield satisfactory performance on the complete ABSA task. The key reason is that existing studies largely ignore the interactive relations among different subtasks. These relations convey collaborative signals that allow the subtasks to enhance one another. For example, the opinion term "delicious" can serve as evidence for the aspect term "food", and vice versa. In the following, we first analyze the interactive relations among the subtasks and then present our RACL framework, which is developed to exploit these relations. The relations are summarized in Fig. 1 (left), where each arrow represents a specific relation Ri.

→ R1 denotes the binary relation between AE and OE. In practice, aspect terms must be the targets of opinions, which means that an aspect term such as "place" can only be modified by corresponding opinion terms such as "small" and "cramped", not by a term such as "delicious". Therefore, AE and OE may hold informative clues for each other.

→ R2 denotes the ternary relation between SC and R1. A key problem in SC is to determine the dependency between an aspect and its context. For example, the context "small and cramped" plays an important role in predicting the polarity of "place". This dependency is highly consistent with R1, which emphasizes the interaction between aspect terms and opinion terms. Therefore, SC and R1 can help refine each other's selection process.

→ R3 denotes the binary relation between SC and OE. Specific opinion terms usually convey specific polarities; for example, "fantastic" is usually positive. The opinion terms extracted in OE therefore deserve more attention when predicting the sentiment polarity in SC.

→ R4 denotes the binary relation between SC and AE. In the complete ABSA task, the aspect terms are unknown, so SC assigns a polarity to every word. Aspect terms such as "place" and "food" have their corresponding polarities, while other words are regarded as background words without sentiment. That is, the results of AE should help supervise the training of SC.

When reviewing the literature on the ABSA task, we find that existing separate methods either exploit no relation at all or exploit only R1 by treating OE as an auxiliary task for AE. Meanwhile, the unified methods explicitly exploit at most R3 and R4. In view of this, we propose a novel Relation-Aware Collaborative Learning (RACL) framework to fully exploit the interactive relations in the complete ABSA task. Table 1 compares our model with existing methods in terms of their ability to exploit these relations.

RACL is a multi-layer multi-task learning framework with a relation propagation mechanism that mutually enhances the performance of the subtasks. For multi-task learning, RACL adopts the shared-private scheme (Collobert and Weston, 2008; Liu et al., 2017). The subtasks AE, OE, and SC first jointly train low-level shared features and then independently train high-level private features. In this way, the shared and private features can embed task-invariant and task-oriented knowledge, respectively. For relation propagation, RACL improves the model's capability by exchanging informative clues among the three subtasks. Moreover, RACL can be stacked to multiple layers to perform collaborative learning at different semantic levels. We conduct extensive experiments on three datasets. The results show that RACL significantly outperforms state-of-the-art methods on both the individual subtasks and the complete ABSA task.

2 Related Work

Aspect-based sentiment analysis (ABSA) was first proposed by Hu and Liu (2004) and has been extensively studied in recent years (Zhang et al., 2018). We organize existing studies by how the subtasks are performed and combined to accomplish ABSA.

Separate methods  Most existing studies treat ABSA as a two-step task consisting of aspect term extraction (AE) and aspect-level sentiment classification (SC), and develop separate methods for AE (Popescu and Etzioni, 2005; Wu et al., 2009; Li et al., 2010; Qiu et al., 2011; Liu et al., 2012; Chen et al., 2014; Chernyshevich, 2014; Toh and Wang, 2014; Vicente et al., 2015; Liu et al., 2015, 2016; Yin et al., 2016; Wang et al., 2016; Li and Lam, 2017; Clercq et al., 2017; He et al., 2017; Xu et al., 2018; Yu et al., 2019) and for SC (Jiang et al., 2011; Mohammad et al., 2013; Kiritchenko et al., 2014; Dong et al., 2014; Vo and Zhang, 2015; Ma et al., 2017; Wang et al., 2018b; Zhu and Qian, 2018; Chen and Qian, 2019; Zhu et al., 2019). Some of them use opinion term extraction (OE) as an auxiliary task and exploit the relation between AE and OE to improve the performance of AE. For the complete ABSA task, the results of the two steps must be merged in a pipeline. In this way, the relation between AE/OE and SC is completely neglected, and errors from the upstream AE/OE propagate to the downstream SC. As a result, the overall performance of pipeline methods on the ABSA task is not promising.

Unified methods  In recent years, several studies have attempted to solve the ABSA task in a unified framework. Unified methods fall into two types: collapsed tagging (Mitchell et al., 2013; Zhang et al., 2015; Wang et al., 2018a; Li et al., 2019) and joint training (He et al., 2019; Luo et al., 2019). The former combines the labels of AE and SC into collapsed labels such as {B-senti, I-senti, O}. The subtasks have to share all trainable features without distinction, which is likely to confuse the learning process. Moreover, the relations among subtasks cannot be explicitly modeled in this type of method. The latter builds a multi-task learning framework in which each subtask has its own labels and can have both shared and private features. This allows the interactive relations among different subtasks to be modeled explicitly. However, none of the existing studies has fully exploited the power of these relations.

The difference between our work and the aforementioned methods is that we propose a unified framework that exploits all binary and ternary relations between subtasks to enhance learning.

3 Method

3.1 Task Definition

Given a sentence Se = {w1,...,wi,...,wn}, we formulate the subtasks AE, OE, and SC as three sequence labeling problems: each subtask predicts one tag sequence over the n words, denoted Y^{A}, Y^{O}, and Y^{S}, respectively.
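To make the three labeling problems concrete, here is a minimal sketch that tags the example review from the Introduction. The tag inventories are an assumption (the common B/I/O scheme for AE and OE, and a per-word polarity tag for SC), since this text does not spell them out, and `extract_spans` is a hypothetical helper introduced only for illustration.

```python
# Illustrative sketch. The tag names (B/I/O and POS/NEG) are assumptions,
# not taken from the text above.
tokens = ["This", "place", "is", "small", "and", "cramped", ",",
          "but", "the", "food", "is", "delicious", "."]
y_ae = ["O", "B", "O", "O", "O", "O", "O", "O", "O", "B", "O", "O", "O"]
y_oe = ["O", "O", "O", "B", "O", "B", "O", "O", "O", "O", "O", "B", "O"]
# SC tags every word; None marks background words without sentiment.
y_sc = [None, "NEG", None, None, None, None, None,
        None, None, "POS", None, None, None]


def extract_spans(tokens, tags):
    """Hypothetical helper: collect B/I runs as (start, end_exclusive, text)."""
    spans, start = [], None
    for i, tag in enumerate(tags + ["O"]):  # trailing sentinel closes an open span
        if start is not None and tag != "I":
            spans.append((start, i, " ".join(tokens[start:i])))
            start = None
        if tag == "B":
            start = i
    return spans


aspects = extract_spans(tokens, y_ae)
opinions = extract_spans(tokens, y_oe)
# Read the polarity of each aspect term from its first word.
polarities = {text: y_sc[start] for start, _, text in aspects}

print([text for _, _, text in aspects])
print([text for _, _, text in opinions])
print(polarities)
```

All three sequences have the same length as the sentence, which is what lets the modules exchange word-level clues later on.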

3.2 Model Architecture

Our proposed RACL is a unified multi-task learning framework that propagates the interactive relations (denoted R1..R4, as in Fig. 1) to improve ABSA performance. It can be stacked to multiple layers so that the subtasks interact at different semantic levels. We show the overall architecture of RACL in Fig. 2(a) and the details of a single layer in Fig. 2(b).

In particular, a single RACL layer contains three modules: AE, OE, and SC, each designed for the corresponding subtask. These modules receive a shared representation of the input sentence and then encode their own task-oriented features. Afterwards, they propagate the relations R1..R4 for collaborative learning, exchanging informative clues to further enhance the task-oriented features. Finally, the three modules predict the tag sequences Y^{A}, Y^{O}, and Y^{S} from the enhanced features.

In the following, we first illustrate relation-aware collaborative learning in one layer, and then show the stacking and training of the whole RACL.

3.3 Relation-Aware Collaborative Learning

Input word vectors  Given a sentence Se, we can map the sequence of words in Se with a pretrained word embedding (e.g. GloVe) or a pretrained language encoder (e.g. BERT) to generate word vectors E = {e1,..., ei,...,en}∈ R^{d_{w}\times n}, where dw is the dimension of the word vector. We will examine the effect of these two types of word embeddings in experiments.

Multi-task Learning with Shared-Private Scheme  To perform multi-task learning, different subtasks should focus on different characteristics of the shared training samples. Inspired by the shared-private scheme (Collobert and Weston, 2008; Liu et al., 2017), we extract both shared and private features to embed task-invariant and task-oriented knowledge for the AE, OE, and SC modules.

To encode the shared task-invariant features, we simply feed each ei in E into a fully-connected layer and generate a transformed vector hi ∈ R^{d_{h}}. We then obtain a sequence of shared vectors H = {h1, ..., hi, ..., hn} ∈ R^{d_{h}\times n} for the sentence, which is jointly trained by all subtasks.

Based on the shared task-invariant feature H, the AE, OE and SC modules will encode task-oriented private features for the corresponding subtasks. We choose a simple CNN as the encoder function F because of its high computational efficiency. 

For subtasks AE and OE, the key features for determining whether a word belongs to an aspect term or an opinion term are the representations of the word itself and of its adjacent words. Therefore, we construct two encoders that extract local AE-oriented features X^{A} and OE-oriented features X^{O} from H.

The feature generation process for subtask SC differs from that of AE/OE. To determine the sentiment polarity of an aspect term, we need to extract relevant semantic information from its context. A key problem in SC is to identify the dependency between an aspect term and its context. Furthermore, in the complete ABSA task, the aspect terms are unknown in SC, so every word in Se needs to be assigned a polarity. Based on these observations, we first encode contextual features X^{ctx} from H.

Then we treat the shared vector hi as the query aspect and use an attention mechanism to compute the semantic relation between the query and the contextual features. Here ds_{i,j}^{(i\neq j)} denotes the dependency strength between the i-th query word and the j-th context word, and M_{i,j}^{ctx} is the normalized attention weight of ds_{i,j}^{(i\neq j)}. We scale this term by a logarithmic factor based on the absolute distance between the two words. The rationale is that adjacent context words should contribute more to the sentiment polarity. Finally, for the aspect query wi, we obtain the global SC-oriented feature X_{i}^{S} as the weighted sum of all contextual features (except that of wi itself).
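The attention step described above can be sketched in plain Python as follows. The dot-product scoring and, in particular, the exact distance factor (`1 / log2(1 + |i - j|)` here) are assumptions chosen for illustration, because the original equations are not preserved in this text; only the overall shape (exclude the query word, down-weight distant words, normalize, take a weighted sum) follows the description.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_attention(H, X_ctx):
    """Return (M_ctx, X_s) for shared vectors H and contextual features X_ctx.

    Both inputs are lists of n vectors of the same dimension, with n >= 2.
    """
    n, dim = len(H), len(X_ctx[0])
    M_ctx, X_s = [], []
    for i in range(n):
        others = [j for j in range(n) if j != i]  # the query word is excluded
        ds = []
        for j in others:
            dot = sum(a * b for a, b in zip(H[i], X_ctx[j]))
            # ASSUMED distance factor: equals 1 for neighbours, grows with |i - j|.
            ds.append(dot / math.log2(1 + abs(i - j)))
        weights = softmax(ds)
        row = [0.0] * n
        for j, w in zip(others, weights):
            row[j] = w
        M_ctx.append(row)
        X_s.append([sum(row[j] * X_ctx[j][k] for j in range(n))
                    for k in range(dim)])
    return M_ctx, X_s


H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
X_ctx = [[0.5, 0.1], [0.2, 0.9], [0.4, 0.4]]
M_ctx, X_s = context_attention(H, X_ctx)
print(M_ctx[0])
```

Each row of `M_ctx` sums to 1 and has a zero on the diagonal, so a word never attends to itself when its own polarity is being predicted.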

Propagating Relation for Collaborative Learning  After encoding task-oriented features, we propagate interaction relations (R1..R4) among subtasks to mutually enhance AE, OE and SC modules.

(1) R1 is the binary relation between AE and OE, indicating that AE and OE may hold informative clues for each other. To model R1, we let the AE-oriented features X^{A} and the OE-oriented features X^{O} exchange useful information according to their semantic relations. Taking subtask AE as an example, the semantic relation between a word in AE and a word in OE is measured by sr_{i,j}^{(i\neq j)} and normalized into the weight M_{i,j}^{O2A}.

For wi in AE, we take the weighted sum of the OE-oriented features of all other words (excluding wi itself), using the semantic relations as weights, to obtain the useful clue X_{i}^{O2A} from OE.

Then we concatenate the original AE-oriented features X^{A} and the useful clues X^{O2A} from OE as the final features for AE, and feed them into a fully-connected layer to predict the tags of the aspect terms.

Here W^{A} \in R^{3\times 2d_{c}} is the transformation matrix, and Y^{A}\in R^{3\times n} is the predicted tag sequence of AE.

For subtask OE, we use the transpose of the matrix sr_{i,j}^{(i\neq j)} in Eq. 5 to compute the corresponding M^{A2O}. In this way, the semantic relation between AE and OE stays consistent regardless of direction. We can then obtain the useful clues X^{A2O} from AE and generate the predicted tag sequence Y^{O}\in R^{3\times n} in a similar way.
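A rough sketch of how R1 could be realized: one relation matrix is computed once, normalized row-wise for the AE direction, and transposed before normalization for the OE direction. The dot-product form of the relation score and the helper names are assumptions made for illustration; the original equations are not preserved in this text.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def masked_row_softmax(S):
    """Softmax over each row of S, skipping the diagonal entry (i == j)."""
    n = len(S)
    out = []
    for i in range(n):
        idx = [j for j in range(n) if j != i]
        m = max(S[i][j] for j in idx)
        exps = {j: math.exp(S[i][j] - m) for j in idx}
        z = sum(exps.values())
        out.append([exps[j] / z if j != i else 0.0 for j in range(n)])
    return out


def weighted_sum(M, X):
    """Row i of the result is sum_j M[i][j] * X[j]."""
    n, dim = len(X), len(X[0])
    return [[sum(M[i][j] * X[j][k] for j in range(n)) for k in range(dim)]
            for i in range(n)]


def exchange_clues(X_a, X_o):
    n = len(X_a)
    # ASSUMED relation score: dot product of AE and OE features.
    sr = [[dot(X_a[i], X_o[j]) for j in range(n)] for i in range(n)]
    sr_t = [[sr[j][i] for j in range(n)] for i in range(n)]  # transpose for OE
    M_o2a = masked_row_softmax(sr)
    M_a2o = masked_row_softmax(sr_t)
    X_o2a = weighted_sum(M_o2a, X_o)  # clues flowing from OE to AE
    X_a2o = weighted_sum(M_a2o, X_a)  # clues flowing from AE to OE
    return M_o2a, M_a2o, X_o2a, X_a2o


X_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
X_o = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
M_o2a, M_a2o, X_o2a, X_a2o = exchange_clues(X_a, X_o)
# Final AE features: concatenation, twice the private feature dimension.
final_ae = [xa + clue for xa, clue in zip(X_a, X_o2a)]
print(len(final_ae[0]))
```

Concatenating the private features with the clues doubles the feature dimension, which matches the 2d_c input width of W^{A} mentioned above.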

In addition, since a word wi cannot be both an aspect term and an opinion term at the same time, we add a hinge loss as a regularizer to constrain Y^{A} and Y^{O}. In this regularizer, P denotes the probability under the given condition.
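One plausible form of such a regularizer, shown here only as a sketch: a word is penalized when its probability of being part of an aspect term plus its probability of being part of an opinion term exceeds a threshold. The threshold of 1.0 and the function name are assumptions, since the exact formula is not preserved in this text.

```python
def overlap_hinge_loss(p_aspect, p_opinion, threshold=1.0):
    """Hinge penalty on words that look like both an aspect and an opinion term.

    p_aspect[i]  : probability that word i belongs to an aspect term
    p_opinion[i] : probability that word i belongs to an opinion term
    threshold    : ASSUMED margin; no penalty while the sum stays below it
    """
    return sum(max(0.0, pa + po - threshold)
               for pa, po in zip(p_aspect, p_opinion))


# Word 0 is claimed by both subtasks and is penalized; word 1 is not.
reg = overlap_hinge_loss([0.9, 0.1], [0.8, 0.2])
print(reg)
```

Because the penalty is zero whenever the two probabilities are compatible, it only influences training on words where AE and OE disagree.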

(2) R2 is the ternary relation between SC and R1. Recall that the dependency between an aspect term and its context is crucial for subtask SC, and that we have already computed this dependency as the normalized attention weights M^{ctx}. Therefore, we can model R2 by propagating R1 to M^{ctx}. We use M^{O2A} as the representative of R1 and add it to M^{ctx} to express the influence of R1 on SC. Formally, M_{i,j}^{ctx} ← M_{i,j}^{ctx} + M_{i,j}^{O2A}.

In fact, M^{O2A} represents the dependency between aspect terms and their context from the perspective of term extraction, while M^{ctx} represents it from the perspective of sentiment classification. The dual-view relation R2 helps refine the selection process in both the extraction subtasks and the classification subtask.

(3) R3 is the binary relation between SC and OE, indicating that the extracted opinion terms should receive more attention when predicting the sentiment polarity. To model R3, similar to the approach for R2, we update M^{ctx} in SC with the tag sequence Y^{O} generated in OE.

By doing so, opinion terms get larger weights in the attention mechanism and therefore contribute more to the prediction of the sentiment polarity.
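The two updates to M^{ctx} can be sketched together. The R2 term follows the description directly (add M^{O2A}). How exactly Y^{O} enters for R3 is not preserved in this text, so the sketch assumes the simplest option: add the probability that word j is an opinion term to column j. Any extra scaling or renormalization the original may apply is left out.

```python
def propagate_to_sc(M_ctx, M_o2a, p_opinion):
    """Return an updated copy of M_ctx after applying R2 and R3.

    M_ctx, M_o2a : n x n attention matrices (lists of lists)
    p_opinion[j] : ASSUMED summary of Y^O, the probability that word j
                   belongs to an opinion term
    """
    n = len(M_ctx)
    updated = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a word still never attends to itself
            r2 = M_o2a[i][j]  # R2: extraction view of the dependency
            r3 = p_opinion[j]  # R3: boost likely opinion words
            updated[i][j] = M_ctx[i][j] + r2 + r3
    return updated


M_ctx = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
M_o2a = [[0.0, 0.2, 0.8], [0.3, 0.0, 0.7], [0.6, 0.4, 0.0]]
p_opinion = [0.0, 0.0, 0.9]  # word 2 is very likely an opinion term
M_new = propagate_to_sc(M_ctx, M_o2a, p_opinion)
print(M_new[0])
```

In this toy input, word 0 starts with equal attention on words 1 and 2; after propagation, the likely opinion word 2 clearly dominates.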

After obtaining the updated M^{ctx}, we can recompute the SC-oriented features X^{S} in Eq. 4 accordingly. Then we concatenate H and X^{S} as the final features for SC and feed them into a fully-connected layer to predict the sentiment polarities of the candidate aspect terms.

Here W^{S} \in R^{3 \times 2d_{h}} is the transformation matrix, and Y^{S}\in R^{3\times n} is the predicted tag sequence of SC.

(4) R4 is the binary relation between SC and AE, indicating that the results of AE help supervise the training of SC. Clearly, only aspect terms have sentiment polarities. Although SC needs to assign a polarity to every word, we know the ground-truth aspect terms in AE during training. Therefore, we directly use the ground-truth tag sequence \hat{Y}^{A} of AE to refine the labeling process. Specifically, only the predicted tags of the true aspect terms are counted during training.

Here I(\hat{Y}_{i}^{A}) equals 1 if wi is an aspect term and 0 otherwise. Note that this method is only used during training.
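In terms of the training loss, R4 amounts to masking: words that are not ground-truth aspect terms contribute nothing to the SC loss. The sketch below assumes a standard cross-entropy per word and uses hypothetical argument names; it illustrates the masking idea rather than the authors' exact formula.

```python
import math


def masked_sc_loss(probs, gold_polarity, is_aspect):
    """Cross-entropy over aspect words only.

    probs[i]         : predicted distribution over polarity classes for word i
    gold_polarity[i] : gold class index (ignored where is_aspect[i] == 0)
    is_aspect[i]     : indicator from the ground-truth AE tags (1 or 0)
    """
    loss = 0.0
    for p, gold, mask in zip(probs, gold_polarity, is_aspect):
        if mask:
            loss -= math.log(p[gold])
    return loss


probs = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.3, 0.3, 0.4]]
gold = [0, 0, 2]
mask = [1, 0, 1]  # the middle word is a background word and is skipped
sc_loss = masked_sc_loss(probs, gold, mask)
print(sc_loss)
```

At test time there are no gold AE tags, which is why the text notes that this masking applies during training only.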

3.4 Stacking RACL to Multiple Layers

When using a single RACL layer, the AE, OE, and SC modules only extract features at a relatively low linguistic level, which may be insufficient evidence for tagging each word. Therefore, we stack RACL to multiple layers to obtain high-level semantic features for the subtasks, which facilitates deep collaborative learning.

Here T ∈ {A, O, S} denotes a specific subtask, and L is the number of layers. This shortcut-like architecture makes the low-level features meaningful and informative, which helps the higher layers make better predictions.
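The combining equation itself is not preserved in this text. Section 5.3 refers to averaged predictions across all layers, so the sketch below assumes the final prediction of each subtask is the plain mean of its per-layer predictions; treat both the averaging and the function name as assumptions.

```python
def average_layer_predictions(per_layer):
    """Average per-layer predictions for one subtask.

    per_layer[l][i][c] is the probability of class c for word i at layer l.
    """
    num_layers = len(per_layer)
    n = len(per_layer[0])
    num_classes = len(per_layer[0][0])
    return [[sum(per_layer[l][i][c] for l in range(num_layers)) / num_layers
             for c in range(num_classes)]
            for i in range(n)]


layer1 = [[0.6, 0.4], [0.1, 0.9]]
layer2 = [[0.2, 0.8], [0.3, 0.7]]
final = average_layer_predictions([layer1, layer2])
print(final)
```

Because every layer's output feeds directly into the final prediction, each layer receives a training signal of its own, which is consistent with the "shortcut-like" description above.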

3.5 Training process

After generating the tag sequences Y^{A}, Y^{O}, and Y^{S} for a sentence Se, we compute the cross-entropy loss of each subtask.

Here T ∈ {A, O, S} denotes a subtask, n is the length of Se, J is the number of label categories, and y_{i}^{T} and \hat{y}_{i}^{T} are the predicted label and the ground-truth label, respectively.

The final loss L of RACL combines the subtask losses with the regularization loss, which is weighted by a coefficient λ. We then train all parameters with backpropagation.
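As a sketch of the objective just described, the code below sums the three cross-entropy losses and adds the regularizer scaled by λ. Equal weighting of the three subtask losses is an assumption; the default `lam=1e-5` is the value reported for RACL-GloVe in Section 4.1.

```python
import math


def cross_entropy(pred, gold):
    """Sum over words of -log(probability assigned to the gold class).

    pred[i][j] : predicted probability of class j for word i
    gold[i][j] : one-hot ground-truth indicator
    """
    return -sum(g * math.log(p)
                for p_row, g_row in zip(pred, gold)
                for p, g in zip(p_row, g_row) if g > 0)


def total_loss(subtask_losses, reg_loss, lam=1e-5):
    """ASSUMED combination: unweighted sum of subtask losses plus lam * reg."""
    return sum(subtask_losses.values()) + lam * reg_loss


ce = cross_entropy([[0.5, 0.5]], [[1, 0]])
loss = total_loss({"A": 1.0, "O": 2.0, "S": 3.0}, reg_loss=10.0, lam=0.1)
print(ce, loss)
```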

4 Experiments

4.1 Datasets and settings

Datasets  We evaluate RACL on three real-world ABSA datasets from SemEval 2014 (Pontiki et al., 2014) and SemEval 2015 (Pontiki et al., 2015), which contain reviews from two domains: restaurants and laptops. The original datasets only have ground-truth labels for aspect terms and their sentiment polarities, while the labels for opinion terms were annotated by two previous works (Wang et al., 2016, 2017). All datasets have a fixed training/test split. We further randomly sample 20% of the training data as the development set for tuning hyperparameters and use only the remaining 80% for training. Table 2 summarizes the statistics of the datasets.

Setup  We examine RACL with two types of word representations: pre-trained word embeddings and a pre-trained language encoder. In the word embedding implementation, we follow previous studies (Xu et al., 2018; He et al., 2019; Luo et al., 2019) and use two types of embeddings, general-purpose and domain-specific. The former are the GloVe vectors trained on 840B tokens (Pennington et al., 2014), and the latter were trained on a large domain-specific corpus with fastText and released by Xu et al. (2018). The two types of embeddings are concatenated to form the word vectors. In the language encoder implementation, we follow Hu et al. (2019) and use BERT-Large (Devlin et al., 2019) as the backbone, fine-tuning it during training. We denote these two implementations as RACL-GloVe and RACL-BERT, respectively.

For RACL-GloVe, we set the dimensions dw=400, dh=400, dc=256, and the coefficient λ=1e-5. The other hyperparameters are tuned on the development set. For the three datasets, the kernel size K of the CNN and the number of layers L are set to {3,3,5} and {4,3,4}, respectively. We train the model for a fixed number of epochs using the Adam optimizer (Kingma and Ba, 2015) with a learning rate of 1e-4 and a batch size of 8. For RACL-BERT, we set dw to 1024 and the learning rate for fine-tuning BERT to 1e-5; the other hyperparameters are directly inherited from RACL-GloVe.

We use four metrics for evaluation: AE-F1, OE-F1, SC-F1, and ABSA-F1. The first three are the F1 scores of the individual subtasks, while the last measures the overall performance on the complete ABSA task. When computing ABSA-F1, the result for an aspect term is considered correct only if both its AE and SC results are correct. The model that achieves the lowest loss on the development set is used for evaluation on the test set.
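The difference between AE-F1 and ABSA-F1 can be illustrated with a small sketch. It assumes exact span matching and represents each prediction as a (span, polarity) pair; the matching criterion and data layout are assumptions for illustration, not the official evaluation script.

```python
def f1(pred, gold):
    """F1 between two collections of hashable items (exact match)."""
    pred, gold = set(pred), set(gold)
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Each item: ((start, end_exclusive), polarity) for one sentence.
gold = [((1, 2), "NEG"), ((9, 10), "POS")]
pred = [((1, 2), "NEG"), ((9, 10), "NEG")]  # second polarity is wrong

ae_f1 = f1([span for span, _ in pred], [span for span, _ in gold])
absa_f1 = f1(pred, gold)  # span AND polarity must both match
print(ae_f1, absa_f1)
```

Both aspect spans are found, so AE-F1 is perfect, but only one of them also carries the right polarity, so ABSA-F1 is halved. On a full test set the items would also need a sentence identifier so that spans from different sentences do not collide.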

Baselines  To demonstrate the effectiveness of RACL on the complete ABSA task, we compare it with the following pipeline and unified baselines. The hyperparameters of the baselines are set to the optimal values reported in their papers.

→ {CMLA, DECNN}+{TNet, TCap}: CMLA (Wang et al., 2017) and DECNN (Xu et al., 2018) are state-of-the-art AE methods, while TNet (Li et al., 2018a) and T(rans)Cap (Chen and Qian, 2019) are the best-performing SC methods. We build four pipeline baselines by combining them.

→ MNN (Wang et al., 2018a): a unified method that uses a collapsed tagging scheme for AE and SC.

→ E2E-TBSA (Li et al., 2019): a unified method that uses a collapsed tagging scheme for AE and SC and introduces an auxiliary OE task without explicit interaction.

→ DOER (Luo et al., 2019): a multi-task unified method that jointly trains AE and SC and explicitly models the relation R4.

→ IMN-D (He et al., 2019): a unified method that jointly trains AE and SC with separate labels. The OE task is fused into AE to form five-class labels. It explicitly models relations R3 and R4.

→ SPAN-BERT (Hu et al., 2019): a pipeline method that uses BERT-Large as the backbone, with a multi-target extractor for AE and a polarity classifier for SC.

→ IMN-BERT: an extension of the best unified baseline IMN-D with BERT-Large, included to provide a convincing comparison with BERT-style methods. The input dimension and learning rate of IMN-BERT are the same as those of RACL-BERT, and the other hyperparameters are inherited from IMN-D.

4.2 Comparison results

The comparison results of all methods are shown in Table 3. The methods fall into three groups: M1~M4 are GloVe-based pipeline methods, M5~M9 are GloVe-based unified methods, and M10~M12 are BERT-based methods.

First, among all GloVe-based methods (M1~M9), RACL-GloVe consistently outperforms all baselines in terms of the overall metric ABSA-F1, achieving absolute gains of 2.12%, 2.92%, and 2.40% over the strongest baseline on the three datasets, respectively. The results show that jointly training the subtasks and comprehensively modeling the interactive relations are key to improving the overall performance on the ABSA task. Moreover, RACL-GloVe also achieves the best or second-best results on all subtasks, which further shows that collaborative learning can enhance the learning process of each subtask. We also find from M1~M9 that the unified methods (M5~M9) perform better than the pipeline methods (M1~M4).

Second, among the GloVe-based unified methods, RACL-GloVe, IMN-D, and DOER perform better than MNN and E2E-TBSA overall. This may be because the first three methods explicitly model interactive relations among subtasks while the latter two do not. We notice that the SC-F1 score of DOER is very low. The reason may be that it uses an auxiliary sentiment lexicon to enhance words with "positive" and "negative" sentiment. DOER therefore has difficulty handling words with "neutral" sentiment, which results in a lower SC-F1 score.

Third, the BERT-based methods (M10~M12) exploit the large-scale external knowledge encoded in the pre-trained BERT-Large backbone and achieve better performance than the GloVe-based methods. Specifically, SPAN-BERT reduces the search space with its multi-target extractor and is a strong baseline on the AE subtask. However, since there is no interaction between subtasks, it cannot capture the dependency between the aspect terms extracted in AE and the opinion terms in SC, so its performance on SC drops considerably. IMN-BERT scores higher on OE and SC, but without the guidance of relations R1 and R2 it performs the worst of the three on AE. In contrast, RACL-BERT achieves significantly higher overall scores than SPAN-BERT and IMN-BERT on all three datasets. This again shows the superiority of our RACL framework, which accomplishes the ABSA task by exploiting all interactive relations.

5 Analysis

5.1 Ablation Study

To study the effect of the different relations on RACL-GloVe and RACL-BERT, we conduct the following ablation study. We remove each interactive relation in turn, which yields four simplified variants.

As expected, all simplified variants in Table 4 show a drop in ABSA-F1. The results clearly demonstrate the effectiveness of the proposed relations. Furthermore, we find that the relations play a more important role on small datasets than on large ones. The reason may be that it is hard to train a complex model on a small dataset, and the relations allow a subtask to absorb external knowledge from the other subtasks.

5.2 Effect of hyperparameters

There are two key hyperparameters in our model: the kernel size K of the CNN encoder and the number of layers L. To study their impact, we first vary K from 1 to 9 with a step of 2 while fixing L at the value given in Section 4.1, and then vary L from 1 to 7 with a step of 1 while fixing K.

We only show the ABSA-F1 results of RACL-GloVe in Figure 3, because the hyperparameters of RACL-BERT are inherited from RACL-GloVe.

In Fig. 3(a), K=1 yields very poor performance because the original features are only generated from the current word. Increasing K to 3 or 5 can widen the receptive field and significantly improve performance. However, when K is further increased to 7 or 9, many irrelevant words are added as noise, degrading the performance. In Fig. 3(b), increasing L can expand the learning ability to a certain extent and achieve high performance. However, too many layers introduce too many parameters, making the learning process too complicated.

5.3 Case studies

This section analyzes in detail the results of several examples predicted by different methods. We choose CMLA+TCap (denoted as PIPELINE), IMN-D, and RACL-GloVe as the three competitors. We do not include BERT-based methods because we want to investigate the capability of the models without external resources.

S1 and S2 verify the effectiveness of relation R1. In S1, both baselines incorrectly extract "offers" as an opinion term, alongside "easy", because of the conjunction "and". In contrast, RACL-GloVe successfully filters out "offers" in OE by using R1. The reason is that "offers" never co-occurs as an opinion term with the aspect term "OS" in the training set, so R1, which connects the AE and OE subtasks, treats them as irrelevant terms. This information is passed to the OE subtask during testing. Similarly, in S2, both baselines fail to recognize "looking" as an aspect term, since it could be the present participle of "look" and carry no opinion information. In contrast, RACL-GloVe tags it correctly because R1 provides useful clues from the opinion terms "faster" and "sleeker".

S3 shows the superiority of relation R2, which is the key to connecting the three subtasks but has never been used in previous studies. Both baselines successfully extract "Dessert" and "to die for" in AE and OE, but they assign the incorrect sentiment polarity "neutral", even though IMN-D emphasizes opinion terms. The reason is that these two terms never co-occur in the training samples, so it is hard for SC to recognize their dependency. In contrast, since "Dessert" and "die for" are typical words in AE and OE, RACL-GloVe is able to encode their dependency in R1. By propagating R1 to SC through R2, RACL-GloVe assigns the correct polarity to "Dessert". For a closer look, we visualize the averaged predictions (left) and attention weights (right) over all layers in Figure 4. Clearly, in M^{ctx-before}, "Dessert" does not originally attend to "die for". After being enhanced by M^{O2A} from OE, M^{ctx-after} successfully highlights the opinion words, and SC makes the correct prediction.

S4 shows the benefit of relation R3. IMN-D and RACL-GloVe assign the correct polarity to "Sushi" in SC because both are guided by "fresh" from OE, while PIPELINE gets lost in the context without the help of opinion terms and makes a wrong prediction. Note that S1~S4 also demonstrate the necessity of R4, since RACL-GloVe is not affected by background words and makes correct sentiment predictions in all examples.

5.4 Computational Cost Analysis

To demonstrate that our RACL model does not incur high computational costs, we compare it with two strong baselines DOER and IMN-D in terms of number of parameters and running time. We run three models on a 1080Ti GPU using the same Restaurant 2014 dataset with a batch size of 8, and the results are shown in Table 6. Obviously, our proposed RACL has similar computational complexity to IMN-D, and they are both much simpler than DOER.

6 Conclusion

In this paper, we highlight the importance of interactive relations in the complete ABSA task. To exploit these relations, we propose a Relation-Aware Collaborative Learning (RACL) framework with multi-task learning and relation propagation techniques. Experiments on three real-world datasets show that the RACL framework and its two implementations outperform the state-of-the-art pipeline and unified baselines on the complete ABSA task.

7 Acknowledgments

Origin blog.csdn.net/Starinfo/article/details/129930778