One SPRING to Rule Them Both: Symmetric AMR Semantic Parsing and Generation without a Complex Pipeline

Paper: https://ojs.aaai.org/index.php/AAAI/article/view/17489

Code: https://github.com/SapienzaNLP/spring

Journal/Conference: AAAI 2021

Summary

In Text-to-AMR parsing, current state-of-the-art semantic parsers integrate tedious pipelines of several different modules or components and rely on graph recategorization, a set of content-specific heuristics developed on the basis of the training set, whose generalizability to out-of-distribution data is unclear. In contrast, state-of-the-art AMR-to-Text generation, which can be seen as the inverse of parsing, is based on simpler seq2seq models. In this paper, we cast Text-to-AMR and AMR-to-Text as symmetric translation tasks and show that, with a careful graph linearization and by extending a pretrained encoder-decoder model, one and the same seq2seq approach, namely SPRING (Symmetric PaRsIng aNd Generation), achieves state-of-the-art performance in both tasks. Our model requires neither complex pipelines nor heuristics built on heavy assumptions. The fact that we drop the need for graph recategorization shows that this technique is actually harmful outside of standard benchmarks. Finally, we substantially outperform the previous state of the art on the English AMR 2.0 dataset: on Text-to-AMR we achieve an improvement of 3.6 Smatch points, while on AMR-to-Text we outperform the state of the art by 11.2 BLEU points.

1 Introduction

Recent state-of-the-art Text-to-AMR semantic parsers rely on very complex pre- and post-processing pipelines that integrate the outputs of several different components. Furthermore, they employ fine-grained, content-specific heuristics developed on the basis of the training set, which can make these methods fragile across domains and genres. To date, simpler, fully sequence-to-sequence (seq2seq) methods have lagged behind in parsing performance, mainly because they are not as data-efficient as the alternatives.

When it comes to AMR-to-Text generation, which can be viewed as the reverse of Text-to-AMR parsing, seq2seq methods instead achieve state-of-the-art results. This architectural asymmetry is not observed in other bidirectional translation settings, such as machine translation, where the same architecture handles translation from language X to language Y and vice versa. Inspired by this, a key goal of this paper is to achieve symmetry between AMR parsing and generation by providing a single architecture for both. Furthermore, we reduce the complexity of the Text-to-AMR architecture by eliminating the need for content-modifying pipelines and additional syntactic and semantic features, which typically rely on external components and data-specific heuristics. We achieve this by linearizing the AMR graph efficiently and by extending a pretrained seq2seq model, BART (Lewis et al. 2020), to handle both AMR-to-Text and Text-to-AMR. In fact, the only external component that consistently benefits our model is an off-the-shelf entity linker, since entity linking is a task that is difficult to perform robustly with pure seq2seq models.

Our contributions are summarized as follows:

  1. We extend a pretrained Transformer encoder-decoder architecture to predict exact linearizations of AMR graphs from sentences and, vice versa, to generate sentences from linearized AMR graphs.
  2. Contrary to previous reports (Konstas et al. 2017), we find that the choice among competing graph-isomorphic linearizations does matter: our proposed depth-first search (DFS)-based linearization with special pointer tokens outperforms PENMAN and comparable breadth-first search (BFS)-based alternatives, especially on AMR-to-Text.
  3. We propose a novel Out-of-Distribution (OOD) setting to estimate the ability of Text-to-AMR and AMR-to-Text methods to generalize to open-world data.
  4. We show that graph recategorization should be avoided on open-world data because, although it slightly improves performance on standard benchmarks, it fails to generalize in the OOD setting.
  5. We outperform the previously reported best results on AMR 2.0 by 11.2 BLEU points on the generation task and by 3.6 Smatch points on the parsing task.

2. Related work

2.1 Text-to-AMR parsing

Pure seq2seq: Seq2seq approaches model Text-to-AMR parsing as the transduction of a sentence into a linearized AMR graph. Thanks to its end-to-end nature, this approach is attractive for the task. However, because seq2seq methods require large amounts of data and the number of labeled sentence-AMR pairs is relatively small, their AMR parsing performance has so far been rather unsatisfactory. Various techniques have been employed to overcome data sparsity: self-training on unlabeled English text (Konstas et al. 2017), character-level networks (van Noord and Bos 2017), and concept recategorization as a preprocessing step to reduce open-class lexical items such as named entities and dates (Peng et al. 2017; van Noord and Bos 2017; Konstas et al. 2017). Furthermore, seq2seq-based models often incorporate features such as lemmas, POS or named entity recognition (NER) tags, as well as syntactic and semantic structures (Ge et al. 2019).

To counteract sparsity, we instead use transfer learning, exploiting BART (Lewis et al. 2020), a recently released pretrained encoder-decoder, to generate the linearized graph autoregressively with the seq2seq decoder in a single left-to-right pass. In fact, BART's underlying Transformer encoder-decoder is similar to that of Ge et al. (2019), who, however, train their AMR parsing architecture from scratch.

Hybrid approaches: State-of-the-art results on Text-to-AMR have been obtained with more complex multi-module architectures. These methods combine seq2seq components with graph-based algorithms in two-stage (Zhang et al. 2019a) or incremental one-stage (Zhang et al. 2019b; Cai and Lam 2020a) procedures. Furthermore, they integrate processing pipelines and additional features similar to those of the aforementioned seq2seq methods (Konstas et al. 2017), including fine-grained graph recategorization (Zhang et al. 2019a,b; Zhou et al. 2020; Cai and Lam 2020a), all of which contribute significantly to the reported performance.

In contrast, our model relies almost entirely on seq2seq, requires no additional features, and uses only a basic post-processing pipeline to ensure graph validity. Nonetheless, it significantly outperforms previous state-of-the-art methods. Furthermore, we show that extensive recategorization, while improving performance on benchmarks in traditional domains, is detrimental in the OOD setting. Finally, while other methods have employed pretrained encoders such as BERT (Devlin et al. 2019) to provide powerful features for their parsing architectures (Zhang et al. 2019a,b; Cai and Lam 2020a), we are the first to show that a pretrained decoder also benefits AMR parsing, even though the pre-training involves only English and no formal representations.

2.2 AMR-to-Text Generation

There are currently two main approaches to AMR-to-Text generation: explicitly encoding the graph structure with graph-to-text models (Song et al. 2018; Beck, Haffari, and Cohn 2018; Damonte and Cohen 2019; Zhu et al. 2019; Cai and Lam 2020b; Yao, Wang, and Wan 2020), or linearizing the AMR graph and treating generation as a pure seq2seq task (Konstas et al. 2017; Mager et al. 2020). Recent graph-based methods rely on Transformers to encode AMR graphs (Zhu et al. 2019; Cai and Lam 2020b; Wang, Wan, and Yao 2020; Song et al. 2020; Yao, Wang, and Wan 2020). The model of Mager et al. (2020) is a pretrained decoder-only Transformer fine-tuned on sequential representations of AMR graphs. Instead, we use an encoder-decoder architecture, which is better suited to conditional generation, and we treat AMR-to-Text symmetrically to Text-to-AMR, thus removing the need for a task-specific model.

2.3 Linearization Information Loss

Previous seq2seq Text-to-AMR parsing methods (Konstas et al. 2017; van Noord and Bos 2017; Peng et al. 2017; Ge et al. 2019) have been paired with lossy linearization techniques that, in order to reduce complexity, remove variables and other information from the graph. This information then has to be restored heuristically, which makes it harder to guarantee valid outputs. Instead, we propose two linearization techniques that are fully isomorphic to the graph and thus incur no information loss.

2.4 BART

BART is a Transformer-based encoder-decoder model pre-trained on the self-supervised task of denoising, i.e., reconstructing English text that has been modified with shuffling, sentence permutation, masking, and other kinds of corruption (Lewis et al. 2020). BART yields significant improvements on conditional generation tasks in which the vocabularies of the input and output sequences largely intersect, such as question answering and summarization. Likewise, a large proportion of AMR labels are drawn from the English vocabulary, even though AMR aims to abstract away from the sentence; we therefore hypothesize that BART's denoising pre-training should also transfer to AMR-to-Text and Text-to-AMR. Furthermore, there is a clear similarity between BART's pre-training task and AMR-to-Text generation, since a linearized AMR graph can be viewed as a reordered, partially corrupted version of an English sentence that the model has to reconstruct.

3. Method

We perform Text-to-AMR parsing and AMR-to-Text generation with the same architecture, SPRING, which leverages BART's transfer-learning capabilities for both tasks. In SPRING, AMR graphs are handled symmetrically: for Text-to-AMR parsing, an encoder-decoder is trained to predict the graph given a sentence; for AMR-to-Text generation, a mirror encoder-decoder is trained to predict a sentence given a graph.

To use graphs in a seq2seq model, we convert them into sequences of symbols using several linearization techniques (Section 3.1). Furthermore, we modify the BART vocabulary to adapt it to AMR concepts, frames and relations (Section 3.2). Finally, we define lightweight heuristics that do not modify content, to handle the fact that a seq2seq model can output strings that cannot be decoded into a graph (Section 3.3).
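To make the symmetry concrete, here is a minimal sketch, not the authors' released code, of how a single pretrained BART checkpoint can be fine-tuned in either direction with the HuggingFace transformers library. The checkpoint name, the `seq2seq_loss` helper, and the toy linearized-graph string are illustrative assumptions: for Text-to-AMR the targets are linearized graphs, for AMR-to-Text the sources are.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

def seq2seq_loss(source_texts, target_texts):
    """One training step of the same model; only the direction of the data changes:
    Text-to-AMR uses (sentence, linearized graph) pairs, AMR-to-Text the reverse."""
    batch = tokenizer(source_texts, return_tensors="pt", padding=True, truncation=True)
    labels = tokenizer(target_texts, return_tensors="pt", padding=True, truncation=True).input_ids
    labels[labels == tokenizer.pad_token_id] = -100   # do not compute loss on padding
    return model(**batch, labels=labels).loss          # standard cross-entropy

# Text-to-AMR direction (toy example; the graph string is a truncated toy linearization)
loss = seq2seq_loss(["You told me to wash the dog"],
                    ["( <R0> tell-01 :ARG0 ( <R1> you ) ... )"])
loss.backward()
```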

3.1 Graph linearization

In this work, we use linearization techniques that are fully graph-isomorphic, i.e., a graph can be encoded as a sequence of symbols and then decoded back into a graph without any loss of adjacency information. We propose to use special tokens <R0>, <R1>, ..., <Rn> to represent variables in the linearized graph and to handle co-referring nodes. As with variable names in PENMAN, the encoding used in the AMR release files, whenever such a special token occurs more than once it signals that the corresponding node plays multiple roles in the graph. With this modification we aim to avoid the confusion that arises when using seq2seq with PENMAN, whose variable names carry no semantic meaning and do not allow a clear distinction between constants and variables. Our special-token scheme is combined with two graph traversal strategies, based on DFS and BFS respectively; moreover, we also experiment with PENMAN itself. Figure 1 of the paper shows the linearizations of the AMR graph for "You told me to wash the dog".

DFS-based: DFS, on which PENMAN is itself based, is attractive because it is closely related to the way natural language syntax trees are linearized: consider, for example, the sentence "the dog which ate the bone which my father found is sleeping", where the noun dog is far removed from its head verb sleeping because the subordinates of dog are fully "explored" before the head verb is reached. We therefore use a DFS-based linearization with special tokens to indicate variables and parentheses to mark the depth of the visit. In addition, we drop the redundant slash token (/). These choices significantly reduce the length of the output sequence compared to PENMAN, whose variable names are often split into multiple sub-tokens by the subword tokenizer. This matters for efficient seq2seq decoding with Transformers, which are limited by the quadratic complexity of the attention mechanism.
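As an illustration, the toy sketch below (not the authors' code) implements a DFS-based linearization with pointer tokens over a minimal graph representation that we assume here for simplicity; the pointer numbering, spacing, and edge order are illustrative and may differ from the paper's Figure 1.

```python
def dfs_linearize(root, instances, edges):
    """DFS linearization with <Rn> pointer tokens, parentheses for depth,
    and no '/' slash token (a toy sketch of the idea described above)."""
    pointers, tokens, visited = {}, [], set()

    def pointer(var):
        if var not in pointers:
            pointers[var] = f"<R{len(pointers)}>"
        return pointers[var]

    def visit(var):
        visited.add(var)
        tokens.extend(["(", pointer(var), instances[var]])
        for role, target in edges.get(var, []):
            tokens.append(role)
            if target in instances and target not in visited:
                visit(target)                   # expand the subgraph depth-first
            elif target in instances:
                tokens.append(pointer(target))  # re-entrant node: pointer only
            else:
                tokens.append(str(target))      # constant value
        tokens.append(")")

    visit(root)
    return " ".join(tokens)

# "You told me to wash the dog"; PENMAN (approx.):
# (t / tell-01 :ARG0 (y / you) :ARG1 (w / wash-01 :ARG0 i :ARG1 (d / dog)) :ARG2 (i / i))
instances = {"t": "tell-01", "y": "you", "w": "wash-01", "i": "i", "d": "dog"}
edges = {"t": [(":ARG0", "y"), (":ARG1", "w"), (":ARG2", "i")],
         "w": [(":ARG0", "i"), (":ARG1", "d")]}
print(dfs_linearize("t", instances, edges))
# ( <R0> tell-01 :ARG0 ( <R1> you ) :ARG1 ( <R2> wash-01 :ARG0 ( <R3> i ) :ARG1 ( <R4> dog ) ) :ARG2 <R3> )
```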

BFS-based: A BFS traversal is appealing because it enforces a locality principle by which things that belong together are close to each other in the flat representation. Furthermore, Cai and Lam (2019) argue that BFS is cognitively attractive because it corresponds to a core-semantic principle whereby the most meaningful parts are represented in the upper layers of the graph. To this end, we propose a BFS-based linearization that, like our DFS-based one, uses special tokens to represent co-references. We apply the BFS graph traversal algorithm, starting from the graph root r and visiting all children w connected to r through an edge e, appending to the linearization the pointer token of r, then e, and then either the pointer token of w, if w is a variable, or its value, if w is a constant. When a pointer token is appended for the first time, we also append its :instance attribute. At the end of each level of the iteration, i.e., after all children w have been visited, we append a special <stop> token that signals the end of the node's exploration. In Figure 1, the visit starts at tell-01, iterates over its children, and then continues after the <stop> with wash-01.
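The following toy sketch (again an illustration rather than the released implementation) realizes the BFS traversal just described, using the same assumed graph representation as the DFS sketch; surface details, such as whether the parent pointer is repeated before every edge, follow our reading of the description and may differ from the paper's Figure 1.

```python
from collections import deque

def bfs_linearize(root, instances, edges):
    """BFS linearization: parent pointer, edge, then child pointer/value, with
    :instance attached on a pointer's first occurrence and a <stop> token
    closing each node's exploration (a toy sketch)."""
    pointers, tokens = {}, []

    def emit_pointer(var):
        first = var not in pointers
        if first:
            pointers[var] = f"<R{len(pointers)}>"
        tokens.append(pointers[var])
        if first:
            tokens.append(instances[var])   # :instance on first mention
        return first

    queue = deque([root])
    while queue:
        node = queue.popleft()
        for role, target in edges.get(node, []):
            emit_pointer(node)              # pointer token of the parent
            tokens.append(role)             # the edge
            if target in instances:         # child is a variable
                if emit_pointer(target) and edges.get(target):
                    queue.append(target)    # explore its children later
            else:
                tokens.append(str(target))  # child is a constant
        tokens.append("<stop>")             # end of this node's exploration
    return " ".join(tokens)

# Same toy graph as in the DFS sketch above
instances = {"t": "tell-01", "y": "you", "w": "wash-01", "i": "i", "d": "dog"}
edges = {"t": [(":ARG0", "y"), (":ARG1", "w"), (":ARG2", "i")],
         "w": [(":ARG0", "i"), (":ARG1", "d")]}
print(bfs_linearize("t", instances, edges))
# <R0> tell-01 :ARG0 <R1> you <R0> :ARG1 <R2> wash-01 <R0> :ARG2 <R3> i <stop>
# <R2> :ARG0 <R3> <R2> :ARG1 <R4> dog <stop>
```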

Edge ordering: All the above linearizations decode to the same graph. However, the PENMAN-linearized gold annotations induce an edge ordering for each AMR graph, and it has been suggested (Konstas et al. 2017) that annotators exploit this to encode information about the order of the arguments in the source sentence. Our preliminary experiments confirm that imposing an edge ordering different from PENMAN's has a strong negative impact on the AMR-to-Text evaluation measures, which are sensitive to order. To control for this, we design our linearizations so that they preserve the ordering information.

3.2 Vocabulary

BART uses a subword vocabulary whose tokenization is optimized for English, but it is not well suited to AMR tokens. To address this, we extend BART's tokenizer vocabulary by adding i) all relations and frames occurring at least 5 times in the training corpus; ii) components of AMR tokens, such as :op; and iii) the special tokens required by the different graph linearizations. Furthermore, we adjust the encoder and decoder embedding matrices to include the new symbols, adding for each a vector initialized as the average of the embeddings of its subword components. Expanding the vocabulary with AMR-specific symbols avoids extensive subword splitting, allowing AMRs to be encoded as more compact sequences of symbols and reducing the space and time requirements of decoding.
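A hedged sketch of how such a vocabulary extension could be done with the HuggingFace transformers API; the token list is a tiny illustrative subset, the checkpoint name is an assumption, and the authors' exact initialization code may differ.

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

# Illustrative subset of AMR symbols: relations, frames, token components, pointers
new_tokens = [":ARG0", ":ARG1", ":op1", "tell-01", "wash-01", "<R0>", "<R1>", "<stop>"]

# How each new symbol decomposes under the *original* subword vocabulary
pieces = {t: tokenizer(t, add_special_tokens=False).input_ids for t in new_tokens}

tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))

embeddings = model.get_input_embeddings().weight   # shared with the decoder in BART
with torch.no_grad():
    for token, piece_ids in pieces.items():
        if piece_ids:                               # average of the subword embeddings
            new_id = tokenizer.convert_tokens_to_ids(token)
            embeddings[new_id] = embeddings[piece_ids].mean(dim=0)
```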

Recategorization. Recategorization is a popular technique for reducing vocabulary size in order to cope with data sparsity. It simplifies the graph by removing senses, wiki links and polarity attributes, and/or by anonymizing named entities. To evaluate the contribution of recategorization, we experiment with the approach commonly used in the AMR parsing literature (Zhang et al. 2019a,b; Zhou et al. 2020; Cai and Lam 2020a), which is based on string-matching heuristics and mappings tailored to the training data, which also govern the reverse, recovery process at inference time. We refer the reader to Zhang et al. (2019a) for further details. Note that, following common practice, we only use recategorization in parsing, since in generation the resulting information loss would be much higher.

3.3 Post-processing

In our approach, we perform only light post-processing, mainly to ensure the validity of the graphs produced in parsing. To this end, for PENMAN and DFS we restore parenthesis parity and remove any token that cannot follow the preceding one; for BFS, we recover a valid set of triples between each pair of consecutive <stop> tokens. This removal is limited to a few tokens, usually duplicates. We note that unrecoverable graphs are very rare, below roughly 0.02% even on out-of-distribution data, with negligible impact on overall performance. Furthermore, we integrate an external entity linker to handle wikification, since its edge cases are hard to handle with pure seq2seq. We use simple string matching to locate in the input sentence the mention corresponding to each :wiki attribute that SPRING predicts in the graph, then run the off-the-shelf BLINK entity linker (Wu et al. 2020) on it and overwrite the prediction.
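As an illustration of the bracket-balancing step, here is a minimal sketch (our own assumption about one possible implementation, not the authors' code) that restores parenthesis parity in a DFS/PENMAN-style token sequence.

```python
def restore_parenthesis_parity(tokens):
    """Drop unmatched closing brackets and append any missing closing brackets."""
    fixed, depth = [], 0
    for tok in tokens:
        if tok == ")":
            if depth == 0:
                continue            # unmatched ')' -> drop it
            depth -= 1
        elif tok == "(":
            depth += 1
        fixed.append(tok)
    fixed.extend([")"] * depth)     # close whatever is still open
    return fixed

broken = ["(", "<R0>", "tell-01", ":ARG0", "(", "<R1>", "you", ")", ")", ")"]
print(" ".join(restore_parenthesis_parity(broken)))
# ( <R0> tell-01 :ARG0 ( <R1> you ) )
```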

4. Experiment

Datasets:

  • In-distribution: AMR 2.0 (LDC2017T10), AMR 3.0 (LDC2020T02).
  • Out-of-distribution: New3, TLP, Bio.
  • Silver data.

Models: SPRING, i.e., fine-tuned BART with the DFS-, BFS- and PENMAN-based linearizations described in Section 3.1, with and without recategorization and silver data.

Experimental results: on AMR 2.0, SPRING reaches 83.8 Smatch in parsing and 45.3 BLEU in generation, outperforming the previous state of the art by 3.6 Smatch and 11.2 BLEU points, respectively; in the OOD setting, recategorization and silver data do not bring the same benefits (see Section 5).

5. Summary

In this paper, we propose a simple, symmetric approach that achieves state-of-the-art Text-to-AMR parsing and AMR-to-Text generation with a single seq2seq architecture. To achieve this, we extend a Transformer encoder-decoder model pre-trained on English text denoising so that it can handle AMR. Furthermore, we propose a new DFS-based linearization of AMR graphs that incurs no information loss while being more compact than its alternatives. Most importantly, we drop most of the requirements of competing methods: cumbersome pipelines, heavy heuristics (often tailored to the training data), and most external components. Despite this reduced complexity, we substantially outperform the previous state of the art in both parsing and generation, achieving 83.8 Smatch and 45.3 BLEU, respectively. We also propose an out-of-distribution setting that enables evaluation on genres and domains different from those of the training set. Thanks to this setting, we are able to show that recategorization and the integration of silver data, two popular techniques for improving performance, actually hurt parsing and generation performance out of distribution; simpler methods like ours, which rest on lighter assumptions, generalize more robustly. Here we have shown the model's generalizability across data distributions and domains, leaving extensions across languages (Blloshmi, Tripodi, and Navigli 2020) and formalisms (Navigli 2018) for future work. Finally, we invite the community to use the OOD evaluation to develop more robust automatic AMR methods, and we believe our contribution opens further directions for the integration of parsing and generation.
