Constituency Parsing with a Self-Attentive Encoder

ACL 2018 · Nikita Kitaev, Dan Klein

We demonstrate that replacing an LSTM encoder with a self-attentive architecture can lead to improvements to a state-of-the-art discriminative constituency parser. The use of attention makes explicit the manner in which information is propagated between different locations in the sentence, which we use to both analyze our model and propose potential improvements. For example, we find that separating positional and content information in the encoder can lead to improved parsing accuracy. Additionally, we evaluate different approaches for lexical representation. Our parser achieves new state-of-the-art results for single models trained on the Penn Treebank: 93.55 F1 without the use of any external data, and 95.13 F1 when using pre-trained word representations. Our parser also outperforms the previous best-published accuracy figures on 8 of the 9 languages in the SPMRL dataset.
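One way to picture "separating positional and content information" is an attention head whose query, key, and value projections never mix the two kinds of input, so that each attention score is a sum of a content-content term and a position-position term. The following is a minimal sketch under that reading, not the authors' implementation: it assumes a single head, no masking, random weights, and plain Python lists in place of tensors. All function names, dictionary keys, and dimensions are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random matrix, used only to make the sketch runnable."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def factored_attention_head(content, position, w):
    """One attention head with separate content and position projections.

    content:  T x d_c matrix of content (word) vectors
    position: T x d_p matrix of position vectors
    w:        dict of projection matrices (hypothetical names)
    Returns (output, weights): output is T x (2 * d_h), weights is T x T.
    """
    q_c, k_c, v_c = (matmul(content, w[n]) for n in ("q_c", "k_c", "v_c"))
    q_p, k_p, v_p = (matmul(position, w[n]) for n in ("q_p", "k_p", "v_p"))

    # Each score is a content term plus a position term; there is no
    # content-position cross term because the projections are separate.
    content_scores = matmul(q_c, transpose(k_c))
    position_scores = matmul(q_p, transpose(k_p))
    scale = math.sqrt(len(q_c[0]) + len(q_p[0]))
    weights = [
        softmax([(c + p) / scale for c, p in zip(c_row, p_row)])
        for c_row, p_row in zip(content_scores, position_scores)
    ]

    # Values stay separated too: the output row is the content half
    # followed by the position half.
    out_c = matmul(weights, v_c)
    out_p = matmul(weights, v_p)
    output = [c_row + p_row for c_row, p_row in zip(out_c, out_p)]
    return output, weights


rng = random.Random(0)
T, d_c, d_p, d_h = 4, 6, 6, 3  # illustrative sizes only
content = random_matrix(T, d_c, rng)
position = random_matrix(T, d_p, rng)
w = {
    name: random_matrix(d_c if name.endswith("_c") else d_p, d_h, rng)
    for name in ("q_c", "k_c", "v_c", "q_p", "k_p", "v_p")
}
output, weights = factored_attention_head(content, position, w)
```

Because the attention score decomposes into two additive terms, the content and position contributions to any attention weight can be inspected separately, which is the kind of analysis the abstract alludes to.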


Results from the Paper

Task                  Dataset        Model                          Metric    Value  Global Rank
Constituency Parsing  CTB5           Kitaev et al. 2018             F1 score  87.43  #8
Constituency Parsing  Penn Treebank  Self-attentive encoder + ELMo  F1 score  95.13  #13
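
The F1 scores above are labeled bracket F1, the harmonic mean of precision and recall over labeled constituent spans. The sketch below is a simplified illustration of that computation, assuming each tree has already been flattened into (label, start, end) tuples. It omits the normalization steps a full scorer such as EVALB applies (for example punctuation handling), and the function name and example spans are hypothetical.

```python
from collections import Counter


def bracket_f1(gold, predicted):
    """Labeled bracket F1 over (label, start, end) spans.

    Spans are counted as multisets, so a span repeated in one tree
    must be repeated in the other to count as matched twice.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    matched = sum((gold_counts & pred_counts).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(pred_counts.values())
    recall = matched / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)


# Toy example: the prediction mislabels one of four constituents.
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
predicted = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
score = bracket_f1(gold, predicted)
```

Here three of four spans match in both directions, so precision, recall, and F1 are all 0.75; the table reports the same quantity scaled to a percentage.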