Grammar as a Foreign Language

Syntactic constituency parsing is a fundamental problem in natural language processing and has been the subject of intensive research and engineering for decades. As a result, the most accurate parsers are domain-specific, complex, and inefficient. In this paper we show that a domain-agnostic attention-enhanced sequence-to-sequence model achieves state-of-the-art results on the most widely used syntactic constituency parsing dataset when trained on a large synthetic corpus that was annotated using existing parsers. It also matches the performance of standard parsers when trained only on a small human-annotated dataset, which shows that this model is highly data-efficient, in contrast to sequence-to-sequence models without the attention mechanism. Our parser is also fast, processing over a hundred sentences per second with an unoptimized CPU implementation.
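The key idea behind treating "grammar as a foreign language" is to serialize each constituency tree into a flat bracketed token sequence that a sequence-to-sequence model can emit one token at a time. The sketch below is illustrative, not the authors' code: it uses a hypothetical nested-tuple tree format and a depth-first linearization that emits opening and closing bracket tokens for each constituent while skipping the words themselves (the model predicts structure aligned to the input sentence).

```python
# Illustrative sketch (not the paper's implementation): linearizing a
# constituency tree into a bracketed token sequence for a seq2seq target.
# A tree is a hypothetical nested tuple (label, child, child, ...);
# leaf words are plain strings.

def linearize(tree):
    """Depth-first linearization: '(LABEL' ... ')LABEL' around each
    constituent's children, skipping leaf words (structure only)."""
    if isinstance(tree, str):      # a leaf word carries no structure
        return []
    label, *children = tree
    tokens = [f"({label}"]
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(f"){label}")
    return tokens

tree = ("S", ("NP", "John"), ("VP", "has", ("NP", "a", "dog")))
print(" ".join(linearize(tree)))
# (S (NP )NP (VP (NP )NP )VP )S
```

Decoding reverses the process: the predicted bracket sequence is matched back against the input words to reconstruct the tree, with malformed bracketings rejected or repaired.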

NeurIPS 2015

Datasets


Task:         Constituency Parsing
Dataset:      Penn Treebank
Model:        Semi-supervised LSTM
Metric:       F1 score
Value:        92.1
Global Rank:  #23
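The F1 score reported above is a bracketing F1: the harmonic mean of precision and recall over the labeled constituent spans of the predicted tree versus the gold tree. The function below is a simplified, EVALB-style sketch (a minimal illustration, not the official scorer, which additionally handles duplicate spans and label-equivalence parameters); spans are hypothetical (label, start, end) triples.

```python
# Simplified sketch of labeled bracketing F1 (EVALB-style, minus the
# scorer's duplicate-span and parameter-file handling).

def bracket_f1(gold_spans, pred_spans):
    """F1 over labeled constituent spans given as (label, start, end)."""
    gold, pred = set(gold_spans), set(pred_spans)
    matched = len(gold & pred)          # spans present in both trees
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = {("S", 0, 5), ("NP", 0, 1), ("VP", 1, 5), ("NP", 3, 5)}
pred = {("S", 0, 5), ("NP", 0, 1), ("VP", 1, 5), ("NP", 2, 5)}
print(round(bracket_f1(gold, pred), 3))
# 0.75
```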

Methods


No methods listed for this paper.