Fast and Accurate Neural CRF Constituency Parsing

IJCAI 2020 · Yu Zhang, Houquan Zhou, Zhenghua Li

Estimating probability distributions is one of the core issues in NLP. However, in both the deep learning (DL) and pre-DL eras, and in contrast to the wide use of linear-chain CRFs in sequence labeling, very few works have applied tree-structured CRFs to constituency parsing, mainly because of the complexity and inefficiency of the inside-outside algorithm. This work presents a fast and accurate neural CRF constituency parser. The key idea is to batchify the inside algorithm for loss computation using direct large tensor operations on the GPU, while avoiding the outside algorithm for gradient computation by relying on efficient back-propagation. We also propose a simple two-stage bracketing-then-labeling parsing approach to further improve efficiency. To improve parsing performance, inspired by recent progress in dependency parsing, we introduce a new scoring architecture based on boundary representations and biaffine attention, together with a beneficial dropout strategy. Experiments on PTB, CTB5.1, and CTB7 show that our two-stage CRF parser achieves new state-of-the-art performance both without and with BERT, and can parse over 1,000 sentences per second. We release our code at https://github.com/yzhangcs/crfpar.
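To make the role of the inside algorithm concrete, here is a minimal sketch in plain Python. It is not the released implementation: it handles a single sentence with ordinary loops instead of batched GPU tensor operations, it covers only the unlabeled bracketing stage, and the names `inside_log_partition`, `brute_force_log_partition`, and `span_marginal_fd` are hypothetical. It assumes a table `score[i][j]` of span scores over fencepost positions `0 <= i < j <= n` and sums over binary bracketings. The last function illustrates why back-propagation can stand in for the outside algorithm: the derivative of log Z with respect to a span score equals that span's marginal probability, approximated here by finite differences rather than automatic differentiation.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def inside_log_partition(score):
    """Return log Z over all binary bracketings of a sentence.

    score[i][j] is the (assumed) score of span (i, j) for 0 <= i < j <= n,
    where n = len(score) - 1 is the sentence length.
    """
    n = len(score) - 1
    inside = [[0.0] * (n + 1) for _ in range(n + 1)]
    # Width-1 spans have no split point.
    for i in range(n):
        inside[i][i + 1] = score[i][i + 1]
    # Wider spans combine every pair of adjacent sub-spans.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            splits = [inside[i][k] + inside[k][j] for k in range(i + 1, j)]
            inside[i][j] = score[i][j] + logsumexp(splits)
    return inside[0][n]


def enumerate_trees(i, j):
    """Yield every binary tree over (i, j) as a list of its spans."""
    if j - i == 1:
        yield [(i, j)]
        return
    for k in range(i + 1, j):
        for left in enumerate_trees(i, k):
            for right in enumerate_trees(k, j):
                yield [(i, j)] + left + right


def brute_force_log_partition(score):
    """Reference log Z by explicit enumeration (only feasible for tiny n)."""
    n = len(score) - 1
    totals = [sum(score[i][j] for i, j in tree)
              for tree in enumerate_trees(0, n)]
    return logsumexp(totals)


def span_marginal_fd(score, i, j, eps=1e-5):
    """Central finite-difference estimate of d log Z / d score[i][j].

    For a log-linear model this derivative is the marginal probability
    that span (i, j) appears in the tree.
    """
    original = score[i][j]
    score[i][j] = original + eps
    up = inside_log_partition(score)
    score[i][j] = original - eps
    down = inside_log_partition(score)
    score[i][j] = original  # restore the table
    return (up - down) / (2 * eps)
```

With all span scores set to zero for a four-word sentence, every binary tree is equally likely, so log Z reduces to the log of the number of trees and each marginal is the fraction of trees containing that span. In a tensor framework the same quantity would come from one backward pass through the inside recursion, which is the point the abstract makes about skipping the outside algorithm.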


Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Constituency Parsing | CTB5 | CRF Parser | F1 score | 89.80 | #6 |
| Constituency Parsing | CTB5 | CRF Parser + BERT | F1 score | 92.27 | #4 |
| Constituency Parsing | CTB7 | CRF Parser + BERT | F1 score | 91.55 | #2 |
| Constituency Parsing | CTB7 | CRF Parser + Electra | F1 score | 91.92 | #1 |
| Constituency Parsing | CTB7 | CRF Parser | F1 score | 88.60 | #3 |
| Constituency Parsing | Penn Treebank | CRF Parser + BERT | F1 score | 95.69 | #9 |
| Constituency Parsing | Penn Treebank | CRF Parser | F1 score | 94.12 | #18 |
| Constituency Parsing | Penn Treebank | CRF Parser + RoBERTa | F1 score | 96.32 | #5 |

Methods