Improving Neural RST Parsing Model with Silver Agreement Subtrees

Most previous Rhetorical Structure Theory (RST) parsing methods rely on supervised learning, such as neural networks, which requires an annotated corpus of sufficient size and quality. However, the RST Discourse Treebank (RST-DT), the benchmark corpus for RST parsing in English, is small because annotating RST trees is costly. This lack of large annotated training data leads to poor performance, especially in relation labeling. We therefore propose a method for improving neural RST parsing models by exploiting silver data, i.e., automatically annotated data. We create large-scale silver data from an unlabeled corpus by using a state-of-the-art RST parser. To obtain high-quality silver data, we extract agreement subtrees from the RST trees built for the documents by the RST parsers. We then pre-train a neural RST parser with the obtained silver data and fine-tune it on the RST-DT. Experimental results show that our method achieves the best micro-F1 scores for Nuclearity and Relation, 75.0 and 63.2, respectively. Furthermore, it gains 3.0 points in the Relation score over the previous state-of-the-art parser.
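
The agreement-subtree step can be illustrated with a small sketch. The abstract does not say how agreement is decided, so the code below assumes that a subtree counts as agreed when every parser output contains it with identical spans, structure, and labels, and that only the largest such subtrees are kept. The `Node` class, the `leaf` and `join` helpers, and the `"NS:Elaboration"` label format are hypothetical names introduced for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical binary RST tree node covering EDUs start..end (inclusive)."""
    start: int
    end: int
    label: str = ""                    # e.g. "NS:Elaboration"; empty for a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leaf(i: int) -> Node:
    """A single elementary discourse unit (EDU)."""
    return Node(i, i)


def join(label: str, left: Node, right: Node) -> Node:
    """Combine two adjacent subtrees under a nuclearity/relation label."""
    return Node(left.start, right.end, label, left, right)


def internal_nodes(node: Node) -> Iterator[Node]:
    """Yield every non-leaf subtree rooted in `node`."""
    if node.left is None or node.right is None:
        return
    yield node
    yield from internal_nodes(node.left)
    yield from internal_nodes(node.right)


def agreement_subtrees(trees: List[Node]) -> List[Node]:
    """Return the maximal subtrees that occur identically in all trees.

    Assumption: frozen dataclass equality is structural, so two nodes are
    equal only if their spans, labels, and all descendants match.
    """
    common = set(internal_nodes(trees[0]))
    for tree in trees[1:]:
        common &= set(internal_nodes(tree))

    maximal: List[Node] = []

    def walk(node: Node) -> None:
        if node.left is None or node.right is None:
            return
        if node in common:
            maximal.append(node)       # keep the largest agreed subtree
            return                     # its descendants are already covered
        walk(node.left)
        walk(node.right)

    walk(trees[0])
    return maximal


# Two hypothetical parser outputs for the same four-EDU document.
tree_a = join("NS:Elaboration",
              join("NS:Attribution", leaf(0), leaf(1)),
              join("NN:Joint", leaf(2), leaf(3)))
tree_b = join("NN:Joint",
              join("NS:Elaboration",
                   join("NS:Attribution", leaf(0), leaf(1)),
                   leaf(2)),
              leaf(3))

agreed = agreement_subtrees([tree_a, tree_b])
print([(t.start, t.end, t.label) for t in agreed])
```

In this toy case the two trees agree only on the subtree over EDUs 0 and 1, so that is the single piece that would be kept as silver data under the stated assumption.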

Results from the Paper


Ranked #2 on Discourse Parsing on RST-DT (using extra training data)

Task               Dataset  Model                                                                 Metric Name                 Metric Value  Global Rank
Discourse Parsing  RST-DT   Top-down Span-based Parser with Silver Agreement Subtrees (ensemble)  RST-Parseval (Span)         87.1          #2
                                                                                                  RST-Parseval (Nuclearity)   75.0          #2
                                                                                                  RST-Parseval (Relation)     63.2          #1
                                                                                                  RST-Parseval (Full)         62.6          #1
Discourse Parsing  RST-DT   Top-down Span-based Parser with Silver Agreement Subtrees             RST-Parseval (Span)         86.8          #4
                                                                                                  RST-Parseval (Nuclearity)   74.7          #3
                                                                                                  RST-Parseval (Relation)     62.5          #2
                                                                                                  RST-Parseval (Full)         61.8          #2
