Cross-lingual RST Discourse Parsing

Discourse parsing is an integral part of understanding information flow and argumentative structure in documents. Most previous research has focused on inducing and evaluating models from the English RST Discourse Treebank. However, discourse treebanks exist for other languages, including Spanish, German, Basque, Dutch and Brazilian Portuguese. These treebanks share the same underlying linguistic theory but differ slightly in the way documents are annotated. In this paper, we present (a) a new discourse parser that is simpler than, yet competitive with, the state of the art for English (significantly better on 2/3 metrics), (b) a harmonization of discourse treebanks across languages, which enables us to present (c) what are, to the best of our knowledge, the first experiments on cross-lingual discourse parsing.

EACL 2017

Results from the Paper


Task:    Discourse Parsing
Dataset: RST-DT
Model:   Transition-Based Parser Trained on Cross-Lingual Corpus

Metric Name                  Metric Value   Global Rank
RST-Parseval (Span)          81.3*          #10
RST-Parseval (Nuclearity)    68.1*          #10
RST-Parseval (Relation)      56.3*          #10
RST-Parseval (Full)          56.0*          #5
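
For readers unfamiliar with the four RST-Parseval metrics listed above, the following is a minimal sketch of how they are commonly defined, not the evaluation script behind the reported numbers. It assumes each tree has been flattened into a list of constituents over elementary discourse units (EDUs), and that Span, Nuclearity, Relation and Full compare progressively more of each constituent's labels. The `Constituent` type, the `parseval_f1` function and the toy trees are hypothetical and exist only for illustration.

```python
from typing import NamedTuple


class Constituent(NamedTuple):
    start: int       # index of the first EDU covered by the span
    end: int         # index of the last EDU covered (inclusive)
    nuclearity: str  # "N" (nucleus) or "S" (satellite)
    relation: str    # relation label, e.g. "elaboration"


# Each metric compares a different projection of a constituent.
PROJECTIONS = {
    "span": lambda c: (c.start, c.end),
    "nuclearity": lambda c: (c.start, c.end, c.nuclearity),
    "relation": lambda c: (c.start, c.end, c.relation),
    "full": lambda c: (c.start, c.end, c.nuclearity, c.relation),
}


def parseval_f1(gold, pred):
    """Return an F1 score per metric for one gold/predicted tree pair."""
    scores = {}
    for name, project in PROJECTIONS.items():
        gold_set = {project(c) for c in gold}
        pred_set = {project(c) for c in pred}
        matched = len(gold_set & pred_set)
        precision = matched / len(pred_set) if pred_set else 0.0
        recall = matched / len(gold_set) if gold_set else 0.0
        denom = precision + recall
        scores[name] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Toy three-EDU example (invented data): gold groups EDUs 0-1 first,
# the prediction groups EDUs 1-2 first.
gold = [
    Constituent(0, 0, "N", "span"),
    Constituent(1, 1, "S", "elaboration"),
    Constituent(0, 1, "N", "span"),
    Constituent(2, 2, "S", "attribution"),
]
pred = [
    Constituent(0, 0, "N", "span"),
    Constituent(1, 1, "N", "span"),
    Constituent(2, 2, "S", "elaboration"),
    Constituent(1, 2, "S", "elaboration"),
]

scores = parseval_f1(gold, pred)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

In this toy case three of the four predicted spans match the gold spans, two also match on nuclearity, and only one matches on relation, so the scores fall as more labels are required. Published results are usually aggregated over a whole test set rather than computed per document, and conventions differ on details such as whether the root span is counted.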
