Improving Discourse Relation Projection to Build Discourse Annotated Corpora

RANLP 2017  ·  Majid Laali, Leila Kosseim ·

The naive approach to annotation projection is not effective to project discourse annotations from one language to another because implicit discourse relations are often changed to explicit ones and vice-versa in the translation. In this paper, we propose a novel approach based on the intersection between statistical word-alignment models to identify unsupported discourse annotations. This approach identified 65% of the unsupported annotations in the English-French parallel sentences from Europarl. By filtering out these unsupported annotations, we induced the first PDTB-style discourse annotated corpus for French from Europarl. We then used this corpus to train a classifier to identify the discourse-usage of French discourse connectives and show a 15% improvement of F1-score compared to the classifier trained on the non-filtered annotations.

PDF Abstract RANLP 2017 PDF RANLP 2017 Abstract

Datasets


Introduced in the Paper:

Europarl ConcoDisco Dataset

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here