Designing the Business Conversation Corpus

While the progress of machine translation of written text has come far in the past several years thanks to the increasing availability of parallel corpora and corpora-based training technologies, automatic translation of spoken text and dialogues remains challenging even for modern systems. In this paper, we aim to boost the machine translation quality of conversational texts by introducing a newly constructed Japanese-English business conversation parallel corpus. A detailed analysis of the corpus is provided along with challenging examples for automatic translation. We also experiment with adding the corpus in a machine translation training scenario and show how the resulting system benefits from its use.

WS 2019

Datasets


Introduced in the Paper:

Business Scene Dialogue

Results from the Paper


 Ranked #1 on Machine Translation on Business Scene Dialogue JA-EN (using extra training data)

Task                 Dataset                        Model             Metric Name  Metric Value  Global Rank
Machine Translation  Business Scene Dialogue EN-JA  Transformer-base  BLEU         13.53         #1
Machine Translation  Business Scene Dialogue JA-EN  Transformer-base  BLEU         12.88         #1

Methods