This is the dataset for the 2020 Duolingo shared task on Simultaneous Translation And Paraphrase for Language Education (STAPLE). It provides sentence prompts in 5 language pairs, along with automatic translations and high-coverage sets of translation paraphrases weighted by user response. Starter code for the task is available at github.com/duolingo/duolingo-sharedtask-2020/. More details on the dataset and task are available at sharedtask.duolingo.com.
10 PAPERS • NO BENCHMARKS YET
StyleKQC is a style-variant paraphrase corpus of Korean questions and commands. It was built with a corpus construction scheme that simultaneously considers the core content and the style of Korean directives, namely their intent and formality. Starting from manually written natural-language queries on six everyday topics, the corpus was expanded into formal and informal sentences through human rewriting and style transfer.
2 PAPERS • NO BENCHMARKS YET