We propose a new diagnostics test suite which allows to assess whether a dataset constitutes a good testbed for evaluating the models' meaning understanding capabilities.
In this paper we introduce a new natural language processing dataset and benchmark for predicting prosodic prominence from written text.
Ranked #1 on Prosody Prediction on Helsinki Prosody Corpus
In this paper, we present the University of Helsinki submissions to the WMT 2019 shared task on news translation in three language pairs: English-German, English-Finnish and Finnish-English.
We show that models trained on a natural language inference dataset drawn from one benchmark fail to perform well in others, even if the notion of inference assumed in these benchmarks is the same or similar.
We can show that the sentence embeddings learned in this way can be utilized in a wide variety of transfer learning tasks, outperforming InferSent on 7 out of 10 and SkipThought on 8 out of 9 SentEval sentence embedding evaluation tasks.
Ranked #7 on Natural Language Inference on SciTail