This paper presents the results of the RepEval 2017 Shared Task, which
evaluated neural network models for sentence representation learning on the
Multi-Genre Natural Language Inference corpus (MultiNLI) recently introduced by
Williams et al. (2017). All five participating teams beat the
bidirectional LSTM (BiLSTM) and continuous bag-of-words baselines reported in
Williams et al....
The best single model used stacked BiLSTMs with residual
connections to extract sentence features and reached 74.5% accuracy on the
genre-matched test set. Surprisingly, the results of the competition were
fairly consistent across the genre-matched and genre-mismatched test sets, and
across subsets of the test data representing a variety of linguistic phenomena,
suggesting that all of the submitted systems learned reasonably
domain-independent representations of sentence meaning.