Neural IR models have often been studied in homogeneous and narrow settings, which has considerably limited insights into their generalization capabilities.
In this paper, we introduce SciGen, a new challenge dataset for the task of reasoning-aware data-to-text generation consisting of tables from scientific articles and their corresponding descriptions.
Our best methods achieve an average Regret@3 of less than 1% across all target tasks, demonstrating that we are able to efficiently identify the best datasets for intermediate training.
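As a rough sketch of how such a number can be computed: assuming Regret@k is measured as the relative gap between the best candidate dataset overall and the best among the method's top-k picks (the function name and the toy scores below are illustrative, not taken from the paper):

```python
def regret_at_k(true_scores, ranked_candidates, k):
    """Relative regret of choosing among the top-k ranked candidates.

    true_scores: dict mapping candidate dataset -> downstream performance
                 after intermediate training on it (hypothetical values).
    ranked_candidates: candidates sorted by the selection method's
                 predicted utility, best first.
    """
    best = max(true_scores.values())
    best_topk = max(true_scores[c] for c in ranked_candidates[:k])
    return (best - best_topk) / best
```

With three candidates scoring {0.80, 0.78, 0.70} and a ranking that places the true best third, Regret@3 is 0 while Regret@1 is 2.5%, which is why the metric rewards methods that merely get the best dataset into their short list.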
Question answering systems should help users to access knowledge on a broad range of topics and to answer a wide array of different questions.
Massively pre-trained transformer models are computationally expensive to fine-tune, slow for inference, and have large storage requirements.
Our framework weights each example based on the biases it contains and the strength of those biases in the training data.
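One way such a weighting scheme can be sketched (this is an illustrative reconstruction, not the paper's actual formula): down-weight examples that shallow bias-only models already classify correctly, scaled by how prevalent each bias is in the training data.

```python
import numpy as np

def example_weights(bias_gold_probs, bias_strengths):
    """Illustrative bias-aware example weighting.

    bias_gold_probs: (n_examples, n_biases) probability that each bias-only
                     model assigns to the gold label (hypothetical inputs).
    bias_strengths:  (n_biases,) prevalence of each bias in the training
                     data, in [0, 1] (hypothetical inputs).
    """
    p = np.asarray(bias_gold_probs, dtype=float)
    s = np.asarray(bias_strengths, dtype=float)
    # An example exhibiting a strong, prevalent bias receives a small weight.
    w = np.prod(1.0 - s * p, axis=1)
    # Renormalize so the average training weight stays at 1.
    return w / w.mean()
```

An example solvable by every bias model (high probabilities across the board) thus contributes little to the loss, while bias-free examples are emphasized.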
We evaluate the models on nine answer selection and question similarity benchmarks, and show that all 140 models transfer surprisingly well, with the large majority substantially outperforming common IR baselines.
We show that by separating the two stages, i.e., knowledge extraction and knowledge composition, the classifier can effectively exploit the representations learned from multiple tasks in a non-destructive manner.
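The two-stage idea can be sketched as follows: per-task representations come from independently trained, frozen extractors ("non-destructive", since no task's knowledge is overwritten), and only a lightweight composition step, here a simple attention over the task representations, is learned on top. The function names and the dot-product attention are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def compose(task_reprs, query):
    """Stage 2: combine frozen per-task representations via attention.

    task_reprs: (n_tasks, dim) outputs of frozen task-specific extractors.
    query:      (dim,) representation of the current input.
    Returns the composed vector and the attention weights over tasks.
    """
    scores = task_reprs @ query       # relevance of each task to this input
    weights = softmax(scores)
    return weights @ task_reprs, weights
```

Because the extractors stay fixed, adding a new task only means training one more extractor and re-fitting the small composition layer.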
Finally, we show that using coverage information is beneficial for improving performance across different datasets of the same task.
Visual modifications to text are often used to obfuscate offensive comments in social media (e.g., "!d10t") or as a writing style ("1337" in "leet speak"), among other scenarios.
Here, we generalize the concept of average word embeddings to power mean word embeddings.
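The generalization can be sketched as follows: the elementwise power mean reduces to the ordinary average for p = 1 and to max/min pooling as p approaches +/- infinity, and a sentence embedding is typically formed by concatenating several such means. This is a minimal NumPy sketch; the function name and the choice of p values are illustrative.

```python
import numpy as np

def power_mean(vectors, p):
    """Elementwise power mean of a set of word vectors.

    p = 1    -> arithmetic mean (the usual average embedding)
    p = +inf -> elementwise max, p = -inf -> elementwise min
    """
    v = np.asarray(vectors, dtype=float)
    if np.isinf(p):
        return v.max(axis=0) if p > 0 else v.min(axis=0)
    m = np.mean(v ** p, axis=0)
    # Signed root so odd p works with negative embedding components.
    return np.sign(m) * np.abs(m) ** (1.0 / p)

# Concatenating power means for several p yields the sentence embedding.
words = np.array([[1.0, -2.0], [3.0, 4.0]])  # toy 2-d "word embeddings"
sentence = np.concatenate(
    [power_mean(words, p) for p in (1, 3, float("inf"), float("-inf"))]
)
```

Concatenation matters here: each p captures a different summary of the word vectors, so the combined representation is strictly more informative than the plain average alone.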