no code implementations • EMNLP 2020 • Yaobo Liang, Nan Duan, Yeyun Gong, Ning Wu, Fenfei Guo, Weizhen Qi, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Xiaodong Fan, Ruofei Zhang, Rahul Agrawal, Edward Cui, Sining Wei, Taroon Bharti, Ying Qiao, Jiun-Hung Chen, Winnie Wu, Shuguang Liu, Fan Yang, Daniel Campos, Rangan Majumder, Ming Zhou
In this paper, we introduce XGLUE, a new benchmark dataset to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora, and evaluate their performance across a diverse set of cross-lingual tasks.
no code implementations • 13 May 2024 • Hossein A. Rahmani, Nick Craswell, Emine Yilmaz, Bhaskar Mitra, Daniel Campos
Previous studies demonstrate that LLMs have the potential to generate synthetic relevance judgments for use in the evaluation of IR systems.
no code implementations • 8 May 2024 • Luke Merrick, Danmei Xu, Gaurav Nuti, Daniel Campos
This report describes the training dataset creation and recipe behind the family of arctic-embed text embedding models (a set of five models ranging from 22 to 334 million parameters with weights open-sourced under an Apache-2 license).
no code implementations • 14 Nov 2023 • Daniel Campos, Surya Kallumadi, Corby Rosset, ChengXiang Zhai, Alessandro Magnani
The focus this year was the creation of a reusable collection and evaluation of the impact of the use of metadata and multi-modal data on retrieval accuracy.
no code implementations • 6 Apr 2023 • Daniel Campos, ChengXiang Zhai, Alessandro Magnani
The success of contextual word representations and advances in neural information retrieval have made dense vector-based retrieval a standard approach for passage and document ranking.
no code implementations • 5 Apr 2023 • Daniel Campos, ChengXiang Zhai
Sequence-to-sequence language models can be used to produce abstractive summaries which are coherent, relevant, and concise.
no code implementations • 31 Mar 2023 • Daniel Campos, ChengXiang Zhai
Vector-based retrieval systems have become a common staple for academic and industrial search applications because they provide a simple and scalable way of extending the search to leverage contextual representations for documents and queries.
no code implementations • 31 Mar 2023 • Daniel Campos, Alessandro Magnani, ChengXiang Zhai
In this paper, we consider the problem of improving the inference latency of language model-based dense retrieval systems by introducing structural compression and model size asymmetry between the context and query encoders.
no code implementations • 30 Mar 2023 • Daniel Campos, Alexandre Marques, Mark Kurtz, ChengXiang Zhai
In this paper, we introduce the range of oBERTa language models, an easy-to-use set of language models which allows Natural Language Processing (NLP) practitioners to obtain between 3.8 and 24.3 times faster models without expertise in model compression.
no code implementations • 29 Nov 2022 • Daniel Campos, Daniel Perry, Samir Joshi, Yashmeet Gambhir, Wei Du, Zhengzheng Xing, Aaron Colak
Experience management is an emerging business area where organizations focus on understanding the feedback of customers and employees in order to improve their end-to-end experiences.
no code implementations • 25 May 2022 • Daniel Campos, Alexandre Marques, Tuan Nguyen, Mark Kurtz, ChengXiang Zhai
Our experimentation shows that models that are pruned during pretraining using general domain masked language models can transfer to novel domains and tasks without extensive hyperparameter exploration or specialized approaches.
2 code implementations • 14 Mar 2022 • Eldar Kurtic, Daniel Campos, Tuan Nguyen, Elias Frantar, Mark Kurtz, Benjamin Fineran, Michael Goin, Dan Alistarh
We perform an in-depth study of the accuracy-compression trade-off for unstructured weight pruning of BERT models.
no code implementations • 3 Sep 2021 • Daniel Campos, Heng Ji
A large portion of chemistry literature focuses on new molecules and reactions between molecules.
1 code implementation • 4 Aug 2021 • Daniel Campos
Language models like ELMo and BERT have provided robust representations of natural language, which serve as the language understanding component for a diverse range of downstream tasks. Curriculum learning is a method that employs a structured training regime, and it has been leveraged in computer vision and machine translation to improve model training speed and model performance.
no code implementations • 9 May 2021 • Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Jimmy Lin
Evaluation efforts such as TREC, CLEF, NTCIR and FIRE, alongside public leaderboards such as MS MARCO, are intended to encourage research and track our progress, addressing big questions in our field.
no code implementations • 19 Apr 2021 • Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Ellen M. Voorhees, Ian Soboroff
The TREC Deep Learning (DL) Track studies ad hoc search in the large data regime, meaning that a large set of human-labeled training data is available.
no code implementations • 25 Feb 2021 • Jimmy Lin, Daniel Campos, Nick Craswell, Bhaskar Mitra, Emine Yilmaz
Leaderboards are a ubiquitous part of modern research in applied machine learning.
no code implementations • 17 Feb 2021 • Javier Cristín, Vicenç Méndez, Daniel Campos
While frameworks based on physical grounds (like the Drift-Diffusion Model) have been exhaustively used in psychology and neuroscience to describe perceptual decision-making in humans, analogous approaches for more complex situations like sequential (tree-like) decision making are still absent.
Decision Making • Physics and Society • Disordered Systems and Neural Networks • Adaptation and Self-Organizing Systems
1 code implementation • 15 Feb 2021 • Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos
This is the second year of the TREC Deep Learning Track, with the goal of studying ad hoc ranking in the large training data regime.
no code implementations • 9 Jun 2020 • Nick Craswell, Daniel Campos, Bhaskar Mitra, Emine Yilmaz, Bodo Billerbeck
Users of Web search engines reveal their information needs through queries and clicks, making click logs a useful asset for information retrieval.
no code implementations • 28 Apr 2020 • Emine Yilmaz, Nick Craswell, Bhaskar Mitra, Daniel Campos
As deep learning based models are increasingly being used for information retrieval (IR), a major challenge is to ensure the availability of test collections for measuring their quality.
2 code implementations • 3 Apr 2020 • Yaobo Liang, Nan Duan, Yeyun Gong, Ning Wu, Fenfei Guo, Weizhen Qi, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Xiaodong Fan, Ruofei Zhang, Rahul Agrawal, Edward Cui, Sining Wei, Taroon Bharti, Ying Qiao, Jiun-Hung Chen, Winnie Wu, Shuguang Liu, Fan Yang, Daniel Campos, Rangan Majumder, Ming Zhou
In this paper, we introduce XGLUE, a new benchmark dataset that can be used to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora and evaluate their performance across a diverse set of cross-lingual tasks.
2 code implementations • 17 Mar 2020 • Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Ellen M. Voorhees
The Deep Learning Track is a new track for TREC 2019, with the goal of studying ad hoc ranking in a large data regime.
2 code implementations • IJCNLP 2019 • Lee Xiong, Chuan Hu, Chenyan Xiong, Daniel Campos, Arnold Overwijk
This paper studies keyphrase extraction in real-world scenarios where documents are from diverse domains and have variant content quality.
12 code implementations • 28 Nov 2016 • Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, Tong Wang
The size of the dataset and the fact that the questions are derived from real user search queries distinguish MS MARCO from other well-known publicly available datasets for machine reading comprehension and question-answering.