Search Results for author: Daniel Campos

Found 25 papers, 7 papers with code

XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

no code implementations • EMNLP 2020 • Yaobo Liang, Nan Duan, Yeyun Gong, Ning Wu, Fenfei Guo, Weizhen Qi, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Xiaodong Fan, Ruofei Zhang, Rahul Agrawal, Edward Cui, Sining Wei, Taroon Bharti, Ying Qiao, Jiun-Hung Chen, Winnie Wu, Shuguang Liu, Fan Yang, Daniel Campos, Rangan Majumder, Ming Zhou

In this paper, we introduce XGLUE, a new benchmark dataset to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora, and evaluate their performance across a diverse set of cross-lingual tasks.

Natural Language Understanding XLM-R

Synthetic Test Collections for Retrieval Evaluation

no code implementations • 13 May 2024 • Hossein A. Rahmani, Nick Craswell, Emine Yilmaz, Bhaskar Mitra, Daniel Campos

Previous studies demonstrate that LLMs have the potential to generate synthetic relevance judgments for use in the evaluation of IR systems.

Information Retrieval Retrieval

Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models

no code implementations • 8 May 2024 • Luke Merrick, Danmei Xu, Gaurav Nuti, Daniel Campos

This report describes the training dataset creation and recipe behind the family of arctic-embed text embedding models (a set of five models ranging from 22 to 334 million parameters with weights open-sourced under an Apache-2 license).

Retrieval

Overview of the TREC 2023 Product Product Search Track

no code implementations • 14 Nov 2023 • Daniel Campos, Surya Kallumadi, Corby Rosset, Cheng Xiang Zhai, Alessandro Magnani

The focus this year was the creation of a reusable collection and evaluation of the impact of the use of metadata and multi-modal data on retrieval accuracy.

Retrieval

Noise-Robust Dense Retrieval via Contrastive Alignment Post Training

no code implementations • 6 Apr 2023 • Daniel Campos, ChengXiang Zhai, Alessandro Magnani

The success of contextual word representations and advances in neural information retrieval have made dense vector-based retrieval a standard approach for passage and document ranking.

Data Augmentation Document Ranking +3

To Asymmetry and Beyond: Structured Pruning of Sequence to Sequence Models for Improved Inference Efficiency

no code implementations • 5 Apr 2023 • Daniel Campos, ChengXiang Zhai

Sequence-to-sequence language models can be used to produce abstractive summaries which are coherent, relevant, and concise.

Decoder

Dense Sparse Retrieval: Using Sparse Language Models for Inference Efficient Dense Retrieval

no code implementations • 31 Mar 2023 • Daniel Campos, ChengXiang Zhai

Vector-based retrieval systems have become a common staple for academic and industrial search applications because they provide a simple and scalable way of extending the search to leverage contextual representations for documents and queries.

Retrieval TriviaQA

Quick Dense Retrievers Consume KALE: Post Training Kullback Leibler Alignment of Embeddings for Asymmetrical dual encoders

no code implementations • 31 Mar 2023 • Daniel Campos, Alessandro Magnani, ChengXiang Zhai

In this paper, we consider the problem of improving the inference latency of language model-based dense retrieval systems by introducing structural compression and model size asymmetry between the context and query encoders.

Knowledge Distillation Language Modelling +3

oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes

no code implementations • 30 Mar 2023 • Daniel Campos, Alexandre Marques, Mark Kurtz, ChengXiang Zhai

In this paper, we introduce the range of oBERTa language models, an easy-to-use set of language models which allows Natural Language Processing (NLP) practitioners to obtain between 3.8 and 24.3 times faster models without expertise in model compression.

Knowledge Distillation Model Compression +3

Compressing Cross-Lingual Multi-Task Models at Qualtrics

no code implementations • 29 Nov 2022 • Daniel Campos, Daniel Perry, Samir Joshi, Yashmeet Gambhir, Wei Du, Zhengzheng Xing, Aaron Colak

Experience management is an emerging business area where organizations focus on understanding the feedback of customers and employees in order to improve their end-to-end experiences.

Management Model Compression +3

Sparse*BERT: Sparse Models Generalize To New tasks and Domains

no code implementations • 25 May 2022 • Daniel Campos, Alexandre Marques, Tuan Nguyen, Mark Kurtz, ChengXiang Zhai

Our experimentation shows that models that are pruned during pretraining using general domain masked language models can transfer to novel domains and tasks without extensive hyperparameter exploration or specialized approaches.

Quantization

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models

2 code implementations • 14 Mar 2022 • Eldar Kurtic, Daniel Campos, Tuan Nguyen, Elias Frantar, Mark Kurtz, Benjamin Fineran, Michael Goin, Dan Alistarh

We perform an in-depth study of the accuracy-compression trade-off for unstructured weight pruning of BERT models.

Quantization

IMG2SMI: Translating Molecular Structure Images to Simplified Molecular-input Line-entry System

no code implementations • 3 Sep 2021 • Daniel Campos, Heng Ji

A large portion of chemistry literature focuses on new molecules and reactions between molecules.

Image Captioning

Curriculum learning for language modeling

1 code implementation • 4 Aug 2021 • Daniel Campos

Language Models like ELMo and BERT have provided robust representations of natural language, which serve as the language understanding component for a diverse range of downstream tasks. Curriculum learning is a method that employs a structured training regime instead, which has been leveraged in computer vision and machine translation to improve model training speed and model performance.

Language Modelling Machine Translation +1

MS MARCO: Benchmarking Ranking Models in the Large-Data Regime

no code implementations • 9 May 2021 • Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Jimmy Lin

Evaluation efforts such as TREC, CLEF, NTCIR and FIRE, alongside public leaderboards such as MS MARCO, are intended to encourage research and track our progress, addressing big questions in our field.

Benchmarking

TREC Deep Learning Track: Reusable Test Collections in the Large Data Regime

no code implementations • 19 Apr 2021 • Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Ellen M. Voorhees, Ian Soboroff

The TREC Deep Learning (DL) Track studies ad hoc search in the large data regime, meaning that a large set of human-labeled training data is available.

Selection bias

Significant Improvements over the State of the Art? A Case Study of the MS MARCO Document Ranking Leaderboard

no code implementations • 25 Feb 2021 • Jimmy Lin, Daniel Campos, Nick Craswell, Bhaskar Mitra, Emine Yilmaz

Leaderboards are a ubiquitous part of modern research in applied machine learning.

Document Ranking Information Retrieval +1

Informational entropy thresholds as a physical mechanism to explain power-law time distributions in sequential decision-making

no code implementations • 17 Feb 2021 • Javier Cristín, Vicenç Méndez, Daniel Campos

While frameworks based on physical grounds (like the Drift-Diffusion Model) have been exhaustively used in psychology and neuroscience to describe perceptual decision-making in humans, analogous approaches for more complex situations like sequential (tree-like) decision making are still absent.

Decision Making Physics and Society Disordered Systems and Neural Networks Adaptation and Self-Organizing Systems

Overview of the TREC 2020 deep learning track

1 code implementation • 15 Feb 2021 • Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos

This is the second year of the TREC Deep Learning Track, with the goal of studying ad hoc ranking in the large training data regime.

Passage Retrieval Retrieval

ORCAS: 18 Million Clicked Query-Document Pairs for Analyzing Search

no code implementations • 9 Jun 2020 • Nick Craswell, Daniel Campos, Bhaskar Mitra, Emine Yilmaz, Bodo Billerbeck

Users of Web search engines reveal their information needs through queries and clicks, making click logs a useful asset for information retrieval.

Information Retrieval Retrieval

On the Reliability of Test Collections for Evaluating Systems of Different Types

no code implementations • 28 Apr 2020 • Emine Yilmaz, Nick Craswell, Bhaskar Mitra, Daniel Campos

As deep learning based models are increasingly being used for information retrieval (IR), a major challenge is to ensure the availability of test collections for measuring their quality.

Fairness Information Retrieval +2

XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

2 code implementations • 3 Apr 2020 • Yaobo Liang, Nan Duan, Yeyun Gong, Ning Wu, Fenfei Guo, Weizhen Qi, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Xiaodong Fan, Ruofei Zhang, Rahul Agrawal, Edward Cui, Sining Wei, Taroon Bharti, Ying Qiao, Jiun-Hung Chen, Winnie Wu, Shuguang Liu, Fan Yang, Daniel Campos, Rangan Majumder, Ming Zhou

In this paper, we introduce XGLUE, a new benchmark dataset that can be used to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora and evaluate their performance across a diverse set of cross-lingual tasks.

Natural Language Understanding XLM-R

Overview of the TREC 2019 deep learning track

2 code implementations • 17 Mar 2020 • Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Ellen M. Voorhees

The Deep Learning Track is a new track for TREC 2019, with the goal of studying ad hoc ranking in a large data regime.

Passage Retrieval Retrieval +1

Open Domain Web Keyphrase Extraction Beyond Language Modeling

2 code implementations • IJCNLP 2019 • Lee Xiong, Chuan Hu, Chenyan Xiong, Daniel Campos, Arnold Overwijk

This paper studies keyphrase extraction in real-world scenarios where documents are from diverse domains and have variant content quality.

Keyphrase Extraction Language Modelling

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

12 code implementations • 28 Nov 2016 • Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, Tong Wang

The size of the dataset and the fact that the questions are derived from real user search queries distinguish MS MARCO from other well-known publicly available datasets for machine reading comprehension and question-answering.

Benchmarking Machine Reading Comprehension +1
