Search Results for author: Daniel Campos

Found 23 papers, 7 papers with code

Overview of the TREC 2023 Product Search Track

no code implementations • 14 Nov 2023 • Daniel Campos, Surya Kallumadi, Corby Rosset, ChengXiang Zhai, Alessandro Magnani

The focus this year was the creation of a reusable collection and an evaluation of the impact of metadata and multi-modal data on retrieval accuracy.

Retrieval

Noise-Robust Dense Retrieval via Contrastive Alignment Post Training

no code implementations • 6 Apr 2023 • Daniel Campos, ChengXiang Zhai, Alessandro Magnani

The success of contextual word representations and advances in neural information retrieval have made dense vector-based retrieval a standard approach for passage and document ranking.

Data Augmentation Document Ranking +3
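
To make the dense retrieval setting this line of work builds on concrete, here is a minimal sketch of vector-based passage ranking; the sentence-transformers checkpoint is an illustrative choice, not the encoder evaluated in the paper.

```python
# Minimal dense-retrieval sketch: encode queries and passages into vectors
# and rank passages by dot-product similarity. Model name is illustrative.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

passages = [
    "The Eiffel Tower is located in Paris.",
    "Dense retrieval ranks documents with vector similarity.",
]
query = "Where is the Eiffel Tower?"

# Normalize embeddings so the dot product equals cosine similarity.
p_vecs = model.encode(passages, normalize_embeddings=True)
q_vec = model.encode([query], normalize_embeddings=True)

scores = q_vec @ p_vecs.T            # (1, num_passages) similarity matrix
ranking = np.argsort(-scores[0])     # best passage first
print([passages[i] for i in ranking])
```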

To Asymmetry and Beyond: Structured Pruning of Sequence to Sequence Models for Improved Inference Efficiency

no code implementations • 5 Apr 2023 • Daniel Campos, ChengXiang Zhai

Sequence-to-sequence language models can be used to produce abstractive summaries which are coherent, relevant, and concise.
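
As an illustration of the abstractive summarization setting, a minimal sketch using the Hugging Face pipeline API; "t5-small" is an illustrative stand-in, not one of the pruned models studied in the paper.

```python
# Abstractive summarization with a sequence-to-sequence model. The model
# encodes the input document and decodes a shorter text preserving its
# key content.
from transformers import pipeline

summarizer = pipeline("summarization", model="t5-small")
article = (
    "Sequence-to-sequence models encode an input document and decode a "
    "shorter text that preserves its key content, which makes them a "
    "natural fit for abstractive summarization."
)
print(summarizer(article, max_length=30, min_length=5)[0]["summary_text"])
```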

Dense Sparse Retrieval: Using Sparse Language Models for Inference Efficient Dense Retrieval

no code implementations • 31 Mar 2023 • Daniel Campos, ChengXiang Zhai

Vector-based retrieval systems have become a staple of academic and industrial search applications because they provide a simple and scalable way to extend search with contextual representations of documents and queries.

Retrieval TriviaQA

Quick Dense Retrievers Consume KALE: Post Training Kullback Leibler Alignment of Embeddings for Asymmetrical Dual Encoders

no code implementations • 31 Mar 2023 • Daniel Campos, Alessandro Magnani, ChengXiang Zhai

In this paper, we consider the problem of improving the inference latency of language model-based dense retrieval systems by introducing structural compression and model size asymmetry between the context and query encoders.

Knowledge Distillation Language Modelling +3
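
A rough sketch of the size-asymmetry idea: pair a large context encoder with a much smaller query encoder and score by dot product. Both checkpoint names and the projection layer are illustrative assumptions, not the paper's KALE recipe.

```python
# Asymmetric dual encoder: a large document encoder and a tiny query
# encoder, aligned into a shared space with a linear projection.
import torch
from transformers import AutoModel, AutoTokenizer

doc_name = "bert-base-uncased"                      # illustrative choices
query_name = "google/bert_uncased_L-2_H-128_A-2"    # a 2-layer BERT-Tiny

doc_tok = AutoTokenizer.from_pretrained(doc_name)
doc_enc = AutoModel.from_pretrained(doc_name)
q_tok = AutoTokenizer.from_pretrained(query_name)
q_enc = AutoModel.from_pretrained(query_name)

# Map the small query embedding into the document embedding space.
proj = torch.nn.Linear(q_enc.config.hidden_size, doc_enc.config.hidden_size)

def embed(text, tok, enc):
    batch = tok(text, return_tensors="pt", truncation=True)
    return enc(**batch).last_hidden_state.mean(dim=1)  # mean pooling

with torch.no_grad():
    d = embed("KALE aligns a compressed query encoder post-training.", doc_tok, doc_enc)
    q = proj(embed("what is KALE", q_tok, q_enc))
    print(torch.matmul(q, d.T))  # relevance score
```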

oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes

no code implementations • 30 Mar 2023 • Daniel Campos, Alexandre Marques, Mark Kurtz, ChengXiang Zhai

In this paper, we introduce oBERTa, an easy-to-use family of language models that allows Natural Language Processing (NLP) practitioners to obtain models between 3.8 and 24.3 times faster without expertise in model compression.

Knowledge Distillation Model Compression +3
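
The oBERTa recipes themselves ship through Neural Magic's tooling, so the following is only a generic magnitude-pruning sketch in plain PyTorch to illustrate the underlying idea of zeroing low-magnitude weights.

```python
# Unstructured magnitude pruning: remove the 90% of weights with the
# smallest absolute value, then make the zeros permanent.
import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Linear(768, 768)

prune.l1_unstructured(layer, name="weight", amount=0.9)
prune.remove(layer, "weight")  # bake the mask into the weight tensor

sparsity = (layer.weight == 0).float().mean().item()
print(f"layer sparsity: {sparsity:.1%}")
```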

Compressing Cross-Lingual Multi-Task Models at Qualtrics

no code implementations • 29 Nov 2022 • Daniel Campos, Daniel Perry, Samir Joshi, Yashmeet Gambhir, Wei Du, Zhengzheng Xing, Aaron Colak

Experience management is an emerging business area where organizations focus on understanding the feedback of customers and employees in order to improve their end-to-end experiences.

Management Model Compression +3

Sparse*BERT: Sparse Models Generalize To New Tasks and Domains

no code implementations • 25 May 2022 • Daniel Campos, Alexandre Marques, Tuan Nguyen, Mark Kurtz, ChengXiang Zhai

Our experimentation shows that models that are pruned during pretraining using general domain masked language models can transfer to novel domains and tasks without extensive hyperparameter exploration or specialized approaches.

Quantization
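
Since the entry is tagged Quantization, here is a minimal post-training dynamic quantization sketch in PyTorch; it illustrates the compression theme, not the paper's exact recipe.

```python
# Dynamic quantization: Linear-layer weights are stored in int8 and
# dequantized on the fly at inference time.
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(768, 768),
    torch.nn.ReLU(),
    torch.nn.Linear(768, 2),
)
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
print(quantized)
```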

IMG2SMI: Translating Molecular Structure Images to Simplified Molecular-input Line-entry System

no code implementations • 3 Sep 2021 • Daniel Campos, Heng Ji

A large portion of chemistry literature focuses on new molecules and reactions between molecules.

Image Captioning
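
SMILES is the output representation IMG2SMI targets; as a minimal illustration of the string format, RDKit can parse a SMILES string and render the inverse direction (a structure image from SMILES).

```python
# Parse a SMILES string and draw the molecule it encodes; this is the
# inverse of the paper's image-to-SMILES task.
from rdkit import Chem
from rdkit.Chem import Draw

caffeine = Chem.MolFromSmiles("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")
Draw.MolToImage(caffeine).save("caffeine.png")
```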

Curriculum learning for language modeling

1 code implementation • 4 Aug 2021 • Daniel Campos

Language Models like ELMo and BERT have provided robust representations of natural language, which serve as the language understanding component for a diverse range of downstream tasks. Curriculum learning is a method that instead employs a structured training regime, presenting examples in a meaningful order rather than sampling them at random; it has been leveraged in computer vision and machine translation to improve model training speed and model performance.

Language Modelling Machine Translation +1
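
A minimal curriculum-learning sketch; using sentence length as the difficulty measure is an illustrative assumption, not necessarily the heuristic the paper settles on.

```python
# Order training examples from "easy" to "hard" (here: short to long)
# and feed them to the model in that order.
texts = [
    "Long sentences often carry more complex syntactic structure overall.",
    "Short text.",
    "A medium length training sentence.",
]

curriculum = sorted(texts, key=len)  # easy (short) examples first
for step, example in enumerate(curriculum):
    # train_step(model, example)  # hypothetical training call
    print(step, example)
```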

MS MARCO: Benchmarking Ranking Models in the Large-Data Regime

no code implementations • 9 May 2021 • Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Jimmy Lin

Evaluation efforts such as TREC, CLEF, NTCIR and FIRE, alongside public leaderboards such as MS MARCO, are intended to encourage research and track our progress, addressing big questions in our field.

Benchmarking

TREC Deep Learning Track: Reusable Test Collections in the Large Data Regime

no code implementations • 19 Apr 2021 • Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Ellen M. Voorhees, Ian Soboroff

The TREC Deep Learning (DL) Track studies ad hoc search in the large data regime, meaning that a large set of human-labeled training data is available.

Selection bias

Informational entropy thresholds as a physical mechanism to explain power-law time distributions in sequential decision-making

no code implementations • 17 Feb 2021 • Javier Cristín, Vicenç Méndez, Daniel Campos

While frameworks based on physical grounds (like the Drift-Diffusion Model) have been used exhaustively in psychology and neuroscience to describe perceptual decision-making in humans, analogous approaches for more complex situations, like sequential (tree-like) decision-making, are still absent.

Decision Making Physics and Society Disordered Systems and Neural Networks Adaptation and Self-Organizing Systems
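
For readers unfamiliar with the Drift-Diffusion Model the abstract contrasts against, a minimal two-choice simulation: evidence accumulates with drift plus Gaussian noise until it crosses a threshold, and the crossing time is the reaction time. All parameter values are illustrative.

```python
# Euler-Maruyama simulation of a drift-diffusion trial:
# dx = drift * dt + noise * dW, decide when |x| reaches the threshold.
import random

def ddm_trial(drift=0.1, noise=1.0, threshold=1.0, dt=0.01):
    x, t = 0.0, 0.0
    while abs(x) < threshold:
        x += drift * dt + noise * random.gauss(0.0, dt ** 0.5)
        t += dt
    return t, 1 if x > 0 else -1  # reaction time and choice

times = [ddm_trial()[0] for _ in range(500)]
print(sum(times) / len(times))  # mean reaction time
```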

Overview of the TREC 2020 deep learning track

1 code implementation • 15 Feb 2021 • Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos

This is the second year of the TREC Deep Learning Track, with the goal of studying ad hoc ranking in the large training data regime.

Passage Retrieval Retrieval

ORCAS: 18 Million Clicked Query-Document Pairs for Analyzing Search

no code implementations • 9 Jun 2020 • Nick Craswell, Daniel Campos, Bhaskar Mitra, Emine Yilmaz, Bodo Billerbeck

Users of Web search engines reveal their information needs through queries and clicks, making click logs a useful asset for information retrieval.

Information Retrieval Retrieval
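
A sketch of how such a click log might be consumed; the tab-separated qid/query/docid/url column layout is an assumption to check against the released ORCAS file.

```python
# Load clicked query-document pairs and surface the documents clicked
# from the most distinct queries (a strong ranking signal).
from collections import Counter

import pandas as pd

clicks = pd.read_csv(
    "orcas.tsv", sep="\t",
    names=["qid", "query", "docid", "url"],  # assumed column layout
)
print(Counter(clicks["docid"]).most_common(10))
```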

On the Reliability of Test Collections for Evaluating Systems of Different Types

no code implementations • 28 Apr 2020 • Emine Yilmaz, Nick Craswell, Bhaskar Mitra, Daniel Campos

As deep learning based models are increasingly being used for information retrieval (IR), a major challenge is to ensure the availability of test collections for measuring their quality.

Fairness Information Retrieval +2

XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

2 code implementations • 3 Apr 2020 • Yaobo Liang, Nan Duan, Yeyun Gong, Ning Wu, Fenfei Guo, Weizhen Qi, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Xiaodong Fan, Ruofei Zhang, Rahul Agrawal, Edward Cui, Sining Wei, Taroon Bharti, Ying Qiao, Jiun-Hung Chen, Winnie Wu, Shuguang Liu, Fan Yang, Daniel Campos, Rangan Majumder, Ming Zhou

In this paper, we introduce XGLUE, a new benchmark dataset that can be used to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora and evaluate their performance across a diverse set of cross-lingual tasks.

Natural Language Understanding XLM-R
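
XGLUE is mirrored on the Hugging Face hub; a minimal loading sketch for one of its per-task configurations (recent versions of datasets may require the trust_remote_code flag for script-based datasets).

```python
# Load the XGLUE news-classification ("nc") task.
from datasets import load_dataset

xglue_nc = load_dataset("xglue", "nc", trust_remote_code=True)
print(xglue_nc)
```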

Overview of the TREC 2019 deep learning track

2 code implementations • 17 Mar 2020 • Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Ellen M. Voorhees

The Deep Learning Track is a new track for TREC 2019, with the goal of studying ad hoc ranking in a large data regime.

Passage Retrieval Retrieval +1

Open Domain Web Keyphrase Extraction Beyond Language Modeling

2 code implementations • IJCNLP 2019 • Lee Xiong, Chuan Hu, Chenyan Xiong, Daniel Campos, Arnold Overwijk

This paper studies keyphrase extraction in real-world scenarios where documents come from diverse domains and vary in content quality.

Keyphrase Extraction Language Modelling
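
For contrast with the paper's neural approach, a deliberately naive frequency-based keyphrase baseline; it only illustrates the task, not the proposed model.

```python
# Count bigrams and report the most frequent ones as candidate keyphrases.
import re
from collections import Counter

doc = ("Open Domain Web Keyphrase Extraction studies keyphrase extraction "
       "on web pages of varying content quality. Keyphrase extraction "
       "systems must cope with diverse domains.")

words = re.findall(r"[A-Za-z]+", doc)
bigrams = [" ".join(pair).lower() for pair in zip(words, words[1:])]
print(Counter(bigrams).most_common(3))
```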

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

12 code implementations • 28 Nov 2016 • Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, Tong Wang

The size of the dataset and the fact that the questions are derived from real user search queries distinguishes MS MARCO from other well-known publicly available datasets for machine reading comprehension and question-answering.

Benchmarking Machine Reading Comprehension +1
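
MS MARCO is mirrored on the Hugging Face hub under the ms_marco dataset; a minimal loading sketch ("v2.1" is one of the hub's configuration names).

```python
# Load the MS MARCO question-answering data and inspect one query.
from datasets import load_dataset

marco = load_dataset("ms_marco", "v2.1", split="train")
print(marco[0]["query"])
```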
