Search Results for author: Jamie Callan

Found 35 papers, 18 papers with code

Dwell in the Beginning: How Language Models Embed Long Documents for Dense Retrieval

no code implementations 5 Apr 2024 João Coelho, Bruno Martins, João Magalhães, Jamie Callan, Chenyan Xiong

This study investigates the existence of positional biases in Transformer-based models for text representation learning, particularly in the context of web document retrieval.

Decoder Language Modelling +2

Building Retrieval Systems for the ClueWeb22-B Corpus

no code implementations 6 Feb 2024 Harshit Mehrotra, Jamie Callan, Zhen Fan

The ClueWeb22 dataset containing nearly 10 billion documents was released in 2022 to support academic and industry research.

Retrieval

Active Retrieval Augmented Generation

2 code implementations 11 May 2023 Zhengbao Jiang, Frank F. Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, Graham Neubig

In this work, we provide a generalized view of active retrieval augmented generation, methods that actively decide when and what to retrieve across the course of the generation.

Retrieval Sentence

Precise Zero-Shot Dense Retrieval without Relevance Labels

2 code implementations 20 Dec 2022 Luyu Gao, Xueguang Ma, Jimmy Lin, Jamie Callan

Given a query, HyDE first zero-shot instructs an instruction-following language model (e.g. InstructGPT) to generate a hypothetical document.

Fact Verification Instruction Following +3

Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer

1 code implementation 5 Dec 2022 Zhengbao Jiang, Luyu Gao, Jun Araki, Haibo Ding, Zhiruo Wang, Jamie Callan, Graham Neubig

Systems for knowledge-intensive tasks such as open-domain question answering (QA) usually consist of two stages: efficient retrieval of relevant documents from a large corpus and detailed reading of the selected documents to generate answers.

Open-Domain Question Answering Passage Retrieval +1

ClueWeb22: 10 Billion Web Documents with Visual and Semantic Information

no code implementations 29 Nov 2022 Arnold Overwijk, Chenyan Xiong, Xiao Liu, Cameron VandenBerg, Jamie Callan

ClueWeb22, the newest iteration of the ClueWeb line of datasets, provides 10 billion web pages affiliated with rich information.

document understanding Retrieval

PAL: Program-aided Language Models

3 code implementations 18 Nov 2022 Luyu Gao, Aman Madaan, Shuyan Zhou, Uri Alon, PengFei Liu, Yiming Yang, Jamie Callan, Graham Neubig

Much of this success can be attributed to prompting methods such as "chain-of-thought", which employ LLMs both for understanding the problem description by decomposing it into steps and for solving each step of the problem.

Arithmetic Reasoning GSM8K +2

Long Document Re-ranking with Modular Re-ranker

1 code implementation 9 May 2022 Luyu Gao, Jamie Callan

In this paper, we propose instead to model full query-to-document interaction, leveraging the attention operation and modular Transformer re-ranker framework.

Document Ranking Re-Ranking

Tevatron: An Efficient and Flexible Toolkit for Dense Retrieval

1 code implementation 11 Mar 2022 Luyu Gao, Xueguang Ma, Jimmy Lin, Jamie Callan

In this paper, we present Tevatron, a dense retrieval toolkit optimized for efficiency, flexibility, and code simplicity.

Retrieval

Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback

2 code implementations 30 Aug 2021 HongChien Yu, Chenyan Xiong, Jamie Callan

This paper proposes ANCE-PRF, a new query encoder that uses pseudo relevance feedback (PRF) to improve query representations for dense retrieval.

Retrieval

COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List

1 code implementation NAACL 2021 Luyu Gao, Zhuyun Dai, Jamie Callan

Classical information retrieval systems such as BM25 rely on exact lexical match and carry out search efficiently with inverted list index.

Information Retrieval Retrieval

Assessing the Benefits of Model Ensembles in Neural Re-Ranking for Passage Retrieval

no code implementations 21 Jan 2021 Luís Borges, Bruno Martins, Jamie Callan

Our work aimed at experimentally assessing the benefits of model ensembling within the context of neural methods for passage reranking.

Learning-To-Rank Passage Retrieval +2

Rethink Training of BERT Rerankers in Multi-Stage Retrieval Pipeline

1 code implementation 21 Jan 2021 Luyu Gao, Zhuyun Dai, Jamie Callan

Pre-trained deep language models (LM) have advanced the state-of-the-art of text retrieval.

Text Retrieval

PGT: Pseudo Relevance Feedback Using a Graph-Based Transformer

1 code implementation 20 Jan 2021 HongChien Yu, Zhuyun Dai, Jamie Callan

Most research on pseudo relevance feedback (PRF) has been done in vector space and probabilistic retrieval models.

Retrieval

Making Information Seeking Easier: An Improved Pipeline for Conversational Search

no code implementations Findings of the Association for Computational Linguistics 2020 Vaibhav Kumar, Jamie Callan

Given an input question, it uses a BERT-based classifier (trained with weak supervision) to de-contextualize the input by selecting relevant terms from the dialog history.

Conversational Search Passage Ranking +2

Generating Categories for Sets of Entities

no code implementations 19 Aug 2020 Shuo Zhang, Krisztian Balog, Jamie Callan

Category systems are central components of knowledge bases, as they provide a hierarchical grouping of semantically related concepts and entities.

Abstractive Text Summarization Specificity

Ranking Clarification Questions via Natural Language Inference

no code implementations 18 Aug 2020 Vaibhav Kumar, Vikas Raunak, Jamie Callan

Given a natural language query, teaching machines to ask clarifying questions is of immense utility in practical natural language processing systems.

Natural Language Inference Reading Comprehension

Understanding BERT Rankers Under Distillation

no code implementations 21 Jul 2020 Luyu Gao, Zhuyun Dai, Jamie Callan

Deep language models such as BERT, pre-trained on large corpora, have given a huge performance boost to state-of-the-art information retrieval ranking systems.

Information Retrieval Retrieval

Summarizing and Exploring Tabular Data in Conversational Search

1 code implementation 23 May 2020 Shuo Zhang, Zhuyun Dai, Krisztian Balog, Jamie Callan

We propose to generate natural language summaries as answers to describe the complex information contained in a table.

Conversational Search

Complementing Lexical Retrieval with Semantic Residual Embedding

no code implementations 29 Apr 2020 Luyu Gao, Zhuyun Dai, Tongfei Chen, Zhen Fan, Benjamin Van Durme, Jamie Callan

This paper presents CLEAR, a retrieval model that seeks to complement classical lexical exact-match models such as BM25 with semantic matching signals from a neural embedding matching model.

Information Retrieval Retrieval

Modularized Transfomer-based Ranking Framework

no code implementations EMNLP 2020 Luyu Gao, Zhuyun Dai, Jamie Callan

Recent innovations in Transformer-based ranking models have advanced the state-of-the-art in information retrieval.

Information Retrieval Retrieval

Context-Aware Sentence/Passage Term Importance Estimation For First Stage Retrieval

2 code implementations 23 Oct 2019 Zhuyun Dai, Jamie Callan

When applied to passages, DeepCT-Index produces term weights that can be stored in an ordinary inverted index for passage retrieval.

Passage Retrieval Retrieval +1

Deeper Text Understanding for IR with Contextual Neural Language Modeling

1 code implementation 22 May 2019 Zhuyun Dai, Jamie Callan

Neural networks provide new possibilities to automatically learn complex language patterns and query-document relations.

Ad-Hoc Information Retrieval Language Modelling +2

Consistency and Variation in Kernel Neural Ranking Model

no code implementations 27 Sep 2018 Mary Arpita Pyreddy, Varshini Ramaseshan, Narendra Nath Joshi, Zhuyun Dai, Chenyan Xiong, Jamie Callan, Zhiyuan Liu

This paper studies the consistency of the kernel-based neural ranking model K-NRM, a recent state-of-the-art neural IR model, which is important for reproducible research and deployment in the industry.

Word Embeddings

Towards Better Text Understanding and Retrieval through Kernel Entity Salience Modeling

no code implementations 3 May 2018 Chenyan Xiong, Zhengzhong Liu, Jamie Callan, Tie-Yan Liu

The salience model also improves ad hoc search accuracy, providing effective ranking features by modeling the salience of query entities in candidate documents.

Retrieval

Convolutional Neural Networks for Soft Matching N-Grams in Ad-hoc Search

no code implementations WSDM 2018 Zhuyun Dai, Chenyan Xiong, Jamie Callan, Zhiyuan Liu

This paper presents Conv-KNRM, a Convolutional Kernel-based Neural Ranking Model that models n-gram soft matches for ad-hoc search.

Learning-To-Rank

Word-Entity Duet Representations for Document Ranking

no code implementations 20 Jun 2017 Chenyan Xiong, Jamie Callan, Tie-Yan Liu

This paper presents a word-entity duet framework for utilizing knowledge bases in ad-hoc retrieval.

Document Ranking Learning-To-Rank +1

End-to-End Neural Ad-hoc Ranking with Kernel Pooling

1 code implementation 20 Jun 2017 Chenyan Xiong, Zhuyun Dai, Jamie Callan, Zhiyuan Liu, Russell Power

Given a query and a set of documents, K-NRM uses a translation matrix that models word-level similarities via word embeddings, a new kernel-pooling technique that uses kernels to extract multi-level soft match features, and a learning-to-rank layer that combines those features into the final ranking score.

Document Ranking Learning-To-Rank +2

Exploratory Learning

no code implementations 1 Jul 2013 Bhavana Dalvi, William W. Cohen, Jamie Callan

In multiclass semi-supervised learning (SSL), it is sometimes the case that the number of classes present in the data is not known, and hence no labeled examples are provided for some classes.

Clustering

WebSets: Extracting Sets of Entities from the Web Using Unsupervised Information Extraction

no code implementations 1 Jul 2013 Bhavana Dalvi, William W. Cohen, Jamie Callan

We describe an open-domain information extraction method for extracting concept-instance pairs from an HTML corpus.

Clustering Information Retrieval
