Search Results for author: Sean MacAvaney

Found 73 papers, 43 papers with code

Community-level Research on Suicidality Prediction in a Secure Environment: Overview of the CLPsych 2021 Shared Task

no code implementations NAACL (CLPsych) 2021 Sean MacAvaney, Anjali Mittu, Glen Coppersmith, Jeff Leintz, Philip Resnik

Progress on NLP for mental health — indeed, for healthcare in general — is hampered by obstacles to shared, community-level access to relevant data.

TBD3: A Thresholding-Based Dynamic Depression Detection from Social Media for Low-Resource Users

1 code implementation LREC 2022 Hrishikesh Kulkarni, Sean MacAvaney, Nazli Goharian, Ophir Frieder

To complement this evaluation, we propose a dynamic thresholding technique that adjusts the classifier’s sensitivity as a function of the number of posts a user has.

Depression Detection

An Alternative to FLOPS Regularization to Effectively Productionize SPLADE-Doc

no code implementations 21 May 2025 Aldo Porco, Dhruv Mehra, Igor Malioutov, Karthik Radhakrishnan, Moniba Keymanesh, Daniel Preoţiuc-Pietro, Sean MacAvaney, Pengxiang Cheng

Terms with very high Document Frequencies (DFs) substantially increase latency in production retrieval engines, such as Apache Solr, due to their lengthy posting lists.

Retrieval

Lost in Transliteration: Bridging the Script Gap in Neural IR

no code implementations 13 May 2025 Andreas Chari, Iadh Ounis, Sean MacAvaney

This creates a "script gap" between the performance of the same queries when written in their native or transliterated form.

Information Retrieval Retrieval +1

Artifact Sharing for Information Retrieval Research

no code implementations 8 May 2025 Sean MacAvaney

Sharing artifacts -- such as trained models, pre-built indexes, and the code to use them -- aids in reproducibility efforts by allowing researchers to validate intermediate steps and improves the sustainability of research by allowing multiple groups to build off one another's prior computational work.

Information Retrieval Retrieval

Document Quality Scoring for Web Crawling

1 code implementation 15 Apr 2025 Francesca Pezzuti, Ariane Mueller, Sean MacAvaney, Nicola Tonellotto

The internet contains large amounts of low-quality content, yet users expect web search engines to deliver high-quality, relevant results.

On Precomputation and Caching in Information Retrieval Experiments with Pipeline Architectures

no code implementations 14 Apr 2025 Sean MacAvaney, Craig Macdonald

These approaches allow for the best of both worlds: pipelines can be fully expressed end-to-end, while also avoiding redundant computations between pipelines.

Information Retrieval Retrieval

Breaking the Lens of the Telescope: Online Relevance Estimation over Large Retrieval Sets

no code implementations 12 Apr 2025 Mandeep Rathee, V Venktesh, Sean MacAvaney, Avishek Anand

Instead of re-ranking a fixed set of top-k documents in a single step, online relevance estimation iteratively re-scores smaller subsets of the most promising documents while adjusting relevance scores for the remaining pool based on the estimations from the final model using an online bandit-based algorithm.

Re-Ranking Retrieval

Efficient Constant-Space Multi-Vector Retrieval

1 code implementation 2 Apr 2025 Sean MacAvaney, Antonio Mallia, Nicola Tonellotto

Multi-vector retrieval methods, exemplified by the ColBERT architecture, have shown substantial promise for retrieval by providing strong trade-offs in terms of retrieval latency and effectiveness.

Management Retrieval

Improving Low-Resource Retrieval Effectiveness using Zero-Shot Linguistic Similarity Transfer

1 code implementation 28 Mar 2025 Andreas Chari, Sean MacAvaney, Iadh Ounis

We find that this approach improves the performance of the varieties upon which the models were directly trained, thereby regularising these models to generalise and perform better even on unseen language variety pairs.

Retrieval

Variations in Relevance Judgments and the Shelf Life of Test Collections

no code implementations 28 Feb 2025 Andrew Parry, Maik Fröbe, Harrisen Scells, Ferdinand Schlatt, Guglielmo Faggioli, Saber Zerhoudi, Sean MacAvaney, Eugene Yang

The fundamental property of Cranfield-style evaluations, that system rankings are stable even when assessors disagree on individual relevance decisions, was validated on traditional test collections.

Retrieval

mFollowIR: a Multilingual Benchmark for Instruction Following in Retrieval

1 code implementation 31 Jan 2025 Orion Weller, Benjamin Chang, Eugene Yang, Mahsa Yarmohammadi, Sam Barham, Sean MacAvaney, Arman Cohan, Luca Soldaini, Benjamin Van Durme, Dawn Lawrie

We see strong cross-lingual performance with English-based retrievers that were trained using instructions, but find a notable drop in performance in the multilingual setting, indicating that more work is needed in developing data for instruction-based multilingual retrievers.

Instruction Following Retrieval

MechIR: A Mechanistic Interpretability Framework for Information Retrieval

1 code implementation 17 Jan 2025 Andrew Parry, Catherine Chen, Carsten Eickhoff, Sean MacAvaney

Mechanistic interpretability is an emerging diagnostic approach for neural models that has gained traction in broader natural language processing domains.

Diagnostic Information Retrieval +1

Guiding Retrieval using LLM-based Listwise Rankers

1 code implementation 15 Jan 2025 Mandeep Rathee, Sean MacAvaney, Avishek Anand

In this paper, we propose an adaptation of an existing adaptive retrieval method that supports the listwise setting and helps guide the retrieval process itself (thereby overcoming the bounded recall problem for LLM rerankers).

Retrieval

Training on the Test Model: Contamination in Ranking Distillation

1 code implementation 4 Nov 2024 Vishakha Suresh Kalal, Andrew Parry, Sean MacAvaney

By simulating a "worst-case" setting where the degree of contamination is known, we find that contamination occurs even when the test data represents a small fraction of the teacher's training samples.

Knowledge Distillation

Quam: Adaptive Retrieval through Query Affinity Modelling

1 code implementation 26 Oct 2024 Mandeep Rathee, Sean MacAvaney, Avishek Anand

Our extensive experimental evidence shows that our proposed approach, Quam, improves recall performance by up to 26% over the standard re-ranking baselines.

Information Retrieval Re-Ranking +1

DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities

1 code implementation 10 Oct 2024 Thong Nguyen, Shubham Chatterjee, Sean MacAvaney, Iain Mackie, Jeff Dalton, Andrew Yates

Learned Sparse Retrieval (LSR) models use vocabularies from pre-trained transformers, which often split entities into nonsensical fragments.

Document Ranking Entity Embeddings +3

Genetic Approach to Mitigate Hallucination in Generative IR

1 code implementation 25 Aug 2024 Hrishikesh Kulkarni, Nazli Goharian, Ophir Frieder, Sean MacAvaney

We address hallucination by adapting an existing genetic generation approach with a new 'balanced fitness function' consisting of a cross-encoder model for relevance and an n-gram overlap metric to promote grounding.

Answer Generation Hallucination

LexBoost: Improving Lexical Document Retrieval with Nearest Neighbors

1 code implementation 25 Aug 2024 Hrishikesh Kulkarni, Nazli Goharian, Ophir Frieder, Sean MacAvaney

For efficiency, approximation methods like HNSW are frequently used to approximate exhaustive dense retrieval.

Re-Ranking Retrieval

Neural Passage Quality Estimation for Static Pruning

1 code implementation 16 Jul 2024 Xuejun Chang, Debabrata Mishra, Craig Macdonald, Sean MacAvaney

We refer to this query-agnostic estimation of passage relevance as a passage's quality.

Top-Down Partitioning for Efficient List-Wise Ranking

1 code implementation 23 May 2024 Andrew Parry, Sean MacAvaney, Debasis Ganguly

Large Language Models (LLMs) have significantly impacted many facets of natural language processing and information retrieval.

Information Retrieval Re-Ranking

Generative Relevance Feedback and Convergence of Adaptive Re-Ranking: University of Glasgow Terrier Team at TREC DL 2023

1 code implementation 2 May 2024 Andrew Parry, Thomas Jaenich, Sean MacAvaney, Iadh Ounis

In re-ranking, we investigate operating points of adaptive re-ranking with different first stages to find the point in graph traversal where the first stage no longer has an effect on the performance of the overall retrieval pipeline.

Language Modeling Language Modelling +3

Exploiting Positional Bias for Query-Agnostic Generative Content in Search

1 code implementation 1 May 2024 Andrew Parry, Sean MacAvaney, Debasis Ganguly

We demonstrate such defects by showing that non-relevant text--such as promotional content--can be easily injected into a document without adversely affecting its position in search results.

Position Text Retrieval

A Reproducibility Study of PLAID

no code implementations 23 Apr 2024 Sean MacAvaney, Nicola Tonellotto

The PLAID (Performance-optimized Late Interaction Driver) algorithm for ColBERTv2 uses clustered term representations to retrieve and progressively prune documents for final (exact) document scoring.

Re-Ranking Retrieval

Overview of the TREC 2023 NeuCLIR Track

no code implementations 11 Apr 2024 Dawn Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, Eugene Yang

The principal tasks are ranked retrieval of news in one of the three languages, using English topics.

Information Retrieval Retrieval

Shallow Cross-Encoders for Low-Latency Retrieval

1 code implementation 29 Mar 2024 Aleksandr V. Petrov, Sean MacAvaney, Craig Macdonald

However, Cross-Encoders based on large transformer models (such as BERT or T5) are computationally expensive and allow for scoring only a small number of documents within a reasonably small latency window.

Passage Ranking Text Retrieval

FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions

1 code implementation 22 Mar 2024 Orion Weller, Benjamin Chang, Sean MacAvaney, Kyle Lo, Arman Cohan, Benjamin Van Durme, Dawn Lawrie, Luca Soldaini

First, we introduce our dataset FollowIR, which contains a rigorous instruction evaluation benchmark as well as a training set for helping IR models learn to better follow real-world instructions.

Information Retrieval Text Retrieval

Analyzing Adversarial Attacks on Sequence-to-Sequence Relevance Models

1 code implementation 12 Mar 2024 Andrew Parry, Maik Fröbe, Sean MacAvaney, Martin Potthast, Matthias Hagen

Modern sequence-to-sequence relevance models like monoT5 can effectively capture complex textual interactions between queries and documents through cross-encoding.

Retrieval

Evaluating the Explainability of Neural Rankers

no code implementations 4 Mar 2024 Saran Pandian, Debasis Ganguly, Sean MacAvaney

While the increasing complexity of search models has led to demonstrable improvements in effectiveness (measured in terms of relevance of top-retrieved results), a question worthy of thorough inspection is: "how explainable are these models?"

Information Retrieval Sentence

A Deep Learning Approach for Selective Relevance Feedback

no code implementations 20 Jan 2024 Suchana Datta, Debasis Ganguly, Sean MacAvaney, Derek Greene

Additionally, to further improve retrieval effectiveness with this selective PRF approach, we make use of the model's confidence estimates to combine the information from the original and expanded queries.

Deep Learning Retrieval

Generative Query Reformulation for Effective Adhoc Search

no code implementations 1 Aug 2023 Xiao Wang, Sean MacAvaney, Craig Macdonald, Iadh Ounis

GenQR directly reformulates the user's input query, while GenPRF provides additional context for the query by making use of pseudo-relevance feedback information.

Information Retrieval Retrieval

On the Effects of Regional Spelling Conventions in Retrieval Models

1 code implementation 1 Aug 2023 Andreas Chari, Sean MacAvaney, Iadh Ounis

One advantage of neural ranking models is that they are meant to generalise well in situations of synonymity, i.e., where two words have similar or identical meanings.

Retrieval

Lexically-Accelerated Dense Retrieval

1 code implementation 31 Jul 2023 Hrishikesh Kulkarni, Sean MacAvaney, Nazli Goharian, Ophir Frieder

We introduce 'LADR' (Lexically-Accelerated Dense Retrieval), a simple-yet-effective approach that improves the efficiency of existing dense retrieval models without compromising on retrieval effectiveness.

Retrieval

Adaptive Latent Entity Expansion for Document Retrieval

no code implementations 29 Jun 2023 Iain Mackie, Shubham Chatterjee, Sean MacAvaney, Jeffrey Dalton

First, we demonstrate that applying a strong neural re-ranker before sparse or dense PRF can improve the retrieval effectiveness by 5-8%.

Re-Ranking Retrieval

Online Distillation for Pseudo-Relevance Feedback

no code implementations 16 Jun 2023 Sean MacAvaney, Xi Wang

Model distillation has emerged as a prominent technique to improve neural search models.

Re-Ranking Retrieval

The Information Retrieval Experiment Platform

1 code implementation 30 May 2023 Maik Fröbe, Jan Heinrich Reimer, Sean MacAvaney, Niklas Deckers, Simon Reich, Janek Bevendorff, Benno Stein, Matthias Hagen, Martin Potthast

Standardization is achieved when a retrieval approach implements PyTerrier's interfaces and the input and output of an experiment are compatible with ir_datasets and ir_measures.

Information Retrieval Retrieval

Adapting Learned Sparse Retrieval for Long Documents

1 code implementation 29 May 2023 Thong Nguyen, Sean MacAvaney, Andrew Yates

We investigate existing aggregation approaches for adapting LSR to longer documents and find that proximal scoring is crucial for LSR to handle long documents.

Language Modeling Language Modelling +2

Overview of the TREC 2022 NeuCLIR Track

no code implementations 24 Apr 2023 Dawn Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, Eugene Yang

This is the first year of the TREC Neural CLIR (NeuCLIR) track, which aims to study the impact of neural approaches to cross-language information retrieval.

Information Retrieval Retrieval

A Unified Framework for Learned Sparse Retrieval

1 code implementation 23 Mar 2023 Thong Nguyen, Sean MacAvaney, Andrew Yates

We then reproduce all prominent methods using a common codebase and re-train them in the same environment, which allows us to quantify how components of the framework affect effectiveness and efficiency.

Retrieval

One-Shot Labeling for Automatic Relevance Estimation

1 code implementation 22 Feb 2023 Sean MacAvaney, Luca Soldaini

We then explore various approaches for predicting the relevance of unjudged documents with respect to a query and the known relevant document, including nearest neighbor, supervised, and prompting techniques.

Retrieval

Doc2Query--: When Less is More

1 code implementation 9 Jan 2023 Mitko Gospodinov, Sean MacAvaney, Craig Macdonald

Doc2Query -- the process of expanding the content of a document before indexing using a sequence-to-sequence model -- has emerged as a prominent technique for improving the first-stage retrieval effectiveness of search engines.

Hallucination Retrieval

Adaptive Re-Ranking with a Corpus Graph

3 code implementations 18 Aug 2022 Sean MacAvaney, Nicola Tonellotto, Craig Macdonald

Search systems often employ a re-ranking pipeline, wherein documents (or passages) from an initial pool of candidates are assigned new ranking scores.

Passage Ranking Re-Ranking +1

CODEC: Complex Document and Entity Collection

2 code implementations 9 May 2022 Iain Mackie, Paul Owoicho, Carlos Gemmell, Sophie Fischer, Sean MacAvaney, Jeffrey Dalton

We also show that the manual query reformulations significantly improve document ranking and entity ranking performance.

Document Ranking Re-Ranking +1

On Survivorship Bias in MS MARCO

1 code implementation 27 Apr 2022 Prashansa Gupta, Sean MacAvaney

We observe that this bias could be present in the popular MS MARCO dataset, given that annotators could not find answers to 38-45% of the queries, leading to these queries being discarded in training and evaluation processes.

valid

Reproducing Personalised Session Search over the AOL Query Log

no code implementations 21 Jan 2022 Sean MacAvaney, Craig Macdonald, Iadh Ounis

Given that web documents are prone to change over time, we study the differences present between a version of the corpus containing documents as they appeared in 2017 (which has been used by several recent works) and a new version we construct that includes documents close to as they appeared at the time the query log was produced (2006).

Session Search

Streamlining Evaluation with ir-measures

1 code implementation 26 Nov 2021 Sean MacAvaney, Craig Macdonald, Iadh Ounis

We present ir-measures, a new tool that makes it convenient to calculate a diverse set of evaluation measures used in information retrieval.

Information Retrieval Retrieval

Max-Utility Based Arm Selection Strategy For Sequential Query Recommendations

no code implementations 31 Aug 2021 Shameem A. Puthiya Parambath, Christos Anagnostopoulos, Roderick Murray-Smith, Sean MacAvaney, Evangelos Zervas

We show that such a selection strategy often results in higher cumulative regret and to this end, we propose a selection strategy based on the maximum utility of the arms.

Multi-Armed Bandits

IntenT5: Search Result Diversification using Causal Language Models

no code implementations 9 Aug 2021 Sean MacAvaney, Craig Macdonald, Roderick Murray-Smith, Iadh Ounis

Existing approaches often rely on massive query logs and interaction data to generate a variety of possible query intents, which then can be used to re-rank documents.

Causal Language Modeling Diversity +3

Goldilocks: Just-Right Tuning of BERT for Technology-Assisted Review

no code implementations 3 May 2021 Eugene Yang, Sean MacAvaney, David D. Lewis, Ophir Frieder

We indeed find that the pre-trained BERT model reduces review cost by 10% to 15% in TAR workflows simulated on the RCV1-v2 newswire collection.

Active Learning Language Modeling +5

ToxCCIn: Toxic Content Classification with Interpretability

no code implementations EACL (WASSA) 2021 Tong Xiang, Sean MacAvaney, Eugene Yang, Nazli Goharian

Despite the recent successes of transformer-based models in terms of effectiveness on a variety of tasks, their decisions often remain opaque to humans.

Classification General Classification

ABNIRML: Analyzing the Behavior of Neural IR Models

2 code implementations 2 Nov 2020 Sean MacAvaney, Sergey Feldman, Nazli Goharian, Doug Downey, Arman Cohan

Pretrained contextualized language models such as BERT and T5 have established a new state-of-the-art for ad-hoc search.

Diagnostic Language Modelling +1

SLEDGE-Z: A Zero-Shot Baseline for COVID-19 Literature Search

no code implementations EMNLP 2020 Sean MacAvaney, Arman Cohan, Nazli Goharian

With worldwide concerns surrounding the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there is a rapidly growing body of scientific literature on the virus.

Articles Re-Ranking

PARADE: Passage Representation Aggregation for Document Reranking

1 code implementation 20 Aug 2020 Canjia Li, Andrew Yates, Sean MacAvaney, Ben He, Yingfei Sun

In this work, we explore strategies for aggregating relevance signals from a document's passages into a final ranking score.

Document Ranking Knowledge Distillation +1

Interaction Matching for Long-Tail Multi-Label Classification

no code implementations 18 May 2020 Sean MacAvaney, Franck Dernoncourt, Walter Chang, Nazli Goharian, Ophir Frieder

We present an elegant and effective approach for addressing limitations in existing multi-label classification models by incorporating interaction matching, a concept shown to be useful for ad-hoc search result ranking.

Classification General Classification +2

SLEDGE: A Simple Yet Effective Baseline for COVID-19 Scientific Knowledge Search

1 code implementation 5 May 2020 Sean MacAvaney, Arman Cohan, Nazli Goharian

In this work, we present a search system called SLEDGE, which utilizes SciBERT to effectively re-rank articles.

Articles

Training Curricula for Open Domain Answer Re-Ranking

1 code implementation 29 Apr 2020 Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder

We show that the proposed heuristics can be used to build a training curriculum that down-weights difficult samples early in the training process.

Re-Ranking

Expansion via Prediction of Importance with Contextualization

1 code implementation 29 Apr 2020 Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder

We also observe that the performance is additive with the current leading first-stage retrieval methods, further narrowing the gap between inexpensive and cost-prohibitive passage ranking approaches.

Language Modeling Language Modelling +4

Ranking Significant Discrepancies in Clinical Reports

no code implementations 18 Jan 2020 Sean MacAvaney, Arman Cohan, Nazli Goharian, Ross Filice

This allows medical practitioners to easily identify and learn from the reports in which their interpretation most substantially differed from that of the attending physician (who finalized the report).

Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning

1 code implementation 30 Dec 2019 Sean MacAvaney, Luca Soldaini, Nazli Goharian

While billions of non-English speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages.

Ad-Hoc Information Retrieval Information Retrieval +2

Ontology-Aware Clinical Abstractive Summarization

no code implementations 14 May 2019 Sean MacAvaney, Sajad Sotudeh, Arman Cohan, Nazli Goharian, Ish Talati, Ross W. Filice

Automatically generating accurate summaries from clinical reports could save a clinician's time, improve summary coverage, and reduce errors.

Abstractive Text Summarization

A Deeper Look into Dependency-Based Word Embeddings

no code implementations NAACL 2018 Sean MacAvaney, Amir Zeldes

We investigate the effect of various dependency-based word embeddings on distinguishing between functional and domain similarity, word similarity rankings, and two downstream tasks in English.

Word Embeddings Word Similarity

Content-Based Weak Supervision for Ad-Hoc Re-Ranking

1 code implementation 1 Jul 2017 Sean MacAvaney, Andrew Yates, Kai Hui, Ophir Frieder

One challenge with neural ranking is the need for a large amount of manually-labeled relevance judgments for training.

Information Retrieval Re-Ranking
