Search Results for author: Guido Zuccon

Found 53 papers, 30 papers with code

Understanding and Mitigating the Threat of Vec2Text to Dense Retrieval Systems

1 code implementation • 20 Feb 2024 • Shengyao Zhuang, Bevan Koopman, Xiaoran Chu, Guido Zuccon

In this paper, we investigate various aspects of embedding models that could influence the recoverability of text using Vec2Text.

Quantization, Retrieval
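
For context, Vec2Text reconstructs text from a dense embedding by iteratively proposing a hypothesis, re-embedding it, and letting a trained corrector model move the hypothesis closer to the target embedding. The sketch below only illustrates that loop; `embed` and `correct` are hypothetical stand-ins (a toy embedder and an identity corrector), not the authors' models.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a dense retriever's text encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(768)

def correct(hypothesis: str, target_emb: np.ndarray, hyp_emb: np.ndarray) -> str:
    """Hypothetical corrector: a real Vec2Text corrector is a seq2seq model that
    rewrites the hypothesis so it embeds closer to the target embedding."""
    return hypothesis  # placeholder: no actual correction is performed here

def invert_embedding(target_emb: np.ndarray, steps: int = 5) -> str:
    """Iterative refinement loop used by Vec2Text-style embedding inversion (sketch)."""
    hypothesis = ""  # start from an empty (or generated) hypothesis
    for _ in range(steps):
        hyp_emb = embed(hypothesis)
        cosine = np.dot(hyp_emb, target_emb) / (
            np.linalg.norm(hyp_emb) * np.linalg.norm(target_emb) + 1e-9)
        if cosine > 0.99:  # stop once the hypothesis embeds close to the target
            break
        hypothesis = correct(hypothesis, target_emb, hyp_emb)
    return hypothesis
```

In the actual attack, the corrector is trained on (embedding, text) pairs for the specific embedding model being inverted, which is why the paper studies which properties of the embedding model affect recoverability.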

FeB4RAG: Evaluating Federated Search in the Context of Retrieval Augmented Generation

no code implementations • 19 Feb 2024 • Shuai Wang, Ekaterina Khramtsova, Shengyao Zhuang, Guido Zuccon

Federated search systems aggregate results from multiple search engines, selecting appropriate sources to enhance result quality and align with user intent.

Benchmarking, Chatbot +3
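
As background, a federated search pipeline has three steps: resource selection (choose which engines to query), retrieval (forward the query to the selected engines), and result merging. The sketch below is a generic illustration of those steps with min-max score normalisation; the engine and selection functions are assumptions for illustration, not part of FeB4RAG.

```python
def normalise(results):
    """Min-max normalise one engine's result scores so engines are comparable."""
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    return [(doc, (s - lo) / (hi - lo) if hi > lo else 0.0) for doc, s in results]

def federated_search(query, engines, select_fn, select_k=3, merge_k=10):
    """engines: dict mapping a resource name to a search function returning a
    ranked list of (doc_id, score); select_fn scores how promising a resource
    is for this query (the resource-selection step)."""
    # 1) Resource selection: keep the most promising engines.
    selected = sorted(engines, key=lambda name: select_fn(query, name), reverse=True)[:select_k]
    # 2) Retrieval: forward the query to each selected engine and normalise its scores.
    per_engine = [normalise(engines[name](query)) for name in selected]
    # 3) Result merging: keep each document's best normalised score, return the global top-k.
    merged = {}
    for results in per_engine:
        for doc, score in results:
            merged[doc] = max(merged.get(doc, 0.0), score)
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:merge_k]
```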

Large Language Models for Stemming: Promises, Pitfalls and Failures

no code implementations • 19 Feb 2024 • Shuai Wang, Shengyao Zhuang, Guido Zuccon

In this respect, we identify three avenues, each characterised by different trade-offs in terms of computational cost, effectiveness and robustness: (1) use LLMs to stem the vocabulary for a collection, i.e., the set of unique words that appear in the collection (vocabulary stemming), (2) use LLMs to stem each document separately (contextual stemming), and (3) use LLMs to extract from each document entities that should not be stemmed, then use vocabulary stemming to stem the rest of the terms (entity-based contextual stemming).
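
A rough sketch of the first avenue (vocabulary stemming) follows: extract the collection vocabulary once, ask an LLM to map batches of words to their stems, then stem documents by dictionary lookup. The prompt and the `call_llm` helper are illustrative assumptions, not the paper's exact setup.

```python
import json
import re

def call_llm(prompt: str) -> str:
    """Hypothetical LLM completion call (e.g. wrapping an API or a local model)."""
    raise NotImplementedError

def vocabulary_stemming(docs, batch_size=100):
    # Build the collection vocabulary: the set of unique words in the collection.
    vocab = sorted({w for d in docs for w in re.findall(r"[a-z]+", d.lower())})
    stem_map = {}
    for i in range(0, len(vocab), batch_size):
        batch = vocab[i:i + batch_size]
        prompt = ("Map each word to its stem. Answer as a JSON object "
                  "{word: stem}.\nWords: " + ", ".join(batch))
        stem_map.update(json.loads(call_llm(prompt)))
    # Stem each document by dictionary lookup (done once per collection, so the
    # number of LLM calls scales with vocabulary size, not collection size).
    return [" ".join(stem_map.get(w, w) for w in re.findall(r"[a-z]+", d.lower()))
            for d in docs]
```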

Leveraging LLMs for Unsupervised Dense Retriever Ranking

no code implementations • 7 Feb 2024 • Ekaterina Khramtsova, Shengyao Zhuang, Mahsa Baktashmotlagh, Guido Zuccon

Existing methodologies for ranking dense retrievers fall short in addressing these domain shift scenarios.

ReSLLM: Large Language Models are Strong Resource Selectors for Federated Search

no code implementations • 31 Jan 2024 • Shuai Wang, Shengyao Zhuang, Bevan Koopman, Guido Zuccon

Our ReSLLM method exploits LLMs to drive the selection of resources in federated search in a zero-shot setting.
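
A minimal sketch of prompting an LLM to pick resources zero-shot is shown below; the prompt wording and the `call_llm` helper are assumptions for illustration, not the ReSLLM implementation.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical LLM completion call."""
    raise NotImplementedError

def select_resources(query: str, resources: dict, k: int = 3) -> list:
    """Zero-shot resource selection: ask the LLM which search engines (resources)
    are most likely to hold relevant results, given a short description of each."""
    listing = "\n".join(f"- {name}: {desc}" for name, desc in resources.items())
    prompt = (f"Query: {query}\n"
              f"Available search resources:\n{listing}\n"
              f"List the {k} resource names most likely to contain relevant results, "
              f"one per line, most relevant first.")
    answer = call_llm(prompt)
    # Keep only names that actually exist, preserving the LLM's ordering.
    ranked = [line.strip("- ").strip() for line in answer.splitlines()]
    return [name for name in ranked if name in resources][:k]
```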

TPRF: A Transformer-based Pseudo-Relevance Feedback Model for Efficient and Effective Retrieval

no code implementations • 24 Jan 2024 • Chuting Yu, Hang Li, Ahmed Mourad, Bevan Koopman, Guido Zuccon

This paper considers Pseudo-Relevance Feedback (PRF) methods for dense retrievers in a resource-constrained environment, such as cheap cloud instances or embedded systems (e.g., smartphones and smartwatches), where memory and CPU are limited and GPUs are not present.

Retrieval
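
For context, the simplest dense-retriever PRF baseline in this line of work (vector-based PRF) mixes the query embedding with the average embedding of the top-ranked feedback passages; TPRF instead learns this combination with a small Transformer. The sketch below shows only the baseline averaging, assuming precomputed embeddings.

```python
import numpy as np

def vector_prf(query_emb: np.ndarray, feedback_embs: list, alpha: float = 0.5) -> np.ndarray:
    """Rocchio-style pseudo-relevance feedback in embedding space: mix the original
    query vector with the mean of the top-k feedback passage vectors."""
    feedback_mean = np.mean(feedback_embs, axis=0)
    return alpha * query_emb + (1.0 - alpha) * feedback_mean

# Usage: embed the query, retrieve top-k passages with a dense retriever,
# then re-run retrieval with the updated query vector.
q = np.random.rand(768)
top_k = [np.random.rand(768) for _ in range(3)]
q_new = vector_prf(q, top_k)
```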

How to Forget Clients in Federated Online Learning to Rank?

1 code implementation • 24 Jan 2024 • Shuyi Wang, Bing Liu, Guido Zuccon

In a Federated Online Learning to Rank (FOLTR) system, a ranker is learned by aggregating local updates to the global ranking model.

Learning-To-Rank
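
The aggregation mentioned above is typically federated averaging: each client trains the ranker locally on its own interactions and sends back a model update, which the server averages (weighted by client data size) into the global ranker. A minimal sketch under that assumption:

```python
import numpy as np

def federated_averaging(global_weights: np.ndarray, client_updates: list,
                        client_sizes: list) -> np.ndarray:
    """Aggregate local updates (deltas w.r.t. the global ranker) into the global
    model, weighting each client by the amount of local interaction data it holds."""
    total = sum(client_sizes)
    weighted = sum((n / total) * u for u, n in zip(client_updates, client_sizes))
    return global_weights + weighted

# Each round: broadcast global_weights, clients train locally on their clicks,
# send back (local_weights - global_weights), and the server aggregates.
```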

A Reproducibility Study of Goldilocks: Just-Right Tuning of BERT for TAR

1 code implementation • 16 Jan 2024 • Xinyu Mao, Bevan Koopman, Guido Zuccon

In this context, we show that there is no need for further pre-training if a domain-specific BERT backbone is used within the active learning pipeline.

Active Learning, TAR +2

Zero-shot Generative Large Language Models for Systematic Review Screening Automation

no code implementations • 12 Jan 2024 • Shuai Wang, Harrisen Scells, Shengyao Zhuang, Martin Potthast, Bevan Koopman, Guido Zuccon

Systematic reviews are crucial for evidence-based medicine as they comprehensively analyse published research findings on specific questions.

Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking

1 code implementation • 20 Oct 2023 • Shengyao Zhuang, Bing Liu, Bevan Koopman, Guido Zuccon

In the field of information retrieval, Query Likelihood Models (QLMs) rank documents based on the probability of generating the query given the content of a document.

Document Ranking, Information Retrieval +3
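
The query likelihood score above can be computed with any open-source causal LLM by summing the log-probabilities of the query tokens conditioned on the document. The sketch below uses GPT-2 purely as a small stand-in, and the prompt wording is an illustrative assumption rather than the paper's exact prompt.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Small stand-in model; the paper uses much larger open-source LLMs.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def query_likelihood(document: str, query: str) -> float:
    """Sum of log P(query token | document prompt, previous query tokens)."""
    prompt = f"Passage: {document}\nPlease write a question based on this passage.\nQuestion:"
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    query_ids = tokenizer(" " + query, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, query_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids=input_ids).logits
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    score = 0.0
    for pos in range(prompt_ids.size(1), input_ids.size(1)):
        # The token at position `pos` is predicted by the logits at `pos - 1`.
        score += log_probs[0, pos - 1, input_ids[0, pos]].item()
    return score

docs = ["The Great Barrier Reef is the world's largest coral reef system.",
        "Python is a popular programming language."]
query = "where is the largest coral reef"
print(sorted(docs, key=lambda d: query_likelihood(d, query), reverse=True)[0])
```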

A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models

1 code implementation • 14 Oct 2023 • Shengyao Zhuang, Honglei Zhuang, Bevan Koopman, Guido Zuccon

Our approach reduces the number of LLM inferences and the amount of prompt token consumption during the ranking procedure, significantly improving the efficiency of LLM-based zero-shot ranking.

Document Ranking
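
In a setwise prompt the LLM sees several candidate passages at once and names the most relevant one, so a top-k ordering needs far fewer inference calls than pairwise comparisons. The sketch below is a simplified selection-style variant rather than the sorting procedures used in the paper; `pick_best` stands in for the LLM call and its prompt.

```python
def pick_best(query: str, candidates: list) -> int:
    """Hypothetical setwise LLM call: returns the index of the passage judged
    most relevant to the query among the given candidates."""
    raise NotImplementedError

def setwise_top_k(query: str, docs: list, k: int = 10, set_size: int = 4) -> list:
    """Selection-style top-k: each round compares `set_size` passages per LLM call
    instead of 2, reducing the number of inferences and prompt tokens."""
    remaining = list(docs)
    ranked = []
    while remaining and len(ranked) < k:
        best, idx = 0, 1
        while idx < len(remaining):
            # Compare the current best against the next (set_size - 1) candidates.
            group_idx = [best] + list(range(idx, min(idx + set_size - 1, len(remaining))))
            group = [remaining[i] for i in group_idx]
            best = group_idx[pick_best(query, group)]
            idx += set_size - 1
        ranked.append(remaining.pop(best))
    return ranked
```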

Selecting which Dense Retriever to use for Zero-Shot Search

no code implementations • 18 Sep 2023 • Ekaterina Khramtsova, Shengyao Zhuang, Mahsa Baktashmotlagh, Xi Wang, Guido Zuccon

We propose the new problem of choosing which dense retrieval model to use when searching on a new collection for which no labels are available, i.e., in a zero-shot setting.

Information Retrieval, Retrieval

ChatGPT Hallucinates when Attributing Answers

no code implementations • 17 Sep 2023 • Guido Zuccon, Bevan Koopman, Razia Shaik

We find that ChatGPT provides correct or partially correct answers in about half of the cases (50.6% of the time), but its suggested references only exist 14% of the time.

Attribute

Annotating Data for Fine-Tuning a Neural Ranker? Current Active Learning Strategies are not Better than Random Selection

no code implementations • 12 Sep 2023 • Sophia Althammer, Guido Zuccon, Sebastian Hofstätter, Suzan Verberne, Allan Hanbury

We further find that the gains provided by AL strategies come at the expense of more assessments (thus higher annotation costs), and that AL strategies underperform random selection when comparing effectiveness at a fixed annotation cost.

Active Learning, Domain Adaptation

Generating Natural Language Queries for More Effective Systematic Review Screening Prioritisation

1 code implementation • 11 Sep 2023 • Shuai Wang, Harrisen Scells, Martin Potthast, Bevan Koopman, Guido Zuccon

Our best approach is not only viable based on the information available at the time of screening, but also has similar effectiveness to the final title.

Natural Language Queries

An Analysis of Untargeted Poisoning Attack and Defense Methods for Federated Online Learning to Rank Systems

no code implementations • 4 Jul 2023 • Shuyi Wang, Guido Zuccon

For this, FOLTR trains learning to rank models in an online manner -- i.e., by exploiting users' interactions with the search system (queries, clicks) rather than labels -- and federatively -- i.e., by not aggregating interaction data in a central server for training purposes, but by training an instance of the model on each user device on its own private data, and then sharing the model updates, not the data, across the set of users that form the federation.

Federated Learning, Learning-To-Rank +1

Outcome-based Evaluation of Systematic Review Automation

no code implementations • 30 Jun 2023 • Wojciech Kusa, Guido Zuccon, Petr Knoth, Allan Hanbury

We find that accounting for the difference in review outcomes leads to a different assessment of the quality of a system than if traditional evaluation measures were used.

TAR

Exploring the Representation Power of SPLADE Models

1 code implementation • 29 Jun 2023 • Joel Mackenzie, Shengyao Zhuang, Guido Zuccon

The SPLADE (SParse Lexical AnD Expansion) model is a highly effective approach to learned sparse retrieval, where documents are represented by term impact scores derived from large language models.

Retrieval
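
Concretely, the term impact scores come from a masked-language-model head: SPLADE applies log(1 + ReLU(logit)) to the MLM logits and max-pools over token positions, yielding a sparse vector over the vocabulary. The sketch below illustrates that transformation with a plain bert-base-uncased checkpoint (no SPLADE fine-tuning), so the resulting weights are only illustrative.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Plain BERT MLM head as a stand-in; a real SPLADE checkpoint is fine-tuned for retrieval.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def splade_vector(text: str) -> dict:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)
    # SPLADE activation: log(1 + ReLU(logit)), max-pooled over the token positions.
    weights = torch.log1p(torch.relu(logits)).max(dim=1).values.squeeze(0)
    nonzero = weights.nonzero().squeeze(1)
    return {tokenizer.convert_ids_to_tokens(i.item()): weights[i].item() for i in nonzero}

vec = splade_vector("SPLADE represents documents by term impact scores.")
print(sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:10])
```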

Beyond CO2 Emissions: The Overlooked Impact of Water Consumption of Information Retrieval Models

1 code implementation • 29 Jun 2023 • Guido Zuccon, Harrisen Scells, Shengyao Zhuang

As in other fields of artificial intelligence, the information retrieval community has grown interested in investigating the power consumption associated with neural models, particularly models of search.

Information Retrieval, Retrieval

Augmenting Passage Representations with Query Generation for Enhanced Cross-Lingual Dense Retrieval

1 code implementation • 6 May 2023 • Shengyao Zhuang, Linjun Shou, Guido Zuccon

Effective cross-lingual dense retrieval methods that rely on multilingual pre-trained language models (PLMs) need to be trained to encompass both the relevance matching task and the cross-language alignment task.

Cross-Lingual Information Retrieval, Retrieval

Typos-aware Bottlenecked Pre-Training for Robust Dense Retrieval

1 code implementation • 17 Apr 2023 • Shengyao Zhuang, Linjun Shou, Jian Pei, Ming Gong, Houxing Ren, Guido Zuccon, Daxin Jiang

To address this challenge, we propose ToRoDer (TypOs-aware bottlenecked pre-training for RObust DEnse Retrieval), a novel re-training strategy for DRs that increases their robustness to misspelled queries while preserving their effectiveness in downstream retrieval tasks.

Language Modelling, Retrieval

Dr ChatGPT, tell me what I want to hear: How prompt knowledge impacts health answer correctness

no code implementations • 23 Feb 2023 • Guido Zuccon, Bevan Koopman

Aside from measuring the effectiveness of ChatGPT in this context, we show that the knowledge passed in the prompt can overturn the knowledge encoded in the model, and that this is, in our experiments, to the detriment of answer correctness.

Question Answering

Can ChatGPT Write a Good Boolean Query for Systematic Review Literature Search?

no code implementations • 3 Feb 2023 • Shuai Wang, Harrisen Scells, Bevan Koopman, Guido Zuccon

The ability of ChatGPT to follow complex instructions and generate queries with high precision makes it a valuable tool for researchers conducting systematic reviews, particularly for rapid reviews, where time is a constraint and trading off higher precision for lower recall is often acceptable.

AgAsk: An Agent to Help Answer Farmer's Questions From Scientific Documents

1 code implementation • 21 Dec 2022 • Bevan Koopman, Ahmed Mourad, Hang Li, Anton van der Vegt, Shengyao Zhuang, Simon Gibson, Yash Dang, David Lawrence, Guido Zuccon

On the basis of these needs we release an information retrieval test collection comprising real questions, a large collection of scientific documents split into passages, and ground truth relevance assessments indicating which passages are relevant to each question.

Information Retrieval, Retrieval

MeSH Suggester: A Library and System for MeSH Term Suggestion for Systematic Review Boolean Query Construction

1 code implementation • 18 Dec 2022 • Shuai Wang, Hang Li, Guido Zuccon

One challenge to creating an effective systematic review Boolean query is the selection of effective MeSH Terms to include in the query.

Guiding Neural Entity Alignment with Compatibility

1 code implementation • 29 Nov 2022 • Bing Liu, Harrisen Scells, Wen Hua, Guido Zuccon, Genghong Zhao, Xia Zhang

Making compatible predictions should thus be one of the goals of training an EA model, along with fitting the labelled data; this aspect, however, is neglected in current methods.

Entity Alignment, Knowledge Graphs

Dependency-aware Self-training for Entity Alignment

1 code implementation • 29 Nov 2022 • Bing Liu, Tiancheng Lan, Wen Hua, Guido Zuccon

Entity Alignment (EA), which aims to detect entity mappings (i.e., equivalent entity pairs) in different Knowledge Graphs (KGs), is critical for KG fusion.

Entity Alignment, Knowledge Graphs

Automated MeSH Term Suggestion for Effective Query Formulation in Systematic Reviews Literature Search

1 code implementation • 19 Sep 2022 • Shuai Wang, Harrisen Scells, Bevan Koopman, Guido Zuccon

However, identifying the correct MeSH terms to include in a query is difficult: information experts are often unfamiliar with the MeSH database and unsure about the appropriateness of MeSH terms for a query.

High-quality Task Division for Large-scale Entity Alignment

1 code implementation • 22 Aug 2022 • Bing Liu, Wen Hua, Guido Zuccon, Genghong Zhao, Xia Zhang

To include in the EA subtasks a high proportion of the potential mappings originally present in the large EA task, we devise a counterpart discovery method that exploits the locality principle of the EA task and the power of trained EA models.

Entity Alignment, Informativeness +1

Rethinking Persistent Homology for Visual Recognition

no code implementations • 9 Jul 2022 • Ekaterina Khramtsova, Guido Zuccon, Xi Wang, Mahsa Baktashmotlagh

This paper performs a detailed analysis of the effectiveness of topological properties for image classification in various training scenarios, defined by: the number of training samples, the complexity of the training data and the complexity of the backbone network.

Image Classification

Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation

1 code implementation • 21 Jun 2022 • Shengyao Zhuang, Houxing Ren, Linjun Shou, Jian Pei, Ming Gong, Guido Zuccon, Daxin Jiang

This problem is further exacerbated when using DSI for cross-lingual retrieval, where document text and query text are in different languages.

Passage Retrieval, Retrieval

How does Feedback Signal Quality Impact Effectiveness of Pseudo Relevance Feedback for Passage Retrieval?

no code implementations • 12 May 2022 • Hang Li, Ahmed Mourad, Bevan Koopman, Guido Zuccon

Pseudo-Relevance Feedback (PRF) assumes that the top results retrieved by a first-stage ranker are relevant to the original query and uses them to improve the query representation for a second round of retrieval.

Passage Retrieval, Retrieval

To Interpolate or not to Interpolate: PRF, Dense and Sparse Retrievers

no code implementations • 30 Apr 2022 • Hang Li, Shuai Wang, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, Guido Zuccon

In this paper we consider the problem of combining the relevance signals from sparse and dense retrievers in the context of Pseudo Relevance Feedback (PRF).

Information Retrieval, Language Modelling +1
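
A common way to combine the two signals is linear interpolation of normalised scores, score(d) = alpha * sparse(d) + (1 - alpha) * dense(d). The sketch below shows that baseline fusion; the exact normalisation and how the PRF signal is folded in are choices the paper studies, not fixed here.

```python
def minmax(scores: dict) -> dict:
    """Min-max normalise a run's scores so sparse and dense scores are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    return {d: (s - lo) / (hi - lo) if hi > lo else 0.0 for d, s in scores.items()}

def interpolate(sparse: dict, dense: dict, alpha: float = 0.5, k: int = 10):
    """Fuse BM25-style (sparse) and dense retriever scores for the same query."""
    sparse_n, dense_n = minmax(sparse), minmax(dense)
    docs = set(sparse_n) | set(dense_n)
    fused = {d: alpha * sparse_n.get(d, 0.0) + (1 - alpha) * dense_n.get(d, 0.0)
             for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:k]
```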

Is Non-IID Data a Threat in Federated Online Learning to Rank?

1 code implementation • 20 Apr 2022 • Shuyi Wang, Guido Zuccon

A well-known factor that affects the performance of federated learning systems, and that poses serious challenges to these approaches, is that there may be some type of bias in the way data is distributed across clients.

Federated Learning, Information Retrieval +2

CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos

1 code implementation • 1 Apr 2022 • Shengyao Zhuang, Guido Zuccon

We then demonstrate that the root cause of this resides in the input tokenization strategy employed by BERT.

Passage Retrieval, Retrieval

Implicit Feedback for Dense Passage Retrieval: A Counterfactual Approach

1 code implementation • 1 Apr 2022 • Shengyao Zhuang, Hang Li, Guido Zuccon

We then exploit such historic implicit interactions to improve the effectiveness of a DR. A key challenge that we study is the effect that biases in the click signal, such as position bias, have on the DRs.

Counterfactual, Passage Retrieval +2
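
A standard counterfactual correction for position bias is inverse propensity scoring: each click is re-weighted by the inverse of the probability that the user examined that rank, so clicks at low ranks are not unfairly discounted. The sketch below shows that re-weighting in isolation; the propensity model and how the weights enter DR training are assumptions, not the paper's exact recipe.

```python
def propensity(rank: int, eta: float = 1.0) -> float:
    """Simple position-bias model: probability the user examines a result at `rank`."""
    return (1.0 / rank) ** eta

def ips_weighted_clicks(clicked_ranks: list) -> list:
    """Inverse propensity scoring: weight each click by 1 / P(examined at its rank),
    yielding an (in expectation) unbiased relevance signal for training."""
    return [1.0 / propensity(r) for r in clicked_ranks]

# e.g. clicks observed at ranks 1, 3 and 8 contribute weights 1.0, 3.0 and 8.0
print(ips_weighted_clicks([1, 3, 8]))
```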

Asyncval: A Toolkit for Asynchronously Validating Dense Retriever Checkpoints during Training

1 code implementation • 25 Feb 2022 • Shengyao Zhuang, Guido Zuccon

A simple and efficient strategy to validate deep learning checkpoints is the addition of validation loops to execute during training.

Natural Questions, Passage Retrieval +1

Reinforcement Online Learning to Rank with Unbiased Reward Shaping

1 code implementation • 5 Jan 2022 • Shengyao Zhuang, Zhihao Qiao, Guido Zuccon

Online learning to rank (OLTR) aims to learn a ranker directly from implicit feedback derived from users' interactions, such as clicks.

Learning-To-Rank, Position

Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback: A Reproducibility Study

1 code implementation • 13 Dec 2021 • Hang Li, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, Guido Zuccon

Finally, we contribute a study of the generalisability of the ANCE-PRF method when dense retrievers other than ANCE are used for the first round of retrieval and for encoding the PRF signal.

Retrieval

Seed-driven Document Ranking for Systematic Reviews: A Reproducibility Study

1 code implementation • 8 Dec 2021 • Shuai Wang, Harrisen Scells, Ahmed Mourad, Guido Zuccon

Our results also indicate that our reproduced screening prioritisation method (1) is generalisable across datasets of similar and different topicality compared to the original implementation, (2) increases in effectiveness when multiple seed studies are used, thanks to our techniques that enable this, and (3) produces more stable rankings with multiple seed studies than with a single seed study.

Document Ranking

ActiveEA: Active Learning for Neural Entity Alignment

1 code implementation • EMNLP 2021 • Bing Liu, Harrisen Scells, Guido Zuccon, Wen Hua, Genghong Zhao

Entity Alignment (EA) aims to match equivalent entities across different Knowledge Graphs (KGs) and is an essential step of KG fusion.

Active Learning, Entity Alignment +1

Dealing with Typos for BERT-based Passage Retrieval and Ranking

2 code implementations • EMNLP 2021 • Shengyao Zhuang, Guido Zuccon

Our experimental results on the MS MARCO passage ranking dataset show that, with our proposed typos-aware training, DR and BERT re-ranker can become robust to typos in queries, resulting in significantly improved effectiveness compared to models trained without appropriately accounting for typos.

Language Modelling, Open-Domain Question Answering +5

Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls

1 code implementation • 25 Aug 2021 • Hang Li, Ahmed Mourad, Shengyao Zhuang, Bevan Koopman, Guido Zuccon

Text-based PRF results show that the use of PRF had a mixed effect on deep rerankers across different datasets.

Retrieval

Fast Passage Re-ranking with Contextualized Exact Term Matching and Efficient Passage Expansion

1 code implementation • 19 Aug 2021 • Shengyao Zhuang, Guido Zuccon

BERT-based information retrieval models are expensive, in both time (query latency) and computational resources (energy, hardware cost), making many of these models impractical especially under resource constraints.

Information Retrieval, Passage Re-Ranking +2

The Benefits of Word Embeddings Features for Active Learning in Clinical Information Extraction

no code implementations • ALTA 2016 • Mahnoosh Kholghi, Lance De Vine, Laurianne Sitbon, Guido Zuccon, Anthony Nguyen

This study investigates the use of unsupervised word embeddings and sequence features for sample representation in an active learning framework built to extract clinical concepts from clinical free text.

Active Learning, Informativeness +1

Building Evaluation Datasets for Consumer-Oriented Information Retrieval

no code implementations • LREC 2016 • Lorraine Goeuriot, Liadh Kelly, Guido Zuccon, Joao Palotti

In this paper we present the datasets created by the CLEF eHealth Lab from 2013 to 2015 for the evaluation of search solutions that support laypeople in finding health information online.

Information Retrieval, Retrieval
