Search Results for author: Shengyao Zhuang

Found 26 papers, 18 papers with code

Understanding and Mitigating the Threat of Vec2Text to Dense Retrieval Systems

1 code implementation • 20 Feb 2024 • Shengyao Zhuang, Bevan Koopman, Xiaoran Chu, Guido Zuccon

In this paper, we investigate various aspects of embedding models that could influence the recoverability of text using Vec2Text.

Quantization, Retrieval

FeB4RAG: Evaluating Federated Search in the Context of Retrieval Augmented Generation

no code implementations • 19 Feb 2024 • Shuai Wang, Ekaterina Khramtsova, Shengyao Zhuang, Guido Zuccon

Federated search systems aggregate results from multiple search engines, selecting appropriate sources to enhance result quality and align with user intent.

Benchmarking, Chatbot +3

Large Language Models for Stemming: Promises, Pitfalls and Failures

no code implementations • 19 Feb 2024 • Shuai Wang, Shengyao Zhuang, Guido Zuccon

In this respect, we identify three avenues, each characterised by different trade-offs in computational cost, effectiveness, and robustness: (1) use LLMs to stem the vocabulary of a collection, i.e., the set of unique words that appear in the collection (vocabulary stemming); (2) use LLMs to stem each document separately (contextual stemming); and (3) use LLMs to extract from each document entities that should not be stemmed, then use vocabulary stemming to stem the remaining terms (entity-based contextual stemming).
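Avenue (1) can be sketched without an actual LLM: build the collection vocabulary, stem each unique word once, and apply the resulting mapping to every document. In the sketch below, the `toy_stem` suffix-stripper is a hypothetical stand-in for the LLM stemming call, purely for illustration.

```python
def toy_stem(word):
    # Stand-in for an LLM stemming call; crude suffix stripping for illustration.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def vocabulary_stemming(collection):
    # Avenue (1): stem each unique word once, then apply the mapping everywhere.
    vocab = {w for doc in collection for w in doc.split()}
    mapping = {w: toy_stem(w) for w in vocab}   # one "LLM call" per unique word
    return [" ".join(mapping[w] for w in doc.split()) for doc in collection]
```

The appeal of vocabulary stemming is that the number of LLM calls scales with the vocabulary size, not the collection size; the trade-off is that a word gets one stem regardless of context.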

Leveraging LLMs for Unsupervised Dense Retriever Ranking

no code implementations • 7 Feb 2024 • Ekaterina Khramtsova, Shengyao Zhuang, Mahsa Baktashmotlagh, Guido Zuccon

Existing methodologies for ranking dense retrievers fall short in addressing these domain shift scenarios.

ReSLLM: Large Language Models are Strong Resource Selectors for Federated Search

no code implementations • 31 Jan 2024 • Shuai Wang, Shengyao Zhuang, Bevan Koopman, Guido Zuccon

Our ReSLLM method exploits LLMs to drive the selection of resources in federated search in a zero-shot setting.

Zero-shot Generative Large Language Models for Systematic Review Screening Automation

no code implementations • 12 Jan 2024 • Shuai Wang, Harrisen Scells, Shengyao Zhuang, Martin Potthast, Bevan Koopman, Guido Zuccon

Systematic reviews are crucial for evidence-based medicine as they comprehensively analyse published research findings on specific questions.

Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking

1 code implementation • 20 Oct 2023 • Shengyao Zhuang, Bing Liu, Bevan Koopman, Guido Zuccon

In the field of information retrieval, Query Likelihood Models (QLMs) rank documents based on the probability of generating the query given the content of a document.

Document Ranking, Information Retrieval +3
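For illustration, the classical (non-LLM) form of this idea is a unigram language model with Dirichlet smoothing, which scores a document by log P(query | document). The paper itself studies zero-shot LLMs as QLMs; the sketch below only shows the underlying scoring principle, with the smoothing parameter `mu` chosen arbitrarily.

```python
import math
from collections import Counter

def qlm_score(query, doc, collection, mu=2000):
    """Log query likelihood under a unigram document model with Dirichlet smoothing."""
    doc_tf = Counter(doc.split())
    doc_len = sum(doc_tf.values())
    col_tf = Counter(w for d in collection for w in d.split())
    col_len = sum(col_tf.values())
    score = 0.0
    for term in query.split():
        p_col = col_tf[term] / col_len                        # background model P(t|C)
        smoothed = (doc_tf[term] + mu * p_col) / (doc_len + mu)
        if smoothed == 0.0:                                   # term unseen anywhere
            return float("-inf")
        score += math.log(smoothed)
    return score
```

Ranking then amounts to sorting documents by this score for a given query; an LLM-based QLM replaces the unigram model with the LLM's token-level probability of generating the query conditioned on the document.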

A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models

1 code implementation • 14 Oct 2023 • Shengyao Zhuang, Honglei Zhuang, Bevan Koopman, Guido Zuccon

Our approach reduces the number of LLM inferences and the amount of prompt token consumption during the ranking procedure, significantly improving the efficiency of LLM-based zero-shot ranking.

Document Ranking
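A rough sketch of the setwise idea, under the assumption that one prompt asks the LLM to pick the most relevant document out of a small candidate set. Here a hidden score table stands in for the LLM call, and the tournament structure is illustrative, not the paper's exact algorithm.

```python
def pick_best(docs, scores):
    # Stand-in for one LLM prompt that returns the most relevant document
    # in the candidate set; a hidden score table plays the oracle here.
    return max(docs, key=scores.get)

def setwise_topk(docs, scores, k=2, set_size=3):
    """Rank the top-k documents, counting one 'LLM call' per candidate set."""
    k = min(k, len(docs))
    ranked, calls = [], 0
    while len(ranked) < k:
        remaining = [d for d in docs if d not in ranked]
        # Tournament: each prompt picks the winner of one small set.
        while len(remaining) > 1:
            winners = []
            for i in range(0, len(remaining), set_size):
                winners.append(pick_best(remaining[i:i + set_size], scores))
                calls += 1
            remaining = winners
        ranked.append(remaining[0])
    return ranked, calls
```

With `set_size` candidates per prompt instead of two, each tournament round eliminates more documents per call than a pairwise comparison would, which is where the inference savings come from.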

Selecting which Dense Retriever to use for Zero-Shot Search

no code implementations • 18 Sep 2023 • Ekaterina Khramtsova, Shengyao Zhuang, Mahsa Baktashmotlagh, Xi Wang, Guido Zuccon

We propose the new problem of choosing which dense retrieval model to use when searching a new collection for which no labels are available, i.e., in a zero-shot setting.

Information Retrieval, Retrieval

Beyond CO2 Emissions: The Overlooked Impact of Water Consumption of Information Retrieval Models

1 code implementation • 29 Jun 2023 • Guido Zuccon, Harrisen Scells, Shengyao Zhuang

As in other fields of artificial intelligence, the information retrieval community has grown interested in investigating the power consumption associated with neural models, particularly models of search.

Information Retrieval, Retrieval

Exploring the Representation Power of SPLADE Models

1 code implementation • 29 Jun 2023 • Joel Mackenzie, Shengyao Zhuang, Guido Zuccon

The SPLADE (SParse Lexical AnD Expansion) model is a highly effective approach to learned sparse retrieval, where documents are represented by term impact scores derived from large language models.


Augmenting Passage Representations with Query Generation for Enhanced Cross-Lingual Dense Retrieval

1 code implementation • 6 May 2023 • Shengyao Zhuang, Linjun Shou, Guido Zuccon

Effective cross-lingual dense retrieval methods that rely on multilingual pre-trained language models (PLMs) need to be trained to encompass both the relevance matching task and the cross-language alignment task.

Cross-Lingual Information Retrieval, Retrieval

Typos-aware Bottlenecked Pre-Training for Robust Dense Retrieval

1 code implementation • 17 Apr 2023 • Shengyao Zhuang, Linjun Shou, Jian Pei, Ming Gong, Houxing Ren, Guido Zuccon, Daxin Jiang

To address this challenge, we propose ToRoDer (TypOs-aware bottlenecked pre-training for RObust DEnse Retrieval), a novel re-training strategy for DRs that increases their robustness to misspelled queries while preserving their effectiveness in downstream retrieval tasks.

Language Modelling, Retrieval

AgAsk: An Agent to Help Answer Farmer's Questions From Scientific Documents

1 code implementation • 21 Dec 2022 • Bevan Koopman, Ahmed Mourad, Hang Li, Anton van der Vegt, Shengyao Zhuang, Simon Gibson, Yash Dang, David Lawrence, Guido Zuccon

On the basis of these needs we release an information retrieval test collection comprising real questions, a large collection of scientific documents split in passages, and ground truth relevance assessments indicating which passages are relevant to each question.

Information Retrieval, Retrieval

Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation

1 code implementation • 21 Jun 2022 • Shengyao Zhuang, Houxing Ren, Linjun Shou, Jian Pei, Ming Gong, Guido Zuccon, Daxin Jiang

This problem is further exacerbated when using DSI for cross-lingual retrieval, where document text and query text are in different languages.

Passage Retrieval, Retrieval

To Interpolate or not to Interpolate: PRF, Dense and Sparse Retrievers

no code implementations • 30 Apr 2022 • Hang Li, Shuai Wang, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, Guido Zuccon

In this paper we consider the problem of combining the relevance signals from sparse and dense retrievers in the context of Pseudo Relevance Feedback (PRF).

Information Retrieval, Language Modelling +1
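A common way to interpolate the two signals (not necessarily the exact scheme studied in the paper) is to min-max normalise each run and take a convex combination of the per-document scores; `alpha` is a hypothetical mixing weight.

```python
def min_max(scores):
    """Normalise a {doc_id: score} run into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    return {d: (s - lo) / (hi - lo) if hi > lo else 0.0 for d, s in scores.items()}

def interpolate(sparse, dense, alpha=0.5):
    """Fuse normalised sparse (e.g. BM25) and dense scores per document."""
    s, d = min_max(sparse), min_max(dense)
    docs = set(s) | set(d)
    # A document missing from one run contributes 0 from that signal.
    return {doc: alpha * s.get(doc, 0.0) + (1 - alpha) * d.get(doc, 0.0)
            for doc in docs}
```

Normalising first matters because sparse scores (e.g. BM25) and dense scores (e.g. inner products) live on very different scales.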

CharacterBERT and Self-Teaching for Improving the Robustness of Dense Retrievers on Queries with Typos

1 code implementation • 1 Apr 2022 • Shengyao Zhuang, Guido Zuccon

We then demonstrate that the root cause of this resides in the input tokenization strategy employed by BERT.

Passage Retrieval, Retrieval

Implicit Feedback for Dense Passage Retrieval: A Counterfactual Approach

1 code implementation • 1 Apr 2022 • Shengyao Zhuang, Hang Li, Guido Zuccon

We then exploit such historic implicit interactions to improve the effectiveness of a DR. A key challenge that we study is the effect that biases in the click signal, such as position bias, have on the DRs.

counterfactual, Passage Retrieval +2
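Position bias is commonly corrected with inverse propensity scoring: each click is weighted by the inverse of the probability that its rank was examined. The sketch below assumes logged impressions and known propensities; it illustrates the general counterfactual recipe, not the paper's specific estimator.

```python
def ips_relevance(clicks, propensity):
    """Unbiased relevance estimate per document: weight each observed click by
    the inverse of the examination propensity at the rank where it occurred."""
    totals = {}
    for doc, rank, clicked in clicks:          # one logged impression per row
        totals.setdefault(doc, 0.0)
        if clicked:
            totals[doc] += 1.0 / propensity[rank]
    return totals
```

Intuitively, a click at a rarely examined low rank is stronger evidence of relevance than a click at rank 1, so it receives a larger weight.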

Asyncval: A Toolkit for Asynchronously Validating Dense Retriever Checkpoints during Training

1 code implementation • 25 Feb 2022 • Shengyao Zhuang, Guido Zuccon

A simple and efficient strategy to validate deep learning checkpoints is the addition of validation loops to execute during training.

Natural Questions, Passage Retrieval +1

Reinforcement Online Learning to Rank with Unbiased Reward Shaping

1 code implementation • 5 Jan 2022 • Shengyao Zhuang, Zhihao Qiao, Guido Zuccon

Online learning to rank (OLTR) aims to learn a ranker directly from implicit feedback derived from users' interactions, such as clicks.

Learning-To-Rank, Position

Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback: A Reproducibility Study

1 code implementation • 13 Dec 2021 • Hang Li, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, Guido Zuccon

Finally, we contribute a study of the generalisability of the ANCE-PRF method when dense retrievers other than ANCE are used for the first round of retrieval and for encoding the PRF signal.


Dealing with Typos for BERT-based Passage Retrieval and Ranking

2 code implementations • EMNLP 2021 • Shengyao Zhuang, Guido Zuccon

Our experimental results on the MS MARCO passage ranking dataset show that, with our proposed typos-aware training, DR and BERT re-ranker can become robust to typos in queries, resulting in significantly improved effectiveness compared to models trained without appropriately accounting for typos.

Language Modelling, Open-Domain Question Answering +5
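Typos-aware training augments training queries with synthetic misspellings. A minimal character-level generator might look like the following; the three edit operations are illustrative assumptions, not necessarily the exact typo types used in the paper.

```python
import random

def make_typo(query, rng=random):
    """Inject one synthetic character-level typo (delete, swap, or duplicate)
    into a randomly chosen word of the query."""
    words = query.split()
    i = rng.randrange(len(words))
    w = words[i]
    if len(w) < 3:                         # leave very short words untouched
        return query
    j = rng.randrange(1, len(w) - 1)       # interior position only
    op = rng.choice(["delete", "swap", "duplicate"])
    if op == "delete":
        w = w[:j] + w[j + 1:]
    elif op == "swap":
        w = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    else:
        w = w[:j] + w[j] + w[j:]
    words[i] = w
    return " ".join(words)
```

During training, a fraction of queries would be passed through such a generator so the encoder sees both clean and misspelled variants of the same query.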

Pseudo Relevance Feedback with Deep Language Models and Dense Retrievers: Successes and Pitfalls

1 code implementation • 25 Aug 2021 • Hang Li, Ahmed Mourad, Shengyao Zhuang, Bevan Koopman, Guido Zuccon

Text-based PRF results show that the use of PRF had a mixed effect on deep rerankers across different datasets.


Fast Passage Re-ranking with Contextualized Exact Term Matching and Efficient Passage Expansion

1 code implementation • 19 Aug 2021 • Shengyao Zhuang, Guido Zuccon

BERT-based information retrieval models are expensive, in both time (query latency) and computational resources (energy, hardware cost), making many of these models impractical especially under resource constraints.

Information Retrieval, Passage Re-Ranking +2
