1 code implementation • 29 Apr 2024 • Shengyao Zhuang, Xueguang Ma, Bevan Koopman, Jimmy Lin, Guido Zuccon
The current use of large language models (LLMs) for zero-shot document ranking follows one of two approaches: 1) prompt-based re-ranking methods, which require no further training but are only feasible for re-ranking a handful of candidate documents due to the associated computational costs; and 2) unsupervised contrastively trained dense retrieval methods, which can retrieve relevant documents from the entire corpus but require a large amount of paired text data for contrastive training.
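The cost argument behind the first approach can be made concrete with a minimal sketch of pointwise prompt-based re-ranking: one relevance prompt, and hence one LLM inference, is needed per candidate. The prompt wording below is a hypothetical placeholder, not the template of any specific method.

```python
def pointwise_prompts(query, candidates):
    """Build one relevance-judgement prompt per candidate passage.

    Each prompt requires a separate LLM inference, which is why
    prompt-based re-ranking is only feasible for a handful of candidates.
    """
    template = ("Passage: {passage}\n"
                "Query: {query}\n"
                "Is the passage relevant to the query? Answer Yes or No.")
    return [template.format(passage=p, query=query) for p in candidates]
```

Re-ranking the top-1000 results of a first-stage retriever this way would already require 1000 LLM calls per query, whereas a dense retriever encodes the query once and scores the whole corpus with vector search.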
2 code implementations • 10 Apr 2024 • Ferdinand Schlatt, Maik Fröbe, Harrisen Scells, Shengyao Zhuang, Bevan Koopman, Guido Zuccon, Benno Stein, Martin Potthast, Matthias Hagen
Cross-encoders are effective passage re-rankers.
1 code implementation • 20 Feb 2024 • Shengyao Zhuang, Bevan Koopman, Xiaoran Chu, Guido Zuccon
In this paper, we investigate various aspects of embedding models that could influence the recoverability of text using Vec2Text.
no code implementations • 19 Feb 2024 • Shuai Wang, Ekaterina Khramtsova, Shengyao Zhuang, Guido Zuccon
Federated search systems aggregate results from multiple search engines, selecting appropriate sources to enhance result quality and align with user intent.
no code implementations • 19 Feb 2024 • Shuai Wang, Shengyao Zhuang, Guido Zuccon
In this respect, we identify three avenues, each characterised by different trade-offs in terms of computational cost, effectiveness and robustness: (1) use LLMs to stem the vocabulary for a collection, i.e., the set of unique words that appear in the collection (vocabulary stemming), (2) use LLMs to stem each document separately (contextual stemming), and (3) use LLMs to extract from each document entities that should not be stemmed, then use vocabulary stemming to stem the rest of the terms (entity-based contextual stemming).
no code implementations • 7 Feb 2024 • Ekaterina Khramtsova, Shengyao Zhuang, Mahsa Baktashmotlagh, Guido Zuccon
Existing methodologies for ranking dense retrievers fall short in addressing these domain shift scenarios.
no code implementations • 31 Jan 2024 • Shuai Wang, Shengyao Zhuang, Bevan Koopman, Guido Zuccon
Our ReSLLM method exploits LLMs to drive the selection of resources in federated search in a zero-shot setting.
no code implementations • 12 Jan 2024 • Shuai Wang, Harrisen Scells, Shengyao Zhuang, Martin Potthast, Bevan Koopman, Guido Zuccon
Systematic reviews are crucial for evidence-based medicine as they comprehensively analyse published research findings on specific questions.
no code implementations • 3 Jan 2024 • Shengyao Zhuang, Bevan Koopman, Guido Zuccon
We describe the approach of team ielab, from CSIRO and The University of Queensland, to the 2023 TREC Clinical Trials Track.
1 code implementation • 20 Oct 2023 • Shengyao Zhuang, Bing Liu, Bevan Koopman, Guido Zuccon
In the field of information retrieval, Query Likelihood Models (QLMs) rank documents based on the probability of generating the query given the content of a document.
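A classical instance of this idea scores a document by the log likelihood of the query under a unigram language model estimated from the document, smoothed with collection statistics. The sketch below uses standard Dirichlet smoothing; the toy corpus and the value of `mu` are illustrative assumptions, not from the paper.

```python
import math
from collections import Counter

def qlm_score(query_terms, doc_terms, collection_terms, mu=2000.0):
    """Log query likelihood log P(q|d) under a unigram document
    language model with Dirichlet smoothing (mu = smoothing parameter)."""
    doc_tf = Counter(doc_terms)
    coll_tf = Counter(collection_terms)
    coll_len = len(collection_terms)
    score = 0.0
    for t in query_terms:
        p_coll = coll_tf[t] / coll_len                    # background model
        p = (doc_tf[t] + mu * p_coll) / (len(doc_terms) + mu)
        # terms unseen in both document and collection get zero probability
        score += math.log(p) if p > 0 else float("-inf")
    return score
```

A document containing the query term scores (slightly, under heavy smoothing) higher than one that does not, so ranking by this score surfaces the matching document first.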
1 code implementation • 14 Oct 2023 • Shengyao Zhuang, Honglei Zhuang, Bevan Koopman, Guido Zuccon
Our approach reduces the number of LLM inferences and the amount of prompt token consumption during the ranking procedure, significantly improving the efficiency of LLM-based zero-shot ranking.
no code implementations • 18 Sep 2023 • Ekaterina Khramtsova, Shengyao Zhuang, Mahsa Baktashmotlagh, Xi Wang, Guido Zuccon
We propose the new problem of choosing which dense retrieval model to use when searching on a new collection for which no labels are available, i.e., in a zero-shot setting.
1 code implementation • 29 Jun 2023 • Guido Zuccon, Harrisen Scells, Shengyao Zhuang
As in other fields of artificial intelligence, the information retrieval community has grown interested in investigating the power consumption associated with neural models, particularly models of search.
1 code implementation • 29 Jun 2023 • Joel Mackenzie, Shengyao Zhuang, Guido Zuccon
The SPLADE (SParse Lexical AnD Expansion) model is a highly effective approach to learned sparse retrieval, where documents are represented by term impact scores derived from large language models.
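The scoring side of such a learned sparse model reduces to a dot product over sparse term-impact vectors, which is what makes it amenable to inverted-index retrieval. The impact weights below are made-up numbers for illustration; a real SPLADE model predicts them (including weights for expansion terms not present in the text) with a language model.

```python
def impact_score(query_vec, doc_vec):
    """Score = dot product of sparse term-impact vectors,
    i.e. a sum over the vocabulary terms the query and document share."""
    return sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())

# hypothetical impact vectors keyed by vocabulary term
query_vec = {"neural": 1.2, "search": 0.8}
doc_vec = {"neural": 2.0, "retrieval": 1.5, "search": 0.5}

impact_score(query_vec, doc_vec)  # → 1.2*2.0 + 0.8*0.5 = 2.8
```

Because only the non-zero entries matter, this computation maps directly onto posting-list traversal in a standard inverted index.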
1 code implementation • 6 May 2023 • Shengyao Zhuang, Linjun Shou, Guido Zuccon
Effective cross-lingual dense retrieval methods that rely on multilingual pre-trained language models (PLMs) need to be trained to encompass both the relevance matching task and the cross-language alignment task.
1 code implementation • 17 Apr 2023 • Shengyao Zhuang, Linjun Shou, Jian Pei, Ming Gong, Houxing Ren, Guido Zuccon, Daxin Jiang
To address this challenge, we propose ToRoDer (TypOs-aware bottlenecked pre-training for RObust DEnse Retrieval), a novel re-training strategy for DRs that increases their robustness to misspelled queries while preserving their effectiveness in downstream retrieval tasks.
1 code implementation • 21 Dec 2022 • Bevan Koopman, Ahmed Mourad, Hang Li, Anton van der Vegt, Shengyao Zhuang, Simon Gibson, Yash Dang, David Lawrence, Guido Zuccon
On the basis of these needs we release an information retrieval test collection comprising real questions, a large collection of scientific documents split in passages, and ground truth relevance assessments indicating which passages are relevant to each question.
1 code implementation • 21 Jun 2022 • Shengyao Zhuang, Houxing Ren, Linjun Shou, Jian Pei, Ming Gong, Guido Zuccon, Daxin Jiang
This problem is further exacerbated when using DSI for cross-lingual retrieval, where document text and query text are in different languages.
no code implementations • 30 Apr 2022 • Hang Li, Shuai Wang, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, Guido Zuccon
In this paper we consider the problem of combining the relevance signals from sparse and dense retrievers in the context of Pseudo Relevance Feedback (PRF).
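One generic way to combine the two signals is linear interpolation of normalised run scores; this sketch shows that general pattern under assumed min-max normalisation and an assumed weight `alpha`, not necessarily the exact fusion the paper evaluates.

```python
def fuse(sparse_scores, dense_scores, alpha=0.7):
    """Min-max normalise each run, then linearly interpolate:
    alpha weights the sparse signal, (1 - alpha) the dense one."""
    def minmax(scores):
        lo, hi = min(scores.values()), max(scores.values())
        return {d: (s - lo) / (hi - lo) if hi > lo else 0.0
                for d, s in scores.items()}
    ns, nd = minmax(sparse_scores), minmax(dense_scores)
    # documents retrieved by only one run contribute a zero for the other
    return {d: alpha * ns.get(d, 0.0) + (1 - alpha) * nd.get(d, 0.0)
            for d in set(ns) | set(nd)}
```

Normalising first matters because sparse (e.g. BM25) and dense (cosine/inner-product) scores live on very different scales and would otherwise dominate or vanish in the sum.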
1 code implementation • 1 Apr 2022 • Shengyao Zhuang, Guido Zuccon
We then demonstrate that the root cause of this resides in the input tokenization strategy employed by BERT.
1 code implementation • 1 Apr 2022 • Shengyao Zhuang, Hang Li, Guido Zuccon
We then exploit such historic implicit interactions to improve the effectiveness of a DR. A key challenge that we study is the effect that biases in the click signal, such as position bias, have on the DRs.
1 code implementation • 25 Feb 2022 • Shengyao Zhuang, Guido Zuccon
A simple and efficient strategy to validate deep learning checkpoints is the addition of validation loops to execute during training.
1 code implementation • 5 Jan 2022 • Shengyao Zhuang, Zhihao Qiao, Guido Zuccon
Online learning to rank (OLTR) aims to learn a ranker directly from implicit feedback derived from users' interactions, such as clicks.
1 code implementation • 13 Dec 2021 • Hang Li, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, Guido Zuccon
Finally, we contribute a study of the generalisability of the ANCE-PRF method when dense retrievers other than ANCE are used for the first round of retrieval and for encoding the PRF signal.
2 code implementations • EMNLP 2021 • Shengyao Zhuang, Guido Zuccon
Our experimental results on the MS MARCO passage ranking dataset show that, with our proposed typos-aware training, DR and BERT re-ranker can become robust to typos in queries, resulting in significantly improved effectiveness compared to models trained without appropriately accounting for typos.
1 code implementation • 25 Aug 2021 • Hang Li, Ahmed Mourad, Shengyao Zhuang, Bevan Koopman, Guido Zuccon
Text-based PRF results show that the use of PRF had a mixed effect on deep rerankers across different datasets.
1 code implementation • 19 Aug 2021 • Shengyao Zhuang, Guido Zuccon
BERT-based information retrieval models are expensive, in both time (query latency) and computational resources (energy, hardware cost), making many of these models impractical especially under resource constraints.