1 code implementation • 29 Apr 2024 • Shengyao Zhuang, Xueguang Ma, Bevan Koopman, Jimmy Lin, Guido Zuccon
The current use of large language models (LLMs) for zero-shot document ranking follows one of two approaches: 1) prompt-based re-ranking methods, which require no further training but are only feasible for re-ranking a handful of candidate documents due to the associated computational costs; and 2) unsupervised contrastively trained dense retrieval methods, which can retrieve relevant documents from the entire corpus but require a large amount of paired text data for contrastive training.
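The cost argument behind the first approach can be made concrete with a minimal sketch of pointwise prompt-based re-ranking: one relevance prompt, and hence one LLM inference, is needed per candidate. The prompt wording below is a hypothetical placeholder, not the template of any specific method.

```python
def pointwise_prompts(query, candidates):
    """Build one relevance-judgement prompt per candidate passage.

    Each prompt requires a separate LLM inference, which is why
    prompt-based re-ranking is only feasible for a handful of candidates.
    """
    template = ("Passage: {passage}\n"
                "Query: {query}\n"
                "Is the passage relevant to the query? Answer Yes or No.")
    return [template.format(passage=p, query=query) for p in candidates]
```

Re-ranking the top-1000 results of a first-stage retriever this way would already require 1000 LLM calls per query, whereas a dense retriever encodes the query once and scores the whole corpus with vector search.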
2 code implementations • 10 Apr 2024 • Ferdinand Schlatt, Maik Fröbe, Harrisen Scells, Shengyao Zhuang, Bevan Koopman, Guido Zuccon, Benno Stein, Martin Potthast, Matthias Hagen
Cross-encoders are effective passage re-rankers.
1 code implementation • 20 Feb 2024 • Shengyao Zhuang, Bevan Koopman, Xiaoran Chu, Guido Zuccon
In this paper, we investigate various aspects of embedding models that could influence the recoverability of text using Vec2Text.
no code implementations • 19 Feb 2024 • Shuai Wang, Ekaterina Khramtsova, Shengyao Zhuang, Guido Zuccon
Federated search systems aggregate results from multiple search engines, selecting appropriate sources to enhance result quality and align with user intent.
no code implementations • 19 Feb 2024 • Shuai Wang, Shengyao Zhuang, Guido Zuccon
In this respect, we identify three avenues, each characterised by different trade-offs in terms of computational cost, effectiveness and robustness: (1) use LLMs to stem the vocabulary for a collection, i.e., the set of unique words that appear in the collection (vocabulary stemming), (2) use LLMs to stem each document separately (contextual stemming), and (3) use LLMs to extract from each document entities that should not be stemmed, then use vocabulary stemming to stem the rest of the terms (entity-based contextual stemming).
no code implementations • 7 Feb 2024 • Ekaterina Khramtsova, Shengyao Zhuang, Mahsa Baktashmotlagh, Guido Zuccon
Existing methodologies for ranking dense retrievers fall short in addressing these domain shift scenarios.
no code implementations • 31 Jan 2024 • Shuai Wang, Shengyao Zhuang, Bevan Koopman, Guido Zuccon
Our ReSLLM method exploits LLMs to drive the selection of resources in federated search in a zero-shot setting.
no code implementations • 12 Jan 2024 • Shuai Wang, Harrisen Scells, Shengyao Zhuang, Martin Potthast, Bevan Koopman, Guido Zuccon
Systematic reviews are crucial for evidence-based medicine as they comprehensively analyse published research findings on specific questions.
no code implementations • 3 Jan 2024 • Shengyao Zhuang, Bevan Koopman, Guido Zuccon
We describe the approach of team ielab, from CSIRO and The University of Queensland, to the 2023 TREC Clinical Trials Track.
1 code implementation • 20 Oct 2023 • Shengyao Zhuang, Bing Liu, Bevan Koopman, Guido Zuccon
In the field of information retrieval, Query Likelihood Models (QLMs) rank documents based on the probability of generating the query given the content of a document.
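A classical instance of this idea scores a document by the log likelihood of the query under a unigram language model estimated from the document, smoothed with collection statistics. The sketch below uses standard Dirichlet smoothing; the toy corpus and the value of `mu` are illustrative assumptions, not from the paper.

```python
import math
from collections import Counter

def qlm_score(query_terms, doc_terms, collection_terms, mu=2000.0):
    """Log query likelihood log P(q|d) under a unigram document
    language model with Dirichlet smoothing (mu = smoothing parameter)."""
    doc_tf = Counter(doc_terms)
    coll_tf = Counter(collection_terms)
    coll_len = len(collection_terms)
    score = 0.0
    for t in query_terms:
        p_coll = coll_tf[t] / coll_len                    # background model
        p = (doc_tf[t] + mu * p_coll) / (len(doc_terms) + mu)
        # terms unseen in both document and collection get zero probability
        score += math.log(p) if p > 0 else float("-inf")
    return score
```

A document containing the query term scores (slightly, under heavy smoothing) higher than one that does not, so ranking by this score surfaces the matching document first.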
1 code implementation • 14 Oct 2023 • Shengyao Zhuang, Honglei Zhuang, Bevan Koopman, Guido Zuccon
Our approach reduces the number of LLM inferences and the amount of prompt token consumption during the ranking procedure, significantly improving the efficiency of LLM-based zero-shot ranking.
no code implementations • 18 Sep 2023 • Ekaterina Khramtsova, Shengyao Zhuang, Mahsa Baktashmotlagh, Xi Wang, Guido Zuccon
We propose the new problem of choosing which dense retrieval model to use when searching on a new collection for which no labels are available, i.e., in a zero-shot setting.
1 code implementation • 29 Jun 2023 • Guido Zuccon, Harrisen Scells, Shengyao Zhuang
As in other fields of artificial intelligence, the information retrieval community has grown interested in investigating the power consumption associated with neural models, particularly models of search.
1 code implementation • 29 Jun 2023 • Joel Mackenzie, Shengyao Zhuang, Guido Zuccon
The SPLADE (SParse Lexical AnD Expansion) model is a highly effective approach to learned sparse retrieval, where documents are represented by term impact scores derived from large language models.
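The scoring side of such a learned sparse model reduces to a dot product over sparse term-impact vectors, which is what makes it amenable to inverted-index retrieval. The impact weights below are made-up numbers for illustration; a real SPLADE model predicts them (including weights for expansion terms not present in the text) with a language model.

```python
def impact_score(query_vec, doc_vec):
    """Score = dot product of sparse term-impact vectors,
    i.e. a sum over the vocabulary terms the query and document share."""
    return sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())

# hypothetical impact vectors keyed by vocabulary term
query_vec = {"neural": 1.2, "search": 0.8}
doc_vec = {"neural": 2.0, "retrieval": 1.5, "search": 0.5}

impact_score(query_vec, doc_vec)  # → 1.2*2.0 + 0.8*0.5 = 2.8
```

Because only the non-zero entries matter, this computation maps directly onto posting-list traversal in a standard inverted index.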
1 code implementation • 6 May 2023 • Shengyao Zhuang, Linjun Shou, Guido Zuccon
Effective cross-lingual dense retrieval methods that rely on multilingual pre-trained language models (PLMs) need to be trained to encompass both the relevance matching task and the cross-language alignment task.
1 code implementation • 17 Apr 2023 • Shengyao Zhuang, Linjun Shou, Jian Pei, Ming Gong, Houxing Ren, Guido Zuccon, Daxin Jiang
To address this challenge, we propose ToRoDer (TypOs-aware bottlenecked pre-training for RObust DEnse Retrieval), a novel re-training strategy for DRs that increases their robustness to misspelled queries while preserving their effectiveness in downstream retrieval tasks.
1 code implementation • 21 Dec 2022 • Bevan Koopman, Ahmed Mourad, Hang Li, Anton van der Vegt, Shengyao Zhuang, Simon Gibson, Yash Dang, David Lawrence, Guido Zuccon
On the basis of these needs we release an information retrieval test collection comprising real questions, a large collection of scientific documents split in passages, and ground truth relevance assessments indicating which passages are relevant to each question.
1 code implementation • 21 Jun 2022 • Shengyao Zhuang, Houxing Ren, Linjun Shou, Jian Pei, Ming Gong, Guido Zuccon, Daxin Jiang
This problem is further exacerbated when using DSI for cross-lingual retrieval, where document text and query text are in different languages.
no code implementations • 30 Apr 2022 • Hang Li, Shuai Wang, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, Guido Zuccon
In this paper we consider the problem of combining the relevance signals from sparse and dense retrievers in the context of Pseudo Relevance Feedback (PRF).
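One generic way to combine the two signals is linear interpolation of normalised run scores; this sketch shows that general pattern under assumed min-max normalisation and an assumed weight `alpha`, not necessarily the exact fusion the paper evaluates.

```python
def fuse(sparse_scores, dense_scores, alpha=0.7):
    """Min-max normalise each run, then linearly interpolate:
    alpha weights the sparse signal, (1 - alpha) the dense one."""
    def minmax(scores):
        lo, hi = min(scores.values()), max(scores.values())
        return {d: (s - lo) / (hi - lo) if hi > lo else 0.0
                for d, s in scores.items()}
    ns, nd = minmax(sparse_scores), minmax(dense_scores)
    # documents retrieved by only one run contribute a zero for the other
    return {d: alpha * ns.get(d, 0.0) + (1 - alpha) * nd.get(d, 0.0)
            for d in set(ns) | set(nd)}
```

Normalising first matters because sparse (e.g. BM25) and dense (cosine/inner-product) scores live on very different scales and would otherwise dominate or vanish in the sum.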
1 code implementation • 1 Apr 2022 • Shengyao Zhuang, Guido Zuccon
We then demonstrate that the root cause of this resides in the input tokenization strategy employed by BERT.
1 code implementation • 1 Apr 2022 • Shengyao Zhuang, Hang Li, Guido Zuccon
We then exploit such historic implicit interactions to improve the effectiveness of a DR. A key challenge that we study is the effect that biases in the click signal, such as position bias, have on the DRs.
1 code implementation • 25 Feb 2022 • Shengyao Zhuang, Guido Zuccon
A simple and efficient strategy to validate deep learning checkpoints is the addition of validation loops to execute during training.
1 code implementation • 5 Jan 2022 • Shengyao Zhuang, Zhihao Qiao, Guido Zuccon
Online learning to rank (OLTR) aims to learn a ranker directly from implicit feedback derived from users' interactions, such as clicks.
1 code implementation • 13 Dec 2021 • Hang Li, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, Guido Zuccon
Finally, we contribute a study of the generalisability of the ANCE-PRF method when dense retrievers other than ANCE are used for the first round of retrieval and for encoding the PRF signal.
2 code implementations • EMNLP 2021 • Shengyao Zhuang, Guido Zuccon
Our experimental results on the MS MARCO passage ranking dataset show that, with our proposed typos-aware training, DR and BERT re-ranker can become robust to typos in queries, resulting in significantly improved effectiveness compared to models trained without appropriately accounting for typos.
1 code implementation • 25 Aug 2021 • Hang Li, Ahmed Mourad, Shengyao Zhuang, Bevan Koopman, Guido Zuccon
Text-based PRF results show that the use of PRF had a mixed effect on deep rerankers across different datasets.
1 code implementation • 19 Aug 2021 • Shengyao Zhuang, Guido Zuccon
BERT-based information retrieval models are expensive, in both time (query latency) and computational resources (energy, hardware cost), making many of these models impractical especially under resource constraints.