no code implementations • 18 Sep 2023 • Ekaterina Khramtsova, Shengyao Zhuang, Mahsa Baktashmotlagh, Xi Wang, Guido Zuccon
We propose the new problem of choosing which dense retrieval model to use when searching on a new collection for which no labels are available, i.e., in a zero-shot setting.
no code implementations • 17 Sep 2023 • Guido Zuccon, Bevan Koopman, Razia Shaik
We find that ChatGPT provides correct or partially correct answers in about half of the cases (50.6% of the time), but the references it suggests exist only 14% of the time.
no code implementations • 12 Sep 2023 • Sophia Althammer, Guido Zuccon, Sebastian Hofstätter, Suzan Verberne, Allan Hanbury
We further find that gains provided by AL strategies come at the expense of more assessments (thus higher annotation costs) and AL strategies underperform random selection when comparing effectiveness given a fixed annotation cost.
1 code implementation • 11 Sep 2023 • Shuai Wang, Harrisen Scells, Martin Potthast, Bevan Koopman, Guido Zuccon
Screening prioritisation in medical systematic reviews aims to rank the set of documents retrieved by complex Boolean queries.
no code implementations • 4 Jul 2023 • Shuyi Wang, Guido Zuccon
For this, FOLTR trains learning to rank models in an online manner -- i.e., by exploiting users' interactions with the search systems (queries, clicks), rather than labels -- and federatively -- i.e., by not aggregating interaction data in a central server for training purposes, but by training instances of a model on each user device on their own private data, and then sharing the model updates, not the data, across a set of users that have formed the federation.
no code implementations • 30 Jun 2023 • Wojciech Kusa, Guido Zuccon, Petr Knoth, Allan Hanbury
We find that accounting for the difference in review outcomes leads to a different assessment of the quality of a system than if traditional evaluation measures were used.
1 code implementation • 29 Jun 2023 • Guido Zuccon, Harrisen Scells, Shengyao Zhuang
As in other fields of artificial intelligence, the information retrieval community has grown interested in investigating the power consumption associated with neural models, particularly models of search.
1 code implementation • 29 Jun 2023 • Joel Mackenzie, Shengyao Zhuang, Guido Zuccon
The SPLADE (SParse Lexical AnD Expansion) model is a highly effective approach to learned sparse retrieval, where documents are represented by term impact scores derived from large language models.
1 code implementation • 6 May 2023 • Shengyao Zhuang, Linjun Shou, Guido Zuccon
Effective cross-lingual dense retrieval methods that rely on multilingual pre-trained language models (PLMs) need to be trained to encompass both the relevance matching task and the cross-language alignment task.
no code implementations • 17 Apr 2023 • Shengyao Zhuang, Linjun Shou, Jian Pei, Ming Gong, Houxing Ren, Guido Zuccon, Daxin Jiang
To address this challenge, we propose ToRoDer (TypOs-aware bottlenecked pre-training for RObust DEnse Retrieval), a novel pre-training strategy for DRs that increases their robustness to misspelled queries while preserving their effectiveness in downstream retrieval tasks.
no code implementations • 23 Feb 2023 • Guido Zuccon, Bevan Koopman
Aside from measuring the effectiveness of ChatGPT in this context, we show that the knowledge passed in the prompt can overturn the knowledge encoded in the model and that, in our experiments, this harms answer correctness.
no code implementations • 3 Feb 2023 • Shuai Wang, Harrisen Scells, Bevan Koopman, Guido Zuccon
The ability of ChatGPT to follow complex instructions and generate queries with high precision makes it a valuable tool for researchers conducting systematic reviews, particularly rapid reviews, where time is a constraint and trading off lower recall for higher precision is often acceptable.
1 code implementation • 21 Dec 2022 • Bevan Koopman, Ahmed Mourad, Hang Li, Anton van der Vegt, Shengyao Zhuang, Simon Gibson, Yash Dang, David Lawrence, Guido Zuccon
On the basis of these needs we release an information retrieval test collection comprising real questions, a large collection of scientific documents split in passages, and ground truth relevance assessments indicating which passages are relevant to each question.
1 code implementation • 18 Dec 2022 • Shuai Wang, Hang Li, Guido Zuccon
One challenge to creating an effective systematic review Boolean query is the selection of effective MeSH Terms to include in the query.
no code implementations • 18 Dec 2022 • Shuai Wang, Harrisen Scells, Bevan Koopman, Guido Zuccon
An empirical analysis compares the effectiveness of neural methods with that of traditional methods for this task.
1 code implementation • 29 Nov 2022 • Bing Liu, Tiancheng Lan, Wen Hua, Guido Zuccon
Entity Alignment (EA), which aims to detect entity mappings (i.e., equivalent entity pairs) in different Knowledge Graphs (KGs), is critical for KG fusion.
1 code implementation • 29 Nov 2022 • Bing Liu, Harrisen Scells, Wen Hua, Guido Zuccon, Genghong Zhao, Xia Zhang
Making compatible predictions should thus be one of the goals of training an EA model, along with fitting the labelled data; however, current methods neglect this aspect.
1 code implementation • 19 Sep 2022 • Shuai Wang, Harrisen Scells, Bevan Koopman, Guido Zuccon
However, identifying the correct MeSH terms to include in a query is difficult: information experts are often unfamiliar with the MeSH database and unsure about the appropriateness of MeSH terms for a query.
1 code implementation • 22 Aug 2022 • Bing Liu, Wen Hua, Guido Zuccon, Genghong Zhao, Xia Zhang
To include in the EA subtasks a high proportion of the potential mappings originally present in the large EA task, we devise a counterpart discovery method that exploits the locality principle of the EA task and the power of trained EA models.
no code implementations • 9 Jul 2022 • Ekaterina Khramtsova, Guido Zuccon, Xi Wang, Mahsa Baktashmotlagh
This paper performs a detailed analysis of the effectiveness of topological properties for image classification in various training scenarios, defined by: the number of training samples, the complexity of the training data and the complexity of the backbone network.
1 code implementation • 21 Jun 2022 • Shengyao Zhuang, Houxing Ren, Linjun Shou, Jian Pei, Ming Gong, Guido Zuccon, Daxin Jiang
This problem is further exacerbated when using DSI for cross-lingual retrieval, where document text and query text are in different languages.
no code implementations • 12 May 2022 • Hang Li, Ahmed Mourad, Bevan Koopman, Guido Zuccon
Pseudo-Relevance Feedback (PRF) assumes that the top results retrieved by a first-stage ranker are relevant to the original query and uses them to improve the query representation for a second round of retrieval.
no code implementations • 30 Apr 2022 • Hang Li, Shuai Wang, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, Guido Zuccon
In this paper we consider the problem of combining the relevance signals from sparse and dense retrievers in the context of Pseudo Relevance Feedback (PRF).
1 code implementation • 20 Apr 2022 • Shuyi Wang, Guido Zuccon
A well-known factor that affects the performance of federated learning systems, and that poses serious challenges to these approaches, is that there may be some type of bias in the way data is distributed across clients.
1 code implementation • 6 Apr 2022 • Shuai Wang, Harrisen Scells, Justin Clark, Bevan Koopman, Guido Zuccon
However, we show that pseudo seed studies are not representative of the real seed studies used by information specialists.
1 code implementation • 1 Apr 2022 • Shengyao Zhuang, Guido Zuccon
We then demonstrate that the root cause of this resides in the input tokenization strategy employed by BERT.
1 code implementation • 1 Apr 2022 • Shengyao Zhuang, Hang Li, Guido Zuccon
We then exploit such historic implicit interactions to improve the effectiveness of a DR. A key challenge that we study is the effect that biases in the click signal, such as position bias, have on the DRs.
1 code implementation • 25 Feb 2022 • Shengyao Zhuang, Guido Zuccon
A simple and efficient strategy to validate deep learning checkpoints is the addition of validation loops to execute during training.
no code implementations • 15 Feb 2022 • Daniel Locke, Guido Zuccon
Case law retrieval is the retrieval of judicial decisions relevant to a legal question.
1 code implementation • 5 Jan 2022 • Shengyao Zhuang, Zhihao Qiao, Guido Zuccon
Online learning to rank (OLTR) aims to learn a ranker directly from implicit feedback derived from users' interactions, such as clicks.
1 code implementation • 13 Dec 2021 • Hang Li, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, Guido Zuccon
Finally, we contribute a study of the generalisability of the ANCE-PRF method when dense retrievers other than ANCE are used for the first round of retrieval and for encoding the PRF signal.
1 code implementation • 8 Dec 2021 • Shuai Wang, Harrisen Scells, Ahmed Mourad, Guido Zuccon
Our results also indicate that our reproduced screening prioritisation method (1) generalises across datasets of both similar and different topicality compared to the original implementation, (2) becomes more effective when multiple seed studies are used, through the techniques we introduce to enable this, and (3) produces more stable rankings with multiple seed studies than with single seed studies.
1 code implementation • EMNLP 2021 • Bing Liu, Harrisen Scells, Guido Zuccon, Wen Hua, Genghong Zhao
Entity Alignment (EA) aims to match equivalent entities across different Knowledge Graphs (KGs) and is an essential step of KG fusion.
2 code implementations • EMNLP 2021 • Shengyao Zhuang, Guido Zuccon
Our experimental results on the MS MARCO passage ranking dataset show that, with our proposed typos-aware training, DR and BERT re-ranker can become robust to typos in queries, resulting in significantly improved effectiveness compared to models trained without appropriately accounting for typos.
1 code implementation • 25 Aug 2021 • Hang Li, Ahmed Mourad, Shengyao Zhuang, Bevan Koopman, Guido Zuccon
Text-based PRF results show that the use of PRF had a mixed effect on deep rerankers across different datasets.
1 code implementation • 19 Aug 2021 • Shengyao Zhuang, Guido Zuccon
BERT-based information retrieval models are expensive in both time (query latency) and computational resources (energy, hardware cost), which makes many of these models impractical, especially under resource constraints.
no code implementations • ALTA 2016 • Mahnoosh Kholghi, Lance De Vine, Laurianne Sitbon, Guido Zuccon, Anthony Nguyen
This study investigates the use of unsupervised word embeddings and sequence features for sample representation in an active learning framework built to extract clinical concepts from clinical free text.
no code implementations • LREC 2016 • Lorraine Goeuriot, Liadh Kelly, Guido Zuccon, Joao Palotti
In this paper we present the datasets created by the CLEF eHealth Lab from 2013 to 2015 for evaluating search solutions that help laypeople find health information online.