no code implementations • 29 Mar 2024 • Jinhyuk Lee, Zhuyun Dai, Xiaoqi Ren, Blair Chen, Daniel Cer, Jeremy R. Cole, Kai Hui, Michael Boratko, Rajvi Kapadia, Wen Ding, Yi Luan, Sai Meher Karthik Duddu, Gustavo Hernandez Abrego, Weiqiang Shi, Nithi Gupta, Aditya Kusupati, Prateek Jain, Siddhartha Reddy Jonnalagadda, Ming-Wei Chang, Iftekhar Naim
On the Massive Text Embedding Benchmark (MTEB), Gecko with 256 embedding dimensions outperforms all existing entries with 768 embedding dimensions.
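A minimal sketch of how one model can serve both sizes, assuming a Matryoshka-style training setup in which smaller embeddings are prefixes of larger ones; the `embed_768` stub is hypothetical, not Gecko's API:

```python
import numpy as np

def embed_768(text: str) -> np.ndarray:
    """Hypothetical stand-in for a 768-dimensional embedding model."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(768)

def truncate_embedding(vec: np.ndarray, dim: int = 256) -> np.ndarray:
    """Keep the first `dim` coordinates and re-normalize: the usual
    Matryoshka-style recipe for serving a smaller embedding size."""
    sub = vec[:dim]
    return sub / np.linalg.norm(sub)

query = truncate_embedding(embed_768("what is dense retrieval?"))
doc = truncate_embedding(embed_768("Dense retrieval encodes text as vectors."))
print("cosine similarity:", float(query @ doc))  # both vectors are unit-norm
```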
no code implementations • 15 Nov 2023 • Minghan Li, Honglei Zhuang, Kai Hui, Zhen Qin, Jimmy Lin, Rolf Jagerman, Xuanhui Wang, Michael Bendersky
In this paper, we re-examine this conclusion and raise the following question: Can query expansion improve generalization of strong cross-encoder rankers?
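A sketch of the setup the question implies: expansion terms are appended to the query before cross-encoder scoring. Both `generate_expansion_terms` and `cross_encoder_score` are hypothetical stubs; in practice they might be an LLM and a fine-tuned BERT- or T5-based ranker:

```python
def generate_expansion_terms(query: str) -> list[str]:
    # e.g. keywords extracted from an LLM-written pseudo-answer (placeholder)
    return ["synonym1", "synonym2"]

def cross_encoder_score(query: str, doc: str) -> float:
    # placeholder relevance score; a real system would run a trained model
    return float(len(set(query.lower().split()) & set(doc.lower().split())))

def rerank(query: str, docs: list[str]) -> list[str]:
    expanded = query + " " + " ".join(generate_expansion_terms(query))
    return sorted(docs, key=lambda d: cross_encoder_score(expanded, d), reverse=True)

print(rerank("neural ranking", ["a paper on neural ranking", "a cooking blog"]))
```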
no code implementations • 22 Oct 2023 • Andrew Drozdov, Honglei Zhuang, Zhuyun Dai, Zhen Qin, Razieh Rahimi, Xuanhui Wang, Dana Alon, Mohit Iyyer, Andrew McCallum, Donald Metzler, Kai Hui
Recent studies show that large language models (LLMs) can be instructed to effectively perform zero-shot passage re-ranking, in which the results of a first-stage retrieval method, such as BM25, are rated and reordered to improve relevance.
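A sketch of that two-stage pipeline: BM25 produces candidates, then an LLM rates each one zero-shot. The `llm` function is a hypothetical stub for a real model call:

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25

corpus = ["dense retrieval with dual encoders",
          "a recipe for sourdough bread",
          "passage re-ranking with language models"]
bm25 = BM25Okapi([doc.split() for doc in corpus])

def llm(prompt: str) -> str:
    return "2"  # placeholder; a real call would query an LLM API

def llm_rate(query: str, passage: str) -> int:
    prompt = (f"Rate the relevance of the passage to the query on a 0-3 scale.\n"
              f"Query: {query}\nPassage: {passage}\nRating:")
    return int(llm(prompt).strip())

query = "passage re-ranking"
candidates = bm25.get_top_n(query.split(), corpus, n=3)   # first stage: BM25
reranked = sorted(candidates, key=lambda p: llm_rate(query, p), reverse=True)
print(reranked)
```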
no code implementations • 21 Oct 2023 • Honglei Zhuang, Zhen Qin, Kai Hui, Junru Wu, Le Yan, Xuanhui Wang, Michael Bendersky
We propose to incorporate fine-grained relevance labels into the prompt for LLM rankers, enabling them to better differentiate among documents with different levels of relevance to the query and thus derive a more accurate ranking.
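A sketch of prompting with fine-grained labels rather than a binary yes/no; the specific label names and the `llm` stub are illustrative assumptions, and a real ranker with access to token log-probabilities could score by expected label value instead:

```python
LABELS = {"Not Relevant": 0, "Somewhat Relevant": 1,
          "Highly Relevant": 2, "Perfectly Relevant": 3}

def llm(prompt: str) -> str:
    return "Highly Relevant"  # placeholder for a real LLM call

def graded_score(query: str, doc: str) -> int:
    prompt = (f"Judge the relevance of the document to the query.\n"
              f"Query: {query}\nDocument: {doc}\n"
              f"Answer with one of: {', '.join(LABELS)}.\nAnswer:")
    return LABELS.get(llm(prompt).strip(), 0)

docs = ["doc about query understanding", "doc about LLM rankers"]
print(sorted(docs, key=lambda d: graded_score("LLM rankers", d), reverse=True))
```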
no code implementations • 30 Jun 2023 • Zhen Qin, Rolf Jagerman, Kai Hui, Honglei Zhuang, Junru Wu, Le Yan, Jiaming Shen, Tianqi Liu, Jialu Liu, Donald Metzler, Xuanhui Wang, Michael Bendersky
Ranking documents using Large Language Models (LLMs) by directly feeding the query and candidate documents into the prompt is an interesting and practical problem.
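One way to feed candidates into the prompt is pairwise: show the LLM the query and two candidates, ask which is more relevant, and let the preference drive a comparison-based sort. A sketch under that assumption, with `llm` as a hypothetical stub; a real system would typically query both orderings to reduce position bias:

```python
import functools

def llm(prompt: str) -> str:
    return "A"  # placeholder for a real LLM call

def prefers_first(query: str, doc_a: str, doc_b: str) -> bool:
    prompt = (f"Query: {query}\nPassage A: {doc_a}\nPassage B: {doc_b}\n"
              f"Which passage is more relevant to the query? Answer A or B:")
    return llm(prompt).strip().upper().startswith("A")

def pairwise_rerank(query: str, docs: list[str]) -> list[str]:
    cmp = lambda a, b: -1 if prefers_first(query, a, b) else 1
    return sorted(docs, key=functools.cmp_to_key(cmp))

print(pairwise_rerank("ranking with LLMs", ["passage one", "passage two"]))
```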
no code implementations • 19 May 2023 • Ronak Pradeep, Kai Hui, Jai Gupta, Adam D. Lelkes, Honglei Zhuang, Jimmy Lin, Donald Metzler, Vinh Q. Tran
Popularized by the Differentiable Search Index, the emerging paradigm of generative retrieval re-frames the classic information retrieval problem into a sequence-to-sequence modeling task, forgoing external indices and encoding an entire document corpus within a single Transformer.
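A sketch of the generative-retrieval interface: a seq2seq model maps a query directly to document identifiers, with no external index, and decoding is restricted to valid docids (here by a simple filter; real systems constrain beam search with a docid trie). `seq2seq_generate` is a hypothetical stub for a trained model:

```python
VALID_DOCIDS = {"doc-104", "doc-271", "doc-993"}

def seq2seq_generate(query: str, num_beams: int) -> list[str]:
    # placeholder: a real model would beam-search docid token sequences,
    # masking continuations that leave the docid trie at each step
    return ["doc-271", "doc-104", "doc-777"]

def generative_retrieve(query: str, k: int = 2) -> list[str]:
    beams = seq2seq_generate(query, num_beams=10)
    return [d for d in beams if d in VALID_DOCIDS][:k]

print(generative_retrieve("transformer memory as an index"))
```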
1 code implementation • 15 Dec 2022 • Bernd Bohnet, Vinh Q. Tran, Pat Verga, Roee Aharoni, Daniel Andor, Livio Baldini Soares, Massimiliano Ciaramita, Jacob Eisenstein, Kuzman Ganchev, Jonathan Herzig, Kai Hui, Tom Kwiatkowski, Ji Ma, Jianmo Ni, Lierni Sestorain Saralegui, Tal Schuster, William W. Cohen, Michael Collins, Dipanjan Das, Donald Metzler, Slav Petrov, Kellie Webster
We take human annotations as a gold standard and show that a correlated automatic metric is suitable for development.
no code implementations • 12 Oct 2022 • Honglei Zhuang, Zhen Qin, Rolf Jagerman, Kai Hui, Ji Ma, Jing Lu, Jianmo Ni, Xuanhui Wang, Michael Bendersky
Recently, substantial progress has been made in text ranking based on pretrained language models such as BERT.
no code implementations • 11 Oct 2022 • Kai Hui, Tao Chen, Zhen Qin, Honglei Zhuang, Fernando Diaz, Mike Bendersky, Don Metzler
Retrieval augmentation has shown promising improvements in different tasks.
no code implementations • Findings (ACL) 2022 • Kai Hui, Honglei Zhuang, Tao Chen, Zhen Qin, Jing Lu, Dara Bahri, Ji Ma, Jai Prakash Gupta, Cicero Nogueira dos Santos, Yi Tay, Don Metzler
This results in significant inference-time speedups, since the decoder-only architecture only needs to learn to interpret static encoder embeddings during inference.
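A sketch of where that speedup comes from: document representations are encoded once offline and cached, and at query time only the decoder consumes the cached states. Both `encode_document` and `decoder_score` are hypothetical stubs:

```python
import numpy as np

def encode_document(doc: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(doc)) % (2**32))
    return rng.standard_normal((4, 8))  # token states from a frozen encoder

def decoder_score(query: str, doc_states: np.ndarray) -> float:
    # placeholder: a real decoder would cross-attend from the query tokens
    # to the cached encoder states; this stub ignores the query entirely
    return float(doc_states.mean())

corpus = {"d1": "neural ranking models", "d2": "encoder-decoder distillation"}
cache = {docid: encode_document(text) for docid, text in corpus.items()}  # offline

query = "fast inference for rankers"
scores = {docid: decoder_score(query, states) for docid, states in cache.items()}
print(max(scores, key=scores.get))
```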
1 code implementation • 14 Feb 2022 • Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, Harsh Mehta, Zhen Qin, Kai Hui, Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler
In this paper, we demonstrate that information retrieval can be accomplished with a single Transformer, in which all information about the corpus is encoded in the parameters of the model.
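A sketch of how a corpus can be encoded in the parameters: the same seq2seq model is trained both to map document text to its identifier (indexing) and to map queries to identifiers (retrieval). Only training-pair construction is shown; the prefixes and data are illustrative assumptions, and the model and trainer are left out:

```python
corpus = {"doc-1": "dual encoders for dense retrieval",
          "doc-2": "sparse lexical matching with BM25"}
train_queries = {"doc-1": ["what is dense retrieval"]}

def build_dsi_examples(corpus, train_queries):
    examples = []
    for docid, text in corpus.items():
        examples.append((f"index: {text}", docid))          # indexing task
        for q in train_queries.get(docid, []):
            examples.append((f"retrieve: {q}", docid))      # retrieval task
    return examples

for inp, target in build_dsi_examples(corpus, train_queries):
    print(inp, "->", target)
```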
3 code implementations • ICLR 2022 • Vamsi Aribandi, Yi Tay, Tal Schuster, Jinfeng Rao, Huaixiu Steven Zheng, Sanket Vaibhav Mehta, Honglei Zhuang, Vinh Q. Tran, Dara Bahri, Jianmo Ni, Jai Gupta, Kai Hui, Sebastian Ruder, Donald Metzler
Despite the recent success of multi-task learning and transfer learning for natural language processing (NLP), few works have systematically studied the effect of scaling up the number of tasks during pre-training.
no code implementations • 18 Apr 2021 • Kai Hui, Klaus Berberich
In this work, we collect judgments from multiple judges using a crowdsourcing platform and aggregate them to compare the two kinds of preference judgments in terms of transitivity, time consumption, and quality.
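A sketch of one such check: after majority-aggregating crowdsourced preferences, count how many document triples violate transitivity (A preferred to B and B to C, but not A to C). The vote data here is illustrative, chosen to contain a preference cycle:

```python
from collections import Counter
from itertools import permutations

votes = [("A", "B"), ("A", "B"), ("B", "C"),   # each pair: (preferred, other)
         ("B", "C"), ("C", "A"), ("C", "A")]

counts = Counter(votes)
prefer = {(x, y) for (x, y) in counts
          if counts[(x, y)] > counts.get((y, x), 0)}   # majority aggregation

docs = {d for pair in votes for d in pair}
violations = [(a, b, c) for a, b, c in permutations(docs, 3)
              if (a, b) in prefer and (b, c) in prefer and (a, c) not in prefer]
print("transitivity violations:", violations)
```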
no code implementations • 17 Apr 2021 • Xiaoyang Chen, Kai Hui, Ben He, Xianpei Han, Le Sun, Zheng Ye
BERT-based text ranking models have dramatically advanced the state-of-the-art in ad-hoc retrieval, wherein most models tend to consider individual query-document pairs independently.
4 code implementations • 16 Sep 2020 • Xuanang Chen, Ben He, Kai Hui, Le Sun, Yingfei Sun
Despite the effectiveness of utilizing the BERT model for document ranking, the high computational cost of such approaches limits their use.
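A common remedy, sketched below under the assumption of score-based knowledge distillation: a small student is trained to match a large teacher's relevance scores, cutting inference cost. Both models here are stand-in stubs, and real setups often also distill intermediate representations:

```python
import numpy as np

def teacher_score(query: str, doc: str) -> float:
    return float(len(set(query.split()) & set(doc.split())))  # stand-in for BERT

def features(query: str, doc: str) -> np.ndarray:
    overlap = len(set(query.split()) & set(doc.split()))
    return np.array([overlap, len(doc.split()), 1.0])

def student_score(weights: np.ndarray, feats: np.ndarray) -> float:
    return float(weights @ feats)  # tiny linear student

pairs = [("neural ranking", "neural ranking models"),
         ("neural ranking", "a cooking blog")]
w = np.zeros(3)
for _ in range(200):  # gradient descent on the score-matching (MSE) loss
    for q, d in pairs:
        x = features(q, d)
        err = student_score(w, x) - teacher_score(q, d)
        w -= 0.01 * err * x
print([round(student_score(w, features(q, d)), 2) for q, d in pairs])
```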
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Zhi Zheng, Kai Hui, Ben He, Xianpei Han, Le Sun, Andrew Yates
Query expansion aims to mitigate the mismatch between the language used in a query and in a document.
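A sketch in the spirit of chunk-based expansion with feedback text, under assumptions of my own: chunks from top-ranked documents are scored against the query, the best ones serve as expansion evidence, and documents are rescored by a weighted mix. The `score` function is a hypothetical stand-in for a BERT-style relevance model:

```python
def score(text_a: str, text_b: str) -> float:
    return float(len(set(text_a.lower().split()) & set(text_b.lower().split())))

def expand_and_rescore(query, top_docs, all_docs, k_chunks=2, alpha=0.7):
    chunks = [c for doc in top_docs for c in doc.split(". ")]
    best = sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k_chunks]
    def final(doc):
        exp = sum(score(c, doc) for c in best) / max(len(best), 1)
        return alpha * score(query, doc) + (1 - alpha) * exp
    return sorted(all_docs, key=final, reverse=True)

docs = ["query expansion bridges vocabulary gaps. it adds related terms",
        "an unrelated document about gardening"]
print(expand_and_rescore("vocabulary mismatch in retrieval", docs[:1], docs))
```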
1 code implementation • EMNLP 2018 • Canjia Li, Yingfei Sun, Ben He, Le Wang, Kai Hui, Andrew Yates, Le Sun, Jungang Xu
Pseudo-relevance feedback (PRF) is commonly used to boost the performance of traditional information retrieval (IR) models by using top-ranked documents to identify and weight new query terms, thereby reducing the effect of query-document vocabulary mismatches.
Ranked #9 on Ad-Hoc Information Retrieval on TREC Robust04
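A minimal sketch of the classic PRF mechanism the description above refers to: terms are weighted by their frequency in the top-ranked documents and the strongest ones are appended to the query (real weighting schemes such as RM3 are more careful, e.g. about document scores and smoothing):

```python
from collections import Counter

def prf_expand(query: str, top_docs: list[str], n_terms: int = 3) -> str:
    counts = Counter(t for doc in top_docs for t in doc.lower().split()
                     if t not in query.lower().split() and len(t) > 3)
    expansion = [term for term, _ in counts.most_common(n_terms)]
    return query + " " + " ".join(expansion)

top_docs = ["vocabulary mismatch hurts lexical retrieval",
            "query expansion adds vocabulary from feedback documents"]
print(prf_expand("query expansion", top_docs))
```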
no code implementations • ACL 2018 • Cancan Jin, Ben He, Kai Hui, Le Sun
Existing automated essay scoring (AES) models rely on rated essays for the target prompt as training data.
1 code implementation • 1 Jul 2017 • Sean MacAvaney, Andrew Yates, Kai Hui, Ophir Frieder
One challenge with neural ranking is the need for a large amount of manually-labeled relevance judgments for training.
3 code implementations • 30 Jun 2017 • Kai Hui, Andrew Yates, Klaus Berberich, Gerard de Melo
Neural IR models, such as DRMM and PACRR, have achieved strong results by successfully capturing relevance matching signals.
no code implementations • 27 Jun 2017 • Andrew Yates, Kai Hui
Recent neural IR models have demonstrated deep learning's utility in ad-hoc information retrieval.
3 code implementations • EMNLP 2017 • Kai Hui, Andrew Yates, Klaus Berberich, Gerard de Melo
In order to adopt deep learning for information retrieval, models are needed that can capture all relevant information required to assess the relevance of a document to a given user query.
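A sketch of the kind of relevance-matching signal such models consume: a query-by-document term similarity matrix, from which strong local matches are pooled. This shows only simple max pooling per query term; models like PACRR additionally run convolutions over the matrix to capture n-gram matches. The `embed` stub is hypothetical:

```python
import numpy as np

def embed(term: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(term)) % (2**32))
    v = rng.standard_normal(16)
    return v / np.linalg.norm(v)   # identical terms get identical vectors

def sim_matrix(query: str, doc: str) -> np.ndarray:
    q = [embed(t) for t in query.split()]
    d = [embed(t) for t in doc.split()]
    return np.array([[qi @ dj for dj in d] for qi in q])

M = sim_matrix("neural ranking", "neural models for ranking documents")
print(M.shape)        # (query_len, doc_len)
print(M.max(axis=1))  # strongest match per query term
```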