no code implementations • 19 Jan 2025 • Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Shihao Liu, Shuaiqing Wang, Dawei Yin, Xueqi Cheng
Directly applying GR to book search is challenging due to the unique characteristics of book search: the model needs to retain the complex, multi-faceted information of a book, which increases the demand for labeled data.
1 code implementation • 25 Dec 2024 • Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Changjiang Zhou, Maarten de Rijke, Xueqi Cheng
Based on this taxonomy, we conduct empirical studies to analyze the OOD robustness of representative generative IR models in comparison with dense retrieval models.
1 code implementation • 25 Dec 2024 • Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng
Neural ranking models (NRMs) have been shown to be highly effective in terms of retrieval performance.
1 code implementation • 16 Oct 2024 • Zhihao Zhang, Yixing Fan, Ruqing Zhang, Jiafeng Guo
To bridge this gap, we introduce a new claim decomposition benchmark, which requires building systems that can identify atomic and checkworthy claims in LLM responses.
no code implementations • 15 Oct 2024 • Haosheng Qian, Yixing Fan, Ruqing Zhang, Jiafeng Guo
The results on WebGLM-QA, ASQA and ELI5 datasets show that our method substantially improves the quality of citations in responses generated by LLMs.
no code implementations • 27 Sep 2024 • Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Xueqi Cheng
GR$^2$ focuses on two key components: ensuring relevant and distinct identifiers, and implementing multi-graded constrained contrastive training.
1 code implementation • 24 Sep 2024 • Lu Chen, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Xueqi Cheng
Our research identifies two critical latent factors affecting RAG's confidence in its predictions: the quality of the retrieved results and the manner in which these results are utilized.
1 code implementation • 23 Sep 2024 • Weichao Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng
We have developed a Chinese-language benchmark, PatentMIA, to assess the performance of detection approaches for LLMs on Chinese text.
no code implementations • 16 Jul 2024 • Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng
Generative retrieval uses differentiable search indexes to directly generate relevant document identifiers in response to a query.
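To make this concrete, below is a minimal, self-contained sketch of docid generation constrained by a prefix trie, a common ingredient of differentiable search indexes; the `score_fn` stand-in and the toy docids are illustrative assumptions, not the paper's implementation:

```python
# Minimal sketch of constrained docid decoding in generative retrieval.
# The trie restricts generation to valid document identifiers; the scoring
# function is a toy stand-in for a seq2seq model's next-token logits.

def build_trie(docids):
    """Index all valid docid token sequences in a prefix trie."""
    trie = {}
    for docid in docids:
        node = trie
        for token in docid:
            node = node.setdefault(token, {})
        node["<eos>"] = {}
    return trie

def constrained_decode(score_fn, trie, max_len=8):
    """Greedily generate a docid, only emitting tokens the trie allows."""
    prefix, node = [], trie
    for _ in range(max_len):
        allowed = list(node)
        if not allowed:
            break
        # Pick the allowed token the (stand-in) model scores highest.
        token = max(allowed, key=lambda t: score_fn(prefix, t))
        if token == "<eos>":
            break
        prefix.append(token)
        node = node[token]
    return prefix

docids = [["3", "1", "4"], ["3", "1", "5"], ["2", "7"]]
trie = build_trie(docids)
print(constrained_decode(lambda p, t: len(t), trie))  # a valid docid path
```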
1 code implementation • 9 Jul 2024 • Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng
Recent advances in neural information retrieval (IR) models have significantly enhanced their effectiveness over various IR tasks.
no code implementations • 13 Jun 2024 • Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke
Beyond effectiveness, the robustness of an information retrieval (IR) system is increasingly attracting attention.
no code implementations • 12 Jun 2024 • Feini Huang, Shijie Jiang, Lu Li, Yongkun Zhang, Ye Zhang, Ruqing Zhang, Qingliang Li, Danxi Li, Wei Shangguan, Yongjiu Dai
In recent years, artificial intelligence (AI) has rapidly expanded its influence and is expected to promote the development of Earth system science (ESS) if properly harnessed.
no code implementations • 2 Apr 2024 • Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng
However, limiting perturbations to a single level of granularity may reduce the flexibility of adversarial examples, thereby diminishing the potential threat of the attack.
1 code implementation • 28 Mar 2024 • Hengran Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng
Retrieval-augmented generation (RAG) is considered a promising approach to alleviating the hallucination issue of large language models (LLMs), and it has recently received widespread attention from researchers.
1 code implementation • 19 Mar 2024 • Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Xueqi Cheng
Specifically, we view the generation of a ranked docid list as a sequence learning process: at each step we learn a subset of parameters that maximizes the corresponding generation likelihood of the $i$-th docid given the (preceding) top $i-1$ docids.
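As a rough illustration of that factorization (not the paper's training code), the likelihood of a ranked docid list decomposes position by position; `cond_prob` below is a hypothetical stand-in for the model's conditional docid probability:

```python
import math

# Toy sketch of the ranked-list likelihood idea: the probability of a ranked
# docid list factorizes over positions, with each docid conditioned on the
# docids ranked above it.

def list_nll(cond_prob, ranked_docids):
    """Negative log-likelihood of generating the list position by position."""
    nll = 0.0
    for i, docid in enumerate(ranked_docids):
        nll -= math.log(cond_prob(docid, ranked_docids[:i]))
    return nll

# A stand-in conditional model that simply decays with rank position.
print(list_nll(lambda d, prefix: 1.0 / (len(prefix) + 2), ["d3", "d1", "d7"]))
```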
1 code implementation • 26 Feb 2024 • Jiafeng Guo, Changjiang Zhou, Ruqing Zhang, Jiangui Chen, Maarten de Rijke, Yixing Fan, Xueqi Cheng
Very recently, a pre-trained generative retrieval model for KILTs, named CorpusBrain, was proposed and reached new state-of-the-art retrieval performance.
no code implementations • 9 Feb 2024 • Lu Chen, Wei Huang, Ruqing Zhang, Wei Chen, Jiafeng Guo, Xueqi Cheng
The key idea is to learn task-required causal factors and only use those to make predictions for a given task.
1 code implementation • 16 Dec 2023 • Run-Ze Fan, Yixing Fan, Jiangui Chen, Jiafeng Guo, Ruqing Zhang, Xueqi Cheng
Automatic mainstream hashtag recommendation aims to accurately provide users with concise and popular topical hashtags before publication.
no code implementations • 16 Dec 2023 • Yu-An Liu, Ruqing Zhang, Mingkun Zhang, Wei Chen, Maarten de Rijke, Jiafeng Guo, Xueqi Cheng
We decompose the robust ranking error into two components, i.e., a natural ranking error for effectiveness evaluation and a boundary ranking error for assessing adversarial robustness.
no code implementations • 6 Nov 2023 • Yinqiong Cai, Yixing Fan, Keping Bi, Jiafeng Guo, Wei Chen, Ruqing Zhang, Xueqi Cheng
The first-stage retrieval aims to retrieve a subset of candidate documents from a huge collection both effectively and efficiently.
1 code implementation • 18 Oct 2023 • Hengran Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng
We argue that, rather than relevance, for FV we need to focus on the utility that a claim verifier derives from the retrieved evidence.
no code implementations • 29 Aug 2023 • Jiangui Chen, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Yixing Fan, Xueqi Cheng
We put forward a novel Continual-LEarner for generatiVE Retrieval (CLEVER) model and make two major contributions to continual learning for GR: (i) To encode new documents into docids with low computational cost, we present Incremental Product Quantization, which updates a partial quantization codebook according to two adaptive thresholds; and (ii) To memorize new documents for querying without forgetting previous knowledge, we propose a memory-augmented learning mechanism, to form meaningful connections between old and new documents.
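For background, a bare-bones product-quantization docid assignment looks roughly like the sketch below; CLEVER's incremental codebook updates and adaptive thresholds are not reproduced here, and all sizes are toy values:

```python
import numpy as np

# Minimal product-quantization sketch (not the paper's IPQ): a document
# embedding is split into sub-vectors, and each sub-vector is mapped to its
# nearest centroid, so the docid becomes a short tuple of codebook indices.

rng = np.random.default_rng(0)
dim, n_sub, n_centroids = 8, 2, 4
codebooks = rng.normal(size=(n_sub, n_centroids, dim // n_sub))

def pq_docid(embedding):
    """Quantize each sub-vector to its nearest centroid index."""
    parts = embedding.reshape(n_sub, dim // n_sub)
    return tuple(
        int(np.argmin(np.linalg.norm(codebooks[s] - parts[s], axis=1)))
        for s in range(n_sub)
    )

print(pq_docid(rng.normal(size=dim)))  # e.g. (2, 0)
```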
1 code implementation • 24 Aug 2023 • Lu Chen, Ruqing Zhang, Wei Huang, Wei Chen, Jiafeng Guo, Xueqi Cheng
The key idea is to reformulate the Variational Auto-encoder (VAE) to fit the joint distribution of the document and summary variables from the training corpus.
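A toy ELBO in that spirit, with stand-in linear encoder/decoder layers over concatenated document and summary features (purely illustrative, not the paper's model):

```python
import torch
import torch.nn.functional as F

# Toy VAE objective over a joint (document, summary) observation: encode to a
# Gaussian latent, reparameterize, reconstruct, and add the KL penalty.

enc = torch.nn.Linear(8, 4)   # outputs [mu | logvar] of a 2-d latent
dec = torch.nn.Linear(2, 8)

x = torch.randn(16, 8)        # concatenated doc + summary features (toy)
mu, logvar = enc(x).chunk(2, dim=-1)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
recon = F.mse_loss(dec(z), x)
kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
print((recon + kl).item())
```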
no code implementations • 19 Aug 2023 • Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Yixing Fan, Xueqi Cheng
The AREA task is meant to trick DR models into retrieving a target document that is outside the initial set of candidate documents retrieved by the DR model in response to a query.
no code implementations • 22 Jun 2023 • Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Wei Chen, Xueqi Cheng
Recently, generative retrieval, which retrieves documents by directly generating their identifiers, has been gaining increasing attention in the information retrieval (IR) field.
no code implementations • 5 Jun 2023 • Gabriel Bénédict, Ruqing Zhang, Donald Metzler
Generative information retrieval (IR) has experienced substantial growth across multiple research communities (e.g., information retrieval, computer vision, natural language processing, and machine learning), and has been highly visible in the popular press.
no code implementations • 24 May 2023 • Yubao Tang, Ruqing Zhang, Jiafeng Guo, Jiangui Chen, Zuowei Zhu, Shuaiqiang Wang, Dawei Yin, Xueqi Cheng
Specifically, (1) we assign each document an Elaborative Description based on the query generation technique, which is more meaningful than a string of integers in the original DSI; and (2) for the associations between a document and its identifier, we take inspiration from Rehearsal Strategies in human learning.
1 code implementation • 28 Apr 2023 • Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Yixing Fan, Xueqi Cheng
In this paper, we focus on a more general type of perturbation and introduce the topic-oriented adversarial ranking attack task against NRMs, which aims to find an imperceptible perturbation that can promote a target document in ranking for a group of queries with the same topic.
1 code implementation • 28 Apr 2023 • Jiangui Chen, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yiqun Liu, Yixing Fan, Xueqi Cheng
Learning task-specific retrievers that return relevant contexts at an appropriate level of semantic granularity, such as a document retriever, passage retriever, sentence retriever, and entity retriever, may help to achieve better performance on the end-to-end task.
1 code implementation • 9 Nov 2022 • Wenxiang Sun, Yixing Fan, Jiafeng Guo, Ruqing Zhang, Xueqi Cheng
Since each entity often contains rich visual and textual information in KBs, we thus propose three different sub-tasks, i.e., visual to visual entity linking (V2VEL), visual to textual entity linking (V2TEL), and visual to visual-textual entity linking (V2VTEL).
no code implementations • 28 Oct 2022 • Sihao Yu, Fei Sun, Jiafeng Guo, Ruqing Zhang, Xueqi Cheng
However, such a strategy typically leads to a loss in model performance, which poses the challenge of increasing unlearning efficiency while maintaining acceptable performance.
1 code implementation • 14 Sep 2022 • Chen Wu, Ruqing Zhang, Jiafeng Guo, Wei Chen, Yixing Fan, Maarten de Rijke, Xueqi Cheng
A ranking model is said to be Certified Top-$K$ Robust on a ranked list when it is guaranteed to keep documents that are out of the top $K$ away from the top $K$ under any attack.
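The certification condition can be checked mechanically once per-document score bounds under the attack are available; the following toy check, with hypothetical bounds and ids, illustrates the definition:

```python
# Toy check of the Certified Top-K Robust condition: given certified score
# intervals per document under any allowed perturbation, the list is robust
# if no document outside the top K can overtake any document inside it.

def certified_top_k(score_lb, score_ub, top_k_ids):
    """score_lb/score_ub map doc ids to certified lower/upper score bounds."""
    outside = [d for d in score_lb if d not in top_k_ids]
    worst_inside = min(score_lb[d] for d in top_k_ids)
    best_outside = max((score_ub[d] for d in outside), default=float("-inf"))
    return best_outside < worst_inside

lb = {"a": 0.9, "b": 0.8, "c": 0.2}
ub = {"a": 1.0, "b": 0.9, "c": 0.4}
print(certified_top_k(lb, ub, ["a", "b"]))  # True: "c" cannot enter the top 2
```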
no code implementations • 12 Sep 2022 • Yinqiong Cai, Jiafeng Guo, Yixing Fan, Qingyao Ai, Ruqing Zhang, Xueqi Cheng
When sampling top-ranked results (excluding the labeled positives) as negatives from a stronger retriever, the performance of the learned NRM becomes even worse.
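The sampling setup being analyzed is simple to state in code; the helper below, with made-up ids, takes a stronger retriever's top-ranked results minus the labeled positives (a pool that may still contain unlabeled positives):

```python
# Toy version of the sampling setup above: take a stronger retriever's
# top-ranked results, drop the labeled positives, and use the remainder
# as hard negatives for training the NRM.

def hard_negatives(ranked_doc_ids, positive_ids, k=5):
    """Top-k retrieved documents that are not labeled positives."""
    return [d for d in ranked_doc_ids if d not in positive_ids][:k]

ranked = ["d9", "d2", "d5", "d1", "d8", "d4"]  # made-up retrieval output
positives = {"d2", "d1"}
print(hard_negatives(ranked, positives))  # ['d9', 'd5', 'd8', 'd4']
```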
no code implementations • 21 Aug 2022 • Xinyu Ma, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Xueqi Cheng
Empirical results show that our method can significantly outperform the state-of-the-art autoencoder-based language models and other pre-trained models for dense retrieval.
no code implementations • 21 Aug 2022 • Xinyu Ma, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Xueqi Cheng
Unlike the promising results in NLP, we find that these methods cannot achieve comparable performance to full fine-tuning at both stages when updating less than 1% of the original model parameters.
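For scale, the "less than 1% of parameters" regime can be reproduced with a frozen stand-in backbone and a small trainable head; this sketch only counts parameters and is not the paper's tuning method:

```python
import torch.nn as nn

# Illustration of the "< 1% of parameters" setting: freeze a toy stand-in
# backbone, leave only a small adapter trainable, and report the fraction.

backbone = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
adapter = nn.Linear(512, 4)  # the only module left trainable

for p in backbone.parameters():
    p.requires_grad = False

model = nn.Sequential(backbone, adapter)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable fraction: {trainable / total:.4%}")  # well under 1%
```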
1 code implementation • 16 Aug 2022 • Jiangui Chen, Ruqing Zhang, Jiafeng Guo, Yiqun Liu, Yixing Fan, Xueqi Cheng
We show that a strong generative retrieval model can be learned with a set of adequately designed pre-training tasks, and be adopted to improve a variety of downstream KILT tasks with further fine-tuning.
1 code implementation • 5 May 2022 • Shaojie Jiang, Ruqing Zhang, Svitlana Vakulenko, Maarten de Rijke
The cross-entropy objective has proved to be an all-purpose training objective for autoregressive language models (LMs).
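Concretely, the objective in question is next-token cross-entropy; a minimal PyTorch rendering with random stand-in logits:

```python
import torch
import torch.nn.functional as F

# The standard next-token cross-entropy objective for an autoregressive LM:
# shift the sequence by one and average token-level negative log-likelihoods.
# The logits here are random stand-ins for a real model's outputs.

batch, seq_len, vocab = 2, 6, 100
logits = torch.randn(batch, seq_len, vocab)
tokens = torch.randint(vocab, (batch, seq_len))

loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab),  # predictions for positions 1..T-1
    tokens[:, 1:].reshape(-1),          # targets are the next tokens
)
print(loss.item())
```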
1 code implementation • 22 Apr 2022 • Xinyu Ma, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Xueqi Cheng
Therefore, in this work, we propose to drop out the decoder and introduce a novel contrastive span prediction task to pre-train the encoder alone.
1 code implementation • 12 Apr 2022 • Jiangui Chen, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Xueqi Cheng
This classical approach has clear drawbacks as follows: i) a large document index as well as a complicated search process is required, leading to considerable memory and computational overhead; ii) independent scoring paradigms fail to capture the interactions among documents and sentences in ranking; iii) a fixed number of sentences are selected to form the final evidence set.
no code implementations • 4 Apr 2022 • Chen Wu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng
We focus on the decision-based black-box attack setting, where the attackers cannot directly get access to the model information, but can only query the target model to obtain the rank positions of the partial retrieved list.
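The threat model grants the attacker a very narrow interface; roughly, all it can observe per query is something like the following (hypothetical system and ids):

```python
# Sketch of the decision-based black-box interface: the attacker sees neither
# scores nor gradients, only the rank position (if any) the target system
# returns for a document, and must probe perturbations through this signal.

def observed_rank(partial_ranked_list, doc_id):
    """All the attacker observes: the document's position, or None."""
    return (
        partial_ranked_list.index(doc_id)
        if doc_id in partial_ranked_list
        else None
    )

ranking = ["d4", "d7", "d1"]          # hypothetical retrieved list
print(observed_rank(ranking, "d7"))   # 1
print(observed_rank(ranking, "d9"))   # None (not retrieved)
```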
no code implementations • CVPR 2022 • Sihao Yu, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Zizhen Wang, Xueqi Cheng
By reducing the weights of the majority classes, such instances become more difficult to learn, which consequently hurts the overall performance.
no code implementations • 27 Nov 2021 • Yixing Fan, Xiaohui Xie, Yinqiong Cai, Jia Chen, Xinyu Ma, Xiangsheng Li, Ruqing Zhang, Jiafeng Guo
The core of information retrieval (IR) is to identify relevant information from large-scale resources and return it as a ranked list to respond to the user's information need.
2 code implementations • 11 Aug 2021 • Jiangui Chen, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Xueqi Cheng
A possible solution to this dilemma is a new approach known as federated learning, which is a privacy-preserving machine learning technique over distributed datasets.
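As a sketch of the federated idea (not the paper's algorithm), one FedAvg-style round averages client-side updates so raw data never leaves the clients; the gradients below are toy values:

```python
import numpy as np

# A bare-bones FedAvg round: each client updates a copy of the shared model
# on local data, and only the resulting parameters (not the data) are
# averaged on the server.

def fedavg_round(global_w, client_grads, lr=0.1):
    """Average locally updated weights back into the global model."""
    local_ws = [global_w - lr * g for g in client_grads]
    return np.mean(local_ws, axis=0)

w = np.zeros(3)
grads = [np.array([1.0, 0.0, 2.0]), np.array([0.0, 1.0, 0.0])]
print(fedavg_round(w, grads))  # [-0.05 -0.05 -0.1 ]
```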
no code implementations • 11 Aug 2021 • Chen Wu, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Xueqi Cheng
So we raise the question in this work: Are neural ranking models robust?
no code implementations • 18 Jul 2021 • Yinqiong Cai, Yixing Fan, Jiafeng Guo, Ruqing Zhang, Yanyan Lan, Xueqi Cheng
However, these methods often lack the discriminative power of term-based methods, thus introducing noise during retrieval and hurting recall performance.
1 code implementation • 20 Apr 2021 • Xinyu Ma, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Yingyan Li, Xueqi Cheng
The basic idea of PROP is to construct the representative words prediction (ROP) task for pre-training, inspired by the query likelihood model.
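The query likelihood intuition behind ROP can be sketched with a smoothed unigram document language model; everything below (smoothing constant, vocabulary size, sampled word sets) is an illustrative assumption:

```python
import math
import random
from collections import Counter

# Rough sketch of the ROP idea: score a sampled word set by its
# log-likelihood under a smoothed unigram document language model, so
# pre-training can learn to prefer more "representative" word sets.

def set_log_likelihood(word_set, doc_tokens, mu=10.0, vocab_size=10000):
    counts, n = Counter(doc_tokens), len(doc_tokens)
    return sum(
        math.log((counts[w] + mu / vocab_size) / (n + mu)) for w in word_set
    )

doc = "generative retrieval maps queries to document identifiers".split()
a, b = random.sample(doc, 2), ["banana", "piano"]
print(set_log_likelihood(a, doc) > set_log_likelihood(b, doc))  # True
```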
1 code implementation • 8 Mar 2021 • Jiafeng Guo, Yinqiong Cai, Yixing Fan, Fei Sun, Ruqing Zhang, Xueqi Cheng
We believe it is the right time to survey current status, learn from existing methods, and gain some insights for future development.
no code implementations • 1 Mar 2021 • Yixing Fan, Jiafeng Guo, Xinyu Ma, Ruqing Zhang, Yanyan Lan, Xueqi Cheng
We employ 16 linguistic tasks to probe a unified retrieval model over these three retrieval tasks to answer this question.
no code implementations • 25 Feb 2021 • Chen Wu, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, Xueqi Cheng
One is a widely adopted metric such as F1, which acts as a balanced objective; the other is the best F1 under some minimal recall constraint, which represents a typical objective in professional search.
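The two objectives differ only in a recall floor, which the following toy helper makes explicit over a made-up precision-recall sweep:

```python
# Contrast of the two objectives above: the best F1 over all thresholds
# versus the best F1 subject to a minimal recall floor, computed from
# (precision, recall) pairs along a threshold sweep.

def best_f1(pr_curve, min_recall=0.0):
    f1s = [
        2 * p * r / (p + r)
        for p, r in pr_curve
        if r >= min_recall and (p + r) > 0
    ]
    return max(f1s, default=0.0)

curve = [(0.9, 0.3), (0.7, 0.6), (0.5, 0.9)]
print(best_f1(curve))                  # unconstrained best F1
print(best_f1(curve, min_recall=0.8))  # best F1 with recall >= 0.8
```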
1 code implementation • 20 Oct 2020 • Xinyu Ma, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Xiang Ji, Xueqi Cheng
Recently pre-trained language representation models such as BERT have shown great success when fine-tuned on downstream tasks including information retrieval (IR).
no code implementations • 25 Aug 2020 • Lixin Su, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Yanyan Lan, Xue-Qi Cheng
To tackle such a challenge, in this work, we introduce the Continual Domain Adaptation (CDA) task for MRC.
1 code implementation • 25 Aug 2020 • Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, Xue-Qi Cheng
To address this new task, we propose a novel Contrastive Generation model, CtrsGen for short, to generate the intent description by contrasting the relevant documents with the irrelevant documents given a query.
no code implementations • 21 Jun 2020 • Zizhen Wang, Yixing Fan, Jiafeng Guo, Liu Yang, Ruqing Zhang, Yanyan Lan, Xue-Qi Cheng, Hui Jiang, Xiaozhao Wang
However, it has long been a challenge to properly measure the similarity between two questions due to the inherent variation of natural language, i.e., there can be different ways to ask the same question, or different questions sharing similar expressions.
no code implementations • 24 May 2019 • Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, Xue-Qi Cheng
To generate a sound outline, an ideal OG model should be able to capture three levels of coherence, namely the coherence between context paragraphs, that between a section and its heading, and that between context headings.
no code implementations • ACL 2018 • Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, Jun Xu, Xue-Qi Cheng
In conversation, a general response (e.g., "I don't know") could correspond to a large variety of input utterances.
no code implementations • 18 Jul 2017 • Ruqing Zhang, Jiafeng Guo, Yanyan Lan, Jun Xu, Xue-Qi Cheng
Representing texts as fixed-length vectors is central to many language processing tasks.
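The simplest instance of such a representation is mean-pooled word vectors; this sketch uses random stand-in vectors rather than trained embeddings:

```python
import numpy as np

# The simplest fixed-length text representation: average the word vectors
# of a text so that any text maps to a single d-dimensional vector.

rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=4) for w in "texts as fixed length vectors".split()}

def text_vector(text):
    """Mean-pool word vectors; unseen words are skipped."""
    vecs = [vocab[w] for w in text.split() if w in vocab]
    return np.mean(vecs, axis=0) if vecs else np.zeros(4)

print(text_vector("fixed length vectors"))
```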