no code implementations • 20 Aug 2024 • Yihang Wang, Xu Huang, Bowen Tian, Yixing Fan, Jiafeng Guo
Generative LLMs have achieved significant success in various industrial tasks and can effectively adapt to vertical domains and downstream tasks through in-context learning (ICL).
1 code implementation • 1 Aug 2024 • Wenshan Wang, Yihang Wang, Yixing Fan, Huaming Liao, Jiafeng Guo
Specifically, we take a trigger token to calculate the attention distribution of the context in response to the question.
no code implementations • 16 Jul 2024 • Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng
Generative retrieval uses differentiable search indexes to directly generate relevant document identifiers in response to a query.
1 code implementation • 9 Jul 2024 • Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng
Recent advances in neural information retrieval (IR) models have significantly enhanced their effectiveness over various IR tasks.
no code implementations • 2 Apr 2024 • Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng
However, limiting perturbations to a single level of granularity may reduce the flexibility of adversarial examples, thereby diminishing the potential threat of the attack.
1 code implementation • 28 Mar 2024 • Hengran Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng
Retrieval-augmented generation (RAG) is considered to be a promising approach to alleviate the hallucination issue of large language models (LLMs), and it has received widespread attention from researchers recently.
1 code implementation • 26 Feb 2024 • Jiafeng Guo, Changjiang Zhou, Ruqing Zhang, Jiangui Chen, Maarten de Rijke, Yixing Fan, Xueqi Cheng
Very recently, a pre-trained generative retrieval model for KILTs, named CorpusBrain, was proposed and reached new state-of-the-art retrieval performance.
1 code implementation • 16 Dec 2023 • Run-Ze Fan, Yixing Fan, Jiangui Chen, Jiafeng Guo, Ruqing Zhang, Xueqi Cheng
Automatic mainstream hashtag recommendation aims to accurately provide users with concise and popular topical hashtags before publication.
no code implementations • 6 Nov 2023 • Yinqiong Cai, Yixing Fan, Keping Bi, Jiafeng Guo, Wei Chen, Ruqing Zhang, Xueqi Cheng
The first-stage retrieval aims to retrieve a subset of candidate documents from a huge collection both effectively and efficiently.
1 code implementation • 18 Oct 2023 • Hengran Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng
We argue that, rather than relevance, for FV we need to focus on the utility that a claim verifier derives from the retrieved evidence.
no code implementations • 29 Aug 2023 • Jiangui Chen, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Yixing Fan, Xueqi Cheng
We put forward a novel Continual-LEarner for generatiVE Retrieval (CLEVER) model and make two major contributions to continual learning for GR: (i) To encode new documents into docids with low computational cost, we present Incremental Product Quantization, which updates a partial quantization codebook according to two adaptive thresholds; and (ii) To memorize new documents for querying without forgetting previous knowledge, we propose a memory-augmented learning mechanism, to form meaningful connections between old and new documents.
1 code implementation • 22 Aug 2023 • Yinqiong Cai, Keping Bi, Yixing Fan, Jiafeng Guo, Wei Chen, Xueqi Cheng
First-stage retrieval is a critical task that aims to retrieve relevant document candidates from a large-scale collection.
no code implementations • 19 Aug 2023 • Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Yixing Fan, Xueqi Cheng
The AREA task is meant to trick DR models into retrieving a target document that is outside the initial set of candidate documents retrieved by the DR model in response to a query.
1 code implementation • 28 Apr 2023 • Jiangui Chen, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yiqun Liu, Yixing Fan, Xueqi Cheng
Learning task-specific retrievers that return relevant contexts at an appropriate level of semantic granularity, such as a document retriever, passage retriever, sentence retriever, and entity retriever, may help to achieve better performance on the end-to-end task.
1 code implementation • 28 Apr 2023 • Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Yixing Fan, Xueqi Cheng
In this paper, we focus on a more general type of perturbation and introduce the topic-oriented adversarial ranking attack task against NRMs, which aims to find an imperceptible perturbation that can promote a target document in ranking for a group of queries with the same topic.
1 code implementation • 9 Nov 2022 • Wenxiang Sun, Yixing Fan, Jiafeng Guo, Ruqing Zhang, Xueqi Cheng
Since each entity often contains rich visual and textual information in KBs, we propose three different sub-tasks, i.e., visual to visual entity linking (V2VEL), visual to textual entity linking (V2TEL), and visual to visual-textual entity linking (V2VTEL).
1 code implementation • 14 Sep 2022 • Chen Wu, Ruqing Zhang, Jiafeng Guo, Wei Chen, Yixing Fan, Maarten de Rijke, Xueqi Cheng
A ranking model is said to be Certified Top-$K$ Robust on a ranked list when it is guaranteed to keep documents that are out of the top $K$ away from the top $K$ under any attack.
no code implementations • 12 Sep 2022 • Yinqiong Cai, Jiafeng Guo, Yixing Fan, Qingyao Ai, Ruqing Zhang, Xueqi Cheng
When sampling top-ranked results (excluding the labeled positives) as negatives from a stronger retriever, the performance of the learned NRM becomes even worse.
no code implementations • 21 Aug 2022 • Xinyu Ma, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Xueqi Cheng
Unlike the promising results in NLP, we find that these methods cannot achieve comparable performance to full fine-tuning at both stages when updating less than 1% of the original model parameters.
no code implementations • 21 Aug 2022 • Xinyu Ma, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Xueqi Cheng
Empirical results show that our method can significantly outperform the state-of-the-art autoencoder-based language models and other pre-trained models for dense retrieval.
1 code implementation • 16 Aug 2022 • Jiangui Chen, Ruqing Zhang, Jiafeng Guo, Yiqun Liu, Yixing Fan, Xueqi Cheng
We show that a strong generative retrieval model can be learned with a set of adequately designed pre-training tasks, and be adopted to improve a variety of downstream KILT tasks with further fine-tuning.
1 code implementation • 22 Apr 2022 • Xinyu Ma, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Xueqi Cheng
Therefore, in this work, we propose to drop out the decoder and introduce a novel contrastive span prediction task to pre-train the encoder alone.
1 code implementation • 12 Apr 2022 • Jiangui Chen, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Xueqi Cheng
This classical approach has clear drawbacks as follows: i) a large document index as well as a complicated search process is required, leading to considerable memory and computational overhead; ii) independent scoring paradigms fail to capture the interactions among documents and sentences in ranking; iii) a fixed number of sentences are selected to form the final evidence set.
no code implementations • 4 Apr 2022 • Chen Wu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng
We focus on the decision-based black-box attack setting, where the attackers cannot directly get access to the model information, but can only query the target model to obtain the rank positions of the partial retrieved list.
no code implementations • CVPR 2022 • Sihao Yu, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Zizhen Wang, Xueqi Cheng
By reducing the weights of the majority classes, such instances would become more difficult to learn and consequently hurt the overall performance.
no code implementations • 27 Nov 2021 • Yixing Fan, Xiaohui Xie, Yinqiong Cai, Jia Chen, Xinyu Ma, Xiangsheng Li, Ruqing Zhang, Jiafeng Guo
The core of information retrieval (IR) is to identify relevant information from large-scale resources and return it as a ranked list to respond to the user's information need.
no code implementations • 11 Aug 2021 • Chen Wu, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Xueqi Cheng
So we raise the question in this work: Are neural ranking models robust?
2 code implementations • 11 Aug 2021 • Jiangui Chen, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Xueqi Cheng
A possible solution to this dilemma is a new approach known as federated learning, which is a privacy-preserving machine learning technique over distributed datasets.
no code implementations • 18 Jul 2021 • Yinqiong Cai, Yixing Fan, Jiafeng Guo, Ruqing Zhang, Yanyan Lan, Xueqi Cheng
However, these methods often lose the discriminative power of term-based methods, thus introducing noise during retrieval and hurting recall performance.
1 code implementation • 20 Apr 2021 • Xinyu Ma, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Yingyan Li, Xueqi Cheng
The basic idea of PROP is to construct the representative words prediction (ROP) task for pre-training inspired by the query likelihood model.
1 code implementation • 8 Mar 2021 • Jiafeng Guo, Yinqiong Cai, Yixing Fan, Fei Sun, Ruqing Zhang, Xueqi Cheng
We believe it is the right time to survey the current status, learn from existing methods, and gain some insights for future development.
no code implementations • 1 Mar 2021 • Yixing Fan, Jiafeng Guo, Xinyu Ma, Ruqing Zhang, Yanyan Lan, Xueqi Cheng
We employ 16 linguistic tasks to probe a unified retrieval model over these three retrieval tasks to answer this question.
no code implementations • 25 Feb 2021 • Chen Wu, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, Xueqi Cheng
One is a widely adopted metric such as F1, which acts as a balanced objective, and the other is the best F1 under some minimal recall constraint, which represents a typical objective in professional search.
1 code implementation • 20 Oct 2020 • Xinyu Ma, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Xiang Ji, Xueqi Cheng
Recently, pre-trained language representation models such as BERT have shown great success when fine-tuned on downstream tasks including information retrieval (IR).
no code implementations • 25 Aug 2020 • Lixin Su, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Yanyan Lan, Xue-Qi Cheng
To tackle such a challenge, in this work, we introduce the Continual Domain Adaptation (CDA) task for MRC.
1 code implementation • 25 Aug 2020 • Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, Xue-Qi Cheng
To address this new task, we propose a novel Contrastive Generation model, CtrsGen for short, to generate the intent description by contrasting the relevant documents with the irrelevant documents given a query.
no code implementations • 21 Jun 2020 • Zizhen Wang, Yixing Fan, Jiafeng Guo, Liu Yang, Ruqing Zhang, Yanyan Lan, Xue-Qi Cheng, Hui Jiang, Xiaozhao Wang
However, it has long been a challenge to properly measure the similarity between two questions due to the inherent variation of natural language, i.e., there could be different ways to ask the same question or different questions sharing similar expressions.
no code implementations • 24 May 2019 • Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, Xue-Qi Cheng
To generate a sound outline, an ideal OG model should be able to capture three levels of coherence, namely the coherence between context paragraphs, that between a section and its heading, and that between context headings.
no code implementations • 24 May 2019 • Lixin Su, Jiafeng Guo, Yixing Fan, Yanyan Lan, Xue-Qi Cheng
Web question answering (QA) has become an indispensable component in modern search systems, which can significantly improve users' search experience by providing a direct answer to users' information need.
1 code implementation • 24 May 2019 • Jiafeng Guo, Yixing Fan, Xiang Ji, Xue-Qi Cheng
Text matching is the core problem in many natural language processing (NLP) tasks, such as information retrieval, question answering, and conversation.
no code implementations • 16 Mar 2019 • Jiafeng Guo, Yixing Fan, Liang Pang, Liu Yang, Qingyao Ai, Hamed Zamani, Chen Wu, W. Bruce Croft, Xue-Qi Cheng
Ranking models lie at the heart of research on information retrieval (IR).
no code implementations • ACL 2018 • Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, Jun Xu, Xue-Qi Cheng
In conversation, a general response (e.g., "I don't know") could correspond to a large variety of input utterances.
2 code implementations • SIGIR '18 2018 • Yixing Fan, Jiafeng Guo, Yanyan Lan, Jun Xu, ChengXiang Zhai, Xue-Qi Cheng
The local matching layer focuses on producing a set of local relevance signals by modeling the semantic matching between a query and each passage of a document.
3 code implementations • 23 Nov 2017 • Jiafeng Guo, Yixing Fan, Qingyao Ai, W. Bruce Croft
Specifically, our model employs a joint deep architecture at the query term level for relevance matching.
Ranked #14 on Ad-Hoc Information Retrieval on TREC Robust04
1 code implementation • 23 Jul 2017 • Yixing Fan, Liang Pang, Jianpeng Hou, Jiafeng Guo, Yanyan Lan, Xue-Qi Cheng
In recent years, deep neural models have been widely adopted for text matching tasks, such as question answering and information retrieval, showing improved performance as compared with previous methods.