Search Results for author: Yixing Fan

Found 45 papers, 23 papers with code

QUITO-X: An Information Bottleneck-based Compression Algorithm with Cross-Attention

no code implementations • 20 Aug 2024 • Yihang Wang, Xu Huang, Bowen Tian, Yixing Fan, Jiafeng Guo

Generative LLMs have achieved significant success in various industrial tasks and can effectively adapt to vertical domains and downstream tasks through ICL.

QUITO: Accelerating Long-Context Reasoning through Query-Guided Context Compression

1 code implementation • 1 Aug 2024 • Wenshan Wang, Yihang Wang, Yixing Fan, Huaming Liao, Jiafeng Guo

Specifically, we take a trigger token to calculate the attention distribution of the context in response to the question.

In-Context Learning

Bootstrapped Pre-training with Dynamic Identifier Prediction for Generative Retrieval

no code implementations • 16 Jul 2024 • Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

Generative retrieval uses differentiable search indexes to directly generate relevant document identifiers in response to a query.

Memorization Retrieval

Robust Neural Information Retrieval: An Adversarial and Out-of-distribution Perspective

1 code implementation • 9 Jul 2024 • Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

Recent advances in neural information retrieval (IR) models have significantly enhanced their effectiveness over various IR tasks.

Information Retrieval Retrieval

Multi-granular Adversarial Attacks against Black-box Neural Ranking Models

no code implementations • 2 Apr 2024 • Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

However, limiting perturbations to a single level of granularity may reduce the flexibility of adversarial examples, thereby diminishing the potential threat of the attack.

Adversarial Attack Decision Making +2

Are Large Language Models Good at Utility Judgments?

1 code implementation • 28 Mar 2024 • Hengran Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

Retrieval-augmented generation (RAG) is considered to be a promising approach to alleviate the hallucination issue of large language models (LLMs), and it has received widespread attention from researchers recently.

Answer Generation Benchmarking +5

CorpusBrain++: A Continual Generative Pre-Training Framework for Knowledge-Intensive Language Tasks

1 code implementation • 26 Feb 2024 • Jiafeng Guo, Changjiang Zhou, Ruqing Zhang, Jiangui Chen, Maarten de Rijke, Yixing Fan, Xueqi Cheng

Very recently, a pre-trained generative retrieval model for KILTs, named CorpusBrain, was proposed and reached new state-of-the-art retrieval performance.

Retrieval

RIGHT: Retrieval-augmented Generation for Mainstream Hashtag Recommendation

1 code implementation • 16 Dec 2023 • Run-Ze Fan, Yixing Fan, Jiangui Chen, Jiafeng Guo, Ruqing Zhang, Xueqi Cheng

Automatic mainstream hashtag recommendation aims to accurately provide users with concise and popular topical hashtags before publication.

Retrieval

CAME: Competitively Learning a Mixture-of-Experts Model for First-stage Retrieval

no code implementations • 6 Nov 2023 • Yinqiong Cai, Yixing Fan, Keping Bi, Jiafeng Guo, Wei Chen, Ruqing Zhang, Xueqi Cheng

The first-stage retrieval aims to retrieve a subset of candidate documents from a huge collection both effectively and efficiently.

Retrieval

From Relevance to Utility: Evidence Retrieval with Feedback for Fact Verification

1 code implementation • 18 Oct 2023 • Hengran Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

We argue that, rather than relevance, for FV we need to focus on the utility that a claim verifier derives from the retrieved evidence.

Fact Verification Retrieval

Continual Learning for Generative Retrieval over Dynamic Corpora

no code implementations • 29 Aug 2023 • Jiangui Chen, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Yixing Fan, Xueqi Cheng

We put forward a novel Continual-LEarner for generatiVE Retrieval (CLEVER) model and make two major contributions to continual learning for GR: (i) To encode new documents into docids with low computational cost, we present Incremental Product Quantization, which updates a partial quantization codebook according to two adaptive thresholds; and (ii) To memorize new documents for querying without forgetting previous knowledge, we propose a memory-augmented learning mechanism, to form meaningful connections between old and new documents.

Continual Learning Quantization +1

L^2R: Lifelong Learning for First-stage Retrieval with Backward-Compatible Representations

1 code implementation • 22 Aug 2023 • Yinqiong Cai, Keping Bi, Yixing Fan, Jiafeng Guo, Wei Chen, Xueqi Cheng

First-stage retrieval is a critical task that aims to retrieve relevant document candidates from a large-scale collection.

Retrieval

Black-box Adversarial Attacks against Dense Retrieval Models: A Multi-view Contrastive Learning Method

no code implementations • 19 Aug 2023 • Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Yixing Fan, Xueqi Cheng

The AREA task is meant to trick DR models into retrieving a target document that is outside the initial set of candidate documents retrieved by the DR model in response to a query.

Adversarial Attack Attribute +2

A Unified Generative Retriever for Knowledge-Intensive Language Tasks via Prompt Learning

1 code implementation • 28 Apr 2023 • Jiangui Chen, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yiqun Liu, Yixing Fan, Xueqi Cheng

Learning task-specific retrievers that return relevant contexts at an appropriate level of semantic granularity, such as a document retriever, passage retriever, sentence retriever, and entity retriever, may help to achieve better performance on the end-to-end task.

Retrieval Sentence

Topic-oriented Adversarial Attacks against Black-box Neural Ranking Models

1 code implementation • 28 Apr 2023 • Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Yixing Fan, Xueqi Cheng

In this paper, we focus on a more general type of perturbation and introduce the topic-oriented adversarial ranking attack task against NRMs, which aims to find an imperceptible perturbation that can promote a target document in ranking for a group of queries with the same topic.

Information Retrieval Retrieval

Visual Named Entity Linking: A New Dataset and A Baseline

1 code implementation • 9 Nov 2022 • Wenxiang Sun, Yixing Fan, Jiafeng Guo, Ruqing Zhang, Xueqi Cheng

Since each entity often contains rich visual and textual information in KBs, we propose three different sub-tasks, i.e., visual to visual entity linking (V2VEL), visual to textual entity linking (V2TEL), and visual to visual-textual entity linking (V2VTEL).

Entity Linking Image Retrieval +3

Certified Robustness to Word Substitution Ranking Attack for Neural Ranking Models

1 code implementation • 14 Sep 2022 • Chen Wu, Ruqing Zhang, Jiafeng Guo, Wei Chen, Yixing Fan, Maarten de Rijke, Xueqi Cheng

A ranking model is said to be Certified Top-$K$ Robust on a ranked list when it is guaranteed to keep documents that are out of the top $K$ away from the top $K$ under any attack.

Information Retrieval Retrieval

Hard Negatives or False Negatives: Correcting Pooling Bias in Training Neural Ranking Models

no code implementations • 12 Sep 2022 • Yinqiong Cai, Jiafeng Guo, Yixing Fan, Qingyao Ai, Ruqing Zhang, Xueqi Cheng

When sampling top-ranked results (excluding the labeled positives) as negatives from a stronger retriever, the performance of the learned NRM becomes even worse.

Information Retrieval Retrieval

Scattered or Connected? An Optimized Parameter-efficient Tuning Approach for Information Retrieval

no code implementations • 21 Aug 2022 • Xinyu Ma, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Xueqi Cheng

Unlike the promising results in NLP, we find that these methods cannot achieve comparable performance to full fine-tuning at both stages when updating less than 1% of the original model parameters.

Information Retrieval Re-Ranking +1

A Contrastive Pre-training Approach to Learn Discriminative Autoencoder for Dense Retrieval

no code implementations • 21 Aug 2022 • Xinyu Ma, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Xueqi Cheng

Empirical results show that our method can significantly outperform the state-of-the-art autoencoder-based language models and other pre-trained models for dense retrieval.

Decoder Information Retrieval +1

CorpusBrain: Pre-train a Generative Retrieval Model for Knowledge-Intensive Language Tasks

1 code implementation • 16 Aug 2022 • Jiangui Chen, Ruqing Zhang, Jiafeng Guo, Yiqun Liu, Yixing Fan, Xueqi Cheng

We show that a strong generative retrieval model can be learned with a set of adequately designed pre-training tasks, and be adopted to improve a variety of downstream KILT tasks with further fine-tuning.

Retrieval

Pre-train a Discriminative Text Encoder for Dense Retrieval via Contrastive Span Prediction

1 code implementation • 22 Apr 2022 • Xinyu Ma, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Xueqi Cheng

Therefore, in this work, we propose to drop out the decoder and introduce a novel contrastive span prediction task to pre-train the encoder alone.

Contrastive Learning Decoder +3

GERE: Generative Evidence Retrieval for Fact Verification

1 code implementation • 12 Apr 2022 • Jiangui Chen, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Xueqi Cheng

This classical approach has clear drawbacks as follows: i) a large document index as well as a complicated search process is required, leading to considerable memory and computational overhead; ii) independent scoring paradigms fail to capture the interactions among documents and sentences in ranking; iii) a fixed number of sentences are selected to form the final evidence set.

Claim Verification Fact Verification +3

PRADA: Practical Black-Box Adversarial Attacks against Neural Ranking Models

no code implementations • 4 Apr 2022 • Chen Wu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

We focus on the decision-based black-box attack setting, where the attackers cannot directly get access to the model information, but can only query the target model to obtain the rank positions of the partial retrieved list.

Document Ranking Information Retrieval +1

A Re-Balancing Strategy for Class-Imbalanced Classification Based on Instance Difficulty

no code implementations • CVPR 2022 • Sihao Yu, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Zizhen Wang, Xueqi Cheng

By reducing the weights of the majority classes, such instances would become more difficult to learn and hurt the overall performance consequently.

imbalanced classification

Pre-training Methods in Information Retrieval

no code implementations • 27 Nov 2021 • Yixing Fan, Xiaohui Xie, Yinqiong Cai, Jia Chen, Xinyu Ma, Xiangsheng Li, Ruqing Zhang, Jiafeng Guo

The core of information retrieval (IR) is to identify relevant information from large-scale resources and return it as a ranked list to respond to the user's information need.

Information Retrieval Re-Ranking +1

FedMatch: Federated Learning Over Heterogeneous Question Answering Data

2 code implementations • 11 Aug 2021 • Jiangui Chen, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Xueqi Cheng

A possible solution to this dilemma is a new approach known as federated learning, which is a privacy-preserving machine learning technique over distributed datasets.

Federated Learning Privacy Preserving +1

A Discriminative Semantic Ranker for Question Retrieval

no code implementations • 18 Jul 2021 • Yinqiong Cai, Yixing Fan, Jiafeng Guo, Ruqing Zhang, Yanyan Lan, Xueqi Cheng

However, these methods often lack the discriminative power of term-based methods, and thus introduce noise during retrieval and hurt recall performance.

Question Answering Re-Ranking +1

B-PROP: Bootstrapped Pre-training with Representative Words Prediction for Ad-hoc Retrieval

1 code implementation • 20 Apr 2021 • Xinyu Ma, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Yingyan Li, Xueqi Cheng

The basic idea of PROP is to construct the representative words prediction (ROP) task for pre-training, inspired by the query likelihood model.

Information Retrieval Language Modelling +1

Semantic Models for the First-stage Retrieval: A Comprehensive Review

1 code implementation • 8 Mar 2021 • Jiafeng Guo, Yinqiong Cai, Yixing Fan, Fei Sun, Ruqing Zhang, Xueqi Cheng

We believe it is the right time to survey the current status, learn from existing methods, and gain some insights for future development.

Re-Ranking Retrieval +1

A Linguistic Study on Relevance Modeling in Information Retrieval

no code implementations • 1 Mar 2021 • Yixing Fan, Jiafeng Guo, Xinyu Ma, Ruqing Zhang, Yanyan Lan, Xueqi Cheng

We employ 16 linguistic tasks to probe a unified retrieval model over these three retrieval tasks to answer this question.

Information Retrieval Natural Language Understanding +2

Learning to Truncate Ranked Lists for Information Retrieval

no code implementations • 25 Feb 2021 • Chen Wu, Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, Xueqi Cheng

One is a widely adopted metric such as F1, which acts as a balanced objective, and the other is the best F1 under some minimal recall constraint, which represents a typical objective in professional search.

Information Retrieval Retrieval

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

1 code implementation • 20 Oct 2020 • Xinyu Ma, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Xiang Ji, Xueqi Cheng

Recently, pre-trained language representation models such as BERT have shown great success when fine-tuned on downstream tasks including information retrieval (IR).

Information Retrieval Language Modelling +1

Continual Domain Adaptation for Machine Reading Comprehension

no code implementations • 25 Aug 2020 • Lixin Su, Jiafeng Guo, Ruqing Zhang, Yixing Fan, Yanyan Lan, Xue-Qi Cheng

To tackle such a challenge, in this work, we introduce the Continual Domain Adaptation (CDA) task for MRC.

Continual Learning Domain Adaptation +2

Query Understanding via Intent Description Generation

1 code implementation • 25 Aug 2020 • Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, Xue-Qi Cheng

To address this new task, we propose a novel Contrastive Generation model, namely CtrsGen for short, to generate the intent description by contrasting the relevant documents with the irrelevant documents given a query.

Clustering Information Retrieval +1

Match$^2$: A Matching over Matching Model for Similar Question Identification

no code implementations • 21 Jun 2020 • Zizhen Wang, Yixing Fan, Jiafeng Guo, Liu Yang, Ruqing Zhang, Yanyan Lan, Xue-Qi Cheng, Hui Jiang, Xiaozhao Wang

However, it has long been a challenge to properly measure the similarity between two questions due to the inherent variation of natural language, i.e., there could be different ways to ask the same question or different questions sharing similar expressions.

Community Question Answering

Outline Generation: Understanding the Inherent Content Structure of Documents

no code implementations • 24 May 2019 • Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, Xue-Qi Cheng

To generate a sound outline, an ideal OG model should be able to capture three levels of coherence, namely the coherence between context paragraphs, that between a section and its heading, and that between context headings.

Structured Prediction

Controlling Risk of Web Question Answering

no code implementations • 24 May 2019 • Lixin Su, Jiafeng Guo, Yixing Fan, Yanyan Lan, Xue-Qi Cheng

Web question answering (QA) has become an indispensable component in modern search systems, which can significantly improve users' search experience by providing a direct answer to users' information need.

Machine Reading Comprehension Question Answering

MatchZoo: A Learning, Practicing, and Developing System for Neural Text Matching

1 code implementation • 24 May 2019 • Jiafeng Guo, Yixing Fan, Xiang Ji, Xue-Qi Cheng

Text matching is the core problem in many natural language processing (NLP) tasks, such as information retrieval, question answering, and conversation.

Information Retrieval Question Answering +2

Modeling Diverse Relevance Patterns in Ad-hoc Retrieval

2 code implementations • SIGIR '18 2018 • Yixing Fan, Jiafeng Guo, Yanyan Lan, Jun Xu, ChengXiang Zhai, Xue-Qi Cheng

The local matching layer focuses on producing a set of local relevance signals by modeling the semantic matching between a query and each passage of a document.

Retrieval

MatchZoo: A Toolkit for Deep Text Matching

1 code implementation • 23 Jul 2017 • Yixing Fan, Liang Pang, Jianpeng Hou, Jiafeng Guo, Yanyan Lan, Xue-Qi Cheng

In recent years, deep neural models have been widely adopted for text matching tasks, such as question answering and information retrieval, showing improved performance as compared with previous methods.

Ad-Hoc Information Retrieval Information Retrieval +3
