Search Results for author: Xueguang Ma

Found 24 papers, 12 papers with code

An Encoder Attribution Analysis for Dense Passage Retriever in Open-Domain Question Answering

no code implementations NAACL (TrustNLP) 2022 Minghan Li, Xueguang Ma, Jimmy Lin

The bi-encoder design of dense passage retriever (DPR) is a key factor to its success in open-domain question answering (QA), yet it is unclear how DPR’s question encoder and passage encoder individually contribute to overall performance, which we refer to as the encoder attribution problem.

Open-Domain Question Answering Retrieval

Simple and Effective Unsupervised Redundancy Elimination to Compress Dense Vectors for Passage Retrieval

no code implementations EMNLP 2021 Xueguang Ma, Minghan Li, Kai Sun, Ji Xin, Jimmy Lin

Recent work has shown that dense passage retrieval techniques achieve better ranking accuracy in open-domain question answering compared to sparse retrieval techniques such as BM25, but at the cost of large space and memory requirements.

Open-Domain Question Answering Passage Retrieval +2

Fine-Tuning LLaMA for Multi-Stage Text Retrieval

no code implementations 12 Oct 2023 Xueguang Ma, Liang Wang, Nan Yang, Furu Wei, Jimmy Lin

Our findings demonstrate that the effectiveness of large language models indeed surpasses that of smaller models.

Passage Retrieval Retrieval +1

Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models

1 code implementation 11 Oct 2023 Raphael Tang, Xinyu Zhang, Xueguang Ma, Jimmy Lin, Ferhan Ture

Large language models (LLMs) exhibit positional bias in how they use context, which especially complicates listwise ranking.

Augmenting Black-box LLMs with Medical Textbooks for Clinical Question Answering

no code implementations 5 Sep 2023 YuBo Wang, Xueguang Ma, Wenhu Chen

Large-scale language models (LLMs), such as ChatGPT, are capable of generating human-like responses for various downstream tasks, such as task-oriented dialogues and question answering.

Question Answering Retrieval

TheoremQA: A Theorem-driven Question Answering dataset

2 code implementations 21 May 2023 Wenhu Chen, Ming Yin, Max Ku, Pan Lu, Yixin Wan, Xueguang Ma, Jianyu Xu, Xinyi Wang, Tony Xia

We evaluate a wide spectrum of 16 large language and code models with different prompting strategies like Chain-of-Thoughts and Program-of-Thoughts.

Math Question Answering

Zero-Shot Listwise Document Reranking with a Large Language Model

no code implementations 3 May 2023 Xueguang Ma, Xinyu Zhang, Ronak Pradeep, Jimmy Lin

Supervised ranking methods based on bi-encoder or cross-encoder architectures have shown success in multi-stage text ranking tasks, but they require large amounts of relevance judgments as training data.

Language Modelling Large Language Model +1

Anserini Gets Dense Retrieval: Integration of Lucene's HNSW Indexes

no code implementations 24 Apr 2023 Xueguang Ma, Tommaso Teofili, Jimmy Lin

With Pyserini, which provides a Python interface to Anserini, users gain access to both sparse and dense retrieval models, as Pyserini implements bindings to the Faiss vector search library alongside Lucene inverted indexes in a uniform, consistent interface.

Information Retrieval Retrieval

SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes

1 code implementation 13 Feb 2023 Minghan Li, Sheng-Chieh Lin, Xueguang Ma, Jimmy Lin

Multi-vector retrieval methods have demonstrated their effectiveness on various retrieval datasets, and among them, ColBERT is the most established method based on the late interaction of contextualized token embeddings of pre-trained language models.

Information Retrieval Retrieval

Precise Zero-Shot Dense Retrieval without Relevance Labels

1 code implementation 20 Dec 2022 Luyu Gao, Xueguang Ma, Jimmy Lin, Jamie Callan

Given a query, HyDE first zero-shot instructs an instruction-following language model (e.g., InstructGPT) to generate a hypothetical document.

Fact Verification Instruction Following +3

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks

3 code implementations 22 Nov 2022 Wenhu Chen, Xueguang Ma, Xinyi Wang, William W. Cohen

By combining PoT with self-consistency decoding, we can achieve SoTA performance on all math problem datasets and near-SoTA performance on financial datasets.


To Interpolate or not to Interpolate: PRF, Dense and Sparse Retrievers

no code implementations 30 Apr 2022 Hang Li, Shuai Wang, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, Guido Zuccon

In this paper we consider the problem of combining the relevance signals from sparse and dense retrievers in the context of Pseudo Relevance Feedback (PRF).

Information Retrieval Language Modelling +1

Towards Best Practices for Training Multilingual Dense Retrieval Models

no code implementations 5 Apr 2022 Xinyu Zhang, Kelechi Ogueji, Xueguang Ma, Jimmy Lin

Dense retrieval models using a transformer-based bi-encoder design have emerged as an active area of research.

Cross-Lingual Transfer Retrieval

Tevatron: An Efficient and Flexible Toolkit for Dense Retrieval

1 code implementation 11 Mar 2022 Luyu Gao, Xueguang Ma, Jimmy Lin, Jamie Callan

In this paper, we present Tevatron, a dense retrieval toolkit optimized for efficiency, flexibility, and code simplicity.


Sparsifying Sparse Representations for Passage Retrieval by Top-$k$ Masking

no code implementations 17 Dec 2021 Jheng-Hong Yang, Xueguang Ma, Jimmy Lin

Sparse lexical representation learning has demonstrated much progress in improving passage retrieval effectiveness in recent models such as DeepImpact, uniCOIL, and SPLADE.

Passage Retrieval Representation Learning +2

Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback: A Reproducibility Study

1 code implementation 13 Dec 2021 Hang Li, Shengyao Zhuang, Ahmed Mourad, Xueguang Ma, Jimmy Lin, Guido Zuccon

Finally, we contribute a study of the generalisability of the ANCE-PRF method when dense retrievers other than ANCE are used for the first round of retrieval and for encoding the PRF signal.


Personalized multi-faceted trust modeling to determine trust links in social media and its potential for misinformation management

no code implementations 11 Nov 2021 Alexandre Parmentier, Robin Cohen, Xueguang Ma, Gaurav Sahu, Queenie Chen

In this paper, we present an approach for predicting trust links between peers in social media, one that is grounded in the artificial intelligence area of multiagent trust modeling.

Management Misinformation +1

Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval

1 code implementation EMNLP (MRL) 2021 Xinyu Zhang, Xueguang Ma, Peng Shi, Jimmy Lin

We present Mr. TyDi, a multi-lingual benchmark dataset for mono-lingual retrieval in eleven typologically diverse languages, designed to evaluate ranking with learned dense representations.

Representation Learning Retrieval

A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques

no code implementations 28 Jun 2021 Jimmy Lin, Xueguang Ma

Recent developments in representational learning for information retrieval can be organized in a conceptual framework that establishes two pairs of contrasts: sparse vs. dense representations and unsupervised vs. learned representations.

Information Retrieval Passage Ranking +1

A Replication Study of Dense Passage Retriever

1 code implementation 12 Apr 2021 Xueguang Ma, Kai Sun, Ronak Pradeep, Jimmy Lin

Text retrieval using learned dense representations has recently emerged as a promising alternative to "traditional" text retrieval using sparse bag-of-words representations.

Open-Domain Question Answering Retrieval +1

Scientific Claim Verification with VERT5ERINI

no code implementations EACL (Louhi) 2021 Ronak Pradeep, Xueguang Ma, Rodrigo Nogueira, Jimmy Lin

This work describes the adaptation of a pretrained sequence-to-sequence model to the task of scientific claim verification in the biomedical domain.

Claim Verification Retrieval
