Search Results for author: Xiaofei Ma

Found 26 papers, 5 papers with code

BASS: Batched Attention-optimized Speculative Sampling

no code implementations 24 Apr 2024 Haifeng Qian, Sujan Kumar Gonugondla, Sungsoo Ha, Mingyue Shang, Sanjay Krishna Gouda, Ramesh Nallapati, Sudipta Sengupta, Xiaofei Ma, Anoop Deoras

Speculative decoding has emerged as a powerful method to improve latency and throughput in hosting large language models.
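To illustrate the idea behind the paper above, here is a minimal sketch of the core speculative decoding loop (greedy variant) — not the batched BASS algorithm itself. `draft_next` and `target_next` are hypothetical stand-ins for a cheap draft model and the expensive target model:

```python
def draft_next(seq):
    # Toy draft model: proposes the next token deterministically.
    return (seq[-1] + 1) % 7

def target_next(seq):
    # Toy target model: the expensive model whose output must be matched.
    return (seq[-1] + 1) % 7 if seq[-1] % 3 else 0

def speculative_step(seq, k=4):
    """Draft k tokens cheaply, then verify them against the target model.
    The accepted prefix keeps target-model quality while amortizing its cost."""
    drafts, s = [], list(seq)
    for _ in range(k):
        t = draft_next(s)
        drafts.append(t)
        s.append(t)
    accepted, s = [], list(seq)
    for t in drafts:
        if target_next(s) == t:   # verification: target agrees with the draft
            accepted.append(t)
            s.append(t)
        else:
            break
    if len(accepted) < k:
        # On disagreement, fall back to one token from the target model.
        accepted.append(target_next(s))
    return seq + accepted
```

Because every accepted token is verified, the output prefix is identical to what pure target-model decoding would produce; the speedup comes from invoking the target model less often per emitted token.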

CodeFort: Robust Training for Code Generation Models

no code implementations 11 Apr 2024 Yuhao Zhang, Shiqi Wang, Haifeng Qian, Zijian Wang, Mingyue Shang, Linbo Liu, Sanjay Krishna Gouda, Baishakhi Ray, Murali Krishna Ramanathan, Xiaofei Ma, Anoop Deoras

Code generation models are not robust to small perturbations, which often lead to inconsistent and incorrect generations and significantly degrade the performance of these models.

Code Generation Contrastive Learning +1

Repoformer: Selective Retrieval for Repository-Level Code Completion

no code implementations 15 Mar 2024 Di Wu, Wasi Uddin Ahmad, Dejiao Zhang, Murali Krishna Ramanathan, Xiaofei Ma

Recent advances in retrieval-augmented generation (RAG) have initiated a new era in repository-level code completion.

Code Completion Retrieval +1

Code Representation Learning At Scale

no code implementations 2 Feb 2024 Dejiao Zhang, Wasi Ahmad, Ming Tan, Hantian Ding, Ramesh Nallapati, Dan Roth, Xiaofei Ma, Bing Xiang

Recent studies have shown that code language models at scale demonstrate significant performance gains on downstream tasks, e.g., code generation.

Code Generation Contrastive Learning +3

Code-Aware Prompting: A study of Coverage Guided Test Generation in Regression Setting using LLM

no code implementations 31 Jan 2024 Gabriel Ryan, Siddhartha Jain, Mingyue Shang, Shiqi Wang, Xiaofei Ma, Murali Krishna Ramanathan, Baishakhi Ray

Recent works using large language models (LLMs) for test generation have focused on improving generation quality by optimizing the test generation context and correcting errors in model outputs. However, they use fixed prompting strategies that prompt the model to generate tests without additional guidance.

Lightweight reranking for language model generations

no code implementations 11 Jul 2023 Siddhartha Jain, Xiaofei Ma, Anoop Deoras, Bing Xiang

We show strong improvements for selecting the best k generations for code generation tasks as well as robust improvements for the best generation for the tasks of autoformalization, summarization, and translation.

Code Generation Language Modelling
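A hedged sketch of the general similarity-based reranking setting the paper above operates in: score each candidate by its average token-overlap with the other generations (a consensus signal) and return the top-k. This illustrates reranking without a reward model; it is not the paper's exact algorithm, and `jaccard` is a deliberately simple stand-in similarity:

```python
def jaccard(a, b):
    # Token-set overlap between two generations; 0.0 when both are empty.
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def rerank(generations, k=1):
    """Rank generations by mean pairwise similarity to the rest of the pool."""
    scores = []
    for i, g in enumerate(generations):
        others = [jaccard(g, h) for j, h in enumerate(generations) if j != i]
        scores.append(sum(others) / max(len(others), 1))
    order = sorted(range(len(generations)), key=lambda i: -scores[i])
    return [generations[i] for i in order[:k]]
```

The consensus score favors the candidate most agreed upon by the sample pool, which is why such rerankers need only the generations themselves, not extra model calls.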

SWING: Balancing Coverage and Faithfulness for Dialogue Summarization

1 code implementation 25 Jan 2023 Kung-Hsiang Huang, Siffi Singh, Xiaofei Ma, Wei Xiao, Feng Nan, Nicholas Dingwall, William Yang Wang, Kathleen McKeown

Missing information is a common issue of dialogue summarization where some information in the reference summaries is not covered in the generated summaries.

Natural Language Inference

Learning Dialogue Representations from Consecutive Utterances

1 code implementation NAACL 2022 Zhihan Zhou, Dejiao Zhang, Wei Xiao, Nicholas Dingwall, Xiaofei Ma, Andrew O. Arnold, Bing Xiang

In this paper, we introduce Dialogue Sentence Embedding (DSE), a self-supervised contrastive learning method that learns effective dialogue representations suitable for a wide range of dialogue tasks.

Contrastive Learning Conversational Question Answering +14
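The in-batch contrastive objective underlying DSE-style training can be sketched as follows: consecutive utterances form positive pairs, and the other utterances in the batch serve as negatives. The toy vectors here stand in for encoder outputs; this is an illustration of the loss, not the paper's training recipe:

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def info_nce(anchors, positives, temperature=0.1):
    """Average cross-entropy of matching each anchor to its own positive
    against all positives in the batch (in-batch negatives)."""
    loss = 0.0
    for i, a in enumerate(anchors):
        logits = [dot(a, p) / temperature for p in positives]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_z - logits[i]  # -log softmax probability of the true pair
    return loss / len(anchors)
```

When each anchor is closest to its own positive the loss approaches zero; mismatched pairs drive it up, which is the gradient signal that pulls consecutive utterances together in embedding space.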

Debiasing Neural Retrieval via In-batch Balancing Regularization

no code implementations NAACL (GeBNLP) 2022 Yuantong Li, Xiaokai Wei, Zijian Wang, Shen Wang, Parminder Bhatia, Xiaofei Ma, Andrew Arnold

People frequently interact with information retrieval (IR) systems; however, IR models exhibit biases and discrimination towards various demographics.

Fairness Passage Retrieval +1

Robust Information Retrieval for False Claims with Distracting Entities In Fact Extraction and Verification

no code implementations 10 Dec 2021 Mingwen Dong, Christos Christodoulopoulos, Sheng-Min Shih, Xiaofei Ma

A BERT-based retrieval model made more mistakes in retrieving refuting evidence for false claims than supporting evidence for true claims.

Data Augmentation Fact Checking +2

Virtual Augmentation Supported Contrastive Learning of Sentence Representations

1 code implementation Findings (ACL) 2022 Dejiao Zhang, Wei Xiao, Henghui Zhu, Xiaofei Ma, Andrew O. Arnold

We then define an instance discrimination task regarding this neighborhood and generate the virtual augmentation in an adversarial training manner.

Contrastive Learning Data Augmentation +2

Contrastive Fine-tuning Improves Robustness for Neural Rankers

no code implementations Findings (ACL) 2021 Xiaofei Ma, Cicero Nogueira dos Santos, Andrew O. Arnold

The performance of state-of-the-art neural rankers can deteriorate substantially when exposed to noisy inputs or applied to a new domain.

Data Augmentation Passage Ranking

Uncertainty-Based Adaptive Learning for Reading Comprehension

no code implementations 1 Jan 2021 Jing Wang, Jie Shen, Xiaofei Ma, Andrew Arnold

Recent years have witnessed a surge of successful applications of machine reading comprehension.

Machine Reading Comprehension

Beyond [CLS] through Ranking by Generation

no code implementations EMNLP 2020 Cicero Nogueira dos Santos, Xiaofei Ma, Ramesh Nallapati, Zhiheng Huang, Bing Xiang

Generative models for Information Retrieval, where ranking of documents is viewed as the task of generating a query from a document's language model, were very successful in various IR tasks in the past.

Answer Selection Information Retrieval +4

Domain Adaptation with BERT-based Domain Classification and Data Selection

no code implementations WS 2019 Xiaofei Ma, Peng Xu, Zhiguo Wang, Ramesh Nallapati, Bing Xiang

The performance of deep neural models can deteriorate substantially when there is a domain shift between training and test data.

Classification Domain Adaptation +2

Universal Text Representation from BERT: An Empirical Study

no code implementations 17 Oct 2019 Xiaofei Ma, Zhiguo Wang, Patrick Ng, Ramesh Nallapati, Bing Xiang

We present a systematic investigation of layer-wise BERT activations for general-purpose text representations to understand what linguistic information they capture and how transferable they are across different tasks.

Learning-To-Rank Natural Language Inference +4

Multi-passage BERT: A Globally Normalized BERT Model for Open-domain Question Answering

no code implementations IJCNLP 2019 Zhiguo Wang, Patrick Ng, Xiaofei Ma, Ramesh Nallapati, Bing Xiang

To tackle this issue, we propose a multi-passage BERT model to globally normalize answer scores across all passages of the same question, and this change enables our QA model to find better answers by utilizing more passages.

Open-Domain Question Answering
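The global-normalization idea above can be sketched minimally: instead of softmax-normalizing answer-span scores within each passage independently, normalize once across all passages of the same question so scores become directly comparable. The scores here are toy logits standing in for what a reader model would produce per candidate span:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a flat list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def globally_normalized(passage_scores):
    """Flatten candidate-span scores from all passages, apply one softmax,
    and return the (passage index, span index) of the best candidate."""
    flat = [(p, s, score)
            for p, spans in enumerate(passage_scores)
            for s, score in enumerate(spans)]
    probs = softmax([score for _, _, score in flat])
    best = max(range(len(flat)), key=lambda i: probs[i])
    return flat[best][0], flat[best][1]
```

Per-passage normalization would give each passage's top span a high probability regardless of how it compares across passages; the single global softmax is what lets a strong span in one passage outrank every span in another.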

Passage Ranking with Weak Supervision

no code implementations ICLR Workshop LLD 2019 Peng Xu, Xiaofei Ma, Ramesh Nallapati, Bing Xiang

In this paper, we propose a \textit{weak supervision} framework for neural ranking tasks based on the data programming paradigm \citep{Ratner2016}, which enables us to leverage multiple weak supervision signals from different sources.

Passage Ranking
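The data-programming setting referenced above can be sketched as several weak labeling functions voting on each example, with their possibly conflicting outputs combined into one noisy label. This sketch uses simple majority vote; data programming proper learns per-source accuracies. The labeling functions and the `overlap`/`length`/`bm25` fields are hypothetical illustrations:

```python
def majority_label(example, labeling_functions):
    """Apply each weak labeler (returning +1, -1, or 0 for abstain)
    and combine by majority vote over the non-abstaining sources."""
    votes = [lf(example) for lf in labeling_functions]
    total = sum(v for v in votes if v != 0)
    if total > 0:
        return 1
    if total < 0:
        return -1
    return 0  # tie, or every source abstained

# Hypothetical weak supervision signals for (query, passage) relevance.
lfs = [
    lambda ex: 1 if ex["overlap"] > 0.5 else 0,   # lexical overlap signal
    lambda ex: -1 if ex["length"] < 5 else 0,     # penalize too-short passages
    lambda ex: 1 if ex["bm25"] > 10 else -1,      # retrieval score signal
]
```

The appeal of the framework is that each source can be weak and noisy on its own; aggregation over many sources yields labels good enough to train a neural ranker.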

Hierarchical Clustering with Prior Knowledge

no code implementations 9 Jun 2018 Xiaofei Ma, Satya Dhavala

Being greedy in the algorithmic sense, hierarchical clustering partitions data at every step solely based on a similarity / dissimilarity measure.

Constrained Clustering
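A hedged sketch of the setting above: greedy agglomerative clustering where prior knowledge enters as cannot-link constraints, so the closest pair of clusters is merged at each step unless the merge would violate a constraint. This illustrates constrained clustering in general, not the paper's formulation:

```python
def agglomerate(points, cannot_link=(), n_clusters=2):
    clusters = [{i} for i in range(len(points))]

    def dist(a, b):
        # Single linkage: minimum pairwise Euclidean distance between clusters.
        return min(
            sum((points[i][k] - points[j][k]) ** 2
                for k in range(len(points[i]))) ** 0.5
            for i in a for j in b
        )

    def allowed(a, b):
        # A merge is forbidden if it would join any cannot-link pair.
        return not any((i in a and j in b) or (j in a and i in b)
                       for i, j in cannot_link)

    while len(clusters) > n_clusters:
        pairs = [(dist(a, b), x, y)
                 for x, a in enumerate(clusters)
                 for y, b in enumerate(clusters[x + 1:], x + 1)
                 if allowed(a, b)]
        if not pairs:
            break  # every remaining merge violates a constraint
        _, x, y = min(pairs)  # greedy step: merge the closest allowed pair
        clusters[x] |= clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]
```

With no constraints the algorithm reduces to plain single-linkage agglomeration; adding a cannot-link pair reroutes the greedy merges around it, which is exactly the kind of prior-knowledge steering the paper studies.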
