Search Results for author: Wayne Xiong

Found 10 papers, 5 papers with code

Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning

1 code implementation25 Oct 2024 Yu Fu, Zefan Cai, Abedelkadir Asi, Wayne Xiong, Yue Dong, Wen Xiao

Key-Value (KV) caching is a common technique to enhance the computational efficiency of Large Language Models (LLMs), but its memory overhead grows rapidly with input length.

Computational Efficiency Question Answering +1

Integrative Decoding: Improve Factuality via Implicit Self-consistency

1 code implementation2 Oct 2024 Yi Cheng, Xiao Liang, Yeyun Gong, Wen Xiao, Song Wang, Yuji Zhang, Wenjun Hou, Kaishuai Xu, Wenge Liu, Wenjie Li, Jian Jiao, Qi Chen, Peng Cheng, Wayne Xiong

Self-consistency-based approaches, which involve repeatedly sampling multiple outputs and selecting the most consistent one as the final response, prove to be remarkably effective in improving the factual accuracy of large language models.

TruthfulQA

Developing a Reliable, General-Purpose Hallucination Detection and Mitigation Service: Insights and Lessons Learned

no code implementations22 Jul 2024 Song Wang, Xun Wang, Jie Mei, Yujia Xie, Sean Muarray, Zhang Li, Lingfeng Wu, Si-Qing Chen, Wayne Xiong

Hallucination, a phenomenon where large language models (LLMs) produce output that is factually incorrect or unrelated to the input, is a major challenge for LLM applications that require accuracy and dependability.

Hallucination named-entity-recognition +3

PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling

2 code implementations4 Jun 2024 Zefan Cai, Yichi Zhang, Bofei Gao, Yuliang Liu, Tianyu Liu, Keming Lu, Wayne Xiong, Yue Dong, Baobao Chang, Junjie Hu, Wen Xiao

Our experimental evaluations, utilizing the LongBench benchmark, show that PyramidKV matches the performance of models with a full KV cache while retaining only 12% of the KV cache, thus significantly reducing memory usage.

Interactive Editing for Text Summarization

1 code implementation5 Jun 2023 Yujia Xie, Xun Wang, Si-Qing Chen, Wayne Xiong, Pengcheng He

Summarizing lengthy documents is a common and essential task in our daily lives.

Decoder Text Summarization

Momentum Calibration for Text Generation

no code implementations8 Dec 2022 Xingxing Zhang, Yiran Liu, Xun Wang, Pengcheng He, Yang Yu, Si-Qing Chen, Wayne Xiong, Furu Wei

The input and output of most text generation tasks can be transformed to two sequences of tokens and they can be modeled using sequence-to-sequence learning modeling tools such as Transformers.

Abstractive Text Summarization Text Generation

Session-level Language Modeling for Conversational Speech

no code implementations EMNLP 2018 Wayne Xiong, Lingfeng Wu, Jun Zhang, Andreas Stolcke

We propose to generalize language models for conversational speech recognition to allow them to operate across utterance boundaries and speaker changes, thereby capturing conversation-level phenomena such as adjacency pairs, lexical entrainment, and topical coherence.

Language Modeling Language Modelling +2

Progressive Joint Modeling in Unsupervised Single-channel Overlapped Speech Recognition

no code implementations21 Jul 2017 Zhehuai Chen, Jasha Droppo, Jinyu Li, Wayne Xiong

We propose to advance the current state of the art by imposing a modular structure on the neural network, applying a progressive pretraining regimen, and improving the objective function with transfer learning and a discriminative training criterion.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Cannot find the paper you are looking for? You can Submit a new open access paper.