Search Results for author: Wen Xiao

Found 28 papers, 20 papers with code

T3-Vis: visual analytic for Training and fine-Tuning Transformers in NLP

1 code implementation EMNLP (ACL) 2021 Raymond Li, Wen Xiao, Lanjun Wang, Hyeju Jang, Giuseppe Carenini

Transformers are the dominant architecture in NLP, but their training and fine-tuning are still very challenging.

HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading

1 code implementation 18 Feb 2025 Cheng Luo, Zefan Cai, Hanshi Sun, Jinqi Xiao, Bo Yuan, Wen Xiao, Junjie Hu, Jiawei Zhao, Beidi Chen, Anima Anandkumar

Extending the context length has disproportionately shifted the memory footprint of LLMs during inference to the key-value cache (KV cache).

Computational Efficiency
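To make the offloading idea concrete, here is a minimal sketch of head-wise KV offloading: keep a subset of heads' key/value tensors on the GPU and page the remaining heads out to CPU memory, fetching them on demand. The class and its methods are hypothetical illustrations, not HeadInfer's actual API.

```python
import torch

# Hypothetical sketch of head-wise KV offloading (not HeadInfer's
# implementation): a few heads stay resident on the accelerator, the rest
# live in CPU RAM and incur a host-to-device copy when attention needs them.
class HeadwiseKVCache:
    def __init__(self, n_heads: int, gpu_heads: int, device: str = "cuda"):
        assert gpu_heads <= n_heads
        self.resident = set(range(gpu_heads))  # heads kept on the accelerator
        self.device = device
        self.store = {}                        # head index -> (K, V) tensors

    def put(self, head: int, k: torch.Tensor, v: torch.Tensor) -> None:
        target = self.device if head in self.resident else "cpu"
        self.store[head] = (k.to(target), v.to(target))

    def get(self, head: int):
        k, v = self.store[head]
        # Offloaded heads pay a transfer cost here; resident heads do not.
        return k.to(self.device), v.to(self.device)
```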

Environmental Factors Can Have Opposite Biodiversity Influences on the Community Temporal Stability In Aquatic Ecosystems

no code implementations 6 Jan 2025 Zihao Wen, Hang Shan, Hao Wang, Yu Cao, Liang He, Wenjing Ren, Chengjie Yin, Qingchuan Chou, Chaochao Lv, Haojie Su, Tao Tang, Qinghua Cai, Leyi Ni, Wen Xiao, Xiaolin Zhang, Kuanyi Li, Te Cao, Ming-Chih Chiu, Vincent H. Resh, Pablo Urrutia-Cordero

However, we know little about how local environmental conditions can influence these biodiversity drivers, and consequently how they indirectly shape the ecological stability of ecosystems.

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

1 code implementation 16 Dec 2024 Liang Chen, Zekun Wang, Shuhuai Ren, Lei LI, Haozhe Zhao, Yunshui Li, Zefan Cai, Hongcheng Guo, Lei Zhang, Yizhe Xiong, Yichi Zhang, Ruoyu Wu, Qingxiu Dong, Ge Zhang, Jian Yang, Lingwei Meng, Shujie Hu, Yulong Chen, Junyang Lin, Shuai Bai, Andreas Vlachos, Xu Tan, Minjia Zhang, Wen Xiao, Aaron Yee, Tianyu Liu, Baobao Chang

As Large Language Models (LLMs) have advanced to unify understanding and generation tasks within the textual modality, recent research has shown that tasks from different modalities can also be effectively encapsulated within the NTP framework, transforming multimodal information into tokens and predicting the next one given the context.

Language Modeling · Language Modelling +2
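As a toy illustration of the NTP objective the survey covers: once inputs of any modality are mapped to discrete tokens, training reduces to predicting token t+1 from the prefix. The tensors and dimensions below are made up, and a real model would place a transformer between the embedding and the output head.

```python
import torch
import torch.nn.functional as F

# Toy next-token-prediction (NTP) objective. The sequence model itself is
# elided: embed -> head stands in for embed -> transformer -> head.
vocab, dim = 1000, 64
tokens = torch.randint(0, vocab, (1, 16))   # e.g. text tokens or image-patch codes
embed = torch.nn.Embedding(vocab, dim)
head = torch.nn.Linear(dim, vocab)

hidden = embed(tokens[:, :-1])              # context up to position t
logits = head(hidden)                       # predicted distribution for t+1
loss = F.cross_entropy(logits.reshape(-1, vocab), tokens[:, 1:].reshape(-1))
```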

Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasoning

1 code implementation 25 Oct 2024 Yu Fu, Zefan Cai, Abedelkadir Asi, Wayne Xiong, Yue Dong, Wen Xiao

Key-Value (KV) caching is a common technique to enhance the computational efficiency of Large Language Models (LLMs), but its memory overhead grows rapidly with input length.

Computational Efficiency +2
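A back-of-the-envelope calculation shows why KV memory grows so quickly with input length; the model configuration below is hypothetical, chosen only to make the arithmetic concrete.

```python
# KV cache size: two tensors (K and V) per layer, each of shape
# [heads, seq_len, head_dim]. The footprint is linear in sequence length,
# which is why long inputs dominate inference memory.
def kv_cache_bytes(layers, heads, head_dim, seq_len, dtype_bytes=2):
    return 2 * layers * heads * head_dim * seq_len * dtype_bytes

# Illustrative 7B-class configuration (hypothetical numbers):
# 32 layers x 32 heads x head_dim 128, fp16, 128k-token context.
print(kv_cache_bytes(32, 32, 128, 128_000) / 2**30)  # 62.5 GiB
```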

Integrative Decoding: Improve Factuality via Implicit Self-consistency

1 code implementation 2 Oct 2024 Yi Cheng, Xiao Liang, Yeyun Gong, Wen Xiao, Song Wang, Yuji Zhang, Wenjun Hou, Kaishuai Xu, Wenge Liu, Wenjie Li, Jian Jiao, Qi Chen, Peng Cheng, Wayne Xiong

Self-consistency-based approaches, which involve repeatedly sampling multiple outputs and selecting the most consistent one as the final response, prove to be remarkably effective in improving the factual accuracy of large language models.

TruthfulQA
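For reference, vanilla self-consistency (the baseline described in the snippet, not Integrative Decoding itself) can be sketched in a few lines; `sample_answer` is a hypothetical stand-in for one stochastic LLM call.

```python
from collections import Counter

# Minimal sketch of self-consistency decoding: sample several generations
# for the same prompt and return the most frequent answer.
def self_consistent_answer(sample_answer, prompt: str, n: int = 8) -> str:
    votes = Counter(sample_answer(prompt) for _ in range(n))
    return votes.most_common(1)[0][0]
```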

LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback

1 code implementation 20 Jun 2024 Bofei Gao, Zefan Cai, Runxin Xu, Peiyi Wang, Ce Zheng, Runji Lin, Keming Lu, Dayiheng Liu, Chang Zhou, Wen Xiao, Junjie Hu, Tianyu Liu, Baobao Chang

Recently, mathematical verifiers have achieved success in mathematical reasoning tasks by validating the correctness of solutions generated by policy models.

Binary Classification · GSM8K +2

PyramidKV: Dynamic KV Cache Compression based on Pyramidal Information Funneling

2 code implementations 4 Jun 2024 Zefan Cai, Yichi Zhang, Bofei Gao, Yuliang Liu, Tianyu Liu, Keming Lu, Wayne Xiong, Yue Dong, Baobao Chang, Junjie Hu, Wen Xiao

Our experimental evaluations, utilizing the LongBench benchmark, show that PyramidKV matches the performance of models with a full KV cache while retaining only 12% of the KV cache, thus significantly reducing memory usage.
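A minimal sketch of a pyramid-shaped per-layer budget follows, assuming a fixed overall retention ratio (12% here, echoing the reported number). PyramidKV's actual allocation and eviction rules differ; this only illustrates the shape.

```python
# Hypothetical pyramid-shaped KV budget: lower layers keep more cached
# tokens than higher layers, with the total held near a fixed ratio.
def pyramid_budgets(n_layers: int, seq_len: int, keep_ratio: float = 0.12):
    total = int(n_layers * seq_len * keep_ratio)  # tokens kept across all layers
    weights = list(range(n_layers, 0, -1))        # linearly decreasing per layer
    scale = total / sum(weights)
    return [max(1, round(w * scale)) for w in weights]

budgets = pyramid_budgets(n_layers=32, seq_len=4096)  # budgets[0] = bottom layer
```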

Cross-Task Defense: Instruction-Tuning LLMs for Content Safety

1 code implementation 24 May 2024 Yu Fu, Wen Xiao, Jia Chen, Jiachen Li, Evangelos Papalexakis, Aichi Chien, Yue Dong

Recent studies reveal that Large Language Models (LLMs) face challenges in balancing safety with utility, particularly when processing long texts for NLP tasks like summarization and translation.

Safety Alignment in NLP Tasks: Weakly Aligned Summarization as an In-Context Attack

1 code implementation 12 Dec 2023 Yu Fu, Yufei Li, Wen Xiao, Cong Liu, Yue Dong

Recent developments in balancing the usefulness and safety of Large Language Models (LLMs) have raised a critical question: Are mainstream NLP tasks adequately aligned with safety considerations?

Question Answering · Safety Alignment

Visual Analytics for Generative Transformer Models

no code implementations 21 Nov 2023 Raymond Li, Ruixin Yang, Wen Xiao, Ahmed Aburaed, Gabriel Murray, Giuseppe Carenini

While transformer-based models have achieved state-of-the-art results in a variety of classification and generation tasks, their black-box nature makes them challenging to interpret.

Decoder

Personalized Abstractive Summarization by Tri-agent Generation Pipeline

1 code implementation 4 May 2023 Wen Xiao, Yujia Xie, Giuseppe Carenini, Pengcheng He

The inference-only large language model (ChatGPT) serves as both the generator and editor, with a smaller model acting as the instructor to guide output generation.

Abstractive Text Summarization · Language Modeling +2
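The pipeline's control flow can be sketched as three stages; all three callables are hypothetical stand-ins (a large inference-only LLM for generation and editing, a smaller trained instructor), not the paper's exact interfaces.

```python
# Hedged sketch of a tri-agent generate/instruct/edit loop.
def tri_agent_summarize(generate, instruct, edit, document: str) -> str:
    draft = generate(document)                 # large LLM: initial summary
    instruction = instruct(document, draft)    # small model: how to revise it
    return edit(document, draft, instruction)  # large LLM: revised summary
```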

Discourse Structure Extraction from Pre-Trained and Fine-Tuned Language Models in Dialogues

no code implementations 12 Feb 2023 Chuyuan Li, Patrick Huber, Wen Xiao, Maxime Amblard, Chloé Braud, Giuseppe Carenini

As a result, we explore approaches to build discourse structures for dialogues, based on attention matrices from Pre-trained Language Models (PLMs).

Sentence · Sentence Ordering
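One simple way to turn an attention matrix into a dependency-style structure, shown purely as a hedged illustration (the paper's extraction method differs): attach each utterance to the earlier utterance it attends to most. The matrix `attn` is an assumed input, aggregated over heads and layers.

```python
import numpy as np

# Hedged sketch: derive attachment links for a dialogue from a hypothetical
# [n, n] utterance-to-utterance attention matrix.
def attachment_links(attn: np.ndarray) -> list[tuple[int, int]]:
    n = attn.shape[0]
    # Utterance i (i >= 1) attaches to its most-attended predecessor.
    return [(i, int(np.argmax(attn[i, :i]))) for i in range(1, n)]
```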

Attend to the Right Context: A Plug-and-Play Module for Content-Controllable Summarization

1 code implementation 21 Dec 2022 Wen Xiao, Lesly Miculicich, Yang Liu, Pengcheng He, Giuseppe Carenini

Content-Controllable Summarization generates summaries focused on the given controlling signals.

Entity-based SpanCopy for Abstractive Summarization to Improve the Factual Consistency

1 code implementation 7 Sep 2022 Wen Xiao, Giuseppe Carenini

Despite the success of recent abstractive summarizers on automatic evaluation metrics, the generated summaries still present factual inconsistencies with the source document.

Abstractive Text Summarization

SensatUrban: Learning Semantics from Urban-Scale Photogrammetric Point Clouds

no code implementations 12 Jan 2022 Qingyong Hu, Bo Yang, Sheikh Khalid, Wen Xiao, Niki Trigoni, Andrew Markham

Each point in the dataset has been labelled with fine-grained semantic annotations, resulting in a dataset three times the size of the previously largest photogrammetric point cloud dataset.

PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization

3 code implementations ACL 2022 Wen Xiao, Iz Beltagy, Giuseppe Carenini, Arman Cohan

We introduce PRIMERA, a pre-trained model for multi-document representation with a focus on summarization that reduces the need for dataset-specific architectures and large amounts of fine-tuning labeled data.

Abstractive Text Summarization · Decoder +3

T3-Vis: a visual analytic framework for Training and fine-Tuning Transformers in NLP

1 code implementation 31 Aug 2021 Raymond Li, Wen Xiao, Lanjun Wang, Hyeju Jang, Giuseppe Carenini

Transformers are the dominant architecture in NLP, but their training and fine-tuning are still very challenging.

W-RST: Towards a Weighted RST-style Discourse Framework

no code implementations ACL 2021 Patrick Huber, Wen Xiao, Giuseppe Carenini

Aiming for a better integration of data-driven and linguistically-inspired approaches, we explore whether RST Nuclearity, assigning a binary assessment of importance between text segments, can be replaced by automatically generated, real-valued scores, in what we call a Weighted-RST framework.

Demoting the Lead Bias in News Summarization via Alternating Adversarial Learning

no code implementations ACL 2021 Linzi Xing, Wen Xiao, Giuseppe Carenini

In news articles, the lead bias is a common phenomenon that usually dominates the learning signals for neural extractive summarizers, severely limiting their performance on data with different or even no bias.

News Summarization

Do We Really Need That Many Parameters In Transformer For Extractive Summarization? Discourse Can Help !

no code implementations EMNLP (CODI) 2020 Wen Xiao, Patrick Huber, Giuseppe Carenini

The multi-head self-attention of popular transformer models is widely used within Natural Language Processing (NLP), including for the task of extractive summarization.

Extractive Summarization · Sentence

Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges

2 code implementations CVPR 2021 Qingyong Hu, Bo Yang, Sheikh Khalid, Wen Xiao, Niki Trigoni, Andrew Markham

An essential prerequisite for unleashing the potential of supervised deep learning algorithms in the area of 3D scene understanding is the availability of large-scale and richly annotated datasets.

Scene Understanding · Semantic Segmentation

Extractive Summarization of Long Documents by Combining Global and Local Context

1 code implementation IJCNLP 2019 Wen Xiao, Giuseppe Carenini

In this paper, we propose a novel neural single document extractive summarization model for long documents, incorporating both the global context of the whole document and the local context within the current topic.

Extractive Summarization · Text Summarization
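A hypothetical sketch of the global-plus-local scoring idea: each sentence representation is combined with a document-level and a topic-level context vector before a binary extraction decision. The encoder producing these vectors, the 128-dim size, and the linear scorer are all assumptions, not the paper's exact architecture.

```python
import torch

# Hedged sketch: score a sentence for extraction from its own vector plus
# global (whole-document) and local (current-topic) context vectors.
def extraction_score(sent_vec, global_vec, local_vec, w: torch.nn.Linear):
    feats = torch.cat([sent_vec, global_vec, local_vec], dim=-1)
    return torch.sigmoid(w(feats)).squeeze(-1)  # probability of extraction

w = torch.nn.Linear(3 * 128, 1)                 # assuming 128-dim vectors
score = extraction_score(torch.randn(128), torch.randn(128), torch.randn(128), w)
```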
