1 code implementation • EMNLP (ACL) 2021 • Raymond Li, Wen Xiao, Lanjun Wang, Hyeju Jang, Giuseppe Carenini
Transformers are the dominant architecture in NLP, but their training and fine-tuning are still very challenging.
no code implementations • NAACL (DeeLIO) 2021 • Hyeju Jang, Seojin Bang, Wen Xiao, Giuseppe Carenini, Raymond Ng, Young ji Lee
Text classification has wide-ranging applications in various domains.
1 code implementation • 18 Feb 2025 • Cheng Luo, Zefan Cai, Hanshi Sun, Jinqi Xiao, Bo Yuan, Wen Xiao, Junjie Hu, Jiawei Zhao, Beidi Chen, Anima Anandkumar
Extending the context length has disproportionately shifted the memory footprint of LLMs during inference to the key-value cache (KV cache).
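For intuition on the scale of the problem, here is a back-of-the-envelope calculation assuming Llama-2-7B-like dimensions and fp16 storage (illustrative choices, not the paper's setup):

```python
# Back-of-the-envelope KV-cache size, assuming Llama-2-7B-like
# dimensions (32 layers, 32 KV heads, head dim 128) and fp16 storage.
layers, kv_heads, head_dim, bytes_per_el = 32, 32, 128, 2

def kv_cache_bytes(num_tokens: int) -> int:
    # K and V tensors are each cached per layer, per head, per token.
    return 2 * layers * kv_heads * head_dim * bytes_per_el * num_tokens

for ctx in (4_096, 32_768, 131_072):
    print(f"{ctx:>7} tokens -> {kv_cache_bytes(ctx) / 2**30:.1f} GiB")
# 4096 tokens -> 2.0 GiB, 131072 tokens -> 64.0 GiB: at long context the
# cache dwarfs the ~14 GiB of fp16 weights for a 7B model.
```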
no code implementations • 6 Jan 2025 • Zihao Wen, Hang Shan, Hao Wang, Yu Cao, Liang He, Wenjing Ren, Chengjie Yin, Qingchuan Chou, Chaochao Lv, Haojie Su, Tao Tang, Qinghua Cai, Leyi Ni, Wen Xiao, Xiaolin Zhang, Kuanyi Li, Te Cao, Ming-Chih Chiu, Vincent H. Resh, Pablo Urrutia-Cordero
However, we know little about how local environmental conditions can influence these biodiversity drivers, and consequently how they indirectly shape the ecological stability of ecosystems.
1 code implementation • 16 Dec 2024 • Liang Chen, Zekun Wang, Shuhuai Ren, Lei LI, Haozhe Zhao, Yunshui Li, Zefan Cai, Hongcheng Guo, Lei Zhang, Yizhe Xiong, Yichi Zhang, Ruoyu Wu, Qingxiu Dong, Ge Zhang, Jian Yang, Lingwei Meng, Shujie Hu, Yulong Chen, Junyang Lin, Shuai Bai, Andreas Vlachos, Xu Tan, Minjia Zhang, Wen Xiao, Aaron Yee, Tianyu Liu, Baobao Chang
As Large Language Models (LLMs) have advanced to unify understanding and generation tasks within the textual modality, recent research has shown that tasks from different modalities can also be effectively encapsulated within the NTP framework, transforming multimodal information into tokens and predicting the next one given the context.
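As a rough illustration of that NTP framing, a schematic greedy decoding loop over a unified token sequence; `model` and the token ids are placeholders, not this paper's architecture:

```python
import torch

# Once images, audio, etc. are mapped to discrete tokens, generation is
# the same next-token-prediction loop for every modality.
def generate(model, prompt_ids: torch.Tensor, max_new: int = 32) -> torch.Tensor:
    ids = prompt_ids  # shape (1, seq_len); may mix text and image tokens
    for _ in range(max_new):
        logits = model(ids)                               # (1, seq_len, vocab)
        next_id = logits[:, -1].argmax(-1, keepdim=True)  # greedy choice
        ids = torch.cat([ids, next_id], dim=-1)
    return ids
```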
1 code implementation • 25 Oct 2024 • Yu Fu, Zefan Cai, Abedelkadir Asi, Wayne Xiong, Yue Dong, Wen Xiao
Key-Value (KV) caching is a common technique to enhance the computational efficiency of Large Language Models (LLMs), but its memory overhead grows rapidly with input length.
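For reference, a minimal single-head sketch of how a KV cache is used during decoding (illustrative only; real implementations batch heads, layers, and sequences):

```python
import torch

# One decode step with a KV cache: only the new token's key/value are
# appended; keys/values of all previous tokens are reused, not recomputed.
def decode_step(q_new, k_new, v_new, k_cache, v_cache):
    # q_new/k_new/v_new: (1, d); caches: (t, d) for the t previous tokens
    k_cache = torch.cat([k_cache, k_new], dim=0)            # (t+1, d)
    v_cache = torch.cat([v_cache, v_new], dim=0)
    att = (q_new @ k_cache.T) / k_cache.shape[-1] ** 0.5    # (1, t+1)
    out = att.softmax(-1) @ v_cache                         # (1, d)
    return out, k_cache, v_cache
```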
1 code implementation • 2 Oct 2024 • Yi Cheng, Xiao Liang, Yeyun Gong, Wen Xiao, Song Wang, Yuji Zhang, Wenjun Hou, Kaishuai Xu, Wenge Liu, Wenjie Li, Jian Jiao, Qi Chen, Peng Cheng, Wayne Xiong
Self-consistency-based approaches, which involve repeatedly sampling multiple outputs and selecting the most consistent one as the final response, prove to be remarkably effective in improving the factual accuracy of large language models.
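A minimal sketch of the self-consistency recipe, assuming a hypothetical `sample_answer` helper that makes one stochastic (temperature > 0) LLM call and returns a normalized final answer:

```python
from collections import Counter

# Self-consistency: sample several answers and keep the most frequent one.
def self_consistent_answer(sample_answer, question: str, n: int = 8) -> str:
    votes = Counter(sample_answer(question) for _ in range(n))
    answer, _ = votes.most_common(1)[0]  # majority vote
    return answer
```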
1 code implementation • 4 Sep 2024 • Bofei Gao, Feifan Song, Yibo Miao, Zefan Cai, Zhe Yang, Liang Chen, Helan Hu, Runxin Xu, Qingxiu Dong, Ce Zheng, Shanghaoran Quan, Wen Xiao, Ge Zhang, Daoguang Zan, Keming Lu, Bowen Yu, Dayiheng Liu, Zeyu Cui, Jian Yang, Lei Sha, Houfeng Wang, Zhifang Sui, Peiyi Wang, Tianyu Liu, Baobao Chang
Finally, based on our unified perspective, we explore the challenges and future research directions for aligning large language models with human preferences.
1 code implementation • 20 Jun 2024 • Bofei Gao, Zefan Cai, Runxin Xu, Peiyi Wang, Ce Zheng, Runji Lin, Keming Lu, Dayiheng Liu, Chang Zhou, Wen Xiao, Junjie Hu, Tianyu Liu, Baobao Chang
In recent progress, mathematical verifiers have achieved success in mathematical reasoning tasks by validating the correctness of solutions generated by policy models.
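A schematic of verifier-based best-of-n selection in this spirit, with `sample_solution` and `verifier_score` as hypothetical placeholders for the policy model and the verifier:

```python
# Best-of-n with a verifier: the policy model proposes several candidate
# solutions and the verifier scores each; the highest-scoring one wins.
def best_of_n(sample_solution, verifier_score, problem: str, n: int = 16) -> str:
    candidates = [sample_solution(problem) for _ in range(n)]
    return max(candidates, key=verifier_score)
```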
2 code implementations • 4 Jun 2024 • Zefan Cai, Yichi Zhang, Bofei Gao, Yuliang Liu, Tianyu Liu, Keming Lu, Wayne Xiong, Yue Dong, Baobao Chang, Junjie Hu, Wen Xiao
Our experimental evaluations, utilizing the LongBench benchmark, show that PyramidKV matches the performance of models with a full KV cache while retaining only 12% of the KV cache, thus significantly reducing memory usage.
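A sketch of the pyramid idea: per-layer cache budgets that decrease from lower to upper layers while preserving the same total as a uniform allocation. The paper's actual allocation and eviction rules differ in detail.

```python
# Pyramid-shaped per-layer KV budgets, in the spirit of PyramidKV:
# lower layers keep more entries, upper layers fewer.
def pyramid_budgets(avg_per_layer: int, num_layers: int, ratio: float = 4.0):
    # Budgets fall linearly from `hi` (layer 0) to `lo` (top layer),
    # where hi / lo == ratio and the mean stays avg_per_layer.
    lo = 2 * avg_per_layer / (1 + ratio)
    hi = ratio * lo
    step = (hi - lo) / max(num_layers - 1, 1)
    return [round(hi - i * step) for i in range(num_layers)]

print(pyramid_budgets(512, 8))
# [819, 731, 644, 556, 468, 380, 293, 205] -- sums to 8 * 512
```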
1 code implementation • 24 May 2024 • Yu Fu, Wen Xiao, Jia Chen, Jiachen Li, Evangelos Papalexakis, Aichi Chien, Yue Dong
Recent studies reveal that Large Language Models (LLMs) face challenges in balancing safety with utility, particularly when processing long texts for NLP tasks like summarization and translation.
1 code implementation • 12 Dec 2023 • Yu Fu, Yufei Li, Wen Xiao, Cong Liu, Yue Dong
Recent developments in balancing the usefulness and safety of Large Language Models (LLMs) have raised a critical question: Are mainstream NLP tasks adequately aligned with safety considerations?
no code implementations • 21 Nov 2023 • Raymond Li, Ruixin Yang, Wen Xiao, Ahmed Aburaed, Gabriel Murray, Giuseppe Carenini
While transformer-based models have achieved state-of-the-art results in a variety of classification and generation tasks, their black-box nature makes them difficult to interpret.
1 code implementation • 4 May 2023 • Wen Xiao, Yujia Xie, Giuseppe Carenini, Pengcheng He
The inference-only large language model (ChatGPT) serves as both the generator and editor, with a smaller model acting as the instructor to guide output generation.
no code implementations • 12 Feb 2023 • Chuyuan Li, Patrick Huber, Wen Xiao, Maxime Amblard, Chloé Braud, Giuseppe Carenini
As a result, we explore approaches to build discourse structures for dialogues, based on attention matrices from Pre-trained Language Models (PLMs).
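One simple heuristic in this spirit (a sketch, not necessarily the paper's method): attach each utterance to the earlier utterance it attends to most, yielding a dependency-style tree.

```python
import numpy as np

# Read a dialogue dependency structure off a PLM attention matrix.
def attention_to_tree(att: np.ndarray) -> list[tuple[int, int]]:
    # att[i, j]: attention from utterance i to utterance j, shape (n, n)
    edges = []
    for i in range(1, att.shape[0]):
        head = int(np.argmax(att[i, :i]))  # strongest earlier utterance
        edges.append((head, i))
    return edges

att = np.array([[1.0, 0.0, 0.0],
                [0.9, 0.1, 0.0],
                [0.2, 0.7, 0.1]])
print(attention_to_tree(att))  # [(0, 1), (1, 2)]
```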
1 code implementation • 21 Dec 2022 • Wen Xiao, Lesly Miculicich, Yang Liu, Pengcheng He, Giuseppe Carenini
Content-Controllable Summarization generates summaries focused on the given controlling signals.
1 code implementation • 7 Sep 2022 • Wen Xiao, Giuseppe Carenini
Despite the success of recent abstractive summarizers on automatic evaluation metrics, the generated summaries still present factual inconsistencies with the source document.
no code implementations • 12 Jan 2022 • Qingyong Hu, Bo Yang, Sheikh Khalid, Wen Xiao, Niki Trigoni, Andrew Markham
Each point in the dataset has been labelled with fine-grained semantic annotations, resulting in a dataset three times the size of the previously largest photogrammetric point cloud dataset.
1 code implementation • 10 Dec 2021 • Raymond Li, Wen Xiao, Linzi Xing, Lanjun Wang, Gabriel Murray, Giuseppe Carenini
The multi-head self-attention mechanism of the transformer model has been thoroughly investigated recently.
3 code implementations • ACL 2022 • Wen Xiao, Iz Beltagy, Giuseppe Carenini, Arman Cohan
We introduce PRIMERA, a pre-trained model for multi-document representation with a focus on summarization that reduces the need for dataset-specific architectures and large amounts of fine-tuning labeled data.
Ranked #1 on Multi-Document Summarization on Multi-News
1 code implementation • 31 Aug 2021 • Raymond Li, Wen Xiao, Lanjun Wang, Hyeju Jang, Giuseppe Carenini
Transformers are the dominant architecture in NLP, but their training and fine-tuning are still very challenging.
no code implementations • ACL 2021 • Patrick Huber, Wen Xiao, Giuseppe Carenini
Aiming for a better integration of data-driven and linguistically-inspired approaches, we explore whether RST Nuclearity, assigning a binary assessment of importance between text segments, can be replaced by automatically generated, real-valued scores, in what we call a Weighted-RST framework.
no code implementations • ACL 2021 • Linzi Xing, Wen Xiao, Giuseppe Carenini
In news articles, lead bias is a common phenomenon that usually dominates the learning signals for neural extractive summarizers, severely limiting their performance on data with different or even no bias.
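For context, lead bias is why the trivial lead-k baseline is notoriously strong on news data; a minimal version:

```python
# Because news articles front-load key content, simply taking the first
# k sentences ("lead-k") is hard for neural extractive models to beat.
def lead_k(sentences: list[str], k: int = 3) -> str:
    return " ".join(sentences[:k])

article = ["A quake struck the city at dawn.",
           "Officials reported no casualties.",
           "Power was restored by noon.",
           "Residents described the shaking as brief."]
print(lead_k(article))  # the first three sentences serve as the summary
```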
1 code implementation • NAACL 2021 • Wen Xiao, Patrick Huber, Giuseppe Carenini
Previous work indicates that discourse information benefits summarization.
no code implementations • EMNLP (CODI) 2020 • Wen Xiao, Patrick Huber, Giuseppe Carenini
The multi-head self-attention of popular transformer models is widely used within Natural Language Processing (NLP), including for the task of extractive summarization.
1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • Wen Xiao, Giuseppe Carenini
Our analysis of large summarization datasets indicates that redundancy is a very serious problem when summarizing long documents.
Ranked #16 on Text Summarization on Pubmed
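Following up on the entry above, one common proxy for redundancy is the fraction of repeated n-grams in a summary (a generic measure, not necessarily the paper's own metrics):

```python
# Share of repeated n-grams: 0.0 means no n-gram repeats at all.
def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)

print(repeated_ngram_ratio("the model uses the model uses attention"))  # 0.2
```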
2 code implementations • CVPR 2021 • Qingyong Hu, Bo Yang, Sheikh Khalid, Wen Xiao, Niki Trigoni, Andrew Markham
An essential prerequisite for unleashing the potential of supervised deep learning algorithms in the area of 3D scene understanding is the availability of large-scale and richly annotated datasets.
1 code implementation • IJCNLP 2019 • Wen Xiao, Giuseppe Carenini
In this paper, we propose a novel neural single document extractive summarization model for long documents, incorporating both the global context of the whole document and the local context within the current topic.
Ranked #19 on Text Summarization on Arxiv HEP-TH citation graph
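To make the global/local idea in the entry above concrete, here is a schematic sentence scorer that mixes a sentence's similarity to the whole document with its similarity to its own topic segment; the embeddings, segmentation, and mixing weight are all placeholder assumptions, not the paper's model:

```python
import numpy as np

# Score sentences against both the document (global) and their topic
# segment (local); the top-k scored sentences form the summary.
def score_sentences(sent_vecs, seg_ids, alpha: float = 0.5):
    sent_vecs = np.asarray(sent_vecs)   # (n, d) sentence embeddings
    seg_ids = np.asarray(seg_ids)       # (n,) topic-segment id per sentence
    doc_vec = sent_vecs.mean(0)         # global context
    scores = []
    for vec, seg in zip(sent_vecs, seg_ids):
        seg_vec = sent_vecs[seg_ids == seg].mean(0)  # local context
        scores.append(alpha * vec @ doc_vec + (1 - alpha) * vec @ seg_vec)
    return scores

vecs = np.eye(4)[:3]  # 3 toy sentence vectors
print(score_sentences(vecs, seg_ids=[0, 0, 1]))
```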