no code implementations • Insights (ACL) 2022 • Simeng Sun, Brian Dillon, Mohit Iyyer
Recent progress in large pretrained language models (LMs) has led to a growth of analyses examining what kinds of linguistic knowledge are encoded by these models.
no code implementations • 11 Apr 2025 • Krishna C. Puvvada, Faisal Ladhak, Santiago Akle Serrano, Cheng-Ping Hsieh, Shantanu Acharya, Somshubra Majumdar, Fei Jia, Samuel Kriman, Simeng Sun, Dima Rekesh, Boris Ginsburg
We present a decoder-only Transformer architecture that robustly generalizes to sequence lengths substantially longer than those seen during training.
1 code implementation • 28 Mar 2025 • Simeng Sun, Cheng-Ping Hsieh, Faisal Ladhak, Erik Arakelyan, Santiago Akle Serrano, Boris Ginsburg
Complex reasoning tasks often rely on the ability to consistently and accurately apply simple rules across incremental steps, a foundational capability which we term "level-0" reasoning.
no code implementations • 16 Oct 2024 • Simeng Sun, Cheng-Ping Hsieh
We analyze contextual representations in neural autoregressive language models, emphasizing long-range contexts that span several thousand tokens.
no code implementations • 1 Oct 2024 • Ilya Loshchilov, Cheng-Ping Hsieh, Simeng Sun, Boris Ginsburg
We propose a novel neural network architecture, the normalized Transformer (nGPT), with representation learning on the hypersphere.
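A minimal sketch of the hypersphere idea mentioned in this abstract, assuming standard PyTorch; this is illustrative, not the paper's implementation, and the tensor shapes are made up for the example: every token representation is renormalized to unit L2 norm so that it lies on the unit hypersphere.

```python
# Hedged sketch: project representations onto the unit hypersphere by
# L2-normalizing along the feature dimension (instead of, e.g., LayerNorm).
import torch
import torch.nn.functional as F

def to_hypersphere(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """L2-normalize vectors along `dim` so each lies on the unit hypersphere."""
    return F.normalize(x, p=2, dim=dim)

# hypothetical hidden states: batch of 2 sequences, 4 tokens, 8 dimensions
h = torch.randn(2, 4, 8)
h = to_hypersphere(h)
print(h.norm(dim=-1))  # ~1.0 for every token vector
```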
1 code implementation • 27 Jun 2024 • Chau Minh Pham, Simeng Sun, Mohit Iyyer
Existing research on instruction following largely focuses on tasks with simple instructions and short responses.
6 code implementations • 9 Apr 2024 • Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Yang Zhang, Boris Ginsburg
Despite achieving nearly perfect accuracy in the vanilla needle-in-a-haystack (NIAH) test, almost all models exhibit large performance drops as the context length increases.
1 code implementation • 2 Nov 2023 • Chau Minh Pham, Alexander Hoyle, Simeng Sun, Philip Resnik, Mohit Iyyer
Topic modeling is a well-established technique for exploring text corpora.
1 code implementation • 16 Sep 2023 • Simeng Sun, Dhawal Gupta, Mohit Iyyer
During the last stage of RLHF, a large language model is aligned to human intents via PPO training, a process that generally requires large-scale computational resources.
1 code implementation • 23 May 2023 • Simeng Sun, Yang Liu, Shuohang Wang, Chenguang Zhu, Mohit Iyyer
PEARL outperforms zero-shot and chain-of-thought prompting on this dataset, and ablation experiments show that each stage of PEARL is critical to its performance.
no code implementations • 22 Feb 2023 • Simeng Sun, Yang Liu, Dan Iter, Chenguang Zhu, Mohit Iyyer
This motivates the use of parameter-efficient adaptation methods such as prompt tuning (PT), which adds a small number of tunable embeddings to an otherwise frozen model, and in-context learning (ICL), in which demonstrations of the task are provided to the model in natural language without any additional training.
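A minimal sketch of prompt tuning as described in this abstract, assuming a PyTorch model whose backbone is frozen; the class name, embedding size, and shapes are hypothetical stand-ins, not the paper's code: a small set of tunable "soft prompt" embeddings is prepended to the frozen model's input embeddings, and only those embeddings are trained.

```python
# Hedged sketch of prompt tuning: train only k prepended soft-prompt vectors.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, num_virtual_tokens: int, hidden_size: int):
        super().__init__()
        # the only trainable parameters; the backbone LM stays frozen
        self.prompt = nn.Parameter(torch.randn(num_virtual_tokens, hidden_size) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, hidden) from the frozen embedding layer
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

# hypothetical usage with a 768-dim frozen model
soft_prompt = SoftPrompt(num_virtual_tokens=20, hidden_size=768)
embeds = torch.randn(4, 128, 768)              # stand-in for frozen input embeddings
extended = soft_prompt(embeds)                 # (4, 148, 768) fed to the frozen LM
optimizer = torch.optim.Adam(soft_prompt.parameters(), lr=1e-3)  # prompt-only updates
```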
no code implementations • 7 Feb 2023 • Simeng Sun, Maha Elbayad, Anna Sun, James Cross
With multilingual machine translation (MMT) models continuing to grow in size and number of supported languages, it is natural to reuse and upgrade existing models to save computation as data becomes available in more languages.
no code implementations • 5 Jul 2022 • Ruoyu Feng, Xin Jin, Zongyu Guo, Runsen Feng, Yixin Gao, Tianyu He, Zhizheng Zhang, Simeng Sun, Zhibo Chen
Learning a feature representation that is both general (for AI tasks) and compact (for compression) is pivotal to the success of such a scheme.
2 code implementations • NAACL 2022 • Simeng Sun, Katherine Thai, Mohit Iyyer
While numerous architectures for long-range language models (LRLMs) have recently been proposed, a meaningful evaluation of their discourse-level language understanding capabilities has not yet followed.
no code implementations • 25 Jan 2022 • Xin Jin, Ruoyu Feng, Simeng Sun, Runsen Feng, Tianyu He, Zhibo Chen
Traditional media coding schemes typically encode images and videos into semantically opaque binary streams, which fail to directly support downstream intelligent tasks at the bitstream level.
no code implementations • ACL 2022 • Simeng Sun, Angela Fan, James Cross, Vishrav Chaudhary, Chau Tran, Philipp Koehn, Francisco Guzman
Further, we find that incorporating alternative inputs via self-ensemble can be particularly effective when the training set is small, leading to +5 BLEU when only 5% of the total training data is accessible.
no code implementations • EMNLP 2021 • Simeng Sun, Kalpesh Krishna, Andrew Mattarella-Micke, Mohit Iyyer
Language models are generally trained on short, truncated input sequences, which limits their ability to use discourse-level information present in long-range context to improve their predictions.
1 code implementation • 14 Apr 2021 • Simeng Sun, Wenlong Zhao, Varun Manjunatha, Rajiv Jain, Vlad Morariu, Franck Dernoncourt, Balaji Vasan Srinivasan, Mohit Iyyer
While large-scale pretrained language models have significantly improved writing assistance functionalities such as autocomplete, more complex and controllable writing assistants have yet to be explored.
1 code implementation • NAACL 2021 • Simeng Sun, Mohit Iyyer
Recent progress in language modeling has been driven not only by advances in neural architectures, but also through hardware and optimization improvements.
Ranked #59 on Language Modelling on WikiText-103
no code implementations • 11 Dec 2020 • Xin Li, Xin Jin, Tao Yu, Yingxue Pang, Simeng Sun, Zhizheng Zhang, Zhibo Chen
Traditional single image super-resolution (SISR) methods, which focus on a single, uniform degradation (i.e., bicubic down-sampling), typically suffer from poor performance when applied to real-world low-resolution (LR) images due to their complicated, realistic degradations.
1 code implementation • ACL 2021 • Sumanta Bhattacharyya, Amirmohammad Rooshenas, Subhajit Naskar, Simeng Sun, Mohit Iyyer, Andrew McCallum
To benefit from this observation, we train an energy-based model to mimic the behavior of the task measure (i.e., the energy-based model assigns lower energy to samples with higher BLEU scores), which results in a re-ranking algorithm over samples drawn from the NMT model: energy-based re-ranking (EBR).
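A hedged sketch of the re-ranking step described in this abstract: given several candidate translations sampled from an NMT model, score each with an energy model trained to track BLEU (lower energy for higher BLEU) and return the lowest-energy candidate. The `energy_model` callable and the toy candidates below are hypothetical stand-ins, not the paper's API.

```python
# Hedged sketch: pick the candidate translation with the lowest energy score.
from typing import Callable, List

def energy_rerank(candidates: List[str], energy_model: Callable[[str], float]) -> str:
    """Return the candidate assigned the lowest energy by the energy model."""
    return min(candidates, key=energy_model)

# toy usage with a fake energy function standing in for a trained scorer
candidates = ["the cat sat on the mat", "cat the sat mat on"]
fake_energy = lambda s: 0.0 if s.startswith("the") else 1.0
print(energy_rerank(candidates, fake_energy))  # -> "the cat sat on the mat"
```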
no code implementations • 16 May 2020 • Xin Li, Simeng Sun, Zhizheng Zhang, Zhibo Chen
The Versatile Video Coding (H.266/VVC) standard achieves better image quality at the same bit rate than any other conventional image codec, such as BPG or JPEG.
1 code implementation • ACL 2020 • Weiqiu You, Simeng Sun, Mohit Iyyer
Recent work has questioned the importance of the Transformer's multi-headed attention for achieving high translation quality.
no code implementations • IJCNLP 2019 • Simeng Sun, Ani Nenkova
ROUGE is widely used to automatically evaluate summarization systems.
no code implementations • WS 2019 • Simeng Sun, Ori Shapira, Ido Dagan, Ani Nenkova
We show that plain ROUGE F1 scores are not ideal for comparing current neural systems, which on average produce outputs of different lengths.