no code implementations • 1 Nov 2023 • Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky
This suggests that a flexible positional embedding design and attention alignment can go a long way toward Transformer length extrapolation.
1 code implementation • 14 Sep 2023 • Ting-Han Fan, Ta-Chung Chi, Alexander I. Rudnicky
In recent studies, linear recurrent neural networks (LRNNs) have achieved Transformer-level performance in natural language and long-range modeling, while offering rapid parallel training and constant inference cost.
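To illustrate the linear-recurrence idea behind LRNNs (a minimal sketch only, not the specific architecture studied in the paper; the parameter names a, b, c are illustrative placeholders): because the state update is linear, inference needs just one fixed-size state update per token, while training can unroll the same recurrence with a parallel scan.

```python
import numpy as np

def lrnn_step(h, x, a, b):
    """Constant-cost inference: one linear state update per new token."""
    return a * h + b * x

def lrnn_full(xs, a, b, c):
    """Sequential reference; in practice the same recurrence is computed with a parallel scan."""
    h = np.zeros(xs.shape[-1])
    ys = []
    for x in xs:                      # O(1) memory in sequence length at inference
        h = lrnn_step(h, x, a, b)
        ys.append(c * h)
    return np.stack(ys)

# toy usage
T, d = 6, 4
xs = np.random.randn(T, d)
a = np.full(d, 0.9)                   # per-channel decay (stability: |a| < 1)
b = np.ones(d)
c = np.ones(d)
print(lrnn_full(xs, a, b, c).shape)   # (6, 4)
```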
1 code implementation • SIGDIAL (ACL) 2022 • Ta-Chung Chi, Alexander I. Rudnicky
In addition, unlike in previous work, we do not rely on hand-crafted features; this improves the model's robustness.
Ranked #1 on Discourse Parsing on STAC
no code implementations • 23 May 2023 • Ta-Chung Chi, Ting-Han Fan, Li-Wei Chen, Alexander I. Rudnicky, Peter J. Ramadge
The use of positional embeddings in transformer language models is widely accepted.
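For background on what "positional embeddings" refers to here, the most widely used fixed design is the sinusoidal embedding from the original Transformer; a minimal sketch (shown only as context, not the construction analyzed in the paper):

```python
import numpy as np

def sinusoidal_positional_embedding(max_len, d_model):
    """Classic fixed sinusoidal embeddings (Vaswani et al., 2017); background illustration only."""
    pos = np.arange(max_len)[:, None]                    # (max_len, 1)
    i = np.arange(d_model // 2)[None, :]                 # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                         # even dimensions
    pe[:, 1::2] = np.cos(angles)                         # odd dimensions
    return pe

print(sinusoidal_positional_embedding(128, 64).shape)   # (128, 64)
```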
no code implementations • 5 May 2023 • Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky, Peter J. Ramadge
Conventional wisdom holds that, unlike recurrent models, Transformers cannot perfectly model regular languages.
no code implementations • 20 Dec 2022 • Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky, Peter J. Ramadge
Length extrapolation permits a transformer language model trained on short sequences to preserve its perplexity when tested on substantially longer sequences.
1 code implementation • 15 Jun 2022 • Ting-Han Fan, Ta-Chung Chi, Alexander I. Rudnicky, Peter J. Ramadge
While deep generative models have succeeded in image processing, natural language processing, and reinforcement learning, training that involves discrete random variables remains challenging due to the high variance of the gradient estimates.
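As context for the variance problem, a common baseline is the straight-through Gumbel-Softmax estimator, which uses a discrete sample on the forward pass but routes gradients through a soft relaxation; a minimal PyTorch sketch of that standard baseline (not the estimator proposed in the paper):

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_st(logits, tau=1.0):
    """Straight-through Gumbel-Softmax: hard sample forward, soft gradient backward.
    Standard baseline estimator, not the method introduced in the paper."""
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    soft = F.softmax((logits + gumbel) / tau, dim=-1)               # relaxed sample
    hard = F.one_hot(soft.argmax(dim=-1), logits.size(-1)).float()  # discrete sample
    # forward pass yields `hard`; gradients flow only through `soft`
    return hard + (soft - soft.detach())

# toy usage: sample a categorical latent and backpropagate through it
logits = torch.randn(2, 5, requires_grad=True)
sample = gumbel_softmax_st(logits, tau=0.5)
sample.sum().backward()
print(logits.grad.shape)  # torch.Size([2, 5])
```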
2 code implementations • 20 May 2022 • Ta-Chung Chi, Ting-Han Fan, Peter J. Ramadge, Alexander I. Rudnicky
Relative positional embeddings (RPEs) have received considerable attention because they effectively model the relative distance between tokens and enable length extrapolation.
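To illustrate the general idea (a generic distance-dependent bias, not the specific embedding proposed in the paper; the linear-in-distance penalty here is only an ALiBi-style placeholder), each attention logit can be offset by a term that depends only on the query-key distance:

```python
import torch

def relative_bias_attention(q, k, v, slope=0.1):
    """Toy single-head causal attention with an additive relative-position bias.
    Illustration only; the bias form is an assumption, not the paper's embedding."""
    T, d = q.shape
    scores = q @ k.T / d ** 0.5                                 # (T, T) content scores
    dist = torch.arange(T)[:, None] - torch.arange(T)[None, :]  # query-key distance i - j
    bias = -slope * dist.clamp(min=0).float()                   # penalize distant past tokens
    mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    scores = (scores + bias).masked_fill(mask, float("-inf"))   # causal mask
    return torch.softmax(scores, dim=-1) @ v

# toy usage
T, d = 8, 16
q, k, v = (torch.randn(T, d) for _ in range(3))
print(relative_bias_attention(q, k, v).shape)  # torch.Size([8, 16])
```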
1 code implementation • EMNLP 2021 • Ta-Chung Chi, Alexander I. Rudnicky
In this paper, we are the first to propose a zero-shot dialogue disentanglement solution.
no code implementations • 4 Dec 2018 • George Larionov, Zachary Kaden, Hima Varsha Dureddy, Gabriel Bayomi T. Kalejaiye, Mihir Kale, Srividya Pranavi Potharaju, Ankit Parag Shah, Alexander I. Rudnicky
Tartan is a non-goal-oriented socialbot focused on providing users with engaging and fluent casual conversation.
no code implementations • 1 Mar 2017 • Zhou Yu, Alan W. Black, Alexander I. Rudnicky
These systems work well when users have clear and explicit intentions that are well-aligned to the systems' capabilities.