Search Results for author: Alexander I. Rudnicky

Found 11 papers, 5 papers with code

Attention Alignment and Flexible Positional Embeddings Improve Transformer Length Extrapolation

no code implementations • 1 Nov 2023 • Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky

This suggests that a flexible positional embedding design and attention alignment can go a long way toward Transformer length extrapolation.

Tasks: Code Completion, Language Modelling, +2

Advancing Regular Language Reasoning in Linear Recurrent Neural Networks

1 code implementation • 14 Sep 2023 • Ting-Han Fan, Ta-Chung Chi, Alexander I. Rudnicky

In recent studies, linear recurrent neural networks (LRNNs) have achieved Transformer-level performance in natural language and long-range modeling, while offering rapid parallel training and constant inference cost.

Tasks: Long-range modeling
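The excerpt above mentions the rapid parallel training and constant inference cost of linear recurrent layers. Below is a minimal NumPy sketch of the element-wise linear recurrence such layers build on, purely for illustration; the gating, parameterization, and parallel-scan training used by actual LRNNs (and by this paper) are not shown.

import numpy as np

def linear_recurrence(x, a, b):
    # Element-wise linear recurrence h_t = a * h_{t-1} + b * x_t.
    # x: (seq_len, d) inputs; a, b: (d,) coefficients with |a| < 1.
    h = np.zeros(x.shape[1])
    states = []
    for x_t in x:              # inference costs O(d) per token,
        h = a * h + b * x_t    # regardless of sequence length
        states.append(h.copy())
    return np.stack(states)

# toy usage
x = np.random.randn(8, 4)
print(linear_recurrence(x, np.full(4, 0.9), np.ones(4)).shape)  # (8, 4)

Because the recurrence is linear in h, all states can also be computed with a parallel prefix scan at training time, which is what makes such layers fast to train despite being recurrent.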

Structured Dialogue Discourse Parsing

1 code implementation • SIGDIAL (ACL) 2022 • Ta-Chung Chi, Alexander I. Rudnicky

In addition, unlike in previous work, we do not rely on hand-crafted features; this improves the model's robustness.

Tasks: Discourse Parsing, Multiple-choice

Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation

no code implementations • 5 May 2023 • Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky, Peter J. Ramadge

Conventional wisdom holds that, unlike recurrent models, Transformers cannot perfectly model regular languages.

Dissecting Transformer Length Extrapolation via the Lens of Receptive Field Analysis

no code implementations • 20 Dec 2022 • Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky, Peter J. Ramadge

Length extrapolation permits training a Transformer language model on short sequences while preserving perplexity when it is tested on substantially longer sequences.

Tasks: Language Modelling

Training Discrete Deep Generative Models via Gapped Straight-Through Estimator

1 code implementation • 15 Jun 2022 • Ting-Han Fan, Ta-Chung Chi, Alexander I. Rudnicky, Peter J. Ramadge

While deep generative models have succeeded in image processing, natural language processing, and reinforcement learning, training models that involve discrete random variables remains challenging due to the high variance of the gradient estimation process.

Tasks: ListOps, reinforcement-learning, +1
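For background on the excerpt above, a plain straight-through estimator is sketched below: the forward pass uses a hard one-hot sample while the backward pass reuses the gradient of the soft probabilities. This is the generic baseline the Gapped Straight-Through Estimator builds on, not the paper's method itself, and the PyTorch code is only an illustrative assumption.

import torch
import torch.nn.functional as F

def straight_through_sample(logits):
    # Forward: hard one-hot of a sampled category.
    # Backward: gradient of the soft probabilities (the straight-through trick).
    probs = F.softmax(logits, dim=-1)
    index = torch.multinomial(probs, num_samples=1).squeeze(-1)
    hard = F.one_hot(index, num_classes=logits.shape[-1]).float()
    return hard + probs - probs.detach()

# toy usage: the loss depends on which category was selected
logits = torch.randn(3, 5, requires_grad=True)
loss = (straight_through_sample(logits) * torch.arange(5.0)).sum()
loss.backward()
print(logits.grad.shape)  # torch.Size([3, 5])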

KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation

2 code implementations • 20 May 2022 • Ta-Chung Chi, Ting-Han Fan, Peter J. Ramadge, Alexander I. Rudnicky

Relative positional embeddings (RPEs) have received considerable attention since they effectively model the relative distance among tokens and enable length extrapolation.

Tasks: Language Modelling, Position
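To make the excerpt above concrete, below is a minimal sketch of a relative positional bias added to attention logits, assuming a simple logarithmic kernel with fixed scalar parameters; the actual kernel families, per-head learnable parameters, and theory in KERPLE are in the paper, so treat this form as illustrative.

import numpy as np

def relative_log_bias(seq_len, r1=1.0, r2=1.0):
    # Bias b[i, j] = -r1 * log(1 + r2 * |i - j|): it depends only on relative
    # distance, so it is defined for any sequence length at test time.
    pos = np.arange(seq_len)
    dist = np.abs(pos[:, None] - pos[None, :])
    return -r1 * np.log1p(r2 * dist)

def attention_weights(q, k, r1=1.0, r2=1.0):
    # Scaled dot-product logits plus the relative-distance bias, then softmax.
    logits = q @ k.T / np.sqrt(q.shape[-1]) + relative_log_bias(len(q), r1, r2)
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=-1, keepdims=True)

# toy usage: the same r1, r2 apply unchanged at lengths longer than those seen in training
q, k = np.random.randn(16, 8), np.random.randn(16, 8)
print(attention_weights(q, k).shape)  # (16, 16)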

Learning Conversational Systems that Interleave Task and Non-Task Content

no code implementations • 1 Mar 2017 • Zhou Yu, Alan W. Black, Alexander I. Rudnicky

These systems work well when users have clear and explicit intentions that are well-aligned to the systems' capabilities.
