Search Results for author: Albert Gu

Found 34 papers, 29 papers with code

On the Benefits of Memory for Modeling Time-Dependent PDEs

no code implementations 3 Sep 2024 Ricardo Buitrago Ruiz, Tanya Marwah, Albert Gu, Andrej Risteski

Data-driven techniques have emerged as a promising alternative to traditional numerical methods for solving partial differential equations (PDEs).

Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models

1 code implementation 19 Aug 2024 Aviv Bick, Kevin Y. Li, Eric P. Xing, J. Zico Kolter, Albert Gu

In this work, we present a method that is able to distill a pretrained Transformer architecture into alternative architectures such as state space models (SSMs).

Language Modeling +2

Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers

1 code implementation 13 Jul 2024 Sukjun Hwang, Aakash Lahoti, Tri Dao, Albert Gu

We identify a key axis of matrix parameterizations termed sequence alignment, which increases the flexibility and performance of matrix mixers, providing insights into the strong performance of Transformers and recent SSMs such as Mamba.

Mamba State Space Models

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

2 code implementations 31 May 2024 Tri Dao, Albert Gu

While Transformers have been the main architecture behind deep learning's success in language modeling, state-space models (SSMs) such as Mamba have recently been shown to match or outperform Transformers at small to medium scale.

Language Modeling +2

Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling

4 code implementations 5 Mar 2024 Yair Schiff, Chia-Hsiang Kao, Aaron Gokaslan, Tri Dao, Albert Gu, Volodymyr Kuleshov

Large-scale sequence modeling has sparked rapid advances that now extend into biology and genomics.


Mamba: Linear-Time Sequence Modeling with Selective State Spaces

28 code implementations 1 Dec 2023 Albert Gu, Tri Dao

Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module.

2D Pose Estimation Common Sense Reasoning +6

Augmenting conformers with structured state-space sequence models for online speech recognition

no code implementations 15 Sep 2023 Haozhe Shan, Albert Gu, Zhong Meng, Weiran Wang, Krzysztof Choromanski, Tara Sainath

Online speech recognition, where the model only accesses context to the left, is an important and challenging use case for ASR systems.

Speech Recognition

Structured State Space Models for In-Context Reinforcement Learning

2 code implementations NeurIPS 2023 Chris Lu, Yannick Schroecker, Albert Gu, Emilio Parisotto, Jakob Foerster, Satinder Singh, Feryal Behbahani

We propose a modification to a variant of S4 that enables us to initialise and reset the hidden state in parallel, allowing us to tackle reinforcement learning tasks.

Continuous Control +4

Modelling Long Range Dependencies in $N$D: From Task-Specific to a General Purpose CNN

1 code implementation 25 Jan 2023 David M. Knigge, David W. Romero, Albert Gu, Efstratios Gavves, Erik J. Bekkers, Jakub M. Tomczak, Mark Hoogendoorn, Jan-Jakob Sonke

Performant Convolutional Neural Network (CNN) architectures must be tailored to specific tasks in order to consider the length, resolution, and dimensionality of the input data.

Pretraining Without Attention

1 code implementation 20 Dec 2022 Junxiong Wang, Jing Nathan Yan, Albert Gu, Alexander M. Rush

Even so, BiGS is able to match BERT pretraining accuracy on GLUE and can be extended to long-form pretraining of 4096 tokens without approximation.

State Space Models

S4ND: Modeling Images and Videos as Multidimensional Signals Using State Spaces

1 code implementation 12 Oct 2022 Eric Nguyen, Karan Goel, Albert Gu, Gordon W. Downs, Preey Shah, Tri Dao, Stephen A. Baccus, Christopher Ré

On ImageNet-1k, S4ND exceeds the performance of a Vision Transformer baseline by $1.5\%$ when training with a $1$D sequence of patches, and matches ConvNeXt when modeling images in $2$D.

Inductive Bias State Space Models +1

How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections

1 code implementation 24 Jun 2022 Albert Gu, Isys Johnson, Aman Timalsina, Atri Rudra, Christopher Ré

Linear time-invariant state space models (SSMs) are classical models from engineering and statistics that have recently been shown to be very promising in machine learning through the Structured State Space sequence model (S4).

Long-range modeling State Space Models

On the Parameterization and Initialization of Diagonal State Space Models

2 code implementations 23 Jun 2022 Albert Gu, Ankit Gupta, Karan Goel, Christopher Ré

On the other hand, a recent variant of S4 called DSS showed that restricting the state matrix to be fully diagonal can still preserve the performance of the original model when using a specific initialization based on approximating S4's matrix.

Long-range modeling State Space Models +1

Towards a General Purpose CNN for Long Range Dependencies in $N$D

1 code implementation 7 Jun 2022 David W. Romero, David M. Knigge, Albert Gu, Erik J. Bekkers, Efstratios Gavves, Jakub M. Tomczak, Mark Hoogendoorn

The use of Convolutional Neural Networks (CNNs) is widespread in Deep Learning due to a range of desirable model properties which result in an efficient and effective machine learning framework.

Diagonal State Spaces are as Effective as Structured State Spaces

2 code implementations 27 Mar 2022 Ankit Gupta, Albert Gu, Jonathan Berant

Modeling long range dependencies in sequential data is a fundamental step towards attaining human-level performance in many modalities such as text, vision, audio and video.

Long-range modeling

It's Raw! Audio Generation with State-Space Models

6 code implementations 20 Feb 2022 Karan Goel, Albert Gu, Chris Donahue, Christopher Ré

SaShiMi yields state-of-the-art performance for unconditional waveform generation in the autoregressive setting.

Audio Generation Density Estimation +2

Efficiently Modeling Long Sequences with Structured State Spaces

7 code implementations ICLR 2022 Albert Gu, Karan Goel, Christopher Ré

A central goal of sequence modeling is designing a single principled model that can address sequence data across a range of modalities and tasks, particularly on long-range dependencies.

Data Augmentation Language Modeling +4

Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers

2 code implementations NeurIPS 2021 Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, Christopher Ré

Recurrent neural networks (RNNs), temporal convolutions, and neural differential equations (NDEs) are popular families of deep learning models for time-series data, each with unique strengths and tradeoffs in modeling power and computational efficiency.

Computational Efficiency Memorization +3

HoroPCA: Hyperbolic Dimensionality Reduction via Horospherical Projections

1 code implementation 7 Jun 2021 Ines Chami, Albert Gu, Dat Nguyen, Christopher Ré

Given directions, PCA relies on: (1) a parameterization of subspaces spanned by these directions, (2) a method of projection onto subspaces that preserves information in these directions, and (3) an objective to optimize, namely the variance explained by projections.

Dimensionality Reduction

Combining Recurrent, Convolutional, and Continuous-time Models with Linear State Space Layers

no code implementations NeurIPS 2021 Albert Gu, Isys Johnson, Karan Goel, Khaled Kamal Saab, Tri Dao, Atri Rudra, Christopher Ré

Recurrent neural networks (RNNs), temporal convolutions, and neural differential equations (NDEs) are popular families of deep learning models for time-series data, each with unique strengths and tradeoffs in modeling power and computational efficiency.

Computational Efficiency Memorization +3

Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps

2 code implementations ICLR 2020 Tri Dao, Nimit S. Sohoni, Albert Gu, Matthew Eichhorn, Amit Blonder, Megan Leszczynski, Atri Rudra, Christopher Ré

Modern neural network architectures use structured linear transformations, such as low-rank matrices, sparse matrices, permutations, and the Fourier transform, to improve inference speed and reduce memory usage compared to general linear maps.

Image Classification Speech Recognition +1

No Subclass Left Behind: Fine-Grained Robustness in Coarse-Grained Classification Problems

1 code implementation NeurIPS 2020 Nimit S. Sohoni, Jared A. Dunnmon, Geoffrey Angus, Albert Gu, Christopher Ré

As the subclass labels are frequently unavailable, models trained using only the coarser-grained class labels often exhibit highly variable performance across different subclasses.

Clustering General Classification +1

From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering

2 code implementations NeurIPS 2020 Ines Chami, Albert Gu, Vaggos Chatziafratis, Christopher Ré

Recently, Dasgupta reframed HC as a discrete optimization problem by introducing a global cost function measuring the quality of a given tree.


Model Patching: Closing the Subgroup Performance Gap with Data Augmentation

1 code implementation ICLR 2021 Karan Goel, Albert Gu, Yixuan Li, Christopher Ré

Particularly concerning are models with inconsistent performance on specific subgroups of a class, e.g., exhibiting disparities in skin cancer classification in the presence or absence of a spurious bandage.

Cancer Classification Data Augmentation +1

Learning Mixed-Curvature Representations in Product Spaces

no code implementations ICLR 2019 Albert Gu, Frederic Sala, Beliz Gunel, Christopher Ré

The quality of the representations achieved by embeddings is determined by how well the geometry of the embedding space matches the structure of the data.

Riemannian optimization Word Embeddings

Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations

1 code implementation 14 Mar 2019 Tri Dao, Albert Gu, Matthew Eichhorn, Atri Rudra, Christopher Ré

Fast linear transforms are ubiquitous in machine learning, including the discrete Fourier transform, discrete cosine transform, and other structured transformations such as convolutions.

BIG-bench Machine Learning

Learning Compressed Transforms with Low Displacement Rank

1 code implementation NeurIPS 2018 Anna T. Thomas, Albert Gu, Tri Dao, Atri Rudra, Christopher Ré

The low displacement rank (LDR) framework for structured matrices represents a matrix through two displacement operators and a low-rank residual.

Image Classification Language Modeling +1

Representation Tradeoffs for Hyperbolic Embeddings

3 code implementations ICML 2018 Christopher De Sa, Albert Gu, Christopher Ré, Frederic Sala

Given a tree, we give a combinatorial construction that embeds the tree in hyperbolic space with arbitrarily low distortion without using optimization.

A Kernel Theory of Modern Data Augmentation

no code implementations 16 Mar 2018 Tri Dao, Albert Gu, Alexander J. Ratner, Virginia Smith, Christopher De Sa, Christopher Ré

Data augmentation, a technique in which a training set is expanded with class-preserving transformations, is ubiquitous in modern machine learning pipelines.

BIG-bench Machine Learning Data Augmentation
