no code implementations • 26 Nov 2024 • Armin W. Thomas, Rom Parnichkun, Alexander Amini, Stefano Massaroli, Michael Poli
In this work, we propose a new approach for the synthesis of tailored architectures (STAR).
1 code implementation • 10 May 2024 • Rom N. Parnichkun, Stefano Massaroli, Alessandro Moro, Jimmy T. H. Smith, Ramin Hasani, Mathias Lechner, Qi An, Christopher Ré, Hajime Asama, Stefano Ermon, Taiji Suzuki, Atsushi Yamashita, Michael Poli
We approach designing a state-space model for deep learning applications through its dual representation, the transfer function, and uncover a highly efficient sequence parallel inference algorithm that is state-free: unlike other proposed algorithms, state-free inference does not incur any significant memory or computational cost with an increase in state size.
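As a rough illustration of the state-free idea, here is a minimal NumPy sketch that applies a single-channel filter through its rational transfer function, evaluated on the unit circle with FFTs. The function name and coefficient layout are hypothetical, and it assumes a stable filter so that FFT time-aliasing is negligible; it is not the paper's code.

```python
import numpy as np

def state_free_filter(u, b, a):
    """Apply a filter through its rational transfer function
    H(z) = b(z) / a(z) with a monic denominator, sampled on the unit
    circle via FFTs; no recurrent state is ever materialized.

    u : (L,) input sequence
    b : numerator coefficients (b[0] + b[1] z^-1 + ...)
    a : denominator coefficients after the implicit leading 1
    """
    L = len(u)
    n = 2 * L                                        # zero-padding against wraparound
    den = np.fft.rfft(np.concatenate(([1.0], a)), n)
    H = np.fft.rfft(b, n) / den                      # transfer function on the FFT grid
    y = np.fft.irfft(np.fft.rfft(u, n) * H, n)[:L]
    return y
```

Memory and compute here depend on the FFT length, not on the state dimension, which is the sense in which inference is state-free.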
1 code implementation • 26 Mar 2024 • Michael Poli, Armin W Thomas, Eric Nguyen, Pragaash Ponnusamy, Björn Deiseroth, Kristian Kersting, Taiji Suzuki, Brian Hie, Stefano Ermon, Christopher Ré, Ce Zhang, Stefano Massaroli
The development of deep learning architectures is a resource-demanding process, due to a vast design space, long prototyping times, and high compute costs associated with at-scale model training and evaluation.
3 code implementations • 8 Dec 2023 • Simran Arora, Sabri Eyuboglu, Aman Timalsina, Isys Johnson, Michael Poli, James Zou, Atri Rudra, Christopher Ré
To close the gap between synthetics and real language, we develop a new formalization of the task called multi-query associative recall (MQAR) that better reflects actual language.
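To make the task concrete, here is a toy generator in the spirit of MQAR; the token layout and value ranges are illustrative assumptions, not the paper's exact benchmark format.

```python
import random

def make_mqar_example(num_pairs=8, num_queries=4, vocab=256, seed=0):
    """Toy multi-query associative recall example: key-value pairs appear
    first, then several queried keys; the target is each queried key's
    value. Hypothetical format, for illustration only.
    """
    rng = random.Random(seed)
    keys = rng.sample(range(vocab // 2), num_pairs)           # distinct keys
    vals = [rng.randrange(vocab // 2, vocab) for _ in keys]   # values from a disjoint range
    context = [tok for kv in zip(keys, vals) for tok in kv]   # k1 v1 k2 v2 ...
    queried = rng.sample(keys, num_queries)                   # multiple queries per sequence
    answers = [vals[keys.index(k)] for k in queried]
    return context + queried, answers                         # (input tokens, targets)
```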
1 code implementation • NeurIPS 2023 • Chuanbo Hua, Federico Berto, Michael Poli, Stefano Massaroli, Jinkyoo Park
While complex simulations of physical systems have been widely used in engineering and scientific computing, lowering their often prohibitive computational requirements has only recently been tackled by deep learning approaches.
1 code implementation • NeurIPS 2023 • Daniel Y. Fu, Simran Arora, Jessica Grogan, Isys Johnson, Sabri Eyuboglu, Armin W. Thomas, Benjamin Spector, Michael Poli, Atri Rudra, Christopher Ré
We ask: are there performant architectures that can scale sub-quadratically along sequence length and model dimension?
4 code implementations • NeurIPS 2023 • Eric Nguyen, Michael Poli, Marjan Faizi, Armin Thomas, Callum Birch-Sykes, Michael Wornow, Aman Patel, Clayton Rabideau, Stefano Massaroli, Yoshua Bengio, Stefano Ermon, Stephen A. Baccus, Chris Ré
Leveraging Hyena's new long-range capabilities, we present HyenaDNA, a genomic foundation model pretrained on the human reference genome with context lengths of up to 1 million tokens at the single-nucleotide level, an up to 500x increase over previous dense attention-based models.
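Single-nucleotide modeling amounts to character-level tokenization over the DNA alphabet. A minimal sketch, where the exact vocabulary and the handling of ambiguous bases are assumptions:

```python
# Character-level tokenization over the DNA alphabet; 'N' marks
# ambiguous bases. The vocabulary choice here is an assumption.
VOCAB = {c: i for i, c in enumerate("ACGTN")}

def tokenize(seq: str) -> list[int]:
    return [VOCAB.get(base, VOCAB["N"]) for base in seq.upper()]

# tokenize("acgtn")  ->  [0, 1, 2, 3, 4]
```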
no code implementations • 29 Mar 2023 • Michael Poli, Stefano Massaroli, Stefano Ermon, Bryan Wilder, Eric Horvitz
We present a methodology for formulating simplifying abstractions in machine learning systems by identifying and harnessing the utility structure of decisions.
1 code implementation • 16 Mar 2023 • Michael Zhang, Khaled K. Saab, Michael Poli, Tri Dao, Karan Goel, Christopher Ré
For expressivity, we propose a new SSM parameterization based on the companion matrix -- a canonical representation for discrete-time processes -- which enables SpaceTime's SSM layers to learn desirable autoregressive processes.
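For reference, a companion matrix in one standard convention can be built in a few lines; this is a sketch of the canonical form, not SpaceTime's implementation.

```python
import torch

def companion(a: torch.Tensor) -> torch.Tensor:
    """Companion matrix (one standard convention) of the monic polynomial
    p(z) = z^d + a[d-1] z^{d-1} + ... + a[1] z + a[0]: ones on the
    subdiagonal, negated coefficients in the last column.
    """
    d = a.shape[0]
    A = torch.zeros(d, d, dtype=a.dtype)
    A[1:, :-1] = torch.eye(d - 1, dtype=a.dtype)  # shift: ones on the subdiagonal
    A[:, -1] = -a                                  # last column carries the coefficients
    return A
```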
6 code implementations • 21 Feb 2023 • Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré
Recent advances in deep learning have relied heavily on the use of large Transformers due to their ability to learn at scale.
1 code implementation • 24 Dec 2022 • Linqi Zhou, Michael Poli, Winnie Xu, Stefano Massaroli, Stefano Ermon
Methods based on ordinary differential equations (ODEs) are widely used to build generative models of time-series.
1 code implementation • 26 Nov 2022 • Michael Poli, Stefano Massaroli, Federico Berto, Jinkyoo Park, Tri Dao, Christopher Ré, Stefano Ermon
Instead, this work introduces a blueprint for frequency domain learning through a single transform: transform once (T1).
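The pattern, in caricature: one forward transform at the input, all learned layers operating directly on frequency coefficients, and one inverse transform at the output, instead of a forward/inverse pair per layer. In the heavily simplified PyTorch sketch below, the diagonal spectral mixing and the split real/imaginary nonlinearity are illustrative stand-ins, not T1's actual layers.

```python
import torch

class TransformOnceStyle(torch.nn.Module):
    """Caricature of the 'transform once' pattern: a single forward FFT,
    learned layers acting on frequency coefficients, a single inverse FFT.
    """
    def __init__(self, length: int, depth: int = 3):
        super().__init__()
        n = length // 2 + 1  # size of the real FFT's output
        self.mix = torch.nn.ParameterList(
            torch.nn.Parameter(0.02 * torch.randn(n, dtype=torch.cfloat))
            for _ in range(depth)
        )

    def forward(self, x):                       # x: (batch, length), real-valued
        z = torch.fft.rfft(x, dim=-1)           # one forward transform
        for w in self.mix:                      # all learning in the frequency domain
            z = w * z                           # per-frequency (diagonal) mixing
            z = torch.complex(torch.relu(z.real), torch.relu(z.imag))
        return torch.fft.irfft(z, n=x.shape[-1], dim=-1)  # one inverse transform
```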
no code implementations • 15 Apr 2022 • Michael Poli, Winnie Xu, Stefano Massaroli, Chenlin Meng, Kuno Kim, Stefano Ermon
We investigate how to leverage the representations produced by Neural Collages in various tasks, including data compression and generation.
2 code implementations • 1 Apr 2022 • Tri Dao, Beidi Chen, Nimit Sohoni, Arjun Desai, Michael Poli, Jessica Grogan, Alexander Liu, Aniruddh Rao, Atri Rudra, Christopher Ré
To address these issues, we propose a class of matrices (Monarch) that is hardware-efficient (they are parameterized as products of two block-diagonal matrices for better hardware utilization) and expressive (they can represent many commonly used transforms).
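The structure is easy to state in code: reshape the n = m² coordinates into an m × m grid, apply one batch of m × m blocks, transpose (the fixed permutation), apply the second batch, and permute back. This is a sketch of that structure only; permutation conventions vary, and it is not the paper's optimized kernel.

```python
import torch

def monarch_matmul(x, B1, B2):
    """Multiply x by a Monarch-style matrix: block-diagonal, permute,
    block-diagonal, permute back.

    x  : (..., m * m) input, for n = m^2
    B1 : (m, m, m) -- m blocks of size m x m
    B2 : (m, m, m)
    """
    m = B1.shape[0]
    z = x.reshape(*x.shape[:-1], m, m)            # view n = m^2 as an m x m grid
    z = torch.einsum("kij,...kj->...ki", B1, z)   # first block-diagonal factor
    z = z.transpose(-1, -2)                       # the fixed permutation
    z = torch.einsum("kij,...kj->...ki", B2, z)   # second block-diagonal factor
    z = z.transpose(-1, -2)                       # permute back
    return z.reshape(*x.shape[:-1], m * m)

# For n = 16 (m = 4): x = torch.randn(2, 16); B = torch.randn(4, 4, 4)
# y = monarch_matmul(x, B, B)
```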
1 code implementation • NeurIPS Workshop DLDE 2021 • Federico Berto, Stefano Massaroli, Michael Poli, Jinkyoo Park
Synthesizing optimal controllers for dynamical systems often involves solving optimization problems with hard real-time constraints.
no code implementations • ICLR 2022 • Luca Scimeca, Seong Joon Oh, Sanghyuk Chun, Michael Poli, Sangdoo Yun
This phenomenon, also known as shortcut learning, is emerging as a key limitation of the current generation of machine learning models.
no code implementations • 22 Jun 2021 • Michael Poli, Stefano Massaroli, Clayton M. Rabideau, Junyoung Park, Atsushi Yamashita, Hajime Asama, Jinkyoo Park
We introduce the framework of continuous-depth graph neural networks (GNNs).
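In its simplest form, the node features evolve under a GNN-parametrized vector field, dX/dt = f(X, A), which any black-box ODE solver can integrate. A minimal sketch, assuming a dense normalized adjacency matrix:

```python
import torch

class GDEFunc(torch.nn.Module):
    """Vector field of a continuous-depth GNN: node features evolve as
    dX/dt = f(X, A), with f here a single graph-convolution-style layer.
    """
    def __init__(self, A_hat: torch.Tensor, dim: int):
        super().__init__()
        self.A_hat = A_hat                  # (N, N) normalized adjacency
        self.lin = torch.nn.Linear(dim, dim)

    def forward(self, t, X):                # (t, state) signature used by ODE solvers
        return torch.tanh(self.A_hat @ self.lin(X))

# With torchdiffeq installed, integration is one call:
#   from torchdiffeq import odeint
#   X_T = odeint(GDEFunc(A_hat, dim=16), X0, torch.tensor([0.0, 1.0]))[-1]
```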
no code implementations • NeurIPS 2021 • Michael Poli, Stefano Massaroli, Luca Scimeca, Seong Joon Oh, Sanghyuk Chun, Atsushi Yamashita, Hajime Asama, Jinkyoo Park, Animesh Garg
Effective control and prediction of dynamical systems often require appropriate handling of continuous-time and discrete, event-triggered processes.
no code implementations • NeurIPS 2021 • Stefano Massaroli, Michael Poli, Sho Sonoda, Taiji Suzuki, Jinkyoo Park, Atsushi Yamashita, Hajime Asama
We detail a novel class of implicit neural models.
no code implementations • 7 Jun 2021 • Stefano Massaroli, Michael Poli, Stefano Peluchetti, Jinkyoo Park, Atsushi Yamashita, Hajime Asama
We systematically develop a learning-based treatment of stochastic optimal control (SOC), relying on direct optimization of parametric control policies.
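The recipe reduces to rolling out a differentiable simulator under the policy and backpropagating the pathwise cost. A toy sketch, where the dynamics and the quadratic cost are illustrative placeholders rather than the paper's setup:

```python
import torch

def soc_loss(policy, x0, steps=100, dt=0.01, sigma=0.1):
    """Direct policy optimization for stochastic optimal control: simulate
    controlled Euler-Maruyama dynamics dx = (f(x) + u) dt + sigma dW and
    backpropagate the accumulated cost through the rollout.
    """
    x, cost = x0, 0.0
    for _ in range(steps):
        u = policy(x)
        drift = -x + u                                  # toy dynamics f(x) = -x, plus control
        noise = sigma * torch.randn_like(x) * dt ** 0.5
        x = x + drift * dt + noise
        cost = cost + (x.pow(2).sum() + 0.1 * u.pow(2).sum()) * dt  # state + control cost
    return cost

# policy = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 2))
# loss = soc_loss(policy, torch.randn(2)); loss.backward()
```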
no code implementations • 14 Jan 2021 • Stefano Massaroli, Michael Poli, Federico Califano, Jinkyoo Park, Atsushi Yamashita, Hajime Asama
We introduce optimal energy shaping as an enhancement of classical passivity-based control methods.
1 code implementation • 16 Oct 2020 • Daehoon Gwak, Gyuhyeon Sim, Michael Poli, Stefano Massaroli, Jaegul Choo, Edward Choi
By interpreting the forward dynamics of the latent representation of neural networks as an ordinary differential equation, the Neural Ordinary Differential Equation (Neural ODE) emerged as an effective framework for modeling system dynamics in the continuous time domain.
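A minimal version of this construction, with fixed-step Euler integration standing in for a proper adaptive solver:

```python
import torch

class NeuralODE(torch.nn.Module):
    """Minimal Neural ODE: the latent state follows dz/dt = f_theta(z),
    integrated here with a fixed-step Euler scheme for simplicity.
    """
    def __init__(self, dim: int, steps: int = 20):
        super().__init__()
        self.f = torch.nn.Sequential(
            torch.nn.Linear(dim, 64), torch.nn.Tanh(), torch.nn.Linear(64, dim)
        )
        self.steps = steps

    def forward(self, z, t_span=(0.0, 1.0)):
        dt = (t_span[1] - t_span[0]) / self.steps
        for _ in range(self.steps):   # z_{k+1} = z_k + dt * f(z_k)
            z = z + dt * self.f(z)
        return z
```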
no code implementations • 20 Sep 2020 • Michael Poli, Stefano Massaroli, Atsushi Yamashita, Hajime Asama, Jinkyoo Park
Continuous-depth learning has recently emerged as a novel perspective on deep learning, improving performance in tasks related to dynamical systems and density estimation.
1 code implementation • NeurIPS 2020 • Michael Poli, Stefano Massaroli, Atsushi Yamashita, Hajime Asama, Jinkyoo Park
The infinite-depth paradigm pioneered by Neural ODEs has launched a renaissance in the search for novel dynamical system-inspired deep learning primitives; however, their utilization in problems of non-trivial size has often proved impossible due to poor computational scalability.
no code implementations • 18 Mar 2020 • Stefano Massaroli, Michael Poli, Michelangelo Bin, Jinkyoo Park, Atsushi Yamashita, Hajime Asama
We introduce a provably stable variant of neural ordinary differential equations (neural ODEs) whose trajectories evolve on an energy functional parametrised by a neural network.
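One way to obtain such a guarantee is to define the vector field as the negative gradient of a neural energy, since then dE/dt = -|∇E|² ≤ 0 along trajectories, so the energy cannot increase. A sketch of that mechanism, not the paper's exact parametrization:

```python
import torch

class GradientFlowField(torch.nn.Module):
    """Vector field dx/dt = -grad E_theta(x) for a neural energy E_theta:
    the energy is non-increasing along every trajectory by construction.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.E = torch.nn.Sequential(
            torch.nn.Linear(dim, 64), torch.nn.Softplus(), torch.nn.Linear(64, 1)
        )

    def forward(self, t, x):
        with torch.enable_grad():
            x = x if x.requires_grad else x.detach().requires_grad_(True)
            energy = self.E(x).sum()
            (grad,) = torch.autograd.grad(energy, x, create_graph=True)
        return -grad
```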
no code implementations • ICLR Workshop DeepDiffEq 2019 • Michael Poli, Stefano Massaroli, Atsushi Yamashita, Hajime Asama, Jinkyoo Park
In this paper, we present a general framework for continuous-time gradient descent, often referred to as gradient flow.
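Concretely, gradient flow is the ODE dθ/dt = -∇L(θ), and its explicit Euler discretization with step size η recovers vanilla gradient descent. A small sketch:

```python
import torch

def gradient_flow_euler(loss_fn, theta, eta=0.1, steps=100):
    """Explicit Euler discretization of the gradient flow
    d(theta)/dt = -grad L(theta); with step size eta this is exactly
    gradient descent: theta_{k+1} = theta_k - eta * grad L(theta_k).
    """
    for _ in range(steps):
        (g,) = torch.autograd.grad(loss_fn(theta), theta)
        theta = (theta - eta * g).detach().requires_grad_(True)
    return theta

# theta0 = torch.randn(3, requires_grad=True)
# theta_star = gradient_flow_euler(lambda p: (p ** 2).sum(), theta0)
```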
no code implementations • ICLR Workshop DeepDiffEq 2019 • Stefano Massaroli, Michael Poli, Sanzhar Bakhtiyarov, Atsushi Yamashita, Hajime Asama, Jinkyoo Park
Action spaces equipped with parameter sets are a common occurrence in reinforcement learning applications.
1 code implementation • NeurIPS 2020 • Stefano Massaroli, Michael Poli, Jinkyoo Park, Atsushi Yamashita, Hajime Asama
Continuous deep learning architectures have recently re-emerged as Neural Ordinary Differential Equations (Neural ODEs).
1 code implementation • 18 Nov 2019 • Michael Poli, Stefano Massaroli, Junyoung Park, Atsushi Yamashita, Hajime Asama, Jinkyoo Park
We introduce the framework of continuous-depth graph neural networks (GNNs).
2 code implementations • 24 Sep 2019 • Michael Poli, Jinkyoo Park, Ilija Ilievski
Finance is a particularly challenging application area for deep learning models due to low signal-to-noise ratio, non-stationarity, and partial observability.
2 code implementations • 6 Sep 2019 • Stefano Massaroli, Michael Poli, Federico Califano, Angela Faragasso, Jinkyoo Park, Atsushi Yamashita, Hajime Asama
Neural networks are discrete entities: subdivided into discrete layers and parametrized by weights which are iteratively optimized via difference equations.