1 code implementation • 7 Feb 2024 • Jordan Juravsky, Bradley Brown, Ryan Ehrlich, Daniel Y. Fu, Christopher Ré, Azalia Mirhoseini
Decoding in this large-batch setting can be bottlenecked by the attention operation, which reads large key-value (KV) caches from memory and computes inefficient matrix-vector products for every sequence in the batch.
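To illustrate why this step tends to be memory-bound, here is a minimal sketch of single-query attention for one sequence during decoding. It is not the paper's implementation. The function names `decode_attention` and `arithmetic_intensity`, the plain-list data layout, and the 2-byte (fp16) element size are assumptions made for illustration only.

```python
import math


def decode_attention(q, K, V):
    """Single-query attention for one sequence.

    q is a length-d query vector; K and V are the cached keys and values,
    each a list of n rows of length d. (Hypothetical helper for illustration.)
    """
    d = len(q)
    # Scores: a matrix-vector product K @ q, scaled by 1/sqrt(d).
    scores = [sum(k_i * q_i for k_i, q_i in zip(k, q)) / math.sqrt(d) for k in K]
    # Numerically stable softmax over the n cached positions.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Output: a vector-matrix product weights @ V.
    return [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]


def arithmetic_intensity(n, d, bytes_per_elem=2):
    """Rough FLOPs per byte read from the KV cache (assumes fp16 by default)."""
    flops = 4 * n * d                        # 2*n*d for K @ q, 2*n*d for weights @ V
    bytes_read = 2 * n * d * bytes_per_elem  # K and V are each read once
    return flops / bytes_read
```

Under these assumptions, each cached element is read once and used in only about two floating-point operations, so the ratio stays near 1 FLOP per byte no matter how long the cache is. That is the sense in which per-sequence matrix-vector products make poor use of hardware built for matrix-matrix multiplication.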
1 code implementation • 31 Jan 2023 • Jordan Juravsky, Yunrong Guo, Sanja Fidler, Xue Bin Peng
In this work, we present PADL, which leverages recent innovations in NLP to take steps toward language-directed controllers for physics-based character animation.
no code implementations • 23 Nov 2022 • Bradley C. A. Brown, Jordan Juravsky, Anthony L. Caterini, Gabriel Loaiza-Ganem
Given a pair of models with similar training set performance, it is natural to assume that the model that possesses simpler internal representations would exhibit better generalization.