1 code implementation • 22 May 2024 • Sang Keun Choe, Hwijeen Ahn, Juhan Bae, Kewen Zhao, Minsoo Kang, Youngseog Chung, Adithya Pratapa, Willie Neiswanger, Emma Strubell, Teruko Mitamura, Jeff Schneider, Eduard Hovy, Roger Grosse, Eric Xing
Large language models (LLMs) are trained on a vast amount of human-written data, but data providers often remain uncredited.
1 code implementation • 20 May 2024 • Juhan Bae, Wu Lin, Jonathan Lorraine, Roger Grosse
While computationally efficient compared to unrolling-based approaches, Source remains applicable in cases where implicit-differentiation-based approaches struggle, such as non-converged models and multi-stage training pipelines.
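As a rough illustration of the unrolling-based attribution this entry refers to, one can differentiate through a short training run to estimate how down-weighting a single example's loss would change the validation loss. A minimal sketch, assuming a toy 1-D regression setup rather than anything from the paper:

```python
import torch

# Toy setup (assumed, not the paper's): 1-D linear regression with 3 training points.
X = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[1.0], [2.0], [3.5]])
x_val = torch.tensor([[1.5]])
y_val = torch.tensor([[1.6]])

eps = torch.zeros(3, requires_grad=True)   # per-example loss weights (1 + eps_i)
w = torch.zeros(1, 1, requires_grad=True)  # model parameter
lr = 0.1

# Unroll SGD, keeping the graph so we can differentiate through the trajectory.
for _ in range(20):
    per_example = ((X @ w - y) ** 2).squeeze(1)
    loss = ((1.0 + eps) * per_example).mean()
    (g,) = torch.autograd.grad(loss, w, create_graph=True)
    w = w - lr * g

val_loss = ((x_val @ w - y_val) ** 2).mean()
# d(val_loss)/d(eps_i): how reweighting example i would have changed validation loss.
influence = torch.autograd.grad(val_loss, eps)[0]
print(influence)
```

Implicit-differentiation-based approaches instead approximate this quantity at a converged optimum, which is why they struggle when training has not converged.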
2 code implementations • 5 Feb 2024 • Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E. Turner, Alireza Makhzani
Adaptive gradient optimizers like Adam(W) are the default training algorithms for many deep learning architectures, such as transformers.
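For reference, the decoupled-weight-decay variant (AdamW) applies the standard update, in conventional notation with learning rate $\eta$ and weight-decay coefficient $\lambda$:

$$
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2,
$$
$$
\theta_t = (1 - \eta\lambda)\,\theta_{t-1} - \eta\,\frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon},
\qquad \hat m_t = \frac{m_t}{1-\beta_1^t},\;\; \hat v_t = \frac{v_t}{1-\beta_2^t}.
$$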
no code implementations • 7 Dec 2023 • Michael R. Zhang, Nishkrit Desai, Juhan Bae, Jonathan Lorraine, Jimmy Ba
This paper studies the use of foundational large language models (LLMs) to make decisions during hyperparameter optimization (HPO).
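A minimal sketch of the idea, with a hypothetical `ask_llm` helper standing in for whichever chat API is used; the prompt format, JSON protocol, and single tuned hyperparameter are illustrative assumptions, not the paper's protocol:

```python
import json

def ask_llm(prompt: str) -> str:
    """Hypothetical helper: send `prompt` to a chat LLM and return its reply."""
    raise NotImplementedError  # wire up an LLM client of your choice here

def train_and_eval(lr: float) -> float:
    """Placeholder objective: train with learning rate `lr`, return validation loss."""
    raise NotImplementedError

history = []  # (lr, val_loss) pairs from completed trials
for trial in range(10):
    prompt = (
        "You are tuning the learning rate of a neural network.\n"
        f"Past trials as (lr, val_loss): {json.dumps(history)}\n"
        'Reply with JSON like {"lr": 0.001} for the next trial.'
    )
    lr = json.loads(ask_llm(prompt))["lr"]
    history.append((lr, train_and_eval(lr)))

print(min(history, key=lambda t: t[1]))
```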
2 code implementations • 7 Aug 2023 • Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman
When trying to gain visibility into a machine learning model in order to understand and mitigate the associated risks, one potentially valuable source of evidence is knowing which training examples most contribute to a given behavior.
3 code implementations • 12 Jun 2023 • George E. Dahl, Frank Schneider, Zachary Nado, Naman Agarwal, Chandramouli Shama Sastry, Philipp Hennig, Sourabh Medapati, Runa Eschenhagen, Priya Kasimbeg, Daniel Suo, Juhan Bae, Justin Gilmer, Abel L. Peirson, Bilal Khan, Rohan Anil, Mike Rabbat, Shankar Krishnan, Daniel Snider, Ehsan Amid, Kongtao Chen, Chris J. Maddison, Rakshith Vasudev, Michal Badura, Ankush Garg, Peter Mattson
To address these challenges, we introduce AlgoPerf: Training Algorithms, a new competitive, time-to-result benchmark that runs multiple workloads on fixed hardware.
no code implementations • 7 Feb 2023 • Nikita Dhawan, Sicong Huang, Juhan Bae, Roger Grosse
It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing or iterating over the entire dataset.
no code implementations • 7 Dec 2022 • Juhan Bae, Michael R. Zhang, Michael Ruan, Eric Wang, So Hasegawa, Jimmy Ba, Roger Grosse
Variational autoencoders (VAEs) are powerful tools for learning latent representations of data and are used in a wide range of applications.
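Training maximizes the evidence lower bound (ELBO) on the data likelihood, in standard notation:

$$
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\!\big[\log p_\theta(x \mid z)\big] \;-\; \mathrm{KL}\!\big(q_\phi(z \mid x)\,\|\,p(z)\big),
$$

where the first term rewards faithful reconstruction and the KL term regularizes the learned latent code toward the prior.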
2 code implementations • 12 Sep 2022 • Juhan Bae, Nathan Ng, Alston Lo, Marzyeh Ghassemi, Roger Grosse
Influence functions efficiently estimate the effect of removing a single training data point on a model's learned parameters.
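Concretely, for a model trained by empirical risk minimization over $n$ points, the classical influence-function estimate of the effect of removing a training point $z$ is

$$
\hat\theta_{-z} - \hat\theta \;\approx\; \frac{1}{n}\, H_{\hat\theta}^{-1}\, \nabla_\theta L(z, \hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^2 L(z_i, \hat\theta),
$$

so a single Hessian-inverse-vector product stands in for full retraining.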
no code implementations • 28 Feb 2022 • Juhan Bae, Paul Vicol, Jeff Z. HaoChen, Roger Grosse
Using amortized proximal optimization (APO) to adapt a structured preconditioning matrix generally results in optimization performance competitive with second-order methods.
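The object being adapted is the matrix $P$ in a preconditioned gradient update,

$$
\theta_{t+1} = \theta_t - P\,\nabla_\theta \mathcal{L}(\theta_t),
$$

where $P = \eta I$ recovers plain gradient descent and $P \approx H^{-1}$ recovers a Newton-style second-order step; APO adapts a structured $P$ on the fly during training.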
1 code implementation • 22 Apr 2021 • James Lucas, Juhan Bae, Michael R. Zhang, Stanislav Fort, Richard Zemel, Roger Grosse
Linear interpolation between initial neural network parameters and converged parameters after training with stochastic gradient descent (SGD) typically leads to a monotonic decrease in the training objective.
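This property is straightforward to probe: save the initial and final parameters and evaluate the loss along the line segment between them. A minimal sketch, in which the model, data, and grid size are placeholders and all state-dict entries are assumed to be float tensors:

```python
import copy
import torch

def loss_along_interpolation(model, init_state, final_state, loss_fn, data, steps=25):
    """Evaluate loss at theta(alpha) = (1 - alpha) * theta_0 + alpha * theta_T."""
    probe = copy.deepcopy(model)
    x, y = data
    losses = []
    for alpha in torch.linspace(0.0, 1.0, steps):
        # Blend the two checkpoints entry-by-entry.
        state = {
            k: (1 - alpha) * init_state[k] + alpha * final_state[k]
            for k in init_state
        }
        probe.load_state_dict(state)
        with torch.no_grad():
            losses.append(loss_fn(probe(x), y).item())
    return losses  # a monotonic decrease indicates the interpolation property holds
```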
1 code implementation • NeurIPS 2020 • Juhan Bae, Roger Grosse
Hyperparameter optimization of neural networks can be elegantly formulated as a bilevel optimization problem.
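In that formulation, the hyperparameters $\lambda$ are the outer variables and the network weights $\theta$ the inner ones:

$$
\min_{\lambda}\; \mathcal{L}_{\text{val}}\big(\theta^{*}(\lambda)\big)
\quad \text{s.t.} \quad
\theta^{*}(\lambda) = \arg\min_{\theta}\; \mathcal{L}_{\text{train}}(\theta, \lambda).
$$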
3 code implementations • 30 Nov 2018 • Juhan Bae, Guodong Zhang, Roger Grosse
Noisy natural gradient, a recently proposed method, fits expressive posteriors in a surprisingly simple way: it adds weight noise to regular natural gradient updates.
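Schematically, with a Gaussian posterior $\mathcal{N}(\mu, \Sigma)$ over the weights, each step samples noisy weights and applies natural-gradient-style updates to the posterior parameters; a simplified sketch that omits the prior, damping, and scaling constants:

$$
w \sim \mathcal{N}(\mu, \Sigma), \qquad
\mu \leftarrow \mu + \alpha\,\Sigma\,\widehat{\nabla_w \log p(\mathcal{D} \mid w)}, \qquad
\Sigma^{-1} \leftarrow (1-\beta)\,\Sigma^{-1} + \beta\,\widehat{F}(w),
$$

where $\widehat{F}$ is a stochastic estimate of the Fisher information.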
1 code implementation • 1 Oct 2018 • Sebastian Kmiec, Juhan Bae, Ruijian An
We demonstrate our solutions to the 2nd YouTube-8M Video Understanding Challenge using frame-level video and audio descriptors.