1 code implementation • 20 Jun 2024 • Johannes Treutlein, Dami Choi, Jan Betley, Samuel Marks, Cem Anil, Roger Grosse, Owain Evans
As a step towards answering this question, we study inductive out-of-context reasoning (OOCR), a type of generalization in which LLMs infer latent information from evidence distributed across training documents and apply it to downstream tasks without in-context learning.
1 code implementation • 21 May 2024 • James Requeima, John Bronskill, Dami Choi, Richard E. Turner, David Duvenaud
Machine learning practitioners often face significant challenges in formally integrating their prior knowledge and beliefs into predictive models, limiting the potential for nuanced and context-aware analyses.
no code implementations • NeurIPS 2023 • Dami Choi, Derrick Xin, Hamid Dadkhahi, Justin Gilmer, Ankush Garg, Orhan Firat, Chih-Kuan Yeh, Andrew M. Dai, Behrooz Ghorbani
In this paper, we empirically study the optimization dynamics of multi-task learning, particularly focusing on those that govern a collection of tasks with significant data imbalance.
no code implementations • NeurIPS Workshop ICBINB 2020 • Ricky T. Q. Chen, Dami Choi, Lukas Balles, David Duvenaud, Philipp Hennig
Standard first-order stochastic optimization algorithms base their updates solely on the average mini-batch gradient, and it has been shown that tracking additional quantities such as the curvature can help de-sensitize common hyperparameters.
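As a generic illustration of what "tracking additional quantities" can buy (this is an RMSProp-style sketch, not the filtering algorithm proposed in the paper above), the snippet below keeps a running estimate of the squared mini-batch gradient and uses it to precondition the step, which makes the update less sensitive to the raw learning-rate scale; all hyperparameter values are illustrative assumptions.

```python
import numpy as np

def preconditioned_step(x, grad, second_moment, lr=1e-3, decay=0.9, eps=1e-8):
    """One update that tracks an extra quantity beyond the mean gradient.

    A running second moment of the mini-batch gradient acts as a crude
    per-parameter curvature/noise proxy; dividing by its square root
    de-sensitizes the step to the absolute learning-rate scale.
    (Generic RMSProp-style illustration, not the paper's method.)
    """
    second_moment = decay * second_moment + (1 - decay) * grad**2
    x = x - lr * grad / (np.sqrt(second_moment) + eps)
    return x, second_moment

# Toy usage on a quadratic: x stays well-behaved across a range of lr values.
x, state = np.ones(5), np.zeros(5)
for _ in range(100):
    grad = 2 * x + 0.01 * np.random.default_rng(0).standard_normal(5)  # noisy gradient
    x, state = preconditioned_step(x, grad, state, lr=0.1)
```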
1 code implementation • NeurIPS 2020 • Max B. Paulus, Dami Choi, Daniel Tarlow, Andreas Krause, Chris J. Maddison
The Gumbel-Max trick is the basis of many relaxed gradient estimators.
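For readers unfamiliar with the trick, the sketch below (illustrative code, not taken from the paper) shows the exact Gumbel-Max sample, an argmax over logits perturbed by Gumbel noise, and the softmax relaxation at a temperature that relaxed gradient estimators such as Gumbel-Softmax build on; the temperature value is an assumption of the example.

```python
import numpy as np

def sample_gumbel(shape, rng):
    # Gumbel(0, 1) noise via the inverse CDF: -log(-log(U)), U ~ Uniform(0, 1)
    u = rng.uniform(1e-12, 1.0, size=shape)
    return -np.log(-np.log(u))

def gumbel_max_sample(logits, rng):
    # Exact categorical sample: argmax of the noise-perturbed logits
    return np.argmax(logits + sample_gumbel(logits.shape, rng))

def gumbel_softmax_sample(logits, temperature, rng):
    # Relaxed (differentiable) sample: softmax replaces the argmax
    y = (logits + sample_gumbel(logits.shape, rng)) / temperature
    y = y - y.max()  # numerical stability
    return np.exp(y) / np.exp(y).sum()

rng = np.random.default_rng(0)
logits = np.log(np.array([0.1, 0.6, 0.3]))
hard = gumbel_max_sample(logits, rng)            # an integer class index
soft = gumbel_softmax_sample(logits, 0.5, rng)   # a point on the simplex
```

Lowering the temperature makes the relaxed sample approach the hard one-hot sample, at the cost of higher-variance gradients.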
2 code implementations • 11 Oct 2019 • Dami Choi, Christopher J. Shallue, Zachary Nado, Jaehoon Lee, Chris J. Maddison, George E. Dahl
In particular, we find that the popular adaptive gradient methods never underperform momentum or gradient descent.
This finding holds when optimizer hyperparameters, including those often left at defaults, are tuned carefully for each method.
1 code implementation • 12 Jul 2019 • Dami Choi, Alexandre Passos, Christopher J. Shallue, George E. Dahl
In the twilight of Moore's law, GPUs and other specialized hardware accelerators have dramatically sped up neural network training.
no code implementations • ICLR 2019 • Niru Maheswaranathan, Luke Metz, George Tucker, Dami Choi, Jascha Sohl-Dickstein
Such surrogate gradient information arises when an approximate gradient is easier to compute than the full gradient (e.g., in meta-learning or unrolled optimization), or when a true gradient is intractable and is replaced with a surrogate (e.g., in certain reinforcement learning applications or when training networks with discrete variables).
1 code implementation • ICLR 2019 • Niru Maheswaranathan, Luke Metz, George Tucker, Dami Choi, Jascha Sohl-Dickstein
We propose Guided Evolutionary Strategies, a method for optimally using surrogate gradient directions along with random search.
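The sketch below is a rough illustration of the idea under stated simplifications (it omits the paper's scale correction, and all hyperparameter values and helper names are assumptions of the example): perturbations are drawn from a Gaussian whose covariance mixes the full parameter space with the low-dimensional subspace spanned by the surrogate gradients, and an antithetic finite-difference estimator turns function evaluations into a descent direction.

```python
import numpy as np

def guided_es_grad(f, x, surrogate_grads, alpha=0.5, sigma=0.1, pairs=10, rng=None):
    """Antithetic ES gradient estimate with perturbations biased toward the
    subspace spanned by surrogate gradient directions (a simplified sketch
    of the Guided ES idea; constants and defaults are illustrative)."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = x.size
    # Orthonormal basis U of the k-dimensional surrogate-gradient subspace
    U, _ = np.linalg.qr(np.stack(surrogate_grads, axis=1))
    k = U.shape[1]
    grad = np.zeros(n)
    for _ in range(pairs):
        # Perturbation with covariance sigma^2 * (alpha/n * I + (1-alpha)/k * U U^T)
        eps = sigma * (np.sqrt(alpha / n) * rng.standard_normal(n)
                       + np.sqrt((1 - alpha) / k) * U @ rng.standard_normal(k))
        # Antithetic finite-difference estimate (up to a constant scale)
        grad += (f(x + eps) - f(x - eps)) / (2 * sigma**2) * eps
    return grad / pairs

# Toy usage: quadratic objective with a noisy surrogate gradient direction.
f = lambda z: float(np.sum(z**2))
x = np.ones(20)
surrogate = [2 * x + 0.5 * np.random.default_rng(1).standard_normal(20)]
g_est = guided_es_grad(f, x, surrogate)
```

Setting alpha closer to 0 trusts the surrogate subspace more, while alpha = 1 recovers plain isotropic random search.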
7 code implementations • ICLR 2018 • Will Grathwohl, Dami Choi, Yuhuai Wu, Geoffrey Roeder, David Duvenaud
Gradient-based optimization is the foundation of deep learning and reinforcement learning.