Search Results for author: Dami Choi

Found 8 papers, 5 papers with code

Order Matters in the Presence of Dataset Imbalance for Multilingual Learning

no code implementations • NeurIPS 2023 • Dami Choi, Derrick Xin, Hamid Dadkhahi, Justin Gilmer, Ankush Garg, Orhan Firat, Chih-Kuan Yeh, Andrew M. Dai, Behrooz Ghorbani

In this paper, we empirically study the optimization dynamics of multi-task learning, particularly focusing on those that govern a collection of tasks with significant data imbalance.

Language Modelling Machine Translation +3
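
The snippet does not spell out what "order" means in training, so the following is only a hedged illustration: a two-stage sampling schedule over a hypothetical imbalanced multilingual mixture (task names and sizes are invented), contrasted with plain size-proportional sampling. Which ordering actually helps, and when, is the empirical question the paper studies.

```python
import random

# Hypothetical imbalanced multilingual mixture (task names and sizes are invented).
task_sizes = {"en-de": 4_000_000, "en-fr": 3_500_000, "en-sw": 40_000, "en-is": 25_000}


def proportional_sampler(sizes):
    """Sample tasks in proportion to their data size (plain joint training)."""
    tasks, weights = zip(*sizes.items())
    while True:
        yield random.choices(tasks, weights=weights)[0]


def two_stage_schedule(sizes, stage1_steps, threshold=1_000_000):
    """One possible 'ordering': sample only high-resource tasks first,
    then switch to the full (still imbalanced) mixture."""
    stage1 = proportional_sampler({t: n for t, n in sizes.items() if n >= threshold})
    stage2 = proportional_sampler(sizes)
    for _ in range(stage1_steps):
        yield next(stage1)
    while True:
        yield next(stage2)


if __name__ == "__main__":
    sched = two_stage_schedule(task_sizes, stage1_steps=5)
    print([next(sched) for _ in range(10)])  # first 5 draws come from high-resource tasks only
```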

Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering

no code implementations • NeurIPS Workshop ICBINB 2020 • Ricky T. Q. Chen, Dami Choi, Lukas Balles, David Duvenaud, Philipp Hennig

Standard first-order stochastic optimization algorithms base their updates solely on the average mini-batch gradient, and it has been shown that tracking additional quantities such as the curvature can help de-sensitize common hyperparameters.

Stochastic Optimization
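
As a rough illustration of the idea in the snippet (not the paper's algorithm), the sketch below smooths noisy mini-batch gradients with a gain derived from a running squared-gradient noise proxy; the curvature tracking the paper describes is omitted, and all names and constants are assumptions.

```python
import numpy as np


def filtered_sgd(theta, grad_fn, steps=200, lr=0.1, decay=0.9):
    """Smooth noisy mini-batch gradients with a data-dependent gain.
    Generic illustration only; the paper's filter additionally uses curvature."""
    m = np.zeros_like(theta)   # filtered gradient estimate
    v = np.zeros_like(theta)   # running squared gradient (noise-scale proxy)
    for _ in range(steps):
        g = grad_fn(theta)
        v = decay * v + (1 - decay) * g ** 2
        # Trust the new gradient more when it is large relative to its noise scale.
        gain = np.clip(g ** 2 / (v + 1e-12), 0.0, 1.0)
        m = m + gain * (g - m)
        theta = theta - lr * m
    return theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy_quadratic_grad = lambda x: 2.0 * x + rng.normal(scale=0.5, size=x.shape)
    print(filtered_sgd(np.full(3, 5.0), noisy_quadratic_grad))  # should end up near 0
```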

On Empirical Comparisons of Optimizers for Deep Learning

1 code implementation • 11 Oct 2019 • Dami Choi, Christopher J. Shallue, Zachary Nado, Jaehoon Lee, Chris J. Maddison, George E. Dahl

In particular, we find that the popular adaptive gradient methods never underperform momentum or gradient descent.

Benchmarking
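
One way to see why well-tuned adaptive methods should not lose to their special cases: Adam with a very large epsilon and a matched learning rate reproduces heavy-ball momentum updates almost exactly. The hand-rolled check below (toy gradient sequence, illustrative hyperparameters) demonstrates this approximation; it is a sanity sketch, not the paper's experimental protocol.

```python
import numpy as np

# Toy check: Adam with a huge eps and a matched learning rate tracks heavy-ball momentum.
rng = np.random.default_rng(0)
beta1, beta2, eps = 0.9, 0.999, 1e8
lr_sgd, d = 0.1, 5

m = np.zeros(d)
v = np.zeros(d)
b = np.zeros(d)
for t in range(1, 11):
    g = rng.normal(size=d)

    b = beta1 * b + g                        # heavy-ball momentum buffer
    sgd_step = -lr_sgd * b

    m = beta1 * m + (1 - beta1) * g          # Adam first moment
    v = beta2 * v + (1 - beta2) * g ** 2     # Adam second moment
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    lr_adam = lr_sgd * eps * (1 - beta1 ** t) / (1 - beta1)   # matched learning rate
    adam_step = -lr_adam * m_hat / (np.sqrt(v_hat) + eps)

    # The two updates agree up to a ~sqrt(v_hat)/eps relative error (here ~1e-8).
    print(t, np.max(np.abs(adam_step - sgd_step)))
```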

Faster Neural Network Training with Data Echoing

1 code implementation • 12 Jul 2019 • Dami Choi, Alexandre Passos, Christopher J. Shallue, George E. Dahl

In the twilight of Moore's law, GPUs and other specialized hardware accelerators have dramatically sped up neural network training.
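
The title's technique, data echoing, amounts to reusing (echoing) batches from a slow upstream input pipeline so the accelerator is not starved waiting for new data. A minimal sketch, assuming a generic iterator-based pipeline (the function name and echo factor are illustrative, not the paper's implementation):

```python
def echo_batches(upstream, echo_factor=2):
    """Yield each batch from the (slow) upstream pipeline `echo_factor` times,
    so downstream training steps are not starved while new data is prepared.
    The paper studies echoing at several pipeline stages; this shows only the core loop."""
    for batch in upstream:
        for _ in range(echo_factor):
            yield batch


# Usage: wrap any iterable of batches.
slow_pipeline = iter([{"x": i} for i in range(3)])   # stand-in for an expensive input pipeline
for step, batch in enumerate(echo_batches(slow_pipeline, echo_factor=2)):
    print(step, batch)
```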

Guided Evolutionary Strategies: Escaping the curse of dimensionality in random search

no code implementations • ICLR 2019 • Niru Maheswaranathan, Luke Metz, George Tucker, Dami Choi, Jascha Sohl-Dickstein

This setting, in which only a surrogate for the true gradient is available, arises when an approximate gradient is easier to compute than the full gradient (e.g. in meta-learning or unrolled optimization), or when the true gradient is intractable and is replaced with a surrogate (e.g. in certain reinforcement learning applications or when training networks with discrete variables).

Meta-Learning

Guided evolutionary strategies: Augmenting random search with surrogate gradients

1 code implementation • ICLR 2019 • Niru Maheswaranathan, Luke Metz, George Tucker, Dami Choi, Jascha Sohl-Dickstein

We propose Guided Evolutionary Strategies, a method for optimally using surrogate gradient directions along with random search.

Meta-Learning
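
A hedged sketch of the idea as described above: draw antithetic random perturbations from a search distribution that mixes isotropic noise with noise confined to the subspace spanned by the surrogate gradient(s), then form a finite-difference descent estimate. The constants (alpha, sigma, the 1/(2*sigma^2) scaling) and function names are my assumptions, not the released implementation.

```python
import numpy as np


def guided_es_direction(f, theta, surrogate_grads, sigma=0.1, alpha=0.5, pairs=16, rng=None):
    """Guided random-search sketch: perturbations mix isotropic noise with noise
    restricted to the subspace spanned by the surrogate gradient directions."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = theta.size
    U, _ = np.linalg.qr(np.stack(surrogate_grads, axis=1))   # orthonormal basis, shape (n, k)
    k = U.shape[1]
    est = np.zeros(n)
    for _ in range(pairs):
        full = rng.normal(size=n)                 # isotropic component
        low = U @ rng.normal(size=k)              # component along surrogate directions
        eps = sigma * (np.sqrt(alpha / n) * full + np.sqrt((1 - alpha) / k) * low)
        est += eps * (f(theta + eps) - f(theta - eps))   # antithetic finite-difference term
    return est / (2 * sigma ** 2 * pairs)


if __name__ == "__main__":
    f = lambda x: float(np.sum(x ** 2))
    theta = np.ones(50)
    surrogate = 2 * theta + np.random.default_rng(1).normal(scale=1.0, size=50)  # biased gradient
    d = guided_es_direction(f, theta, [surrogate])
    # Cosine similarity with the true gradient 2*theta (should be clearly positive).
    print(np.dot(d, 2 * theta) / (np.linalg.norm(d) * np.linalg.norm(2 * theta)))
```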
