no code implementations • 27 Sep 2024 • Nikunj Saunshi, Stefani Karp, Shankar Krishnan, Sobhan Miryoosefi, Sashank J. Reddi, Sanjiv Kumar
These findings on training efficiency and the inductive bias towards reasoning are verified on 1B-, 2B-, and 8B-parameter language models.
3 code implementations • 12 Jun 2023 • George E. Dahl, Frank Schneider, Zachary Nado, Naman Agarwal, Chandramouli Shama Sastry, Philipp Hennig, Sourabh Medapati, Runa Eschenhagen, Priya Kasimbeg, Daniel Suo, Juhan Bae, Justin Gilmer, Abel L. Peirson, Bilal Khan, Rohan Anil, Mike Rabbat, Shankar Krishnan, Daniel Snider, Ehsan Amid, Kongtao Chen, Chris J. Maddison, Rakshith Vasudev, Michal Badura, Ankush Garg, Peter Mattson
In order to address these challenges, we introduce a new, competitive, time-to-result benchmark using multiple workloads running on fixed hardware, the AlgoPerf: Training Algorithms benchmark.
no code implementations • 29 Jul 2022 • Jeremy M. Cohen, Behrooz Ghorbani, Shankar Krishnan, Naman Agarwal, Sourabh Medapati, Michal Badura, Daniel Suo, David Cardoze, Zachary Nado, George E. Dahl, Justin Gilmer
Very little is known about the training dynamics of adaptive gradient methods like Adam in deep learning.
no code implementations • ICLR 2021 • Chulhee Yun, Shankar Krishnan, Hossein Mobahi
For $L$-layer linear tensor networks that are orthogonally decomposable, we show that gradient flow on separable classification finds a stationary point of the $\ell_{2/L}$ max-margin problem in a "transformed" input space defined by the network.
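For reference, the generic $\ell_p$ max-margin problem on linearly separable data $\{(x_i, y_i)\}$ takes the form below with $p = 2/L$ (a quasi-norm rather than a norm when $L > 2$); the map $\phi$ merely stands in for the paper's transformed input space, so this is a template rather than the paper's precise statement.

```latex
% Generic \ell_p max-margin problem (template only; \phi denotes the
% transformed inputs and p = 2/L):
\min_{\beta} \; \|\beta\|_{p}
\quad \text{subject to} \quad
y_i \,\langle \beta, \phi(x_i) \rangle \;\ge\; 1 \quad \text{for all } i .
```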
no code implementations • 16 Mar 2020 • Piotr Zielinski, Shankar Krishnan, Satrajit Chatterjee
The key insight of CGH is that, since the overall gradient for a single step of SGD is the sum of the per-example gradients, it is strongest in directions that reduce the loss on multiple examples if such directions exist.
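A minimal sketch of that decomposition, using an assumed linear least-squares loss rather than anything from the paper's experiments: the minibatch gradient is the mean of per-example gradients, so directions shared by many examples dominate it.

```python
import jax
import jax.numpy as jnp

# Hypothetical per-example squared loss for a linear model; only the
# sum-of-per-example-gradients structure matters for the CGH argument.
def loss(w, x, y):
    return (jnp.dot(x, w) - y) ** 2

def batch_loss(w, X, Y):
    return jnp.mean(jax.vmap(loss, in_axes=(None, 0, 0))(w, X, Y))

key = jax.random.PRNGKey(0)
kx, ky = jax.random.split(key)
X = jax.random.normal(kx, (8, 4))   # 8 examples, 4 features
Y = jax.random.normal(ky, (8,))
w = jnp.zeros(4)

# Per-example gradients g_i and the single-step SGD gradient (their mean).
g_i = jax.vmap(jax.grad(loss), in_axes=(None, 0, 0))(w, X, Y)
g = jax.grad(batch_loss)(w, X, Y)
assert jnp.allclose(g, g_i.mean(axis=0), atol=1e-5)

# Crude coherence proxy: alignment of each per-example gradient with the
# aggregate direction; directions that reduce the loss on many examples
# reinforce each other in g, idiosyncratic directions tend to cancel.
cos = (g_i @ g) / (jnp.linalg.norm(g_i, axis=1) * jnp.linalg.norm(g) + 1e-12)
print(cos)
```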
16 code implementations • CVPR 2020 • Saurabh Singh, Shankar Krishnan
Our method outperforms BN and other alternatives in a variety of settings for all batch sizes.
no code implementations • 28 May 2019 • Behrooz Ghorbani, Ying Xiao, Shankar Krishnan
It is well-known that deeper neural networks are harder to train than shallower ones.
1 code implementation • 29 Jan 2019 • Behrooz Ghorbani, Shankar Krishnan, Ying Xiao
To understand the dynamics of optimization in deep neural networks, we develop a tool to study the evolution of the entire Hessian spectrum throughout the optimization process.
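One standard way to probe the spectrum of a Hessian that is too large to materialize is to combine Hessian-vector products with a Lanczos-type eigensolver; the sketch below is illustrative of that building block under an assumed toy loss, not the paper's released tool.

```python
import jax
import jax.numpy as jnp
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

# Toy loss over a flat parameter vector; stands in for a network's training loss.
def loss(w, X, Y):
    return jnp.mean((jnp.tanh(X @ w) - Y) ** 2)

def hvp(w, v, X, Y):
    # Forward-over-reverse Hessian-vector product: no explicit Hessian is formed.
    g = jax.grad(lambda w_: loss(w_, X, Y))
    return jax.jvp(g, (w,), (v,))[1]

d = 16
key = jax.random.PRNGKey(0)
kx, ky, kw = jax.random.split(key, 3)
X = jax.random.normal(kx, (64, d))
Y = jax.random.normal(ky, (64,))
w = jax.random.normal(kw, (d,))

# Wrap the HVP as a linear operator and ask ARPACK (a Lanczos-type method)
# for the largest eigenvalues of the Hessian at w.
op = LinearOperator((d, d), matvec=lambda v: np.asarray(hvp(w, jnp.asarray(v), X, Y)))
top_eigs = eigsh(op, k=5, which="LA", return_eigenvectors=False)
print(np.sort(top_eigs)[::-1])
```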
no code implementations • ICLR 2018 • Shankar Krishnan, Ying Xiao, Rif A. Saurous
We demonstrate the effectiveness of our algorithm by successfully training large ImageNet models (Inception-V3, Resnet-50, Resnet-101 and Inception-Resnet-V2) with mini-batch sizes of up to 32000 with no loss in validation error relative to current baselines, and no increase in the total number of steps.
1 code implementation • INFORMS 2006 • Tamraparni Dasu, Shankar Krishnan, Suresh Venkatasubramanian, Ke Yi
In this paper, we take a general, information-theoretic approach to the change detection problem, which works for multidimensional as well as categorical data.
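The specific distance measures, windowing scheme, and significance testing are the paper's contribution; as a generic illustration of information-theoretic change detection on categorical data only, one can compare the empirical distribution of a sliding window against a reference window with a smoothed KL divergence and a hypothetical fixed threshold.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) between two discrete distributions, smoothed to avoid zeros."""
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def detect_change(stream, n_categories, window=200, threshold=0.1):
    """Flag positions where the current window's empirical distribution drifts
    from the reference window (illustrative thresholding, not the paper's test)."""
    reference = np.bincount(stream[:window], minlength=n_categories).astype(float)
    alarms = []
    for t in range(window, len(stream) - window, window):
        current = np.bincount(stream[t:t + window], minlength=n_categories).astype(float)
        if kl_divergence(current, reference) > threshold:
            alarms.append(t)
            reference = current  # reset the reference after a detected change
    return alarms

# Synthetic categorical stream whose distribution shifts halfway through.
rng = np.random.default_rng(0)
stream = np.concatenate([
    rng.choice(4, size=2000, p=[0.7, 0.1, 0.1, 0.1]),
    rng.choice(4, size=2000, p=[0.1, 0.1, 0.1, 0.7]),
])
print(detect_change(stream, n_categories=4))
```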