Search Results for author: Shankar Krishnan

Found 10 papers, 4 papers with code

On the Inductive Bias of Stacking Towards Improving Reasoning

no code implementations27 Sep 2024 Nikunj Saunshi, Stefani Karp, Shankar Krishnan, Sobhan Miryoosefi, Sashank J. Reddi, Sanjiv Kumar

These findings of training efficiency and of an inductive bias towards reasoning are verified on language models with 1B, 2B, and 8B parameters.

Inductive Bias · Math · +1
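
As a concrete illustration, here is a minimal sketch of the stacking operation studied above, assuming a model organized as a list of blocks; all module names are illustrative stand-ins, not taken from the paper's code.

```python
# Hypothetical sketch of stacking: a deeper model is initialized by
# duplicating the blocks of a shallower, already-trained one.
import copy
import torch.nn as nn

def stack_init(shallow_blocks: nn.ModuleList, growth_factor: int = 2) -> nn.ModuleList:
    """Build a deeper block list by repeating each trained block in place."""
    deep_blocks = nn.ModuleList()
    for block in shallow_blocks:
        for _ in range(growth_factor):
            deep_blocks.append(copy.deepcopy(block))  # copy trained weights
    return deep_blocks

# Toy usage: grow a 2-block "model" into a 4-block one.
shallow = nn.ModuleList([nn.Linear(16, 16) for _ in range(2)])
deep = stack_init(shallow, growth_factor=2)
assert len(deep) == 4
```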

A Unifying View on Implicit Bias in Training Linear Neural Networks

no code implementations ICLR 2021 Chulhee Yun, Shankar Krishnan, Hossein Mobahi

For $L$-layer linear tensor networks that are orthogonally decomposable, we show that gradient flow on separable classification finds a stationary point of the $\ell_{2/L}$ max-margin problem in a "transformed" input space defined by the network.

Tensor Networks
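
For reference, the $\ell_{2/L}$ max-margin problem mentioned above can be stated as follows, where $\bar{x}_i$ denotes example $x_i$ mapped into the transformed input space (the transformation itself is defined by the network and not reproduced here):

$$\min_{w}\ \|w\|_{2/L} \quad \text{subject to} \quad y_i\,\langle w, \bar{x}_i\rangle \ge 1 \ \text{ for all } i.$$

For $L = 1$ this recovers the usual $\ell_2$ max-margin problem; for deeper networks the smaller exponent $2/L$ favors sparser solutions in the transformed coordinates.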

Weak and Strong Gradient Directions: Explaining Memorization, Generalization, and Hardness of Examples at Scale

no code implementations16 Mar 2020 Piotr Zielinski, Shankar Krishnan, Satrajit Chatterjee

The key insight of the Coherent Gradients Hypothesis (CGH) is that, since the overall gradient for a single step of SGD is the sum of the per-example gradients, it is strongest in directions that reduce the loss on multiple examples, when such directions exist.

Memorization
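
A small sketch of the quantity this reasoning turns on: per-example gradients and their alignment with the overall gradient. The model, data, and cosine-based alignment measure below are illustrative stand-ins, not the authors' metric.

```python
# Compute one gradient per example, then measure how strongly each agrees
# with their mean (the direction a full-batch SGD step would follow).
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(8, 1)
x, y = torch.randn(32, 8), torch.randn(32, 1)

per_example_grads = []
for i in range(x.size(0)):
    model.zero_grad()
    F.mse_loss(model(x[i:i+1]), y[i:i+1]).backward()
    per_example_grads.append(torch.cat([p.grad.flatten() for p in model.parameters()]))

G = torch.stack(per_example_grads)          # one row per example
mean_grad = G.mean(dim=0)                   # overall gradient direction
coherence = F.cosine_similarity(G, mean_grad.unsqueeze(0)).mean()
print(f"average alignment with the overall gradient: {coherence:.3f}")
```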

The Effect of Network Depth on the Optimization Landscape

no code implementations28 May 2019 Behrooz Ghorbani, Ying Xiao, Shankar Krishnan

It is well-known that deeper neural networks are harder to train than shallower ones.

An Investigation into Neural Net Optimization via Hessian Eigenvalue Density

1 code implementation29 Jan 2019 Behrooz Ghorbani, Shankar Krishnan, Ying Xiao

To understand the dynamics of optimization in deep neural networks, we develop a tool to study the evolution of the entire Hessian spectrum throughout the optimization process.
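
The basic primitive behind such a tool is the Hessian-vector product, which lets Lanczos-type methods probe the spectrum without ever materializing the Hessian. Below is a toy sketch; the paper's tool uses stochastic Lanczos quadrature over many starting vectors to estimate the full eigenvalue density, while this version only computes Ritz values from a single starting vector.

```python
import torch

def hvp(loss, params, v):
    """Hessian-vector product H @ v via double backprop (H is never formed)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    hv = torch.autograd.grad(flat @ v, params, retain_graph=True)
    return torch.cat([h.reshape(-1) for h in hv])

def lanczos_ritz(matvec, dim, steps):
    """Run a few plain Lanczos steps; the eigenvalues of the small tridiagonal
    matrix (Ritz values) approximate the Hessian's extreme eigenvalues."""
    q, q_prev = torch.randn(dim), torch.zeros(dim)
    q = q / q.norm()
    alphas, betas, beta = [], [], torch.tensor(0.0)
    for _ in range(steps):
        w = matvec(q) - beta * q_prev
        alpha = torch.dot(w, q)
        w = w - alpha * q
        beta = w.norm()
        alphas.append(alpha)
        betas.append(beta)
        q_prev, q = q, w / beta
    T = torch.diag(torch.stack(alphas))
    off = torch.stack(betas[:-1])
    return torch.linalg.eigvalsh(T + torch.diag(off, 1) + torch.diag(off, -1))

# Toy usage on a small model's loss.
model = torch.nn.Linear(4, 1)
x, y = torch.randn(16, 4), torch.randn(16, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
params = list(model.parameters())
dim = sum(p.numel() for p in params)
print(lanczos_ritz(lambda v: hvp(loss, params, v), dim, steps=4))
```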

Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks

no code implementations ICLR 2018 Shankar Krishnan, Ying Xiao, Rif A. Saurous

We demonstrate the effectiveness of our algorithm by successfully training large ImageNet models (Inception-V3, ResNet-50, ResNet-101, and Inception-ResNet-v2) with mini-batch sizes of up to 32,000, with no loss in validation error relative to current baselines and no increase in the total number of steps.

Stochastic Optimization
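
For intuition, the optimizer's namesake is the Neumann series, which applies an approximate matrix inverse using only matrix-vector products. The sketch below illustrates that identity in isolation, not the optimizer itself.

```python
# For a matrix A whose eigenvalues lie in (0, 2), A^{-1} v can be computed as
#   A^{-1} v = (I + (I - A) + (I - A)^2 + ...) v,
# using only products with A.
import numpy as np

def neumann_apply_inverse(matvec, v, num_terms=100):
    """Approximate A^{-1} @ v via the truncated Neumann series."""
    term, result = v.copy(), v.copy()
    for _ in range(num_terms):
        term = term - matvec(term)   # term <- (I - A) @ term
        result = result + term
    return result

# Usage: a well-conditioned SPD matrix, scaled so the series converges.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)
A = A / np.linalg.eigvalsh(A).max()      # eigenvalues now in (0, 1]
v = rng.standard_normal(5)
x = neumann_apply_inverse(lambda u: A @ u, v)
print(np.allclose(A @ x, v))             # True: x solves A x = v
```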

An Information-Theoretic Approach to Detecting Changes in Multi-Dimensional Data Streams

1 code implementation INFORMS 2006 Tamraparni Dasu, Shankar Krishnan, Suresh Venkatasubramanian, Ke Yi

In this paper, we take a general, information-theoretic approach to the change detection problem, which works for multidimensional as well as categorical data.

Change Detection
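
A minimal sketch in the spirit of this approach: compare a reference window of the stream against a sliding window via KL divergence and flag large divergences. The fixed binning and hand-set threshold are simplifications; the paper uses a space-partitioning scheme and a bootstrap test to calibrate the cutoff.

```python
import numpy as np

def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL divergence between two smoothed, normalized histograms."""
    p = (p_counts + eps) / (p_counts + eps).sum()
    q = (q_counts + eps) / (q_counts + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def detect_change(stream, window=200, bins=20, threshold=0.5):
    """Yield start indices of windows whose distribution drifts from the first
    window. Binning over the whole stream is an offline simplification."""
    edges = np.histogram_bin_edges(stream, bins=bins)
    ref, _ = np.histogram(stream[:window], bins=edges)
    for start in range(window, len(stream) - window + 1, window // 2):
        cur, _ = np.histogram(stream[start:start + window], bins=edges)
        if kl_divergence(cur, ref) > threshold:
            yield start

# Usage: a stream whose mean shifts halfway through.
rng = np.random.default_rng(1)
stream = np.concatenate([rng.normal(0, 1, 1000), rng.normal(3, 1, 1000)])
print(list(detect_change(stream)))  # flags windows at and after the shift
```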
