Search Results for author: Sneha Kudugunta

Found 9 papers, 1 paper with code

A Loss Curvature Perspective on Training Instability in Deep Learning

no code implementations • 8 Oct 2021 • Justin Gilmer, Behrooz Ghorbani, Ankush Garg, Sneha Kudugunta, Behnam Neyshabur, David Cardoze, George Dahl, Zachary Nado, Orhan Firat

In this work, we study the evolution of the loss Hessian across many classification tasks to understand how the curvature of the loss affects training dynamics.
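The central quantity here is the spectrum of the loss Hessian. Below is a minimal sketch of a standard way such curvature is probed in practice: power iteration on Hessian-vector products via double backpropagation. This is a generic curvature probe under assumed names, not the paper's exact measurement protocol.

```python
import torch

def top_hessian_eigenvalue(loss, params, iters=50):
    """Estimate the largest eigenvalue of the loss Hessian by power
    iteration on Hessian-vector products (hypothetical helper; a
    generic curvature probe, not this paper's exact protocol)."""
    # Keep the graph so we can differentiate the gradients again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        norm = torch.sqrt(sum((u * u).sum() for u in v))
        v = [u / norm for u in v]                              # normalize probe vector
        gv = sum((g * u).sum() for g, u in zip(grads, v))      # scalar g . v
        hv = torch.autograd.grad(gv, params, retain_graph=True)  # Hessian-vector product H v
        eig = sum((h * u).sum() for h, u in zip(hv, v))        # Rayleigh quotient v . H v
        v = [h.detach() for h in hv]
    return eig.item()
```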

Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference

no code implementations • 24 Sep 2021 • Sneha Kudugunta, Yanping Huang, Ankur Bapna, Maxim Krikun, Dmitry Lepikhin, Minh-Thang Luong, Orhan Firat

On WMT, our task-MoE with 32 experts (533M parameters) outperforms the best-performing token-level MoE model (token-MoE) by +1.0 BLEU on average across 30 language pairs.
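The distinction from token-level MoE is that the routing decision is made once per task (e.g., per language pair) rather than per token, so inference only needs to load the experts serving the current task. A minimal sketch of that idea follows, with illustrative layer sizes and a hypothetical class name; hard routing is shown for the inference path, whereas training would use a differentiable gate.

```python
import torch
import torch.nn as nn

class TaskMoELayer(nn.Module):
    """Sketch of task-level routing: every token in a batch shares one
    task id, so the router picks an expert per task, not per token.
    Names and dimensions are illustrative assumptions."""
    def __init__(self, d_model=512, n_experts=32, n_tasks=30):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model),
                           nn.ReLU(),
                           nn.Linear(4 * d_model, d_model))
             for _ in range(n_experts)])
        self.router = nn.Linear(n_tasks, n_experts)  # scores experts from the task id

    def forward(self, x, task_id):
        # x: (batch, seq, d_model); task_id: one int for the whole batch.
        one_hot = torch.nn.functional.one_hot(
            torch.tensor(task_id), self.router.in_features).float()
        expert_idx = self.router(one_hot).argmax().item()
        # All tokens of this task use a single expert, so only that
        # expert needs to be resident at inference time.
        return self.experts[expert_idx](x)
```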

Exploring Routing Strategies for Multilingual Mixture-of-Experts Models

no code implementations • 1 Jan 2021 • Sneha Kudugunta, Yanping Huang, Ankur Bapna, Maxim Krikun, Dmitry Lepikhin, Thang Luong, Orhan Firat

Sparsely-Gated Mixture-of-Experts (MoE) has been a successful approach for scaling multilingual translation models to billions of parameters without a proportional increase in training computation.
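For contrast with the task-level routing above, here is a sketch of the token-level dispatch that sparsely-gated MoE models use: each token is sent to its two highest-scoring experts, weighted by renormalized gate probabilities. The function name and shapes are illustrative, and the auxiliary load-balancing losses used in practice are omitted.

```python
import torch

def top2_gating(router_logits):
    """Token-level top-2 gating in the sparsely-gated MoE style
    (illustrative sketch; load-balancing losses omitted)."""
    probs = torch.softmax(router_logits, dim=-1)          # (n_tokens, n_experts)
    top2_vals, top2_idx = probs.topk(2, dim=-1)           # two experts per token
    gates = top2_vals / top2_vals.sum(-1, keepdim=True)   # renormalize the pair
    return gates, top2_idx

# Usage: logits come from a learned router, e.g. tokens @ W_router
# gates, experts = top2_gating(torch.randn(16, 32))
```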

DANTE: Deep AlterNations for Training nEural networks

no code implementations • 1 Feb 2019 • Vaibhav B Sinha, Sneha Kudugunta, Adepu Ravi Sankar, Surya Teja Chavali, Purushottam Kar, Vineeth N. Balasubramanian

We present DANTE, a novel method for training neural networks using the alternating minimization principle.
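A minimal sketch of the alternating-minimization idea for a two-layer network: freeze one layer, optimize the other on its sub-problem, then swap. DANTE's actual inner updates solve generalized-linear-model sub-problems; this sketch substitutes plain SGD steps, and all names are hypothetical.

```python
import torch

def train_alternating(W1, W2, X, Y, epochs=10, inner_steps=100, lr=1e-2):
    """Alternating minimization for a two-layer net (illustrative
    sketch; DANTE's inner solvers are replaced by SGD steps here).
    W1 and W2 must be created with requires_grad=True."""
    def loss():
        h = torch.sigmoid(X @ W1)
        return ((h @ W2 - Y) ** 2).mean()
    for _ in range(epochs):
        for param in (W2, W1):                 # alternate which layer is free
            opt = torch.optim.SGD([param], lr=lr)
            for _ in range(inner_steps):
                opt.zero_grad()
                loss().backward()              # only `param` is stepped below
                opt.step()
    return W1, W2

# Usage (toy autoencoder-style shapes):
# X = torch.randn(256, 20); Y = X.clone()
# W1 = torch.randn(20, 8, requires_grad=True)
# W2 = torch.randn(8, 20, requires_grad=True)
# train_alternating(W1, W2, X, Y)
```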

Deep Neural Networks for Bot Detection

2 code implementations • 12 Feb 2018 • Sneha Kudugunta, Emilio Ferrara

In this paper, we propose a deep neural network based on contextual long short-term memory (LSTM) architecture that exploits both content and metadata to detect bots at the tweet level: contextual features are extracted from user metadata and fed as auxiliary input to LSTM deep nets processing the tweet text.

Tasks: General Classification, Sentiment Analysis
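A sketch of the contextual LSTM architecture described above: an LSTM encodes the tweet text, user-metadata features enter as an auxiliary input, and the two are concatenated for a tweet-level bot/human prediction. Layer sizes and the class name are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class ContextualLSTMBotDetector(nn.Module):
    """Sketch of a contextual LSTM: text encoded by an LSTM, user
    metadata fed as auxiliary input (illustrative sizes and names,
    not the paper's released code)."""
    def __init__(self, vocab_size=50000, embed_dim=128,
                 hidden_dim=128, n_meta_features=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim + n_meta_features, 64),
            nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, token_ids, metadata):
        # token_ids: (batch, seq); metadata: (batch, n_meta_features)
        _, (h_n, _) = self.lstm(self.embed(token_ids))
        combined = torch.cat([h_n[-1], metadata], dim=-1)  # text + metadata
        return torch.sigmoid(self.classifier(combined)).squeeze(-1)
```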

Training Autoencoders by Alternating Minimization

no code implementations • ICLR 2018 • Sneha Kudugunta, Adepu Shankar, Surya Chavali, Vineeth Balasubramanian, Purushottam Kar

We present DANTE, a novel method for training neural networks, in particular autoencoders, using the alternating minimization principle.
