Search Results for author: Devansh Arpit

Found 28 papers, 10 papers with code

Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization

1 code implementation 21 Oct 2021 Devansh Arpit, Huan Wang, Yingbo Zhou, Caiming Xiong

We first show that this chaotic behavior exists even along the training optimization trajectory of a single model. We then propose a simple model averaging protocol that both significantly boosts domain generalization and diminishes the impact of stochasticity by improving the rank correlation between in-domain validation accuracy and out-of-domain test accuracy, which is crucial for reliable early stopping.

Domain Generalization · Model Selection
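The averaging protocol summarized above amounts to a running mean of parameter snapshots taken along the optimization trajectory. A minimal sketch, assuming model parameters are exposed as a list of arrays (`update_average` and the toy snapshots are illustrative, not the paper's code):

```python
import numpy as np

def update_average(avg_params, new_params, n_averaged):
    """Fold one more parameter snapshot into a running mean.

    avg_params: mean of the first n_averaged snapshots (list of arrays)
    new_params: parameters after the latest optimization step
    """
    return [a + (p - a) / (n_averaged + 1) for a, p in zip(avg_params, new_params)]

# Toy usage: average three "checkpoints" of a two-parameter model.
snaps = [[np.array([1.0]), np.array([2.0])],
         [np.array([3.0]), np.array([4.0])],
         [np.array([5.0]), np.array([6.0])]]
avg = snaps[0]
for n, snap in enumerate(snaps[1:], start=1):
    avg = update_average(avg, snap, n)
# avg now holds the element-wise mean of the three snapshots
```

Averaging in parameter space like this smooths out step-to-step stochasticity without requiring any extra training runs.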

Learning Rich Nearest Neighbor Representations from Self-supervised Ensembles

no code implementations 19 Oct 2021 Bram Wallace, Devansh Arpit, Huan Wang, Caiming Xiong

Pretraining convolutional neural networks via self-supervision, and applying them in transfer learning, is an incredibly fast-growing field that is rapidly and iteratively improving performance across practically all image domains.

Transfer Learning

Momentum Contrastive Autoencoder: Using Contrastive Learning for Latent Space Distribution Matching in WAE

no code implementations 19 Oct 2021 Devansh Arpit, Aadyot Bhatnagar, Huan Wang, Caiming Xiong

Wasserstein autoencoder (WAE) shows that matching two distributions is equivalent to minimizing a simple autoencoder (AE) loss under the constraint that the latent space of this AE matches a pre-specified prior distribution.

Contrastive Learning · Representation Learning
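The WAE formulation summarized above, an autoencoder loss constrained so that the latent distribution matches a pre-specified prior, is usually optimized as a penalized objective. A minimal sketch using an RBF-kernel MMD as the latent-matching penalty (one common choice in the WAE literature; this paper itself replaces it with a contrastive objective, and `wae_loss` / `rbf_mmd2` are hypothetical names):

```python
import numpy as np

def rbf_mmd2(x, y, sigma=1.0):
    """Biased squared MMD between sample sets x, y (n x d), RBF kernel."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

def wae_loss(x, x_recon, z, z_prior, lam=10.0):
    """Autoencoder reconstruction cost plus a penalty pushing the
    empirical latent distribution q(z) toward samples from the prior."""
    recon = ((x - x_recon) ** 2).mean()
    return recon + lam * rbf_mmd2(z, z_prior)
```

The loss is zero exactly when reconstruction is perfect and the encoded batch coincides with the prior samples, which mirrors the constraint stated in the abstract.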

Momentum Contrastive Autoencoder

no code implementations 1 Jan 2021 Devansh Arpit, Aadyot Bhatnagar, Huan Wang, Caiming Xiong

Quantitatively, we show that our algorithm achieves a new state-of-the-art FID of 54.36 on CIFAR-10, and performs competitively with existing models on CelebA in terms of FID score.

Contrastive Learning · Representation Learning

Neural Bayes: A Generic Parameterization Method for Unsupervised Learning

no code implementations 1 Jan 2021 Devansh Arpit, Huan Wang, Caiming Xiong, Richard Socher, Yoshua Bengio

Disjoint Manifold Separation: Neural Bayes allows us to formulate an objective which can optimally label samples from disjoint manifolds present in the support of a continuous distribution.

Representation Learning

The Break-Even Point on Optimization Trajectories of Deep Neural Networks

no code implementations ICLR 2020 Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof Geras

We argue for the existence of the "break-even" point on this trajectory, beyond which the curvature of the loss surface and noise in the gradient are implicitly regularized by SGD.

Neural Bayes: A Generic Parameterization Method for Unsupervised Representation Learning

1 code implementation 20 Feb 2020 Devansh Arpit, Huan Wang, Caiming Xiong, Richard Socher, Yoshua Bengio

Disjoint Manifold Labeling: Neural Bayes allows us to formulate an objective which can optimally label samples from disjoint manifolds present in the support of a continuous distribution.

Representation Learning

Predicting with High Correlation Features

2 code implementations 1 Oct 2019 Devansh Arpit, Caiming Xiong, Richard Socher

In this paper, we consider distribution shift as a shift in the distribution of input features during test time that exhibit low correlation with targets in the training set.

Entropy Penalty: Towards Generalization Beyond the IID Assumption

no code implementations 25 Sep 2019 Devansh Arpit, Caiming Xiong, Richard Socher

This allows deep networks trained with Entropy Penalty to generalize well even under distribution shift of spurious features.

How to Initialize your Network? Robust Initialization for WeightNorm & ResNets

2 code implementations NeurIPS 2019 Devansh Arpit, Victor Campos, Yoshua Bengio

Finally, we show that using our initialization in conjunction with learning rate warmup is able to reduce the gap between the performance of weight normalized and batch normalized networks.
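The learning rate warmup mentioned above is typically a linear ramp over the first few thousand updates. A generic sketch, not taken from the paper (the function name and defaults are illustrative):

```python
def warmup_lr(step, base_lr=0.1, warmup_steps=1000):
    """Linearly ramp the learning rate from ~0 up to base_lr, then hold it."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

Warmup keeps early updates small while normalization statistics and curvature settle, which is why it pairs naturally with initialization schemes like the one proposed here.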

A Walk with SGD: How SGD Explores Regions of Deep Network Loss?

no code implementations ICLR 2019 Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio

The non-convex nature of the loss landscape of deep neural networks (DNN) lends them the intuition that over the course of training, stochastic optimization algorithms explore different regions of the loss surface by entering and escaping many local minima due to the noise induced by mini-batches.

Stochastic Optimization

The Benefits of Over-parameterization at Initialization in Deep ReLU Networks

1 code implementation 11 Jan 2019 Devansh Arpit, Yoshua Bengio

These results are derived using the PAC analysis framework, and hold true for finitely sized datasets such that the width of the ReLU network only needs to be larger than a certain finite lower bound.

h-detach: Modifying the LSTM Gradient Towards Better Optimization

1 code implementation ICLR 2019 Devansh Arpit, Bhargav Kanuparthi, Giancarlo Kerg, Nan Rosemary Ke, Ioannis Mitliagkas, Yoshua Bengio

This problem becomes more evident in tasks where the information needed to correctly solve them exists over long time scales, because EVGP prevents important gradient components from being back-propagated adequately over a large number of steps.
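The gradient modification named in the title, stochastically blocking ("detaching") the backpropagation path through the hidden state, can be illustrated on a toy linear RNN with manual backpropagation through time. This is only a sketch of the mechanism: the paper applies it specifically to the h-state path of an LSTM while leaving the cell-state path intact, and the function below is hypothetical:

```python
import numpy as np

def rnn_grad_hdetach(x, w_h, w_x, p_detach, rng):
    """Gradient of L = 0.5 * h_T**2 for the linear RNN h_t = w_h*h_{t-1} + w_x*x_t,
    where each backward step drops the recurrent gradient path with prob. p_detach."""
    # Forward pass, storing hidden states (h[0] is the initial state).
    h = [0.0]
    for x_t in x:
        h.append(w_h * h[-1] + w_x * x_t)
    # Manual backpropagation through time.
    d_wh = d_wx = 0.0
    dh = h[-1]                              # dL/dh_T for L = 0.5 * h_T**2
    for t in range(len(x) - 1, -1, -1):
        d_wh += dh * h[t]
        d_wx += dh * x[t]
        keep = rng.random() >= p_detach     # stochastically detach the h-path
        dh = dh * w_h if keep else 0.0
    return d_wh, d_wx
```

With `p_detach=0` this reduces to exact BPTT; with `p_detach>0`, gradient contributions from distant timesteps are randomly suppressed, which is the suppression mechanism the abstract alludes to.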

On the Spectral Bias of Neural Networks

2 code implementations ICLR 2019 Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, Yoshua Bengio, Aaron Courville

Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with 100% accuracy.

A Walk with SGD

no code implementations 24 Feb 2018 Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio

Based on this and other metrics, we deduce that for most of the training update steps, SGD moves in valley-like regions of the loss surface by jumping from one valley wall to another at a height above the valley floor.

Variational Bi-LSTMs

no code implementations ICLR 2018 Samira Shabanian, Devansh Arpit, Adam Trischler, Yoshua Bengio

Bidirectional LSTMs (Bi-LSTMs), on the other hand, model sequences along both forward and backward directions and are generally known to perform better at such tasks because they capture a richer representation of the data.

Three Factors Influencing Minima in SGD

no code implementations ICLR 2018 Stanisław Jastrzębski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey

In particular we find that the ratio of learning rate to batch size is a key determinant of SGD dynamics and of the width of the final minima, and that higher values of the ratio lead to wider minima and often better generalization.

Fraternal Dropout

1 code implementation ICLR 2018 Konrad Zolna, Devansh Arpit, Dendi Suhubdy, Yoshua Bengio

We show that our regularization term is upper bounded by the expectation-linear dropout objective, which has been shown to address the gap due to the difference between the train and inference phases of dropout.

Image Captioning · Language Modelling
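The regularization term discussed above penalizes disagreement between two forward passes that use independent dropout masks. A minimal numpy sketch of that penalty for a single linear layer (the function and its signature are illustrative, not the paper's implementation):

```python
import numpy as np

def fraternal_penalty(x, w, p_drop, rng, kappa=1.0):
    """Run two forward passes with independent dropout masks and penalize
    the squared difference between their outputs (the 'fraternal' term)."""
    def forward(mask):
        return (x * mask) @ w / (1.0 - p_drop)   # inverted dropout scaling
    m1 = rng.random(x.shape) >= p_drop
    m2 = rng.random(x.shape) >= p_drop
    return kappa * np.mean((forward(m1) - forward(m2)) ** 2)
```

When the dropout rate is zero the two passes coincide and the penalty vanishes; otherwise it pushes the network toward predictions that are invariant to the particular mask drawn.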

Residual Connections Encourage Iterative Inference

no code implementations ICLR 2018 Stanisław Jastrzębski, Devansh Arpit, Nicolas Ballas, Vikas Verma, Tong Che, Yoshua Bengio

In general, a Resnet block tends to concentrate representation learning behavior in the first few layers while higher layers perform iterative refinement of features.

Representation Learning

On Optimality Conditions for Auto-Encoder Signal Recovery

no code implementations ICLR 2018 Devansh Arpit, Yingbo Zhou, Hung Q. Ngo, Nils Napp, Venu Govindaraju

Auto-Encoders are unsupervised models that aim to learn patterns from observed data by minimizing a reconstruction cost.

Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks

no code implementations 4 Mar 2016 Devansh Arpit, Yingbo Zhou, Bhargava U. Kota, Venu Govindaraju

While the authors of Batch Normalization (BN) identify and address an important problem involved in training deep networks -- Internal Covariate Shift -- the current solution has certain drawbacks.

Why Regularized Auto-Encoders learn Sparse Representation?

no code implementations 21 May 2015 Devansh Arpit, Yingbo Zhou, Hung Ngo, Venu Govindaraju

Dimensionality Reduction with Subspace Structure Preservation

no code implementations NeurIPS 2014 Devansh Arpit, Ifeoma Nwogu, Venu Govindaraju

Modeling data as being sampled from a union of independent subspaces has been widely applied to a number of real world applications.

Dimensionality Reduction

Is Joint Training Better for Deep Auto-Encoders?

no code implementations 6 May 2014 Yingbo Zhou, Devansh Arpit, Ifeoma Nwogu, Venu Govindaraju

But due to the greedy scheme of the layerwise training technique, the parameters of lower layers are fixed when training higher layers.

An Analysis of Random Projections in Cancelable Biometrics

no code implementations 17 Jan 2014 Devansh Arpit, Ifeoma Nwogu, Gaurav Srivastava, Venu Govindaraju

With increasing concerns about security, the need for highly secure physical biometrics-based authentication systems utilizing cancelable biometric technologies is on the rise.

Face Recognition
