Search Results for author: Devansh Arpit

Found 35 papers, 12 papers with code

On the Spectral Bias of Neural Networks

2 code implementations · ICLR 2019 · Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, Yoshua Bengio, Aaron Courville

Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with 100% accuracy.

Fraternal Dropout

1 code implementation · ICLR 2018 · Konrad Zolna, Devansh Arpit, Dendi Suhubdy, Yoshua Bengio

We show that our regularization term is upper bounded by the expectation-linear dropout objective which has been shown to address the gap due to the difference between the train and inference phases of dropout.

Image Captioning · Language Modelling
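
As a rough illustration of the idea behind this paper (not its reference implementation; the toy model and the coefficient `kappa` below are assumptions), fraternal dropout runs the same network twice with independent dropout masks and penalizes the squared difference between the two predictions alongside the usual task loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy classifier with dropout; architecture and sizes are placeholders.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Dropout(0.5), nn.Linear(64, 10))
kappa = 0.1  # weight on the fraternal regularization term (assumed value)

x = torch.randn(8, 32)
y = torch.randint(0, 10, (8,))

model.train()                          # keep dropout active for both passes
logits_a = model(x)                    # first pass, one dropout mask
logits_b = model(x)                    # second pass, an independent dropout mask
task_loss = 0.5 * (F.cross_entropy(logits_a, y) + F.cross_entropy(logits_b, y))
frat_loss = kappa * (logits_a - logits_b).pow(2).mean()
(task_loss + frat_loss).backward()
```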

Predicting with High Correlation Features

2 code implementations · 1 Oct 2019 · Devansh Arpit, Caiming Xiong, Richard Socher

In this paper, we consider distribution shift as a shift in the distribution of input features during test time that exhibit low correlation with targets in the training set.

Vocal Bursts Intensity Prediction

Neural Bayes: A Generic Parameterization Method for Unsupervised Representation Learning

1 code implementation · 20 Feb 2020 · Devansh Arpit, Huan Wang, Caiming Xiong, Richard Socher, Yoshua Bengio

Disjoint Manifold Labeling: Neural Bayes allows us to formulate an objective which can optimally label samples from disjoint manifolds present in the support of a continuous distribution.

Clustering · Representation Learning

Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization

1 code implementation · 21 Oct 2021 · Devansh Arpit, Huan Wang, Yingbo Zhou, Caiming Xiong

We first show that this chaotic behavior exists even along the training optimization trajectory of a single model, and propose a simple model averaging protocol that both significantly boosts domain generalization and diminishes the impact of stochasticity by improving the rank correlation between the in-domain validation accuracy and out-domain test accuracy, which is crucial for reliable early stopping.

Domain Generalization · Model Selection
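
A minimal sketch of the model-averaging ingredient: a running average of weights along the training trajectory, in the spirit of the protocol described above. The update frequency and helper names are assumptions, not the paper's code.

```python
import copy
import torch

def update_weight_average(avg_model, model, n_averaged):
    """Fold the current weights into a running average over checkpoints."""
    with torch.no_grad():
        for p_avg, p in zip(avg_model.parameters(), model.parameters()):
            p_avg.mul_(n_averaged / (n_averaged + 1)).add_(p / (n_averaged + 1))
    return n_averaged + 1

# Usage sketch: avg_model = copy.deepcopy(model); n = 0
# then after each epoch (or evaluation window): n = update_weight_average(avg_model, model, n),
# and early-stop/select using avg_model's in-domain validation accuracy.
```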

How to Initialize your Network? Robust Initialization for WeightNorm & ResNets

2 code implementations · NeurIPS 2019 · Devansh Arpit, Victor Campos, Yoshua Bengio

Finally, we show that using our initialization in conjunction with learning rate warmup reduces the gap between the performance of weight-normalized and batch-normalized networks.
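
For context, a linear learning-rate warmup of the kind referred to above looks roughly like this; the schedule shape and length are assumptions, not the paper's exact recipe.

```python
def warmup_lr(step, base_lr, warmup_steps=1000):
    """Linearly ramp the learning rate from ~0 to base_lr over warmup_steps."""
    return base_lr * min(1.0, (step + 1) / warmup_steps)

# Usage sketch (PyTorch-style):
# for group in optimizer.param_groups:
#     group["lr"] = warmup_lr(step, base_lr=0.1)
```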

h-detach: Modifying the LSTM Gradient Towards Better Optimization

1 code implementation · ICLR 2019 · Devansh Arpit, Bhargav Kanuparthi, Giancarlo Kerg, Nan Rosemary Ke, Ioannis Mitliagkas, Yoshua Bengio

This problem becomes more evident in tasks where the information needed to solve them correctly exists over long time scales, because EVGP prevents important gradient components from being back-propagated adequately over a large number of steps.
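
The core mechanism, as I understand it, is to stochastically stop gradients through the hidden-state (h) path of the LSTM while leaving the cell-state (c) path intact; a minimal sketch with an assumed detach probability and toy sizes:

```python
import torch
import torch.nn as nn

cell = nn.LSTMCell(input_size=16, hidden_size=32)  # toy sizes
p_detach = 0.25                                    # assumed detach probability

x_seq = torch.randn(10, 4, 16)                     # (time, batch, features)
h = torch.zeros(4, 32)
c = torch.zeros(4, 32)
for x_t in x_seq:
    if torch.rand(1).item() < p_detach:
        h = h.detach()                             # block gradients through the h path only
    h, c = cell(x_t, (h, c))
```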

The Benefits of Over-parameterization at Initialization in Deep ReLU Networks

1 code implementation · 11 Jan 2019 · Devansh Arpit, Yoshua Bengio

These results are derived using the PAC analysis framework, and hold true for finitely sized datasets such that the width of the ReLU network only needs to be larger than a certain finite lower bound.

Three Factors Influencing Minima in SGD

no code implementations · ICLR 2018 · Stanisław Jastrzębski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey

In particular we find that the ratio of learning rate to batch size is a key determinant of SGD dynamics and of the width of the final minima, and that higher values of the ratio lead to wider minima and often better generalization.

Memorization · Open-Ended Question Answering
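
Purely as an illustration of the quantity highlighted above: two configurations with the same learning-rate-to-batch-size ratio are predicted to reach minima of similar width.

```python
config_a = {"lr": 0.10, "batch_size": 128}
config_b = {"lr": 0.05, "batch_size": 64}

ratio = lambda cfg: cfg["lr"] / cfg["batch_size"]
print(ratio(config_a), ratio(config_b))   # both 0.00078125 -> comparable SGD noise scale
```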

A Walk with SGD

no code implementations · 24 Feb 2018 · Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio

Based on this and other metrics, we deduce that for most of the training update steps, SGD moves in valley-like regions of the loss surface by jumping from one valley wall to another at a height above the valley floor.

Residual Connections Encourage Iterative Inference

no code implementations · ICLR 2018 · Stanisław Jastrzębski, Devansh Arpit, Nicolas Ballas, Vikas Verma, Tong Che, Yoshua Bengio

In general, a Resnet block tends to concentrate representation learning behavior in the first few layers while higher layers perform iterative refinement of features.

Representation Learning

Variational Bi-LSTMs

no code implementations · ICLR 2018 · Samira Shabanian, Devansh Arpit, Adam Trischler, Yoshua Bengio

Bidirectional LSTMs (Bi-LSTMs), on the other hand, model sequences along both forward and backward directions and are generally known to perform better at such tasks because they capture a richer representation of the data.

On Optimality Conditions for Auto-Encoder Signal Recovery

no code implementations · ICLR 2018 · Devansh Arpit, Yingbo Zhou, Hung Q. Ngo, Nils Napp, Venu Govindaraju

Auto-Encoders are unsupervised models that aim to learn patterns from observed data by minimizing a reconstruction cost.
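
For readers unfamiliar with the setup, a generic auto-encoder reconstruction objective looks like the following; this is a toy sketch, not the specific model analyzed in the paper.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(64, 16), nn.ReLU())    # toy encoder
decoder = nn.Linear(16, 64)                              # toy decoder

x = torch.randn(32, 64)
reconstruction = decoder(encoder(x))
reconstruction_cost = ((reconstruction - x) ** 2).mean() # squared-error reconstruction cost
reconstruction_cost.backward()
```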

Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks

no code implementations · 4 Mar 2016 · Devansh Arpit, Yingbo Zhou, Bhargava U. Kota, Venu Govindaraju

While the authors of Batch Normalization (BN) identify and address an important problem involved in training deep networks, Internal Covariate Shift, the current solution has certain drawbacks.

Why Regularized Auto-Encoders learn Sparse Representation?

no code implementations · 21 May 2015 · Devansh Arpit, Yingbo Zhou, Hung Ngo, Venu Govindaraju

Dimensionality Reduction with Subspace Structure Preservation

no code implementations · NeurIPS 2014 · Devansh Arpit, Ifeoma Nwogu, Venu Govindaraju

Modeling data as being sampled from a union of independent subspaces has been widely applied to a number of real world applications.

2k · Dimensionality Reduction

Is Joint Training Better for Deep Auto-Encoders?

no code implementations · 6 May 2014 · Yingbo Zhou, Devansh Arpit, Ifeoma Nwogu, Venu Govindaraju

But due to the greedy scheme of the layerwise training technique, the parameters of lower layers are fixed when training higher layers.

An Analysis of Random Projections in Cancelable Biometrics

no code implementations · 17 Jan 2014 · Devansh Arpit, Ifeoma Nwogu, Gaurav Srivastava, Venu Govindaraju

With increasing concerns about security, the need for highly secure physical biometrics-based authentication systems utilizing \emph{cancelable biometric} technologies is on the rise.

Face Recognition
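
The basic transform under analysis is a random projection of the biometric feature vector, where the projection (seeded per user) plays the role of a revocable key; a minimal sketch with arbitrary dimensions:

```python
import numpy as np

rng = np.random.default_rng(seed=7)                 # user-specific seed acts as the revocable key
projection = rng.standard_normal((128, 512)) / np.sqrt(128)

feature = rng.standard_normal(512)                  # original biometric feature vector
template = projection @ feature                     # cancelable template; re-issue by changing the seed
```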

A Walk with SGD: How SGD Explores Regions of Deep Network Loss?

no code implementations · ICLR 2019 · Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio

The non-convex nature of the loss landscape of deep neural networks (DNN) lends them the intuition that over the course of training, stochastic optimization algorithms explore different regions of the loss surface by entering and escaping many local minima due to the noise induced by mini-batches.

Stochastic Optimization

The Break-Even Point on Optimization Trajectories of Deep Neural Networks

no code implementations · ICLR 2020 · Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof Geras

We argue for the existence of the "break-even" point on this trajectory, beyond which the curvature of the loss surface and noise in the gradient are implicitly regularized by SGD.

Neural Bayes: A Generic Parameterization Method for Unsupervised Learning

no code implementations · 1 Jan 2021 · Devansh Arpit, Huan Wang, Caiming Xiong, Richard Socher, Yoshua Bengio

Disjoint Manifold Separation: Neural Bayes allows us to formulate an objective which can optimally label samples from disjoint manifolds present in the support of a continuous distribution.

Clustering · Representation Learning

Momentum Contrastive Autoencoder

no code implementations · 1 Jan 2021 · Devansh Arpit, Aadyot Bhatnagar, Huan Wang, Caiming Xiong

Quantitatively, we show that our algorithm achieves a new state-of-the-art FID of 54.36 on CIFAR-10, and performs competitively with existing models on CelebA in terms of FID score.

Contrastive Learning · Representation Learning

Momentum Contrastive Autoencoder: Using Contrastive Learning for Latent Space Distribution Matching in WAE

no code implementations · 19 Oct 2021 · Devansh Arpit, Aadyot Bhatnagar, Huan Wang, Caiming Xiong

Wasserstein autoencoder (WAE) shows that matching two distributions is equivalent to minimizing a simple autoencoder (AE) loss under the constraint that the latent space of this AE matches a pre-specified prior distribution.

Contrastive Learning · Representation Learning
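
A rough sketch of the WAE-style objective described above: a reconstruction loss plus a penalty that pushes the aggregate latent distribution toward a fixed prior. The moment-matching penalty toward a standard normal below is only an illustration; the paper's matching term is contrastive, and the sizes and weight are assumptions.

```python
import torch
import torch.nn as nn

enc, dec = nn.Linear(64, 8), nn.Linear(8, 64)   # toy encoder/decoder
lam = 1.0                                       # assumed penalty weight

x = torch.randn(128, 64)
z = enc(x)
recon_loss = ((dec(z) - x) ** 2).mean()
# Push the batch of codes toward zero mean / unit variance (stand-in for matching the prior).
prior_penalty = z.mean(dim=0).pow(2).mean() + (z.var(dim=0) - 1.0).pow(2).mean()
(recon_loss + lam * prior_penalty).backward()
```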

Learning Rich Nearest Neighbor Representations from Self-supervised Ensembles

no code implementations · 19 Oct 2021 · Bram Wallace, Devansh Arpit, Huan Wang, Caiming Xiong

Pretraining convolutional neural networks via self-supervision, and applying them in transfer learning, is an incredibly fast-growing field that is rapidly and iteratively improving performance across practically all image domains.

Transfer Learning

Entropy Penalty: Towards Generalization Beyond the IID Assumption

no code implementations · 25 Sep 2019 · Devansh Arpit, Caiming Xiong, Richard Socher

This allows deep networks trained with Entropy Penalty to generalize well even under distribution shift of spurious features.

On the Unlikelihood of D-Separation

no code implementations · 10 Mar 2023 · Itai Feigenbaum, Huan Wang, Shelby Heinecke, Juan Carlos Niebles, Weiran Yao, Caiming Xiong, Devansh Arpit

We then provide an analytic average case analysis of the PC Algorithm for causal discovery, as well as a variant of the SGS Algorithm we call UniformSGS.

Causal Discovery

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

no code implementations · 4 Aug 2023 · Weiran Yao, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Yihao Feng, Le Xue, Rithesh Murthy, Zeyuan Chen, JianGuo Zhang, Devansh Arpit, ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese

This demonstrates that using policy gradient optimization to improve language agents, for which we believe our work is among the first, is promising and can be applied to optimize other models in the agent architecture to enhance agent performance over time.

Language Modelling

Editing Arbitrary Propositions in LLMs without Subject Labels

no code implementations · 15 Jan 2024 · Itai Feigenbaum, Devansh Arpit, Huan Wang, Shelby Heinecke, Juan Carlos Niebles, Weiran Yao, Caiming Xiong, Silvio Savarese

On datasets of binary propositions derived from the CounterFact dataset, we show that our method, without access to subject labels, performs close to state-of-the-art L&E methods that have access to subject labels.

Language Modelling · Large Language Model · +1

Causal Layering via Conditional Entropy

no code implementations · 19 Jan 2024 · Itai Feigenbaum, Devansh Arpit, Huan Wang, Shelby Heinecke, Juan Carlos Niebles, Weiran Yao, Caiming Xiong, Silvio Savarese

Under appropriate assumptions and conditioning, we can separate the sources or sinks from the remainder of the nodes by comparing their conditional entropy to the unconditional entropy of their noise.

Causal Discovery
