no code implementations • 19 Jan 2024 • Itai Feigenbaum, Devansh Arpit, Huan Wang, Shelby Heinecke, Juan Carlos Niebles, Weiran Yao, Caiming Xiong, Silvio Savarese
Under appropriate assumptions and conditioning, we can separate the sources or sinks from the remainder of the nodes by comparing their conditional entropy to the unconditional entropy of their noise.
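The separation criterion in this excerpt can be illustrated with a small sketch. Below is a hypothetical toy version under strong simplifying assumptions not stated in the excerpt (a linear-Gaussian model with known noise variances); the paper's actual estimator and conditioning scheme may differ.

```python
# Toy sketch: flag nodes whose conditional entropy (given all other
# nodes) matches their known noise entropy. Assumes a linear-Gaussian
# model with known noise variances; both are assumptions of this sketch.
import numpy as np

def gaussian_entropy(var):
    """Differential entropy of a 1-D Gaussian with variance `var`."""
    return 0.5 * np.log(2 * np.pi * np.e * var)

def flag_entropy_matches(X, noise_vars, tol=0.1):
    """X: (n_samples, n_nodes) data array; noise_vars: known variances."""
    n, d = X.shape
    flags = []
    for i in range(d):
        # Regress node i on all other nodes; under the linear-Gaussian
        # assumption, the residual variance gives its conditional entropy.
        A = np.column_stack([np.delete(X, i, axis=1), np.ones(n)])
        coef, *_ = np.linalg.lstsq(A, X[:, i], rcond=None)
        resid_var = np.var(X[:, i] - A @ coef)
        flags.append(abs(gaussian_entropy(resid_var)
                         - gaussian_entropy(noise_vars[i])) < tol)
    return flags
```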
no code implementations • 15 Jan 2024 • Itai Feigenbaum, Devansh Arpit, Huan Wang, Shelby Heinecke, Juan Carlos Niebles, Weiran Yao, Caiming Xiong, Silvio Savarese
On datasets of binary propositions derived from the CounterFact dataset, we show that our method, without access to subject labels, performs close to state-of-the-art locate-and-edit (L&E) methods that do have access to subject labels.
2 code implementations • 11 Aug 2023 • Zhiwei Liu, Weiran Yao, JianGuo Zhang, Le Xue, Shelby Heinecke, Rithesh Murthy, Yihao Feng, Zeyuan Chen, Juan Carlos Niebles, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese
The massive successes of large language models (LLMs) encourage the emerging exploration of LLM-augmented Autonomous Agents (LAAs).
1 code implementation • 4 Aug 2023 • Weiran Yao, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Yihao Feng, Le Xue, Rithesh Murthy, Zeyuan Chen, JianGuo Zhang, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese
This demonstrates that using policy gradient optimization to improve language agents, for which we believe our work is one of the first, is promising, and that the approach can be applied to optimize other models in the agent architecture to improve agent performance over time.
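For readers unfamiliar with the optimization the excerpt refers to, here is a generic REINFORCE-style policy-gradient step in PyTorch. It sketches the general technique only; it is not the paper's agent-training pipeline, and the function name and hyperparameters are illustrative.

```python
# Generic REINFORCE update: scale each action's log-probability by the
# (normalized) discounted return and ascend the resulting objective.
import torch

def reinforce_step(optimizer, log_probs, rewards, gamma=0.99):
    """log_probs: list of log pi(a_t | s_t) tensors; rewards: floats."""
    returns, G = [], 0.0
    for r in reversed(rewards):        # discounted returns, back to front
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # baseline
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```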
no code implementations • 18 Jul 2023 • Rithesh Murthy, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Le Xue, Weiran Yao, Yihao Feng, Zeyuan Chen, Akash Gokul, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese
In this paper, we propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX.
no code implementations • 10 Mar 2023 • Itai Feigenbaum, Huan Wang, Shelby Heinecke, Juan Carlos Niebles, Weiran Yao, Caiming Xiong, Devansh Arpit
We then provide an analytic average-case analysis of the PC Algorithm for causal discovery, as well as of a variant of the SGS Algorithm that we call UniformSGS.
1 code implementation • 25 Jan 2023 • Devansh Arpit, Matthew Fernandez, Itai Feigenbaum, Weiran Yao, Chenghao Liu, Wenzhuo Yang, Paul Josel, Shelby Heinecke, Eric Hu, Huan Wang, Steven Hoi, Caiming Xiong, Kun Zhang, Juan Carlos Niebles
Finally, we provide a user interface (UI) that allows users to perform causal analysis on data without coding.
1 code implementation • 21 Oct 2021 • Devansh Arpit, Huan Wang, Yingbo Zhou, Caiming Xiong
We first show that this chaotic behavior exists even along the training optimization trajectory of a single model, and propose a simple model averaging protocol that both significantly boosts domain generalization and diminishes the impact of stochasticity. It does so by improving the rank correlation between in-domain validation accuracy and out-of-domain test accuracy, which is crucial for reliable early stopping.
Ranked #5 on Domain Generalization on TerraIncognita
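The model averaging protocol mentioned above reduces, at its simplest, to a running mean of the weights visited along one training run. Below is a minimal sketch of that idea; the paper's exact averaging window and schedule may differ.

```python
# Running average of model parameters along a single training trajectory.
import copy
import torch

@torch.no_grad()
def update_average(avg_model, model, step):
    """Incremental mean: avg <- avg + (w - avg) / (step + 1)."""
    for p_avg, p in zip(avg_model.parameters(), model.parameters()):
        p_avg.add_(p - p_avg, alpha=1.0 / (step + 1))

# Usage: avg_model = copy.deepcopy(model), then call
# update_average(avg_model, model, step) after each optimizer step,
# and evaluate avg_model instead of model.
```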
no code implementations • 19 Oct 2021 • Devansh Arpit, Aadyot Bhatnagar, Huan Wang, Caiming Xiong
Wasserstein autoencoder (WAE) shows that matching two distributions is equivalent to minimizing a simple autoencoder (AE) loss under the constraint that the latent space of this AE matches a pre-specified prior distribution.
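As a rough illustration of the objective described above, here is a minimal WAE-style loss: an AE reconstruction term plus a penalty that matches the aggregate latent distribution to a fixed Gaussian prior. The RBF-MMD penalty and the coefficient `lam` are common choices assumed for this sketch, not necessarily those of the paper.

```python
# Minimal WAE-style objective: reconstruction + latent-prior matching.
import torch

def rbf_mmd(x, y, sigma=1.0):
    """RBF-kernel maximum mean discrepancy between two sample batches."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def wae_loss(encoder, decoder, batch, lam=10.0):
    z = encoder(batch)
    recon = torch.nn.functional.mse_loss(decoder(z), batch)
    prior = torch.randn_like(z)           # pre-specified prior: N(0, I)
    return recon + lam * rbf_mmd(z, prior)
```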
no code implementations • 19 Oct 2021 • Bram Wallace, Devansh Arpit, Huan Wang, Caiming Xiong
Pretraining convolutional neural networks via self-supervision, and applying them in transfer learning, is a fast-growing field that is rapidly improving performance across practically all image domains.
2 code implementations • 20 Sep 2021 • Aadyot Bhatnagar, Paul Kassianik, Chenghao Liu, Tian Lan, Wenzhuo Yang, Rowan Cassius, Doyen Sahoo, Devansh Arpit, Sri Subramanian, Gerald Woo, Amrita Saha, Arun Kumar Jagota, Gokulakrishnan Gopalakrishnan, Manpreet Singh, K C Krithika, Sukumar Maddineni, Daeki Cho, Bo Zong, Yingbo Zhou, Caiming Xiong, Silvio Savarese, Steven Hoi, Huan Wang
We introduce Merlion, an open-source machine learning library for time series.
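Merlion's documented quickstart for anomaly detection is short; a sketch follows. Module paths and defaults are taken from the library's documentation but may vary across versions, and `train_df`/`test_df` are assumed to be pandas DataFrames indexed by timestamp.

```python
# Sketch of Merlion's anomaly-detection quickstart (paths may vary by version).
from merlion.utils import TimeSeries
from merlion.models.defaults import DefaultDetectorConfig, DefaultDetector

train = TimeSeries.from_pd(train_df)   # train_df, test_df assumed given
test = TimeSeries.from_pd(test_df)

model = DefaultDetector(DefaultDetectorConfig())
model.train(train_data=train)
anomaly_labels = model.get_anomaly_label(time_series=test)
```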
no code implementations • 1 Jan 2021 • Devansh Arpit, Aadyot Bhatnagar, Huan Wang, Caiming Xiong
Quantitatively, we show that our algorithm achieves a new state-of-the-art FID of 54.36 on CIFAR-10, and performs competitively with existing models on CelebA in terms of FID score.
no code implementations • 1 Jan 2021 • Devansh Arpit, Huan Wang, Caiming Xiong, Richard Socher, Yoshua Bengio
Disjoint Manifold Separation: Neural Bayes allows us to formulate an objective which can optimally label samples from disjoint manifolds present in the support of a continuous distribution.
no code implementations • 28 Dec 2020 • Stanislaw Jastrzebski, Devansh Arpit, Oliver Astrand, Giancarlo Kerg, Huan Wang, Caiming Xiong, Richard Socher, Kyunghyun Cho, Krzysztof Geras
The early phase of training a deep neural network has a dramatic effect on the local curvature of the loss function.
no code implementations • ICLR 2020 • Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof Geras
We argue for the existence of the "break-even" point on this trajectory, beyond which the curvature of the loss surface and noise in the gradient are implicitly regularized by SGD.
1 code implementation • 20 Feb 2020 • Devansh Arpit, Huan Wang, Caiming Xiong, Richard Socher, Yoshua Bengio
Disjoint Manifold Labeling: Neural Bayes allows us to formulate an objective which can optimally label samples from disjoint manifolds present in the support of a continuous distribution.
2 code implementations • 1 Oct 2019 • Devansh Arpit, Caiming Xiong, Richard Socher
In this paper, we consider distribution shift to be a test-time shift in the distribution of input features that exhibit low correlation with the targets in the training set.
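A toy construction makes this notion of shift concrete: one feature remains predictive at test time, while a second feature correlates with the target only in the training set. Names and sizes below are illustrative.

```python
# Construct a train/test pair exhibiting the input-feature shift above.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, n)

core = y + 0.5 * rng.standard_normal(n)            # truly predictive feature
spurious_train = y + 0.1 * rng.standard_normal(n)  # correlated at train time
spurious_test = rng.standard_normal(n)             # decorrelated at test time

X_train = np.column_stack([core, spurious_train])
X_test = np.column_stack([core, spurious_test])    # shifted inputs, same task
```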
no code implementations • 25 Sep 2019 • Devansh Arpit, Caiming Xiong, Richard Socher
This allows deep networks trained with Entropy Penalty to generalize well even under distribution shift of spurious features.
2 code implementations • NeurIPS 2019 • Devansh Arpit, Victor Campos, Yoshua Bengio
Finally, we show that using our initialization in conjunction with learning rate warmup is able to reduce the gap between the performance of weight normalized and batch normalized networks.
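Learning rate warmup of the kind mentioned above can be as simple as a linear ramp; the sketch below assumes a linear schedule, which need not match the paper's exact choice.

```python
# Linear learning-rate warmup: ramp from 0 to base_lr, then hold.
def warmup_lr(step, base_lr, warmup_steps=1000):
    return base_lr * min(1.0, (step + 1) / warmup_steps)
```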
no code implementations • ICLR 2019 • Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio
The non-convex loss landscape of deep neural networks (DNNs) suggests an intuition that, over the course of training, stochastic optimization algorithms explore different regions of the loss surface by entering and escaping many local minima due to the noise induced by mini-batches.
1 code implementation • 11 Jan 2019 • Devansh Arpit, Yoshua Bengio
These results are derived using the PAC analysis framework, and hold true for finitely sized datasets such that the width of the ReLU network only needs to be larger than a certain finite lower bound.
1 code implementation • ICLR 2019 • Devansh Arpit, Bhargav Kanuparthi, Giancarlo Kerg, Nan Rosemary Ke, Ioannis Mitliagkas, Yoshua Bengio
This problem becomes more evident in tasks where the information needed to solve them correctly exists over long time scales, because the exploding and vanishing gradient problem (EVGP) prevents important gradient components from being back-propagated adequately over a large number of steps.
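A small PyTorch experiment makes the vanishing half of EVGP visible: the gradient of a final-step loss with respect to the initial hidden state of a vanilla RNN is typically tiny for long sequences. Sizes and the choice of loss are arbitrary.

```python
# Measure how much gradient a vanilla RNN propagates back over T steps.
import torch

T, d = 50, 32
rnn = torch.nn.RNN(input_size=d, hidden_size=d)
x = torch.randn(T, 1, d)
h0 = torch.zeros(1, 1, d, requires_grad=True)

out, _ = rnn(x, h0)
loss = out[-1].pow(2).sum()                # loss depends on the final step only
grad_h0 = torch.autograd.grad(loss, h0)[0]
print(grad_h0.norm())                      # typically tiny for large T
```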
2 code implementations • ICLR 2019 • Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, Yoshua Bengio, Aaron Courville
Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with 100% accuracy.
no code implementations • 24 Feb 2018 • Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio
Based on this and other metrics, we deduce that for most of the training update steps, SGD moves in valley-like regions of the loss surface by jumping from one valley wall to another at a height above the valley floor.
no code implementations • ICLR 2018 • Samira Shabanian, Devansh Arpit, Adam Trischler, Yoshua Bengio
Bidirectional LSTMs (Bi-LSTMs), on the other hand, model sequences along both the forward and backward directions and are generally known to perform better at such tasks because they capture a richer representation of the data.
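In PyTorch, for instance, a bidirectional LSTM is a one-flag change, and the per-step output concatenates the forward and backward hidden states; this concatenation is the richer representation the excerpt refers to.

```python
# Bidirectional LSTM: output at each step stacks both directions.
import torch

lstm = torch.nn.LSTM(input_size=16, hidden_size=32, bidirectional=True)
x = torch.randn(10, 4, 16)          # (seq_len, batch, features)
out, _ = lstm(x)
print(out.shape)                    # (10, 4, 64): forward + backward states
```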
no code implementations • ICLR 2018 • Stanisław Jastrzębski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey
In particular, we find that the ratio of learning rate to batch size is a key determinant of SGD dynamics and of the width of the final minima, and that higher values of the ratio lead to wider minima and often better generalization.
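A standard heuristic behind this finding views SGD as a discretized stochastic differential equation; the informal sketch below is a textbook approximation, not a derivation from the paper itself.

```latex
% Informal SDE view of SGD: with learning rate \eta, batch size B, and
% per-example gradient-noise covariance C(\theta), one step behaves like
\theta_{t+1} = \theta_t - \eta \nabla L(\theta_t) + \eta\,\epsilon_t,
\qquad \epsilon_t \sim \mathcal{N}\!\left(0, \tfrac{1}{B}\, C(\theta_t)\right),
% so the injected noise per unit "time" \eta scales with \eta / B, and the
% dynamics depend on the two hyperparameters mainly through that ratio.
```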
1 code implementation • ICLR 2018 • Konrad Zolna, Devansh Arpit, Dendi Suhubdy, Yoshua Bengio
We show that our regularization term is upper bounded by the expectation-linear dropout objective, which has been shown to address the gap arising from the difference between the training and inference phases of dropout.
Ranked #28 on Language Modelling on Penn Treebank (Word Level)
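The regularization term the excerpt describes can be sketched as two forward passes with independent dropout masks plus a penalty on the disagreement between their predictions. The coefficient `kappa` and the exact loss weighting are assumptions of this sketch.

```python
# Two-pass dropout regularizer: penalize prediction disagreement between
# two independent dropout masks (model must be in training mode).
import torch

def two_mask_loss(model, x, y, criterion, kappa=0.1):
    p1 = model(x)                    # each call draws a fresh dropout mask
    p2 = model(x)
    task = 0.5 * (criterion(p1, y) + criterion(p2, y))
    return task + kappa * (p1 - p2).pow(2).mean()
```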
no code implementations • ICLR 2018 • Stanisław Jastrzębski, Devansh Arpit, Nicolas Ballas, Vikas Verma, Tong Che, Yoshua Bengio
In general, a ResNet block tends to concentrate representation learning behavior in the first few layers, while higher layers perform iterative refinement of features.
2 code implementations • ICML 2017 • Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, Simon Lacoste-Julien
We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness.
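The canonical experiment behind this line of work is easy to reproduce in miniature: a small network driven to fit purely random labels, which it can only do by memorizing them. The architecture and optimizer settings below are arbitrary.

```python
# Fit random labels with a small MLP; the loss approaching zero shows
# the network has memorized noise rather than learned a pattern.
import torch

X = torch.randn(256, 32)
y = torch.randint(0, 2, (256,))          # labels carry no signal

net = torch.nn.Sequential(
    torch.nn.Linear(32, 256), torch.nn.ReLU(), torch.nn.Linear(256, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for _ in range(2000):
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(net(X), y)
    loss.backward()
    opt.step()
print(loss.item())                       # approaches zero
```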
no code implementations • ICLR 2018 • Devansh Arpit, Yingbo Zhou, Hung Q. Ngo, Nils Napp, Venu Govindaraju
Auto-Encoders are unsupervised models that aim to learn patterns from observed data by minimizing a reconstruction cost.
no code implementations • 4 Mar 2016 • Devansh Arpit, Yingbo Zhou, Bhargava U. Kota, Venu Govindaraju
While the authors of Batch Normalization (BN) identify and address an important problem involved in training deep networks, Internal Covariate Shift, the current solution has certain drawbacks.
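For context, the BN transform the excerpt refers to normalizes each feature by its mini-batch statistics and then rescales with learned parameters; a NumPy version of the standard formula follows.

```python
# Standard batch-normalization transform over a mini-batch.
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """x: (batch, features); gamma, beta: learned per-feature parameters."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta
```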
no code implementations • 21 May 2015 • Devansh Arpit, Yingbo Zhou, Hung Ngo, Venu Govindaraju
While the authors of Batch Normalization (BN) identify and address an important problem involved in training deep networks, Internal Covariate Shift, the current solution has certain drawbacks.
no code implementations • NeurIPS 2014 • Devansh Arpit, Ifeoma Nwogu, Venu Govindaraju
Modeling data as being sampled from a union of independent subspaces has been widely applied to a number of real world applications.
no code implementations • 6 May 2014 • Yingbo Zhou, Devansh Arpit, Ifeoma Nwogu, Venu Govindaraju
However, due to the greedy scheme of the layerwise training technique, the parameters of the lower layers are fixed when training the higher layers.
no code implementations • 17 Jan 2014 • Devansh Arpit, Ifeoma Nwogu, Gaurav Srivastava, Venu Govindaraju
With increasing concerns about security, the need for highly secure physical biometrics-based authentication systems utilizing cancelable biometric technologies is on the rise.