no code implementations • 19 Jan 2024 • Itai Feigenbaum, Devansh Arpit, Huan Wang, Shelby Heinecke, Juan Carlos Niebles, Weiran Yao, Caiming Xiong, Silvio Savarese
Under appropriate assumptions and conditioning, we can separate the sources or sinks from the remainder of the nodes by comparing their conditional entropy to the unconditional entropy of their noise.
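The separation criterion in this excerpt can be illustrated with a small sketch. Below is a hypothetical toy version under strong simplifying assumptions not stated in the excerpt (a linear-Gaussian model with known noise variances); the paper's actual estimator and conditioning scheme may differ.

```python
# Toy sketch: flag nodes whose conditional entropy (given all other
# nodes) matches their known noise entropy. Assumes a linear-Gaussian
# model with known noise variances; both are assumptions of this sketch.
import numpy as np

def gaussian_entropy(var):
    """Differential entropy of a 1-D Gaussian with variance `var`."""
    return 0.5 * np.log(2 * np.pi * np.e * var)

def flag_entropy_matches(X, noise_vars, tol=0.1):
    """X: (n_samples, n_nodes) data array; noise_vars: known variances."""
    n, d = X.shape
    flags = []
    for i in range(d):
        # Regress node i on all other nodes; under the linear-Gaussian
        # assumption, the residual variance gives its conditional entropy.
        A = np.column_stack([np.delete(X, i, axis=1), np.ones(n)])
        coef, *_ = np.linalg.lstsq(A, X[:, i], rcond=None)
        resid_var = np.var(X[:, i] - A @ coef)
        flags.append(abs(gaussian_entropy(resid_var)
                         - gaussian_entropy(noise_vars[i])) < tol)
    return flags
```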
no code implementations • 15 Jan 2024 • Itai Feigenbaum, Devansh Arpit, Huan Wang, Shelby Heinecke, Juan Carlos Niebles, Weiran Yao, Caiming Xiong, Silvio Savarese
On datasets of binary propositions derived from the CounterFact dataset, we show that our method, without access to subject labels, performs close to state-of-the-art locate-and-edit (L&E) methods that do have access to subject labels.
2 code implementations • 11 Aug 2023 • Zhiwei Liu, Weiran Yao, JianGuo Zhang, Le Xue, Shelby Heinecke, Rithesh Murthy, Yihao Feng, Zeyuan Chen, Juan Carlos Niebles, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese
The massive successes of large language models (LLMs) encourage the emerging exploration of LLM-augmented Autonomous Agents (LAAs).
1 code implementation • 4 Aug 2023 • Weiran Yao, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Yihao Feng, Le Xue, Rithesh Murthy, Zeyuan Chen, JianGuo Zhang, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese
This demonstrates that using policy gradient optimization to improve language agents, for which we believe our work is one of the first, is promising, and that the approach can be applied to optimize other models in the agent architecture to improve agent performance over time.
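For readers unfamiliar with the optimization the excerpt refers to, here is a generic REINFORCE-style policy-gradient step in PyTorch. It sketches the general technique only; it is not the paper's agent-training pipeline, and the function name and hyperparameters are illustrative.

```python
# Generic REINFORCE update: scale each action's log-probability by the
# (normalized) discounted return and ascend the resulting objective.
import torch

def reinforce_step(optimizer, log_probs, rewards, gamma=0.99):
    """log_probs: list of log pi(a_t | s_t) tensors; rewards: floats."""
    returns, G = [], 0.0
    for r in reversed(rewards):        # discounted returns, back to front
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # baseline
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```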
no code implementations • 18 Jul 2023 • Rithesh Murthy, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Le Xue, Weiran Yao, Yihao Feng, Zeyuan Chen, Akash Gokul, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese
In this paper, we propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX.
no code implementations • 10 Mar 2023 • Itai Feigenbaum, Huan Wang, Shelby Heinecke, Juan Carlos Niebles, Weiran Yao, Caiming Xiong, Devansh Arpit
We then provide an analytic average-case analysis of the PC Algorithm for causal discovery, as well as of a variant of the SGS Algorithm that we call UniformSGS.
1 code implementation • 25 Jan 2023 • Devansh Arpit, Matthew Fernandez, Itai Feigenbaum, Weiran Yao, Chenghao Liu, Wenzhuo Yang, Paul Josel, Shelby Heinecke, Eric Hu, Huan Wang, Steven Hoi, Caiming Xiong, Kun Zhang, Juan Carlos Niebles
Finally, we provide a user interface (UI) that allows users to perform causal analysis on data without coding.
1 code implementation • 21 Oct 2021 • Devansh Arpit, Huan Wang, Yingbo Zhou, Caiming Xiong
We first show that this chaotic behavior exists even along the training optimization trajectory of a single model, and propose a simple model averaging protocol that both significantly boosts domain generalization and diminishes the impact of stochasticity. It does so by improving the rank correlation between in-domain validation accuracy and out-of-domain test accuracy, which is crucial for reliable early stopping.
Ranked #5 on Domain Generalization on TerraIncognita
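The model averaging protocol mentioned above reduces, at its simplest, to a running mean of the weights visited along one training run. Below is a minimal sketch of that idea; the paper's exact averaging window and schedule may differ.

```python
# Running average of model parameters along a single training trajectory.
import copy
import torch

@torch.no_grad()
def update_average(avg_model, model, step):
    """Incremental mean: avg <- avg + (w - avg) / (step + 1)."""
    for p_avg, p in zip(avg_model.parameters(), model.parameters()):
        p_avg.add_(p - p_avg, alpha=1.0 / (step + 1))

# Usage: avg_model = copy.deepcopy(model), then call
# update_average(avg_model, model, step) after each optimizer step,
# and evaluate avg_model instead of model.
```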
no code implementations • 19 Oct 2021 • Devansh Arpit, Aadyot Bhatnagar, Huan Wang, Caiming Xiong
Wasserstein autoencoder (WAE) shows that matching two distributions is equivalent to minimizing a simple autoencoder (AE) loss under the constraint that the latent space of this AE matches a pre-specified prior distribution.
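As a rough illustration of the objective described above, here is a minimal WAE-style loss: an AE reconstruction term plus a penalty that matches the aggregate latent distribution to a fixed Gaussian prior. The RBF-MMD penalty and the coefficient `lam` are common choices assumed for this sketch, not necessarily those of the paper.

```python
# Minimal WAE-style objective: reconstruction + latent-prior matching.
import torch

def rbf_mmd(x, y, sigma=1.0):
    """RBF-kernel maximum mean discrepancy between two sample batches."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def wae_loss(encoder, decoder, batch, lam=10.0):
    z = encoder(batch)
    recon = torch.nn.functional.mse_loss(decoder(z), batch)
    prior = torch.randn_like(z)           # pre-specified prior: N(0, I)
    return recon + lam * rbf_mmd(z, prior)
```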
no code implementations • 19 Oct 2021 • Bram Wallace, Devansh Arpit, Huan Wang, Caiming Xiong
Pretraining convolutional neural networks via self-supervision, and applying them in transfer learning, is a fast-growing field that is rapidly improving performance across practically all image domains.
2 code implementations • 20 Sep 2021 • Aadyot Bhatnagar, Paul Kassianik, Chenghao Liu, Tian Lan, Wenzhuo Yang, Rowan Cassius, Doyen Sahoo, Devansh Arpit, Sri Subramanian, Gerald Woo, Amrita Saha, Arun Kumar Jagota, Gokulakrishnan Gopalakrishnan, Manpreet Singh, K C Krithika, Sukumar Maddineni, Daeki Cho, Bo Zong, Yingbo Zhou, Caiming Xiong, Silvio Savarese, Steven Hoi, Huan Wang
We introduce Merlion, an open-source machine learning library for time series.
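Merlion's documented quickstart for anomaly detection is short; a sketch follows. Module paths and defaults are taken from the library's documentation but may vary across versions, and `train_df`/`test_df` are assumed to be pandas DataFrames indexed by timestamp.

```python
# Sketch of Merlion's anomaly-detection quickstart (paths may vary by version).
from merlion.utils import TimeSeries
from merlion.models.defaults import DefaultDetectorConfig, DefaultDetector

train = TimeSeries.from_pd(train_df)   # train_df, test_df assumed given
test = TimeSeries.from_pd(test_df)

model = DefaultDetector(DefaultDetectorConfig())
model.train(train_data=train)
anomaly_labels = model.get_anomaly_label(time_series=test)
```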
no code implementations • 1 Jan 2021 • Devansh Arpit, Aadyot Bhatnagar, Huan Wang, Caiming Xiong
Quantitatively, we show that our algorithm achieves a new state-of-the-art FID of 54.36 on CIFAR-10, and performs competitively with existing models on CelebA in terms of FID score.
no code implementations • 1 Jan 2021 • Devansh Arpit, Huan Wang, Caiming Xiong, Richard Socher, Yoshua Bengio
Disjoint Manifold Separation: Neural Bayes allows us to formulate an objective which can optimally label samples from disjoint manifolds present in the support of a continuous distribution.
no code implementations • 28 Dec 2020 • Stanislaw Jastrzebski, Devansh Arpit, Oliver Astrand, Giancarlo Kerg, Huan Wang, Caiming Xiong, Richard Socher, Kyunghyun Cho, Krzysztof Geras
The early phase of training a deep neural network has a dramatic effect on the local curvature of the loss function.
no code implementations • ICLR 2020 • Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof Geras
We argue for the existence of the "break-even" point on this trajectory, beyond which the curvature of the loss surface and noise in the gradient are implicitly regularized by SGD.
1 code implementation • 20 Feb 2020 • Devansh Arpit, Huan Wang, Caiming Xiong, Richard Socher, Yoshua Bengio
Disjoint Manifold Labeling: Neural Bayes allows us to formulate an objective which can optimally label samples from disjoint manifolds present in the support of a continuous distribution.
2 code implementations • 1 Oct 2019 • Devansh Arpit, Caiming Xiong, Richard Socher
In this paper, we consider distribution shift to be a test-time shift in the distribution of input features that exhibit low correlation with the targets in the training set.
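A toy construction makes this notion of shift concrete: one feature remains predictive at test time, while a second feature correlates with the target only in the training set. Names and sizes below are illustrative.

```python
# Construct a train/test pair exhibiting the input-feature shift above.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, n)

core = y + 0.5 * rng.standard_normal(n)            # truly predictive feature
spurious_train = y + 0.1 * rng.standard_normal(n)  # correlated at train time
spurious_test = rng.standard_normal(n)             # decorrelated at test time

X_train = np.column_stack([core, spurious_train])
X_test = np.column_stack([core, spurious_test])    # shifted inputs, same task
```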
no code implementations • 25 Sep 2019 • Devansh Arpit, Caiming Xiong, Richard Socher
This allows deep networks trained with Entropy Penalty to generalize well even under distribution shift of spurious features.
2 code implementations • NeurIPS 2019 • Devansh Arpit, Victor Campos, Yoshua Bengio
Finally, we show that using our initialization in conjunction with learning rate warmup is able to reduce the gap between the performance of weight normalized and batch normalized networks.
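Learning rate warmup of the kind mentioned above can be as simple as a linear ramp; the sketch below assumes a linear schedule, which need not match the paper's exact choice.

```python
# Linear learning-rate warmup: ramp from 0 to base_lr, then hold.
def warmup_lr(step, base_lr, warmup_steps=1000):
    return base_lr * min(1.0, (step + 1) / warmup_steps)
```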
no code implementations • ICLR 2019 • Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio
The non-convex loss landscape of deep neural networks (DNNs) suggests an intuition that, over the course of training, stochastic optimization algorithms explore different regions of the loss surface by entering and escaping many local minima due to the noise induced by mini-batches.
1 code implementation • 11 Jan 2019 • Devansh Arpit, Yoshua Bengio
These results are derived using the PAC analysis framework, and hold true for finitely sized datasets such that the width of the ReLU network only needs to be larger than a certain finite lower bound.
1 code implementation • ICLR 2019 • Devansh Arpit, Bhargav Kanuparthi, Giancarlo Kerg, Nan Rosemary Ke, Ioannis Mitliagkas, Yoshua Bengio
This problem becomes more evident in tasks where the information needed to solve them correctly exists over long time scales, because the exploding and vanishing gradient problem (EVGP) prevents important gradient components from being back-propagated adequately over a large number of steps.
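A small PyTorch experiment makes the vanishing half of EVGP visible: the gradient of a final-step loss with respect to the initial hidden state of a vanilla RNN is typically tiny for long sequences. Sizes and the choice of loss are arbitrary.

```python
# Measure how much gradient a vanilla RNN propagates back over T steps.
import torch

T, d = 50, 32
rnn = torch.nn.RNN(input_size=d, hidden_size=d)
x = torch.randn(T, 1, d)
h0 = torch.zeros(1, 1, d, requires_grad=True)

out, _ = rnn(x, h0)
loss = out[-1].pow(2).sum()                # loss depends on the final step only
grad_h0 = torch.autograd.grad(loss, h0)[0]
print(grad_h0.norm())                      # typically tiny for large T
```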
2 code implementations • ICLR 2019 • Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, Yoshua Bengio, Aaron Courville
Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with 100% accuracy.
no code implementations • 24 Feb 2018 • Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio
Based on this and other metrics, we deduce that for most of the training update steps, SGD moves in valley-like regions of the loss surface by jumping from one valley wall to another at a height above the valley floor.
no code implementations • ICLR 2018 • Samira Shabanian, Devansh Arpit, Adam Trischler, Yoshua Bengio
Bidirectional LSTMs (Bi-LSTMs), on the other hand, model sequences along both the forward and backward directions and are generally known to perform better at such tasks because they capture a richer representation of the data.
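In PyTorch, for instance, a bidirectional LSTM is a one-flag change, and the per-step output concatenates the forward and backward hidden states; this concatenation is the richer representation the excerpt refers to.

```python
# Bidirectional LSTM: output at each step stacks both directions.
import torch

lstm = torch.nn.LSTM(input_size=16, hidden_size=32, bidirectional=True)
x = torch.randn(10, 4, 16)          # (seq_len, batch, features)
out, _ = lstm(x)
print(out.shape)                    # (10, 4, 64): forward + backward states
```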
no code implementations • ICLR 2018 • Stanisław Jastrzębski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey
In particular, we find that the ratio of learning rate to batch size is a key determinant of SGD dynamics and of the width of the final minima, and that higher values of the ratio lead to wider minima and often better generalization.
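A standard heuristic behind this finding views SGD as a discretized stochastic differential equation; the informal sketch below is a textbook approximation, not a derivation from the paper itself.

```latex
% Informal SDE view of SGD: with learning rate \eta, batch size B, and
% per-example gradient-noise covariance C(\theta), one step behaves like
\theta_{t+1} = \theta_t - \eta \nabla L(\theta_t) + \eta\,\epsilon_t,
\qquad \epsilon_t \sim \mathcal{N}\!\left(0, \tfrac{1}{B}\, C(\theta_t)\right),
% so the injected noise per unit "time" \eta scales with \eta / B, and the
% dynamics depend on the two hyperparameters mainly through that ratio.
```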
1 code implementation • ICLR 2018 • Konrad Zolna, Devansh Arpit, Dendi Suhubdy, Yoshua Bengio
We show that our regularization term is upper bounded by the expectation-linear dropout objective, which has been shown to address the gap arising from the difference between the training and inference phases of dropout.
Ranked #28 on Language Modelling on Penn Treebank (Word Level)
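The regularization term the excerpt describes can be sketched as two forward passes with independent dropout masks plus a penalty on the disagreement between their predictions. The coefficient `kappa` and the exact loss weighting are assumptions of this sketch.

```python
# Two-pass dropout regularizer: penalize prediction disagreement between
# two independent dropout masks (model must be in training mode).
import torch

def two_mask_loss(model, x, y, criterion, kappa=0.1):
    p1 = model(x)                    # each call draws a fresh dropout mask
    p2 = model(x)
    task = 0.5 * (criterion(p1, y) + criterion(p2, y))
    return task + kappa * (p1 - p2).pow(2).mean()
```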
no code implementations • ICLR 2018 • Stanisław Jastrzębski, Devansh Arpit, Nicolas Ballas, Vikas Verma, Tong Che, Yoshua Bengio
In general, a ResNet block tends to concentrate representation learning behavior in the first few layers, while higher layers perform iterative refinement of features.
2 code implementations • ICML 2017 • Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, Simon Lacoste-Julien
We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness.
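The canonical experiment behind this line of work is easy to reproduce in miniature: a small network driven to fit purely random labels, which it can only do by memorizing them. The architecture and optimizer settings below are arbitrary.

```python
# Fit random labels with a small MLP; the loss approaching zero shows
# the network has memorized noise rather than learned a pattern.
import torch

X = torch.randn(256, 32)
y = torch.randint(0, 2, (256,))          # labels carry no signal

net = torch.nn.Sequential(
    torch.nn.Linear(32, 256), torch.nn.ReLU(), torch.nn.Linear(256, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for _ in range(2000):
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(net(X), y)
    loss.backward()
    opt.step()
print(loss.item())                       # approaches zero
```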
no code implementations • ICLR 2018 • Devansh Arpit, Yingbo Zhou, Hung Q. Ngo, Nils Napp, Venu Govindaraju
Auto-Encoders are unsupervised models that aim to learn patterns from observed data by minimizing a reconstruction cost.
no code implementations • 4 Mar 2016 • Devansh Arpit, Yingbo Zhou, Bhargava U. Kota, Venu Govindaraju
While the authors of Batch Normalization (BN) identify and address an important problem involved in training deep networks, Internal Covariate Shift, the current solution has certain drawbacks.
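For context, the BN transform the excerpt refers to normalizes each feature by its mini-batch statistics and then rescales with learned parameters; a NumPy version of the standard formula follows.

```python
# Standard batch-normalization transform over a mini-batch.
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """x: (batch, features); gamma, beta: learned per-feature parameters."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta
```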
no code implementations • 21 May 2015 • Devansh Arpit, Yingbo Zhou, Hung Ngo, Venu Govindaraju
While the authors of Batch Normalization (BN) identify and address an important problem involved in training deep networks, Internal Covariate Shift, the current solution has certain drawbacks.
no code implementations • NeurIPS 2014 • Devansh Arpit, Ifeoma Nwogu, Venu Govindaraju
Modeling data as being sampled from a union of independent subspaces has been widely applied to a number of real world applications.
no code implementations • 6 May 2014 • Yingbo Zhou, Devansh Arpit, Ifeoma Nwogu, Venu Govindaraju
However, due to the greedy scheme of the layerwise training technique, the parameters of the lower layers are fixed when training the higher layers.
no code implementations • 17 Jan 2014 • Devansh Arpit, Ifeoma Nwogu, Gaurav Srivastava, Venu Govindaraju
With increasing concerns about security, the need for highly secure physical biometrics-based authentication systems utilizing cancelable biometric technologies is on the rise.