1 code implementation • 8 Jul 2024 • Katie Everett, Lechao Xiao, Mitchell Wortsman, Alexander A. Alemi, Roman Novak, Peter J. Liu, Izzeddin Gur, Jascha Sohl-Dickstein, Leslie Pack Kaelbling, Jaehoon Lee, Jeffrey Pennington
Robust and effective scaling of models from small to large width typically requires the precise adjustment of many algorithmic and architectural details, such as parameterization and optimizer choices.
no code implementations • 27 Jul 2023 • Inbar Seroussi, Alexander A. Alemi, Moritz Helias, Zohar Ringel
State-of-the-art neural networks require extreme computational power to train.
no code implementations • 14 Jul 2023 • Alexander A. Alemi, Ben Poole
In this paper, we present variational prediction, a technique for directly learning a variational approximation to the posterior predictive distribution using a variational bound.
no code implementations • 18 Nov 2022 • Yangjun Ruan, Saurabh Singh, Warren Morningstar, Alexander A. Alemi, Sergey Ioffe, Ian Fischer, Joshua V. Dillon
Ensembling has proven to be a powerful technique for boosting model performance, uncertainty estimation, and robustness in supervised learning.
no code implementations • 15 Feb 2022 • Yuqing Du, Daniel Ho, Alexander A. Alemi, Eric Jang, Mohi Khansari
In this work we investigate and demonstrate benefits of a Bayesian approach to imitation learning from multiple sensor inputs, as applied to the task of opening office doors with a mobile manipulator.
no code implementations • ICML Workshop AML 2021 • Iryna Korshunova, David Stutz, Alexander A. Alemi, Olivia Wiles, Sven Gowal
We study the adversarial robustness of information bottleneck models for classification.
2 code implementations • NeurIPS 2021 • Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, Andrew Gordon Wilson
Knowledge distillation is a popular technique for training a small student network to emulate a larger teacher model, such as an ensemble of networks.
no code implementations • 19 Oct 2020 • Warren R. Morningstar, Alexander A. Alemi, Joshua V. Dillon
The Bayesian posterior minimizes the "inferential risk" which itself bounds the "predictive risk".
no code implementations • 16 Jun 2020 • Warren R. Morningstar, Cusuh Ham, Andrew G. Gallagher, Balaji Lakshminarayanan, Alexander A. Alemi, Joshua V. Dillon
Drawing on the statistical physics notion of "density of states," the DoSE decision rule avoids direct comparison of model probabilities, and instead utilizes the "probability of the model probability," or indeed the frequency of any reasonable statistic.
1 code implementation • 13 Feb 2020 • Ian Fischer, Alexander A. Alemi
We demonstrate that the Conditional Entropy Bottleneck (CEB) can improve model robustness.
3 code implementations • ICLR 2020 • Roman Novak, Lechao Xiao, Jiri Hron, Jaehoon Lee, Alexander A. Alemi, Jascha Sohl-Dickstein, Samuel S. Schoenholz
Neural Tangents is a library designed to enable research into infinite-width neural networks.
1 code implementation • AABI Symposium 2019 • Ravid Shwartz-Ziv, Alexander A. Alemi
In this preliminary work, we study the generalization properties of infinite ensembles of infinitely-wide neural networks.
no code implementations • AABI Symposium 2019 • Alexander A. Alemi
In classic papers, Zellner demonstrated that Bayesian inference could be derived as the solution to an information theoretic functional.
no code implementations • 21 Oct 2019 • Zhe Dong, Deniz Oktay, Ben Poole, Alexander A. Alemi
Certain biological neurons demonstrate a remarkable capability to optimally compress the history of sensory inputs while being maximally informative about the future.
no code implementations • 25 Sep 2019 • Zhe Dong, Deniz Oktay, Ben Poole, Alexander A. Alemi
Certain biological neurons demonstrate a remarkable capability to optimally compress the history of sensory inputs while being maximally informative about the future.
3 code implementations • 16 May 2019 • Ben Poole, Sherjil Ozair, Aaron van den Oord, Alexander A. Alemi, George Tucker
Estimating and optimizing Mutual Information (MI) is core to many problems in machine learning; however, bounding MI in high dimensions is challenging.
1 code implementation • 30 Apr 2019 • Colin B. Clement, Matthew Bierbaum, Kevin P. O'Keeffe, Alexander A. Alemi
We use this pipeline to extract and analyze a 6.7 million edge citation graph, with an 11 billion word corpus of full-text research articles.
no code implementations • 6 Dec 2018 • Emily Fertig, Aryan Arbabi, Alexander A. Alemi
In this paper, we investigate the degree to which the encoding of a $\beta$-VAE captures label information across multiple architectures on Binary Static MNIST and Omniglot.
1 code implementation • 2 Oct 2018 • Hyunsun Choi, Eric Jang, Alexander A. Alemi
Machine learning models encounter Out-of-Distribution (OoD) errors when the data seen at test time are generated from a different stochastic generator than the one used to generate the training data.
no code implementations • 27 Sep 2018 • Alexander A. Alemi, Ian Fischer
In this work we offer an information-theoretic framework for representation learning that connects with a wide class of existing objectives in machine learning.
no code implementations • 11 Jul 2018 • Alexander A. Alemi, Ian Fischer
In this work we offer a framework for reasoning about a wide class of existing objectives in machine learning.
no code implementations • 2 Jul 2018 • Alexander A. Alemi, Ian Fischer, Joshua V. Dillon
We present a simple case study, demonstrating that Variational Information Bottleneck (VIB) can improve a network's classification calibration as well as its ability to detect out-of-distribution data.
1 code implementation • NeurIPS 2018 • Alexander A. Alemi, Ian Fischer
We propose a simple, tractable lower bound on the mutual information contained in the joint generative density of any latent variable generative model: the GILBO (Generative Information Lower BOund).
1 code implementation • ICML 2018 • Alexander A. Alemi, Ben Poole, Ian Fischer, Joshua V. Dillon, Rif A. Saurous, Kevin Murphy
Recent work in unsupervised representation learning has focused on learning deep directed latent-variable models.
no code implementations • 25 May 2017 • Lorien X. Hayden, Alexander A. Alemi, Paul H. Ginsparg, James P. Sethna
Neural networks have been shown to have a remarkable ability to uncover low dimensional structure in data: the space of possible reconstructed images forms a reduced model manifold in image space.
no code implementations • 8 Dec 2016 • Ben Poole, Alexander A. Alemi, Jascha Sohl-Dickstein, Anelia Angelova
We present a framework to understand GAN training as alternating density ratio estimation and approximate divergence minimization.
9 code implementations • 1 Dec 2016 • Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy
We present a variational approximation to the information bottleneck of Tishby et al. (1999).
no code implementations • 25 May 2015 • J. Massey Cashore, Xiaoting Zhao, Alexander A. Alemi, Yujia Liu, Peter I. Frazier
Much of the data being created on the web contains interactions between users and items.
1 code implementation • 20 Mar 2015 • Lorien X. Hayden, Ricky Chachra, Alexander A. Alemi, Paul H. Ginsparg, James P. Sethna
A classification of companies into sectors of the economy is important for macroeconomic analysis and for investments into the sector-specific financial indices and exchange traded funds (ETFs).
1 code implementation • 18 Mar 2015 • Alexander A. Alemi, Paul Ginsparg
We explore the use of semantic word embeddings in text segmentation algorithms, including the C99 segmentation algorithm and new algorithms inspired by the distributed word vector representation.