Search Results for author: Dustin Tran

Found 52 papers, 27 papers with code

Larger language models do in-context learning differently

no code implementations 7 Mar 2023 Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma

We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e.g., foo/bar instead of negative/positive), thereby forcing language models to learn the input-label mappings shown in in-context exemplars in order to perform the task.
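
As a concrete illustration of the setup described above, here is a minimal sketch of how a SUL-ICL prompt can be assembled; the exemplars, label tokens, and formatting are invented for illustration and are not taken from the paper.

```python
# Hypothetical SUL-ICL prompt builder: target labels (negative/positive)
# are remapped to semantically unrelated tokens (foo/bar), so the model
# must infer the input-label mapping from the exemplars alone.
label_map = {"negative": "foo", "positive": "bar"}

exemplars = [
    ("The movie was a waste of time.", "negative"),
    ("A delightful, moving story.", "positive"),
]

def build_sul_icl_prompt(exemplars, query):
    blocks = [f"Input: {text}\nLabel: {label_map[label]}"
              for text, label in exemplars]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)

print(build_sul_icl_prompt(exemplars, "I loved every minute of it."))
```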

A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models

no code implementations 13 Feb 2023 James Urquhart Allingham, Jie Ren, Michael W Dusenberry, Xiuye Gu, Yin Cui, Dustin Tran, Jeremiah Zhe Liu, Balaji Lakshminarayanan

In particular, we ask "Given a large pool of prompts, can we automatically score the prompts and ensemble those that are most suitable for a particular downstream dataset, without needing access to labeled validation data?"

Prompt Engineering Zero-Shot Learning
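
A hedged sketch of what prompt ensembling looks like in a CLIP-style text-image model: embed several prompt templates per class, weight them, and classify by similarity. The `encode_text` stand-in and the weights are placeholders; the paper's actual unsupervised scoring rule is not reproduced here.

```python
import numpy as np

def encode_text(prompt):
    # Stand-in for a real text encoder: a pseudo-random unit vector
    # derived from the prompt string.
    rng = np.random.default_rng(abs(hash(prompt)) % (2 ** 32))
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)

prompts = ["a photo of a {}.", "a blurry photo of a {}.", "art of a {}."]
weights = np.array([0.5, 0.2, 0.3])  # assumed per-prompt scores, summing to 1

def class_embedding(name):
    embs = np.stack([encode_text(p.format(name)) for p in prompts])
    e = (weights[:, None] * embs).sum(axis=0)  # weighted prompt ensemble
    return e / np.linalg.norm(e)

image_emb = encode_text("stand-in for an image embedding")
classes = ["cat", "dog"]
scores = [float(image_emb @ class_embedding(c)) for c in classes]
print(classes[int(np.argmax(scores))])
```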

A Simple Approach to Improve Single-Model Deep Uncertainty via Distance-Awareness

2 code implementations 1 May 2022 Jeremiah Zhe Liu, Shreyas Padhy, Jie Ren, Zi Lin, Yeming Wen, Ghassen Jerfel, Zack Nado, Jasper Snoek, Dustin Tran, Balaji Lakshminarayanan

The most popular approaches to estimate predictive uncertainty in deep learning are methods that combine predictions from multiple neural networks, such as Bayesian neural networks (BNNs) and deep ensembles.

Data Augmentation Probabilistic Deep Learning
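
For context on the multi-network baseline the snippet mentions, here is a minimal sketch of how a deep ensemble forms its prediction by averaging member softmax outputs; the member logits below are random placeholders rather than trained networks.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
member_logits = rng.normal(size=(4, 8, 3))   # 4 members, 8 inputs, 3 classes
probs = softmax(member_logits).mean(axis=0)  # average member probabilities
confidence = probs.max(axis=-1)              # predictive confidence per input
print(probs.shape, confidence.round(2))
```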

Sparse MoEs meet Efficient Ensembles

1 code implementation 7 Oct 2021 James Urquhart Allingham, Florian Wenzel, Zelda E Mariet, Basil Mustafa, Joan Puigcerver, Neil Houlsby, Ghassen Jerfel, Vincent Fortuin, Balaji Lakshminarayanan, Jasper Snoek, Dustin Tran, Carlos Riquelme Ruiz, Rodolphe Jenatton

Machine learning models based on the aggregated outputs of submodels, either at the activation or prediction levels, often exhibit strong performance compared to individual models.

Few-Shot Learning

Deep Classifiers with Label Noise Modeling and Distance Awareness

no code implementations 6 Oct 2021 Vincent Fortuin, Mark Collier, Florian Wenzel, James Allingham, Jeremiah Liu, Dustin Tran, Balaji Lakshminarayanan, Jesse Berent, Rodolphe Jenatton, Effrosyni Kokiopoulou

Uncertainty estimation in deep learning has recently emerged as a crucial area of interest to advance reliability and robustness in safety-critical applications.

Out-of-Distribution Detection

Soft Calibration Objectives for Neural Networks

no code implementations NeurIPS 2021 Archit Karandikar, Nicholas Cain, Dustin Tran, Balaji Lakshminarayanan, Jonathon Shlens, Michael C. Mozer, Becca Roelofs

When incorporated into training, these soft calibration losses achieve state-of-the-art single-model ECE across multiple datasets with less than 1% decrease in accuracy.

Decision Making
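
A hedged sketch of the idea: replace ECE's hard confidence bins with differentiable soft assignments so that the miscalibration term can serve as a secondary training loss. The Gaussian-kernel bin membership and temperature below are assumptions for illustration, not the paper's exact objective.

```python
import numpy as np

def soft_binned_ece(confidences, accuracies, n_bins=10, temperature=0.01):
    centers = (np.arange(n_bins) + 0.5) / n_bins
    # Soft membership of each prediction in each bin (rows sum to 1).
    logits = -(confidences[:, None] - centers[None, :]) ** 2 / temperature
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)
    mass = w.sum(axis=0) + 1e-12                       # soft bin counts
    bin_conf = (w * confidences[:, None]).sum(axis=0) / mass
    bin_acc = (w * accuracies[:, None]).sum(axis=0) / mass
    return np.sum(mass / mass.sum() * np.abs(bin_acc - bin_conf))

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=1000)
acc = (rng.uniform(size=1000) < conf - 0.1).astype(float)  # overconfident model
print(round(soft_binned_ece(conf, acc), 3))
```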

RecSim NG: Toward Principled Uncertainty Modeling for Recommender Ecosystems

1 code implementation 14 Mar 2021 Martin Mladenov, Chih-Wei Hsu, Vihan Jain, Eugene Ie, Christopher Colby, Nicolas Mayoraz, Hubert Pham, Dustin Tran, Ivan Vendrov, Craig Boutilier

The development of recommender systems that optimize multi-turn interaction with users, and model the interactions of different agents (e.g., users, content providers, vendors) in the recommender ecosystem, has drawn increasing attention in recent years.

Counterfactual Probabilistic Programming +1

Distilling Ensembles Improves Uncertainty Estimates

no code implementations AABI Symposium 2021 Zelda E Mariet, Rodolphe Jenatton, Florian Wenzel, Dustin Tran

We seek to bridge the performance gap between batch ensembles (ensembles of deep networks with shared parameters) and deep ensembles on tasks which require not only predictions, but also uncertainty estimates for these predictions.

Why Are Bootstrapped Deep Ensembles Not Better?

no code implementations NeurIPS Workshop ICBINB 2020 Jeremy Nixon, Balaji Lakshminarayanan, Dustin Tran

Ensemble methods have consistently reached state of the art across predictive, uncertainty, and out-of-distribution robustness benchmarks.

Combining Ensembles and Data Augmentation can Harm your Calibration

no code implementations ICLR 2021 Yeming Wen, Ghassen Jerfel, Rafael Muller, Michael W. Dusenberry, Jasper Snoek, Balaji Lakshminarayanan, Dustin Tran

Ensemble methods which average over multiple neural network predictions are a simple approach to improve a model's calibration and robustness.

Data Augmentation

Training independent subnetworks for robust prediction

2 code implementations ICLR 2021 Marton Havasi, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew M. Dai, Dustin Tran

Recent approaches to efficiently ensemble neural networks have shown that strong robustness and uncertainty performance can be achieved with a negligible gain in parameters over the original network.
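
The mechanism proposed there is the multi-input multi-output (MIMO) configuration: one network ingests M inputs concatenated together and emits M predictions, so M nearly independent subnetworks fit inside a single forward pass. A sketch with a random linear map standing in for the network:

```python
import numpy as np

M, d, n_classes = 3, 16, 5
rng = np.random.default_rng(0)
W = rng.normal(size=(M * d, M * n_classes)) / np.sqrt(M * d)

def mimo_forward(xs):                  # xs: (M, d), one input per subnetwork
    out = np.concatenate(xs) @ W       # a single shared forward pass
    return out.reshape(M, n_classes)   # one output head per subnetwork

# Training pairs each head with its own example; at test time, tile one
# input M times and average the heads for an ensemble-like prediction.
x = rng.normal(size=d)
heads = mimo_forward(np.tile(x, (M, 1)))
prediction = heads.mean(axis=0)
print(prediction.shape)
```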

Hyperparameter Ensembles for Robustness and Uncertainty Quantification

3 code implementations NeurIPS 2020 Florian Wenzel, Jasper Snoek, Dustin Tran, Rodolphe Jenatton

Ensembles over neural network weights trained from different random initialization, known as deep ensembles, achieve state-of-the-art accuracy and calibration.

Image Classification

Simple and Principled Uncertainty Estimation with Deterministic Deep Learning via Distance Awareness

3 code implementations NeurIPS 2020 Jeremiah Zhe Liu, Zi Lin, Shreyas Padhy, Dustin Tran, Tania Bedrax-Weiss, Balaji Lakshminarayanan

Bayesian neural networks (BNN) and deep ensembles are principled approaches to estimate the predictive uncertainty of a deep learning model.

Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors

1 code implementation ICML 2020 Michael W. Dusenberry, Ghassen Jerfel, Yeming Wen, Yi-An Ma, Jasper Snoek, Katherine Heller, Balaji Lakshminarayanan, Dustin Tran

Bayesian neural networks (BNNs) demonstrate promising success in improving the robustness and uncertainty quantification of modern deep learning.

On the Discrepancy between Density Estimation and Sequence Generation

1 code implementation EMNLP (spnlp) 2020 Jason Lee, Dustin Tran, Orhan Firat, Kyunghyun Cho

In this paper, by comparing several density estimators on five machine translation tasks, we find that the correlation between rankings of models based on log-likelihood and BLEU varies significantly depending on the range of the model families being compared.

Density Estimation Machine Translation +2

BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning

5 code implementations ICLR 2020 Yeming Wen, Dustin Tran, Jimmy Ba

We also apply BatchEnsemble to lifelong learning, where on Split-CIFAR-100, BatchEnsemble yields comparable performance to progressive neural networks while having much lower computational and memory costs.

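As a sketch of the construction, a BatchEnsemble dense layer shares one weight matrix W across members and derives member i via cheap rank-1 factors, W_i = W ∘ (r_i s_i^T); shapes and initialization below are illustrative, not the paper's training recipe.

```python
import numpy as np

rng = np.random.default_rng(0)
n_members, d_in, d_out = 4, 8, 3
W = rng.normal(size=(d_in, d_out))                    # shared "slow" weights
r = rng.choice([-1.0, 1.0], size=(n_members, d_in))   # per-member fast weights
s = rng.choice([-1.0, 1.0], size=(n_members, d_out))

def batch_ensemble_dense(x, m):
    # Equivalent to x @ (W * np.outer(r[m], s[m])), but never materializes
    # a separate weight matrix per member.
    return ((x * r[m]) @ W) * s[m]

x = rng.normal(size=d_in)
outputs = np.stack([batch_ensemble_dense(x, m) for m in range(n_members)])
print(outputs.mean(axis=0))                           # ensemble-averaged output
```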

Refining the variational posterior through iterative optimization

no code implementations25 Sep 2019 Marton Havasi, Jasper Snoek, Dustin Tran, Jonathan Gordon, José Miguel Hernández-Lobato

Variational inference (VI) is a popular approach for approximate Bayesian inference that is particularly promising for highly parameterized models such as deep neural networks.

Bayesian Inference Variational Inference

Analyzing the Role of Model Uncertainty for Electronic Health Records

1 code implementation 10 Jun 2019 Michael W. Dusenberry, Dustin Tran, Edward Choi, Jonas Kemp, Jeremy Nixon, Ghassen Jerfel, Katherine Heller, Andrew M. Dai

We further show that RNNs with only Bayesian embeddings can be a more efficient way to capture model uncertainty compared to ensembles, and we analyze how model uncertainty is impacted across individual input features and patient subgroups.

Discrete Flows: Invertible Generative Models of Discrete Data

2 code implementations NeurIPS 2019 Dustin Tran, Keyon Vafa, Kumar Krishna Agrawal, Laurent Dinh, Ben Poole

While normalizing flows have led to significant advances in modeling high-dimensional continuous distributions, their applicability to discrete distributions remains unknown.

Language Modelling
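
The invertible building block for discrete data is arithmetic modulo the vocabulary size, e.g. y = (x + μ) mod K, which has an exact inverse and, unlike continuous flows, no Jacobian term. In the paper the shift (and an optional scale) comes from a network over conditioning variables; it is fixed here for illustration.

```python
import numpy as np

K = 5                                   # vocabulary size
mu = np.array([2, 4, 1, 0])             # per-position shift (stand-in for a net)

def forward(x):
    return (x + mu) % K

def inverse(y):
    return (y - mu) % K

x = np.array([3, 1, 4, 2])
y = forward(x)
assert np.array_equal(inverse(y), x)    # exactly invertible
print(y)
```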

Reliable Uncertainty Estimates in Deep Neural Networks using Noise Contrastive Priors

no code implementations ICLR 2019 Danijar Hafner, Dustin Tran, Timothy Lillicrap, Alex Irpan, James Davidson

NCPs are compatible with any model that can output uncertainty estimates, are easy to scale, and yield reliable uncertainty estimates throughout training.

Active Learning
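
A hedged sketch of the NCP recipe: perturb training inputs with noise to obtain pseudo out-of-distribution points, then add a loss pulling the model's predictive distribution at those points toward a wide "output prior". The loss form and scales below are assumptions for illustration.

```python
import numpy as np

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    # KL divergence between two univariate Gaussians, KL(q || p).
    return 0.5 * (np.log(var_p / var_q)
                  + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def ncp_penalty(predict, x_batch, input_noise=0.3, prior_var=1.0):
    rng = np.random.default_rng(0)
    x_ood = x_batch + rng.normal(0.0, input_noise, size=x_batch.shape)
    mu, var = predict(x_ood)          # model's predictive mean and variance
    # Pull predictions at perturbed inputs toward a zero-mean, wide prior.
    return gaussian_kl(mu, var, 0.0, prior_var).mean()

predict = lambda x: (0.9 * x.sum(axis=-1), np.full(x.shape[0], 0.05))
x = np.random.default_rng(1).normal(size=(32, 4))
print(round(ncp_penalty(predict, x), 3))
```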

Measuring Calibration in Deep Learning

2 code implementations 2 Apr 2019 Jeremy Nixon, Mike Dusenberry, Ghassen Jerfel, Timothy Nguyen, Jeremiah Liu, Linchuan Zhang, Dustin Tran

In this paper, we perform a comprehensive empirical study of choices in calibration measures including measuring all probabilities rather than just the maximum prediction, thresholding probability values, class conditionality, number of bins, bins that are adaptive to the datapoint density, and the norm used to compare accuracies to confidences.
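
The study's starting point is the expected calibration error (ECE); a compact version showing two of the choices the paper varies, equal-width bins versus bins adaptive to the datapoint density (equal-mass):

```python
import numpy as np

def ece(confidences, accuracies, n_bins=15, adaptive=False):
    if adaptive:   # equal-mass bins: each bin holds ~the same number of points
        edges = np.quantile(confidences, np.linspace(0.0, 1.0, n_bins + 1))
    else:          # equal-width bins on [0, 1]
        edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.searchsorted(edges, confidences, side="right") - 1,
                  0, n_bins - 1)
    total, n = 0.0, len(confidences)
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            gap = abs(accuracies[mask].mean() - confidences[mask].mean())
            total += mask.sum() / n * gap
    return total

rng = np.random.default_rng(0)
conf = rng.uniform(0.4, 1.0, size=2000)
acc = (rng.uniform(size=2000) < conf - 0.05).astype(float)
print(round(ece(conf, acc), 3), round(ece(conf, acc, adaptive=True), 3))
```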

NeuTra-lizing Bad Geometry in Hamiltonian Monte Carlo Using Neural Transport

1 code implementation 9 Mar 2019 Matthew Hoffman, Pavel Sountsov, Joshua V. Dillon, Ian Langmore, Dustin Tran, Srinivas Vasudevan

Hamiltonian Monte Carlo is a powerful algorithm for sampling from difficult-to-normalize posterior distributions.

Variational Inference
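
The transformation at the heart of NeuTra is a standard change of variables: fit a flow z = f(ε) so that the pullback of the posterior is approximately standard normal, then run HMC on the better-conditioned target

$$\log \tilde{p}(\epsilon) = \log p\big(f(\epsilon) \mid x\big) + \log \left| \det \frac{\partial f(\epsilon)}{\partial \epsilon} \right|, \qquad z = f(\epsilon),$$

and map the resulting samples back through f. The flow is typically fit by variational inference before sampling, hence the tag above.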

Autoconj: Recognizing and Exploiting Conjugacy Without a Domain-Specific Language

2 code implementations NeurIPS 2018 Matthew D. Hoffman, Matthew J. Johnson, Dustin Tran

Deriving conditional and marginal distributions using conjugacy relationships can be time consuming and error prone.

Simple, Distributed, and Accelerated Probabilistic Programming

1 code implementation NeurIPS 2018 Dustin Tran, Matthew Hoffman, Dave Moore, Christopher Suter, Srinivas Vasudevan, Alexey Radul, Matthew Johnson, Rif A. Saurous

For both a state-of-the-art VAE on 64x64 ImageNet and Image Transformer on 256x256 CelebA-HQ, our approach achieves an optimal linear speedup from 1 to 256 TPUv2 chips.

Probabilistic Programming

Noise Contrastive Priors for Functional Uncertainty

2 code implementations ICLR 2019 Danijar Hafner, Dustin Tran, Timothy Lillicrap, Alex Irpan, James Davidson

NCPs are compatible with any model that can output uncertainty estimates, are easy to scale, and yield reliable uncertainty estimates throughout training.

Active Learning

Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches

3 code implementations ICLR 2018 Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran, Roger Grosse

Stochastic neural net weights are used in a variety of contexts, including regularization, Bayesian neural nets, exploration in reinforcement learning, and evolution strategies.
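
The identity that makes Flipout cheap: with one shared weight perturbation ΔW and per-example random sign vectors r_n, s_n, the per-example perturbed product (ΔW ∘ r_n s_n^T) x_n can be computed with two elementwise multiplies and no per-example weight matrices. A sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, d_in, d_out = 32, 10, 4
W_mean = rng.normal(size=(d_out, d_in))
delta_W = rng.normal(scale=0.1, size=(d_out, d_in))  # one shared sample

X = rng.normal(size=(batch, d_in))
R = rng.choice([-1.0, 1.0], size=(batch, d_out))     # per-example signs
S = rng.choice([-1.0, 1.0], size=(batch, d_in))

# Pseudo-independent perturbations without per-example weight matrices:
perturbation = ((X * S) @ delta_W.T) * R
Y = X @ W_mean.T + perturbation
print(Y.shape)
```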

Image Transformer

no code implementations 15 Feb 2018 Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Łukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran

Image generation has been successfully cast as an autoregressive sequence generation or transformation problem.

Ranked #6 on Image Generation on ImageNet 32x32 (bpd metric)

Image Generation Image Super-Resolution

Generative Models for Alignment and Data Efficiency in Language

no code implementations ICLR 2018 Dustin Tran, Yura Burda, Ilya Sutskever

We examine how learning from unaligned data can improve both the data efficiency of supervised tasks as well as enable alignments without any supervision.

Decipherment Translation +1

Variational Inference via $\chi$-Upper Bound Minimization

no code implementations NeurIPS 2017 Adji Bousso Dieng, Dustin Tran, Rajesh Ranganath, John Paisley, David Blei

In this paper we propose CHIVI, a black-box variational inference algorithm that minimizes $D_{\chi}(p || q)$, the $\chi$-divergence from $p$ to $q$.

Variational Inference
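
For reference, the upper bound CHIVI minimizes (the CUBO): by Jensen's inequality, for n > 1,

$$\mathrm{CUBO}_n(q) = \frac{1}{n} \log \mathbb{E}_{q(z)}\!\left[\left(\frac{p(x, z)}{q(z)}\right)^{n}\right] \ge \log p(x),$$

so minimizing it tightens an upper bound on the log evidence, complementing the ELBO from below; the case n = 2 corresponds to minimizing the $\chi^2$-divergence.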

TensorFlow Distributions

9 code implementations 28 Nov 2017 Joshua V. Dillon, Ian Langmore, Dustin Tran, Eugene Brevdo, Srinivas Vasudevan, Dave Moore, Brian Patton, Alex Alemi, Matt Hoffman, Rif A. Saurous

The TensorFlow Distributions library implements a vision of probability theory adapted to the modern deep-learning paradigm of end-to-end differentiable computation.

Probabilistic Programming
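
A brief usage sketch, assuming the current tensorflow_probability package (the successor home of this library): distributions are first-class objects supporting sampling and log densities, and bijectors build new distributions from old ones.

```python
import tensorflow_probability as tfp

tfd, tfb = tfp.distributions, tfp.bijectors

normal = tfd.Normal(loc=0.0, scale=1.0)
samples = normal.sample(3)              # draw samples as tensors
log_probs = normal.log_prob(samples)    # exact log densities

# A log-normal built by pushing the normal through an Exp bijector:
log_normal = tfd.TransformedDistribution(distribution=normal,
                                         bijector=tfb.Exp())
print(log_normal.log_prob(1.5))
```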

Implicit Causal Models for Genome-wide Association Studies

no code implementations ICLR 2018 Dustin Tran, David M. Blei

For the first, we describe implicit causal models, a class of causal models that leverages neural architectures with an implicit density.

Bayesian Inference

Deep Probabilistic Programming

no code implementations 13 Jan 2017 Dustin Tran, Matthew D. Hoffman, Rif A. Saurous, Eugene Brevdo, Kevin Murphy, David M. Blei

By treating inference as a first-class citizen, on a par with modeling, we show that probabilistic programming can be as flexible and computationally efficient as traditional deep learning.

Probabilistic Programming Variational Inference

Variational Inference via $\chi$-Upper Bound Minimization

no code implementations 1 Nov 2016 Adji B. Dieng, Dustin Tran, Rajesh Ranganath, John Paisley, David M. Blei

In this paper we propose CHIVI, a black-box variational inference algorithm that minimizes $D_{\chi}(p || q)$, the $\chi$-divergence from $p$ to $q$.

Variational Inference

Operator Variational Inference

no code implementations NeurIPS 2016 Rajesh Ranganath, Jaan Altosaar, Dustin Tran, David M. Blei

Though this divergence has been widely used, the resultant posterior approximation can suffer from undesirable statistical properties.

Bayesian Inference Variational Inference

Spectral M-estimation with Applications to Hidden Markov Models

no code implementations 29 Mar 2016 Dustin Tran, Minjae Kim, Finale Doshi-Velez

Method-of-moments estimators exhibit appealing statistical properties, such as asymptotic unbiasedness, for nonconvex problems.

The Variational Gaussian Process

no code implementations 20 Nov 2015 Dustin Tran, Rajesh Ranganath, David M. Blei

Variational inference is a powerful tool for approximate inference, and it has been recently applied for representation learning with deep generative models.

Representation Learning Variational Inference

Hierarchical Variational Models

1 code implementation 7 Nov 2015 Rajesh Ranganath, Dustin Tran, David M. Blei

We study HVMs on a variety of deep discrete latent variable models.

Variational Inference

Stochastic gradient descent methods for estimation with large data sets

1 code implementation 22 Sep 2015 Dustin Tran, Panos Toulis, Edoardo M. Airoldi

When the update is based on a noisy gradient, the stochastic approximation is known as standard stochastic gradient descent, which has been fundamental in modern applications with large data sets.

Copula variational inference

no code implementations NeurIPS 2015 Dustin Tran, David M. Blei, Edoardo M. Airoldi

We develop a general variational inference method that preserves dependency among the latent variables.

Stochastic Optimization Variational Inference

Towards stability and optimality in stochastic gradient descent

no code implementations 10 May 2015 Panos Toulis, Dustin Tran, Edoardo M. Airoldi

For statistical efficiency, AI-SGD employs averaging of the iterates, which achieves the optimal Cramér-Rao bound under strong convexity, i.e., it is an optimal unbiased estimator of the true parameter value.
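
A sketch of the iterate averaging referred to above (Polyak-Ruppert style) on a toy linear regression; plain explicit SGD is used here in place of the paper's implicit update, purely to illustrate the averaging step.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 3
theta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, d))
y = X @ theta_true + rng.normal(scale=0.1, size=n)

theta = np.zeros(d)
theta_bar = np.zeros(d)                         # running average of iterates
for t in range(n):
    lr = 0.5 / (t + 1) ** 0.6                   # slowly decaying step size
    grad = (X[t] @ theta - y[t]) * X[t]         # single-example gradient
    theta -= lr * grad
    theta_bar += (theta - theta_bar) / (t + 1)

print(np.round(theta_bar, 2))                   # close to theta_true
```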

Expectation propagation as a way of life: A framework for Bayesian inference on partitioned data

2 code implementations 16 Dec 2014 Aki Vehtari, Andrew Gelman, Tuomas Sivula, Pasi Jylänki, Dustin Tran, Swupnil Sahai, Paul Blomstedt, John P. Cunningham, David Schiminovich, Christian Robert

A common divide-and-conquer approach for Bayesian computation with big data is to partition the data, perform local inference for each piece separately, and combine the results to obtain a global posterior approximation.

Bayesian Inference
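
A hedged sketch of the combine step for partitioned inference under Gaussian approximations: if each worker's subposterior includes the full prior, multiplying the subposterior densities and dividing out the K−1 extra prior copies reduces to adding natural parameters. This illustrates the divide-and-conquer pattern in the snippet, not the paper's full EP algorithm.

```python
def to_natural(mean, var):
    return mean / var, 1.0 / var        # (precision-adjusted mean, precision)

def from_natural(eta1, eta2):
    return eta1 / eta2, 1.0 / eta2

prior = (0.0, 10.0)                     # prior mean and variance
subposteriors = [(1.2, 0.5), (0.8, 0.7), (1.0, 0.4)]  # per-partition (mean, var)
K = len(subposteriors)

p1, p2 = to_natural(*prior)
eta1 = sum(to_natural(m, v)[0] for m, v in subposteriors) - (K - 1) * p1
eta2 = sum(to_natural(m, v)[1] for m, v in subposteriors) - (K - 1) * p2
print(from_natural(eta1, eta2))         # combined global mean and variance
```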

Convex Techniques for Model Selection

no code implementations 27 Nov 2014 Dustin Tran

We develop a robust convex algorithm to select the regularization parameter in model selection.

Model Selection Regression +1
