Why Are Bootstrapped Deep Ensembles Not Better?

Ensemble methods have consistently reached state of the art across predictive, uncertainty, and out-of-distribution robustness benchmarks. One of the most popular ways to construct an ensemble is to independently train each model on a resampled (bootstrapped) version of the dataset. Bootstrapping is popular in the literature on decision trees and frequentist statistics, with strong theoretical guarantees, but it is rarely used in practice for deep neural networks. We investigate a common hypothesis for bootstrap's weak performance—the percentage of unique points in the subsampled dataset—and find that even when adjusting for it, bootstrap ensembles of deep neural networks yield no benefit over simpler baselines. This calls into question the role of data randomization as a source of uncertainty in deep learning.
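The resampling step the abstract refers to can be illustrated with a minimal sketch (not the paper's code; the dataset size and seed are arbitrary). Sampling n points with replacement leaves roughly 63.2% of the original points unique on average (1 − 1/e), which is the quantity the hypothesis above adjusts for:

```python
import numpy as np

def bootstrap_indices(n: int, rng: np.random.Generator) -> np.ndarray:
    # Draw n indices with replacement from range(n); each ensemble
    # member would be trained on data[bootstrap_indices(n, rng)].
    return rng.integers(0, n, size=n)

rng = np.random.default_rng(0)
n = 10_000
idx = bootstrap_indices(n, rng)

# Fraction of distinct original points in one bootstrap sample;
# in expectation this approaches 1 - 1/e ≈ 0.632 for large n.
unique_frac = len(np.unique(idx)) / n
```

An ensemble would repeat this draw independently per member, so each network sees a different ~63% subset (with duplicates) of the training data.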


