MixtureEnsembles: Leveraging Parameter Sharing for Efficient Ensembles

29 Sep 2021 · Piotr Teterwak, Nikoli Dryden, Dina Bashkirova, Kate Saenko, Bryan A. Plummer

Ensembles are a very effective way of increasing both the robustness and accuracy of a learning system. Yet they are memory and compute intensive; in a naive ensemble, $n$ networks are trained independently and $n$ networks must be stored. Recently, BatchEnsemble \citep{wen2020batchensemble} and MIMO \citep{havasi2020training} have significantly decreased the memory footprint while achieving classification performance that approaches that of a naive ensemble. We improve on these methods with MixtureEnsembles, which learn to factorize ensemble members with shared parameters by constructing each layer as a linear combination of templates. Each ensemble member is then defined by a different set of linear combination weights. By modulating the number of templates available, MixtureEnsembles are uniquely flexible and allow easy scaling between the low-parameter and high-parameter regimes. In the low-parameter regime, MixtureEnsembles outperform BatchEnsemble on both ImageNet and CIFAR, and are competitive with MIMO. In the high-parameter regime, MixtureEnsembles outperform all baselines on CIFAR and ImageNet. This flexibility allows users to control the precise performance-memory cost trade-off without making any changes to the backbone architecture. When we additionally tune the backbone architecture width, we outperform all baselines in the low-parameter regime with the same inference FLOP footprint.
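
The template-mixing mechanism can be illustrated with a minimal PyTorch sketch. Names such as `TemplateLinear`, `num_templates`, and `num_members` are illustrative assumptions, not the authors' implementation: each layer holds a shared bank of weight templates, and each ensemble member materializes its own weights as a learned linear combination of that bank.

```python
# Minimal sketch, assuming a PyTorch-style layer; names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemplateLinear(nn.Module):
    def __init__(self, in_features, out_features, num_templates, num_members):
        super().__init__()
        # Shared bank of weight templates: (T, out_features, in_features)
        self.templates = nn.Parameter(
            torch.randn(num_templates, out_features, in_features) * 0.01
        )
        # Per-member mixing coefficients over the templates: (M, T)
        self.mix = nn.Parameter(torch.randn(num_members, num_templates) * 0.01)
        # Per-member bias: (M, out_features)
        self.bias = nn.Parameter(torch.zeros(num_members, out_features))

    def forward(self, x, member):
        # Combine the shared templates into this member's weight matrix.
        w = torch.einsum('t,toi->oi', self.mix[member], self.templates)
        return F.linear(x, w, self.bias[member])

# Usage: two members share the template bank but mix it differently.
layer = TemplateLinear(in_features=8, out_features=4, num_templates=3, num_members=2)
x = torch.randn(5, 8)
y0 = layer(x, member=0)
y1 = layer(x, member=1)
```

Because only the small mixing vectors and biases are per-member, the memory cost grows with the number of templates rather than the number of ensemble members, which is what allows scaling between the low-parameter and high-parameter regimes.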
