MixtureEnsembles: Leveraging Parameter Sharing for Efficient Ensembles

29 Sep 2021 · Piotr Teterwak, Nikoli Dryden, Dina Bashkirova, Kate Saenko, Bryan A. Plummer

Ensembles are a very effective way of increasing both the robustness and accuracy of a learning system. Yet they are memory and compute intensive; in a naive ensemble, $n$ networks are trained independently and $n$ networks must be stored. Recently, BatchEnsemble \citep{wen2020batchensemble} and MIMO \citep{havasi2020training} have significantly decreased the memory footprint while achieving classification performance that approaches that of a naive ensemble. We improve on these methods with MixtureEnsembles, which learn to factorize ensemble members with shared parameters by constructing each layer as a linear combination of templates. Each ensemble member is then defined by a different set of linear combination weights. By modulating the number of templates available, MixtureEnsembles are uniquely flexible and allow easy scaling between the low-parameter and high-parameter regimes. In the low-parameter regime, MixtureEnsembles outperform BatchEnsemble on both ImageNet and CIFAR, and are competitive with MIMO. In the high-parameter regime, MixtureEnsembles outperform all baselines on CIFAR and ImageNet. This flexibility allows users to control the precise performance-memory cost trade-off without making any changes to the backbone architecture. When we additionally tune the backbone architecture width, we outperform all baselines in the low-parameter regime with the same inference FLOP footprint.
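
The template-mixing mechanism can be illustrated with a minimal PyTorch sketch. Names such as `TemplateLinear`, `num_templates`, and `num_members` are illustrative assumptions, not the authors' implementation: each layer holds a shared bank of weight templates, and each ensemble member materializes its own weights as a learned linear combination of that bank.

```python
# Minimal sketch, assuming a PyTorch-style layer; names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemplateLinear(nn.Module):
    def __init__(self, in_features, out_features, num_templates, num_members):
        super().__init__()
        # Shared bank of weight templates: (T, out_features, in_features)
        self.templates = nn.Parameter(
            torch.randn(num_templates, out_features, in_features) * 0.01
        )
        # Per-member mixing coefficients over the templates: (M, T)
        self.mix = nn.Parameter(torch.randn(num_members, num_templates) * 0.01)
        # Per-member bias: (M, out_features)
        self.bias = nn.Parameter(torch.zeros(num_members, out_features))

    def forward(self, x, member):
        # Combine the shared templates into this member's weight matrix.
        w = torch.einsum('t,toi->oi', self.mix[member], self.templates)
        return F.linear(x, w, self.bias[member])

# Usage: two members share the template bank but mix it differently.
layer = TemplateLinear(in_features=8, out_features=4, num_templates=3, num_members=2)
x = torch.randn(5, 8)
y0 = layer(x, member=0)
y1 = layer(x, member=1)
```

Because only the small mixing vectors and biases are per-member, the memory cost grows with the number of templates rather than the number of ensemble members, which is what allows scaling between the low-parameter and high-parameter regimes.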
