Open-sampling: Re-balancing Long-tailed Datasets with Out-of-Distribution Data

29 Sep 2021 · Hongxin Wei, Lue Tao, Renchunzi Xie, Lei Feng, Bo An

Deep neural networks usually perform poorly when the training dataset suffers from extreme class imbalance. To handle this issue, popular re-sampling methods generally require in-distribution data to balance the class priors. However, obtaining suitable in-distribution data with precise labels for the selected classes is challenging. In this paper, we theoretically show, from a Bayesian perspective, that out-of-distribution data (i.e., open-set samples) can be leveraged to augment the minority classes. Based on this motivation, we propose a novel method called Open-sampling, which utilizes open-set noisy labels to re-balance the class priors of the training dataset. For each open-set instance, the label is sampled from a pre-defined distribution that is complementary to the original class priors. Furthermore, class-dependent weights are generated to impose stronger regularization on the minority classes than on the majority classes. We empirically show that Open-sampling not only re-balances the class priors but also encourages the neural network to learn separable representations. Extensive experiments on benchmark datasets demonstrate that our proposed method significantly outperforms existing data re-balancing methods and can be easily incorporated into state-of-the-art methods to further enhance their performance.
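To make the re-balancing step concrete, here is a minimal NumPy sketch of the idea described in the abstract: open-set instances receive pseudo-labels drawn from a distribution complementary to the long-tailed class priors, and class-dependent weights emphasize the minority classes. The specific complementary distribution (proportional to each class's gap to the largest class count), the inverse-prior weighting, and all counts below are illustrative assumptions, not the exact definitions from the paper.

```python
import numpy as np

# Hypothetical class counts of a long-tailed training set (10 classes).
class_counts = np.array([5000, 2997, 1796, 1077, 645, 387, 232, 139, 83, 50])
num_classes = len(class_counts)

# Original class priors p_k.
priors = class_counts / class_counts.sum()

# Complementary distribution q_k: one simple choice (an assumption for
# illustration) is proportional to the gap between each class count and the
# largest class count, so that open-set samples labelled from q push the
# combined label distribution toward uniform.
gaps = class_counts.max() - class_counts
complementary = gaps / gaps.sum()

# Assign a pseudo-label to each open-set (out-of-distribution) instance by
# sampling from the complementary distribution.
num_open_set = 20000
rng = np.random.default_rng(0)
open_set_labels = rng.choice(num_classes, size=num_open_set, p=complementary)

# Class-dependent weights: assumed here to be inversely proportional to the
# original priors, so the regularization from open-set samples is stronger
# on the minority classes than on the majority classes.
weights = 1.0 / priors
weights = weights / weights.sum() * num_classes

combined = class_counts + np.bincount(open_set_labels, minlength=num_classes)
print("combined label counts:", combined)
print("class weights:", np.round(weights, 2))
```

In an actual training loop, the sampled pseudo-labels and weights would enter an auxiliary loss term on the open-set batch; the snippet above only illustrates how the complementary distribution flattens the combined label counts.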
