Generalizing MLPs With Dropouts, Batch Normalization, and Skip Connections

18 Aug 2021  ·  Taewoon Kim

A multilayer perceptron (MLP) is typically made of multiple fully connected layers with nonlinear activation functions. Several approaches have been proposed to improve MLPs (e.g., faster convergence or a better convergence limit), but the research lacks structured ways to test them. We test different MLP architectures through experiments on the age and gender datasets. We empirically show that, by whitening the inputs before every linear layer and adding skip connections, our proposed MLP architecture can achieve better performance. Since the whitening process includes dropout, it can also be used to approximate Bayesian inference. We have open sourced our code, and released models and Docker images at
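The building block described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' released code. It assumes that "whitening" means batch normalization followed by dropout, applied before the linear layer, and that the skip connection adds the block input to the activated output. The names `ic_layer`, `residual_block`, and `mc_dropout_predict` are hypothetical, and the normalization here uses plain batch statistics with no learnable scale or shift.

```python
import numpy as np


def ic_layer(x, drop_prob, rng, eps=1e-5):
    """Normalize each feature over the batch, then apply inverted dropout.

    Assumption: this stands in for the abstract's "whitening" step.
    """
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Keep each unit with probability 1 - drop_prob and rescale the survivors.
    keep = rng.random(x.shape) >= drop_prob
    return x_hat * keep / (1.0 - drop_prob)


def residual_block(x, w, b, drop_prob, rng):
    """Whitening -> linear -> ReLU, with the block input added back."""
    h = ic_layer(x, drop_prob, rng) @ w + b
    return x + np.maximum(h, 0.0)


def mc_dropout_predict(x, w, b, drop_prob, rng, n_samples=50):
    """Keep dropout active and average several stochastic forward passes.

    The spread across passes serves as a rough uncertainty estimate.
    """
    samples = np.stack(
        [residual_block(x, w, b, drop_prob, rng) for _ in range(n_samples)]
    )
    return samples.mean(axis=0), samples.std(axis=0)


# Toy usage with made-up shapes: a batch of 8 examples with 4 features each.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
w = rng.normal(size=(4, 4)) * 0.1  # square weights so the skip shapes match
b = np.zeros(4)
mean, std = mc_dropout_predict(x, w, b, drop_prob=0.2, rng=rng)
print(mean.shape, std.shape)
```

The weight matrix is kept square so that the input and output of the block can be added directly; a block that changes width would need a projection on the skip path.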



Results from the Paper

Ranked #2 on Age And Gender Classification on Adience Gender (using extra training data)

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Age And Gender Classification | Adience Age | RetinaFace + ArcFace + MLP + IC + Skip connections | Accuracy (5-fold) | 60.86 | #7 |
| Age And Gender Classification | Adience Gender | RetinaFace + ArcFace + MLP + Skip connections | Accuracy (5-fold) | 90.66 | #2 |