Mish: A Self Regularized Non-Monotonic Activation Function

We propose $\textit{Mish}$, a novel self-regularized non-monotonic activation function which can be mathematically defined as $f(x) = x\tanh(\operatorname{softplus}(x)) = x\tanh(\ln(1 + e^{x}))$. As activation functions play a crucial role in the performance and training dynamics of neural networks, we validated Mish experimentally on several well-known benchmarks against the best combinations of architectures and activation functions...
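The definition above maps directly to a few lines of code. Below is a minimal NumPy sketch (not the authors' reference implementation) of the Mish activation, using a numerically stable softplus:

```python
import numpy as np

def mish(x):
    # Mish: f(x) = x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x).
    # np.logaddexp(0, x) computes ln(1 + e^x) without overflow for large x.
    return x * np.tanh(np.logaddexp(0.0, x))

# Example: Mish is non-monotonic, dipping slightly below zero for small
# negative inputs before approaching zero, and behaving like the identity
# for large positive inputs.
x = np.linspace(-5.0, 5.0, 11)
print(np.round(mish(x), 4))
```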


Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Image Classification | CIFAR-10 | ResNet v2-20 (Mish activation) | Percentage correct | 92.02 | # 62 |
| Image Classification | CIFAR-10 | ResNet 9 + Mish | Percentage correct | 94.05 | # 50 |
| Image Classification | CIFAR-100 | ResNet v2-110 (Mish activation) | Percentage correct | 74.41 | # 45 |
| Object Detection | COCO test-dev | CSP-p6 + Mish (single scale) | box AP | 54.3 | # 6 |
| | | | AP50 | 72.3 | # 7 |
| | | | AP75 | 59.5 | # 6 |
| | | | APS | 36.6 | # 5 |
| | | | APM | 58.2 | # 4 |
| | | | APL | 65.5 | # 9 |
| Object Detection | COCO test-dev | CSP-p6 + Mish (multi-scale) | box AP | 54.9 | # 4 |
| | | | AP50 | 72.6 | # 5 |
| | | | AP75 | 60.2 | # 3 |
| | | | APS | 37.4 | # 3 |
| | | | APM | 58.8 | # 3 |
| | | | APL | 66.7 | # 6 |
| Image Classification | ImageNet | CSPResNeXt-50 + Mish | Top 1 Accuracy | 79.8% | # 70 |
| | | | Top 5 Accuracy | 95.2% | # 38 |

Methods used in the Paper


| Method | Type |
|---|---|
| Tanh Activation | Activation Functions |
| Softplus | Activation Functions |
| SimpleNet | Convolutional Neural Networks |
| ResNeXt Block | Skip Connection Blocks |
| Channel Shuffle | Miscellaneous Components |
| Concatenated Skip Connection | Skip Connections |
| Depthwise Convolution | Convolutions |
| Pointwise Convolution | Convolutions |
| Xavier Initialization | Initialization |
| Fire Module | Image Model Blocks |
| SqueezeNet | Convolutional Neural Networks |
| MobileNetV1 | Convolutional Neural Networks |
| ShuffleNet V2 Downsampling Block | Image Model Blocks |
| ShuffleNet V2 Block | Image Model Blocks |
| ShuffleNet v2 | Convolutional Neural Networks |
| Dense Block | Image Model Blocks |
| DenseNet | Convolutional Neural Networks |
| Softmax | Output Functions |
| Depthwise Separable Convolution | Convolutions |
| Xception | Convolutional Neural Networks |
| Average Pooling | Pooling Operations |
| Sigmoid Activation | Activation Functions |
| Dense Connections | Feedforward Networks |
| Squeeze-and-Excitation Block | Image Model Blocks |
| Wide Residual Block | Skip Connection Blocks |
| WideResNet | Image Models |
| Mixup | Image Data Augmentation |
| Cosine Annealing | Learning Rate Schedules |
| Dropout | Regularization |
| NADAM | Stochastic Optimization |
| L1 Regularization | Regularization |
| Weight Decay | Regularization |
| Grouped Convolution | Convolutions |
| Bottleneck Residual Block | Skip Connection Blocks |
| Global Average Pooling | Pooling Operations |
| Residual Block | Skip Connection Blocks |
| Residual Connection | Skip Connections |
| Kaiming Initialization | Initialization |
| Max Pooling | Pooling Operations |
| 1x1 Convolution | Convolutions |
| Convolution | Convolutions |
| Batch Normalization | Normalization |
| ResNet | Convolutional Neural Networks |
| ResNeXt | Convolutional Neural Networks |
| Swish | Activation Functions |
| Mish | Activation Functions |
| ReLU | Activation Functions |
| Leaky ReLU | Activation Functions |