Softmax Optimizations for Intel Xeon Processor-based Platforms
Softmax is a popular normalization method used in machine learning. Deep learning models such as the Transformer and BERT use the softmax function intensively, so it is worthwhile to optimize its performance. This article presents our optimization methodology and its results when applied to softmax. By presenting this methodology, we hope to increase interest in deep learning optimizations for CPUs. We believe that the optimization process presented here could be transferred to other deep learning frameworks such as TensorFlow or PyTorch.
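For reference, the sketch below is a minimal, unoptimized baseline softmax in C++ over a single vector of logits, using the standard max-subtraction trick for numerical stability. It is illustrative only and is not the optimized implementation described in the paper; the function name and use of std::vector are our own choices for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Baseline softmax over one vector of logits.
// Subtracting the maximum logit before exponentiation avoids overflow
// without changing the result, since the shift cancels in the ratio.
std::vector<float> softmax(const std::vector<float>& logits) {
    std::vector<float> out(logits.size());
    const float max_val = *std::max_element(logits.begin(), logits.end());

    float sum = 0.0f;
    for (size_t i = 0; i < logits.size(); ++i) {
        out[i] = std::exp(logits[i] - max_val);  // exp of shifted logit
        sum += out[i];
    }
    for (float& v : out) {
        v /= sum;  // normalize so the outputs sum to 1
    }
    return out;
}
```

An optimized CPU version would typically vectorize the max reduction, exponentiation, and normalization loops; the scalar form above simply fixes the reference semantics.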
Methods
• Absolute Position Encodings
• Adam
• Attention Dropout
• BERT
• BPE
• Dense Connections
• Dropout
• GELU
• Label Smoothing
• Layer Normalization
• Linear Layer
• Linear Warmup With Linear Decay
• Multi-Head Attention
• Position-Wise Feed-Forward Layer
• ReLU
• Residual Connection
• Scaled Dot-Product Attention
• Softmax
• Transformer
• Weight Decay
• WordPiece