SGD with Momentum

SGD with Momentum is a stochastic optimization method that adds a momentum term to regular stochastic gradient descent. With learning rate $\eta$, momentum coefficient $\gamma$, and velocity $v_t$, the update is:

$$v_{t} = \gamma v_{t-1} + \eta\nabla_{\theta}J\left(\theta_{t-1}\right)$$

$$\theta_{t} = \theta_{t-1} - v_{t}$$
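The two update equations can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the names `sgd_momentum` and `quadratic_grad` and the toy quadratic objective are hypothetical, and the exact gradient stands in for a stochastic minibatch estimate so that the run is deterministic.

```python
from typing import Callable, List


def sgd_momentum(
    grad: Callable[[List[float]], List[float]],
    theta0: List[float],
    lr: float = 0.1,     # eta, the learning rate
    gamma: float = 0.9,  # momentum coefficient
    steps: int = 100,
) -> List[float]:
    """Apply v_t = gamma * v_{t-1} + lr * grad(theta_{t-1}) and theta_t = theta_{t-1} - v_t."""
    theta = list(theta0)
    v = [0.0] * len(theta)  # velocity starts at zero
    for _ in range(steps):
        g = grad(theta)  # gradient evaluated at the current parameters theta_{t-1}
        v = [gamma * vi + lr * gi for vi, gi in zip(v, g)]
        theta = [ti - vi for ti, vi in zip(theta, v)]
    return theta


# Hypothetical test problem: J(theta) = 0.5 * (theta[0]**2 + 10 * theta[1]**2),
# whose gradient is [theta[0], 10 * theta[1]]; the minimum is at [0, 0].
def quadratic_grad(theta: List[float]) -> List[float]:
    return [theta[0], 10.0 * theta[1]]


print(sgd_momentum(quadratic_grad, [1.0, 1.0], steps=300))  # both coordinates end up close to 0
```

Setting `gamma=0.0` recovers plain gradient descent, since the velocity is then just `lr` times the current gradient.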

A typical value for $\gamma$ is $0.9$. The name momentum comes from an analogy to physics, such as a ball accelerating down a slope. In the case of weight updates, we can think of the weights as a particle traveling through parameter space that is accelerated by the gradient of the loss.
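One way to see the "acceleration" is to hold the gradient fixed at a constant $g$. Unrolling the velocity recursion from $v_0 = 0$ gives $v_t = \eta g \, (1 - \gamma^t)/(1 - \gamma)$, so the step size builds up toward $\eta g / (1 - \gamma)$, which is ten times the plain SGD step $\eta g$ when $\gamma = 0.9$. The short sketch below (the function name `velocity_history` is hypothetical) iterates the recursion to illustrate this.

```python
from typing import List


def velocity_history(g: float, lr: float = 0.1, gamma: float = 0.9, steps: int = 50) -> List[float]:
    """Velocity v_t under a constant gradient g, starting from v_0 = 0."""
    v = 0.0
    history = []
    for _ in range(steps):
        v = gamma * v + lr * g  # same velocity recursion as above, gradient held fixed
        history.append(v)
    return history


vs = velocity_history(g=1.0)
print(vs[0])   # first step equals a plain SGD step, lr * g
print(vs[-1])  # approaches lr * g / (1 - gamma) as steps grow
```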

Image Source: Juan Du
