Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is an iterative optimization technique that uses a minibatch of data at each step to form an unbiased estimate of the gradient, rather than computing the full gradient over all available data. That is, for weights $w$ and a loss function $L$, the update is:

$$ w_{t+1} = w_{t} - \eta\hat{\nabla}_{w}{L(w_{t})} $$

where $\eta$ is the learning rate and $\hat{\nabla}_{w}L(w_{t})$ is the minibatch estimate of the gradient. SGD reduces redundancy compared to batch gradient descent, which recomputes gradients for similar examples before each parameter update, so it is usually much faster.

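As a concrete illustration of the update rule above, here is a minimal sketch of minibatch SGD on a toy least-squares problem. The helper `minibatch_grad` and all hyperparameter values are illustrative choices, not taken from any particular paper listed below:

```python
import numpy as np

def minibatch_grad(w, X, y):
    # Gradient of the mean-squared-error loss 0.5 * ||Xw - y||^2 / n,
    # evaluated only on the minibatch (X, y).
    return X.T @ (X @ w - y) / len(y)

def sgd(w, X, y, eta=0.1, batch_size=32, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    for _ in range(epochs):
        perm = rng.permutation(n)  # reshuffle once per epoch (without-replacement sampling)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            # w_{t+1} = w_t - eta * (minibatch gradient estimate)
            w = w - eta * minibatch_grad(w, X[idx], y[idx])
    return w

# Usage: recover a planted linear model from noisy observations.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=1000)
w_hat = sgd(np.zeros(5), X, y)  # w_hat should approach w_true
```

The per-epoch reshuffle implements without-replacement sampling, the variant analyzed in several of the papers below (e.g., the random-reshuffling and shuffling-SGD entries).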

Latest Papers

A High Probability Analysis of Adaptive SGD with Momentum
Xiaoyu Li, Francesco Orabona
2020-07-28
Stochastic Normalized Gradient Descent with Momentum for Large Batch Training
Shen-Yi Zhao, Yin-Peng Xie, Wu-Jun Li
2020-07-28
Stochastic Gradient Descent applied to Least Squares regularizes in Sobolev spaces
Stefan Steinerberger
2020-07-27
Multi-Level Local SGD for Heterogeneous Hierarchical Networks
Timothy Castiglia, Anirban Das, Stacy Patterson
2020-07-27
CSER: Communication-efficient SGD with Error Reset
Cong Xie, Shuai Zheng, Oluwasanmi Koyejo, Indranil Gupta, Mu Li, Haibin Lin
2020-07-26
How to Democratise and Protect AI: Fair and Differentially Private Decentralised Deep Learning
Lingjuan Lyu, Yitong Li, Karthik Nandakumar, Jiangshan Yu, Xingjun Ma
2020-07-18
On regularization of gradient descent, layer imbalance and flat minima
Boris Ginsburg
2020-07-18
Distributed Reinforcement Learning of Targeted Grasping with Active Vision for Mobile Manipulators
Yasuhiro Fujita, Kota Uenishi, Avinash Ummadisingu, Prabhat Nagarajan, Shimpei Masuda, Mario Ynocente Castro
2020-07-16
Non-greedy Gradient-based Hyperparameter Optimization Over Long Horizons
Paul Micaelli, Amos Storkey
2020-07-15
Analysis of Q-learning with Adaptation and Momentum Restart for Gradient Descent
Bowen Weng, Huaqing Xiong, Yingbin Liang, Wei Zhang
2020-07-15
Quantitative Propagation of Chaos for SGD in Wide Neural Networks
Valentin De Bortoli, Alain Durmus, Xavier Fontaine, Umut Simsekli
2020-07-13
Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning
Peng Jiang, Gagan Agrawal
2020-07-13
AdaScale SGD: A User-Friendly Algorithm for Distributed Training
Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin
2020-07-09
How benign is benign overfitting?
Amartya Sanyal, Puneet K Dokania, Varun Kanade, Philip H. S. Torr
2020-07-08
Streaming Complexity of SVMs
Alexandr Andoni, Collin Burns, Yi Li, Sepideh Mahabadi, David P. Woodruff
2020-07-07
Understanding the Impact of Model Incoherence on Convergence of Incremental SGD with Random Reshuffle
Shaocong Ma, Yi Zhou
2020-07-07
Lossless CNN Channel Pruning via Gradient Resetting and Convolutional Re-parameterization
Xiaohan Ding, Tianxiang Hao, Ji Liu, Jungong Han, Yuchen Guo, Guiguang Ding
2020-07-07
Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification
Yingxue Zhou, Zhiwei Steven Wu, Arindam Banerjee
2020-07-07
TDprop: Does Jacobi Preconditioning Help Temporal Difference Learning?
Joshua Romoff, Peter Henderson, David Kanaa, Emmanuel Bengio, Ahmed Touati, Pierre-Luc Bacon, Joelle Pineau
2020-07-06
Weak error analysis for stochastic gradient descent optimization algorithms
Aritz Bercher, Lukas Gonon, Arnulf Jentzen, Diyora Salimova
2020-07-03
Balancing Rates and Variance via Adaptive Batch-Size for Stochastic Optimization Problems
Zhan Gao, Alec Koppel, Alejandro Ribeiro
2020-07-02
Adaptive Braking for Mitigating Gradient Delay
Abhinav Venigalla, Atli Kosson, Vitaliy Chiley, Urs Köster
2020-07-02
ECPE-2D: Emotion-Cause Pair Extraction based on Joint Two-Dimensional Representation, Interaction and Prediction
Zixiang Ding, Rui Xia, Jianfei Yu
2020-07-01
Online Robust Regression via SGD on the l1 loss
Scott Pesme, Nicolas Flammarion
2020-07-01
AdaSGD: Bridging the gap between SGD and Adam
Jiaxuan Wang, Jenna Wiens
2020-06-30
Adai: Separating the Effects of Adaptive Learning Rate and Momentum Inertia
Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama
2020-06-29
Understanding Gradient Clipping in Private SGD: A Geometric Perspective
Xiangyi Chen, Zhiwei Steven Wu, Mingyi Hong
2020-06-27
On the Generalization Benefit of Noise in Stochastic Gradient Descent
Samuel L. Smith, Erich Elsen, Soham De
2020-06-26
Is SGD a Bayesian sampler? Well, almost
Chris Mingard, Guillermo Valle-Pérez, Joar Skalse, Ard A. Louis
2020-06-26
Stability Enhanced Privacy and Applications in Private Stochastic Gradient Descent
Lauren Watson, Benedek Rozemberczki, Rik Sarkar
2020-06-25
Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds
Yingxue Zhou, Xiangyi Chen, Mingyi Hong, Zhiwei Steven Wu, Arindam Banerjee
2020-06-24
Dynamic of Stochastic Gradient Descent with State-Dependent Noise
Qi Meng, Shiqi Gong, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
2020-06-24
Spherical Perspective on Learning with Batch Norm
Simon Roburin, Yann de Mont-Marin, Andrei Bursuc, Renaud Marlet, Patrick Pérez, Mathieu Aubry
2020-06-23
Byzantine-Resilient High-Dimensional SGD with Local Iterations on Heterogeneous Data
Deepesh Data, Suhas Diggavi
2020-06-22
Training (Overparametrized) Neural Networks in Near-Linear Time
Jan van den Brand, Binghui Peng, Zhao Song, Omri Weinstein
2020-06-20
How do SGD hyperparameters in natural training affect adversarial robustness?
Sandesh Kamath, Amit Deshpande, K V Subrahmanyam
2020-06-20
Unified Analysis of Stochastic Gradient Methods for Composite Convex and Smooth Optimization
Ahmed Khaled, Othmane Sebbouh, Nicolas Loizou, Robert M. Gower, Peter Richtárik
2020-06-20
Differentially Private Variational Autoencoders with Term-wise Gradient Aggregation
Tsubasa Takahashi, Shun Takagi, Hajime Ono, Tatsuya Komatsu
2020-06-19
On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems
Panayotis Mertikopoulos, Nadav Hallak, Ali Kavis, Volkan Cevher
2020-06-19
DEED: A General Quantization Scheme for Communication Efficiency in Bits
Tian Ye, Peijun Xiao, Ruoyu Sun
2020-06-19
SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation
Robert M. Gower, Othmane Sebbouh, Nicolas Loizou
2020-06-18
Stochastic Gradient Descent in Hilbert Scales: Smoothness, Preconditioning and Earlier Stopping
Nicole Mücke, Enrico Reiss
2020-06-18
Communication-Efficient Robust Federated Learning Over Heterogeneous Datasets
Yanjie Dong, Georgios B. Giannakis, Tianyi Chen, Julian Cheng, Md. Jahangir Hossain, Victor C. M. Leung
2020-06-17
Directional Pruning of Deep Neural Networks
Shih-Kang Chao, Zhanyu Wang, Yue Xing, Guang Cheng
2020-06-16
Hausdorff Dimension, Stochastic Differential Equations, and Generalization in Neural Networks
Umut Şimşekli, Ozan Sener, George Deligiannidis, Murat A. Erdogdu
2020-06-16
Curvature is Key: Sub-Sampled Loss Surfaces and the Implications for Large Batch Training
Diego Granziol
2020-06-16
Flatness is a False Friend
Diego Granziol
2020-06-16
Federated Accelerated Stochastic Gradient Descent
Honglin Yuan, Tengyu Ma
2020-06-16
On sparse connectivity, adversarial robustness, and a novel model of the artificial neuron
Sergey Bochkanov
2020-06-16
Slowing Down the Weight Norm Increase in Momentum-based Optimizers
Byeongho Heo, Sanghyuk Chun, Seong Joon Oh, Dongyoon Han, Sangdoo Yun, Youngjung Uh, Jung-Woo Ha
2020-06-15
Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent
Yunwen Lei, Yiming Ying
2020-06-15
Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Jeff Z. HaoChen, Colin Wei, Jason D. Lee, Tengyu Ma
2020-06-15
An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias
Lu Yu, Krishnakumar Balasubramanian, Stanislav Volgushev, Murat A. Erdogdu
2020-06-14
On the convergence of the Stochastic Heavy Ball Method
Othmane Sebbouh, Robert M. Gower, Aaron Defazio
2020-06-14
Differentially Private Decentralized Learning
Shangwei Guo, Tianwei Zhang, Tao Xiang, Yang Liu
2020-06-14
The Pitfalls of Simplicity Bias in Neural Networks
Harshay Shah, Kaustav Tamuly, Aditi Raghunathan, Prateek Jain, Praneeth Netrapalli
2020-06-13
Auditing Differentially Private Machine Learning: How Private is Private SGD?
Matthew Jagielski, Jonathan Ullman, Alina Oprea
2020-06-13
Adaptive Gradient Methods Can Be Provably Faster than SGD after Finite Epochs
Xunpeng Huang, Hao Zhou, Runxin Xu, Zhe Wang, Lei Li
2020-06-12
A Unified Analysis of Stochastic Gradient Methods for Nonconvex Federated Optimization
Zhize Li, Peter Richtárik
2020-06-12
SGD with shuffling: optimal rates without component convexity and large epoch requirements
Kwangjun Ahn, Chulhee Yun, Suvrit Sra
2020-06-12
Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses
Raef Bassily, Vitaly Feldman, Cristóbal Guzmán, Kunal Talwar
2020-06-12
O(1) Communication for Distributed SGD through Two-Level Gradient Averaging
Subhadeep Bhattacharya, Weikuan Yu, Fahim Tahmid Chowdhury
2020-06-12
AdaS: Adaptive Scheduling of Stochastic Gradients
Mahdi S. Hosseini, Konstantinos N. Plataniotis
2020-06-11
STL-SGD: Speeding Up Local SGD with Stagewise Communication Period
Shuheng Shen, Yifei Cheng, Jingchang Liu, Linli Xu
2020-06-11
Multiplicative noise and heavy tails in stochastic optimization
Liam Hodgkinson, Michael W. Mahoney
2020-06-11
Borrowing From the Future: Addressing Double Sampling in Model-free Control
Yuhua Zhu, Zach Izzo, Lexing Ying
2020-06-11
Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search)
Sharan Vaswani, Frederik Kunstner, Issam Laradji, Si Yi Meng, Mark Schmidt, Simon Lacoste-Julien
2020-06-11
Non-Convex SGD Learns Halfspaces with Adversarial Label Noise
Ilias Diakonikolas, Vasilis Kontonis, Christos Tzamos, Nikos Zarifis
2020-06-11
Random Reshuffling: Simple Analysis with Vast Improvements
Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik
2020-06-10
Sketchy Empirical Natural Gradient Methods for Deep Learning
Minghan Yang, Dong Xu, Yongfeng Li, Zaiwen Wen, Mengyun Chen
2020-06-10
Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification
Francesca Mignacco, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová
2020-06-10
The Heavy-Tail Phenomenon in SGD
Mert Gurbuzbalaban, Umut Simsekli, Lingjiong Zhu
2020-06-08
Minibatch vs Local SGD for Heterogeneous Distributed Learning
Blake Woodworth, Kumar Kshitij Patel, Nathan Srebro
2020-06-08
Stochastic Optimization with Non-stationary Noise
Jingzhao Zhang, Hongzhou Lin, Subhro Das, Suvrit Sra, Ali Jadbabaie
2020-06-08
The Strength of Nesterov's Extrapolation in the Individual Convergence of Nonsmooth Optimization
W. Tao, Z. Pan, G. Wu, Q. Tao
2020-06-08
An Efficient Algorithm For Generalized Linear Bandit: Online Stochastic Gradient Descent and Thompson Sampling
Qin Ding, Cho-Jui Hsieh, James Sharpnack
2020-06-07
Scaling Distributed Training with Adaptive Summation
Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi, Tianju Xu, Vadim Eksarevskiy, Jaliya Ekanayake, Emad Barsoum
2020-06-04
Towards Asymptotic Optimality with Conditioned Stochastic Gradient Descent
Rémi Leluc, François Portier
2020-06-04
Bayesian Neural Network via Stochastic Gradient Descent
Abhinav Sagar
2020-06-04
A Primal-Dual SGD Algorithm for Distributed Nonconvex Optimization
Xinlei Yi, Shengjun Zhang, Tao Yang, Tianyou Chai, Karl H. Johansson
2020-06-04
On the Promise of the Stochastic Generalized Gauss-Newton Method for Training DNNs
Matilde Gargiani, Andrea Zanelli, Moritz Diehl, Frank Hutter
2020-06-03
Local SGD With a Communication Overhead Depending Only on the Number of Workers
Artin Spiridonoff, Alex Olshevsky, Ioannis Ch. Paschalidis
2020-06-03
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
Zhewei Yao, Amir Gholami, Sheng Shen, Kurt Keutzer, Michael W. Mahoney
2020-06-01
Auto-Tuning Structured Light by Optical Stochastic Gradient Descent
Wenzheng Chen, Parsa Mirdehghan, Sanja Fidler, Kiriakos N. Kutulakos
2020-06-01
Augment Your Batch: Improving Generalization Through Instance Repetition
Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry
2020-06-01
DaSGD: Squeezing SGD Parallelization Performance in Distributed Training Using Delayed Averaging
Qinggang Zhou, Yawen Zhang, Pengcheng Li, Xiaoyong Liu, Jun Yang, Runsheng Wang, Ru Huang
2020-05-31
Inherent Noise in Gradient Based Methods
Arushi Gupta
2020-05-26
Microphone Array Based Surveillance Audio Classification
Dimitri Leandro de Oliveira Silva, Tito Spadini, Ricardo Suyama
2020-05-22
Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping
Eduard Gorbunov, Marina Danilova, Alexander Gasnikov
2020-05-21
rTop-k: A Statistical Estimation Approach to Distributed SGD
Leighton Pate Barnes, Huseyin A. Inan, Berivan Isik, Ayfer Ozgur
2020-05-21
Accelerated Convergence for Counterfactual Learning to Rank
Rolf Jagerman, Maarten de Rijke
2020-05-21
Byzantine-Resilient SGD in High Dimensions on Heterogeneous Data
Deepesh Data, Suhas Diggavi
2020-05-16
Learning the gravitational force law and other analytic functions
Atish Agarwala, Abhimanyu Das, Rina Panigrahy, Qiuyi Zhang
2020-05-15
OD-SGD: One-step Delay Stochastic Gradient Descent for Distributed Training
Yemao Xu, Dezun Dong, Weixia Xu, Xiangke Liao
2020-05-14
SQuARM-SGD: Communication-Efficient Momentum SGD for Decentralized Optimization
Navjot Singh, Deepesh Data, Jemin George, Suhas Diggavi
2020-05-13
Convergence of Online Adaptive and Recurrent Optimization Algorithms
Pierre-Yves Massé, Yann Ollivier
2020-05-12
RSO: A Gradient Free Sampling Based Approach For Training Deep Neural Networks
Rohun Tripathi, Bharat Singh
2020-05-12
Geoopt: Riemannian Optimization in PyTorch
Max Kochurov, Rasul Karimov, Serge Kozlukov
2020-05-06
Riemannian Stochastic Proximal Gradient Methods for Nonsmooth Optimization over the Stiefel Manifold
Bokun Wang, Shiqian Ma, Lingzhou Xue
2020-05-03
Adaptive Learning of the Optimal Mini-Batch Size of SGD
Motasem Alfarra, Slavomir Hanzely, Alyazeed Albasyoni, Bernard Ghanem, Peter Richtarik
2020-05-03
Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging
Shigang Li, Tal Ben-Nun, Giorgi Nadiradze, Salvatore Di Girolamo, Nikoli Dryden, Dan Alistarh, Torsten Hoefler
2020-04-30
Learning Polynomials of Few Relevant Dimensions
Sitan Chen, Raghu Meka
2020-04-28
The Impact of the Mini-batch Size on the Variance of Gradients in Stochastic Gradient Descent
Xin Qian, Diego Klabjan
2020-04-27
Federated Learning with Only Positive Labels
Felix X. Yu, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar
2020-04-21
Stochastic gradient algorithms from ODE splitting perspective
Daniil Merkulov, Ivan Oseledets
2020-04-19
On Tight Convergence Rates of Without-replacement SGD
Kwangjun Ahn, Suvrit Sra
2020-04-18
Understanding the Difficulty of Training Transformers
Liyuan Liu, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Jiawei Han
2020-04-17
On Learning Rates and Schrödinger Operators
Bin Shi, Weijie J. Su, Michael I. Jordan
2020-04-15
Exploit Where Optimizer Explores via Residuals
An Xu, Zhouyuan Huo, Heng Huang
2020-04-11
Continuous and Discrete-Time Analysis of Stochastic Gradient Descent for Convex and Non-Convex Functions
Xavier Fontaine, Valentin De Bortoli, Alain Durmus
2020-04-08
Stopping Criteria for, and Strong Convergence of, Stochastic Gradient Descent on Bottou-Curtis-Nocedal Functions
Vivak Patel
2020-04-01
A Simple Class Decision Balancing for Incremental Learning
Hongjoon Ahn, Taesup Moon
2020-03-31
Stochastic Proximal Gradient Algorithm with Minibatches. Application to Large Scale Learning Models
Andrei Patrascu, Ciprian Paduraru, Paul Irofti
2020-03-30
On Infinite-Width Hypernetworks
Etai Littwin, Tomer Galanti, Lior Wolf, Greg Yang
2020-03-27
A Hybrid-Order Distributed SGD Method for Non-Convex Optimization to Balance Communication Overhead, Computational Complexity, and Convergence Rate
Naeimeh Omidvar, Mohammad Ali Maddah-Ali, Hamed Mahdavi
2020-03-27
Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training
Namhoon Lee, Thalaiyasingam Ajanthan, Philip H. S. Torr, Martin Jaggi
2020-03-25
Pipelined Backpropagation at Scale: Training Large Models without Batches
Atli Kosson, Vitaliy Chiley, Abhinav Venigalla, Joel Hestness, Urs Köster
2020-03-25
Convergence of Recursive Stochastic Algorithms using Wasserstein Divergence
Abhishek Gupta, William B. Haskell
2020-03-25
Finite-Time Analysis of Stochastic Gradient Descent under Markov Randomness
Thinh T. Doan, Lam M. Nguyen, Nhan H. Pham, Justin Romberg
2020-03-24
FedSel: Federated SGD under Local Differential Privacy with Top-k Dimension Selection
Ruixuan Liu, Yang Cao, Masatoshi Yoshikawa, Hong Chen
2020-03-24
A Unified Theory of Decentralized SGD with Changing Topology and Local Updates
Anastasia Koloskova, Nicolas Loizou, Sadra Boreiri, Martin Jaggi, Sebastian U. Stich
2020-03-23
A classification for the performance of online SGD for high-dimensional inference
Gerard Ben Arous, Reza Gheissari, Aukosh Jagannath
2020-03-23
Slow and Stale Gradients Can Win the Race
Sanghamitra Dutta, Jianyu Wang, Gauri Joshi
2020-03-23
Distributed Gradient Methods for Nonconvex Optimization: Local and Global Convergence Guarantees
Brian Swenson, Soummya Kar, H. Vincent Poor, José M. F. Moura, Aaron Jaech
2020-03-23
NeCPD: An Online Tensor Decomposition with Optimal Stochastic Gradient Descent
Ali Anaissi, Basem Suleiman, Seid Miad Zandavi
2020-03-18
Weak and Strong Gradient Directions: Explaining Memorization, Generalization, and Hardness of Examples at Scale
Piotr Zielinski, Shankar Krishnan, Satrajit Chatterjee
2020-03-16
Investigating Generalization in Neural Networks under Optimally Evolved Training Perturbations
Subhajit Chaudhury, Toshihiko Yamasaki
2020-03-14
Can Implicit Bias Explain Generalization? Stochastic Convex Optimization as a Case Study
Assaf Dauber, Meir Feder, Tomer Koren, Roi Livni
2020-03-13
Machine Learning on Volatile Instances
Xiaoxi Zhang, Jianyu Wang, Gauri Joshi, Carlee Joe-Wong
2020-03-12
A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth
Yiping Lu, Chao Ma, Yulong Lu, Jianfeng Lu, Lexing Ying
2020-03-11
ShadowSync: Performing Synchronization in the Background for Highly Scalable Distributed Training
Qinqing Zheng, Bor-Yiing Su, Jiyan Yang, Alisson Azzolini, Qiang Wu, Ou Jin, Shri Karandikar, Hagay Lupesko, Liang Xiong, Eric Zhou
2020-03-07
On the Convergence of Adam and Adagrad
Alexandre Défossez, Léon Bottou, Francis Bach, Nicolas Usunier
2020-03-05
Distributed Stochastic Gradient Descent and Convergence to Local Minima
Brian Swenson, Ryan Murray, Soummya Kar, H. Vincent Poor
2020-03-05
Neural Kernels Without Tangents
Vaishaal Shankar, Alex Fang, Wenshuo Guo, Sara Fridovich-Keil, Ludwig Schmidt, Jonathan Ragan-Kelley, Benjamin Recht
2020-03-04
BASGD: Buffered Asynchronous SGD for Byzantine Learning
Yi-Rui Yang, Wu-Jun Li
2020-03-02
On the Global Convergence of Training Deep Linear ResNets
Difan Zou, Philip M. Long, Quanquan Gu
2020-03-02
Gadam: Combining Adaptivity with Iterate Averaging Gives Greater Generalisation
Diego Granziol, Xingchen Wan, Stephen Roberts
2020-03-02
Toward a theory of optimization for over-parameterized systems of non-linear equations: the lessons of deep learning
Chaoyue Liu, Libin Zhu, Mikhail Belkin
2020-02-29
Do optimization methods in deep learning applications matter?
Buse Melis Ozyildirim, Mariam Kiran
2020-02-28
Distributed Momentum for Byzantine-resilient Learning
El-Mahdi El-Mhamdi, Rachid Guerraoui, Sébastien Rouault
2020-02-28
Decentralized Federated Learning via SGD over Wireless D2D Networks
Hong Xing, Osvaldo Simeone, Suzhi Bi
2020-02-28
Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods
Daniel Y. Fu, Mayee F. Chen, Frederic Sala, Sarah M. Hooper, Kayvon Fatahalian, Christopher Ré
2020-02-27
On Biased Compression for Distributed Learning
Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, Mher Safaryan
2020-02-27
Stagewise Enlargement of Batch Size for SGD-based Learning
Shen-Yi Zhao, Yin-Peng Xie, Wu-Jun Li
2020-02-26
Non-Asymptotic Bounds for Zeroth-Order Stochastic Optimization
Nirav Bhavsar, Prashanth L A
2020-02-26
Moniqua: Modulo Quantized Communication in Decentralized SGD
Yucheng Lu, Christopher De Sa
2020-02-26
LASG: Lazily Aggregated Stochastic Gradients for Communication-Efficient Distributed Learning
Tianyi Chen, Yuejiao Sun, Wotao Yin
2020-02-26
Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers
Serge Kas Hanna, Rawad Bitar, Parimal Parag, Venkat Dasari, Salim El Rouayheb
2020-02-25
Stochastic-Sign SGD for Federated Learning with Theoretical Guarantees
Richeng Jin, Yufan Huang, Xiaofan He, Tianfu Wu, Huaiyu Dai
2020-02-25
Closing the convergence gap of SGD without replacement
Shashank Rajput, Anant Gupta, Dimitris Papailiopoulos
2020-02-24
Scheduled Restart Momentum for Accelerated Stochastic Gradient Descent
Bao Wang, Tan M. Nguyen, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher
2020-02-24
Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence
Nicolas Loizou, Sharan Vaswani, Issam Laradji, Simon Lacoste-Julien
2020-02-24
HYDRA: Pruning Adversarially Robust Neural Networks
Vikash Sehwag, Shiqi Wang, Prateek Mittal, Suman Jana
2020-02-24
Improve SGD Training via Aligning Mini-batches
Xiangrui Li, Deng Pan, Xin Li, Dongxiao Zhu
2020-02-23
The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof Geras
2020-02-21
Learning to Continually Learn
Shawn Beaulieu, Lapo Frati, Thomas Miconi, Joel Lehman, Kenneth O. Stanley, Jeff Clune, Nick Cheney
2020-02-21
Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD
Jianyu Wang, Hao Liang, Gauri Joshi
2020-02-21
Bounding the expected run-time of nonconvex optimization with early stopping
Thomas Flynn, Kwang Min Yu, Abid Malik, Nicolas D'Imperio, Shinjae Yoo
2020-02-20
Embedding Graph Auto-Encoder with Joint Clustering via Adjacency Sharing
Xuelong Li, Hongyuan Zhang, Rui Zhang
2020-02-20
Stochastic Runge-Kutta methods and adaptive SGD-G2 stochastic gradient descent
Imen Ayadi, Gabriel Turinici
2020-02-20
Distributed Optimization over Block-Cyclic Data
Yucheng Ding, Chaoyue Niu, Yikai Yan, Zhenzhe Zheng, Fan Wu, Guihai Chen, Shaojie Tang, Rongfei Jia
2020-02-18
Is Local SGD Better than Minibatch SGD?
Blake Woodworth, Kumar Kshitij Patel, Sebastian U. Stich, Zhen Dai, Brian Bullins, H. Brendan McMahan, Ohad Shamir, Nathan Srebro
2020-02-18
Learning Halfspaces with Massart Noise Under Structured Distributions
Ilias Diakonikolas, Vasilis Kontonis, Christos Tzamos, Nikos Zarifis
2020-02-13
Fast Convergence for Langevin Diffusion with Matrix Manifold Structure
Ankur Moitra, Andrej Risteski
2020-02-13
A Fully Online Approach for Covariance Matrices Estimation of Stochastic Gradient Descent Solutions
Wanrong Zhu, Xi Chen, Wei Biao Wu
2020-02-10
Semi-Implicit Back Propagation
Ren Liu, Xiaoqun Zhang
2020-02-10
A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima
Zeke Xie, Issei Sato, Masashi Sugiyama
2020-02-10
Federated Learning of a Mixture of Global and Local Models
Filip Hanzely, Peter Richtárik
2020-02-10
On the distance between two neural networks and the stability of learning
Jeremy Bernstein, Arash Vahdat, Yisong Yue, Ming-Yu Liu
2020-02-09
Better Theory for SGD in the Nonconvex World
Ahmed Khaled, Peter Richtárik
2020-02-09
Momentum Improves Normalized SGD
Ashok Cutkosky, Harsh Mehta
2020-02-09
How Good is the Bayes Posterior in Deep Neural Networks Really?
Florian Wenzel, Kevin Roth, Bastiaan S. Veeling, Jakub Świątkowski, Linh Tran, Stephan Mandt, Jasper Snoek, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin
2020-02-06
A mean-field theory of lazy training in two-layer neural nets: entropic regularization and controlled McKean-Vlasov dynamics
Belinda Tzen, Maxim Raginsky
2020-02-05
Goal-Oriented Multi-Task BERT-Based Dialogue State Tracker
Pavel Gulyaev, Eugenia Elistratova, Vasily Konovalov, Yuri Kuratov, Leonid Pugachev, Mikhail Burtsev
2020-02-05
Improving Efficiency in Large-Scale Decentralized Distributed Training
Wei Zhang, Xiaodong Cui, Abdullah Kayi, Mingrui Liu, Ulrich Finkler, Brian Kingsbury, George Saon, Youssef Mroueh, Alper Buyuktosunoglu, Payel Das, David Kung, Michael Picheny
2020-02-04
Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform
Jun Li, Li Fuxin, Sinisa Todorovic
2020-02-04
On the Convergence of Stochastic Gradient Descent with Low-Rank Projections for Convex Low-Rank Matrix Problems
Dan Garber
2020-01-31
How Does BN Increase Collapsed Neural Network Filters?
Sheng Zhou, Xinjiang Wang, Ping Luo, Litong Feng, Wenjie Li, Wei Zhang
2020-01-30
Variance Reduction with Sparse Gradients
Melih Elibol, Lihua Lei, Michael I. Jordan
2020-01-27
On Last-Layer Algorithms for Classification: Decoupling Representation from Uncertainty Estimation
Nicolas Brosse, Carlos Riquelme, Alice Martin, Sylvain Gelly, Éric Moulines
2020-01-22
Intermittent Pulling with Local Compensation for Communication-Efficient Federated Learning
Haozhao Wang, Zhihao Qu, Song Guo, Xin Gao, Ruixuan Li, Baoliu Ye
2020-01-22
Stochastic Item Descent Method for Large Scale Equal Circle Packing Problem
Kun He, Min Zhang, Jianrong Zhou, Yan Jin, Chu-min Li
2020-01-22
Harmonic Convolutional Networks based on Discrete Cosine Transform
Matej Ulicny, Vladimir A. Krylov, Rozenn Dahyot
2020-01-18
Elastic Consistency: A General Consistency Model for Distributed Stochastic Gradient Descent
Giorgi Nadiradze, Ilia Markov, Bapi Chatterjee, Vyacheslav Kungurtsev, Dan Alistarh
2020-01-16
Backward Feature Correction: How Deep Learning Performs Deep Learning
Zeyuan Allen-Zhu, Yuanzhi Li
2020-01-13
Choosing the Sample with Lowest Loss makes SGD Robust
Vatsal Shah, Xiaoxia Wu, Sujay Sanghavi
2020-01-10
Distributionally Robust Deep Learning using Hardness Weighted Sampling
Lucas Fidon, Sebastien Ourselin, Tom Vercauteren
2020-01-08
Poly-time universality and limitations of deep learning
Emmanuel Abbe, Colin Sandon
2020-01-07
How neural networks find generalizable solutions: Self-tuned annealing in deep learning
Yu Feng, Yuhai Tu
2020-01-06
An Exponential Learning Rate Schedule for Batch Normalized Networks
Anonymous
2020-01-01
A Non-asymptotic comparison of SVRG and SGD: tradeoffs between compute and speed
Anonymous
2020-01-01
PopSGD: Decentralized Stochastic Gradient Descent in the Population Model
Anonymous
2020-01-01
Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization
Anonymous
2020-01-01
Mode Connectivity and Sparse Neural Networks
Anonymous
2020-01-01
P-BN: Towards Effective Batch Normalization in the Path Space
Anonymous
2020-01-01
Step Size Optimization
Anonymous
2020-01-01
SoftAdam: Unifying SGD and Adam for better stochastic gradient descent
Anonymous
2020-01-01
On the expected running time of nonconvex optimization with early stopping
Anonymous
2020-01-01
Dynamic Instance Hardness
Anonymous
2020-01-01
Acutum: When Generalization Meets Adaptability
Anonymous
2020-01-01
Amortized Nesterov's Momentum: Robust and Lightweight Momentum for Deep Learning
Anonymous
2020-01-01
The Break-Even Point on the Optimization Trajectories of Deep Neural Networks
Anonymous
2020-01-01
DP-LSSGD: An Optimization Method to Lift the Utility in Privacy-Preserving ERM
Anonymous
2020-01-01
Escaping Saddle Points Faster with Stochastic Momentum
Anonymous
2020-01-01
Prune or quantize? Strategy for Pareto-optimally low-cost and accurate CNN
Anonymous
2020-01-01
SloMo: Improving Communication-Efficient Distributed SGD with Slow Momentum
Anonymous
2020-01-01
Training Deep Neural Networks with Partially Adaptive Momentum
Anonymous
2020-01-01
A Mean-Field Theory for Kernel Alignment with Random Features in Generative Adverserial Networks
Anonymous
2020-01-01
Gap-Aware Mitigation of Gradient Staleness
Anonymous
2020-01-01
Low Rank Training of Deep Neural Networks for Emerging Memory Technology
Anonymous
2020-01-01
Towards understanding the true loss surface of deep neural networks using random matrix theory and iterative spectral methods
Diego Granziol, Timur Garipov, Dmitry Vetrov, Stefan Zohren, Stephen Roberts, Andrew Gordon Wilson
2020-01-01
Training Deep Networks with Stochastic Gradient Normalized by Layerwise Adaptive Second Moments
Boris Ginsburg, Patrice Castonguay, Oleksii Hrinchuk, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Huyen Nguyen, Yang Zhang, Jonathan M. Cohen
2020-01-01
Accelerating First-Order Optimization Algorithms
Anonymous
2020-01-01
PowerSGD: Powered Stochastic Gradient Descent Methods for Accelerated Non-Convex Optimization
Anonymous
2020-01-01
On the Tunability of Optimizers in Deep Learning
Anonymous
2020-01-01
AdaScale SGD: A Scale-Invariant Algorithm for Distributed Training
Anonymous
2020-01-01
A Dynamic Sampling Adaptive-SGD Method for Machine Learning
Achraf Bahamou, Donald Goldfarb
2019-12-31
Variance Reduced Local SGD with Lower Communication Complexity
Xianfeng Liang, Shuheng Shen, Jingchang Liu, Zhen Pan, Enhong Chen, Yifei Cheng
2019-12-30
Federated Variance-Reduced Stochastic Gradient Descent with Robustness to Byzantine Attacks
Zhaoxian Wu, Qing Ling, Tianyi Chen, Georgios B. Giannakis
2019-12-29
CProp: Adaptive Learning Rate Scaling from Past Gradient Conformity
Konpat Preechakul, Boonserm Kijsirikul
2019-12-24
Second-order Information in First-order Optimization Methods
Yuzheng Hu, Licong Lin, Shange Tang
2019-12-20
Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks
Alexander Shevchenko, Marco Mondelli
2019-12-20
Optimization for deep learning: theory and algorithms
Ruoyu Sun
2019-12-19
Gradient-based training of Gaussian Mixture Models in High-Dimensional Spaces
Alexander Gepperth, Benedikt Pfülb
2019-12-18
Generative Teaching Networks: Accelerating Neural Architecture Search by Learning to Generate Synthetic Training Data
Felipe Petroski Such, Aditya Rawal, Joel Lehman, Kenneth O. Stanley, Jeff Clune
2019-12-17
Parallel Restarted SPIDER -- Communication Efficient Distributed Nonconvex Optimization with Optimal Computation Complexity
Pranay Sharma, Prashant Khanduri, Saikiran Bulusu, Ketan Rajawat, Pramod K. Varshney
2019-12-12
SiamMan: Siamese Motion-aware Network for Visual Tracking
Wenzhang Zhou, Longyin Wen, Libo Zhang, Dawei Du, Tiejian Luo, Yanjun Wu
2019-12-11
Linear Mode Connectivity and the Lottery Ticket Hypothesis
Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin
2019-12-11
Why ADAM Beats SGD for Attention Models
Jingzhao Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J Reddi, Sanjiv Kumar, Suvrit Sra
2019-12-06
An Empirical Study on the Intrinsic Privacy of SGD
Stephanie L. Hyland, Shruti Tople
2019-12-05
Domain-independent Dominance of Adaptive Methods
Pedro Savarese, David McAllester, Sudarshan Babu, Michael Maire
2019-12-04
Stochastic Variational Inference via Upper Bound
Chunlin Ji, Haige Shen
2019-12-02
Efficient Meta Learning via Minibatch Proximal Update
Pan Zhou, Xiaotong Yuan, Huan Xu, Shuicheng Yan, Jiashi Feng
2019-12-01
Optimal Sparsity-Sensitive Bounds for Distributed Mean Estimation
Zengfeng Huang, Ziyue Huang, Yilei Wang, Ke Yi
2019-12-01
Fast and Accurate Stochastic Gradient Estimation
Beidi Chen, Yingchen Xu, Anshumali Shrivastava
2019-12-01
Communication trade-offs for Local-SGD with large step size
Aymeric Dieuleveut, Kumar Kshitij Patel
2019-12-01
Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification and Local Computations
Debraj Basu, Deepesh Data, Can Karakus, Suhas Diggavi
2019-12-01
Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence
Fengxiang He, Tongliang Liu, Dacheng Tao
2019-12-01
A Hybrid Approach Towards Two Stage Bengali Question Classification Utilizing Smart Data Balancing Technique
Md. Hasibur Rahman, Chowdhury Rafeed Rahman, Ruhul Amin, Md. Habibur Rahman Sifat, Afra Anika
2019-11-30
On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks
Umut Şimşekli, Mert Gürbüzbalaban, Thanh Huy Nguyen, Gaël Richard, Levent Sagun
2019-11-29
Non-parametric Uni-modality Constraints for Deep Ordinal Classification
Soufiane Belharbi, Ismail Ben Ayed, Luke McCaffrey, Eric Granger
2019-11-25
Neural Networks Learning and Memorization with (almost) no Over-Parameterization
Amit Daniely
2019-11-22
Parameter-Free Locally Differentially Private Stochastic Subgradient Descent
Kwang-Sung Jun, Francesco Orabona
2019-11-21
Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates
Cong Xie, Oluwasanmi Koyejo, Indranil Gupta, Haibin Lin
2019-11-20
Bayesian interpretation of SGD as Ito process
Soma Yokoi, Issei Sato
2019-11-20
Optimal Mini-Batch Size Selection for Fast Gradient Descent
Michael P. Perrone, Haidar Khan, Changhoan Kim, Anastasios Kyrillidis, Jerry Quinn, Valentina Salapura
2019-11-15
Throughput Prediction of Asynchronous SGD in TensorFlow
Zhuojin Li, Wumo Yan, Marco Paolieri, Leana Golubchik
2019-11-12
MindTheStep-AsyncPSGD: Adaptive Asynchronous Parallel Stochastic Gradient Descent
Karl Bäckström, Marina Papatriantafilou, Philippas Tsigas
2019-11-08
A Rule for Gradient Estimator Selection, with an Application to Variational Inference
Tomas Geffner, Justin Domke
2019-11-05
Reinforced Product Metadata Selection for Helpfulness Assessment of Customer Reviews
Miao Fan, Chao Feng, Mingming Sun, Ping Li
2019-11-01
On the Convergence of Local Descent Methods in Federated Learning
Farzin Haddadpour, Mehrdad Mahdavi
2019-10-31
Mixing of Stochastic Accelerated Gradient Descent
Peiyuan Zhang, Hadi Daneshmand, Thomas Hofmann
2019-10-31
Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization
Farzin Haddadpour, Mohammad Mahdi Kamani, Mehrdad Mahdavi, Viveck R. Cadambe
2019-10-30
Lsh-sampling Breaks the Computation Chicken-and-egg Loop in Adaptive Stochastic Gradient Estimation
Beidi Chen, Yingchen Xu, Anshumali Shrivastava
2019-10-30
Online Stochastic Gradient Descent with Arbitrary Initialization Solves Non-smooth, Non-convex Phase Retrieval
Yan Shuo Tan, Roman Vershynin
2019-10-28
SwarmSGD: Scalable Decentralized SGD with Local Updates
Giorgi Nadiradze, Amirmojtaba Sabour, Dan Alistarh, Aditya Sharma, Ilia Markov, Vitaly Aksenov
2019-10-27
Sound Event Recognition in a Smart City Surveillance Context
Tito Spadini, Dimitri Leandro de Oliveira Silva, Ricardo Suyama
2019-10-27
A geometric interpretation of stochastic gradient descent using diffusion metrics
R. Fioresi, P. Chaudhari, S. Soatto
2019-10-27
Bias-Variance Tradeoff in a Sliding Window Implementation of the Stochastic Gradient Algorithm
Yakup Ceki Papo
2019-10-25
The Practicality of Stochastic Optimization in Imaging Inverse Problems
Junqi Tang, Karen Egiazarian, Mohammad Golbabaee, Mike Davies
2019-10-22
Sparsification as a Remedy for Staleness in Distributed Asynchronous SGD
Rosa Candela, Giulio Franzese, Maurizio Filippone, Pietro Michiardi
2019-10-21
Communication-Efficient Local Decentralized SGD Methods
Xiang Li, Wenhao Yang, Shusen Wang, Zhihua Zhang
2019-10-21
Error Lower Bounds of Constant Step-size Stochastic Gradient Descent
Zhiyan Ding, Yiding Chen, Qin Li, Xiaojin Zhu
2019-10-18
Interpreting Basis Path Set in Neural Networks
Juanping Zhu, Qi Meng, Wei Chen, Zhi-ming Ma
2019-10-18
Improving the convergence of SGD via adaptive batch sizes
Scott Sievert
2019-10-18
Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic
Matteo Sordello, Hangfeng He, Weijie Su
2019-10-18
Why bigger is not always better: on finite and infinite neural networks
Laurence Aitchison
2019-10-17
DP-MAC: The Differentially Private Method of Auxiliary Coordinates for Deep Learning
Frederik Harder, Jonas Köhler, Max Welling, Mijung Park
2019-10-15
Derivative-Free Optimization of Neural Networks using Local Search
Ahmed Aly, Gianluca Guadagni, Joanne Bechta Dugan
2019-10-15
Decaying momentum helps neural network training
John Chen, Cameron Wolfe, Zhao Li, Anastasios Kyrillidis
2019-10-11
The Complexity of Finding Stationary Points with Stochastic Gradient Descent
Yoel Drori, Ohad Shamir
2019-10-04
Distributed Learning of Deep Neural Networks using Independent Subnet Training
Binhang Yuan, Anastasios Kyrillidis, Christopher M. Jermaine
2019-10-04
Accelerating Deep Learning by Focusing on the Biggest Losers
Angela H. Jiang, Daniel L.-K. Wong, Giulio Zhou, David G. Andersen, Jeffrey Dean, Gregory R. Ganger, Gauri Joshi, Michael Kaminsky, Michael Kozuch, Zachary C. Lipton, Padmanabhan Pillai
2019-10-02
How noise affects the Hessian spectrum in overparameterized neural networks
Mingwei Wei, David J Schwab
2019-10-01
SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum
Jianyu Wang, Vinayak Tantia, Nicolas Ballas, Michael Rabbat
2019-10-01
Small Steps and Giant Leaps: Minimal Newton Solvers for Deep Learning
Joao F. Henriques, Sebastien Ehrhardt, Samuel Albanie, Andrea Vedaldi
2019-10-01
Adaptive Activation Thresholding: Dynamic Routing Type Behavior for Interpretability in Convolutional Neural Networks
Yiyou Sun, Sathya N. Ravi, Vikas Singh
2019-10-01
Distributed SGD Generalizes Well Under Asynchrony
Jayanth Regatti, Gaurav Tendolkar, Yi Zhou, Abhishek Gupta, Yingbin Liang
2019-09-29
Attention Convolutional Binary Neural Tree for Fine-Grained Visual Categorization
Ruyi Ji, Longyin Wen, Libo Zhang, Dawei Du, Yanjun Wu, Chen Zhao, Xianglong Liu, Feiyue Huang
2019-09-25
A Mean-Field Theory for Kernel Alignment with Random Features in Generative and Discriminative Models
Masoud Badiei Khuzani, Liyue Shen, Shahin Shahrampour, Lei Xing
2019-09-25
Gap Aware Mitigation of Gradient Staleness
Saar Barkai, Ido Hakimi, Assaf Schuster
2019-09-24
Algorithm for Training Neural Networks on Resistive Device Arrays
Tayfun Gokmen, Wilfried Haensch
2019-09-17
Finite Depth and Width Corrections to the Neural Tangent Kernel
Boris Hanin, Mihai Nica
2019-09-13
diffGrad: An Optimization Method for Convolutional Neural Networks
Shiv Ram Dubey, Soumendu Chakraborty, Swalpa Kumar Roy, Snehasis Mukherjee, Satish Kumar Singh, Bidyut Baran Chaudhuri
2019-09-12
The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication
Sebastian U. Stich, Sai Praneeth Karimireddy
2019-09-11
Byzantine-Resilient Stochastic Gradient Descent for Distributed Learning: A Lipschitz-Inspired Coordinate-wise Median Approach
Haibo Yang, Xin Zhang, Minghong Fang, Jia Liu
2019-09-10
Tighter Theory for Local SGD on Identical and Heterogeneous Data
Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik
2019-09-10
A Stochastic Quasi-Newton Method with Nesterov's Accelerated Gradient
S. Indrapriyadarsini, Shahrzad Mahboubi, Hiroshi Ninomiya, Hideki Asai
2019-09-09
Communication-Censored Distributed Stochastic Gradient Descent
Weiyu Li, Tianyi Chen, Liping Li, Zhaoxian Wu, Qing Ling
2019-09-09
Distributed Training of Embeddings using Graph Analytics
Gurbinder Gill, Roshan Dathathri, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi
2019-09-08
FreeAnchor: Learning to Match Anchors for Visual Object Detection
Xiaosong Zhang, Fang Wan, Chang Liu, Rongrong Ji, Qixiang Ye
2019-09-05
Quasi-Newton Optimization Methods For Deep Learning Applications
Jacob Rafati, Roummel F. Marcia
2019-09-04
A Concert-planning Tool for Independent Musicians by Machine Learning Models
Xiaohan Yang, Qingyin Ge
2019-08-29
Automatic and Simultaneous Adjustment of Learning Rate and Momentum for Stochastic Gradient Descent
Tomer Lancewicki, Selcuk Kopru
2019-08-20
AdaCliP: Adaptive Clipping for Private SGD
Venkatadheeraj Pichapati, Ananda Theertha Suresh, Felix X. Yu, Sashank J. Reddi, Sanjiv Kumar
2019-08-20
Multi Target Tracking by Learning from Generalized Graph Differences
Håkan Ardö, Mikael Nilsson
2019-08-19
Towards Better Generalization: BP-SVRG in Training Deep Neural Networks
Hao Jin, Dachao Lin, Zhihua Zhang
2019-08-18
NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization
Ali Ramezani-Kebrya, Fartash Faghri, Daniel M. Roy
2019-08-16
On the Convergence of AdaBound and its Connection to SGD
Pedro Savarese
2019-08-13
Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations
Shigang Li, Tal Ben-Nun, Salvatore Di Girolamo, Dan Alistarh, Torsten Hoefler
2019-08-12
The HSIC Bottleneck: Deep Learning without Back-Propagation
Wan-Duo Kurt Ma, J. P. Lewis, W. Bastiaan Kleijn
2019-08-05
How Good is SGD with Random Shuffling?
Itay Safran, Ohad Shamir
2019-07-31
Deep Gradient Boosting -- Layer-wise Input Normalization of Neural Networks
Erhan Bilal
2019-07-29
DEAM: Adaptive Momentum with Discriminative Weight for Stochastic Optimization
Jiyang Bai, Yuxiang Ren, Jiawei Zhang
2019-07-25
Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization
Xinyan Li, Qilong Gu, Yingxue Zhou, Tiancong Chen, Arindam Banerjee
2019-07-24
Mix and Match: An Optimistic Tree-Search Approach for Learning Models from Mixture Distributions
Matthew Faw, Rajat Sen, Karthikeyan Shanmugam, Constantine Caramanis, Sanjay Shakkottai
2019-07-23
Speeding Up Iterative Closest Point Using Stochastic Gradient Descent
Fahira Afzal Maken, Fabio Ramos, Lionel Ott
2019-07-22
Practical Newton-Type Distributed Learning using Gradient Based Approximations
Samira Sheikhi
2019-07-22
Post-synaptic potential regularization has potential
Enzo Tartaglione, Daniele Perlo, Marco Grangetto
2019-07-19
SGD momentum optimizer with step estimation by online parabola model
Jarek Duda
2019-07-16
Amplifying Rényi Differential Privacy via Shuffling
Eloïse Berthier, Sai Praneeth Karimireddy
2019-07-11
A Highly Efficient Distributed Deep Learning System For Automatic Speech Recognition
Wei Zhang, Xiaodong Cui, Ulrich Finkler, George Saon, Abdullah Kayi, Alper Buyuktosunoglu, Brian Kingsbury, David Kung, Michael Picheny
2019-07-10
Unified Optimal Analysis of the (Stochastic) Gradient Method
Sebastian U. Stich
2019-07-09
Ordered SGD: A New Stochastic Optimization Framework for Empirical Risk Minimization
Kenji Kawaguchi, Haihao Lu
2019-07-09
Stochastic Gradient and Langevin Processes
Xiang Cheng, Dong Yin, Peter L. Bartlett, Michael I. Jordan
2019-07-07
Next Generation Radiogenomics Sequencing for Prediction of EGFR and KRAS Mutation Status in NSCLC Patients Using Multimodal Imaging and Machine Learning Approaches
Isaac Shiri, Hassan Maleki, Ghasem Hajianfar, Hamid Abdollahi, Saeed Ashrafinia, Mathieu Hatt, Mehrdad Oveisi, Arman Rahmim
2019-07-03
On Symmetry and Initialization for Neural Networks
Ido Nachum, Amir Yehudayoff
2019-07-01
Approximate matrix completion based on cavity method
Chihiro Noguchi, Yoshiyuki Kabashima
2019-06-29
DP-LSSGD: A Stochastic Optimization Method to Lift the Utility in Privacy-Preserving ERM
Bao Wang, Quanquan Gu, March Boedihardjo, Farzin Barekat, Stanley J. Osher
2019-06-28
Faster Distributed Deep Net Training: Computation and Communication Decoupled Stochastic Gradient Descent
Shuheng Shen, Linli Xu, Jingchang Liu, Xianfeng Liang, Yifei Cheng
2019-06-28
Gradient Noise Convolution (GNC): Smoothing Loss Function for Distributed Large-Batch SGD
Kosuke Haruki, Taiji Suzuki, Yohei Hamakawa, Takeshi Toda, Ryuji Sakai, Masahiro Ozawa, Mitsuhiro Kimura
2019-06-26
Learning Data Augmentation Strategies for Object Detection
Barret Zoph, Ekin D. Cubuk, Golnaz Ghiasi, Tsung-Yi Lin, Jonathon Shlens, Quoc V. Le
2019-06-26
First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise
Thanh Huy Nguyen, Umut Şimşekli, Mert Gürbüzbalaban, Gaël Richard
2019-06-21
Data Cleansing for Models Trained with SGD
Satoshi Hara, Atsushi Nitanda, Takanori Maehara
2019-06-20
On the Noisy Gradient Descent that Generalizes as SGD
Jingfeng Wu, Wenqing Hu, Haoyi Xiong, Jun Huan, Vladimir Braverman, Zhanxing Zhu
2019-06-18
Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup
Sebastian Goldt, Madhu S. Advani, Andrew M. Saxe, Florent Krzakala, Lenka Zdeborová
2019-06-18
REMAP: Multi-layer entropy-guided pooling of dense CNN features for image retrieval
Syed Sameed Husain, Miroslaw Bober
2019-06-15
Stochastic Proximal AUC Maximization
Yunwen Lei, Yiming Ying
2019-06-14
Training Neural Networks for and by Interpolation
Leonard Berrada, Andrew Zisserman, M. Pawan Kumar
2019-06-13
Layered SGD: A Decentralized and Synchronous SGD Algorithm for Scalable Deep Neural Network Training
Kwangmin Yu, Thomas Flynn, Shinjae Yoo, Nicholas D'Imperio
2019-06-13
ADASS: Adaptive Sample Selection for Training Acceleration
Shen-Yi Zhao, Hao Gao, Wu-Jun Li
2019-06-11
Adaptively Preconditioned Stochastic Gradient Langevin Dynamics
Chandrasekaran Anirudh Bhardwaj
2019-06-10
Stochastic Mirror Descent on Overparameterized Nonlinear Models: Convergence, Implicit Regularization, and Generalization
Navid Azizan, Sahin Lale, Babak Hassibi
2019-06-10
Making Asynchronous Stochastic Gradient Descent Work for Transformers
Alham Fikri Aji, Kenneth Heafield
2019-06-08
Bad Global Minima Exist and SGD Can Reach Them
Shengchao Liu, Dimitris Papailiopoulos, Dimitris Achlioptas
2019-06-06
Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification, and Local Computations
Debraj Basu, Deepesh Data, Can Karakus, Suhas Diggavi
2019-06-06
Binarized Collaborative Filtering with Distilling Graph Convolutional Networks
Haoyu Wang, Defu Lian, Yong Ge
2019-06-05
Embedded hyper-parameter tuning by Simulated Annealing
Matteo Fischetti, Matteo Stringher
2019-06-04
PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization
Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi
2019-05-31
Global Momentum Compression for Sparse Communication in Distributed SGD
Shen-Yi Zhao, Yin-Peng Xie, Hao Gao, Wu-Jun Li
2019-05-30
P3SGD: Patient Privacy Preserving SGD for Regularizing Deep CNNs in Pathological Image Classification
Bingzhe Wu, Shiwan Zhao, Guangyu Sun, Xiaolu Zhang, Zhong Su, Caihong Zeng, Zhihong Liu
2019-05-30
On the Convergence of Memory-Based Distributed SGD
Shen-Yi Zhao, Hao Gao, Wu-Jun Li
2019-05-30
Privacy Amplification by Mixing and Diffusion Mechanisms
Borja Balle, Gilles Barthe, Marco Gaboardi, Joseph Geumlek
2019-05-29
Accelerated Sparsified SGD with Error Feedback
Tomoya Murata, Taiji Suzuki
2019-05-29
SGD on Neural Networks Learns Functions of Increasing Complexity
Preetum Nakkiran, Gal Kaplun, Dimitris Kalimeris, Tristan Yang, Benjamin L. Edelman, Fred Zhang, Boaz Barak
2019-05-28
Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems
Tianle Cai, Ruiqi Gao, Jikai Hou, Siyu Chen, Dong Wang, Di He, Zhihua Zhang, Liwei Wang
2019-05-28
Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks
Boris Ginsburg, Patrice Castonguay, Oleksii Hrinchuk, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Huyen Nguyen, Yang Zhang, Jonathan M. Cohen
2019-05-27
Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback
Shuai Zheng, Ziyue Huang, James T. Kwok
2019-05-27
Natural Compression for Distributed Deep Learning
Samuel Horvath, Chen-Yu Ho, Ludovit Horvath, Atal Narayan Sahu, Marco Canini, Peter Richtarik
2019-05-27
Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers
Liwei Wu, Shuqing Li, Cho-Jui Hsieh, James Sharpnack
2019-05-25
Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
Sharan Vaswani, Aaron Mishkin, Issam Laradji, Mark Schmidt, Gauthier Gidel, Simon Lacoste-Julien
2019-05-24
VecHGrad for Solving Accurately Complex Tensor Decomposition
Jeremy Charlier, Vladimir Makarenkov
2019-05-24
MATCHA: Speeding Up Decentralized SGD via Matching Decomposition Sampling
Jianyu Wang, Anit Kumar Sahu, Zhouyi Yang, Gauri Joshi, Soummya Kar
2019-05-23
Fine-grained Optimization of Deep Neural Networks
Mete Ozay
2019-05-22
Time-Smoothed Gradients for Online Forecasting
Tianhao Zhu, Sergul Aydore
2019-05-21
Shaping the learning landscape in neural networks around wide flat minima
Carlo Baldassi, Fabrizio Pittorino, Riccardo Zecchina
2019-05-20
Adaptively Truncating Backpropagation Through Time to Control Gradient Bias
Christopher Aicher, Nicholas J. Foti, Emily B. Fox
2019-05-17
Meta Reinforcement Learning with Task Embedding and Shared Policy
Lin Lan, Zhenguo Li, Xiaohong Guan, Pinghui Wang
2019-05-16
DoubleSqueeze: Parallel Stochastic Gradient Descent with Double-Pass Error-Compensated Compression
Hanlin Tang, Xiangru Lian, Chen Yu, Tong Zhang, Ji Liu
2019-05-15
On the Computation and Communication Complexity of Parallel SGD with Dynamic Batch Sizes for Stochastic Non-Convex Optimization
Hao Yu, Rong Jin
2019-05-10
The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study
Daniel S. Park, Jascha Sohl-Dickstein, Quoc V. Le, Samuel L. Smith
2019-05-09
On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization
Hao Yu, Rong Jin, Sen Yang
2019-05-09
AutoAssist: A Framework to Accelerate Training of Deep Neural Networks
Jiong Zhang, Hsiang-fu Yu, Inderjit S. Dhillon
2019-05-08
Fast and Secure Distributed Learning in High Dimension
El-Mahdi El-Mhamdi, Rachid Guerraoui
2019-05-05
Processing Megapixel Images with Deep Attention-Sampling Models
Angelos Katharopoulos, François Fleuret
2019-05-03
On the Trajectory of Stochastic Gradient Descent in the Information Plane
Emilio Rafael Balda, Arash Behboodi, Rudolf Mathar
2019-05-01
Distributionally Robust Optimization Leads to Better Generalization: on SGD and Beyond
Jikai Hou, Kaixuan Huang, Zhihua Zhang
2019-05-01
LSH Microbatches for Stochastic Gradients: Value in Rearrangement
Eliav Buchnik, Edith Cohen, Avinatan Hassidim, Yossi Matias
2019-05-01
A unified theory of adaptive stochastic gradient descent as Bayesian filtering
Laurence Aitchison
2019-05-01
Asynchronous SGD without gradient delay for efficient distributed training
Roman Talyansky, Pavel Kisilev, Zach Melamed, Natan Peterfreund, Uri Verner
2019-05-01
DANA: Scalable Out-of-the-box Distributed ASGD Without Retuning
Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster
2019-05-01
G-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space
Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
2019-05-01
A Walk with SGD: How SGD Explores Regions of Deep Network Loss?
Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio
2019-05-01
The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Minima and Regularization Effects
Zhanxing Zhu, Jingfeng Wu, Bing Yu, Lei Wu, Jinwen Ma
2019-05-01
Padam: Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks
Jinghui Chen, Quanquan Gu
2019-05-01
Online Hyperparameter Adaptation via Amortized Proximal Optimization
Paul Vicol, Jeffery Z. HaoChen, Roger Grosse
2019-05-01
Learn From Neighbour: A Curriculum That Train Low Weighted Samples By Imitating
Benyuan Sun, Yizhou Wang
2019-05-01
The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares
Rong Ge, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli
2019-04-29
Making the Last Iterate of SGD Information Theoretically Optimal
Prateek Jain, Dheeraj Nagaraj, Praneeth Netrapalli
2019-04-29
SWALP: Stochastic Weight Averaging in Low-Precision Training
Guandao Yang, Tianyi Zhang, Polina Kirichenko, Junwen Bai, Andrew Gordon Wilson, Christopher De Sa
2019-04-26
Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources
Haibin Lin, Hang Zhang, Yifei Ma, Tong He, Zhi Zhang, Sheng Zha, Mu Li
2019-04-26
Communication trade-offs for synchronized distributed SGD with large step size
Kumar Kshitij Patel, Aymeric Dieuleveut
2019-04-25
Stability and Optimization Error of Stochastic Gradient Descent for Pairwise Learning
Wei Shen, Zhenhuan Yang, Yiming Ying, Xiaoming Yuan
2019-04-25
RepPoints: Point Set Representation for Object Detection
Ze Yang, Shaohui Liu, Han Hu, Liwei Wang, Stephen Lin
2019-04-25
Semi-Cyclic Stochastic Gradient Descent
Hubert Eichner, Tomer Koren, H. Brendan McMahan, Nathan Srebro, Kunal Talwar
2019-04-23
Distributed Deep Learning Strategies For Automatic Speech Recognition
Wei Zhang, Xiaodong Cui, Ulrich Finkler, Brian Kingsbury, George Saon, David Kung, Michael Picheny
2019-04-10
Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution
Yunpeng Chen, Haoqi Fan, Bing Xu, Zhicheng Yan, Yannis Kalantidis, Marcus Rohrbach, Shuicheng Yan, Jiashi Feng
2019-04-10
Centripetal SGD for Pruning Very Deep Convolutional Networks with Complicated Structure
Xiaohan Ding, Guiguang Ding, Yuchen Guo, Jungong Han
2019-04-08
Normal Approximation for Stochastic Gradient Descent via Non-Asymptotic Rates of Martingale CLT
Andreas Anastasiou, Krishnakumar Balasubramanian, Murat A. Erdogdu
2019-04-03
Exponentially convergent stochastic k-PCA without variance reduction
Cheng Tang
2019-04-03
A Stochastic Interpretation of Stochastic Mirror Descent: Risk-Sensitive Optimality
Navid Azizan, Babak Hassibi
2019-04-03
Lessons from Building Acoustic Models with a Million Hours of Speech
Sree Hari Krishnan Parthasarathi, Nikko Strom
2019-04-02
On the Stability and Generalization of Learning with Kernel Activation Functions
Michele Cirillo, Simone Scardapane, Steven Van Vaerenbergh, Aurelio Uncini
2019-03-28
Learning Competitive and Discriminative Reconstructions for Anomaly Detection
Kai Tian, Shuigeng Zhou, Jianping Fan, Jihong Guan
2019-03-17
Inefficiency of K-FAC for Large Batch Size Training
Linjian Ma, Gabe Montague, Jiayu Ye, Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael W. Mahoney
2019-03-14
Communication-efficient distributed SGD with Sketching
Nikita Ivkin, Daniel Rothchild, Enayat Ullah, Vladimir Braverman, Ion Stoica, Raman Arora
2019-03-12
A Distributed Hierarchical SGD Algorithm with Sparse Global Reduction
Fan Zhou, Guojing Cong
2019-03-12
Partially Shuffling the Training Data to Improve Language Models
Ofir Press
2019-03-11
Accelerating Minibatch Stochastic Gradient Descent using Typicality Sampling
Xinyu Peng, Li Li, Fei-Yue Wang
2019-03-11
Fall of Empires: Breaking Byzantine-tolerant SGD by Inner Product Manipulation
Cong Xie, Sanmi Koyejo, Indranil Gupta
2019-03-10
Time-Delay Momentum: A Regularization Perspective on the Convergence and Generalization of Stochastic Momentum for Deep Learning
Ziming Zhang, Wenju Xu, Alan Sullivan
2019-03-02
Equi-normalization of Neural Networks
Pierre Stock, Benjamin Graham, Rémi Gribonval, Hervé Jégou
2019-02-27
Distributed Byzantine Tolerant Stochastic Gradient Descent in the Era of Big Data
Richeng Jin, Xiaofan He, Huaiyu Dai
2019-02-27
Adaptive Gradient Methods with Dynamic Bound of Learning Rate
Liangchen Luo, Yuanhao Xiong, Yan Liu, Xu Sun
2019-02-26
Beating SGD Saturation with Tail-Averaging and Minibatching
Nicole Mücke, Gergely Neu, Lorenzo Rosasco
2019-02-22
Optimizing Stochastic Gradient Descent in Text Classification Based on Fine-Tuning Hyper-Parameters Approach. A Case Study on Automatic Classification of Global Terrorist Attacks
Shadi Diab
2019-02-18
MultiGrain: a unified image embedding for classes and instances
Maxim Berman, Hervé Jégou, Andrea Vedaldi, Iasonas Kokkinos, Matthijs Douze
2019-02-14
On Nonconvex Optimization for Machine Learning: Gradients, Stochasticity, and Saddle Points
Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, Michael I. Jordan
2019-02-13
Scaling Limits of Wide Neural Networks with Weight Sharing: Gaussian Process Behavior, Gradient Independence, and Neural Tangent Kernel Derivation
Greg Yang
2019-02-13
A Simple Baseline for Bayesian Uncertainty in Deep Learning
Wesley Maddox, Timur Garipov, Pavel Izmailov, Dmitry Vetrov, Andrew Gordon Wilson
2019-02-07
A Scale Invariant Flatness Measure for Deep Network Minima
Akshay Rangamani, Nam H. Nguyen, Abhishek Kumar, Dzung Phan, Sang H. Chin, Trac D. Tran
2019-02-06
The role of a layer in deep neural networks: a Gaussian Process perspective
Oded Ben-David, Zohar Ringel
2019-02-06
Distribution-Dependent Analysis of Gibbs-ERM Principle
Ilja Kuzborskij, Nicolò Cesa-Bianchi, Csaba Szepesvári
2019-02-05
Stochastic Gradient Descent for Nonconvex Learning without Bounded Gradient Assumptions
Yunwen Lei, Ting Hu, Guiying Li, Ke Tang
2019-02-03
Asymmetric Valleys: Beyond Sharp and Flat Local Minima
Haowei He, Gao Huang, Yang Yuan
2019-02-02
Uniform-in-Time Weak Error Analysis for Stochastic Gradient Descent Algorithms via Diffusion Approximation
Yuanyuan Feng, Tingran Gao, Lei Li, Jian-Guo Liu, Yulong Lu
2019-02-02
Sharp Analysis for Nonconvex SGD Escaping from Saddle Points
Cong Fang, Zhouchen Lin, Tong Zhang
2019-02-01
Compressing Gradient Optimizers via Count-Sketches
Ryan Spring, Anastasios Kyrillidis, Vijai Mohan, Anshumali Shrivastava
2019-02-01
Error Feedback Fixes SignSGD and other Gradient Compression Schemes
Sai Praneeth Karimireddy, Quentin Rebjock, Sebastian U. Stich, Martin Jaggi
2019-01-28
ICLR Reproducibility Challenge Report (Padam : Closing The Generalization Gap Of Adaptive Gradient Methods in Training Deep Neural Networks)
Harshal Mittal, Kartikey Pandey, Yash Kant
2019-01-28
99% of Distributed Optimization is a Waste of Time: The Issue and How to Fix it
Konstantin Mishchenko, Filip Hanzely, Peter Richtárik
2019-01-27
SGD: General Analysis and Improved Rates
Robert Mansel Gower, Nicolas Loizou, Xun Qian, Alibek Sailanbayev, Egor Shulgin, Peter Richtarik
2019-01-27
Augment your batch: better training with larger batches
Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry
2019-01-27
Escaping Saddle Points with Adaptive Gradient Methods
Matthew Staib, Sashank J. Reddi, Satyen Kale, Sanjiv Kumar, Suvrit Sra
2019-01-26
Generalisation dynamics of online learning in over-parameterised neural networks
Sebastian Goldt, Madhu S. Advani, Andrew M. Saxe, Florent Krzakala, Lenka Zdeborová
2019-01-25
Surrogate Losses for Online Learning of Stepsizes in Stochastic Non-Convex Optimization
Zhenxun Zhuang, Ashok Cutkosky, Francesco Orabona
2019-01-25
Fitting ReLUs via SGD and Quantized SGD
Seyed Mohammadreza Mousavi Kalan, Mahdi Soltanolkotabi, A. Salman Avestimehr
2019-01-19
Quasi-potential as an implicit regularizer for the loss function in the stochastic gradient descent
Wenqing HuZhanxing ZhuHaoyi XiongJun Huan
2019-01-18
A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
| Umut SimsekliLevent SagunMert Gurbuzbalaban
2019-01-18
Recombination of Artificial Neural Networks
Aaron VoseJacob BalmaAlex HeyeAlessandro RigazziCharles SiegelDiana MoiseBenjamin RobbinsRangan Sukumar
2019-01-12
Quantized Epoch-SGD for Communication-Efficient Distributed Learning
Shen-Yi ZhaoHao GaoWu-Jun Li
2019-01-10
FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters
Tong GengTianqi WangAng LiXi JinMartin Herbordt
2019-01-04
SGD Converges to Global Minimum in Deep Learning via Star-convex Path
Yi ZhouJunjie YangHuishuai ZhangYingbin LiangVahid Tarokh
2019-01-02
Provable Guarantees on Learning Hierarchical Generative Models with Deep CNNs
Eran MalachShai Shalev-Shwartz
2019-01-01
A continuous-time analysis of distributed stochastic gradient
Nicholas M. BoffiJean-Jacques E. Slotine
2018-12-28
Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path?
Samet OymakMahdi Soltanolkotabi
2018-12-25
Stochastic Doubly Robust Gradient
Kanghoon LeeJihye ChoiMoonsu ChaJung-Kwon LeeTaeyoon Kim
2018-12-21
Deep Online Learning via Meta-Learning: Continual Adaptation for Model-Based RL
Anusha NagabandiChelsea FinnSergey Levine
2018-12-18
Provable limitations of deep learning
Emmanuel AbbeColin Sandon
2018-12-16
Stagewise Training Accelerates Convergence of Testing Error Over SGD
Zhuoning YuanYan YanRong JinTianbao Yang
2018-12-10
What is the Effect of Importance Weighting in Deep Learning?
Jonathon ByrdZachary C. Lipton
2018-12-08
Elastic Gossip: Distributing Neural Network Training Using Gossip-like Protocols
| Siddharth Pramod
2018-12-06
Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent
Xiaowu DaiYuhua Zhu
2018-12-03
Stochastic Composite Mirror Descent: Optimal Bounds with High Probabilities
Yunwen LeiKe Tang
2018-12-01
A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication
Peng JiangGagan Agrawal
2018-12-01
Exact natural gradient in deep linear networks and its application to the nonlinear case
Alberto BernacchiaMate LengyelGuillaume Hennequin
2018-12-01
On the Local Hessian in Back-propagation
Huishuai ZhangWei ChenTie-Yan Liu
2018-12-01
Training Deep Models Faster with Robust, Approximate Importance Sampling
Tyler B. JohnsonCarlos Guestrin
2018-12-01
How SGD Selects the Global Minima in Over-parameterized Learning: A Dynamical Stability Perspective
| Lei WuChao MaWeinan E
2018-12-01
Variance-Reduced Stochastic Gradient Descent on Streaming Data
Ellango JothimurugesanAshraf TahmasbiPhillip GibbonsSrikanta Tirthapura
2018-12-01
ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network
| Sachin MehtaMohammad RastegariLinda ShapiroHannaneh Hajishirzi
2018-11-28
Stochastic Gradient Push for Distributed Deep Learning
| Mahmoud AssranNicolas LoizouNicolas BallasMichael Rabbat
2018-11-27
The promises and pitfalls of Stochastic Gradient Langevin Dynamics
Nicolas BrosseAlain DurmusEric Moulines
2018-11-25
Hydra: A Peer to Peer Distributed Training & Data Collection Framework
| Vaibhav MathurKaranbir Chahal
2018-11-24
HyperAdam: A Learnable Task-Adaptive Adam for Network Training
| Shipeng WangJian SunZongben Xu
2018-11-22
Deep Frank-Wolfe For Neural Network Optimization
| Leonard BerradaAndrew ZissermanM. Pawan Kumar
2018-11-19
Minimum weight norm models do not always generalize well for over-parameterized problems
Vatsal ShahAnastasios KyrillidisSujay Sanghavi
2018-11-16
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
Zeyuan Allen-ZhuYuanzhi LiYingyu Liang
2018-11-12
New Convergence Aspects of Stochastic Gradient Algorithms
Lam M. NguyenPhuong Ha NguyenPeter RichtárikKatya ScheinbergMartin TakáčMarten van Dijk
2018-11-10
A Convergence Theory for Deep Learning via Over-Parameterization
Zeyuan Allen-ZhuYuanzhi LiZhao Song
2018-11-09
Deep Reinforcement Learning via L-BFGS Optimization
Jacob RafatiRoummel F. Marcia
2018-11-06
On exponential convergence of SGD in non-convex over-parametrized learning
Raef BassilyMikhail BelkinSiyuan Ma
2018-11-06
Stochastic Modified Equations and Dynamics of Stochastic Gradient Algorithms I: Mathematical Foundations
Qianxiao LiCheng TaiWeinan E
2018-11-05
Stochastic Primal-Dual Method for Empirical Risk Minimization with $\mathcal{O}(1)$ Per-Iteration Complexity
Conghui TanTong ZhangShiqian MaJi Liu
2018-11-03
Implicit Regularization of Stochastic Gradient Descent in Natural Language Processing: Observations and Implications
Deren LeiZichen SunYijun XiaoWilliam Yang Wang
2018-11-01
Accelerating SGD with momentum for over-parameterized learning
| Chaoyue LiuMikhail Belkin
2018-10-31
Kalman Gradient Descent: Adaptive Variance Reduction in Stochastic Optimization
| James Vuckovic
2018-10-29
On the Convergence Rate of Training Recurrent Neural Networks
Zeyuan Allen-ZhuYuanzhi LiZhao Song
2018-10-29
Finding Mixed Nash Equilibria of Generative Adversarial Networks
Ya-Ping HsiehChen LiuVolkan Cevher
2018-10-23
Optimality of the final model found via Stochastic Gradient Descent
Andrea Schioppa
2018-10-22
ensmallen: a flexible C++ library for efficient function optimization
| Shikhar BhardwajRyan R. CurtinMarcus EdelYannis MentekidisConrad Sanderson
2018-10-22
Exchangeability and Kernel Invariance in Trained MLPs
Russell TsuchidaFred RoostaMarcus Gallagher
2018-10-19
Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD
Jianyu WangGauri Joshi
2018-10-19
Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity
Chulhee YunSuvrit SraAli Jadbabaie
2018-10-17
Quasi-hyperbolic momentum and Adam for deep learning
| Jerry MaDenis Yarats
2018-10-16
Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks
Xiaodong CuiWei ZhangZoltán TüskeMichael Picheny
2018-10-16
Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron
Sharan VaswaniFrancis BachMark Schmidt
2018-10-16
Training Deep Neural Network in Limited Precision
Hyunsun ParkJun Haeng LeeYoungmin OhSangwon HaSeungwon Lee
2018-10-12
Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes
Roman NovakLechao XiaoJaehoon LeeYasaman BahriGreg YangJiri HronDaniel A. AbolafiaJeffrey PenningtonJascha Sohl-Dickstein
2018-10-11
signSGD with Majority Vote is Communication Efficient And Fault Tolerant
Jeremy BernsteinJiawei ZhaoKamyar AzizzadenesheliAnima Anandkumar
2018-10-11
Tight Dimension Independent Lower Bound on the Expected Convergence Rate for Diminishing Step Sizes in SGD
Phuong Ha NguyenLam M. NguyenMarten van Dijk
2018-10-10
Anytime Stochastic Gradient Descent: A Time to Hear from all the Workers
Nuwan FerdinandStark Draper
2018-10-06
Continuous-time Models for Stochastic Optimization Algorithms
| Antonio OrvietoAurelien Lucchi
2018-10-05
Large batch size training of neural networks with adversarial training and second-order information
| Zhewei YaoAmir GholamiDaiyaan ArfeenRichard LiawJoseph GonzalezKurt KeutzerMichael Mahoney
2018-10-02
Directional Analysis of Stochastic Gradient Descent via von Mises-Fisher Distributions in Deep Learning
Cheolhyoung LeeKyunghyun ChoWanmo Kang
2018-09-29
The Convergence of Sparsified Gradient Methods
Dan AlistarhTorsten HoeflerMikael JohanssonSarit KhiriratNikola KonstantinovCédric Renggli
2018-09-27
Preconditioner on Matrix Lie Group for SGD
| Xi-Lin Li
2018-09-26
Sparsified SGD with Memory
| Sebastian U. StichJean-Baptiste CordonnierMartin Jaggi
2018-09-20
Graph-Dependent Implicit Regularisation for Distributed Stochastic Subgradient Descent
Dominic RichardsPatrick Rebeschini
2018-09-18
Discovering Low-Precision Networks Close to Full-Precision Networks for Efficient Embedded Inference
Jeffrey L. McKinstrySteven K. EsserRathinakumar AppuswamyDeepika BablaniJohn V. ArthurIzzet B. YildizDharmendra S. Modha
2018-09-11
Learning Rate Adaptation for Federated and Differentially Private Learning
| Antti KoskelaAntti Honkela
2018-09-11
Privacy-Preserving Deep Learning via Weight Transmission
Le Trieu PhongTran Thi Phuong
2018-09-10
Stochastic Gradient Descent Learns State Equations with Nonlinear Activations
Samet Oymak
2018-09-09
Decentralized Differentially Private Without-Replacement Stochastic Gradient Descent
Richeng JinXiaofan HeHuaiyu Dai
2018-09-08
Online ICA: Understanding Global Dynamics of Nonconvex Optimization via Diffusion Processes
Chris Junchi LiZhaoran WangHan Liu
2018-08-29
Don't Use Large Mini-Batches, Use Local SGD
| Tao LinSebastian U. StichKumar Kshitij PatelMartin Jaggi
2018-08-22
Cooperative SGD: A Unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms
Jianyu WangGauri Joshi
2018-08-22
Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions
Zaiyi ChenZhuoning YuanJinfeng YiBowen ZhouEnhong ChenTianbao Yang
2018-08-20
Weighted AdaGrad with Unified Momentum
Fangyu ZouLi ShenZequn JieJu SunWei Liu
2018-08-10
Ensemble Kalman Inversion: A Derivative-Free Technique For Machine Learning Tasks
Nikola B. KovachkiAndrew M. Stuart
2018-08-10
Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
Yuanzhi LiYingyu Liang
2018-08-03
Stochastic Gradient Descent with Biased but Consistent Gradient Estimators
| Jie ChenRonny Luss
2018-07-31
A Surprising Linear Relationship Predicts Test Performance in Deep Networks
| Qianli LiaoBrando MirandaAndrzej BanburskiJack HidaryTomaso Poggio
2018-07-25
signProx: One-Bit Proximal Algorithm for Nonconvex Stochastic Optimization
Xiaojian XuUlugbek S. Kamilov
2018-07-20
Bayesian filtering unifies adaptive and non-adaptive neural network optimization methods
Laurence Aitchison
2018-07-19
Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning
Hao YuSen YangShenghuo Zhu
2018-07-17
Evolving Differentiable Gene Regulatory Networks
Dennis G WilsonKyle HarringtonSylvain Cussat-BlancHervé Luga
2018-07-16
On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length
| Stanisław JastrzębskiZachary KentonNicolas BallasAsja FischerYoshua BengioAmos Storkey
2018-07-13
Metalearning with Hebbian Fast Weights
Tsendsuren MunkhdalaiAdam Trischler
2018-07-12
Maximizing Invariant Data Perturbation with Stochastic Optimization
| Kouichi IkenoSatoshi Hara
2018-07-12
Quasi-Monte Carlo Variational Inference
Alexander BuchholzFlorian WenzelStephan Mandt
2018-07-04
Batch IS NOT Heavy: Learning Word Representations From All Samples
Xin XinFajie YuanXiangnan HeJoemon M. Jose
2018-07-01
Random Shuffling Beats SGD after Finite Epochs
Jeff Z. HaoChenSuvrit Sra
2018-06-26
Faster SGD training by minibatch persistency
Matteo FischettiIacopo MandatelliDomenico Salvagnin
2018-06-19
Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks
| Jinghui ChenDongruo ZhouYiqi TangZiyan YangYuan CaoQuanquan Gu
2018-06-18
Using Mode Connectivity for Loss Landscape Analysis
Akhilesh GotmareNitish Shirish KeskarCaiming XiongRichard Socher
2018-06-18
There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average
| Ben AthiwaratkunMarc FinziPavel IzmailovAndrew Gordon Wilson
2018-06-14
Boosted Training of Convolutional Neural Networks for Multi-Class Segmentation
Lorenz BergerEoin HydeMatt GibbNevil PavithranGarin KellyFaiz MumtazSébastien Ourselin
2018-06-13
When Will Gradient Methods Converge to Max-margin Classifier under ReLU Models?
Tengyu XuYi ZhouKaiyi JiYingbin Liang
2018-06-12
The Effect of Network Width on the Performance of Large-batch Training
Lingjiao ChenHongyi WangJinman ZhaoDimitris PapailiopoulosParaschos Koutris
2018-06-11
Full deep neural network training on a pruned weight budget
| Maximilian GolubGuy LemieuxMieszko Lis
2018-06-11
Lightweight Stochastic Optimization for Minimizing Finite Sums with Infinite Data
Shuai ZhengJames T. Kwok
2018-06-08
Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate
Mor Shpigel NacsonNathan SrebroDaniel Soudry
2018-06-05
Probabilistic Deep Learning using Random Sum-Product Networks
Robert PeharzAntonio VergariKarl StelznerAlejandro MolinaMartin TrappKristian KerstingZoubin Ghahramani
2018-06-05
Backdrop: Stochastic Backpropagation
| Siavash GolkarKyle Cranmer
2018-06-04
Geometry Aware Constrained Optimization Techniques for Deep Learning
Soumava Kumar RoyZakaria MhammediMehrtash Harandi
2018-06-01
On Consensus-Optimality Trade-offs in Collaborative Deep Learning
Zhanhong JiangAditya BaluChinmay HegdeSoumik Sarkar
2018-05-30
How Much Restricted Isometry is Needed In Nonconvex Matrix Recovery?
Richard Y. ZhangCédric JoszSomayeh SojoudiJavad Lavaei
2018-05-25
Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes
Loucas Pillaud-VivienAlessandro RudiFrancis Bach
2018-05-25
Zeno: Distributed Stochastic Gradient Descent with Suspicion-based Fault-tolerance
| Cong XieOluwasanmi KoyejoIndranil Gupta
2018-05-25
Local SGD Converges Fast and Communicates Little
Sebastian U. Stich
2018-05-24
Predictive Local Smoothness for Stochastic Gradient Methods
Jun LiHongfu LiuBineng ZhongYue WuYun Fu
2018-05-23
Gradient Energy Matching for Distributed Asynchronous Gradient Descent
| Joeri HermansGilles Louppe
2018-05-22
SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning
| Wei WenYandan WangFeng YanCong XuChunpeng WuYiran ChenHai Li
2018-05-21
Small steps and giant leaps: Minimal Newton solvers for Deep Learning
| João F. HenriquesSebastien EhrhardtSamuel AlbanieAndrea Vedaldi
2018-05-21
Interpolatron: Interpolation or Extrapolation Schemes to Accelerate Optimization for Deep Neural Networks
Guangzeng XieYitan WangShuchang ZhouZhihua Zhang
2018-05-17
Differential Equations for Modeling Asynchronous Algorithms
Li HeQi MengWei ChenZhi-Ming MaTie-Yan Liu
2018-05-08
Implementation of Stochastic Quasi-Newton's Method in PyTorch
| Yingkai LiHuidong Liu
2018-05-07
Trainability and Accuracy of Neural Networks: An Interacting Particle System Approach
Grant M. RotskoffEric Vanden-Eijnden
2018-05-02
A Scalable Discrete-Time Survival Model for Neural Networks
| Michael F. GensheimerBalasubramanian Narasimhan
2018-05-02
k-SVRG: Variance Reduction for Large Scale Optimization
Anant RajSebastian U. Stich
2018-05-02
Multi-representation Ensembles and Delayed SGD Updates Improve Syntax-based NMT
Danielle SaundersFelix StahlbergAdria de GispertBill Byrne
2018-05-01
A Mean Field View of the Landscape of Two-Layers Neural Networks
Song MeiAndrea MontanariPhan-Minh Nguyen
2018-04-18
Active Mini-Batch Sampling using Repulsive Point Processes
| Cheng ZhangCengiz ÖztireliStephan MandtGiampiero Salvi
2018-04-08
Byzantine Stochastic Gradient Descent
Dan AlistarhZeyuan Allen-ZhuJerry Li
2018-03-23
The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory
Dan AlistarhChristopher De SaNikola Konstantinov
2018-03-23
Lower error bounds for the stochastic gradient descent optimization algorithm: Sharp convergence rates for slowly and fast decaying learning rates
Arnulf JentzenPhilippe von Wurstemberger
2018-03-22
Escaping Saddles with Stochastic Gradients
Hadi DaneshmandJonas KohlerAurelien LucchiThomas Hofmann
2018-03-15
GossipGraD: Scalable Deep Learning using Gossip Communication based Asynchronous Gradient Descent
Jeff DailyAbhinav VishnuCharles SiegelThomas WarfelVinay Amatya
2018-03-15
On the insufficiency of existing momentum schemes for Stochastic Optimization
| Rahul KidambiPraneeth NetrapalliPrateek JainSham M. Kakade
2018-03-15
Averaging Weights Leads to Wider Optima and Better Generalization
| Pavel IzmailovDmitrii PodoprikhinTimur GaripovDmitry VetrovAndrew Gordon Wilson
2018-03-14
Self-Similar Epochs: Value in Arrangement
Eliav BuchnikEdith CohenAvinatan HassidimYossi Matias
2018-03-14
Model-Agnostic Private Learning via Stability
Raef BassilyOm ThakkarAbhradeep Thakurta
2018-03-14
High-Accuracy Low-Precision Training
| Christopher De SaMegan LeszczynskiJian ZhangAlana MarzoevChristopher R. AbergerKunle OlukotunChristopher Ré
2018-03-09
Fast Convergence for Stochastic and Distributed Gradient Descent in the Interpolation Limit
Partha P Mitra
2018-03-08
Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD
Sanghamitra DuttaGauri JoshiSoumyadip GhoshParijat DubePriya Nagpurkar
2018-03-03
Not All Samples Are Created Equal: Deep Learning with Importance Sampling
| Angelos KatharopoulosFrançois Fleuret
2018-03-02
The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects
Zhanxing ZhuJingfeng WuBing YuLei WuJinwen Ma
2018-03-01
Train Feedforward Neural Network with Layer-wise Adaptive Rate via Approximating Back-matching Propagation
Huishuai ZhangWei ChenTie-Yan Liu
2018-02-27
Shampoo: Preconditioned Stochastic Tensor Optimization
| Vineet GuptaTomer KorenYoram Singer
2018-02-26
A Walk with SGD
Chen XingDevansh ArpitChristos TsirigotisYoshua Bengio
2018-02-24
Stochastic Gradient Descent on Highly-Parallel Architectures
| Yujing MaFlorin RusuMartin Torres
2018-02-24
Asynchronous Byzantine Machine Learning (the case of SGD)
Georgios DamaskinosEl Mahdi El MhamdiRachid GuerraouiRhicheek PatraMahsa Taziki
2018-02-22
The Hidden Vulnerability of Distributed Learning in Byzantium
El Mahdi El MhamdiRachid GuerraouiSébastien Rouault
2018-02-22
Generalization Error Bounds with Probabilistic Guarantee for SGD in Nonconvex Optimization
Yi ZhouYingbin LiangHuishuai Zhang
2018-02-19
An Alternative View: When Does SGD Escape Local Minima?
Robert KleinbergYuanzhi LiYang Yuan
2018-02-17
The Role of Information Complexity and Randomization in Representation Learning
Matías VeraPablo PiantanidaLeonardo Rey Vega
2018-02-14
Uncertainty Quantification for Online Learning and Stochastic Approximation via Hierarchical Incremental Gradient Descent
Weijie J. SuYuancheng Zhu
2018-02-13
signSGD: Compressed Optimisation for Non-Convex Problems
| Jeremy BernsteinYu-Xiang WangKamyar AzizzadenesheliAnima Anandkumar
2018-02-13
SGD and Hogwild! Convergence Without the Bounded Gradients Assumption
Lam M. NguyenPhuong Ha NguyenMarten van DijkPeter RichtárikKatya ScheinbergMartin Takáč
2018-02-11
A predictor-corrector method for the training of deep neural networks
Yatin Saraiya
2018-01-19
When Does Stochastic Gradient Algorithm Work Well?
Lam M. NguyenNam H. NguyenDzung T. PhanJayant R. KalagnanamKatya Scheinberg
2018-01-18
MXNET-MPI: Embedding MPI parallelism in Parameter Server Task Model for scaling Deep Learning
Amith R MamidalaGeorgios KolliasChris WardFausto Artico
2018-01-11
How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD
Zeyuan Allen-Zhu
2018-01-08
Theory of Deep Learning IIb: Optimization Properties of SGD
Chiyuan ZhangQianli LiaoAlexander RakhlinBrando MirandaNoah GolowichTomaso Poggio
2018-01-07
Kronecker-factored Curvature Approximations for Recurrent Neural Networks
James MartensJimmy BaMatt Johnson
2018-01-01
Better Generalization by Efficient Trust Region Method
Xuanqing LiuJason D. LeeCho-Jui Hsieh
2018-01-01
Sparse Regularized Deep Neural Networks For Efficient Embedded Learning
Jia Bi
2018-01-01
Convergence rate of sign stochastic gradient descent for non-convex functions
Jeremy BernsteinKamyar AzizzadenesheliYu-Xiang WangAnima Anandkumar
2018-01-01
A comparison of second-order methods for deep convolutional neural networks
Patrick H. ChenCho-jui Hsieh
2018-01-01
Faster Distributed Synchronous SGD with Weak Synchronization
Cong XieOluwasanmi O. KoyejoIndranil Gupta
2018-01-01
LSH-Sampling Breaks the Computational Chicken-and-Egg Loop in Adaptive Stochastic Gradient Estimation
Beidi ChenYingchen XuAnshumali Shrivastava
2018-01-01
Demystifying overcomplete nonlinear auto-encoders: fast SGD convergence towards sparse representation from random initialization
Cheng TangClaire Monteleoni
2018-01-01
Learning Efficient Tensor Representations with Ring Structure Networks
Qibin ZhaoMasashi SugiyamaLonghao YuanAndrzej Cichocki
2018-01-01
A Painless Attention Mechanism for Convolutional Neural Networks
Pau RodríguezGuillem CucurullJordi GonzàlezJosep M. GonfausXavier Roca
2018-01-01
True Asymptotic Natural Gradient Optimization
Yann Ollivier
2017-12-22
Improving Generalization Performance by Switching from Adam to SGD
| Nitish Shirish KeskarRichard Socher
2017-12-20
The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning
Siyuan MaRaef BassilyMikhail Belkin
2017-12-18
On the Relationship Between the OpenAI Evolution Strategy and Stochastic Gradient Descent
Xingwen ZhangJeff CluneKenneth O. Stanley
2017-12-18
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
| Yujun LinSong HanHuizi MaoYu WangWilliam J. Dally
2017-12-05
Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent
Peva BlanchardEl Mahdi El MhamdiRachid GuerraouiJulien Stainer
2017-12-01
Nonlinear Acceleration of Stochastic Algorithms
Damien ScieurFrancis BachAlexandre D'Aspremont
2017-12-01
Neon2: Finding Local Minima via First-Order Oracles
Zeyuan Allen-ZhuYuanzhi Li
2017-11-17
Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs
Shaohuai ShiXiaowen Chu
2017-11-16
Decoupled Weight Decay Regularization
| Ilya LoshchilovFrank Hutter
2017-11-14
Three Factors Influencing Minima in SGD
Stanisław JastrzębskiZachary KentonDevansh ArpitNicolas BallasAsja FischerYoshua BengioAmos Storkey
2017-11-13
Analysis of Biased Stochastic Gradient Descent Using Sequential Semidefinite Programs
Bin HuPeter SeilerLaurent Lessard
2017-11-03
Don't Decay the Learning Rate, Increase the Batch Size
| Samuel L. SmithPieter-Jan KindermansChris YingQuoc V. Le
2017-11-01
Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks
Pratik ChaudhariStefano Soatto
2017-10-30
Linearly convergent stochastic heavy ball method for minimizing generalization error
Nicolas LoizouPeter Richtárik
2017-10-30
SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data
Alon BrutzkusAmir GlobersonEran MalachShai Shalev-Shwartz
2017-10-27
Improving Negative Sampling for Word Representation using Self-embedded Features
Long ChenFajie YuanJoemon M. JoseWeinan Zhang
2017-10-26
A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)
Prateek JainSham M. KakadeRahul KidambiPraneeth NetrapalliVenkata Krishna PillutlaAaron Sidford
2017-10-25
Stability and Generalization of Learning Algorithms that Converge to Global Optima
Zachary CharlesDimitris Papailiopoulos
2017-10-23
A Novel Stochastic Stratified Average Gradient Method: Convergence Rate and Its Complexity
Aixiang ChenBingchuan ChenXiaolong ChaiRui BianHengguang Li
2017-10-21
Asynchronous Decentralized Parallel Stochastic Gradient Descent
| Xiangru LianWei ZhangCe ZhangJi Liu
2017-10-18
Graph Drawing by Stochastic Gradient Descent
Jonathan X. ZhengSamraat PawarDan F. M. Goodman
2017-10-12
Synkhronos: a Multi-GPU Theano Extension for Data Parallelism
Adam StookePieter Abbeel
2017-10-11
Convergence Analysis of Distributed Stochastic Gradient Descent with Shuffling
Qi MengWei ChenYue WangZhi-Ming MaTie-Yan Liu
2017-09-29
Probabilistic Synchronous Parallel
Liang WangBen CatterallRichard Mortier
2017-09-22
Neural Optimizer Search with Reinforcement Learning
| Irwan BelloBarret ZophVijay VasudevanQuoc V. Le
2017-09-21
A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent
| Ben London
2017-09-19
Scalable Support Vector Clustering Using Budget
Tung PhamTrung LeHang Dang
2017-09-19
The Impact of Local Geometry and Batch Size on Stochastic Gradient Descent for Nonconvex Problems
Vivak Patel
2017-09-14
Normalized Direction-preserving Adam
| Zijun ZhangLin MaZongpeng LiChuan Wu
2017-09-13
Stochastic Gradient Descent: Going As Fast As Possible But Not Faster
Alice Schoenauer-SebagMarc SchoenauerMichèle Sebag
2017-09-05
Natasha 2: Faster Non-Convex Optimization Than SGD
Zeyuan Allen-Zhu
2017-08-29
Second-Order Optimization for Non-Convex Machine Learning: An Empirical Study
Peng XuFarbod Roosta-KhorasaniMichael W. Mahoney
2017-08-25
Weighted parallel SGD for distributed unbalanced-workload training system
Cheng DaningLi ShigangZhang Yunquan
2017-08-16
Noisy Softmax: Improving the Generalization Ability of DCNN via Postponing the Early Softmax Saturation
Binghui ChenWeihong DengJunping Du
2017-08-12
Stochastic Optimization with Bandit Sampling
Farnood SalehiL. Elisa CelisPatrick Thiran
2017-08-08
Stochastic Adaptive Quasi-Newton Methods for Minimizing Expected Values
Chaoxu ZhouWenbo GaoDonald Goldfarb
2017-08-01
Bridging the Gap between Constant Step Size Stochastic Gradient Descent and Markov Chains
Aymeric DieuleveutAlain DurmusFrancis Bach
2017-07-20
Block-Normalized Gradient Method: An Empirical Study for Training Deep Neural Network
| Adams Wei YuLei HuangQihang LinRuslan SalakhutdinovJaime Carbonell
2017-07-16
Dual Path Networks
| Yunpeng ChenJianan LiHuaxin XiaoXiaojie JinShuicheng YanJiashi Feng
2017-07-06
Parle: parallelizing stochastic gradient descent
Pratik ChaudhariCarlo BaldassiRiccardo ZecchinaStefano SoattoAmeet TalwalkarAdam Oberman
2017-07-03
On Scalable Inference with Stochastic Gradient Descent
Yixin FangJinfeng XuLei Yang
2017-07-01
Spectrally-normalized margin bounds for neural networks
Peter BartlettDylan J. FosterMatus Telgarsky
2017-06-26
Collaborative Deep Learning in Fixed Topology Networks
Zhanhong JiangAditya BaluChinmay HegdeSoumik Sarkar
2017-06-23
Gradient Diversity: a Key Ingredient for Scalable Distributed Learning
Dong YinAshwin PananjadyMax LamDimitris PapailiopoulosKannan RamchandranPeter Bartlett
2017-06-18
YellowFin and the Art of Momentum Tuning
| Jian ZhangIoannis Mitliagkas
2017-06-12
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
| Priya GoyalPiotr DollárRoss GirshickPieter NoordhuisLukasz WesolowskiAapo KyrolaAndrew TullochYangqing JiaKaiming He
2017-06-08
CATERPILLAR: Coarse Grain Reconfigurable Architecture for Accelerating the Training of Deep Neural Networks
Yuanfang LiArdavan Pedram
2017-06-01
Reinforcement Learning for Learning Rate Control
Chang XuTao QinGang WangTie-Yan Liu
2017-05-31
Online to Offline Conversions, Universality and Adaptive Minibatch Sizes
Kfir Y. Levy
2017-05-30
Convergence Analysis of Two-layer Neural Networks with ReLU Activation
Yuanzhi LiYang Yuan
2017-05-28
The Marginal Value of Adaptive Gradient Methods in Machine Learning
| Ashia C. WilsonRebecca RoelofsMitchell SternNathan SrebroBenjamin Recht
2017-05-23
On the diffusion approximation of nonconvex stochastic gradient descent
Wenqing HuChris Junchi LiLei LiJian-Guo Liu
2017-05-22
Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients
Lukas BallesPhilipp Hennig
2017-05-22
Parallel Stochastic Gradient Descent with Sound Combiners
Saeed MalekiMadanlal MusuvathiTodd Mytkowicz
2017-05-22
Statistical inference using SGD
Tianyang LiLiu LiuAnastasios KyrillidisConstantine Caramanis
2017-05-21
Determinantal Point Processes for Mini-Batch Diversification
Cheng ZhangHedvig KjellstromStephan Mandt
2017-05-01
Parseval Networks: Improving Robustness to Adversarial Examples
Moustapha CissePiotr BojanowskiEdouard GraveYann DauphinNicolas Usunier
2017-04-28
Linear Convergence of Accelerated Stochastic Gradient Descent for Nonconvex Nonsmooth Optimization
Feihu HuangSongcan Chen
2017-04-26
Active Bias: Training More Accurate Neural Networks by Emphasizing High Variance Samples
| Haw-Shiuan ChangErik Learned-MillerAndrew McCallum
2017-04-24
Stochastic Gradient Descent as Approximate Bayesian Inference
| Stephan MandtMatthew D. HoffmanDavid M. Blei
2017-04-13
Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data
| Gintare Karolina DziugaiteDaniel M. Roy
2017-03-31
Theory II: Landscape of the Empirical Risk in Deep Learning
Qianli LiaoTomaso Poggio
2017-03-28
Empirical Evaluation of Parallel Training Algorithms on Acoustic Modeling
Wenpeng LiBinBin ZhangLei XieDong Yu
2017-03-17
Separation of time scales and direct computation of weights in deep neural networks
Nima DehmamyNeda RohaniAggelos Katsaggelos
2017-03-14
Data-Dependent Stability of Stochastic Gradient Descent
Ilja KuzborskijChristoph H. Lampert
2017-03-05
A Robust Adaptive Stochastic Gradient Method for Deep Learning
| Caglar GulcehreJose SoteloMarcin MoczulskiYoshua Bengio
2017-03-02
SARAH: A Novel Method for Machine Learning Problems Using Stochastic Recursive Gradient
Lam M. NguyenJie LiuKatya ScheinbergMartin Takáč
2017-03-01
Learning What Data to Learn
Yang FanFei TianTao QinJiang BianTie-Yan Liu
2017-02-28
McKernel: A Library for Approximate Kernel Expansions in Log-linear Time
| Joachim D. CurtóIrene C. ZarzaFeng YangAlexander J. SmolaFernando De La TorreChong-Wah NgoLuc Van Gool
2017-02-27
SGD Learns the Conjugate Kernel Class of the Network
Amit Daniely
2017-02-27
On SGD's Failure in Practice: Characterizing and Overcoming Stalling
Vivak Patel
2017-02-01
Reinforced stochastic gradient descent for deep neural network learning
Haiping HuangTaro Toyoizumi
2017-01-27
Optimization on Product Submanifolds of Convolution Kernels
Mete OzayTakayuki Okatani
2017-01-22
Efficient Distributed Semi-Supervised Learning using Stochastic Regularization over Affinity Graphs
Sunil ThulasidasanJeffrey BilmesGarrett Kenyon
2016-12-15
Tuning the Scheduling of Distributed Stochastic Gradient Descent with Bayesian Optimization
Valentin DalibardMichael SchaarschmidtEiko Yoneki
2016-12-01
How to scale distributed deep learning?
Peter H. JinQiaochu YuanForrest IandolaKurt Keutzer
2016-11-14
Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
| Pratik ChaudhariAnna ChoromanskaStefano SoattoYann LeCunCarlo BaldassiChristian BorgsJennifer ChayesLevent SagunRiccardo Zecchina
2016-11-06
Quasi-Recurrent Neural Networks
| James BradburyStephen MerityCaiming XiongRichard Socher
2016-11-05
Statistical Inference for Model Parameters in Stochastic Gradient Descent
Xi ChenJason D. LeeXin T. TongYichen Zhang
2016-10-27
Optimization on Submanifolds of Convolution Kernels in CNNs
Mete OzayTakayuki Okatani
2016-10-22
An Efficient Minibatch Acceptance Test for Metropolis-Hastings
Daniel SeitaXinlei PanHaoyu ChenJohn Canny
2016-10-19
CuMF_SGD: Fast and Scalable Matrix Factorization
| Xiaolong XieWei TanLiana L. FongYun Liang
2016-10-19
Big Batch SGD: Automated Inference using Adaptive Batch Sizes
Soham DeAbhay YadavDavid JacobsTom Goldstein
2016-10-18
Parallelizing Stochastic Gradient Descent for Least Squares Regression: mini-batching, averaging, and model misspecification
| Prateek JainSham M. KakadeRahul KidambiPraneeth NetrapalliAaron Sidford
2016-10-12
QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding
| Dan AlistarhDemjan GrubicJerry LiRyota TomiokaMilan Vojnovic
2016-10-07
Near-Data Processing for Differentiable Machine Learning Models
Hyeokjun ChoeSeil LeeHyunha NamSeongsik ParkSeijoon KimEui-Young ChungSungroh Yoon
2016-10-06
Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure
| Alberto BiettiJulien Mairal
2016-10-04
Deep unsupervised learning through spatial contrasting
Elad HofferItay HubaraNir Ailon
2016-10-02
Asynchronous Stochastic Gradient Descent with Delay Compensation
| Shuxin ZhengQi MengTaifeng WangWei ChenNenghai YuZhi-Ming MaTie-Yan Liu
2016-09-27
Generalization Error Bounds for Optimization Algorithms via Stability
Qi MengYue WangWei ChenTaifeng WangZhi-Ming MaTie-Yan Liu
2016-09-27
Data Dependent Convergence for Distributed Stochastic Optimization
Avleen S. Bijral
2016-08-30
Uniform Generalization, Concentration, and Adaptive Learning
Ibrahim Alabdulmohsin
2016-08-22
Let's keep it simple, using simple architectures to outperform deeper and more complex architectures
| Seyyed Hossein HasanpourMohammad RouhaniMohsen FayyazMohammad Sabokrou
2016-08-22
Parallel SGD: When does averaging help?
Jian ZhangChristopher De SaIoannis MitliagkasChristopher Ré
2016-06-23
Bolt-on Differential Privacy for Scalable Stochastic Gradient Descent-based Analytics
| Xi WuFengan LiArun KumarKamalika ChaudhuriSomesh JhaJeffrey F. Naughton