Variance Reduction in Deep Learning: More Momentum is All You Need

no code implementations • 23 Nov 2021 • Lionel Tondji, Sergii Kashubin, Moustapha Cisse

Variance reduction (VR) techniques have contributed significantly to accelerating learning with massive datasets in the smooth and strongly convex setting (Schmidt et al., 2017; Johnson & Zhang, 2013; Roux et al., 2012).

Data Augmentation, Distributed Optimization

Measuring Compositional Generalization: A Comprehensive Method on Realistic Data

2 code implementations • ICLR 2020 • Daniel Keysers, Nathanael Schärli, Nathan Scales, Hylke Buisman, Daniel Furrer, Sergii Kashubin, Nikola Momchev, Danila Sinopalnikov, Lukasz Stafiniak, Tibor Tihon, Dmitry Tsarkov, Xiao Wang, Marc van Zee, Olivier Bousquet

We present a large and realistic natural language question answering dataset that is constructed according to this method, and we use it to analyze the compositional generalization ability of three machine learning architectures.

BIG-bench Machine Learning, Question Answering

