1 code implementation • 2 Jun 2020 • Francesco Cosentino, Harald Oberhauser, Alessandro Abate
Various flavours of Stochastic Gradient Descent (SGD) replace the expensive summation that computes the full gradient with a small sum over a randomly selected subsample of the data set; this cheap estimate, however, suffers from high variance.
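As context for the abstract, here is a minimal sketch of the baseline it describes: plain minibatch SGD on a toy least-squares problem, where the subsampled gradient is unbiased but noisy. This is not the paper's Carathéodory-based sampling scheme; all names (`X`, `y`, `theta`, `batch_size`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: minimize (1/2N) * sum_i (x_i^T theta - y_i)^2.
N, d = 10_000, 5
X = rng.normal(size=(N, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=N)

def full_gradient(theta):
    # Exact gradient: a sum over all N data points, O(N d) per step.
    return X.T @ (X @ theta - y) / N

def minibatch_gradient(theta, batch_size=32):
    # Replace the full sum with a small sum over a random subsample.
    # Unbiased estimate of the full gradient, but with high variance.
    idx = rng.choice(N, size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ theta - yb) / batch_size

theta = np.zeros(d)
for _ in range(1_000):
    theta -= 0.1 * minibatch_gradient(theta)
```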
1 code implementation • NeurIPS 2020 • Francesco Cosentino, Harald Oberhauser, Alessandro Abate
Given a discrete probability measure supported on $N$ atoms and a set of $n$ real-valued functions, there exists a probability measure that is supported on a subset of $n+1$ of the original $N$ atoms and has the same mean when integrated against each of the $n$ functions.
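This existence statement (a discrete Carathéodory-type result) can be realized constructively. Below is a minimal sketch of one standard recombination step, assuming the null space of the moment matrix is computed via SVD; the function names and tolerances are illustrative, not the authors' implementation or the randomized algorithm the paper proposes.

```python
import numpy as np

def reduce_support(atoms, weights, functions, tol=1e-12):
    """Reduce a discrete measure to at most len(functions)+1 atoms while
    preserving total mass and the integral of each test function."""
    w = np.asarray(weights, dtype=float).copy()
    # Moment matrix: first row preserves total mass, remaining rows
    # preserve the integral against each of the n functions.
    A = np.vstack([np.ones(len(atoms))] +
                  [[f(x) for x in atoms] for f in functions])
    support = np.flatnonzero(w > tol)
    while support.size > A.shape[0]:
        # Restricted to the support, A has more columns than rows,
        # so it has a nontrivial kernel; take a kernel direction v.
        _, _, Vt = np.linalg.svd(A[:, support])
        v = Vt[-1]
        # The ones row forces sum(v) = 0, so v has positive entries.
        pos = v > 0
        # Largest step keeping all weights nonnegative; at least one
        # weight hits zero, so the support shrinks each iteration.
        alpha = np.min(w[support][pos] / v[pos])
        w[support] -= alpha * v
        w[w < tol] = 0.0
        support = np.flatnonzero(w > tol)
    return w
```

Each pass zeroes at least one weight while leaving `A @ w` unchanged, so after finitely many steps at most $n+1$ atoms carry positive mass, matching the statement above.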