no code implementations • 5 Oct 2024 • Dmitry Yarotsky, Maksim Velikanov
In the non-stochastic setting, the optimal exponent $\xi$ in the loss convergence $L_t \sim C_L t^{-\xi}$ is double that of plain GD and is achievable using Heavy Ball (HB) with a suitable schedule; this no longer works in the presence of mini-batch noise.
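For intuition only (not the paper's construction), the sketch below compares plain GD with a momentum method using the standard Nesterov schedule $\beta_t = t/(t+3)$ as a stand-in for a scheduled HB, on a toy quadratic whose Hessian spectrum and initial error follow assumed power laws, and estimates the loss-decay exponent from the tail of each curve.

```python
# Hedged illustration: power-law quadratic, plain GD vs. a scheduled momentum method.
# The spectrum, target decay, and schedule below are assumptions, not the paper's setup.
import numpy as np

d = 100_000                           # number of eigen-directions (toy problem)
k = np.arange(1, d + 1)
lam = k ** (-1.5)                     # assumed power-law Hessian eigenvalues
delta0 = k ** (-0.5)                  # assumed power-law initial error components
eta = 1.0 / lam[0]                    # stable step size

def loss(delta):
    return 0.5 * np.sum(lam * delta ** 2)

T = 1000

# Plain GD: each error component contracts by (1 - eta * lam_k) per step.
delta_gd = delta0.copy()
loss_gd = np.empty(T)
for t in range(T):
    delta_gd *= 1.0 - eta * lam
    loss_gd[t] = loss(delta_gd)

# Momentum with the Nesterov-style schedule beta_t = t / (t + 3),
# used here as a stand-in for the scheduled Heavy Ball method of the abstract.
delta_m = delta0.copy()
delta_prev = delta0.copy()
loss_m = np.empty(T)
for t in range(T):
    beta = t / (t + 3.0)
    y = delta_m + beta * (delta_m - delta_prev)   # extrapolation
    delta_prev = delta_m
    delta_m = y - eta * lam * y                   # gradient step at the extrapolated point
    loss_m[t] = loss(delta_m)

# Estimate the decay exponent xi from the log-log slope of the loss tail.
ts = np.arange(1, T + 1)
tail = ts > T // 2
fit = lambda L: -np.polyfit(np.log(ts[tail]), np.log(L[tail]), 1)[0]
print(f"estimated exponents: GD ~ {fit(loss_gd):.2f}, momentum ~ {fit(loss_m):.2f}")
```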
no code implementations • 18 Mar 2024 • Maksim Velikanov, Maxim Panov, Dmitry Yarotsky
In the present work, we consider the training of kernel models with a family of $\textit{spectral algorithms}$ specified by a profile $h(\lambda)$ and including KRR and GD as special cases.
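As a rough illustration of the spectral-algorithm viewpoint (with illustrative normalizations and names, not the paper's exact definitions), the sketch below applies a filter $h(\lambda)$ to the eigendecomposition of a kernel matrix; the KRR and gradient-descent filters are the two classical special cases mentioned in the abstract.

```python
# Hedged sketch of kernel spectral filtering; the normalization is illustrative only.
import numpy as np

def spectral_fit(K, y, h):
    """In-sample predictions U diag(lam * h(lam)) U^T y for a PSD kernel matrix K."""
    lam, U = np.linalg.eigh(K)              # K = U diag(lam) U^T
    return U @ (lam * h(lam) * (U.T @ y))

def h_krr(reg):
    # kernel ridge regression: filter 1 / (lambda + reg)
    return lambda lam: 1.0 / (lam + reg)

def h_gd(eta, t):
    # t steps of (function-space) gradient descent from zero initialization
    return lambda lam: (1.0 - (1.0 - eta * lam) ** t) / np.maximum(lam, 1e-12)

# Toy usage on a 1D RBF kernel
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1, 1, 200))
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.1 ** 2)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(200)

pred_krr = spectral_fit(K, y, h_krr(reg=1e-3))
pred_gd = spectral_fit(K, y, h_gd(eta=1.0 / np.linalg.eigvalsh(K).max(), t=500))
print("train MSE, KRR:", np.mean((pred_krr - y) ** 2),
      " GD:", np.mean((pred_gd - y) ** 2))
```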
no code implementations • 26 Feb 2024 • Dmitry Yarotsky
We explore the theoretical possibility of learning $d$-dimensional targets with $W$-parameter models by gradient flow (GF) when $W<d$.
1 code implementation • 22 Jun 2022 • Maksim Velikanov, Denis Kuznedelev, Dmitry Yarotsky
Mini-batch SGD with momentum is a fundamental algorithm for learning large predictive models.
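For reference, a minimal sketch of the mini-batch SGD-with-momentum update on a toy least-squares problem; the problem, batch size, and hyperparameters are illustrative only.

```python
# Minimal mini-batch SGD with heavy-ball momentum on linear least squares (toy setup).
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 50
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star + 0.1 * rng.standard_normal(n)

w = np.zeros(d)
v = np.zeros(d)                                   # momentum buffer
eta, beta, batch = 0.01, 0.9, 64

for step in range(2000):
    idx = rng.integers(0, n, size=batch)          # sample a mini-batch
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
    v = beta * v + grad                           # momentum accumulation
    w = w - eta * v                               # parameter update

print("final mean squared error:", np.mean((X @ w - y) ** 2))
```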
no code implementations • 24 Feb 2022 • Maksim Velikanov, Roman Kail, Ivan Anokhin, Roman Vashurin, Maxim Panov, Alexey Zaytsev, Dmitry Yarotsky
In this limit, we identify two ensemble regimes, independent and collective, depending on the architecture and initialization strategy of the ensemble models.
no code implementations • 2 Feb 2022 • Maksim Velikanov, Dmitry Yarotsky
In this paper, we propose a new spectral condition providing tighter upper bounds for problems with power law optimization trajectories.
no code implementations • NeurIPS 2021 • Maksim Velikanov, Dmitry Yarotsky
Current theoretical results on optimization trajectories of neural networks trained by gradient descent typically have the form of rigorous but potentially loose bounds on the loss values.
no code implementations • 2 May 2021 • Maksim Velikanov, Dmitry Yarotsky
Current theoretical results on optimization trajectories of neural networks trained by gradient descent typically have the form of rigorous but potentially loose bounds on the loss values.
no code implementations • 22 Feb 2021 • Dmitry Yarotsky
We call a finite family of activation functions superexpressive if any multivariate continuous function can be approximated by a neural network that uses these activations and has a fixed architecture depending only on the number of input variables (i.e., to achieve any accuracy we only need to adjust the weights, without increasing the number of neurons).
1 code implementation • ICML 2020 • Ivan Anokhin, Dmitry Yarotsky
Recent research shows that sublevel sets of the loss surfaces of overparameterized networks are connected, exactly or approximately.
no code implementations • NeurIPS 2020 • Dmitry Yarotsky, Anton Zhevnerchuk
We explore the phase diagram of approximation rates for deep neural networks and prove several new theoretical results.
no code implementations • 9 Oct 2018 • Dmitry Yarotsky
We test our general method in the special case of linear free-knot splines, and find good agreement between theory and experiment in observations of global optima, stability of stationary points, and convergence rates.
no code implementations • 26 Apr 2018 • Dmitry Yarotsky
We prove this model to be a universal approximator for continuous SE(2)-equivariant signal transformations.
no code implementations • 10 Feb 2018 • Dmitry Yarotsky
We consider approximations of general continuous functions on finite-dimensional cubes by general deep ReLU neural networks and study the approximation rates with respect to the modulus of continuity of the function and the total number of weights $W$ in the network.
no code implementations • 3 May 2017 • Dmitry Yarotsky
We consider approximations of 1D Lipschitz functions by deep ReLU networks of a fixed width.
1 code implementation • 16 Jan 2017 • Dmitry Yarotsky
We introduce a library of geometric voxel features for CAD surface recognition/retrieval tasks.
2 code implementations • 3 Oct 2016 • Dmitry Yarotsky
We study the expressive power of shallow and deep neural networks with piecewise-linear activation functions.
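A classic illustration in this spirit (a standard construction, not necessarily the paper's exact one): composing a ReLU "hat" function with itself $k$ times produces a sawtooth with exponentially many linear pieces from only $O(k)$ units, a hallmark of depth efficiency for piecewise-linear networks.

```python
# Depth efficiency sketch: the k-fold composition of a ReLU "hat" has 2^k linear
# pieces on [0, 1], while using only O(k) ReLU units (standard construction).
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x):
    # piecewise-linear tent: 0 -> 0, 0.5 -> 1, 1 -> 0, and 0 outside [0, 1]
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1)

def sawtooth(x, k):
    # k-fold composition hat∘...∘hat: a constant-width, depth-k ReLU network
    for _ in range(k):
        x = hat(x)
    return x

x = np.linspace(0, 1, 10001)
for k in (1, 2, 3, 4):
    y = sawtooth(x, k)
    # count linear pieces via nonzero second differences at the breakpoints
    pieces = 1 + np.count_nonzero(np.abs(np.diff(y, 2)) > 1e-8)
    print(f"depth {k}: {pieces} linear pieces on [0, 1] (expected {2 ** k})")
```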
1 code implementation • 5 Sep 2016 • Mikhail Belyaev, Evgeny Burnaev, Ermek Kapushev, Maxim Panov, Pavel Prikhodko, Dmitry Vetrov, Dmitry Yarotsky
We describe GTApprox, a new tool for medium-scale surrogate modeling in industrial design.