no code implementations • 7 Oct 2024 • Jingwei Zuo, Maksim Velikanov, Dhia Eddine Rhaiem, Ilyas Chahed, Younes Belkada, Guillaume Kunsch, Hakim Hacid
Falcon Mamba 7B is on par with Gemma 7B and outperforms models with different architecture designs, such as RecurrentGemma 9B and RWKV-v6 Finch 7B/14B.
no code implementations • 5 Oct 2024 • Dmitry Yarotsky, Maksim Velikanov
In the non-stochastic setting, the optimal exponent $\xi$ in the loss convergence $L_t \sim C_L t^{-\xi}$ is double that of plain GD and is achievable with Heavy Ball (HB) under a suitable schedule; this doubling no longer holds in the presence of mini-batch noise.
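For intuition, here is a minimal sketch (illustrative assumptions, not the paper's code: the spectrum $\lambda_k = k^{-2}$, the step size, and the fixed momentum $\beta = 0.99$ are all hypothetical choices) comparing plain GD with a Heavy Ball iteration on a quadratic whose Hessian eigenvalues follow a power law, the setting in which $L_t \sim C_L t^{-\xi}$ decay appears:

```python
# Hypothetical illustration: plain GD vs. Heavy Ball (HB) on a quadratic
# loss whose Hessian eigenvalues follow a power law, so the loss itself
# decays as a power law L_t ~ C_L * t^{-xi}.
import numpy as np

d = 2000
lam = np.arange(1, d + 1, dtype=float) ** -2.0  # assumed spectrum lambda_k = k^{-2}
w0 = np.ones(d)                                 # initial error in the eigenbasis

def loss(w):
    return 0.5 * np.sum(lam * w ** 2)

# Plain GD with constant step size 1 / lambda_max.
eta = 1.0 / lam.max()
w_gd = w0.copy()
for t in range(1, 10001):
    w_gd -= eta * lam * w_gd
    if t in (100, 1000, 10000):
        print(f"GD  t={t:6d}  L={loss(w_gd):.3e}")

# HB with a fixed momentum (a stand-in only: the exponent doubling in the
# paper is obtained with time-dependent schedules, not a constant beta).
beta = 0.99
w_hb, w_prev = w0.copy(), w0.copy()
for t in range(1, 10001):
    w_hb, w_prev = w_hb - eta * lam * w_hb + beta * (w_hb - w_prev), w_hb
    if t in (100, 1000, 10000):
        print(f"HB  t={t:6d}  L={loss(w_hb):.3e}")
```

Comparing the printed losses across decades of $t$ gives a rough estimate of the exponent $\xi$ for each method; the fixed-momentum HB above is only a stand-in for the schedules analyzed in the paper.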
no code implementations • 20 Jul 2024 • Quentin Malartic, Nilabhra Roy Chowdhury, Ruxandra Cojocaru, Mugariya Farooq, Giulia Campesan, Yasser Abdelaziz Dahou Djilali, Sanath Narayan, Ankit Singh, Maksim Velikanov, Basma El Amel Boussaha, Mohammed Al-Yafeai, Hamza Alobeidli, Leen Al Qadi, Mohamed El Amine Seddik, Kirill Fedyanin, Reda Alami, Hakim Hacid
We introduce Falcon2-11B, a foundation model trained on over five trillion tokens, and its multimodal counterpart, Falcon2-11B-vlm, which is a vision-to-text model.
no code implementations • 18 Mar 2024 • Maksim Velikanov, Maxim Panov, Dmitry Yarotsky
In the present work, we consider the training of kernel methods with a family of $\textit{spectral algorithms}$ specified by a profile $h(\lambda)$, including kernel ridge regression (KRR) and gradient descent (GD) as special cases.
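As a rough illustration (a sketch under assumed conventions, not the paper's code; the RBF kernel, ridge $\eta$, and step size $\kappa$ are hypothetical choices), the standard spectral-filter view applies a profile $h(\lambda)$ to the kernel eigenvalues, with KRR and finite-time GD recovered by particular profiles:

```python
# Sketch of kernel spectral algorithms: an estimator is defined by a filter
# profile h(lambda) applied to the eigendecomposition of the kernel matrix.
import numpy as np

def spectral_fit(K, y, h):
    """Dual coefficients alpha = V h(Lambda) V^T y for a kernel matrix K."""
    lam, V = np.linalg.eigh(K)
    return V @ (h(lam) * (V.T @ y))

# KRR with ridge eta: h(lambda) = 1 / (lambda + eta).
def h_krr(lam, eta=1e-2):
    return 1.0 / (lam + eta)

# t steps of GD with step size kappa on the kernel least-squares objective:
# h(lambda) = (1 - (1 - kappa*lambda)^t) / lambda, with limit kappa*t at 0.
def h_gd(lam, kappa, t=100):
    safe = np.maximum(lam, 1e-12)
    return np.where(lam > 1e-12,
                    (1.0 - (1.0 - kappa * safe) ** t) / safe,
                    kappa * t)

# Toy usage on 1-D data with an RBF kernel (hypothetical setup).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(50, 1))
y = np.sin(3.0 * x[:, 0]) + 0.1 * rng.standard_normal(50)
K = np.exp(-((x - x.T) ** 2) / 0.1)
kappa = 1.0 / np.linalg.eigvalsh(K).max()   # keep the GD profile stable
alpha_krr = spectral_fit(K, y, h_krr)
alpha_gd = spectral_fit(K, y, lambda lam: h_gd(lam, kappa=kappa))
```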
no code implementations • 25 Dec 2023 • Vincent Plassier, Nikita Kotelevskii, Aleksandr Rubashevskii, Fedor Noskov, Maksim Velikanov, Alexander Fishkov, Samuel Horvath, Martin Takac, Eric Moulines, Maxim Panov
Conformal Prediction (CP) stands out as a robust framework for uncertainty quantification, which is crucial for ensuring the reliability of predictions.
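For reference, the baseline split conformal procedure (the generic textbook version, not the specific method developed in the paper; `predict` and the toy data below are hypothetical) looks like this:

```python
# Generic split conformal prediction: prediction intervals with
# finite-sample marginal coverage >= 1 - alpha under exchangeability.
import numpy as np

def split_conformal(predict, X_cal, y_cal, X_test, alpha=0.1):
    """Return (lo, hi) interval endpoints for each test point."""
    scores = np.abs(y_cal - predict(X_cal))               # nonconformity scores
    n = len(scores)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)  # finite-sample correction
    q = np.quantile(scores, level, method="higher")
    preds = predict(X_test)
    return preds - q, preds + q

# Toy usage with a hypothetical fitted mean model y ~ 2x.
rng = np.random.default_rng(1)
X_cal = rng.normal(size=500)
y_cal = 2.0 * X_cal + rng.normal(size=500)
X_test = rng.normal(size=5)
lo, hi = split_conformal(lambda X: 2.0 * X, X_cal, y_cal, X_test, alpha=0.1)
```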
1 code implementation • 22 Jun 2022 • Maksim Velikanov, Denis Kuznedelev, Dmitry Yarotsky
Mini-batch SGD with momentum is a fundamental algorithm for learning large predictive models.
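For concreteness, here is the textbook mini-batch SGD-with-momentum update on a toy least-squares problem (an illustrative setup, not the paper's experiments; all sizes and hyperparameters are assumed):

```python
# Mini-batch SGD with momentum on a hypothetical least-squares problem.
import numpy as np

rng = np.random.default_rng(0)
n, d = 10000, 50
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.1 * rng.standard_normal(n)

w = np.zeros(d)
v = np.zeros(d)
eta, beta, batch = 1e-3, 0.9, 64
for step in range(2000):
    idx = rng.integers(0, n, size=batch)           # sample a mini-batch
    g = A[idx].T @ (A[idx] @ w - b[idx]) / batch   # stochastic gradient
    v = beta * v + g                               # momentum buffer
    w -= eta * v                                   # parameter update
print("final mse:", np.mean((A @ w - b) ** 2))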
no code implementations • 24 Feb 2022 • Maksim Velikanov, Roman Kail, Ivan Anokhin, Roman Vashurin, Maxim Panov, Alexey Zaytsev, Dmitry Yarotsky
In this limit, we identify two ensemble regimes, independent and collective, depending on the architecture and initialization strategy of the ensemble models.
no code implementations • 2 Feb 2022 • Maksim Velikanov, Dmitry Yarotsky
In this paper, we propose a new spectral condition that provides tighter upper bounds for problems with power-law optimization trajectories.
no code implementations • NeurIPS 2021 • Maksim Velikanov, Dmitry Yarotsky
Current theoretical results on optimization trajectories of neural networks trained by gradient descent typically have the form of rigorous but potentially loose bounds on the loss values.