1 code implementation • 27 Jan 2023 • Max Ryabinin, Tim Dettmers, Michael Diskin, Alexander Borzunov
Many deep learning applications benefit from using large models with billions of parameters.
no code implementations • 2 Sep 2022 • Alexander Borzunov, Dmitry Baranchuk, Tim Dettmers, Max Ryabinin, Younes Belkada, Artem Chumachenko, Pavel Samygin, Colin Raffel
However, these techniques have innate limitations: offloading is too slow for interactive inference, while APIs are not flexible enough for research that requires access to weights, attention, or logits.
1 code implementation • 7 Jul 2022 • Alexander Borzunov, Max Ryabinin, Tim Dettmers, Quentin Lhoest, Lucile Saulnier, Michael Diskin, Yacine Jernite, Thomas Wolf
The infrastructure necessary for training state-of-the-art models is becoming overly expensive, which makes training such models affordable only to large corporations and institutions.
3 code implementations • 21 Jun 2021 • Eduard Gorbunov, Alexander Borzunov, Michael Diskin, Max Ryabinin
Training such models requires a lot of computational resources (e.g., HPC clusters) that are not available to small research groups and independent researchers.
2 code implementations • NeurIPS 2021 • Michael Diskin, Alexey Bukhtiyarov, Max Ryabinin, Lucile Saulnier, Quentin Lhoest, Anton Sinitsin, Dmitry Popov, Dmitry Pyrkin, Maxim Kashirin, Alexander Borzunov, Albert Villanova del Moral, Denis Mazur, Ilia Kobelev, Yacine Jernite, Thomas Wolf, Gennady Pekhimenko
Modern deep learning applications require increasing amounts of compute to train state-of-the-art models.