no code implementations • 8 Aug 2023 • Daria Cherniuk, Stanislav Abukhovich, Anh-Huy Phan, Ivan Oseledets, Andrzej Cichocki, Julia Gusak
Tensor decomposition of convolutional and fully-connected layers is an effective way to reduce the number of parameters and FLOPs in neural networks.
1 code implementation • 3 Jul 2023 • Xunyi Zhao, Théotime Le Hellard, Lionel Eyraud, Julia Gusak, Olivier Beaumont
We show through experiments on many models that Rockmate is as fast as Rotor and as efficient as Checkmate, and that in many cases it yields significantly lower memory consumption for activations (by a factor of 2 to 5) at a rather negligible overhead (on the order of 10% to 20%).
no code implementations • 5 Jun 2023 • Viktoriia Chekalina, Georgii Novikov, Julia Gusak, Ivan Oseledets, Alexander Panchenko
On the downstream tasks, including language understanding and text summarization, the model performs similarly to the original GPT-2 model.
no code implementations • 21 Feb 2022 • Julia Gusak, Daria Cherniuk, Alena Shilova, Alexander Katrutsa, Daniel Bershatsky, Xunyi Zhao, Lionel Eyraud-Dubois, Oleg Shlyazhko, Denis Dimitrov, Ivan Oseledets, Olivier Beaumont
Modern Deep Neural Networks (DNNs) require significant memory to store weights, activations, and other intermediate tensors during training.
2 code implementations • 1 Feb 2022 • Georgii Novikov, Daniel Bershatsky, Julia Gusak, Alex Shonenkov, Denis Dimitrov, Ivan Oseledets
Every modern neural network model has quite a few pointwise nonlinearities in its architecture, and such operations induce additional memory costs which, as we show, can be significantly reduced by quantization of the gradients.
2 code implementations • 31 Jan 2022 • Daniel Bershatsky, Aleksandr Mikhalev, Alexandr Katrutsa, Julia Gusak, Daniil Merkulov, Ivan Oseledets
Also, we investigate the variance of the gradient estimate induced by the randomized matrix multiplication.
1 code implementation • 15 Mar 2021 • Julia Gusak, Alexandr Katrutsa, Talgat Daulbaev, Andrzej Cichocki, Ivan Oseledets
Moreover, we show that the right choice of solver parameterization can significantly affect neural ODE models in terms of robustness to adversarial attacks.
no code implementations • ECCV 2020 • Anh-Huy Phan, Konstantin Sobolev, Konstantin Sozykin, Dmitry Ermilov, Julia Gusak, Petr Tichavsky, Valeriy Glukhov, Ivan Oseledets, Andrzej Cichocki
Most state-of-the-art deep neural networks are overparameterized and exhibit a high computational cost.
1 code implementation • ICLR Workshop DeepDiffEq 2019 • Julia Gusak, Larisa Markeeva, Talgat Daulbaev, Alexandr Katrutsa, Andrzej Cichocki, Ivan Oseledets
Normalization is an important and widely investigated technique in deep learning.
1 code implementation • NeurIPS 2020 • Talgat Daulbaev, Alexandr Katrutsa, Larisa Markeeva, Julia Gusak, Andrzej Cichocki, Ivan Oseledets
We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models.
1 code implementation • 29 Oct 2019 • Chunfeng Cui, Kaiqi Zhang, Talgat Daulbaev, Julia Gusak, Ivan Oseledets, Zheng Zhang
Secondly, we propose analyzing the vulnerability of a neural network using active subspace and finding an additive universal adversarial attack vector that can misclassify a dataset with a high probability.
no code implementations • 15 Oct 2019 • Julia Gusak, Talgat Daulbaev, Evgeny Ponomarev, Andrzej Cichocki, Ivan Oseledets
We introduce a new method for speeding up the inference of deep neural networks.
3 code implementations • 24 Mar 2019 • Julia Gusak, Maksym Kholiavchenko, Evgeny Ponomarev, Larisa Markeeva, Ivan Oseledets, Andrzej Cichocki
Low-rank tensor approximation is very promising for the compression of deep neural networks.