2 code implementations • Nature 2022 • Alhussein Fawzi, Matej Balog, Aja Huang, Thomas Hubert, Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Francisco J. R. Ruiz, Julian Schrittwieser, Grzegorz Swirszcz, David Silver, Demis Hassabis, Pushmeet Kohli
Particularly relevant is the case of 4 × 4 matrices in a finite field, where AlphaTensor’s algorithm improves on Strassen’s two-level algorithm for the first time, to our knowledge, since its discovery 50 years ago [2].
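For context, the baseline named above can be sketched in a few lines. This is a minimal illustration of Strassen's classical recursion applied at two levels to 4 × 4 matrices with arithmetic modulo 2; it is not AlphaTensor's algorithm. The helper names (`add`, `sub`, `split`, `strassen`) and the multiplication counter are assumptions made for this example. Two levels of recursion use 7 × 7 = 49 scalar multiplications, which is the count the paper reports improving to 47 in this setting.

```python
def add(X, Y):
    """Entrywise sum of two equally sized matrices (lists of lists)."""
    return [[x + y for x, y in zip(r, s)] for r, s in zip(X, Y)]

def sub(X, Y):
    """Entrywise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]

def split(M):
    """Split a square matrix of even size into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def strassen(A, B, counter):
    """Strassen product for sizes that are powers of two.

    counter is a one-element list that tallies scalar multiplications.
    """
    if len(A) == 1:
        counter[0] += 1
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # The seven block products of Strassen's scheme.
    M1 = strassen(add(A11, A22), add(B11, B22), counter)
    M2 = strassen(add(A21, A22), B11, counter)
    M3 = strassen(A11, sub(B12, B22), counter)
    M4 = strassen(A22, sub(B21, B11), counter)
    M5 = strassen(add(A11, A12), B22, counter)
    M6 = strassen(sub(A21, A11), add(B11, B12), counter)
    M7 = strassen(sub(A12, A22), add(B21, B22), counter)
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    # Reassemble the four quadrants into one matrix.
    top = [r + s for r, s in zip(C11, C12)]
    bottom = [r + s for r, s in zip(C21, C22)]
    return top + bottom

def strassen_mod2(A, B, counter):
    """Reduce the integer Strassen product modulo 2 (arithmetic in Z_2)."""
    return [[v % 2 for v in row] for row in strassen(A, B, counter)]

# Example 4 x 4 matrices over Z_2 (arbitrary values chosen for illustration).
A = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 1], [0, 0, 1, 1]]
B = [[0, 1, 1, 0], [1, 1, 0, 1], [1, 0, 0, 1], [0, 1, 1, 1]]
count = [0]
C = strassen_mod2(A, B, count)
print(count[0])  # 49 scalar multiplications for two levels of recursion
```

The counter only tallies the base-case scalar products, since those are the operations that the multiplication count of a bilinear algorithm refers to; additions and subtractions are not counted.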
2 code implementations • 5 Oct 2021 • James Martens, Andy Ballard, Guillaume Desjardins, Grzegorz Swirszcz, Valentin Dalibard, Jascha Sohl-Dickstein, Samuel S. Schoenholz
Using an extended and formalized version of the Q/C map analysis of Poole et al. (2016), along with Neural Tangent Kernel theory, we identify the main pathologies present in deep networks that prevent them from training fast and generalizing to unseen data, and show how these can be avoided by carefully controlling the "shape" of the network's initialization-time kernel function.
no code implementations • CVPR 2021 • Mateusz Malinowski, Dimitrios Vytiniotis, Grzegorz Swirszcz, Viorica Patraucean, Joao Carreira
How can neural networks be trained on large-volume temporal data efficiently?
no code implementations • CVPR 2020 • Mateusz Malinowski, Grzegorz Swirszcz, Joao Carreira, Viorica Patraucean
We propose Sideways, an approximate backpropagation scheme for training video models.
no code implementations • ICLR 2019 • Chongli Qin, Krishnamurthy Dvijotham, Brendan O'Donoghue, Rudy Bunel, Robert Stanforth, Sven Gowal, Jonathan Uesato, Grzegorz Swirszcz, Pushmeet Kohli
We show that a number of important properties of interest can be modeled within this class, including conservation of energy in a learned dynamics model of a physical system; semantic consistency of a classifier's output labels under adversarial perturbations; and bounding errors in a system that predicts the summation of handwritten digits.
no code implementations • 6 Feb 2019 • Wojciech Marian Czarnecki, Razvan Pascanu, Simon Osindero, Siddhant M. Jayakumar, Grzegorz Swirszcz, Max Jaderberg
The transfer of knowledge from one policy to another is an important tool in Deep Reinforcement Learning.
no code implementations • ICLR 2019 • Edward Grefenstette, Robert Stanforth, Brendan O'Donoghue, Jonathan Uesato, Grzegorz Swirszcz, Pushmeet Kohli
We show that increasing the number of parameters in adversarially-trained models increases their robustness, and in particular that ensembling smaller models while adversarially training the entire ensemble as a single model is a more efficient way of spending said budget than simply using a larger single model.
1 code implementation • 19 Nov 2016 • Grzegorz Swirszcz, Wojciech Marian Czarnecki, Razvan Pascanu
Given that deep networks are highly nonlinear systems optimized by local gradient methods, why do they not seem to be affected by bad local minima?
no code implementations • NeurIPS 2009 • Grzegorz Swirszcz, Naoki Abe, Aurelie C. Lozano
We consider the problem of variable group selection for least squares regression, namely, that of selecting groups of variables for best regression performance, leveraging and adhering to a natural grouping structure within the explanatory variables.