1 code implementation • 11 Apr 2024 • Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Armand Joulin, Noah Fiedel, Evan Senter, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, David Budden, Arnaud Doucet, Sharad Vikram, Adam Paszke, Trevor Gale, Sebastian Borgeaud, Charlie Chen, Andy Brock, Antonia Paterson, Jenny Brennan, Meg Risdal, Raj Gundluru, Nesh Devanathan, Paul Mooney, Nilay Chauhan, Phil Culliton, Luiz Gustavo Martins, Elisa Bandy, David Huntsperger, Glenn Cameron, Arthur Zucker, Tris Warkentin, Ludovic Peran, Minh Giang, Zoubin Ghahramani, Clément Farabet, Koray Kavukcuoglu, Demis Hassabis, Raia Hadsell, Yee Whye Teh, Nando de Freitas
We introduce RecurrentGemma, a family of open language models that use Google's novel Griffin architecture.
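As a quick orientation beyond the abstract, the released RecurrentGemma checkpoints can be loaded through the Hugging Face transformers library. This is a minimal sketch, assuming transformers >= 4.40 and that the "google/recurrentgemma-2b" checkpoint id is available on the Hub:

```python
# Minimal sketch: loading a RecurrentGemma checkpoint via Hugging Face
# transformers. Assumes transformers >= 4.40 and that the
# "google/recurrentgemma-2b" checkpoint id is available on the Hub.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/recurrentgemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/recurrentgemma-2b")

inputs = tokenizer("The Griffin architecture combines", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```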
3 code implementations • 29 Feb 2024 • Soham De, Samuel L. Smith, Anushan Fernando, Aleksandar Botev, George Cristian-Muraru, Albert Gu, Ruba Haroun, Leonard Berrada, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, Arnaud Doucet, David Budden, Yee Whye Teh, Razvan Pascanu, Nando de Freitas, Caglar Gulcehre
Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale.
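To make the recurrent half of this trade-off concrete, below is a toy gated diagonal linear recurrence of the kind Griffin builds on; an illustrative stand-in, not the paper's RG-LRU layer:

```python
# Toy gated diagonal linear recurrence: h_t = a_t * h_{t-1} + (1 - a_t) * x_t.
# Illustrative only; Griffin's RG-LRU layer is more involved.
import numpy as np

def gated_linear_recurrence(x, gates):
    """x, gates: arrays of shape (seq_len, dim), gates in (0, 1)."""
    h = np.zeros(x.shape[1])
    outputs = []
    for a_t, x_t in zip(gates, x):
        h = a_t * h + (1.0 - a_t) * x_t  # leaky per-channel state update
        outputs.append(h)
    return np.stack(outputs)

x = np.random.randn(16, 8)
gates = 1.0 / (1.0 + np.exp(-np.random.randn(16, 8)))  # sigmoid gates
print(gated_linear_recurrence(x, gates).shape)  # (16, 8)
```

Because the recurrence carries a fixed-size state, the cost per generated token is constant in sequence length, which is the fast-inference property the abstract refers to.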
no code implementations • 19 Jan 2024 • Ryan Abbott, Aleksandar Botev, Denis Boyda, Daniel C. Hackett, Gurtej Kanwar, Sébastien Racanière, Danilo J. Rezende, Fernando Romero-López, Phiala E. Shanahan, Julian M. Urban
Machine-learned normalizing flows can be used in the context of lattice quantum field theory to generate statistically correlated ensembles of lattice gauge fields at different action parameters.
no code implementations • 3 May 2023 • Ryan Abbott, Michael S. Albergo, Aleksandar Botev, Denis Boyda, Kyle Cranmer, Daniel C. Hackett, Gurtej Kanwar, Alexander G. D. G. Matthews, Sébastien Racanière, Ali Razavi, Danilo J. Rezende, Fernando Romero-López, Phiala E. Shanahan, Julian M. Urban
Applications of normalizing flows to the sampling of field configurations in lattice gauge theory have so far been explored almost exclusively in two space-time dimensions.
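For readers new to the method, the sketch below shows the change-of-variables rule that all normalizing flows rely on, in a one-dimensional affine toy; the gauge-equivariant flows used in this line of work are far more elaborate:

```python
# Change-of-variables in a 1D affine flow: push base samples z through an
# invertible map f and track the log-density via the log-Jacobian. A toy,
# far simpler than gauge-equivariant flows over lattice field configurations.
import numpy as np

def flow_sample(n, scale=2.0, shift=1.0):
    z = np.random.randn(n)                          # base samples, N(0, 1)
    x = scale * z + shift                           # invertible map f(z)
    log_q = (-0.5 * z**2 - 0.5 * np.log(2 * np.pi)  # base log-density
             - np.log(abs(scale)))                  # minus log|det df/dz|
    return x, log_q

x, log_q = flow_sample(5)
print(x, log_q)
```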
no code implementations • 20 Feb 2023 • Bobby He, James Martens, Guodong Zhang, Aleksandar Botev, Andrew Brock, Samuel L Smith, Yee Whye Teh
Skip connections and normalisation layers form two standard architectural components that are ubiquitous for the training of Deep Neural Networks (DNNs), but whose precise roles are poorly understood.
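For reference, the two components in question typically appear together in a pre-norm residual block, y = x + f(LayerNorm(x)); a minimal PyTorch sketch:

```python
# A standard pre-norm residual block: a skip connection wrapped around a
# normalised sublayer. Shown for reference; the paper asks what these
# components actually do, not how to implement them.
import torch
import torch.nn as nn

class PreNormBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.f = nn.Linear(dim, dim)  # stand-in for an attention/MLP sublayer

    def forward(self, x):
        return x + self.f(self.norm(x))  # skip connection around the sublayer

print(PreNormBlock(8)(torch.randn(2, 8)).shape)  # torch.Size([2, 8])
```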
no code implementations • 14 Nov 2022 • Ryan Abbott, Michael S. Albergo, Aleksandar Botev, Denis Boyda, Kyle Cranmer, Daniel C. Hackett, Alexander G. D. G. Matthews, Sébastien Racanière, Ali Razavi, Danilo J. Rezende, Fernando Romero-López, Phiala E. Shanahan, Julian M. Urban
Recent applications of machine-learned normalizing flows to sampling in lattice field theory suggest that such methods may be able to mitigate critical slowing down and topological freezing.
1 code implementation • ICLR 2022 • Guodong Zhang, Aleksandar Botev, James Martens
However, this method (called Deep Kernel Shaping) is not fully compatible with ReLUs and produces networks that overfit significantly more than ResNets on ImageNet.
1 code implementation • NeurIPS 2021 • Irina Higgins, Peter Wirnsberger, Andrew Jaegle, Aleksandar Botev
Using SyMetric, we identify a set of architectural choices that significantly improve the performance of a previously proposed model for inferring latent dynamics from pixels, the Hamiltonian Generative Network (HGN).
2 code implementations • 9 Nov 2021 • Aleksandar Botev, Andrew Jaegle, Peter Wirnsberger, Daniel Hennes, Irina Higgins
Learning dynamics is at the heart of many important applications of machine learning (ML), such as robotics and autonomous driving.
2 code implementations • 13 Nov 2020 • James S. Spencer, David Pfau, Aleksandar Botev, W. M. C. Foulkes
The Fermionic Neural Network (FermiNet) is a recently developed neural network architecture that can be used as a wavefunction Ansatz for many-electron systems, and has already demonstrated high accuracy on small systems.
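As a toy illustration of the determinant structure FermiNet generalizes: a Slater determinant of single-electron orbitals is antisymmetric under electron exchange, because swapping two electrons swaps two rows of the orbital matrix. The orbitals below are arbitrary placeholders, not FermiNet's permutation-equivariant networks:

```python
# Toy Slater-determinant wavefunction: psi(r_1..r_n) = det[phi_j(r_i)].
# Swapping two electrons swaps two rows, flipping the sign (antisymmetry).
# Placeholder orbitals; FermiNet learns far richer, equivariant ones.
import numpy as np

def slater_logpsi(electrons, orbitals):
    """electrons: (n, 3) positions; orbitals: list of n callables."""
    phi = np.array([[orb(r) for orb in orbitals] for r in electrons])
    sign, logdet = np.linalg.slogdet(phi)
    return sign, logdet

orbitals = [lambda r, k=k: np.exp(-np.linalg.norm(r) / (k + 1)) * r[k]
            for k in range(3)]
sign, logpsi = slater_logpsi(np.random.randn(3, 3), orbitals)
print(sign, logpsi)
```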
1 code implementation • NeurIPS 2020 • David Pfau, Irina Higgins, Aleksandar Botev, Sébastien Racanière
We present a novel nonparametric algorithm for symmetry-based disentangling of data manifolds, the Geometric Manifold Component Estimator (GEOMANCER).
1 code implementation • ICLR 2020 • Peter Toth, Danilo Jimenez Rezende, Andrew Jaegle, Sébastien Racanière, Aleksandar Botev, Irina Higgins
The Hamiltonian formalism plays a central role in classical and quantum physics.
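Concretely, Hamiltonian dynamics evolve as dq/dt = dH/dp and dp/dt = -dH/dq, and can be rolled out with a symplectic (leapfrog) integrator. The toy below uses a harmonic oscillator H(q, p) = (q^2 + p^2)/2, not the paper's learned latent Hamiltonian:

```python
# Leapfrog integration of Hamilton's equations for H(q, p) = (q^2 + p^2)/2.
# Symplectic integrators like this approximately conserve energy over long
# rollouts, which is the physics prior the Hamiltonian approach exploits.

def leapfrog(q, p, dH_dq, dH_dp, dt=0.1, steps=100):
    for _ in range(steps):
        p = p - 0.5 * dt * dH_dq(q)  # half-step momentum update
        q = q + dt * dH_dp(p)        # full-step position update
        p = p - 0.5 * dt * dH_dq(q)  # half-step momentum update
    return q, p

q, p = leapfrog(q=1.0, p=0.0, dH_dq=lambda q: q, dH_dp=lambda p: p)
print(q**2 + p**2)  # stays close to the initial value of 1.0
```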
no code implementations • NeurIPS 2018 • Hippolyt Ritter, Aleksandar Botev, David Barber
In order to make our method scalable, we leverage recent block-diagonal Kronecker factored approximations to the curvature.
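The practical payoff, in a hedged sketch: with a Kronecker-factored curvature, sampling a layer's weights from the resulting matrix-normal Laplace posterior requires only two small factors (an input covariance A and an output-gradient covariance G), never the full curvature over all weight pairs. The factors below are placeholders, not quantities estimated as in the paper:

```python
# Sampling weights from a matrix-normal Laplace posterior with row
# covariance G^{-1} and column covariance A^{-1}, using only the small
# Kronecker factors. Placeholder factors, for shape only.
import numpy as np

def sample_matrix_normal(W_mean, A, G):
    La = np.linalg.cholesky(np.linalg.inv(A))  # (in, in) column factor
    Lg = np.linalg.cholesky(np.linalg.inv(G))  # (out, out) row factor
    E = np.random.randn(*W_mean.shape)         # standard normal, (out, in)
    return W_mean + Lg @ E @ La.T

W = sample_matrix_normal(np.zeros((4, 3)), A=10.0 * np.eye(3), G=10.0 * np.eye(4))
print(W.shape)  # (4, 3)
```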
1 code implementation • ICLR 2018 • Hippolyt Ritter, Aleksandar Botev, David Barber
PyTorch implementations of Bayes by Backprop, MC Dropout, SGLD, the Local Reparametrization Trick, KF-Laplace, and more.
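One of the listed methods, MC Dropout, fits in a few lines: keep dropout stochastic at prediction time and average repeated forward passes to get a predictive mean and an uncertainty estimate. A minimal PyTorch sketch, not the repo's own code:

```python
# MC Dropout sketch: dropout stays active at test time, and the spread of
# repeated stochastic forward passes estimates predictive uncertainty.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(),
                      nn.Dropout(p=0.5), nn.Linear(64, 1))

def mc_dropout_predict(model, x, n_samples=50):
    model.train()  # keep dropout stochastic at prediction time
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(0), preds.std(0)

mean, std = mc_dropout_predict(model, torch.randn(5, 10))
print(mean.shape, std.shape)  # torch.Size([5, 1]) twice
```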
no code implementations • ICML 2017 • Aleksandar Botev, Hippolyt Ritter, David Barber
We present an efficient block-diagonal approximation to the Gauss-Newton matrix for feedforward neural networks.
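A hedged illustration of the shape of such an approximation for a single linear layer: the layer's curvature block is written as a Kronecker product of an input second-moment matrix and a backpropagated output-curvature factor, so only two small matrices are ever formed. Random placeholders below, not the paper's recursion:

```python
# Kronecker-factored block for one linear layer: block ~= kron(A, G), where
# A is the input second moment and G an output-curvature factor. The
# back-signals g are random placeholders, not a real Gauss-Newton recursion.
import numpy as np

a = np.random.randn(256, 3)   # layer inputs, (batch, in)
g = np.random.randn(256, 4)   # output-curvature back-signals, (batch, out)

A = a.T @ a / len(a)          # (in, in)
G = g.T @ g / len(g)          # (out, out)
block = np.kron(A, G)         # curvature block over the 12 weights of W
print(block.shape)            # (12, 12)
```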
no code implementations • 7 Jul 2016 • Aleksandar Botev, Guy Lever, David Barber
We present a unifying framework for adapting the update direction in gradient-based iterative optimization methods.
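The shared template is x <- x - lr * D(x) * grad f(x), where different choices of the direction-shaping matrix D recover gradient descent (D = I), Newton's method (D = inverse Hessian), and so on. A toy quadratic sketch, illustrative rather than the paper's framework:

```python
# Preconditioned iterative updates x <- x - lr * D(x) @ grad(x) on a toy
# quadratic 0.5 * x^T Q x; choosing D = Q^{-1} gives Newton's direction.
import numpy as np

def optimize(x, grad, direction, lr=0.1, steps=100):
    for _ in range(steps):
        x = x - lr * direction(x) @ grad(x)
    return x

Q = np.array([[3.0, 0.0], [0.0, 1.0]])
grad = lambda x: Q @ x                     # gradient of 0.5 x^T Q x
newton = lambda x: np.linalg.inv(Q)        # Newton direction-shaping matrix
print(optimize(np.ones(2), grad, newton))  # converges towards [0, 0]
```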
no code implementations • 22 Jun 2016 • David Barber, Aleksandar Botev
We consider training probabilistic classifiers in the case of a large number of classes.
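A common way to make the softmax tractable in this regime, sketched below, is to estimate the normalizer from the true class plus a small sampled subset of the remaining classes. Illustrative of the setting only, not the paper's exact estimator:

```python
# Approximate log-softmax over many classes: estimate the normaliser from
# the target logit plus a uniformly sampled subset of the other classes.
# A toy estimator, not the method proposed in the paper.
import numpy as np

def approx_log_softmax(logits, target, n_neg=64):
    others = np.delete(np.arange(len(logits)), target)
    neg = np.random.choice(others, size=n_neg, replace=False)
    w = (len(logits) - 1) / n_neg               # each negative stands in for w classes
    m = max(logits[target], logits[neg].max())  # for numerical stability
    z_hat = np.exp(logits[target] - m) + w * np.exp(logits[neg] - m).sum()
    return logits[target] - (np.log(z_hat) + m)

logits = np.random.randn(100_000)
print(approx_log_softmax(logits, target=42))
```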