no code implementations • 8 Dec 2024 • Nicholas Alonso, Beren Millidge
Recent advances have extended the context window of frontier LLMs dramatically, from a few thousand tokens up to millions, enabling entire books and codebases to fit into context.
no code implementations • 22 Nov 2024 • Paolo Glorioso, Quentin Anthony, Yury Tokpanov, Anna Golubeva, Vasudev Shyam, James Whittington, Jonathan Pilault, Beren Millidge
In this technical report, we present the Zamba2 series -- a suite of 1.2B, 2.7B, and 7.4B parameter hybrid Mamba2-transformer models that achieve state-of-the-art performance against the leading open-weights models of their class, while delivering substantial gains in inference latency, throughput, and memory efficiency.
no code implementations • 9 Nov 2024 • Yury Tokpanov, Paolo Glorioso, Quentin Anthony, Beren Millidge
In this technical report, we present Zyda-2: a five trillion token dataset for language model pretraining.
no code implementations • 13 Sep 2024 • Miguel de Llanza Varona, Christopher L. Buckley, Beren Millidge
The efficient coding hypothesis claims that organisms seek to maximize the information about the sensory input in an efficient manner.
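A common formalization of this claim (a standard form assumed here for illustration, not this paper's exact setup) is information maximization under a resource constraint:

```latex
% Choose the neural code R to carry as much information as possible
% about the stimulus X, subject to a bound on coding cost:
\[
\max_{p(r \mid x)} \; I(X; R)
\quad \text{subject to} \quad
\mathbb{E}\big[\mathrm{cost}(R)\big] \le C
\]
```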
1 code implementation • 7 Aug 2024 • Vasudev Shyam, Jonathan Pilault, Emily Shepperd, Quentin Anthony, Beren Millidge
Self-attention is the core mathematical operation of modern transformer architectures and is also a significant computational bottleneck due to its quadratic complexity in the sequence length.
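The quadratic bottleneck is visible directly in a naive implementation; below is a minimal NumPy sketch (illustrative only, not the paper's proposed method):

```python
import numpy as np

def naive_self_attention(Q, K, V):
    """Scaled dot-product self-attention, materializing the full score matrix.

    For sequence length n, S is (n, n): compute and memory are O(n^2)."""
    n, d = Q.shape
    S = Q @ K.T / np.sqrt(d)                    # (n, n) -- the quadratic bottleneck
    A = np.exp(S - S.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)          # row-wise softmax
    return A @ V                                # (n, d)

# Doubling n quadruples the size of S:
n, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(naive_self_attention(Q, K, V).shape)      # (1024, 64)
```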
1 code implementation • 4 Jun 2024 • Yury Tokpanov, Beren Millidge, Paolo Glorioso, Jonathan Pilault, Adam Ibrahim, James Whittington, Quentin Anthony
The size of large language models (LLMs) has scaled dramatically in recent years and their computational and data requirements have surged correspondingly.
1 code implementation • 29 May 2024 • Nick Alonso, Tomás Figliolia, Anthony Ndirango, Beren Millidge
There has recently been growing interest in conversational agents with long-term memory, which has led to the rapid development of language models that use retrieval-augmented generation (RAG).
no code implementations • 26 May 2024 • Paolo Glorioso, Quentin Anthony, Yury Tokpanov, James Whittington, Jonathan Pilault, Adam Ibrahim, Beren Millidge
Zamba is pretrained in two phases: the first phase is based on existing web datasets, while the second one consists of annealing the model over high-quality instruct and synthetic datasets, and is characterized by a rapid learning rate decay.
no code implementations • 16 Feb 2024 • Tommaso Salvatori, Beren Millidge, Yuhang Song, Rafal Bogacz, Thomas Lukasiewicz
This problem can be easily solved by computing \emph{similarities} in an embedding space instead of the pixel space.
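As a toy illustration of the pixel-space versus embedding-space distinction (the encoder here is a hypothetical stand-in, not the paper's model):

```python
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pixel-space similarity is brittle: a one-pixel shift changes every value.
# Embedding-space similarity compares feature content instead.
def embedding_similarity(x, y, encode):
    """`encode` is any frozen feature extractor mapping images to vectors."""
    return cosine_similarity(encode(x), encode(y))

# Toy stand-in encoder (a real one would be a pretrained network):
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 64))
encode = lambda img: W @ img.reshape(-1)

x, y = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
print(embedding_similarity(x, y, encode))
```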
no code implementations • 16 Feb 2024 • Alexander Ororbia, Ankur Mali, Adam Kohan, Beren Millidge, Tommaso Salvatori
As a result, it accommodates hardware and scientific modeling, e.g., learning with physical systems and non-differentiable behavior.
1 code implementation • 1 Feb 2024 • Quentin Anthony, Yury Tokpanov, Paolo Glorioso, Beren Millidge
In this paper, we present BlackMamba, a novel architecture that combines the Mamba SSM with MoE to obtain the benefits of both.
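In broad strokes, the recipe is to use an SSM for sequence mixing where a transformer would use attention, and a sparsely routed mixture-of-experts MLP for the feedforward path. The sketch below shows that general pattern only (a hand-rolled illustration; layer counts, routing, and block ordering in the actual BlackMamba differ):

```python
import torch
import torch.nn as nn

class MoEMLP(nn.Module):
    """Sparse mixture-of-experts MLP: each token is routed to its top-1 expert."""
    def __init__(self, d_model, n_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))

    def forward(self, x):                      # x: (batch, seq, d_model)
        idx = self.router(x).argmax(-1)        # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = idx == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out

class HybridBlock(nn.Module):
    """One layer of the sketch: a sequence-mixing SSM followed by an MoE MLP."""
    def __init__(self, d_model, ssm_layer):
        super().__init__()
        self.ssm = ssm_layer                   # e.g. a Mamba block
        self.moe = MoEMLP(d_model)
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        x = x + self.ssm(self.norm1(x))        # SSM replaces attention
        return x + self.moe(self.norm2(x))
```

Only the selected expert runs per token, so parameter count grows with the number of experts while per-token compute stays roughly constant.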
no code implementations • 27 Jun 2023 • Tommaso Salvatori, Luca Pinchetti, Amine M'Charrak, Beren Millidge, Thomas Lukasiewicz
Recently, there has been extensive research on the capabilities of biologically plausible algorithms.
no code implementations • 2 Dec 2022 • Karl J Friston, Maxwell J D Ramstead, Alex B Kiefer, Alexander Tschantz, Christopher L Buckley, Mahault Albarracin, Riddhi J Pitliya, Conor Heins, Brennan Klein, Beren Millidge, Dalton A R Sakthivadivel, Toby St Clere Smithe, Magnus Koudahl, Safae Essafi Tremblay, Capm Petersen, Kaiser Fung, Jason G Fox, Steven Swanson, Dan Mapes, Gabriel René
In this context, we understand intelligence as the capacity to accumulate evidence for a generative model of one's sensed world -- also known as self-evidencing.
no code implementations • 22 Nov 2022 • Sid Black, Lee Sharkey, Leo Grinsztajn, Eric Winsor, Dan Braun, Jacob Merizian, Kip Parker, Carlos Ramón Guevara, Beren Millidge, Gabriel Alfour, Connor Leahy
Previous mechanistic descriptions have used individual neurons or their linear combinations to understand the representations a network has learned.
no code implementations • 16 Nov 2022 • Tommaso Salvatori, Yuhang Song, Yordan Yordanov, Beren Millidge, Zhenghua Xu, Lei Sha, Cornelius Emde, Rafal Bogacz, Thomas Lukasiewicz
Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience.
no code implementations • 7 Nov 2022 • Luca Pinchetti, Tommaso Salvatori, Yordan Yordanov, Beren Millidge, Yuhang Song, Thomas Lukasiewicz
A large body of recent research pursues the far-reaching goal of finding training methods for deep neural networks that can serve as alternatives to backpropagation (BP).
1 code implementation • 6 Sep 2022 • Alex B. Kiefer, Beren Millidge, Alexander Tschantz, Christopher L. Buckley
Capsule networks are a neural network architecture specialized for visual scene recognition.
no code implementations • 15 Aug 2022 • Paul F Kinghorn, Beren Millidge, Christopher L Buckley
Predictive Coding Networks (PCNs) aim to learn a generative model of the world.
1 code implementation • 21 Jul 2022 • Beren Millidge, Yuhang Song, Tommaso Salvatori, Thomas Lukasiewicz, Rafal Bogacz
In this paper, we provide a comprehensive theoretical analysis of the properties of PCNs trained with prospective configuration.
1 code implementation • 20 Jul 2022 • Beren Millidge, Christopher L Buckley
Recent work has uncovered close links between classical reinforcement learning algorithms, Bayesian filtering, and Active Inference, which let us understand value functions in terms of Bayesian posteriors.
1 code implementation • 1 Jun 2022 • Nick Alonso, Beren Millidge, Jeff Krichmar, Emre Neftci
Our novel implementation considerably improves the stability of IL across learning rates, which is consistent with our theory, as a key property of implicit SGD is its stability.
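The stability of implicit SGD can be seen in one dimension (a standard textbook derivation, included here for illustration):

```latex
% Explicit SGD evaluates the gradient at the current iterate:
%   \theta_{t+1} = \theta_t - \eta \nabla L(\theta_t)
% Implicit SGD evaluates it at the *next* iterate:
\[
\theta_{t+1} = \theta_t - \eta \, \nabla L(\theta_{t+1}).
\]
% For the quadratic loss L(\theta) = \tfrac{\lambda}{2}\theta^2 this solves to
\[
\theta_{t+1} = \frac{\theta_t}{1 + \eta \lambda},
\]
% which contracts toward the optimum for every \eta > 0, whereas the
% explicit update \theta_{t+1} = (1 - \eta\lambda)\,\theta_t diverges
% as soon as \eta > 2/\lambda.
```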
1 code implementation • 31 May 2022 • Beren Millidge, Yuhang Song, Tommaso Salvatori, Thomas Lukasiewicz, Rafal Bogacz
How the brain performs credit assignment is a fundamental unsolved problem in neuroscience.
no code implementations • 5 Apr 2022 • Alexander Tschantz, Beren Millidge, Anil K Seth, Christopher L Buckley
This is at odds with evidence that several aspects of visual perception - including complex forms of object recognition - arise from an initial "feedforward sweep" that occurs on fast timescales which preclude substantial recurrent activity.
no code implementations • 18 Feb 2022 • Beren Millidge, Tommaso Salvatori, Yuhang Song, Rafal Bogacz, Thomas Lukasiewicz
The backpropagation of error algorithm used to train deep neural networks has been fundamental to the successes of deep learning.
1 code implementation • 9 Feb 2022 • Beren Millidge, Tommaso Salvatori, Yuhang Song, Thomas Lukasiewicz, Rafal Bogacz
A large number of neural network models of associative memory have been proposed in the literature.
no code implementations • 31 Jan 2022 • Tommaso Salvatori, Luca Pinchetti, Beren Millidge, Yuhang Song, TianYi Bao, Rafal Bogacz, Thomas Lukasiewicz
Training with backpropagation (BP) in standard deep learning consists of two main steps: a forward pass that maps a data point to its prediction, and a backward pass that propagates the error of this prediction back through the network.
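Concretely, for a two-layer network these two steps look as follows (a generic textbook sketch, not this paper's notation):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)                 # data point
y = rng.standard_normal(2)                 # target
W1 = rng.standard_normal((8, 4))
W2 = rng.standard_normal((2, 8))

# Forward pass: map the data point to its prediction.
h = np.tanh(W1 @ x)
pred = W2 @ h
loss = 0.5 * np.sum((pred - y) ** 2)

# Backward pass: propagate the prediction error back through the network.
e2 = pred - y                              # output-layer error
e1 = (W2.T @ e2) * (1 - h ** 2)            # error pulled back through tanh
grad_W2 = np.outer(e2, h)                  # dloss/dW2
grad_W1 = np.outer(e1, x)                  # dloss/dW1
```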
1 code implementation • 11 Jan 2022 • Conor Heins, Beren Millidge, Daphne Demekas, Brennan Klein, Karl Friston, Iain Couzin, Alexander Tschantz
Active inference is an account of cognition and behavior in complex systems which brings together action, perception, and learning under the theoretical mantle of Bayesian inference.
no code implementations • 3 Dec 2021 • Pablo Lanillos, Cristian Meo, Corrado Pezzato, Ajith Anil Meera, Mohamed Baioumy, Wataru Ohata, Alexander Tschantz, Beren Millidge, Martijn Wisse, Christopher L. Buckley, Jun Tani
Active inference is a mathematical framework which originated in computational neuroscience as a theory of how the brain implements action, perception and learning.
no code implementations • 2 Sep 2021 • Paul F. Kinghorn, Beren Millidge, Christopher L. Buckley
In cognitive science, behaviour is often separated into two types.
no code implementations • 30 Aug 2021 • Beren Millidge, Anil Seth, Christopher L Buckley
The Free Energy Principle (FEP) is an influential and controversial theory which postulates a deep and powerful connection between the stochastic thermodynamics of self-organization and learning through variational inference.
no code implementations • 27 Jul 2021 • Beren Millidge, Anil Seth, Christopher L Buckley
Predictive coding offers a potentially unifying account of cortical function -- postulating that the core function of the brain is to minimize prediction errors with respect to a generative model of the world.
no code implementations • 30 Jun 2021 • Beren Millidge
Firstly, we focus on predictive coding, a neurobiologically plausible process theory derived from the free energy principle which argues that the primary function of the brain is to minimize prediction errors. We show how predictive coding can be scaled up and extended to be more biologically plausible, and elucidate its close links with other methods such as Kalman filtering.
1 code implementation • 4 Jun 2021 • Alejandro Daniel Noel, Charel van Hoof, Beren Millidge
Our model is capable of solving sparse-reward problems with a very high sample efficiency due to its objective function, which encourages directed exploration of uncertain states.
no code implementations • 3 Jun 2021 • Beren Millidge
We provide a precise characterisation of what an abstraction is and, perhaps more importantly, suggest how abstractions can be learnt directly from data both for static datasets and for dynamical systems.
no code implementations • 24 May 2021 • Miguel Aguilera, Beren Millidge, Alexander Tschantz, Christopher L. Buckley
We discover that two requirements of the FEP -- the Markov blanket condition (i.e., a statistical boundary precluding direct coupling between internal and external states) and stringent restrictions on its solenoidal flows (i.e., tendencies driving a system out of equilibrium) -- are only valid for a very narrow space of parameters.
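The blanket condition has a compact standard statement (notation assumed here: internal states $\mu$, external states $\eta$, blanket states $b$):

```latex
% Internal and external states are conditionally independent
% given the blanket states:
\[
p(\eta, \mu \mid b) = p(\eta \mid b)\, p(\mu \mid b),
\]
% i.e., there is no direct coupling between inside and outside.
```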
1 code implementation • 11 Mar 2021 • Beren Millidge, Anil Seth, Christopher Buckley
We propose a dichotomy in the objective functions underlying adaptive behaviour between \emph{evidence} objectives, which correspond to well-known reward or utility maximizing objectives in the literature, and \emph{divergence} objectives, which instead seek to minimize the divergence between the agent's expected and desired futures. We argue that this new class of divergence objectives could form the mathematical foundation for a much richer understanding of the exploratory components of adaptive and intelligent action, beyond simply greedy utility maximization.
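Schematically, and with notation assumed here rather than taken from the paper, the two families can be contrasted as:

```latex
% An evidence objective rewards futures the agent prefers:
\[
J_{\text{evidence}} = \mathbb{E}_{Q(o_{t:T} \mid \pi)}\big[\log \tilde{P}(o_{t:T})\big],
\]
% while a divergence objective matches the whole distribution of
% expected futures to the desired one:
\[
J_{\text{divergence}} = -\,D_{\mathrm{KL}}\big[\,Q(o_{t:T} \mid \pi) \;\|\; \tilde{P}(o_{t:T})\,\big]
                      = J_{\text{evidence}} + \mathcal{H}\big[Q(o_{t:T} \mid \pi)\big].
\]
% The extra entropy term is one way to see where an exploratory
% drive enters that greedy utility maximization lacks.
```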
1 code implementation • 19 Feb 2021 • Beren Millidge, Alexander Tschantz, Anil Seth, Christopher Buckley
The Kalman filter is a fundamental filtering algorithm that fuses noisy sensory data, a previous state estimate, and a dynamics model to produce a principled estimate of the current state.
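That fusion is the classic predict/update recursion; a minimal sketch in its standard textbook form:

```python
import numpy as np

def kalman_step(mu, P, z, A, C, Q, R):
    """One predict/update cycle of the Kalman filter.

    mu, P : previous state estimate (mean, covariance)
    z     : new noisy observation
    A, C  : dynamics and observation models
    Q, R  : process and observation noise covariances
    """
    # Predict: push the previous estimate through the dynamics model.
    mu_pred = A @ mu
    P_pred = A @ P @ A.T + Q
    # Update: fuse the prediction with the noisy observation.
    S = C @ P_pred @ C.T + R               # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)    # Kalman gain
    mu_new = mu_pred + K @ (z - C @ mu_pred)
    P_new = (np.eye(len(mu)) - K @ C) @ P_pred
    return mu_new, P_new
```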
1 code implementation • 13 Oct 2020 • Beren Millidge, Alexander Tschantz, Anil Seth, Christopher L Buckley
The recently proposed Activation Relaxation (AR) algorithm provides a simple and robust approach for approximating the backpropagation of error algorithm using only local learning rules.
no code implementations • 2 Oct 2020 • Beren Millidge, Alexander Tschantz, Anil Seth, Christopher L Buckley
Predictive coding is an influential theory of cortical function which posits that the principal computation the brain performs, which underlies both perception and learning, is the minimization of prediction errors.
1 code implementation • 11 Sep 2020 • Beren Millidge, Alexander Tschantz, Anil K. Seth, Christopher L. Buckley
The backpropagation of error algorithm (backprop) has been instrumental in the recent success of deep learning.
no code implementations • 11 Jul 2020 • Alexander Tschantz, Beren Millidge, Anil K. Seth, Christopher L. Buckley
The field of reinforcement learning can be split into model-based and model-free methods.
no code implementations • 23 Jun 2020 • Beren Millidge, Alexander Tschantz, Anil K. Seth, Christopher L. Buckley
Active Inference (AIF) is an emerging framework in the brain sciences which suggests that biological agents act to minimise a variational bound on model evidence.
no code implementations • 13 Jun 2020 • Beren Millidge, Alexander Tschantz, Anil K. Seth, Christopher L. Buckley
There are several ways to categorise reinforcement learning (RL) algorithms, such as either model-based or model-free, policy-based or planning-based, on-policy or off-policy, and online or offline.
1 code implementation • 7 Jun 2020 • Beren Millidge, Alexander Tschantz, Christopher L. Buckley
Recently, it has been shown that backprop in multilayer perceptrons (MLPs) can be approximated using predictive coding, a biologically plausible process theory of cortical computation which relies only on local and Hebbian updates.
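A minimal sketch of such a predictive coding network (one common formulation; conventions differ across papers, so treat the details as assumptions):

```python
import numpy as np

f = np.tanh
df = lambda a: 1 - np.tanh(a) ** 2

def pc_relax(x0, y, Ws, n_steps=100, lr_x=0.05):
    """Relax hidden activities with input and output clamped.
    Every update below uses only locally available quantities."""
    xs = [x0]
    for W in Ws:                            # feedforward initialization
        xs.append(W @ f(xs[-1]))
    xs[-1] = y                              # clamp the output to the target
    for _ in range(n_steps):
        errs = [xs[l + 1] - Ws[l] @ f(xs[l]) for l in range(len(Ws))]
        for l in range(1, len(xs) - 1):     # update hidden layers only
            xs[l] = xs[l] + lr_x * (-errs[l - 1]
                                    + (Ws[l].T @ errs[l]) * df(xs[l]))
    errs = [xs[l + 1] - Ws[l] @ f(xs[l]) for l in range(len(Ws))]
    # Hebbian update direction: error times presynaptic activity;
    # at equilibrium this approximates backprop's (negative) gradient.
    return [np.outer(errs[l], f(xs[l])) for l in range(len(Ws))]

rng = np.random.default_rng(0)
Ws = [rng.standard_normal((8, 4)) * 0.5, rng.standard_normal((2, 8)) * 0.5]
grads = pc_relax(rng.standard_normal(4), rng.standard_normal(2), Ws)
```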
no code implementations • 17 Apr 2020 • Beren Millidge, Alexander Tschantz, Christopher L. Buckley
The Expected Free Energy (EFE) is a central quantity in the theory of active inference.
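In the usual notation (assumed here), the EFE of a policy $\pi$ and its standard epistemic/pragmatic decomposition read:

```latex
\[
G(\pi) = \mathbb{E}_{Q(o, s \mid \pi)}\big[\ln Q(s \mid \pi) - \ln \tilde{P}(o, s)\big]
\]
% which splits (approximately) into an information-gain term and an
% extrinsic-value term:
\[
G(\pi) \approx
-\,\underbrace{\mathbb{E}_{Q(o \mid \pi)}\, D_{\mathrm{KL}}\big[Q(s \mid o, \pi) \,\|\, Q(s \mid \pi)\big]}_{\text{epistemic value}}
\;-\;\underbrace{\mathbb{E}_{Q(o \mid \pi)}\big[\ln \tilde{P}(o)\big]}_{\text{pragmatic value}}
\]
```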
no code implementations • 28 Feb 2020 • Alexander Tschantz, Beren Millidge, Anil K. Seth, Christopher L. Buckley
The central tenet of reinforcement learning (RL) is that agents seek to maximize the sum of cumulative rewards.
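That tenet is usually written as maximization of the expected discounted return:

```latex
\[
G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1},
\qquad \gamma \in [0, 1)
\]
```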
2 code implementations • 8 Jul 2019 • Beren Millidge
Active Inference is a theory of action arising from neuroscience which casts action and planning as a Bayesian inference problem to be solved by minimizing a single quantity: the variational free energy.
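The quantity in question has a standard form; for an approximate posterior $Q(s)$ and generative model $P(o, s)$:

```latex
\[
\mathcal{F} = \mathbb{E}_{Q(s)}\big[\ln Q(s) - \ln P(o, s)\big]
            = D_{\mathrm{KL}}\big[Q(s) \,\|\, P(s \mid o)\big] - \ln P(o)
\]
% Since the KL term is non-negative, F upper-bounds the negative log
% evidence -ln P(o): minimizing F both improves inference and
% maximizes model evidence.
```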