Search Results for author: João G. M. Araújo

Found 8 papers, 2 papers with code

What makes a good feedforward computational graph?

no code implementations • 10 Feb 2025 • Alex Vitvitskyi, João G. M. Araújo, Marc Lackenby, Petar Veličković

As implied by the plethora of literature on graph rewiring, the choice of computational graph employed by a neural network can have a significant impact on its downstream performance.

On the consistency of hyper-parameter selection in value-based deep reinforcement learning

1 code implementation • 25 Jun 2024 • Johan Obando-Ceron, João G. M. Araújo, Aaron Courville, Pablo Samuel Castro

This paper conducts an extensive empirical study focusing on the reliability of hyper-parameter selection for value-based deep reinforcement learning agents, including the introduction of a new score to quantify the consistency and reliability of various hyper-parameters.

Tasks: Deep Reinforcement Learning, Reinforcement Learning
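The score used in the paper is not reproduced in this listing, so the following is only an illustrative sketch of how one might quantify consistency of hyper-parameter rankings across environments: it ranks settings per environment by final return and averages the pairwise Spearman correlations of those rankings. The return values are made up and the helper name is hypothetical; this is not the authors' code or metric.

    # Hypothetical sketch: quantify how consistently hyper-parameter settings
    # rank across environments. Not the paper's actual score.
    import itertools

    import numpy as np
    from scipy.stats import spearmanr

    # returns[env][hp] = final episodic return of hyper-parameter setting `hp`
    # on environment `env` (made-up numbers for illustration).
    returns = np.array([
        [120.0, 95.0, 140.0, 60.0],    # env 0
        [300.0, 280.0, 310.0, 150.0],  # env 1
        [45.0, 50.0, 48.0, 20.0],      # env 2
    ])

    def consistency_score(scores: np.ndarray) -> float:
        """Average pairwise Spearman correlation of per-environment rankings.

        1.0 means every environment ranks the hyper-parameter settings
        identically; values near 0 mean the rankings are unrelated.
        """
        correlations = []
        for i, j in itertools.combinations(range(scores.shape[0]), 2):
            rho, _ = spearmanr(scores[i], scores[j])
            correlations.append(rho)
        return float(np.mean(correlations))

    print(f"consistency across environments: {consistency_score(returns):.3f}")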

Transformers need glasses! Information over-squashing in language tasks

no code implementations • 6 Jun 2024 • Federico Barbero, Andrea Banino, Steven Kapturowski, Dharshan Kumaran, João G. M. Araújo, Alex Vitvitskyi, Razvan Pascanu, Petar Veličković

We rely on a theoretical signal propagation analysis -- specifically, we analyse the representations of the last token in the final layer of the Transformer, as this is the representation used for next-token prediction.

Tasks: Decoder
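As a concrete illustration of the quantity analysed above, the sketch below extracts the final-layer representation of the last token, i.e. the vector fed to next-token prediction. It assumes the Hugging Face transformers library and uses GPT-2 as a stand-in model; it is not the paper's analysis code.

    # Illustrative only: extract the final-layer, last-token representation of
    # a decoder-only Transformer, the vector used for next-token prediction.
    # Assumes the Hugging Face `transformers` library and GPT-2 as a stand-in.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    inputs = tokenizer("Count the ones: 1 0 1 1 0 1", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)

    # hidden_states[-1] is the final layer; [:, -1, :] selects the last token.
    last_token_repr = outputs.hidden_states[-1][:, -1, :]
    print(last_token_repr.shape)  # (batch=1, hidden_dim=768 for GPT-2)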

Position: Categorical Deep Learning is an Algebraic Theory of All Architectures

no code implementations • 23 Feb 2024 • Bruno Gavranović, Paul Lessard, Andrew Dudzik, Tamara von Glehn, João G. M. Araújo, Petar Veličković

We present our position on the elusive quest for a general-purpose framework for specifying and studying deep learning architectures.

Tasks: Deep Learning (+1)

Scalable Training of Language Models using JAX pjit and TPUv4

no code implementations • 13 Apr 2022 • Joanna Yoo, Kuba Perlin, Siddhartha Rao Kamalakara, João G. M. Araújo

Modern large language models require distributed training strategies due to their size.
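The abstract excerpt above only states the motivation, so here is a minimal sketch of the general idea of sharded computation in JAX. It uses the stable jax.sharding API (Mesh, NamedSharding, PartitionSpec) on whatever devices are available, rather than the exact pjit and TPUv4 configuration described in the paper.

    # Minimal sketch of sharded computation in JAX; not the paper's setup.
    import jax
    import jax.numpy as jnp
    import numpy as np
    from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

    # Build a 1-D device mesh over whatever accelerators are available
    # (TPU cores, GPUs, or a single CPU); the axis name "model" is arbitrary.
    mesh = Mesh(np.array(jax.devices()), axis_names=("model",))

    key = jax.random.PRNGKey(0)
    x = jax.random.normal(key, (32, 512))     # activations, replicated
    w = jax.random.normal(key, (512, 2048))   # weight matrix, sharded column-wise

    # The sharded dimension (2048) should be divisible by the device count.
    w_sharded = jax.device_put(w, NamedSharding(mesh, P(None, "model")))
    x_replicated = jax.device_put(x, NamedSharding(mesh, P()))

    @jax.jit
    def layer(x, w):
        # XLA partitions this matmul according to the input shardings.
        return jnp.maximum(x @ w, 0.0)

    y = layer(x_replicated, w_sharded)
    print(y.shape, y.sharding)

Because jax.jit propagates the input shardings, the same code runs unchanged on a single CPU or on many accelerator cores; only the mesh changes.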

No News is Good News: A Critique of the One Billion Word Benchmark

no code implementations • 25 Oct 2021 • Helen Ngo, João G. M. Araújo, Jeffrey Hui, Nicholas Frosst

The One Billion Word Benchmark is a dataset derived from the WMT 2011 News Crawl, commonly used to measure language modeling ability in natural language processing.

Tasks: Language Modeling, Language Modelling

Mitigating harm in language models with conditional-likelihood filtration

no code implementations • 4 Aug 2021 • Helen Ngo, Cooper Raterink, João G. M. Araújo, Ivan Zhang, Carol Chen, Adrien Morisot, Nicholas Frosst

Language models trained on large-scale unfiltered datasets curated from the open web acquire systemic biases, prejudices, and harmful views from their training data.

Tasks: Language Modeling, Language Modelling
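The listing does not spell out the filtration procedure, so the sketch below is only a schematic picture of likelihood-based filtering of a training corpus: the trigger prompt, threshold, and scoring helper are hypothetical placeholders rather than the authors' method or values.

    # Schematic sketch of conditional-likelihood filtering of a training corpus.
    # `score_log_likelihood`, the trigger prompt, and the threshold are all
    # hypothetical placeholders, not the paper's actual procedure or values.
    from typing import Callable, List

    def filter_corpus(
        documents: List[str],
        score_log_likelihood: Callable[[str, str], float],
        trigger: str = "Example trigger text expressing harmful views.",
        threshold: float = -2.5,
    ) -> List[str]:
        """Keep documents whose per-token log-likelihood conditioned on the
        trigger stays below `threshold`, i.e. documents the conditioned
        model finds unlikely."""
        kept = []
        for doc in documents:
            if score_log_likelihood(trigger, doc) < threshold:
                kept.append(doc)
        return kept

    # Toy usage with a stand-in scorer that just flags one word.
    def toy_scorer(trigger: str, doc: str) -> float:
        return 0.0 if "badword" in doc else -5.0

    corpus = ["a harmless sentence", "contains badword here", "another fine line"]
    print(filter_corpus(corpus, toy_scorer))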
