no code implementations • 10 Feb 2025 • Alex Vitvitskyi, João G. M. Araújo, Marc Lackenby, Petar Veličković
As the extensive literature on graph rewiring implies, the choice of computational graph employed by a neural network can significantly affect its downstream performance.
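To illustrate what rewiring a computational graph can mean in practice, here is a minimal Python sketch (an illustration of the general idea, not the paper's method) that adds shortcut edges between all two-hop neighbours, shortening the paths along which a message-passing network propagates information:

```python
import numpy as np

def add_two_hop_edges(adj: np.ndarray) -> np.ndarray:
    """Return an adjacency matrix with edges added between all 2-hop neighbours."""
    two_hop = (adj @ adj) > 0          # nodes reachable in exactly two steps
    np.fill_diagonal(two_hop, False)   # drop the self-loops this introduces
    return ((adj + two_hop) > 0).astype(int)

# 6-node path graph: 0-1-2-3-4-5
adj = np.zeros((6, 6), dtype=int)
for i in range(5):
    adj[i, i + 1] = adj[i + 1, i] = 1

print(add_two_hop_edges(adj).sum() // 2)  # 9 undirected edges vs. the original 5
```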
1 code implementation • 25 Jun 2024 • Johan Obando-Ceron, João G. M. Araújo, Aaron Courville, Pablo Samuel Castro
This paper presents an extensive empirical study of the reliability of hyper-parameter selection for value-based deep reinforcement learning agents, introducing a new score that quantifies the consistency and reliability of individual hyper-parameters.
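The paper's actual score is not reproduced here; the sketch below only conveys the general idea of quantifying consistency, by ranking each hyper-parameter setting within every environment and measuring how stable its rank is across environments. All names and shapes are assumptions.

```python
import numpy as np

def rank_consistency(returns: np.ndarray) -> np.ndarray:
    """returns[i, j]: final return of hyper-parameter setting i on environment j.
    Returns, per setting, the standard deviation of its rank across environments
    (lower = the setting's relative quality is more consistent)."""
    order = np.argsort(-returns, axis=0)        # rank settings per environment (0 = best)
    ranks = np.empty_like(order)
    n_settings, n_envs = returns.shape
    for j in range(n_envs):
        ranks[order[:, j], j] = np.arange(n_settings)
    return ranks.std(axis=1)

rng = np.random.default_rng(0)
scores = rng.normal(size=(5, 8))                # 5 settings x 8 environments (toy data)
print(rank_consistency(scores))
```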
no code implementations • 6 Jun 2024 • Federico Barbero, Andrea Banino, Steven Kapturowski, Dharshan Kumaran, João G. M. Araújo, Alex Vitvitskyi, Razvan Pascanu, Petar Veličković
We rely on a theoretical signal propagation analysis: specifically, we analyse the representations of the last token in the final layer of the Transformer, as this is the representation used for next-token prediction.
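A toy sketch of why that representation is the natural object of study (assumptions only, not the paper's analysis code): under causal softmax attention, the last token's output is a convex combination of value vectors drawn from the entire sequence.

```python
import numpy as np

def causal_self_attention(x: np.ndarray) -> np.ndarray:
    """Single-head causal attention with identity projections (toy model)."""
    n, d = x.shape
    logits = x @ x.T / np.sqrt(d)
    logits[np.triu_indices(n, k=1)] = -np.inf   # mask out future positions
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

x = np.random.default_rng(0).normal(size=(10, 16))
out = causal_self_attention(x)
last = out[-1]       # the representation used for next-token prediction
print(last.shape)    # (16,)
```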
no code implementations • 23 Feb 2024 • Bruno Gavranović, Paul Lessard, Andrew Dudzik, Tamara von Glehn, João G. M. Araújo, Petar Veličković
We present our position on the elusive quest for a general-purpose framework for specifying and studying deep learning architectures.
1 code implementation • 5 Feb 2024 • Shengyi Huang, Quentin Gallouédec, Florian Felten, Antonin Raffin, Rousslan Fernand Julien Dossa, Yanxiao Zhao, Ryan Sullivan, Viktor Makoviychuk, Denys Makoviichuk, Mohamad H. Danesh, Cyril Roumégous, Jiayi Weng, Chufan Chen, Md Masudur Rahman, João G. M. Araújo, Guorui Quan, Daniel Tan, Timo Klein, Rujikorn Charakorn, Mark Towers, Yann Berthelot, Kinal Mehta, Dipam Chakraborty, Arjun KG, Valentin Charraut, Chang Ye, Zichen Liu, Lucas N. Alegre, Alexander Nikulin, Xiao Hu, Tianlin Liu, Jongwook Choi, Brent Yi
As a result, it is usually necessary to reproduce the experiments from scratch, which can be time-consuming and error-prone.
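One way such re-runs can be avoided is by pulling previously tracked metrics directly. A hedged sketch using the Weights & Biases API (the project path and metric name below are placeholders, not identifiers from the paper):

```python
import wandb

api = wandb.Api()
# Placeholder entity/project path; substitute the tracked project you need.
runs = api.runs("some-entity/some-project")
for run in runs:
    # "episodic_return" is an assumed metric name for illustration.
    history = run.history(keys=["episodic_return"])
    print(run.name, history["episodic_return"].mean())
```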
no code implementations • 13 Apr 2022 • Joanna Yoo, Kuba Perlin, Siddhartha Rao Kamalakara, João G. M. Araújo
Modern large language models require distributed training strategies due to their size.
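As a minimal sketch of one such strategy (a generic JAX example under assumed shapes, not necessarily the paper's setup), a weight matrix can be sharded across devices so that each device holds only a slice, with the compiler propagating the sharding through the computation:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("model",))

# Shard a weight matrix along its output dimension across the "model" axis.
weights = jnp.zeros((1024, 4096))
sharding = NamedSharding(mesh, P(None, "model"))
weights = jax.device_put(weights, sharding)

@jax.jit
def forward(w, x):
    return x @ w  # jit propagates the sharding through the matmul

x = jnp.ones((8, 1024))
y = forward(weights, x)
print(y.shape)  # (8, 4096), computed with the weights sharded across devices
```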
no code implementations • 25 Oct 2021 • Helen Ngo, João G. M. Araújo, Jeffrey Hui, Nicholas Frosst
The One Billion Word Benchmark is a dataset derived from the WMT 2011 News Crawl, commonly used to measure language modeling ability in natural language processing.
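Language modeling ability on such a corpus is conventionally measured by per-token perplexity; a short sketch of that computation (the numbers are made up, and any model producing per-token log-probabilities could be plugged in):

```python
import numpy as np

def perplexity(token_log_probs: np.ndarray) -> float:
    """Exponential of the negative mean log-probability over the corpus."""
    return float(np.exp(-token_log_probs.mean()))

log_probs = np.log(np.array([0.2, 0.1, 0.05, 0.3]))  # toy model outputs
print(perplexity(log_probs))
```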
no code implementations • 4 Aug 2021 • Helen Ngo, Cooper Raterink, João G. M. Araújo, Ivan Zhang, Carol Chen, Adrien Morisot, Nicholas Frosst
Language models trained on large-scale unfiltered datasets scraped from the open web acquire systemic biases, prejudices, and harmful views from their training data.