no code implementations • 11 Aug 2023 • Andrea Gesmundo, Kaitlin Maile
Training state-of-the-art neural networks requires a high cost in terms of compute and time.
no code implementations • 6 Feb 2023 • Andrea Gesmundo
Diverse agents can compete to produce the best performing model for a task by reusing the modules introduced to the system by competing agents.
1 code implementation • 29 Sep 2022 • Andrea Gesmundo
We believe that this novel methodology for ML development can be demonstrated through a modularized representation of ML models and the definition of novel abstractions that allow implementing and executing diverse methods for the asynchronous use and extension of modular intelligent systems.
1 code implementation • 15 Sep 2022 • Andrea Gesmundo
This methodology has multiple efficiency and scalability disadvantages, such as spending significant resources on the creation of multiple trial models that do not contribute to the final solution. The presented work is based on the intuition that defining ML models as modular and extensible artefacts allows a novel ML development methodology to be introduced, one enabling the integration of multiple design and evaluation iterations into the continuous enrichment of a single unbounded intelligent system.
Ranked #1 on Image Classification on cats_vs_dogs
1 code implementation • 25 May 2022 • Andrea Gesmundo, Jeff Dean
Multitask learning assumes that models capable of learning from multiple tasks can achieve better quality and efficiency via knowledge transfer, a key feature of human learning.
Ranked #1 on Image Classification on KMNIST
no code implementations • 22 May 2022 • Andrea Gesmundo, Jeff Dean
We propose a method that uses the layers of a pretrained deep neural network as building blocks to construct an ML system that can jointly solve an arbitrary number of tasks.
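A minimal sketch of the building-blocks idea, assuming a PyTorch-style setup (the module shapes and two-task configuration below are illustrative assumptions, not the paper's actual system): the pretrained layers are frozen and shared, while each task attaches only a small trainable head.

```python
import torch
import torch.nn as nn

# Hypothetical frozen trunk standing in for the layers of a pretrained network.
pretrained_layers = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
)
for p in pretrained_layers.parameters():
    p.requires_grad = False  # reuse the pretrained blocks as-is

# One small trainable head per task; further tasks can be added at any time.
heads = nn.ModuleDict({
    "task_a": nn.Linear(256, 10),
    "task_b": nn.Linear(256, 3),
})

def forward(x: torch.Tensor, task: str) -> torch.Tensor:
    return heads[task](pretrained_layers(x))

# Jointly serving two tasks: only the per-task heads receive gradients.
out_a = forward(torch.randn(4, 128), "task_a")  # shape (4, 10)
out_b = forward(torch.randn(4, 128), "task_b")  # shape (4, 3)
```

Because the shared layers are never modified, adding a task costs only a head's worth of parameters, which is what makes the number of jointly solved tasks effectively open-ended.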
3 code implementations • 31 Mar 2022 • Adam Roberts, Hyung Won Chung, Anselm Levskaya, Gaurav Mishra, James Bradbury, Daniel Andor, Sharan Narang, Brian Lester, Colin Gaffney, Afroz Mohiuddin, Curtis Hawthorne, Aitor Lewkowycz, Alex Salcianu, Marc van Zee, Jacob Austin, Sebastian Goodman, Livio Baldini Soares, Haitang Hu, Sasha Tsvyashchenko, Aakanksha Chowdhery, Jasmijn Bastings, Jannis Bulian, Xavier Garcia, Jianmo Ni, Andrew Chen, Kathleen Kenealy, Jonathan H. Clark, Stephan Lee, Dan Garrette, James Lee-Thorp, Colin Raffel, Noam Shazeer, Marvin Ritter, Maarten Bosma, Alexandre Passos, Jeremy Maitin-Shepard, Noah Fiedel, Mark Omernick, Brennan Saeta, Ryan Sepassi, Alexander Spiridonov, Joshua Newlan, Andrea Gesmundo
Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves.
no code implementations • 9 Sep 2020 • Mark Collier, Efi Kokiopoulou, Andrea Gesmundo, Jesse Berent
We propose the use of sparse routing networks for continual learning.
no code implementations • 26 Nov 2019 • Alina Dubatovka, Efi Kokiopoulou, Luciano Sbaiz, Andrea Gesmundo, Gabor Bartok, Jesse Berent
However, it requires a large amount of computing resources. To alleviate this, a performance prediction network has recently been proposed that enables efficient architecture search by forecasting the performance of candidate architectures instead of relying on actual model training.
no code implementations • 10 Oct 2019 • Krzysztof Maziarz, Efi Kokiopoulou, Andrea Gesmundo, Luciano Sbaiz, Gabor Bartok, Jesse Berent
The binary allocation variables are learned jointly with the model parameters by standard back-propagation thanks to the Gumbel-Softmax reparametrization method.
Ranked #1 on Multi-Task Learning on OMNIGLOT
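As a rough illustration of the binary-allocation mechanism described above (a sketch only; the sizes, temperature, and gating below are made-up assumptions, not the paper's implementation), the Gumbel-Softmax relaxation lets discrete routing decisions receive gradients through standard back-propagation:

```python
import torch
import torch.nn.functional as F

num_tasks, num_components = 3, 4  # illustrative sizes

# Learnable logits over {on, off} for every (task, component) pair.
alloc_logits = torch.zeros(num_tasks, num_components, 2, requires_grad=True)

# Gumbel-Softmax relaxation: hard one-hot samples in the forward pass,
# straight-through (differentiable) gradients in the backward pass.
sample = F.gumbel_softmax(alloc_logits, tau=1.0, hard=True)
allocation = sample[..., 0]  # (num_tasks, num_components) binary mask

# The mask gates component outputs, so gradients reach alloc_logits
# through ordinary back-propagation.
component_out = torch.randn(num_components, 8)  # stand-in component outputs
task_out = allocation @ component_out           # (num_tasks, 8)
task_out.sum().backward()
print(alloc_logits.grad.shape)  # torch.Size([3, 4, 2])
```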
no code implementations • 25 Sep 2019 • Krzysztof Maziarz, Mingxing Tan, Andrey Khorlin, Kuang-Yu Samuel Chang, Andrea Gesmundo
We show that the Evo-NAS agent outperforms both neural and evolutionary agents when applied to architecture search for a suite of text and image classification benchmarks.
no code implementations • 25 Sep 2019 • Krzysztof Maziarz, Efi Kokiopoulou, Andrea Gesmundo, Luciano Sbaiz, Gabor Bartok, Jesse Berent
We propose Gumbel-Matrix routing, a novel multi-task routing method based on the Gumbel-Softmax that is designed to learn fine-grained parameter sharing.
3 code implementations • 30 Jul 2019 • Iulia M. Comsa, Krzysztof Potempa, Luca Versari, Thomas Fischbacher, Andrea Gesmundo, Jyrki Alakuijala
The timing of individual neuronal spikes is essential for biological brains to make fast responses to sensory stimuli.
no code implementations • 19 Jun 2019 • Zalán Borsos, Andrey Khorlin, Andrea Gesmundo
Recent advances in Neural Architecture Search (NAS) have produced state-of-the-art architectures on several tasks.
no code implementations • 15 Feb 2019 • Efi Kokiopoulou, Anja Hauth, Luciano Sbaiz, Andrea Gesmundo, Gabor Bartok, Jesse Berent
At the core of our framework lies a deep value network that can predict the performance of input architectures on a task by utilizing task meta-features and the previous model training experiments performed on related tasks.
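A toy version of such a predictor (purely illustrative; the real deep value network's input encodings and architecture are more involved) concatenates an architecture encoding with task meta-features and regresses expected performance, so candidate architectures can be ranked without training any of them:

```python
import torch
import torch.nn as nn

ARCH_DIM, META_DIM = 32, 16  # assumed encoding sizes

# Toy value network: (architecture encoding, task meta-features) -> accuracy.
value_net = nn.Sequential(
    nn.Linear(ARCH_DIM + META_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
    nn.Sigmoid(),  # predicted accuracy in [0, 1]
)

def predict_performance(arch_enc: torch.Tensor, meta: torch.Tensor) -> torch.Tensor:
    return value_net(torch.cat([arch_enc, meta], dim=-1))

# Rank candidate architectures for a new task without training any of them.
candidates = torch.randn(10, ARCH_DIM)
task_meta = torch.randn(META_DIM).expand(10, META_DIM)
scores = predict_performance(candidates, task_meta).squeeze(-1)
best = scores.argmax().item()  # index of the most promising candidate
```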
16 code implementations • 2 Feb 2019 • Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly
On GLUE, we attain within 0.4% of the performance of full fine-tuning, adding only 3.6% parameters per task.
Ranked #4 on Image Classification on OmniBenchmark (using extra training data)
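The parameter efficiency in this line of work comes from small bottleneck adapter modules inserted into the pretrained network, with only the adapters trained per task. Here is a minimal sketch of such a module (the hidden and bottleneck dimensions, nonlinearity, and initialization are illustrative assumptions):

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""

    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        # Near-zero init keeps the adapter close to an identity map at the start.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(torch.relu(self.down(h)))

# A few thousand trainable parameters per task instead of a full fine-tune.
adapter = Adapter()
h = torch.randn(2, 16, 768)  # (batch, sequence, hidden) activations
assert adapter(h).shape == h.shape
```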
no code implementations • 27 Dec 2018 • Stanisław Jastrzębski, Quentin de Laroussilhe, Mingxing Tan, Xiao Ma, Neil Houlsby, Andrea Gesmundo
However, the success of NAS depends on the definition of the search space.
no code implementations • 24 Nov 2018 • Krzysztof Maziarz, Mingxing Tan, Andrey Khorlin, Marin Georgiev, Andrea Gesmundo
We show that the Evo-NAS agent outperforms both neural and evolutionary agents when applied to architecture search for a suite of text and image classification benchmarks.
no code implementations • NeurIPS 2018 • Catherine Wong, Neil Houlsby, Yifeng Lu, Andrea Gesmundo
We extend RL-based architecture search methods to support parallel training on multiple tasks and then transfer the search strategy to new tasks.
no code implementations • 23 Jan 2018 • Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Wojciech Gajewski, Andrea Gesmundo, Neil Houlsby, Wei Wang
We analyze the language learned by an agent trained with reinforcement learning as a component of the ActiveQA system [Buck et al., 2017].
no code implementations • ICLR 2018 • Catherine Wong, Andrea Gesmundo
We demonstrate that MNMS can conduct an automated architecture search for multiple tasks simultaneously while still learning well-performing, specialized models for each task.
2 code implementations • ICLR 2018 • Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Wojciech Gajewski, Andrea Gesmundo, Neil Houlsby, Wei Wang
The agent probes the system with potentially many natural language reformulations of an initial question and aggregates the returned evidence to yield the best answer.
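In outline (a hedged sketch; `reformulate`, `qa_system`, and the max-confidence aggregation below are hypothetical stand-ins, not the ActiveQA interfaces), the probe-and-aggregate loop looks like:

```python
def best_answer(question, reformulate, qa_system, n_probes=10):
    """Probe a QA system with reformulations and keep the top-scoring answer.

    `reformulate(question)` and `qa_system(query)` are hypothetical callables
    standing in for the trained reformulation agent and the black-box QA system.
    """
    candidates = []
    for query in [question] + [reformulate(question) for _ in range(n_probes)]:
        answer, score = qa_system(query)  # returned evidence with a confidence
        candidates.append((score, answer))
    # Aggregate the evidence: here, simply select the highest-confidence answer.
    return max(candidates, key=lambda sa: sa[0])[1]
```

In the actual system, the reformulation model is trained with reinforcement learning, using the quality of the returned answers as the reward signal.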
no code implementations • LREC 2012 • Andrea Gesmundo, Tanja Samardžić
We present a novel tool for morphological analysis of Serbian, which is a low-resource language with rich morphology.