1 code implementation • 19 Nov 2024 • Riccardo Grazzi, Julien Siems, Jörg K. H. Franke, Arber Zela, Frank Hutter, Massimiliano Pontil
We extend this result to non-diagonal LRNNs, which have recently shown promise in models such as DeltaNet.
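For context, DeltaNet's recurrence applies a non-diagonal (rank-one, generalized-Householder) state transition rather than a diagonal one. A minimal NumPy sketch of one such delta-rule step, with illustrative shapes and variable names (not code from the paper), might look like this:

```python
import numpy as np

def delta_rule_step(S, k, v, beta):
    """One DeltaNet-style recurrence step.

    S    : (d_v, d_k) matrix state
    k, v : key / value vectors for the current token
    beta : scalar write strength; the transition applied to S is
           (I - beta * k k^T), i.e. non-diagonal.
    """
    return S - beta * np.outer(S @ k - v, k)

# toy usage: run a short sequence, then read out with a query
d_k, d_v, T = 4, 3, 5
rng = np.random.default_rng(0)
S = np.zeros((d_v, d_k))
for _ in range(T):
    k = rng.normal(size=d_k); k /= np.linalg.norm(k)
    v = rng.normal(size=d_v)
    S = delta_rule_step(S, k, v, beta=1.0)
out = S @ rng.normal(size=d_k)   # o_t = S_t q_t
```

The result discussed here concerns which eigenvalues such transitions are allowed to take, not the implementation of the step itself.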
1 code implementation • 25 Oct 2024 • Sebastian Pineda Arango, Maciej Janowski, Lennart Purucker, Arber Zela, Frank Hutter, Josif Grabocka
Finetuning is a widespread practice across different communities for adapting pretrained models to particular tasks.
1 code implementation • 6 Oct 2024 • Sebastian Pineda Arango, Maciej Janowski, Lennart Purucker, Arber Zela, Frank Hutter, Josif Grabocka
In this study, we explore using neural networks as ensemble methods, emphasizing the importance of dynamic ensembling for adaptively leveraging diverse model predictions.
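As an illustration of what dynamic ensembling can mean (a hypothetical sketch, not the paper's exact architecture): a small gating network assigns per-example weights to the base models' predictions instead of using one fixed weighting.

```python
import torch
import torch.nn as nn

class DynamicEnsembler(nn.Module):
    """Hypothetical sketch: per-example weighting of base-model predictions."""
    def __init__(self, n_models: int, n_classes: int):
        super().__init__()
        # the gating net sees all base predictions and emits one weight per model
        self.gate = nn.Sequential(
            nn.Linear(n_models * n_classes, 64), nn.ReLU(),
            nn.Linear(64, n_models),
        )

    def forward(self, base_probs: torch.Tensor) -> torch.Tensor:
        # base_probs: (batch, n_models, n_classes), e.g. softmax outputs
        b, m, c = base_probs.shape
        weights = torch.softmax(self.gate(base_probs.reshape(b, m * c)), dim=-1)
        # weighted average, with weights chosen per example ("dynamic")
        return (weights.unsqueeze(-1) * base_probs).sum(dim=1)
```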
no code implementations • 6 Oct 2024 • Andreas Mueller, Julien Siems, Harsha Nori, David Salinas, Arber Zela, Rich Caruana, Frank Hutter
Generalized Additive Models (GAMs) are widely recognized for their ability to create fully interpretable machine learning models for tabular data.
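In the standard formulation, a GAM predicts through g(E[y]) = b_0 + f_1(x_1) + ... + f_d(x_d), so each feature's learned shape function f_i can be plotted and inspected in isolation, which is what makes these models fully interpretable.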
2 code implementations • 16 May 2024 • Rhea Sanjay Sukthanker, Arber Zela, Benedikt Staffler, Aaron Klein, Lennart Purucker, Joerg K. H. Franke, Frank Hutter
The increasing size of language models necessitates a thorough analysis across multiple dimensions to assess trade-offs among crucial hardware metrics such as latency, energy consumption, GPU memory usage, and performance.
1 code implementation • 28 Feb 2024 • Rhea Sanjay Sukthanker, Arber Zela, Benedikt Staffler, Samuel Dooley, Josif Grabocka, Frank Hutter
Pareto front profiling in multi-objective optimization (MOO), i.e., finding a diverse set of Pareto-optimal solutions, is challenging, especially with expensive objectives such as neural network training.
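For reference, a candidate is Pareto-optimal when no other candidate is at least as good on every objective and strictly better on at least one. A small sketch (with hypothetical objective names) that filters a candidate set down to its Pareto front:

```python
def pareto_front(points):
    """Return the subset of points not dominated by any other.

    points: list of objective tuples to be minimized,
            e.g. (latency_ms, validation_error).
    """
    def dominates(a, b):
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    return [p for p in points if not any(dominates(q, p) for q in points)]

# toy usage
candidates = [(12.0, 0.25), (20.0, 0.21), (15.0, 0.30), (30.0, 0.20)]
print(pareto_front(candidates))  # (15.0, 0.30) is dominated by (12.0, 0.25)
```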
no code implementations • 20 Jan 2023 • Colin White, Mahmoud Safari, Rhea Sukthanker, Binxin Ru, Thomas Elsken, Arber Zela, Debadeepta Dey, Frank Hutter
Specialized, high-performing neural architectures are crucial to the success of deep learning in these areas.
1 code implementation • 6 Oct 2022 • Arjun Krishnakumar, Colin White, Arber Zela, Renbo Tu, Mahmoud Safari, Frank Hutter
Zero-cost proxies (ZC proxies) are a recent family of architecture performance prediction techniques that aim to significantly speed up neural architecture search (NAS) algorithms.
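As a rough illustration of the idea (not any specific proxy from the paper): a ZC proxy scores an untrained network from a single minibatch, for instance via the norm of its gradients, and candidate architectures are then ranked by that score instead of by trained accuracy.

```python
import torch
import torch.nn as nn

def grad_norm_proxy(model: nn.Module, x: torch.Tensor, y: torch.Tensor) -> float:
    """Illustrative zero-cost proxy: gradient norm of an untrained net on one minibatch."""
    model.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return sum(p.grad.norm().item() for p in model.parameters() if p.grad is not None)

# toy usage: rank two candidate architectures without training either
x, y = torch.randn(32, 100), torch.randint(0, 10, (32,))
candidates = {
    "small": nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 10)),
    "wide":  nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 10)),
}
scores = {name: grad_norm_proxy(m, x, y) for name, m in candidates.items()}
```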
no code implementations • 15 Feb 2022 • Thomas Elsken, Arber Zela, Jan Hendrik Metzen, Benedikt Staffler, Thomas Brox, Abhinav Valada, Frank Hutter
The success of deep learning in recent years has led to a rising demand for neural network architecture engineering.
1 code implementation • ICLR 2022 • Yash Mehta, Colin White, Arber Zela, Arjun Krishnakumar, Guri Zabergja, Shakiba Moradian, Mahmoud Safari, Kaicheng Yu, Frank Hutter
The release of tabular benchmarks, such as NAS-Bench-101 and NAS-Bench-201, has significantly lowered the computational overhead for conducting scientific research in neural architecture search (NAS).
no code implementations • 9 Jul 2021 • Ashwin Raaghav Narayanan, Arber Zela, Tonmoy Saikia, Thomas Brox, Frank Hutter
Ensembles of CNN models trained with different seeds (also known as Deep Ensembles) are known to outperform a single copy of the CNN.
no code implementations • 8 Jul 2021 • Thomas Elsken, Benedikt Staffler, Arber Zela, Jan Hendrik Metzen, Frank Hutter
While neural architecture search methods have been successful in recent years and have led to new state-of-the-art performance on various problems, they have also been criticized for being unstable, highly sensitive to their hyperparameters, and often no better than random search.
1 code implementation • NeurIPS 2021 • Colin White, Arber Zela, Binxin Ru, Yang Liu, Frank Hutter
Early methods in the rapidly developing field of neural architecture search (NAS) required fully training thousands of neural networks.
1 code implementation • 1 Jan 2021 • Michael Ruchte, Arber Zela, Julien Niklas Siems, Josif Grabocka, Frank Hutter
Neural Architecture Search (NAS) is one of the focal points for the Deep Learning community, but reproducing NAS methods is extremely challenging due to numerous low-level implementation details.
2 code implementations • 9 Oct 2020 • Jovita Lukasik, David Friede, Arber Zela, Frank Hutter, Margret Keuper
We evaluate the proposed approach on neural architectures defined by the ENAS approach and the NAS-Bench-101 and NAS-Bench-201 search spaces, and show that our smooth embedding space allows us to directly extrapolate the performance prediction to architectures outside the seen domain (e.g., with more operations).
1 code implementation • ICLR 2022 • Arber Zela, Julien Siems, Lucas Zimmer, Jovita Lukasik, Margret Keuper, Frank Hutter
We show that surrogate NAS benchmarks can model the true performance of architectures better than tabular benchmarks (at a small fraction of the cost), that they lead to faithful estimates of how well different NAS methods work on the original non-surrogate benchmark, and that they can generate new scientific insight.
1 code implementation • NeurIPS 2021 • Sheheryar Zaidi, Arber Zela, Thomas Elsken, Chris Holmes, Frank Hutter, Yee Whye Teh
On a variety of classification tasks and modern architecture search spaces, we show that the resulting ensembles outperform deep ensembles not only in terms of accuracy but also uncertainty calibration and robustness to dataset shift.
1 code implementation • ICLR 2020 • Arber Zela, Julien Siems, Frank Hutter
One-shot neural architecture search (NAS) has played a crucial role in making NAS methods computationally feasible in practice.
1 code implementation • ICLR 2020 • Arber Zela, Thomas Elsken, Tonmoy Saikia, Yassine Marrakchi, Thomas Brox, Frank Hutter
Differentiable Architecture Search (DARTS) has attracted a lot of attention due to its simplicity and small search costs achieved by a continuous relaxation and an approximation of the resulting bi-level optimization problem.
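The continuous relaxation referred to here replaces each categorical choice of operation with a softmax-weighted mixture over candidate operations. A minimal PyTorch sketch of such a mixed operation (with an illustrative, smaller-than-usual op set) is:

```python
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    """DARTS-style continuous relaxation: a softmax-weighted sum over candidate ops."""
    def __init__(self, channels: int):
        super().__init__()
        # illustrative candidate set; real DARTS cells use a larger one
        self.ops = nn.ModuleList([
            nn.Identity(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.AvgPool2d(3, stride=1, padding=1),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture parameters

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=-1)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))
```

The architecture parameters alpha are updated on validation data while the operation weights are updated on training data, which is the approximation of the bi-level optimization problem mentioned above.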
1 code implementation • ICCV 2019 • Tonmoy Saikia, Yassine Marrakchi, Arber Zela, Frank Hutter, Thomas Brox
In this work, we show how to use and extend existing AutoML techniques to efficiently optimize large-scale U-Net-like encoder-decoder architectures.
3 code implementations • 18 Jul 2018 • Arber Zela, Aaron Klein, Stefan Falkner, Frank Hutter
While existing work on neural architecture search (NAS) tunes hyperparameters in a separate post-processing step, we demonstrate that architectural choices and other hyperparameter settings interact in a way that can render this separation suboptimal.
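A hedged sketch of what such a joint search space can look like (hypothetical names and ranges, not the paper's exact space): architectural choices and training hyperparameters are sampled and evaluated together rather than tuned in two separate stages.

```python
import math
import random

# Hypothetical joint space: architectural and training hyperparameters side by side.
joint_space = {
    "num_layers":    [4, 8, 12],
    "width":         [64, 128, 256],
    "batch_size":    [64, 128, 256],
    "learning_rate": (1e-4, 1e-1),   # continuous, log-uniform range
    "weight_decay":  (1e-6, 1e-2),   # continuous, log-uniform range
}

def sample_config(space):
    """Draw one joint configuration; a multi-fidelity optimizer such as BOHB
    would propose such configurations jointly instead of tuning the two groups
    of parameters in separate stages."""
    cfg = {}
    for name, choices in space.items():
        if isinstance(choices, tuple):
            lo, hi = choices
            cfg[name] = 10 ** random.uniform(math.log10(lo), math.log10(hi))
        else:
            cfg[name] = random.choice(choices)
    return cfg
```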