no code implementations • 9 Oct 2024 • Rhea Sanjay Sukthanker, Benedikt Staffler, Frank Hutter, Aaron Klein
Large language models (LLMs) exhibit remarkable reasoning abilities, allowing them to generalize across a wide range of downstream tasks, such as commonsense reasoning or instruction following.
2 code implementations • 16 May 2024 • Rhea Sanjay Sukthanker, Arber Zela, Benedikt Staffler, Aaron Klein, Lennart Purucker, Joerg K. H. Franke, Frank Hutter
The increasing size of language models necessitates a thorough analysis across multiple dimensions to assess trade-offs among crucial hardware metrics such as latency, energy consumption, GPU memory usage, and performance.
1 code implementation • 28 Feb 2024 • Rhea Sanjay Sukthanker, Arber Zela, Benedikt Staffler, Samuel Dooley, Josif Grabocka, Frank Hutter
Pareto front profiling in multi-objective optimization (MOO), i.e., finding a diverse set of Pareto optimal solutions, is challenging, especially with expensive objectives like neural network training.
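The notion of Pareto optimality used here can be made concrete with a minimal sketch: a point dominates another if it is at least as good in every objective and strictly better in at least one, and the Pareto front is the set of non-dominated points. The brute-force filter and the example objectives (latency vs. validation error) below are illustrative assumptions, not the paper's method.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization in all objectives)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors (O(n^2) filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical trade-off points: (latency in ms, validation error)
points = [(10, 0.30), (20, 0.25), (15, 0.28), (30, 0.24), (25, 0.26)]
front = pareto_front(points)  # (25, 0.26) is dominated by (20, 0.25) and drops out
```

With expensive objectives such as full network trainings, each point is costly to evaluate, which is why profiling the front efficiently is the hard part.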
no code implementations • 16 Dec 2023 • Rhea Sanjay Sukthanker, Arjun Krishnakumar, Mahmoud Safari, Frank Hutter
Since weight-entanglement poses compatibility challenges for gradient-based NAS methods, these two paradigms have largely developed independently in parallel sub-communities.
2 code implementations • NeurIPS 2023 • Samuel Dooley, Rhea Sanjay Sukthanker, John P. Dickerson, Colin White, Frank Hutter, Micah Goldblum
Our search outputs a suite of models which Pareto-dominate all other high-performance architectures and existing bias mitigation methods in terms of accuracy and fairness, often by large margins, on the two most widely used datasets for face identification, CelebA and VGGFace2.
no code implementations • CVPR 2022 • Rhea Sanjay Sukthanker, Zhiwu Huang, Suryansh Kumar, Radu Timofte, Luc van Gool
The key idea is to exploit a masked scheme of these two attentions to learn long-range data dependencies in the context of generative flows.
no code implementations • 17 Jan 2021 • Yan Wu, Zhiwu Huang, Suryansh Kumar, Rhea Sanjay Sukthanker, Radu Timofte, Luc van Gool
Modern solutions to the single image super-resolution (SISR) problem using deep neural networks aim not only at better performance accuracy but also at a lighter and computationally efficient model.
1 code implementation • 27 Oct 2020 • Rhea Sanjay Sukthanker, Zhiwu Huang, Suryansh Kumar, Erik Goron Endsjo, Yan Wu, Luc van Gool
To address this problem, we first introduce a geometrically rich and diverse SPD neural architecture search space for an efficient SPD cell design.