no code implementations • 12 Feb 2025 • Zikai Zhou, Qizheng Zhang, Hermann Kumbong, Kunle Olukotun
Fine-tuning large language models (LLMs) is increasingly costly as models scale to hundreds of billions of parameters, and even parameter-efficient fine-tuning (PEFT) methods like LoRA remain resource-intensive.
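LoRA's central trick, freezing the pretrained weight and learning only a low-rank additive update, can be sketched in a few lines of NumPy. This is a toy illustration with made-up dimensions, not the system described in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 64, 64, 4  # layer dims and LoRA rank (r << d)

W = rng.standard_normal((d, k))          # frozen pretrained weight
A = rng.standard_normal((d, r)) * 0.01   # trainable low-rank factor
B = np.zeros((r, k))                     # B starts at zero, so the delta starts at zero

x = rng.standard_normal(d)
y = x @ (W + A @ B)                      # forward pass with the low-rank update

# Only A and B are trained: r*(d+k) parameters instead of d*k
full_params = d * k
lora_params = r * (d + k)
```

With these dimensions the trainable parameter count drops from 4096 to 512, which is why LoRA is parameter-efficient even though, as the abstract notes, the overall fine-tuning process can still be resource-intensive.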
1 code implementation • 4 Feb 2025 • Genghan Zhang, Weixin Liang, Olivia Hsu, Kunle Olukotun
To evaluate the effectiveness of our system, we construct a benchmark of a typical ML library and generate ASPL code with both open- and closed-source LLMs on this benchmark.
no code implementations • 13 May 2024 • Raghu Prabhakar, Ram Sivaramakrishnan, Darshan Gandhi, Yun Du, Mingran Wang, XiangYu Song, Kejie Zhang, Tianren Gao, Angela Wang, Karen Li, Yongning Sheng, Joshua Brot, Denis Sokolov, Apurv Vivek, Calvin Leung, Arjun Sabnis, Jiayu Bai, Tuowen Zhao, Mark Gottscho, David Jackson, Mark Luttrell, Manish K. Shah, Edison Chen, Kaizhao Liang, Swayambhoo Jain, Urmish Thakker, Dawei Huang, Sumti Jairath, Kevin J. Brown, Kunle Olukotun
In this paper, we describe how combining CoE, streaming dataflow, and a three-tier memory system scales the AI memory wall.
1 code implementation • 1 Dec 2022 • Erik Hellsten, Artur Souza, Johannes Lenfers, Rubens Lacouture, Olivia Hsu, Adel Ejjeh, Fredrik Kjolstad, Michel Steuwer, Kunle Olukotun, Luigi Nardi
We introduce the Bayesian Compiler Optimization framework (BaCO), a general purpose autotuner for modern compilers targeting CPUs, GPUs, and FPGAs.
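The task an autotuner like BaCO solves can be illustrated with a brute-force search over a toy configuration space; `run_compiled` below is a hypothetical stand-in for compiling and timing a kernel, and this sketch is not BaCO's algorithm, which uses a Bayesian model to find good configurations in far fewer evaluations:

```python
import random

random.seed(0)

def run_compiled(cfg):
    # Hypothetical stand-in for benchmarking a compiled kernel on the
    # target device; penalizes distance from a fictitious optimum (32, 4)
    # plus a little measurement noise.
    tile, unroll = cfg
    return abs(tile - 32) * 0.5 + abs(unroll - 4) * 1.0 + random.uniform(0, 0.1)

# A small discrete search space of compiler parameters
space = [(t, u) for t in (8, 16, 32, 64) for u in (1, 2, 4, 8)]
best_cfg = min(space, key=run_compiled)
```

Exhaustive search is only feasible for tiny spaces like this one; real compiler parameter spaces are combinatorial, which is the motivation for model-based autotuners.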
no code implementations • 2 Feb 2022 • Matthew Feldman, Tian Zhao, Kunle Olukotun
As programmers turn to software-defined hardware (SDH) to maintain a high level of productivity while programming hardware to run complex algorithms, the compiler must do the heavy lifting of automatically partitioning on-chip arrays.
no code implementations • 28 Sep 2020 • Artur Souza, Luigi Nardi, Leonardo Oliveira, Kunle Olukotun, Marius Lindauer, Frank Hutter
While Bayesian Optimization (BO) is a very popular method for optimizing expensive black-box functions, it fails to leverage the experience of domain experts.
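The idea of injecting expert knowledge into the search can be illustrated with a toy prior-guided random search: candidates are drawn from an expert's prior over promising regions rather than uniformly. This sketch is not the paper's algorithm (which integrates priors into a full Bayesian optimization loop), and `black_box` and `optimize` are hypothetical names:

```python
import math
import random

random.seed(0)

def black_box(x):
    # Toy stand-in for an expensive function; optimum at x = 0.7
    return -(x - 0.7) ** 2

def optimize(n_evals, prior=None):
    """Pick candidates from an expert prior when given, else uniformly."""
    best_x, best_y = None, -math.inf
    for _ in range(n_evals):
        if prior == "expert":
            x = random.gauss(0.7, 0.1)   # expert believes the optimum is near 0.7
        else:
            x = random.uniform(0.0, 1.0)
        x = min(max(x, 0.0), 1.0)
        y = black_box(x)
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y

_, y_uniform = optimize(20)
_, y_prior = optimize(20, prior="expert")
```

When the prior is well calibrated, the guided search concentrates its limited evaluation budget near the optimum; a misleading prior would have the opposite effect, which is part of what makes combining priors with a probabilistic model nontrivial.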
no code implementations • 25 Jun 2020 • Artur Souza, Luigi Nardi, Leonardo B. Oliveira, Kunle Olukotun, Marius Lindauer, Frank Hutter
We show that BOPrO is around 6.67x faster than state-of-the-art methods on a common suite of benchmarks, and achieves new state-of-the-art performance on a real-world hardware design application.
no code implementations • 12 Feb 2020 • Tushar Swamy, Alexander Rucker, Muhammad Shahbaz, Ishan Gaur, Kunle Olukotun
Emerging applications -- cloud computing, the internet of things, and augmented/virtual reality -- demand responsive, secure, and scalable datacenter networks.
no code implementations • 26 Sep 2019 • Tian Zhao, Yaqi Zhang, Kunle Olukotun
Recurrent Neural Network (RNN) applications form a major class of AI-powered, low-latency data center workloads.
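The property that makes RNN serving latency-sensitive is the sequential recurrence: each step's hidden state depends on the previous one, so steps cannot be parallelized across time. A minimal sketch of a vanilla (Elman) RNN cell in pure Python, with arbitrary small dimensions, shows the dependence:

```python
import math
import random

random.seed(0)

def rnn_step(x, h, Wx, Wh, b):
    """One step of a vanilla RNN cell: h' = tanh(Wx @ x + Wh @ h + b)."""
    return [math.tanh(sum(Wx[i][j] * x[j] for j in range(len(x)))
                      + sum(Wh[i][j] * h[j] for j in range(len(h)))
                      + b[i])
            for i in range(len(h))]

d_in, d_h = 3, 4
Wx = [[random.gauss(0, 0.5) for _ in range(d_in)] for _ in range(d_h)]
Wh = [[random.gauss(0, 0.5) for _ in range(d_h)] for _ in range(d_h)]
b = [0.0] * d_h

h = [0.0] * d_h
seq = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
for x in seq:
    # h feeds back into the next step: this serial chain is what
    # low-latency RNN accelerators must execute quickly
    h = rnn_step(x, h, Wx, Wh, b)
```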
no code implementations • 24 May 2019 • Rekha Singhal, Nathan Zhang, Luigi Nardi, Muhammad Shahbaz, Kunle Olukotun
Modern real-time business analytics consist of heterogeneous workloads (e.g., database queries, graph processing, and machine learning).
1 code implementation • 26 Apr 2019 • Artur Souza, Leonardo B. Oliveira, Sabine Hollatz, Matt Feldman, Kunle Olukotun, James M. Holton, Aina E. Cohen, Luigi Nardi
In this paper, we introduce a new serial crystallography dataset comprised of real and synthetic images; the synthetic images are generated through the use of a simulator that is both scalable and accurate.
no code implementations • 29 Mar 2019 • Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, Furong Huang, Martin Jaggi, Kevin Jamieson, Michael. I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konečný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Aparna Lakshmiratan, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Murray, Kunle Olukotun, Dimitris Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar
Machine learning (ML) techniques are enjoying rapidly increasing adoption.
no code implementations • 11 Oct 2018 • Luigi Nardi, David Koeplinger, Kunle Olukotun
The proposed methodology follows a white-box model that is simple to understand and interpret (unlike, for example, neural networks) and helps the user better understand the results of the automatic search.
no code implementations • 27 Sep 2018 • Artur Souza, Leonardo B. Oliveira, Sabine Hollatz, Matt Feldman, Kunle Olukotun, James M. Holton, Aina E. Cohen, Luigi Nardi
In this paper, we introduce a new serial crystallography dataset generated through the use of a simulator that is both scalable and accurate; the synthetic images are labeled.
no code implementations • 4 Jun 2018 • Cody Coleman, Daniel Kang, Deepak Narayanan, Luigi Nardi, Tian Zhao, Jian Zhang, Peter Bailis, Kunle Olukotun, Chris Re, Matei Zaharia
In this work, we analyze the entries from DAWNBench, which received optimized submissions from multiple industrial groups, to investigate the behavior of time-to-accuracy (TTA) as a metric as well as trends in the best-performing entries.
1 code implementation • 9 Mar 2018 • Christopher De Sa, Megan Leszczynski, Jian Zhang, Alana Marzoev, Christopher R. Aberger, Kunle Olukotun, Christopher Ré
Low-precision computation is often used to lower the time and energy cost of machine learning, and recently hardware accelerators have been developed to support it.
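Low-precision storage can be mimicked in software with a uniform fixed-point quantizer; `quantize` below is a hypothetical helper to show the rounding error that low-precision analyses must account for, not the paper's quantization scheme:

```python
def quantize(x, bits=8, scale=4.0):
    """Uniformly quantize x to a signed fixed-point value with `bits` bits.

    Values are clipped to [-scale, scale) and rounded to the nearest
    representable level; fewer bits means a coarser grid and more error."""
    levels = 2 ** (bits - 1)
    step = scale / levels
    q = round(x / step)
    q = max(-levels, min(levels - 1, q))  # clip to the representable range
    return q * step

x = 1.2345
x8 = quantize(x, bits=8)   # fine grid: step = 4/128  = 0.03125
x4 = quantize(x, bits=4)   # coarse grid: step = 4/8  = 0.5
```

The worst-case rounding error is half a step, so halving the bit width roughly squares the number of distinguishable levels lost, which is the precision/cost trade-off hardware accelerators for low-precision ML exploit.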
no code implementations • 22 May 2017 • Peter Bailis, Kunle Olukotun, Christopher Re, Matei Zaharia
Despite incredible recent advances in machine learning, building machine learning applications remains prohibitively time-consuming and expensive for all but the best-trained, best-funded engineering organizations.
no code implementations • 24 Feb 2016 • Christopher De Sa, Kunle Olukotun, Christopher Ré
Gibbs sampling is a Markov chain Monte Carlo technique commonly used for estimating marginal distributions.
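The mechanics of Gibbs sampling, repeatedly resampling each variable from its conditional distribution given the others, can be shown on a bivariate normal, whose conditionals are known in closed form. A minimal sketch (the toy target and parameter choices are mine, not the paper's):

```python
import random

random.seed(0)

def gibbs_bivariate_normal(n_samples, rho=0.8, burn_in=500):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each conditional p(x | y) and p(y | x) is N(rho * other, 1 - rho^2)."""
    x, y = 0.0, 0.0
    sd = (1 - rho ** 2) ** 0.5
    samples = []
    for i in range(n_samples + burn_in):
        x = random.gauss(rho * y, sd)  # resample x from p(x | y)
        y = random.gauss(rho * x, sd)  # resample y from p(y | x)
        if i >= burn_in:
            samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(5000)
mean_x = sum(s[0] for s in samples) / len(samples)
```

After burn-in the chain's samples approximate the target, so empirical marginal statistics (here, a mean near zero) converge; the paper's question is what happens to such guarantees under asynchronous execution.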
no code implementations • NeurIPS 2015 • Christopher M. De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré
Stochastic gradient descent (SGD) is a ubiquitous algorithm for a variety of machine learning problems.
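The sequential baseline these analyses start from is plain SGD; a minimal sketch on a toy least-squares problem (not the paper's asynchronous variant) makes the update rule concrete:

```python
import random

random.seed(0)

# Toy least-squares problem: fit w so that w * x matches y = 2 * x
data = [(x / 10, 2 * x / 10) for x in range(1, 11)]

w, lr = 0.0, 0.1
for epoch in range(200):
    random.shuffle(data)
    for x, y in data:
        grad = 2 * (w * x - y) * x  # gradient of (w*x - y)^2 with respect to w
        w -= lr * grad              # the SGD update
```

Asynchronous ("Hogwild!-style") variants run this inner loop on many threads without locks, so updates can race; the contribution of this line of work is proving convergence still holds under such races.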
no code implementations • NeurIPS 2015 • Christopher De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré
Gibbs sampling on factor graphs is a widely used inference technique, which often produces good empirical results.
no code implementations • 22 Jun 2015 • Christopher De Sa, Ce Zhang, Kunle Olukotun, Christopher Ré
with relaxed assumptions on the sparsity of the problem; (2) we analyze asynchronous SGD algorithms for non-convex matrix problems including matrix completion; and (3) we design and analyze an asynchronous SGD algorithm, called Buckwild!, that uses lower-precision arithmetic.
no code implementations • 5 Nov 2014 • Christopher De Sa, Kunle Olukotun, Christopher Ré
Stochastic gradient descent (SGD) on a low-rank factorization is commonly employed to speed up matrix problems including matrix completion, subspace tracking, and SDP relaxation.
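The low-rank SGD setup the abstract refers to can be sketched on a toy rank-1 matrix completion problem: model each observed entry as a product of two learned factors and apply SGD per entry. The problem sizes and learning rate here are arbitrary choices for illustration:

```python
import random

random.seed(0)

n, lr = 6, 0.1
# Rank-1 ground truth M[i][j] = a[i] * b[j], with entries scaled <= 1
a = [(i + 1) / n for i in range(n)]
b = [(j + 1) / n for j in range(n)]
# Observe roughly half the entries (a checkerboard pattern)
observed = [(i, j, a[i] * b[j])
            for i in range(n) for j in range(n) if (i + j) % 2 == 0]

# SGD on the factors u, v of the low-rank model u v^T
u = [random.uniform(0.5, 1.0) for _ in range(n)]
v = [random.uniform(0.5, 1.0) for _ in range(n)]

for _ in range(5000):
    i, j, m = random.choice(observed)
    err = u[i] * v[j] - m
    # Update both factors using their pre-update values
    u[i], v[j] = u[i] - lr * err * v[j], v[j] - lr * err * u[i]

train_mse = sum((u[i] * v[j] - m) ** 2 for i, j, m in observed) / len(observed)
```

The objective is non-convex in (u, v), which is exactly why convergence guarantees for this style of SGD require the kind of analysis the paper develops.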
no code implementations • 27 Jun 2012 • Lawrence McAfee, Kunle Olukotun
In this paper, we present SONNC, a compiler for NNs that utilizes static analysis to generate optimized parallel code.