1 code implementation • 7 Jan 2025 • Nandan Kumar Jha, Brandon Reagen
The pervasiveness of proprietary language models has raised critical privacy concerns, necessitating advancements in private inference (PI), where computations are performed directly on encrypted data without revealing users' sensitive information.
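To make the PI setting concrete, here is a toy sketch (not the paper's protocol) using the third-party python-paillier (phe) package: a server evaluates a linear layer on additively encrypted inputs it cannot read. The weights and inputs are made up for illustration.

```python
# Toy illustration of private inference with additive homomorphic
# encryption (Paillier), via the third-party `phe` package.
# NOT the paper's protocol -- just the core idea: the server
# computes on ciphertexts it cannot decrypt.
from phe import paillier

# Client: generate keys and encrypt the input features.
public_key, private_key = paillier.generate_paillier_keypair()
x = [0.5, -1.2, 3.0]                      # made-up input
enc_x = [public_key.encrypt(v) for v in x]

# Server: evaluate a linear layer w.x + b on ciphertexts only
# (ciphertext + ciphertext and scalar * ciphertext are supported).
w, b = [0.1, 0.4, -0.2], 0.7              # made-up weights
enc_y = public_key.encrypt(b)
for wi, ei in zip(w, enc_x):
    enc_y = enc_y + wi * ei

# Client: decrypt the result.
print(private_key.decrypt(enc_y))          # ~ -0.33
```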
no code implementations • 2 Dec 2024 • Patrick Yubeaton, Jianqiao Cambridge Mo, Karthik Garimella, Nandan Kumar Jha, Brandon Reagen, Chinmay Hegde, Siddharth Garg
Private inference (PI) plays an important role in guaranteeing the privacy of user data when interfacing with proprietary machine learning models such as LLMs.
no code implementations • 16 Oct 2024 • Nandan Kumar Jha, Brandon Reagen
The pervasiveness of proprietary language models has raised privacy concerns for users' sensitive data, emphasizing the need for private inference (PI), where inference is performed directly on encrypted inputs.
1 code implementation • 12 Oct 2024 • Nandan Kumar Jha, Brandon Reagen
LayerNorm is a critical component in modern large language models (LLMs) for stabilizing training and ensuring smooth optimization.
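For reference, LayerNorm normalizes each token's feature vector to zero mean and unit variance and then applies a learned affine transform; a minimal NumPy sketch (not the paper's code) is:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Minimal LayerNorm: normalize over the last (feature) axis,
    then apply a learned per-feature affine transform."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Example: a batch of 2 tokens with 4 features each.
x = np.array([[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 1.0, 1.0]])
gamma, beta = np.ones(4), np.zeros(4)
print(layer_norm(x, gamma, beta))
```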
1 code implementation • 6 Oct 2023 • Naren Dhyani, Jianqiao Mo, Minsu Cho, Ameya Joshi, Siddharth Garg, Brandon Reagen, Chinmay Hegde
The Vision Transformer (ViT) architecture has emerged as the backbone of choice for state-of-the-art deep models in computer vision.
no code implementations • 9 Jul 2023 • Jianqiao Mo, Karthik Garimella, Negar Neda, Austin Ebel, Brandon Reagen
The characterization motivates the need for accelerators for both garbled circuits (GCs) and homomorphic encryption (HE).
1 code implementation • 4 Feb 2022 • Minsu Cho, Ameya Joshi, Siddharth Garg, Brandon Reagen, Chinmay Hegde
To reduce PI latency, we propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy.
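A minimal sketch of the selective-linearization idea, with illustrative names and a made-up penalty weight rather than the paper's implementation: each ReLU gets a learnable gate, an L1 penalty pushes gates toward zero, and a zeroed gate leaves the cheap linear (identity) path.

```python
import torch
import torch.nn as nn

class GatedReLU(nn.Module):
    """ReLU with a learnable gate s: output = s*relu(x) + (1-s)*x.
    Driving s to 0 linearizes the activation (cheap under PI)."""
    def __init__(self, num_features):
        super().__init__()
        self.s = nn.Parameter(torch.ones(num_features))

    def forward(self, x):
        s = self.s.clamp(0.0, 1.0)
        return s * torch.relu(x) + (1.0 - s) * x

    def sparsity_penalty(self):
        return self.s.abs().sum()  # L1 term added to the task loss

# Hypothetical usage: total loss = task_loss + lam * penalty.
act = GatedReLU(128)
x = torch.randn(32, 128)
loss = act(x).pow(2).mean() + 1e-3 * act.sparsity_penalty()
loss.backward()
```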
2 code implementations • 26 Jul 2021 • Karthik Garimella, Nandan Kumar Jha, Brandon Reagen
In this work, we ask: Is it feasible to substitute all ReLUs with low-degree polynomial activation functions for building deep, privacy-friendly neural networks?
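As an illustrative instance of such a substitution (not the paper's recipe), the sketch below swaps every ReLU for a learnable degree-2 polynomial; polynomials avoid the expensive non-linear comparisons (e.g., garbled circuits for ReLU) that dominate PI latency, and the coefficients here initialize near a common smooth approximation of ReLU.

```python
import torch
import torch.nn as nn

class PolyAct(nn.Module):
    """Degree-2 polynomial activation a*x^2 + b*x + c, a cheap
    PI-friendly stand-in for ReLU (no comparisons needed)."""
    def __init__(self):
        super().__init__()
        self.a = nn.Parameter(torch.tensor(0.25))
        self.b = nn.Parameter(torch.tensor(0.5))
        self.c = nn.Parameter(torch.tensor(0.0))

    def forward(self, x):
        return self.a * x * x + self.b * x + self.c

def replace_relus(model):
    """Recursively swap every nn.ReLU for PolyAct (illustrative helper)."""
    for name, child in model.named_children():
        if isinstance(child, nn.ReLU):
            setattr(model, name, PolyAct())
        else:
            replace_relus(child)
```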
no code implementations • 17 Jun 2021 • Minsu Cho, Zahra Ghodsi, Brandon Reagen, Siddharth Garg, Chinmay Hegde
The emergence of deep learning has been accompanied by privacy concerns surrounding users' data and service providers' models.
no code implementations • NeurIPS 2021 • Zahra Ghodsi, Nandan Kumar Jha, Brandon Reagen, Siddharth Garg
In this paper, we rethink the ReLU computation and propose optimizations for PI tailored to the properties of neural networks.
no code implementations • 9 May 2021 • Deeksha Dangwal, Vincent T. Lee, Hyo Jin Kim, Tianwei Shen, Meghan Cowan, Rajvi Shah, Caroline Trippel, Brandon Reagen, Timothy Sherwood, Vasileios Balntas, Armin Alaghi, Eddy Ilg
This poses a potential risk to user privacy.
no code implementations • 2 Mar 2021 • Nandan Kumar Jha, Zahra Ghodsi, Siddharth Garg, Brandon Reagen
This paper proposes DeepReDuce: a set of optimizations for the judicious removal of ReLUs to reduce private inference latency.
no code implementations • NeurIPS 2020 • Zahra Ghodsi, Akshaj Veldanda, Brandon Reagen, Siddharth Garg
Machine learning as a service has given rise to privacy concerns surrounding clients' data and providers' models, and has catalyzed research in private inference (PI): methods to process inferences without disclosing inputs.
no code implementations • 8 Jan 2020 • Udit Gupta, Samuel Hsia, Vikram Saraph, Xiaodong Wang, Brandon Reagen, Gu-Yeon Wei, Hsien-Hsin S. Lee, David Brooks, Carole-Jean Wu
Neural personalized recommendation is the cornerstone of a wide collection of cloud services and products, constituting a significant fraction of the compute demand on cloud infrastructure.
no code implementations • 23 Aug 2019 • Udit Gupta, Brandon Reagen, Lillian Pentecost, Marco Donato, Thierry Tambe, Alexander M. Rush, Gu-Yeon Wei, David Brooks
The architecture is enhanced by a series of dynamic activation optimizations that enable compact storage, ensure no energy is wasted computing null operations, and maintain high MAC utilization for highly parallel accelerator designs.
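A software analogue of the zero-skipping idea (illustrative only, not the accelerator's hardware): compact away null activations before the multiply-accumulate loop, so no work is spent on zeros.

```python
import numpy as np

def zero_skipping_mac(activations, weights):
    """Dot product that gathers only non-zero activations first,
    mimicking an accelerator that skips null operations."""
    nz = np.flatnonzero(activations)        # compact index list
    return float(np.dot(activations[nz], weights[nz]))

a = np.array([0.0, 1.5, 0.0, 0.0, -2.0, 0.0])  # sparse activations
w = np.random.randn(6)
assert np.isclose(zero_skipping_mac(a, w), np.dot(a, w))
```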
7 code implementations • 6 Jun 2019 • Udit Gupta, Carole-Jean Wu, Xiaodong Wang, Maxim Naumov, Brandon Reagen, David Brooks, Bradford Cottel, Kim Hazelwood, Bill Jia, Hsien-Hsin S. Lee, Andrey Malevich, Dheevatsa Mudigere, Mikhail Smelyanskiy, Liang Xiong, Xuan Zhang
The widespread application of deep learning has changed the landscape of computation in the data center.
2 code implementations • 13 Nov 2017 • Brandon Reagen, Udit Gupta, Robert Adolf, Michael M. Mitzenmacher, Alexander M. Rush, Gu-Yeon Wei, David Brooks
This results in up to a 1.51x improvement over the state-of-the-art.
1 code implementation • 23 Aug 2016 • Robert Adolf, Saketh Rama, Brandon Reagen, Gu-Yeon Wei, David Brooks
Fathom has been released online, and this paper focuses on understanding the fundamental performance characteristics of each model.