Search Results for author: Azalia Mirhoseini

Found 30 papers, 12 papers with code

Archon: An Architecture Search Framework for Inference-Time Techniques

1 code implementation23 Sep 2024 Jon Saad-Falcon, Adrian Gamarra Lafuente, Shlok Natarajan, Nahum Maru, Hristo Todorov, Etash Guha, E. Kelly Buchanan, Mayee Chen, Neel Guha, Christopher Ré, Azalia Mirhoseini

Additionally, efficiently and automatically searching the space of model choices, inference-time techniques, and their compositions is challenging due to the large design space.

Hyperparameter Optimization Instruction Following +2

Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

no code implementations31 Jul 2024 Bradley Brown, Jordan Juravsky, Ryan Ehrlich, Ronald Clark, Quoc V. Le, Christopher Ré, Azalia Mirhoseini

When we apply repeated sampling to SWE-bench Lite, the fraction of issues solved with DeepSeek-V2-Coder-Instruct increases from 15. 9% with one sample to 56% with 250 samples, outperforming the single-attempt state-of-the-art of 43% which uses more capable frontier models.

GSM8K Math

CHESS: Contextual Harnessing for Efficient SQL Synthesis

1 code implementation27 May 2024 Shayan Talaei, Mohammadreza Pourreza, Yu-Chen Chang, Azalia Mirhoseini, Amin Saberi

Additionally, we have developed an adaptive schema pruning technique that adjusts based on the complexity of the problem and the model's context size.

Retrieval SQL Synthesis +1

CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models

2 code implementations12 Apr 2024 Je-Yong Lee, DongHyun Lee, Genghan Zhang, Mo Tiwari, Azalia Mirhoseini

We demonstrate that CATS can be applied to various base models, including Mistral-7B and Llama2-7B, and outperforms existing sparsification techniques in downstream task performance.

Hydragen: High-Throughput LLM Inference with Shared Prefixes

1 code implementation7 Feb 2024 Jordan Juravsky, Bradley Brown, Ryan Ehrlich, Daniel Y. Fu, Christopher Ré, Azalia Mirhoseini

Decoding in this large-batch setting can be bottlenecked by the attention operation, which reads large key-value (KV) caches from memory and computes inefficient matrix-vector products for every sequence in the batch.

16k Chatbot

Representing Long-Range Context for Graph Neural Networks with Global Attention

1 code implementation NeurIPS 2021 Zhanghao Wu, Paras Jain, Matthew A. Wright, Azalia Mirhoseini, Joseph E. Gonzalez, Ion Stoica

Inspired by recent computer vision results that find position-invariant attention performant in learning long-range relationships, our method, which we call GraphTrans, applies a permutation-invariant Transformer module after a standard GNN module.

Graph Classification Graph Embedding

A Transferable Approach for Partitioning Machine Learning Models on Multi-Chip-Modules

no code implementations7 Dec 2021 Xinfeng Xie, Prakash Prabhu, Ulysse Beaugnon, Phitchaya Mangpo Phothilimthana, Sudip Roy, Azalia Mirhoseini, Eugene Brevdo, James Laudon, Yanqi Zhou

Partitioning ML graphs for MCMs is particularly hard as the search space grows exponentially with the number of chiplets available and the number of nodes in the neural network.

BIG-bench Machine Learning Reinforcement Learning (RL) +1

A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators

no code implementations26 May 2021 Dan Zhang, Safeen Huda, Ebrahim Songhori, Kartik Prabhu, Quoc Le, Anna Goldie, Azalia Mirhoseini

The rapidly-changing deep learning landscape presents a unique opportunity for building inference accelerators optimized for specific datacenter-scale workloads.

Optical Character Recognition (OCR) Scheduling

BAFFLE: TOWARDS RESOLVING FEDERATED LEARNING’S DILEMMA - THWARTING BACKDOOR AND INFERENCE ATTACKS

no code implementations1 Jan 2021 Thien Duc Nguyen, Phillip Rieger, Hossein Yalame, Helen Möllering, Hossein Fereidooni, Samuel Marchal, Markus Miettinen, Azalia Mirhoseini, Ahmad-Reza Sadeghi, Thomas Schneider, Shaza Zeitouni

Recently, federated learning (FL) has been subject to both security and privacy attacks posing a dilemmatic challenge on the underlying algorithmic designs: On the one hand, FL is shown to be vulnerable to backdoor attacks that stealthily manipulate the global model output using malicious model updates, and on the other hand, FL is shown vulnerable to inference attacks by a malicious aggregator inferring information about clients’ data from their model updates.

Federated Learning Image Classification

Placement Optimization with Deep Reinforcement Learning

no code implementations18 Mar 2020 Anna Goldie, Azalia Mirhoseini

Placement Optimization is an important problem in systems and chip design, which consists of mapping the nodes of a graph onto a limited set of resources to optimize for an objective, subject to constraints.

reinforcement-learning Reinforcement Learning +1

Generalized Clustering by Learning to Optimize Expected Normalized Cuts

no code implementations16 Oct 2019 Azade Nazi, Will Hang, Anna Goldie, Sujith Ravi, Azalia Mirhoseini

We introduce a novel end-to-end approach for learning to cluster in the absence of labeled examples.

Clustering

GDP: Generalized Device Placement for Dataflow Graphs

no code implementations28 Sep 2019 Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Wong, Peter C. Ma, Qiumin Xu, Ming Zhong, Hanxiao Liu, Anna Goldie, Azalia Mirhoseini, James Laudon

Runtime and scalability of large neural networks can be significantly affected by the placement of operations in their dataflow graphs on suitable devices.

Graph Neural Network

Policy Optimization by Local Improvement through Search

no code implementations25 Sep 2019 Jialin Song, Joe Wenjie Jiang, Amir Yazdanbakhsh, Ebrahim Songhori, Anna Goldie, Navdeep Jaitly, Azalia Mirhoseini

On the other end of the spectrum, approaches rooted in Policy Iteration, such as Dual Policy Iteration do not choose next step actions based on an expert, but instead use planning or search over the policy to choose an action distribution to train towards.

Imitation Learning reinforcement-learning +1

Reinforcement Learning Driven Heuristic Optimization

no code implementations16 Jun 2019 Qingpeng Cai, Will Hang, Azalia Mirhoseini, George Tucker, Jingtao Wang, Wei Wei

In this paper, we introduce a novel framework to generate better initial solutions for heuristic algorithms using reinforcement learning (RL), named RLHO.

Combinatorial Optimization reinforcement-learning +2

GAP: Generalizable Approximate Graph Partitioning Framework

1 code implementation2 Mar 2019 Azade Nazi, Will Hang, Anna Goldie, Sujith Ravi, Azalia Mirhoseini

Graph partitioning is the problem of dividing the nodes of a graph into balanced partitions while minimizing the edge cut across the partitions.

Clustering graph partitioning

A Hierarchical Model for Device Placement

no code implementations ICLR 2018 Azalia Mirhoseini, Anna Goldie, Hieu Pham, Benoit Steiner, Quoc V. Le, Jeff Dean

We introduce a hierarchical model for efficient placement of computational graphs onto hardware devices, especially in heterogeneous environments with a mixture of CPUs, GPUs, and other computational devices.

Machine Translation Reinforcement Learning +2

Device Placement Optimization with Reinforcement Learning

1 code implementation ICML 2017 Azalia Mirhoseini, Hieu Pham, Quoc V. Le, Benoit Steiner, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Mohammad Norouzi, Samy Bengio, Jeff Dean

Key to our method is the use of a sequence-to-sequence model to predict which subsets of operations in a TensorFlow graph should run on which of the available devices.

Language Modelling Machine Translation +4

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

4 code implementations23 Jan 2017 Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, Jeff Dean

In this work, we address these challenges and finally realize the promise of conditional computation, achieving greater than 1000x improvements in model capacity with only minor losses in computational efficiency on modern GPU clusters.

Computational Efficiency Language Modelling +2

oASIS: Adaptive Column Sampling for Kernel Matrix Approximation

no code implementations19 May 2015 Raajen Patel, Thomas A. Goldstein, Eva L. Dyer, Azalia Mirhoseini, Richard G. Baraniuk

Kernel matrices (e. g. Gram or similarity matrices) are essential for many state-of-the-art approaches to classification, clustering, and dimensionality reduction.

Clustering Dimensionality Reduction +1

RankMap: A Platform-Aware Framework for Distributed Learning from Dense Datasets

1 code implementation27 Mar 2015 Azalia Mirhoseini, Eva L. Dyer, Ebrahim. M. Songhori, Richard G. Baraniuk, Farinaz Koushanfar

This paper introduces RankMap, a platform-aware end-to-end framework for efficient execution of a broad class of iterative learning algorithms for massive and dense datasets.

Distributed Computing Scheduling

Cannot find the paper you are looking for? You can Submit a new open access paper.