1 code implementation • 27 Oct 2024 • Benjamin F. Spector, Simran Arora, Aaryan Singhal, Daniel Y. Fu, Christopher Ré
We match CuBLAS and FlashAttention-3 on GEMM and attention inference performance, and outperform the strongest baselines by 10-40% on attention backwards, 8x on state space models, and 14x on linear attention.
no code implementations • 12 Feb 2024 • Jon Saad-Falcon, Daniel Y. Fu, Simran Arora, Neel Guha, Christopher Ré
Retrieval pipelines, an integral component of many machine learning systems, perform poorly in domains where documents are long (e.g., 10K tokens or more) and where identifying the relevant document requires synthesizing information across the entire text.
1 code implementation • 7 Feb 2024 • Jordan Juravsky, Bradley Brown, Ryan Ehrlich, Daniel Y. Fu, Christopher Ré, Azalia Mirhoseini
Decoding in this large-batch setting can be bottlenecked by the attention operation, which reads large key-value (KV) caches from memory and computes inefficient matrix-vector products for every sequence in the batch.
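For intuition, here is a minimal NumPy sketch of what one attention decode step looks like per sequence: the query is a single vector, so the work reduces to two matrix-vector products over the full KV cache. The names, shapes, and plain softmax below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def decode_step_attention(q, k_cache, v_cache):
    # q: (d,) query for the newest token; k_cache, v_cache: (seq_len, d)
    d = q.shape[-1]
    scores = k_cache @ q / np.sqrt(d)   # matrix-vector product over the whole cache
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return v_cache.T @ probs            # second matrix-vector product

# In a large batch, this memory-bound work is repeated once per sequence.
batch, seq_len, d = 64, 4096, 128
outputs = [
    decode_step_attention(np.random.randn(d),
                          np.random.randn(seq_len, d),
                          np.random.randn(seq_len, d))
    for _ in range(batch)
]
```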
1 code implementation • 10 Nov 2023 • Daniel Y. Fu, Hermann Kumbong, Eric Nguyen, Christopher Ré
FlashFFTConv uses a matrix decomposition that computes the FFT using matrix multiply units and enables kernel fusion for long sequences, reducing I/O.
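As background on the kind of decomposition the abstract refers to, the sketch below computes a length-N DFT as two batches of small matrix multiplies via the classic Cooley-Tukey (four-step) factorization. It is a generic illustration of "FFT as matrix multiplies", not FlashFFTConv's fused kernel, and the factor sizes are arbitrary choices.

```python
import numpy as np

def fft_as_matmuls(x, n1, n2):
    """Length-(n1*n2) DFT expressed as two sets of small matrix multiplies."""
    n = n1 * n2
    # DFT matrices for the two factors, plus the twiddle correction W_N^{k1*n2}
    f1 = np.exp(-2j * np.pi * np.outer(np.arange(n1), np.arange(n1)) / n1)
    f2 = np.exp(-2j * np.pi * np.outer(np.arange(n2), np.arange(n2)) / n2)
    tw = np.exp(-2j * np.pi * np.outer(np.arange(n1), np.arange(n2)) / n)
    a = x.reshape(n1, n2)   # view the signal as an n1 x n2 matrix
    b = f1 @ a              # length-n1 DFTs down the columns (a matmul)
    c = b * tw              # twiddle factors
    d = c @ f2              # length-n2 DFTs along the rows (a matmul)
    return d.T.reshape(n)   # re-index to the usual output ordering

x = np.random.randn(4096)
assert np.allclose(fft_as_matmuls(x, 64, 64), np.fft.fft(x))
```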
1 code implementation • NeurIPS 2023 • Daniel Y. Fu, Simran Arora, Jessica Grogan, Isys Johnson, Sabri Eyuboglu, Armin W. Thomas, Benjamin Spector, Michael Poli, Atri Rudra, Christopher Ré
We ask: are there performant architectures that can scale sub-quadratically along sequence length and model dimension?
1 code implementation • 13 Mar 2023 • Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Daniel Y. Fu, Zhiqiang Xie, Beidi Chen, Clark Barrett, Joseph E. Gonzalez, Percy Liang, Christopher Ré, Ion Stoica, Ce Zhang
As a result, when running OPT-175B on a single 16GB GPU, FlexGen achieves significantly higher throughput compared to state-of-the-art offloading systems, reaching a generation throughput of 1 token/s for the first time with an effective batch size of 144.
6 code implementations • 21 Feb 2023 • Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré
Recent advances in deep learning have relied heavily on the use of large Transformers due to their ability to learn at scale.
Ranked #37 on Language Modelling on WikiText-103
1 code implementation • 13 Feb 2023 • Daniel Y. Fu, Elliot L. Epstein, Eric Nguyen, Armin W. Thomas, Michael Zhang, Tri Dao, Atri Rudra, Christopher Ré
We find that a key requirement to achieving high performance is keeping the convolution kernels smooth.
3 code implementations • 28 Dec 2022 • Daniel Y. Fu, Tri Dao, Khaled K. Saab, Armin W. Thomas, Atri Rudra, Christopher Ré
First, we use synthetic language modeling tasks to understand the gap between SSMs and attention.
Ranked #2 on Language Modelling on The Pile (Test perplexity metric)
no code implementations • 10 Jun 2022 • Trenton Chang, Daniel Y. Fu
In a simulation study, we investigate (1) what artifacts networking corruptions cause, (2) how such artifacts affect ML models, and (3) whether standard robustness methods can mitigate their negative effects.
11 code implementations • 27 May 2022 • Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
We also extend FlashAttention to block-sparse attention, yielding an approximate attention algorithm that is faster than any existing approximate attention method.
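For reference, block-sparse attention has the following (non-fused) semantics: whole blocks of the score matrix are dropped before the softmax, so only the kept blocks ever need to be computed. This NumPy sketch only illustrates the math; the speed reported in the paper comes from an I/O-aware fused kernel, and the mask layout below is an assumption.

```python
import numpy as np

def block_sparse_attention_ref(q, k, v, block_mask, block_size):
    # q, k, v: (n, d); block_mask: (n // block_size, n // block_size), boolean
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # Expand the block mask to element level; masked blocks are excluded from softmax.
    mask = np.kron(block_mask.astype(int),
                   np.ones((block_size, block_size), dtype=int)).astype(bool)
    scores = np.where(mask, scores, -np.inf)
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v

# Keep the diagonal blocks so every query attends to at least its own block.
n, d, bs = 256, 64, 64
block_mask = np.eye(n // bs, dtype=bool)
out = block_sparse_attention_ref(np.random.randn(n, d), np.random.randn(n, d),
                                 np.random.randn(n, d), block_mask, bs)
```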
1 code implementation • Findings (ACL) 2022 • Megan Leszczynski, Daniel Y. Fu, Mayee F. Chen, Christopher Ré
Entity retrieval, retrieving information about entity mentions in a query, is a key step in open-domain tasks such as question answering or fact checking.
1 code implementation • 15 Apr 2022 • Mayee F. Chen, Daniel Y. Fu, Avanika Narayan, Michael Zhang, Zhao Song, Kayvon Fatahalian, Christopher Ré
We first prove that adding a weighted class-conditional InfoNCE loss to SupCon controls the degree of spread.
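The sketch below shows one way such an objective can be assembled: a SupCon term plus a weighted class-conditional InfoNCE term. The specific form of the class-conditional term (each anchor's augmented view as the positive, other same-class samples in the denominator) and the weight alpha are illustrative assumptions here, not the paper's exact definition.

```python
import numpy as np

def supcon_plus_spread(z, y, alpha=0.5, tau=0.1):
    """Sketch: SupCon + alpha * class-conditional InfoNCE (assumed form)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize embeddings
    sim = z @ z.T / tau
    n = len(z)
    eye = np.eye(n, dtype=bool)
    same_class = (y[:, None] == y[None, :]) & ~eye

    # SupCon term: positives are all other same-class samples,
    # the denominator runs over every other sample in the batch.
    log_den_all = np.log(np.exp(np.where(~eye, sim, -np.inf)).sum(axis=1))
    supcon = -np.mean((sim - log_den_all[:, None])[same_class])

    # Class-conditional InfoNCE term (assumption): the positive is the anchor's own
    # augmented view and the denominator is restricted to same-class samples, which
    # keeps a class from collapsing to a single point, i.e. it controls spread.
    view = np.arange(n) ^ 1                            # views assumed paired (2k, 2k+1)
    den_mask = same_class.copy()
    den_mask[np.arange(n), view] = True
    log_den_cls = np.log(np.exp(np.where(den_mask, sim, -np.inf)).sum(axis=1))
    cc_infonce = -np.mean(sim[np.arange(n), view] - log_den_cls)

    return supcon + alpha * cc_infonce

# Toy usage: 32 augmented pairs, 4 classes (each pair shares a label).
rng = np.random.default_rng(0)
y = np.repeat(rng.integers(0, 4, size=32), 2)
z = rng.normal(size=(64, 16))
print(supcon_plus_spread(z, y))
```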
1 code implementation • 24 Mar 2022 • Mayee F. Chen, Daniel Y. Fu, Dyah Adila, Michael Zhang, Frederic Sala, Kayvon Fatahalian, Christopher Ré
Despite the black-box nature of foundation models, we prove results characterizing how our approach improves performance and show that lift scales with the smoothness of label distributions in embedding space.
1 code implementation • 26 Jun 2020 • Mayee F. Chen, Daniel Y. Fu, Frederic Sala, Sen Wu, Ravi Teja Mullapudi, Fait Poms, Kayvon Fatahalian, Christopher Ré
Our goal is to enable machine learning systems to be trained interactively.
1 code implementation • ICML 2020 • Daniel Y. Fu, Mayee F. Chen, Frederic Sala, Sarah M. Hooper, Kayvon Fatahalian, Christopher Ré
In this work, we show that, for a class of latent variable models highly applicable to weak supervision, we can find a closed-form solution to model parameters, obviating the need for iterative solutions like stochastic gradient descent (SGD).
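To make the closed-form idea concrete, here is a minimal triplet-style sketch under simplifying assumptions (three binary +/-1 labeling functions, conditionally independent given the label, better than random); the estimator and model class in the paper are more general.

```python
import numpy as np

def triplet_accuracies(L):
    """Closed-form estimate of accuracies a_i = E[lambda_i * y] from pairwise moments.
    Under conditional independence, E[lambda_i * lambda_j] = a_i * a_j, so each |a_i|
    follows from second moments alone, with no SGD."""
    m = (L.T @ L) / len(L)                      # empirical second moments
    a = np.zeros(3)
    for i, j, k in [(0, 1, 2), (1, 0, 2), (2, 0, 1)]:
        a[i] = np.sqrt(np.abs(m[i, j] * m[i, k] / m[j, k]))
    return a                                    # signs resolved by assuming a_i > 0

# Toy check: simulate three sources with known accuracies.
rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=100_000)
true_a = np.array([0.9, 0.7, 0.6])
L = np.stack([np.where(rng.random(len(y)) < (1 + a) / 2, y, -y) for a in true_a], axis=1)
print(triplet_accuracies(L))                    # approximately [0.9, 0.7, 0.6]
```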
no code implementations • NeurIPS 2019 • Frederic Sala, Paroma Varma, Jason Fries, Daniel Y. Fu, Shiori Sagawa, Saelig Khattar, Ashwini Ramamoorthy, Ke Xiao, Kayvon Fatahalian, James Priest, Christopher Ré
Multi-resolution sources exacerbate this challenge due to complex correlations and sample complexity that scales in the length of the sequence.
1 code implementation • 7 Oct 2019 • Daniel Y. Fu, Will Crichton, James Hong, Xinwei Yao, Haotian Zhang, Anh Truong, Avanika Narayan, Maneesh Agrawala, Christopher Ré, Kayvon Fatahalian
Many real-world video analysis applications require the ability to identify domain-specific events in video, such as interviews and commercials in TV news broadcasts, or action sequences in film.
no code implementations • 23 Apr 2018 • Daniel Y. Fu, Emily S. Wang, Peter M. Krafft, Barbara J. Grosz
In the interest of learning how to control flocking behavior, recent work in the multiagent systems literature has explored the use of influencing agents for guiding flocking agents to face a target direction.