Search Results for author: Matei Zaharia

Found 58 papers, 27 papers with code

Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP

2 code implementations • 28 Dec 2022 • Omar Khattab, Keshav Santhanam, Xiang Lisa Li, David Hall, Percy Liang, Christopher Potts, Matei Zaharia

Retrieval-augmented in-context learning has emerged as a powerful approach for addressing knowledge-intensive tasks using frozen language models (LM) and retrieval models (RM).

In-Context Learning Language Modelling +2
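
The retrieve-then-read pattern behind this line of work can be sketched in a few lines of Python; `retrieve` and `generate` below are hypothetical stand-ins for a frozen retrieval model and language model, not the DSP API:

    def retrieve(query, k=3):
        # Hypothetical stand-in for a frozen retrieval model (e.g., ColBERT):
        # return the k passages most relevant to the query.
        corpus = ["Passage about topic A.", "Passage about topic B."]
        return corpus[:k]

    def generate(prompt):
        # Hypothetical stand-in for a frozen language model call.
        return "an answer grounded in the retrieved context"

    def retrieval_augmented_qa(question):
        passages = retrieve(question)
        context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        return generate(prompt)

    print(retrieval_augmented_qa("What is topic A?"))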

DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines

1 code implementation • 20 Dec 2023 • Arnav Singhvi, Manish Shetty, Shangyin Tan, Christopher Potts, Koushik Sen, Matei Zaharia, Omar Khattab

We integrate our constructs into the recent DSPy programming model for LMs, and present new strategies that allow DSPy to compile programs with LM Assertions into more reliable and accurate systems.

Language Modelling Prompt Engineering +2
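
A minimal sketch of the assertion-with-retry idea, with the failure message folded back into the prompt as a correction hint; this illustrates the concept only and does not use DSPy's actual interface:

    def generate_with_assertion(generate, prompt, check, max_retries=2):
        # Re-run the LM call until the output passes the constraint,
        # feeding the failure message back as a correction hint.
        feedback = ""
        for _ in range(max_retries + 1):
            output = generate(prompt + feedback)
            ok, message = check(output)
            if ok:
                return output
            feedback = f"\nThe previous answer violated a constraint ({message}); fix it."
        return output  # best effort after exhausting retries

    # Example constraint: the answer must be at most 20 words.
    check = lambda out: (len(out.split()) <= 20, "answer too long")
    print(generate_with_assertion(lambda p: "a short answer", "Q: ...", check))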

RAFT: Adapting Language Model to Domain Specific RAG

1 code implementation • 15 Mar 2024 • Tianjun Zhang, Shishir G. Patil, Naman Jain, Sheng Shen, Matei Zaharia, Ion Stoica, Joseph E. Gonzalez

In this paper, we present Retrieval Augmented FineTuning (RAFT), a training recipe that improves a model's ability to answer questions in an "open-book", in-domain setting.

Language Modelling

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM

1 code implementation • 9 Apr 2021 • Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Anand Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia

In this paper, we show how different types of parallelism methods (tensor, pipeline, and data parallelism) can be composed to scale to thousands of GPUs and models with trillions of parameters.

Language Modelling
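
Composing the three parallelism dimensions amounts to factoring the GPU count; a toy rank-mapping sketch (the tensor-fastest layout here is an illustrative convention, not Megatron-LM's exact rank ordering):

    def parallel_coordinates(rank, tp, pp, dp):
        # Map a global GPU rank to (tensor, pipeline, data) coordinates,
        # assuming a tensor-fastest layout.
        assert 0 <= rank < tp * pp * dp
        t = rank % tp
        p = (rank // tp) % pp
        d = rank // (tp * pp)
        return t, p, d

    # e.g., 3072 GPUs = 8-way tensor x 12-way pipeline x 32-way data
    print(parallel_coordinates(rank=100, tp=8, pp=12, dp=32))  # (4, 0, 1)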

World Model on Million-Length Video And Language With Blockwise RingAttention

1 code implementation • 13 Feb 2024 • Hao Liu, Wilson Yan, Matei Zaharia, Pieter Abbeel

To address these challenges, we curate a large dataset of diverse videos and books, utilize the Blockwise RingAttention technique to scalably train on long sequences, and gradually increase context size from 4K to 1M tokens.

4k Video Understanding
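
The gradual context growth can be expressed as a simple curriculum; the doubling rule below is an illustrative assumption rather than the paper's exact stage sizes:

    def context_schedule(start=4_096, end=1_048_576):
        # Double the training context length stage by stage, 4K -> 1M.
        sizes = []
        n = start
        while n <= end:
            sizes.append(n)
            n *= 2
        return sizes

    print(context_schedule())  # [4096, 8192, ..., 1048576]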

ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT

8 code implementations • 27 Apr 2020 • Omar Khattab, Matei Zaharia

ColBERT introduces a late interaction architecture that independently encodes the query and the document using BERT and then employs a cheap yet powerful interaction step that models their fine-grained similarity.

Document Ranking Information Retrieval +3
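
The late interaction step is MaxSim: each query token embedding takes its best match among the document token embeddings, and those per-token maxima are summed into the relevance score. A NumPy sketch:

    import numpy as np

    def late_interaction_score(Q, D):
        # Q: (n_query_tokens, dim), D: (n_doc_tokens, dim), L2-normalized.
        # Each query token takes its best-matching document token; the
        # relevance score is the sum of those per-token maxima (MaxSim).
        sim = Q @ D.T
        return sim.max(axis=1).sum()

    rng = np.random.default_rng(0)
    Q = rng.normal(size=(32, 128)); Q /= np.linalg.norm(Q, axis=1, keepdims=True)
    D = rng.normal(size=(180, 128)); D /= np.linalg.norm(D, axis=1, keepdims=True)
    print(late_interaction_score(Q, D))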

Relevance-guided Supervision for OpenQA with ColBERT

5 code implementations • 1 Jul 2020 • Omar Khattab, Christopher Potts, Matei Zaharia

In much recent work, the retriever is a learned component that uses coarse-grained vector representations of questions and passages.

Natural Questions Open-Domain Question Answering +2

Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval

2 code implementations • NeurIPS 2021 • Omar Khattab, Christopher Potts, Matei Zaharia

Multi-hop reasoning (i.e., reasoning across two or more documents) is a key ingredient for NLP models that leverage large corpora to exhibit broad knowledge.

Claim Verification Question Answering +1

PLAID: An Efficient Engine for Late Interaction Retrieval

1 code implementation • 19 May 2022 • Keshav Santhanam, Omar Khattab, Christopher Potts, Matei Zaharia

PLAID uses centroid interaction as well as centroid pruning, a mechanism for sparsifying the bag of centroids, within a highly-optimized engine to reduce late interaction search latency by up to 7$\times$ on a GPU and 45$\times$ on a CPU against vanilla ColBERTv2, while continuing to deliver state-of-the-art retrieval quality.

Information Retrieval Retrieval
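
A simplified sketch of centroid pruning: score the query against the cluster centroids, keep only the closest clusters, and retain candidates that touch a surviving cluster. PLAID itself prunes per query token and also ranks candidates by centroid interaction; this collapses both into one step:

    import numpy as np

    def prune_by_centroids(query_vec, centroids, doc_centroid_ids, keep=4):
        # Score the query against every cluster centroid, keep the
        # closest `keep` clusters, and retain documents whose token
        # embeddings map into at least one surviving cluster.
        scores = centroids @ query_vec
        surviving = set(np.argsort(-scores)[:keep].tolist())
        return [i for i, cids in enumerate(doc_centroid_ids)
                if surviving & set(cids)]

    rng = np.random.default_rng(0)
    centroids = rng.normal(size=(16, 64))
    query = rng.normal(size=64)
    docs = [rng.integers(0, 16, size=8).tolist() for _ in range(5)]
    print(prune_by_centroids(query, centroids, docs))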

How is ChatGPT's behavior changing over time?

4 code implementations • 18 Jul 2023 • Lingjiao Chen, Matei Zaharia, James Zou

We find that the performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time.

Code Generation Language Modelling +3

MegaBlocks: Efficient Sparse Training with Mixture-of-Experts

2 code implementations • 29 Nov 2022 • Trevor Gale, Deepak Narayanan, Cliff Young, Matei Zaharia

We present MegaBlocks, a system for efficient Mixture-of-Experts (MoE) training on GPUs.

On the Opportunities and Risks of Foundation Models

2 code implementations • 16 Aug 2021 • Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, Li Fei-Fei, Chelsea Finn, Trevor Gale, Lauren Gillespie, Karan Goel, Noah Goodman, Shelby Grossman, Neel Guha, Tatsunori Hashimoto, Peter Henderson, John Hewitt, Daniel E. Ho, Jenny Hong, Kyle Hsu, Jing Huang, Thomas Icard, Saahil Jain, Dan Jurafsky, Pratyusha Kalluri, Siddharth Karamcheti, Geoff Keeling, Fereshte Khani, Omar Khattab, Pang Wei Koh, Mark Krass, Ranjay Krishna, Rohith Kuditipudi, Ananya Kumar, Faisal Ladhak, Mina Lee, Tony Lee, Jure Leskovec, Isabelle Levent, Xiang Lisa Li, Xuechen Li, Tengyu Ma, Ali Malik, Christopher D. Manning, Suvir Mirchandani, Eric Mitchell, Zanele Munyikwa, Suraj Nair, Avanika Narayan, Deepak Narayanan, Ben Newman, Allen Nie, Juan Carlos Niebles, Hamed Nilforoshan, Julian Nyarko, Giray Ogut, Laurel Orr, Isabel Papadimitriou, Joon Sung Park, Chris Piech, Eva Portelance, Christopher Potts, Aditi Raghunathan, Rob Reich, Hongyu Ren, Frieda Rong, Yusuf Roohani, Camilo Ruiz, Jack Ryan, Christopher Ré, Dorsa Sadigh, Shiori Sagawa, Keshav Santhanam, Andy Shih, Krishnan Srinivasan, Alex Tamkin, Rohan Taori, Armin W. Thomas, Florian Tramèr, Rose E. Wang, William Wang, Bohan Wu, Jiajun Wu, Yuhuai Wu, Sang Michael Xie, Michihiro Yasunaga, Jiaxuan You, Matei Zaharia, Michael Zhang, Tianyi Zhang, Xikun Zhang, Yuhui Zhang, Lucia Zheng, Kaitlyn Zhou, Percy Liang

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks.

Transfer Learning

Ring Attention with Blockwise Transformers for Near-Infinite Context

3 code implementations • 3 Oct 2023 • Hao Liu, Matei Zaharia, Pieter Abbeel

Transformers have emerged as the architecture of choice for many state-of-the-art AI models, showcasing exceptional performance across a wide range of AI applications.

Language Modelling

NoScope: Optimizing Neural Network Queries over Video at Scale

1 code implementation • 7 Mar 2017 • Daniel Kang, John Emmons, Firas Abuzaid, Peter Bailis, Matei Zaharia

Given a target video, object to detect, and reference neural network, NoScope automatically searches for and trains a sequence, or cascade, of models that preserves the accuracy of the reference network but is specialized to the target video and is therefore far less computationally expensive.

Binary Classification
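
The cascade idea in miniature: let a cheap specialized model answer the easy frames and send only the uncertain ones to the reference network. The thresholds and stub models below are illustrative:

    def cascaded_detect(frame, cheap_model, reference_model, lo=0.2, hi=0.8):
        # The specialized model answers when it is confident either way;
        # only ambiguous frames pay for the reference network.
        p = cheap_model(frame)
        if p >= hi:
            return True
        if p <= lo:
            return False
        return reference_model(frame)

    cheap = lambda frame: 0.5          # pretend score: ambiguous frame
    reference = lambda frame: True     # pretend full-accuracy detector
    print(cascaded_detect("frame-0", cheap, reference))  # falls back -> True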

Memory-Efficient Pipeline-Parallel DNN Training

1 code implementation • 16 Jun 2020 • Deepak Narayanan, Amar Phanishayee, Kaiyu Shi, Xie Chen, Matei Zaharia

Many state-of-the-art ML results have been obtained by scaling up the number of parameters in existing models.

ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems

1 code implementation • 16 Nov 2023 • Jon Saad-Falcon, Omar Khattab, Christopher Potts, Matei Zaharia

Evaluating retrieval-augmented generation (RAG) systems traditionally relies on hand annotations for input queries, passages to retrieve, and responses to generate.

Retrieval

Sparse GPU Kernels for Deep Learning

1 code implementation • 18 Jun 2020 • Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen

In this work, we study sparse matrices from deep learning applications and identify favorable properties that can be exploited to accelerate computation.

Selection via Proxy: Efficient Data Selection for Deep Learning

1 code implementation • ICLR 2020 • Cody Coleman, Christopher Yeh, Stephen Mussmann, Baharan Mirzasoleiman, Peter Bailis, Percy Liang, Jure Leskovec, Matei Zaharia

By removing hidden layers from the target model, using smaller architectures, and training for fewer epochs, we create proxies that are an order of magnitude faster to train.

Active Learning Computational Efficiency
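
A sketch of the selection step, assuming the proxy's softmax outputs are in hand and using least-confidence uncertainty as the utility measure (one of several criteria the paper considers):

    import numpy as np

    def select_via_proxy(proxy_probs, k):
        # proxy_probs: (n_points, n_classes) softmax outputs from the
        # small, cheaply trained proxy. Keep the k least-confident points.
        confidence = proxy_probs.max(axis=1)
        return np.argsort(confidence)[:k]

    probs = np.array([[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]])
    print(select_via_proxy(probs, k=2))  # -> [1 2], the two most uncertain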

To Index or Not to Index: Optimizing Exact Maximum Inner Product Search

1 code implementation • 5 Jun 2017 • Firas Abuzaid, Geet Sethi, Peter Bailis, Matei Zaharia

The brute-force approach to solving exact MIPS is computationally expensive, thus spurring recent development of novel indexes and pruning techniques for this task.

Recommendation Systems
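
For reference, the brute-force baseline that indexes and pruning aim to beat is a single matrix-vector product followed by a top-k:

    import numpy as np

    def exact_mips(X, q, k=5):
        # One matrix-vector product over all n items, then a top-k:
        # O(n * d) per query, the cost that indexes try to avoid.
        scores = X @ q
        topk = np.argpartition(-scores, k)[:k]
        return topk[np.argsort(-scores[topk])]

    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 64))   # item embeddings
    q = rng.normal(size=64)             # query embedding
    print(exact_mips(X, q))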

LIT: Learned Intermediate Representation Training for Model Compression

1 code implementation • 4 Sep 2019 • Animesh Koratana, Daniel Kang, Peter Bailis, Matei Zaharia

In this work, we introduce Learned Intermediate representation Training (LIT), a novel model compression technique that outperforms a range of recent model compression techniques by leveraging the highly repetitive structure of modern DNNs (e.g., ResNet).

Image Classification Model Compression +2
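
The core idea, sketched with PyTorch: feed a block's input through both the teacher block and the smaller student block and penalize the gap between their outputs (which blocks are matched and how is simplified here; the linear layers are stand-ins):

    import torch
    import torch.nn.functional as F

    def lit_block_loss(student_block, teacher_block, block_input):
        # Drive the same input through the teacher block and the smaller
        # student block; penalize the gap between their outputs.
        with torch.no_grad():
            target = teacher_block(block_input)
        return F.mse_loss(student_block(block_input), target)

    teacher = torch.nn.Linear(64, 64)   # stand-in for a teacher block
    student = torch.nn.Linear(64, 64)   # stand-in for a smaller student block
    x = torch.randn(8, 64)
    print(lit_block_loss(student, teacher, x).item())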

Model Assertions for Monitoring and Improving ML Models

1 code implementation • 3 Mar 2020 • Daniel Kang, Deepti Raghavan, Peter Bailis, Matei Zaharia

We propose methods of using model assertions at all stages of ML system deployment, including runtime monitoring, validating labels, and continuously improving ML models.

Active Learning
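
A concrete assertion in the paper's spirit: in video, an object should not appear, vanish, and reappear within a few frames; violations are flagged for monitoring or relabeling rather than raising an exception. The flicker rule below is one illustrative assertion among the many kinds the paper discusses:

    def assert_no_flicker(detections, window=3):
        # detections: per-frame booleans for one tracked object. Flag
        # frames where the object appears, vanishes, and reappears.
        flagged = []
        for i in range(len(detections) - window + 1):
            a, b, c = detections[i:i + window]
            if a and not b and c:
                flagged.append(i + 1)   # index of the suspect frame
        return flagged

    print(assert_no_flicker([True, False, True, True]))  # -> [1]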

Express: Lowering the Cost of Metadata-hiding Communication with Cryptographic Privacy

1 code implementation • 20 Nov 2019 • Saba Eskandarian, Henry Corrigan-Gibbs, Matei Zaharia, Dan Boneh

Existing systems for metadata-hiding messaging that provide cryptographic privacy properties have either high communication costs, high computation costs, or both.

Cryptography and Security

HAPI: A Large-scale Longitudinal Dataset of Commercial ML API Predictions

1 code implementation • 18 Sep 2022 • Lingjiao Chen, Zhihua Jin, Sabri Eyuboglu, Christopher Ré, Matei Zaharia, James Zou

HAPI is the first large-scale dataset of ML API usages and is a unique resource for studying ML-as-a-service (MLaaS).

Object Detection +4

Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark

no code implementations • 4 Jun 2018 • Cody Coleman, Daniel Kang, Deepak Narayanan, Luigi Nardi, Tian Zhao, Jian Zhang, Peter Bailis, Kunle Olukotun, Chris Ré, Matei Zaharia

In this work, we analyze the entries from DAWNBench, which received optimized submissions from multiple industrial groups, to investigate the behavior of TTA as a metric as well as trends in the best-performing entries.

Benchmarking BIG-bench Machine Learning
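
Time-to-accuracy (TTA) itself is simple to compute from a training log: the first wall-clock time at which validation accuracy reaches the target (e.g., 93% top-5 on ImageNet in DAWNBench). The log values below are made up for illustration:

    def time_to_accuracy(log, threshold):
        # log: (wall_clock_seconds, validation_accuracy) pairs in order.
        for seconds, accuracy in log:
            if accuracy >= threshold:
                return seconds
        return None  # target accuracy never reached

    log = [(600, 0.880), (1200, 0.925), (1800, 0.931)]
    print(time_to_accuracy(log, 0.93))  # -> 1800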

Infrastructure for Usable Machine Learning: The Stanford DAWN Project

no code implementations • 22 May 2017 • Peter Bailis, Kunle Olukotun, Christopher Ré, Matei Zaharia

Despite incredible recent advances in machine learning, building machine learning applications remains prohibitively time-consuming and expensive for all but the best-trained, best-funded engineering organizations.

BIG-bench Machine Learning

LIT: Block-wise Intermediate Representation Training for Model Compression

no code implementations • ICLR 2019 • Animesh Koratana, Daniel Kang, Peter Bailis, Matei Zaharia

Knowledge distillation (KD) is a popular method for reducing the computational overhead of deep network inference, in which the output of a teacher model is used to train a smaller, faster student model.

Knowledge Distillation Model Compression
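
For context, the standard knowledge-distillation loss that LIT builds on: KL divergence between temperature-softened teacher and student distributions (Hinton et al.), which LIT extends from final outputs to block-wise intermediate representations:

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, T=4.0):
        # KL divergence between temperature-softened distributions,
        # scaled by T^2 to keep gradient magnitudes comparable.
        log_p_student = F.log_softmax(student_logits / T, dim=-1)
        p_teacher = F.softmax(teacher_logits / T, dim=-1)
        return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

    s, t = torch.randn(8, 10), torch.randn(8, 10)
    print(distillation_loss(s, t).item())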

Yggdrasil: An Optimized System for Training Deep Decision Trees at Scale

no code implementations • NeurIPS 2016 • Firas Abuzaid, Joseph K. Bradley, Feynman T. Liang, Andrew Feng, Lee Yang, Matei Zaharia, Ameet S. Talwalkar

Deep distributed decision trees and tree ensembles have grown in importance due to the need to model increasingly large datasets.

Select Via Proxy: Efficient Data Selection For Training Deep Networks

no code implementations • ICLR 2019 • Cody Coleman, Stephen Mussmann, Baharan Mirzasoleiman, Peter Bailis, Percy Liang, Jure Leskovec, Matei Zaharia

In our approach, we first train a small proxy model quickly, which we then use to estimate the utility of individual training data points, and then select the most informative ones for training the large target model.

BIG-bench Machine Learning Image Classification +1

Model Specialization for Inference Via End-to-End Distillation, Pruning, and Cascades

no code implementations • ICLR 2018 • Daniel Kang, Karey Shi, Thao Nguyen, Stephanie Mallard, Peter Bailis, Matei Zaharia

Thus, simply fine-tuning or transfer learning from a general-purpose network inherits a large computational cost that may not be necessary for a given task.

General Classification Image Classification

Beyond Data and Model Parallelism for Deep Neural Networks

no code implementations • 14 Jul 2018 • Zhihao Jia, Matei Zaharia, Alex Aiken

We also propose FlexFlow, a deep learning framework that uses guided randomized search of the SOAP space to find a fast parallelization strategy for a specific parallel machine.

Distributed, Parallel, and Cluster Computing

Willump: A Statistically-Aware End-to-end Optimizer for Machine Learning Inference

no code implementations • 3 Jun 2019 • Peter Kraft, Daniel Kang, Deepak Narayanan, Shoumik Palkar, Peter Bailis, Matei Zaharia

First, Willump automatically cascades feature computation for classification queries: Willump classifies most data inputs using only high-value, low-cost features selected through empirical observations of ML model performance, improving query performance by up to 5x without statistically significant accuracy loss.

BIG-bench Machine Learning
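
The feature cascade in miniature: compute only the high-value, low-cost features first, answer immediately when the cheap model is confident, and pay for the full feature set otherwise. The thresholds and stub models below are illustrative, not Willump's API:

    def cascaded_classify(x, cheap_features, cheap_model,
                          full_features, full_model, hi=0.9, lo=0.1):
        # Compute only the cheap, high-value features first; exit early
        # when the cheap model is confident, else pay for all features.
        p = cheap_model(cheap_features(x))
        if p >= hi or p <= lo:
            return p >= hi
        return full_model(full_features(x)) >= 0.5

    cheap_features = lambda x: x[:2]            # first two features are cheap
    full_features = lambda x: x
    cheap_model = lambda f: 0.95 if sum(f) > 1 else 0.5
    full_model = lambda f: 0.7
    print(cascaded_classify([0.9, 0.4, 0.1], cheap_features, cheap_model,
                            full_features, full_model))  # confident early exit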

BlazeIt: Optimizing Declarative Aggregation and Limit Queries for Neural Network-Based Video Analytics

no code implementations • 2 May 2018 • Daniel Kang, Peter Bailis, Matei Zaharia

We introduce two new query optimization techniques in BlazeIt that are not supported by prior work.

Databases

Jointly Optimizing Preprocessing and Inference for DNN-based Visual Analytics

no code implementations • 25 Jul 2020 • Daniel Kang, Ankit Mathur, Teja Veeramacheneni, Peter Bailis, Matei Zaharia

This runtime engine a) efficiently pipelines preprocessing and DNN execution for inference, b) places preprocessing operations on the CPU or GPU in a hardware- and input-aware manner, and c) efficiently manages memory and threading for high throughput execution.

Efficient Online ML API Selection for Multi-Label Classification Tasks

no code implementations • 18 Feb 2021 • Lingjiao Chen, Matei Zaharia, James Zou

In this work, we propose FrugalMCT, a principled framework that adaptively selects the APIs to use for different data in an online fashion while respecting the user's budget.

General Classification Multi-Label Classification +7

Proof: Accelerating Approximate Aggregation Queries with Expensive Predicates

no code implementations • 27 Jul 2021 • Daniel Kang, John Guibas, Peter Bailis, Tatsunori Hashimoto, Yi Sun, Matei Zaharia

Given a dataset $\mathcal{D}$, we are interested in computing the mean of a subset of $\mathcal{D}$ which matches a predicate.

Did the Model Change? Efficiently Assessing Machine Learning API Shifts

no code implementations • 29 Jul 2021 • Lingjiao Chen, Tracy Cai, Matei Zaharia, James Zou

This motivated us to formulate the API shift assessment problem at a more fine-grained level as estimating how the API model's confusion matrix changes over time when the data distribution is constant.

BIG-bench Machine Learning

Hindsight: Posterior-guided training of retrievers for improved open-ended generation

no code implementations • ICLR 2022 • Ashwin Paranjape, Omar Khattab, Christopher Potts, Matei Zaharia, Christopher D. Manning

Many text generation systems benefit from using a retriever to retrieve passages from a textual knowledge corpus (e.g., Wikipedia) which are then provided as additional context to the generator.

Text Generation

DistIR: An Intermediate Representation and Simulator for Efficient Neural Network Distribution

no code implementations • 9 Nov 2021 • Keshav Santhanam, Siddharth Krishna, Ryota Tomioka, Tim Harris, Matei Zaharia

The rapidly growing size of deep neural network (DNN) models and datasets has given rise to a variety of distribution strategies such as data, tensor-model, pipeline parallelism, and hybrid combinations thereof.

Efficient Neural Network

Toward Compact Parameter Representations for Architecture-Agnostic Neural Network Compression

no code implementations • 19 Nov 2021 • Yuezhou Sun, Wenlong Zhao, Lijun Zhang, Xiao Liu, Hui Guan, Matei Zaharia

This paper investigates deep neural network (DNN) compression from the perspective of compactly representing and storing trained parameters.

Neural Network Compression Quantization

What can Data-Centric AI Learn from Data and ML Engineering?

no code implementations • 13 Dec 2021 • Neoklis Polyzotis, Matei Zaharia

Data-centric AI is a new and exciting research topic in the AI community, but many organizations already build and maintain various "data-centric" applications whose goal is to produce high quality data.

Estimating and Explaining Model Performance When Both Covariates and Labels Shift

no code implementations • 18 Sep 2022 • Lingjiao Chen, Matei Zaharia, James Zou

We further propose SEES, an algorithmic framework to characterize the distribution shift under SJS and to estimate a model's performance on new data without any labels.

Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking

no code implementations • 2 Dec 2022 • Keshav Santhanam, Jon Saad-Falcon, Martin Franz, Omar Khattab, Avirup Sil, Radu Florian, Md Arafat Sultan, Salim Roukos, Matei Zaharia, Christopher Potts

Neural information retrieval (IR) systems have progressed rapidly in recent years, in large part due to the release of publicly available benchmarking tasks.

Benchmarking Information Retrieval +1

FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance

no code implementations • 9 May 2023 • Lingjiao Chen, Matei Zaharia, James Zou

There is a rapidly growing number of large language models (LLMs) that users can query for a fee.

Exploration with Principles for Diverse AI Supervision

no code implementations • 13 Oct 2023 • Hao Liu, Matei Zaharia, Pieter Abbeel

Training large transformers using next-token prediction has given rise to groundbreaking advancements in AI.

Reinforcement Learning (RL) Unsupervised Reinforcement Learning

Data Acquisition: A New Frontier in Data-centric AI

no code implementations • 22 Nov 2023 • Lingjiao Chen, Bilge Acun, Newsha Ardalani, Yifan Sun, Feiyang Kang, Hanrui Lyu, Yongchan Kwon, Ruoxi Jia, Carole-Jean Wu, Matei Zaharia, James Zou

As Machine Learning (ML) systems continue to grow, the demand for relevant and comprehensive datasets becomes imperative.

Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems

no code implementations • 4 Mar 2024 • Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, James Zou

We find empirically that across multiple language tasks, surprisingly, Voting Inference Systems' performance first increases but then decreases as a function of the number of LLM calls.

Language Modelling Large Language Model
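
The rise-then-fall behavior already shows up in a toy majority-voting model: easy queries (per-call accuracy above 1/2) improve with more calls while hard queries (below 1/2) get worse, so the average over a mixed workload can peak at a small call count. The probabilities below are illustrative, not the paper's measurements:

    from math import comb

    def majority_vote_accuracy(p, n):
        # Probability that a majority of n independent calls, each
        # correct with probability p, is correct (n odd).
        return sum(comb(n, k) * p**k * (1 - p)**(n - k)
                   for k in range(n // 2 + 1, n + 1))

    # Half the workload is easy (p=0.8), half is hard (p=0.4): accuracy
    # on easy queries rises with n while it falls on hard queries, so
    # the mixture's accuracy first increases and then decreases.
    for n in (1, 3, 9, 25):
        avg = 0.5 * majority_vote_accuracy(0.8, n) \
            + 0.5 * majority_vote_accuracy(0.4, n)
        print(n, round(avg, 3))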

ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data

no code implementations • 7 Mar 2024 • Liana Patel, Peter Kraft, Carlos Guestrin, Matei Zaharia

Applications increasingly leverage mixed-modality data, and must jointly search over vector data, such as embedded images, text and video, as well as structured data, such as attributes and keywords.

Optimizing LLM Queries in Relational Workloads

no code implementations • 9 Mar 2024 • Shu Liu, Asim Biswal, Audrey Cheng, Xiangxi Mo, Shiyi Cao, Joseph E. Gonzalez, Ion Stoica, Matei Zaharia

In this paper, we explore how to optimize LLM inference for analytical workloads that invoke LLMs within relational queries.