Search Results for author: Lingjiao Chen

Found 20 papers, 4 papers with code

Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews

no code implementations • 11 Mar 2024 • Weixin Liang, Zachary Izzo, Yaohui Zhang, Haley Lepp, Hancheng Cao, Xuandong Zhao, Lingjiao Chen, Haotian Ye, Sheng Liu, Zhi Huang, Daniel A. McFarland, James Y. Zou

We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM).

Language Modelling Large Language Model

Are More LLM Calls All You Need? Towards Scaling Laws of Compound Inference Systems

no code implementations • 4 Mar 2024 • Lingjiao Chen, Jared Quincy Davis, Boris Hanin, Peter Bailis, Ion Stoica, Matei Zaharia, James Zou

We find empirically that across multiple language tasks, surprisingly, Voting Inference Systems' performance first increases but then decreases as a function of the number of LLM calls.

Language Modelling Large Language Model

Data Acquisition: A New Frontier in Data-centric AI

no code implementations • 22 Nov 2023 • Lingjiao Chen, Bilge Acun, Newsha Ardalani, Yifan Sun, Feiyang Kang, Hanrui Lyu, Yongchan Kwon, Ruoxi Jia, Carole-Jean Wu, Matei Zaharia, James Zou

As Machine Learning (ML) systems continue to grow, the demand for relevant and comprehensive datasets becomes imperative.

How is ChatGPT's behavior changing over time?

4 code implementations • 18 Jul 2023 • Lingjiao Chen, Matei Zaharia, James Zou

We find that the performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time.

Code Generation Language Modelling +3

1,641

FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance

no code implementations • 9 May 2023 • Lingjiao Chen, Matei Zaharia, James Zou

There is a rapidly growing number of large language models (LLMs) that users can query for a fee.

SEAL: Interactive Tool for Systematic Error Analysis and Labeling

no code implementations • 11 Oct 2022 • Nazneen Rajani, Weixin Liang, Lingjiao Chen, Meg Mitchell, James Zou

With the advent of Transformers, large language models (LLMs) have saturated well-known NLP benchmarks and leaderboards with high aggregate performance.

HAPI: A Large-scale Longitudinal Dataset of Commercial ML API Predictions

1 code implementation • 18 Sep 2022 • Lingjiao Chen, Zhihua Jin, Sabri Eyuboglu, Christopher Ré, Matei Zaharia, James Zou

HAPI is the first large-scale dataset of ML API usages and is a unique resource for studying ML-as-a-service (MLaaS).

object-detection Object Detection +4

Estimating and Explaining Model Performance When Both Covariates and Labels Shift

no code implementations • 18 Sep 2022 • Lingjiao Chen, Matei Zaharia, James Zou

We further propose SEES, an algorithmic framework to characterize the distribution shift under SJS and to estimate a model's performance on new data without any labels.

DataPerf: Benchmarks for Data-Centric AI Development

1 code implementation • NeurIPS 2023 • Mark Mazumder, Colby Banbury, Xiaozhe Yao, Bojan Karlaš, William Gaviria Rojas, Sudnya Diamos, Greg Diamos, Lynn He, Alicia Parrish, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Douwe Kiela, David Jurado, David Kanter, Rafael Mosquera, Juan Ciro, Lora Aroyo, Bilge Acun, Lingjiao Chen, Mehul Smriti Raje, Max Bartolo, Sabri Eyuboglu, Amirata Ghorbani, Emmett Goodman, Oana Inel, Tariq Kane, Christine R. Kirkpatrick, Tzu-Sheng Kuo, Jonas Mueller, Tristan Thrush, Joaquin Vanschoren, Margaret Warren, Adina Williams, Serena Yeung, Newsha Ardalani, Praveen Paritosh, Lilith Bat-Leah, Ce Zhang, James Zou, Carole-Jean Wu, Cody Coleman, Andrew Ng, Peter Mattson, Vijay Janapa Reddi

Machine learning research has long focused on models rather than datasets, and prominent datasets are used for common ML tasks without regard to the breadth, difficulty, and faithfulness of the underlying problems.

Solon: Communication-efficient Byzantine-resilient Distributed Training via Redundant Gradients

no code implementations • 4 Oct 2021 • Lingjiao Chen, Leshang Chen, Hongyi Wang, Susan Davidson, Edgar Dobriban

There has been a growing need to provide Byzantine-resilience in distributed model training.

How Did the Model Change? Efficiently Assessing Machine Learning API Shifts

no code implementations • ICLR 2022 • Lingjiao Chen, Matei Zaharia, James Zou

ML prediction APIs from providers like Amazon and Google have made it simple to use ML in applications.

BIG-bench Machine Learning

Did the Model Change? Efficiently Assessing Machine Learning API Shifts

no code implementations • 29 Jul 2021 • Lingjiao Chen, Tracy Cai, Matei Zaharia, James Zou

This motivated us to formulate the API shift assessment problem at a more fine-grained level as estimating how the API model's confusion matrix changes over time when the data distribution is constant.

BIG-bench Machine Learning

Efficient Online ML API Selection for Multi-Label Classification Tasks

no code implementations • 18 Feb 2021 • Lingjiao Chen, Matei Zaharia, James Zou

In this work, we propose FrugalMCT, a principled framework that adaptively selects the APIs to use for different data in an online fashion while respecting user's budget.

General Classification Multi-Label Classification +7

FrugalML: How to Use ML Prediction APIs More Accurately and Cheaply

no code implementations • NeurIPS 2020 • Lingjiao Chen, Matei Zaharia, James Zou

Prediction APIs offered for a fee are a fast-growing industry and an important part of machine learning as a service.

Facial Emotion Recognition Sentiment Analysis +2

The Effect of Network Width on the Performance of Large-batch Training

no code implementations • NeurIPS 2018 • Lingjiao Chen, Hongyi Wang, Jinman Zhao, Dimitris Papailiopoulos, Paraschos Koutris

Distributed implementations of mini-batch stochastic gradient descent (SGD) suffer from communication overheads, attributed to the high frequency of gradient updates inherent in small-batch training.

Model-based Pricing for Machine Learning in a Data Marketplace

no code implementations • 26 May 2018 • Lingjiao Chen, Paraschos Koutris, Arun Kumar

Finally, we conduct extensive experiments, which validate that the MBP framework can provide high revenue to the seller, high affordability to the buyer, and also operate on low runtime cost.

BIG-bench Machine Learning

DRACO: Byzantine-resilient Distributed Training via Redundant Gradients

1 code implementation • ICML 2018 • Lingjiao Chen, Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos

Distributed model training is vulnerable to Byzantine system failures and adversarial compute nodes, i.e., nodes that use malicious updates to corrupt the global model stored at a parameter server (PS).

The Manifold Assumption and Defenses Against Adversarial Perturbations

no code implementations • ICLR 2018 • Xi Wu, Uyeong Jang, Lingjiao Chen, Somesh Jha

Interestingly, we find that a recent objective by Madry et al. encourages training a model that satisfies well our formal version of the goodness property, but has a weak control of points that are wrong but with low confidence.

Reinforcing Adversarial Robustness using Model Confidence Induced by Adversarial Training

no code implementations • ICML 2018 • Xi Wu, Uyeong Jang, Jiefeng Chen, Lingjiao Chen, Somesh Jha

In this paper we study leveraging confidence information induced by adversarial training to reinforce adversarial robustness of a given adversarially trained model.

Adversarial Robustness

Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent

no code implementations • 22 Feb 2017 • Fengan Li, Lingjiao Chen, Yijing Zeng, Arun Kumar, Jeffrey F. Naughton, Jignesh M. Patel, Xi Wu

We fill this crucial research gap by proposing a new lossless compression scheme we call tuple-oriented compression (TOC) that is inspired by an unlikely source, the string/text compression scheme Lempel-Ziv-Welch, but tailored to MGD in a way that preserves tuple boundaries within mini-batches.

Data Compression Open-Ended Question Answering +1
