Search Results for author: Wentao Wu

Found 13 papers, 6 papers with code

Data Debugging with Shapley Importance over End-to-End Machine Learning Pipelines

1 code implementation · 23 Apr 2022 · Bojan Karlaš, David Dao, Matteo Interlandi, Bo Li, Sebastian Schelter, Wentao Wu, Ce Zhang

We present DataScope (ease.ml/datascope), the first system that efficiently computes Shapley values of training examples over an end-to-end ML pipeline, and illustrate its applications in data debugging for ML training.

VolcanoML: Speeding up End-to-End AutoML via Scalable Search Space Decomposition

3 code implementations · 19 Jul 2021 · Yang Li, Yu Shen, Wentao Zhang, Jiawei Jiang, Bolin Ding, Yaliang Li, Jingren Zhou, Zhi Yang, Wentao Wu, Ce Zhang, Bin Cui

End-to-end AutoML, which automatically searches for ML pipelines in a space induced by feature engineering, algorithm/model selection, and hyper-parameter tuning, has attracted intense interest from both academia and industry.

Tasks: AutoML, Feature Engineering

OpenBox: A Generalized Black-box Optimization Service

7 code implementations · 1 Jun 2021 · Yang Li, Yu Shen, Wentao Zhang, Yuanwei Chen, Huaijun Jiang, Mingchao Liu, Jiawei Jiang, Jinyang Gao, Wentao Wu, Zhi Yang, Ce Zhang, Bin Cui

Black-box optimization (BBO) has a broad range of applications, including automatic machine learning, engineering, physics, and experimental design.

Tasks: Experimental Design, Transfer Learning

Towards Demystifying Serverless Machine Learning Training

1 code implementation · 17 May 2021 · Jiawei Jiang, Shaoduo Gan, Yue Liu, Fanlin Wang, Gustavo Alonso, Ana Klimovic, Ankit Singla, Wentao Wu, Ce Zhang

The appeal of serverless (FaaS) has triggered growing interest in how to use it in data-intensive applications such as ETL, query processing, or machine learning (ML).

A Data Quality-Driven View of MLOps

no code implementations · 15 Feb 2021 · Cedric Renggli, Luka Rimanic, Nezihe Merve Gürel, Bojan Karlaš, Wentao Wu, Ce Zhang

Developing machine learning models can be seen as a process similar to the one established for traditional software development.

Ease.ML/Snoopy: Towards Automatic Feasibility Studies for ML via Quantitative Understanding of "Data Quality for ML"

1 code implementation · 16 Oct 2020 · Cedric Renggli, Luka Rimanic, Luka Kolar, Wentao Wu, Ce Zhang

In our experience working with domain experts who use today's AutoML systems, a common problem we encountered is what we call "unrealistic expectations": users face a very challenging task with a noisy data acquisition process, yet expect to achieve startlingly high accuracy with machine learning (ML).

Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions

1 code implementation · 11 May 2020 · Bojan Karlaš, Peng Li, Renzhi Wu, Nezihe Merve Gürel, Xu Chu, Wentao Wu, Ce Zhang

Machine learning (ML) applications have been thriving recently, largely owing to the increasing availability of data.

Data Science through the looking glass and what we found there

no code implementations · 19 Dec 2019 · Fotis Psallidas, Yiwen Zhu, Bojan Karlas, Matteo Interlandi, Avrilia Floratou, Konstantinos Karanasos, Wentao Wu, Ce Zhang, Subru Krishnan, Carlo Curino, Markus Weimer

The recent success of machine learning (ML) has led to an explosive growth both in terms of new systems and algorithms built in industry and academia, and new applications built by an ever-growing community of data science (DS) practitioners.

Continuous Integration of Machine Learning Models with ease.ml/ci: Towards a Rigorous Yet Practical Treatment

no code implementations · 1 Mar 2019 · Cedric Renggli, Bojan Karlaš, Bolin Ding, Feng Liu, Kevin Schawinski, Wentao Wu, Ce Zhang

Continuous integration is an indispensable step in modern software engineering practice for systematically managing the life cycle of system development.

Towards Multi-tenant Resource Sharing for Machine Learning Workloads

no code implementations · 24 Aug 2017 · Tian Li, Jie Zhong, Ji Liu, Wentao Wu, Ce Zhang

We ask, as a "service provider" that manages a shared cluster of machines among all our users running machine learning workloads, what is the resource allocation strategy that maximizes the global satisfaction of all our users?

Tasks: Fairness, Image Classification

MLBench: How Good Are Machine Learning Clouds for Binary Classification Tasks on Structured Data?

no code implementations · 29 Jul 2017 · Yu Liu, Hantian Zhang, Luyuan Zeng, Wentao Wu, Ce Zhang

We then compare the performance of the top-winning code available from Kaggle with the performance obtained by running the machine learning clouds of both Azure and Amazon on MLBench.

Tasks: General Classification

Revisiting Differentially Private Regression: Lessons From Learning Theory and their Consequences

no code implementations · 20 Dec 2015 · Xi Wu, Matthew Fredrikson, Wentao Wu, Somesh Jha, Jeffrey F. Naughton

Perhaps more importantly, our theory reveals that the most basic mechanism in differential privacy, output perturbation, can be used to obtain a better tradeoff for all convex-Lipschitz-bounded learning tasks.

Tasks: Learning Theory
