Search Results for author: James Zou

Found 72 papers, 25 papers with code

Did the Model Change? Efficiently Assessing Machine Learning API Shifts

no code implementations29 Jul 2021 Lingjiao Chen, Tracy Cai, Matei Zaharia, James Zou

This motivated us to formulate the API shift assessment problem at a more fine-grained level as estimating how the API model's confusion matrix changes over time when the data distribution is constant.

Do Humans Trust Advice More if it Comes from AI? An Analysis of Human-AI Interactions

no code implementations14 Jul 2021 Kailas Vodrahalli, Tobias Gerstenberg, James Zou

In many applications of AI, the algorithm's output is framed as a suggestion to a human user.

Meaningfully Explaining a Model's Mistakes

no code implementations24 Jun 2021 Abubakar Abid, James Zou

Understanding and explaining the mistakes made by trained models is critical to many machine learning objectives, such as improving robustness, addressing concept drift, and mitigating biases.

Adversarial Training Helps Transfer Learning via Better Representations

no code implementations18 Jun 2021 Zhun Deng, Linjun Zhang, Kailas Vodrahalli, Kenji Kawaguchi, James Zou

Recent works empirically demonstrate that adversarial training in the source data can improve the ability of models to transfer to new domains.

Transfer Learning

Group-Structured Adversarial Training

no code implementations18 Jun 2021 Farzan Farnia, Amirali Aghazadeh, James Zou, David Tse

Robust training methods against perturbations to the input data have received great attention in the machine learning literature.

MLDemon: Deployment Monitoring for Machine Learning Systems

no code implementations28 Apr 2021 Antonio Ginart, Martin Zhang, James Zou

MLDemon integrates both unlabeled features and a small amount of on-demand labeled examples over time to produce a real-time estimate of the ML model's current performance on a given data stream.

Data Shapley Valuation for Efficient Batch Active Learning

no code implementations16 Apr 2021 Amirata Ghorbani, James Zou, Andre Esteva

In this work, we introduce Active Data Shapley (ADS) -- a filtering layer for batch active learning that significantly increases the efficiency of active learning by pre-selecting, using a linear time computation, the highest-value points from an unlabeled dataset.

Active Learning

FrugalMCT: Efficient Online ML API Selection for Multi-Label Classification Tasks

no code implementations18 Feb 2021 Lingjiao Chen, Matei Zaharia, James Zou

In this work, we propose FrugalMCT, a principled framework that adaptively selects the APIs to use for different data in an online fashion while respecting user's budget.

General Classification Multi-Label Classification +6

How to Learn when Data Reacts to Your Model: Performative Gradient Descent

no code implementations15 Feb 2021 Zachary Izzo, Lexing Ying, James Zou

Performative distribution shift captures the setting where the choice of which ML model is deployed changes the data distribution.

When and How Mixup Improves Calibration

no code implementations11 Feb 2021 Linjun Zhang, Zhun Deng, Kenji Kawaguchi, James Zou

In addition, we study how Mixup improves calibration in semi-supervised learning.

Data Augmentation

Persistent Anti-Muslim Bias in Large Language Models

1 code implementation14 Jan 2021 Abubakar Abid, Maheen Farooqi, James Zou

It has been observed that large-scale language models capture undesirable societal biases, e. g. relating to race and gender; yet religious bias has been relatively unexplored.

Adversarial Text Language Modelling

Neural Group Testing to Accelerate Deep Learning

1 code implementation21 Nov 2020 Weixin Liang, James Zou

A key challenge of neural group testing is to modify a deep neural network so that it could test multiple samples in one forward pass.

Data Valuation for Medical Imaging Using Shapley Value: Application on A Large-scale Chest X-ray Dataset

no code implementations15 Oct 2020 Siyi Tang, Amirata Ghorbani, Rikiya Yamashita, Sameer Rehman, Jared A. Dunnmon, James Zou, Daniel L. Rubin

In this study, we used data Shapley, a data valuation metric, to quantify the value of training data to the performance of a pneumonia detection algorithm in a large chest X-ray dataset.

Pneumonia Detection

How Does Mixup Help With Robustness and Generalization?

no code implementations ICLR 2021 Linjun Zhang, Zhun Deng, Kenji Kawaguchi, Amirata Ghorbani, James Zou

For robustness, we show that minimizing the Mixup loss corresponds to approximately minimizing an upper bound of the adversarial loss.

Data Augmentation

TrueImage: A Machine Learning Algorithm to Improve the Quality of Telehealth Photos

no code implementations1 Oct 2020 Kailas Vodrahalli, Roxana Daneshjou, Roberto A Novoa, Albert Chiou, Justin M Ko, James Zou

These promising results suggest that our solution is feasible and can improve the quality of teledermatology care.

ALICE: Active Learning with Contrastive Natural Language Explanations

no code implementations EMNLP 2020 Weixin Liang, James Zou, Zhou Yu

We propose Active Learning with Contrastive Explanations (ALICE), an expert-in-the-loop training framework that utilizes contrastive natural language explanations to improve data efficiency in learning.

Active Learning General Classification

Competing AI: How does competition feedback affect machine learning?

no code implementations15 Sep 2020 Antonio Ginart, Eva Zhang, Yongchan Kwon, James Zou

A service that is more often queried by users, perhaps because it more accurately anticipates user preferences, is also more likely to obtain additional user data (e. g. in the form of a Yelp review).

Improving Generalization in Meta-learning via Task Augmentation

1 code implementation26 Jul 2020 Huaxiu Yao, Long-Kai Huang, Linjun Zhang, Ying WEI, Li Tian, James Zou, Junzhou Huang, Zhenhui Li

Moreover, both MetaMix and Channel Shuffle outperform state-of-the-art results by a large margin across many datasets and are compatible with existing meta-learning algorithms.


Efficient computation and analysis of distributional Shapley values

no code implementations2 Jul 2020 Yongchan Kwon, Manuel A. Rivas, James Zou

Distributional data Shapley value (DShapley) has recently been proposed as a principled framework to quantify the contribution of individual datum in machine learning.

Density Estimation

Improving Adversarial Robustness via Unlabeled Out-of-Domain Data

no code implementations15 Jun 2020 Zhun Deng, Linjun Zhang, Amirata Ghorbani, James Zou

In this work, we investigate how adversarial robustness can be enhanced by leveraging out-of-domain unlabeled data.

Data Augmentation Object Recognition +1

Improving Training on Noisy Stuctured Labels

no code implementations8 Mar 2020 Abubakar Abid, James Zou

Systematic experiments on image segmentation and text tagging demonstrate the strong performance of ECN in improving training on noisy structured labels.

Semantic Segmentation

A Distributional Framework for Data Valuation

no code implementations ICML 2020 Amirata Ghorbani, Michael P. Kim, James Zou

Shapley value is a classic notion from game theory, historically used to quantify the contributions of individuals within groups, and more recently applied to assign values to data points when training machine learning models.

Approximate Data Deletion from Machine Learning Models

no code implementations24 Feb 2020 Zachary Izzo, Mary Anne Smart, Kamalika Chaudhuri, James Zou

Deleting data from a trained machine learning (ML) model is a critical task in many applications.

Neuron Shapley: Discovering the Responsible Neurons

1 code implementation NeurIPS 2020 Amirata Ghorbani, James Zou

We develop Neuron Shapley as a new framework to quantify the contribution of individual neurons to the prediction and performance of a deep network.

Who's responsible? Jointly quantifying the contribution of the learning algorithm and training data

no code implementations9 Oct 2019 Gal Yona, Amirata Ghorbani, James Zou

We propose Extended Shapley as a principled framework for this problem, and experiment empirically with how it can be used to address questions of ML accountability.

Learning transport cost from subset correspondence

no code implementations ICLR 2020 Ruishan Liu, Akshay Balsubramani, James Zou

Optimal transport (OT) is a principled approach to align datasets, but a key challenge in applying OT is that we need to specify a transport cost function that accurately captures how the two datasets are related.

Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems

4 code implementations25 Sep 2019 Antonio Ginart, Maxim Naumov, Dheevatsa Mudigere, Jiyan Yang, James Zou

Embedding representations power machine intelligence in many applications, including recommendation systems, but they are space intensive -- potentially occupying hundreds of gigabytes in large-scale settings.

Click-Through Rate Prediction Recommendation Systems

LitGen: Genetic Literature Recommendation Guided by Human Explanations

1 code implementation24 Sep 2019 Allen Nie, Arturo L. Pineda, Matt W. Wright Hannah Wand, Bryan Wulf, Helio A. Costa, Ronak Y. Patel, Carlos D. Bustamante, James Zou

In collaboration with the Clinical Genomic Resource (ClinGen)---the flagship NIH program for clinical curation---we propose the first machine learning system, LitGen, that can retrieve papers for a particular variant and filter them by specific evidence types used by curators to assess for pathogenicity.

Making AI Forget You: Data Deletion in Machine Learning

2 code implementations NeurIPS 2019 Antonio Ginart, Melody Y. Guan, Gregory Valiant, James Zou

Intense recent discussions have focused on how to provide individuals with control over when their data can and cannot be used --- the EU's Right To Be Forgotten regulation is an example of this effort.

Gradio: Hassle-Free Sharing and Testing of ML Models in the Wild

1 code implementation6 Jun 2019 Abubakar Abid, Ali Abdalla, Ali Abid, Dawood Khan, Abdulrahman Alfozan, James Zou

Their feedback identified that Gradio should support a variety of interfaces and frameworks, allow for easy sharing of the interface, allow for input manipulation and interactive inference by the domain expert, as well as allow embedding the interface in iPython notebooks.

Discovering Conditionally Salient Features with Statistical Guarantees

no code implementations29 May 2019 Jaime Roquero Gimenez, James Zou

Most of the work in this domain has focused on identifying globally relevant features, which are features that are related to the outcome using evidence across the entire dataset.

Feature Selection

A Knowledge Graph-based Approach for Exploring the U.S. Opioid Epidemic

no code implementations27 May 2019 Maulik R. Kamdar, Tymor Hamamsy, Shea Shelton, Ayin Vala, Tome Eftimov, James Zou, Suzanne Tamang

Statistical learning methods that use data from multiple clinical centers across the US to detect opioid over-prescribing trends and predict possible opioid misuse are required.

Data Shapley: Equitable Valuation of Data for Machine Learning

4 code implementations5 Apr 2019 Amirata Ghorbani, James Zou

As data becomes the fuel driving technological and economic growth, a fundamental challenge is how to quantify the value of data in algorithmic predictions and decisions.

Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings

1 code implementation NAACL 2019 Dorottya Demszky, Nikhil Garg, Rob Voigt, James Zou, Matthew Gentzkow, Jesse Shapiro, Dan Jurafsky

We provide an NLP framework to uncover four linguistic dimensions of political polarization in social media: topic choice, framing, affect and illocutionary force.

Contrastive Variational Autoencoder Enhances Salient Features

1 code implementation12 Feb 2019 Abubakar Abid, James Zou

The cVAE explicitly models latent features that are shared between the datasets, as well as those that are enriched in one dataset relative to the other, which allows the algorithm to isolate and enhance the salient latent features.

Contrastive Learning

Towards Automatic Concept-based Explanations

1 code implementation NeurIPS 2019 Amirata Ghorbani, James Wexler, James Zou, Been Kim

Interpretability has become an important topic of research as more machine learning (ML) models are deployed and widely used to make important decisions.

Feature Importance

Concrete Autoencoders for Differentiable Feature Selection and Reconstruction

1 code implementation27 Jan 2019 Abubakar Abid, Muhammad Fatih Balin, James Zou

We introduce the concrete autoencoder, an end-to-end differentiable method for global feature selection, which efficiently identifies a subset of the most informative features and simultaneously learns a neural network to reconstruct the input data from the selected features.

Feature Selection

Large-scale Generative Modeling to Improve Automated Veterinary Disease Coding

no code implementations29 Nov 2018 Yuhui Zhang, Allen Nie, James Zou

We compare the performance of our model with several baselines in a challenging cross-hospital setting with substantial domain shift.

Minimizing Close-k Aggregate Loss Improves Classification

1 code implementation1 Nov 2018 Bryan He, James Zou

In classification, the de facto method for aggregating individual losses is the average loss.

General Classification

Contrastive Multivariate Singular Spectrum Analysis

no code implementations31 Oct 2018 Abdi-Hakin Dirie, Abubakar Abid, James Zou

We introduce Contrastive Multivariate Singular Spectrum Analysis, a novel unsupervised method for dimensionality reduction and signal decomposition of time series data.

Dimensionality Reduction Time Series

Improving the Stability of the Knockoff Procedure: Multiple Simultaneous Knockoffs and Entropy Maximization

no code implementations26 Oct 2018 Jaime Roquero Gimenez, James Zou

The Model-X knockoff procedure has recently emerged as a powerful approach for feature selection with statistical guarantees.

Feature Selection

Autowarp: Learning a Warping Distance from Unlabeled Time Series Using Sequence Autoencoders

no code implementations NeurIPS 2018 Abubakar Abid, James Zou

We define a flexible and differentiable family of warping metrics, which encompasses common metrics such as DTW, Euclidean, and edit distance.

Dynamic Time Warping Time Series

Knockoffs for the mass: new feature importance statistics with false discovery guarantees

no code implementations17 Jul 2018 Jaime Roquero Gimenez, Amirata Ghorbani, James Zou

This is often impossible to do from purely observational data, and a natural relaxation is to identify features that are correlated with the outcome even conditioned on all other observed features.

Feature Importance

DeepTag: inferring all-cause diagnoses from clinical notes in under-resourced medical domain

1 code implementation28 Jun 2018 Allen Nie, Ashley Zehnder, Rodney L. Page, Arturo L. Pineda, Manuel A. Rivas, Carlos D. Bustamante, James Zou

However, clinicians lack the time and resource to annotate patient records with standard medical diagnostic codes and most veterinary visits are captured in free text notes.

Multiaccuracy: Black-Box Post-Processing for Fairness in Classification

no code implementations31 May 2018 Michael P. Kim, Amirata Ghorbani, James Zou

Prediction systems are successfully deployed in applications ranging from disease diagnosis, to predicting credit worthiness, to image recognition.

Fairness General Classification +1

Feedback GAN (FBGAN) for DNA: a Novel Feedback-Loop Architecture for Optimizing Protein Functions

no code implementations5 Apr 2018 Anvita Gupta, James Zou

We propose a novel feedback-loop architecture, called Feedback GAN (FBGAN), to optimize the synthetic gene sequences for desired properties using an external function analyzer.

Stochastic EM for Shuffled Linear Regression

no code implementations2 Apr 2018 Abubakar Abid, James Zou

We consider the problem of inference in a linear regression model in which the relative ordering of the input features and output labels is not known.

CoVeR: Learning Covariate-Specific Vector Representations with Tensor Decompositions

1 code implementation ICML 2018 Kevin Tian, Teng Zhang, James Zou

However, in addition to the text data itself, we often have additional covariates associated with individual corpus documents---e. g. the demographic of the author, time and venue of publication---and we would like the embedding to naturally capture this information.

Tensor Decomposition

From Information Bottleneck To Activation Norm Penalty

no code implementations ICLR 2018 Allen Nie, Mihir Mongia, James Zou

Recently, a regularization method has been proposed to optimize the variational lower bound of the Information Bottleneck Lagrangian.

General Classification Image Classification +1

Learning Covariate-Specific Embeddings with Tensor Decompositions

no code implementations ICLR 2018 Kevin Tian, Teng Zhang, James Zou

In addition to the text data itself, we often have additional covariates associated with individual documents in the corpus---e. g. the demographic of the author, time and venue of publication, etc.---and we would like the embedding to naturally capture the information of the covariates.

Tensor Decomposition Word Embeddings


no code implementations ICLR 2018 Amirata Ghorbani, Abubakar Abid, James Zou

In this paper, we show that interpretation of deep learning predictions is extremely fragile in the following sense: two perceptively indistinguishable inputs with the same predicted label can be assigned very different}interpretations.

Feature Importance

Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes

1 code implementation22 Nov 2017 Nikhil Garg, Londa Schiebinger, Dan Jurafsky, James Zou

Word embeddings use vectors to represent words such that the geometry between vectors captures semantic relationship between the words.

Word Embeddings

NeuralFDR: Learning Discovery Thresholds from Hypothesis Features

1 code implementation NeurIPS 2017 Fei Xia, Martin J. Zhang, James Zou, David Tse

For example, in genetic association studies, each hypothesis tests the correlation between a variant and the trait.

Interpretation of Neural Networks is Fragile

1 code implementation29 Oct 2017 Amirata Ghorbani, Abubakar Abid, James Zou

In this paper, we show that interpretation of deep learning predictions is extremely fragile in the following sense: two perceptively indistinguishable inputs with the same predicted label can be assigned very different interpretations.

Feature Importance

The Effects of Memory Replay in Reinforcement Learning

1 code implementation18 Oct 2017 Ruishan Liu, James Zou

We show that even in this very simple setting, the amount of memory kept can substantially affect the agent's performance.


Contrastive Principal Component Analysis

1 code implementation20 Sep 2017 Abubakar Abid, Martin J. Zhang, Vivek K. Bagaria, James Zou

We present a new technique called contrastive principal component analysis (cPCA) that is designed to discover low-dimensional structure that is unique to a dataset, or enriched in one dataset relative to other data.

Denoising Feature Selection

Why Adaptively Collected Data Have Negative Bias and How to Correct for It

no code implementations7 Aug 2017 Xinkun Nie, Xiaoying Tian, Jonathan Taylor, James Zou

In this paper, we prove that when the data collection procedure satisfies natural conditions, then sample means of the data have systematic \emph{negative} biases.

Learning Latent Space Models with Angular Constraints

no code implementations ICML 2017 Pengtao Xie, Yuntian Deng, Yi Zhou, Abhimanu Kumar, Yao-Liang Yu, James Zou, Eric P. Xing

The large model capacity of latent space models (LSMs) enables them to achieve great performance on various applications, but meanwhile renders LSMs to be prone to overfitting.

Estimating the unseen from multiple populations

2 code implementations ICML 2017 Aditi Raghunathan, Greg Valiant, James Zou

We generalize this extrapolation and related unseen estimation problems to the multiple population setting, where population $j$ has an unknown distribution $D_j$ from which we observe $n_j$ samples.

Beyond Bilingual: Multi-sense Word Embeddings using Multilingual Context

no code implementations WS 2017 Shyam Upadhyay, Kai-Wei Chang, Matt Taddy, Adam Kalai, James Zou

We present a multi-view Bayesian non-parametric algorithm which improves multi-sense word embeddings by (a) using multilingual (i. e., more than two languages) corpora to significantly improve sense embeddings beyond what one achieves with bilingual information, and (b) uses a principled approach to learn a variable number of senses per word, in a data-driven manner.

Word Embeddings

Linear Regression with Shuffled Labels

no code implementations3 May 2017 Abubakar Abid, Ada Poon, James Zou

We study the regimes in which each estimator excels, and generalize the estimators to the setting where partial ordering information is available in the form of experiments replicated independently.

Quantifying and Reducing Stereotypes in Word Embeddings

no code implementations20 Jun 2016 Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, Adam Kalai

Machine learning algorithms are optimized to model statistical properties of the training data.

Word Embeddings

Clustering with a Reject Option: Interactive Clustering as Bayesian Prior Elicitation

no code implementations19 Jun 2016 Akash Srivastava, James Zou, Ryan P. Adams, Charles Sutton

A good clustering can help a data analyst to explore and understand a data set, but what constitutes a good clustering may depend on domain-specific and application-specific criteria.

Quantifying the accuracy of approximate diffusions and Markov chains

no code implementations20 May 2016 Jonathan H. Huggins, James Zou

As an illustration, we apply our framework to derive finite-sample error bounds of approximate unadjusted Langevin dynamics.

Clustering with a Reject Option: Interactive Clustering as Bayesian Prior Elicitation

no code implementations22 Feb 2016 Akash Srivastava, James Zou, Charles Sutton

A good clustering can help a data analyst to explore and understand a data set, but what constitutes a good clustering may depend on domain-specific and application-specific criteria.

How much does your data exploration overfit? Controlling bias via information usage

no code implementations16 Nov 2015 Daniel Russo, James Zou

But while %the adaptive nature of exploration any data-exploration renders standard statistical theory invalid, experience suggests that different types of exploratory analysis can lead to disparate levels of bias, and the degree of bias also depends on the particulars of the data set.

Rich Component Analysis

no code implementations14 Jul 2015 Rong Ge, James Zou

In this paper, we develop the general framework of Rich Component Analysis (RCA) to model settings where the observations from different views are driven by different sets of latent components, and each component can be a complex, high-dimensional distribution.

Latent Variable Models

Intersecting Faces: Non-negative Matrix Factorization With New Guarantees

no code implementations8 Jul 2015 Rong Ge, James Zou

A plethora of algorithms have been developed to tackle NMF, but due to the non-convex nature of the problem, there is little guarantee on how well these methods work.

Cannot find the paper you are looking for? You can Submit a new open access paper.