Search Results for author: James Zou

Found 113 papers, 43 papers with code

Explaining the Trump Gap in Social Distancing Using COVID Discourse

no code implementations EMNLP (NLP-COVID19) 2020 Austin Van Loon, Sheridan Stewart, Brandon Waldon, Shrinidhi K Lakshmikanth, Ishan Shah, Sharath Chandra Guntuku, Garrick Sherman, James Zou, Johannes Eichstaedt

Our ability to limit the future spread of COVID-19 will in part depend on our understanding of the psychological and sociological processes that lead people to follow or reject coronavirus health behaviors.

Word Embeddings

Provable Membership Inference Privacy

no code implementations 12 Nov 2022 Zachary Izzo, Jinsung Yoon, Sercan O. Arik, James Zou

However, DP's strong theoretical guarantees often come at the cost of a large drop in its utility for machine learning, and DP guarantees themselves can be difficult to interpret.

A Spectral Method for Assessing and Combining Multiple Data Visualizations

1 code implementation 25 Oct 2022 Rong Ma, Eric D. Sun, James Zou

Then it leverages the eigenscores to obtain a consensus visualization, which has much improved quality over the individual visualizations in capturing the underlying true data structure.

Data Visualization Dimensionality Reduction

Freeze then Train: Towards Provable Representation Learning under Spurious Correlations and Feature Noise

no code implementations 20 Oct 2022 Haotian Ye, James Zou, Linjun Zhang

This opens a promising strategy to first train a feature learner rather than a classifier, and then perform linear probing (last layer retraining) in the test environment.

Representation Learning

C-Mixup: Improving Generalization in Regression

1 code implementation 11 Oct 2022 Huaxiu Yao, Yiping Wang, Linjun Zhang, James Zou, Chelsea Finn

In this paper, we propose a simple yet powerful algorithm, C-Mixup, to improve generalization on regression tasks.

Regression

Knowledge-Driven New Drug Recommendation

no code implementations 11 Oct 2022 Zhenbang Wu, Huaxiu Yao, Zhe Su, David M Liebovitz, Lucas M Glass, James Zou, Chelsea Finn, Jimeng Sun

However, newly approved drugs do not have much historical prescription data and cannot leverage existing drug recommendation methods.

Few-Shot Learning Multi-Label Classification

SEAL: Interactive Tool for Systematic Error Analysis and Labeling

no code implementations 11 Oct 2022 Nazneen Rajani, Weixin Liang, Lingjiao Chen, Meg Mitchell, James Zou

With the advent of Transformers, large language models (LLMs) have saturated well-known NLP benchmarks and leaderboards with high aggregate performance.

When and why vision-language models behave like bags-of-words, and what to do about it?

no code implementations 4 Oct 2022 Mert Yuksekgonul, Federico Bianchi, Pratyusha Kalluri, Dan Jurafsky, James Zou

ARO consists of Visual Genome Attribution, to test the understanding of objects' properties; Visual Genome Relation, to test for relational understanding; and COCO & Flickr30k-Order, to test for order sensitivity.

Contrastive Learning Retrieval

Data Budgeting for Machine Learning

no code implementations 3 Oct 2022 Xinyi Zhao, Weixin Liang, James Zou

Data is the fuel powering AI and creates tremendous value for many domains.

Ensembling improves stability and power of feature selection for deep learning models

no code implementations 2 Oct 2022 Prashnna K Gyawali, Xiaoxia Liu, James Zou, Zihuai He

Despite extensive recent efforts to define different feature importance metrics for deep learning models, we identified that inherent stochasticity in the design and training of deep learning models makes commonly used feature importance scores unstable.

Feature Importance

WeightedSHAP: analyzing and improving Shapley based feature attributions

1 code implementation 27 Sep 2022 Yongchan Kwon, James Zou

On several real-world datasets, we demonstrate that the influential features identified by WeightedSHAP are better able to recapitulate the model's predictions compared to the features identified by the Shapley value.

Estimating and Explaining Model Performance When Both Covariates and Labels Shift

no code implementations 18 Sep 2022 Lingjiao Chen, Matei Zaharia, James Zou

We further propose SEES, an algorithmic framework to characterize the distribution shift under SJS and to estimate a model's performance on new data without any labels.

HAPI: A Large-scale Longitudinal Dataset of Commercial ML API Predictions

1 code implementation 18 Sep 2022 Lingjiao Chen, Zhihua Jin, Sabri Eyuboglu, Christopher Ré, Matei Zaharia, James Zou

HAPI is the first large-scale dataset of ML API usages and is a unique resource for studying ML-as-a-service (MLaaS).

Object Detection +4

Development and Clinical Evaluation of an AI Support Tool for Improving Telemedicine Photo Quality

1 code implementation 12 Sep 2022 Kailas Vodrahalli, Justin Ko, Albert S. Chiou, Roberto Novoa, Abubakar Abid, Michelle Phung, Kiana Yekrang, Paige Petrone, James Zou, Roxana Daneshjou

To address this issue, we developed TrueImage 2.0, an artificial intelligence (AI) model for assessing patient photo quality for telemedicine and providing real-time feedback to patients for photo quality improvement.

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

1 code implementation9 Jun 2022 Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. 
Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Jones, Joshua B. Tenenbaum, Joshua S. 
Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramón Risco Delgado, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. 
Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima, Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Timothy Telleen-Lawton, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, ZiRui Wang, Ziyi Wu

BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models.

Common Sense Reasoning Memorization

FIFA: Making Fairness More Generalizable in Classifiers Trained on Imbalanced Data

no code implementations 6 Jun 2022 Zhun Deng, Jiayao Zhang, Linjun Zhang, Ting Ye, Yates Coley, Weijie J. Su, James Zou

Specifically, FIFA encourages both classification and fairness generalization and can be flexibly combined with many existing fair learning methods with logits-based losses.

Classification Fairness

Post-hoc Concept Bottleneck Models

no code implementations 31 May 2022 Mert Yuksekgonul, Maggie Wang, James Zou

Through a model-editing user study, we show that editing PCBMs via concept-level feedback can provide significant performance gains without using any data from the target domain or model retraining.

A Unified f-divergence Framework Generalizing VAE and GAN

no code implementations 11 May 2022 Jaime Roquero Gimenez, James Zou

Developing deep generative models that flexibly incorporate diverse measures of probability distance is an important area of research.

Improving genetic risk prediction across diverse population by disentangling ancestry representations

no code implementations 10 May 2022 Prashnna K Gyawali, Yann Le Guen, Xiaoxia Liu, Hua Tang, James Zou, Zihuai He

This can lead to biases in the risk predictors resulting in poor generalization when applied to minority populations and admixed individuals such as African Americans.

Genetic Risk Prediction

Domino: Discovering Systematic Errors with Cross-Modal Embeddings

2 code implementations ICLR 2022 Sabri Eyuboglu, Maya Varma, Khaled Saab, Jean-Benoit Delbrouck, Christopher Lee-Messer, Jared Dunnmon, James Zou, Christopher Ré

In this work, we address these challenges by first designing a principled evaluation framework that enables a quantitative comparison of SDMs across 1,235 slice discovery settings in three input domains (natural images, medical images, and time-series data).

Representation Learning Time Series

Disparities in Dermatology AI Performance on a Diverse, Curated Clinical Image Set

no code implementations 15 Mar 2022 Roxana Daneshjou, Kailas Vodrahalli, Roberto A Novoa, Melissa Jenkins, Weixin Liang, Veronica Rotemberg, Justin Ko, Susan M Swetter, Elizabeth E Bailey, Olivier Gevaert, Pritam Mukherjee, Michelle Phung, Kiana Yekrang, Bradley Fong, Rachna Sahasrabudhe, Johan A. C. Allerup, Utako Okata-Karigane, James Zou, Albert Chiou

To ascertain potential biases in algorithm performance in this context, we curated the Diverse Dermatology Images (DDI) dataset, the first publicly available, expertly curated, and pathologically confirmed image dataset with diverse skin tones.

Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning

2 code implementations 3 Mar 2022 Weixin Liang, Yuhui Zhang, Yongchan Kwon, Serena Yeung, James Zou

Our systematic analysis demonstrates that this gap is caused by a combination of model initialization and contrastive learning optimization.

Contrastive Learning Fairness +2

MetaShift: A Dataset of Datasets for Evaluating Contextual Distribution Shifts and Training Conflicts

1 code implementation ICLR 2022 Weixin Liang, James Zou

We present MetaShift -- a collection of 12,868 sets of natural images across 410 classes -- to address this challenge.

Uncalibrated Models Can Improve Human-AI Collaboration

1 code implementation 12 Feb 2022 Kailas Vodrahalli, Tobias Gerstenberg, James Zou

In this paper, we present an initial exploration that suggests showing AI models as more confident than they actually are, even when the original AI is well-calibrated, can improve human-AI performance (measured as the accuracy and confidence of the human's final prediction after seeing the AI advice).

Decision Making

Competition over data: how does data purchase affect users?

no code implementations 26 Jan 2022 Yongchan Kwon, Antonio Ginart, James Zou

We introduce a new environment that allows ML predictors to use active learning algorithms to purchase labeled data within their budgets while competing against each other to attract users.

Active Learning

Submix: Practical Private Prediction for Large-Scale Language Models

no code implementations 4 Jan 2022 Antonio Ginart, Laurens van der Maaten, James Zou, Chuan Guo

Recent data-extraction attacks have exposed that language models can memorize some training samples verbatim.

Language Modelling

Improving Out-of-Distribution Robustness via Selective Augmentation

2 code implementations 2 Jan 2022 Huaxiu Yao, Yu Wang, Sai Li, Linjun Zhang, Weixin Liang, James Zou, Chelsea Finn

Machine learning algorithms typically assume that training and test examples are drawn from the same distribution.

How to Learn when Data Gradually Reacts to Your Model

no code implementations 13 Dec 2021 Zachary Izzo, James Zou, Lexing Ying

A recent line of work has focused on training machine learning (ML) models in the performative setting, i.e., when the data distribution reacts to the deployed model.

Explaining medical AI performance disparities across sites with confounder Shapley value analysis

no code implementations 12 Nov 2021 Eric Wu, Kevin Wu, James Zou

Medical AI algorithms can often experience degraded performance when evaluated on previously unseen sites.

Beyond Importance Scores: Interpreting Tabular ML by Visualizing Feature Semantics

no code implementations 10 Nov 2021 Amirata Ghorbani, Dina Berenbaum, Maor Ivgi, Yuval Dafna, James Zou

We address this limitation by introducing Feature Vectors, a new global interpretability method designed for tabular datasets.

Feature Importance

Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning

1 code implementation 26 Oct 2021 Yongchan Kwon, James Zou

Data Shapley has recently been proposed as a principled framework to quantify the contribution of individual datum in machine learning.

BIG-bench Machine Learning

CloudPred: Predicting Patient Phenotypes From Single-cell RNA-seq

no code implementations 13 Oct 2021 Bryan He, Matthew Thomson, Meena Subramaniam, Richard Perez, Chun Jimmie Ye, James Zou

Predicting phenotype from scRNA-seq is challenging for standard machine learning methods -- the number of cells measured can vary by orders of magnitude across individuals and the cell populations are also highly heterogeneous.

Interpretable Machine Learning

Clustering Plotted Data by Image Segmentation

1 code implementation CVPR 2022 Tarek Naous, Srinjay Sarkar, Abubakar Abid, James Zou

We describe the method and compare it to ten other clustering methods on synthetic data to illustrate its advantages and disadvantages.

Image Segmentation Instance Segmentation +1

The Power of Contrast for Feature Learning: A Theoretical Analysis

no code implementations 6 Oct 2021 Wenlong Ji, Zhun Deng, Ryumei Nakada, James Zou, Linjun Zhang

In this paper, (i) we provably show that contrastive learning outperforms autoencoder, a classical unsupervised learning method, for both feature recovery and downstream tasks; (ii) we also illustrate the role of labeled data in supervised contrastive learning.

Contrastive Learning Self-Supervised Learning +1

Language Models as Recommender Systems: Evaluations and Limitations

no code implementations NeurIPS Workshop ICBINB 2021 Yuhui Zhang, Hao Ding, Zeren Shui, Yifei Ma, James Zou, Anoop Deoras, Hao Wang

Pre-trained language models (PLMs) such as BERT and GPT learn general text representations and encode extensive world knowledge; thus, they can be efficiently and accurately adapted to various downstream tasks.

Movie Recommendation Session-Based Recommendations

Did the Model Change? Efficiently Assessing Machine Learning API Shifts

no code implementations 29 Jul 2021 Lingjiao Chen, Tracy Cai, Matei Zaharia, James Zou

This motivated us to formulate the API shift assessment problem at a more fine-grained level as estimating how the API model's confusion matrix changes over time when the data distribution is constant.

BIG-bench Machine Learning

Do Humans Trust Advice More if it Comes from AI? An Analysis of Human-AI Interactions

1 code implementation 14 Jul 2021 Kailas Vodrahalli, Roxana Daneshjou, Tobias Gerstenberg, James Zou

In decision support applications of AI, the AI algorithm's output is framed as a suggestion to a human user.

Meaningfully Debugging Model Mistakes using Conceptual Counterfactual Explanations

1 code implementation 24 Jun 2021 Abubakar Abid, Mert Yuksekgonul, James Zou

Understanding and explaining the mistakes made by trained models is critical to many machine learning objectives, such as improving robustness, addressing concept drift, and mitigating biases.

Group-Structured Adversarial Training

no code implementations 18 Jun 2021 Farzan Farnia, Amirali Aghazadeh, James Zou, David Tse

Robust training methods against perturbations to the input data have received great attention in the machine learning literature.

Adversarial Training Helps Transfer Learning via Better Representations

no code implementations NeurIPS 2021 Zhun Deng, Linjun Zhang, Kailas Vodrahalli, Kenji Kawaguchi, James Zou

Recent works empirically demonstrate that adversarial training in the source data can improve the ability of models to transfer to new domains.

Transfer Learning

MLDemon: Deployment Monitoring for Machine Learning Systems

no code implementations 28 Apr 2021 Antonio Ginart, Martin Zhang, James Zou

Post-deployment monitoring of ML systems is critical for ensuring reliability, especially as new user inputs can differ from the training distribution.

BIG-bench Machine Learning

Data Shapley Valuation for Efficient Batch Active Learning

no code implementations 16 Apr 2021 Amirata Ghorbani, James Zou, Andre Esteva

In this work, we introduce Active Data Shapley (ADS) -- a filtering layer for batch active learning that significantly increases the efficiency of active learning by pre-selecting, using a linear time computation, the highest-value points from an unlabeled dataset.

Active Learning

Efficient Online ML API Selection for Multi-Label Classification Tasks

no code implementations 18 Feb 2021 Lingjiao Chen, Matei Zaharia, James Zou

In this work, we propose FrugalMCT, a principled framework that adaptively selects the APIs to use for different data in an online fashion while respecting user's budget.

General Classification Multi-Label Classification +6

How to Learn when Data Reacts to Your Model: Performative Gradient Descent

1 code implementation 15 Feb 2021 Zachary Izzo, Lexing Ying, James Zou

Performative distribution shift captures the setting where the choice of which ML model is deployed changes the data distribution.

When and How Mixup Improves Calibration

no code implementations 11 Feb 2021 Linjun Zhang, Zhun Deng, Kenji Kawaguchi, James Zou

In addition, we study how Mixup improves calibration in semi-supervised learning.

Data Augmentation

Persistent Anti-Muslim Bias in Large Language Models

1 code implementation 14 Jan 2021 Abubakar Abid, Maheen Farooqi, James Zou

It has been observed that large-scale language models capture undesirable societal biases, e.g., relating to race and gender; yet religious bias has been relatively unexplored.

Adversarial Text Language Modelling +1

Neural Group Testing to Accelerate Deep Learning

1 code implementation 21 Nov 2020 Weixin Liang, James Zou

A key challenge of neural group testing is to modify a deep neural network so that it could test multiple samples in one forward pass.

Data Valuation for Medical Imaging Using Shapley Value: Application on A Large-scale Chest X-ray Dataset

no code implementations 15 Oct 2020 Siyi Tang, Amirata Ghorbani, Rikiya Yamashita, Sameer Rehman, Jared A. Dunnmon, James Zou, Daniel L. Rubin

In this study, we used data Shapley, a data valuation metric, to quantify the value of training data to the performance of a pneumonia detection algorithm in a large chest X-ray dataset.

Pneumonia Detection

How Does Mixup Help With Robustness and Generalization?

no code implementations ICLR 2021 Linjun Zhang, Zhun Deng, Kenji Kawaguchi, Amirata Ghorbani, James Zou

For robustness, we show that minimizing the Mixup loss corresponds to approximately minimizing an upper bound of the adversarial loss.

Data Augmentation

ALICE: Active Learning with Contrastive Natural Language Explanations

no code implementations EMNLP 2020 Weixin Liang, James Zou, Zhou Yu

We propose Active Learning with Contrastive Explanations (ALICE), an expert-in-the-loop training framework that utilizes contrastive natural language explanations to improve data efficiency in learning.

Active Learning Classification +1

Competing AI: How does competition feedback affect machine learning?

no code implementations 15 Sep 2020 Antonio Ginart, Eva Zhang, Yongchan Kwon, James Zou

A service that is more often queried by users, perhaps because it more accurately anticipates user preferences, is also more likely to obtain additional user data (e.g., in the form of a Yelp review).

BIG-bench Machine Learning

Improving Generalization in Meta-learning via Task Augmentation

1 code implementation 26 Jul 2020 Huaxiu Yao, Long-Kai Huang, Linjun Zhang, Ying Wei, Li Tian, James Zou, Junzhou Huang, Zhenhui Li

Moreover, both MetaMix and Channel Shuffle outperform state-of-the-art results by a large margin across many datasets and are compatible with existing meta-learning algorithms.

Meta-Learning

Efficient computation and analysis of distributional Shapley values

no code implementations 2 Jul 2020 Yongchan Kwon, Manuel A. Rivas, James Zou

Distributional data Shapley value (DShapley) has recently been proposed as a principled framework to quantify the contribution of individual datum in machine learning.

Density Estimation

Improving Adversarial Robustness via Unlabeled Out-of-Domain Data

no code implementations 15 Jun 2020 Zhun Deng, Linjun Zhang, Amirata Ghorbani, James Zou

In this work, we investigate how adversarial robustness can be enhanced by leveraging out-of-domain unlabeled data.

Adversarial Robustness Data Augmentation +2

Improving Training on Noisy Structured Labels

no code implementations 8 Mar 2020 Abubakar Abid, James Zou

Systematic experiments on image segmentation and text tagging demonstrate the strong performance of ECN in improving training on noisy structured labels.

Image Segmentation Semantic Segmentation

A Distributional Framework for Data Valuation

no code implementations ICML 2020 Amirata Ghorbani, Michael P. Kim, James Zou

Shapley value is a classic notion from game theory, historically used to quantify the contributions of individuals within groups, and more recently applied to assign values to data points when training machine learning models.

Approximate Data Deletion from Machine Learning Models

no code implementations 24 Feb 2020 Zachary Izzo, Mary Anne Smart, Kamalika Chaudhuri, James Zou

Deleting data from a trained machine learning (ML) model is a critical task in many applications.

BIG-bench Machine Learning

Neuron Shapley: Discovering the Responsible Neurons

1 code implementation NeurIPS 2020 Amirata Ghorbani, James Zou

We develop Neuron Shapley as a new framework to quantify the contribution of individual neurons to the prediction and performance of a deep network.

Who's responsible? Jointly quantifying the contribution of the learning algorithm and training data

no code implementations 9 Oct 2019 Gal Yona, Amirata Ghorbani, James Zou

We propose Extended Shapley as a principled framework for this problem, and experiment empirically with how it can be used to address questions of ML accountability.

Learning transport cost from subset correspondence

no code implementations ICLR 2020 Ruishan Liu, Akshay Balsubramani, James Zou

Optimal transport (OT) is a principled approach to align datasets, but a key challenge in applying OT is that we need to specify a transport cost function that accurately captures how the two datasets are related.

Mixed Dimension Embeddings with Application to Memory-Efficient Recommendation Systems

6 code implementations 25 Sep 2019 Antonio Ginart, Maxim Naumov, Dheevatsa Mudigere, Jiyan Yang, James Zou

Embedding representations power machine intelligence in many applications, including recommendation systems, but they are space intensive -- potentially occupying hundreds of gigabytes in large-scale settings.

Click-Through Rate Prediction Collaborative Filtering +1

LitGen: Genetic Literature Recommendation Guided by Human Explanations

1 code implementation 24 Sep 2019 Allen Nie, Arturo L. Pineda, Matt W. Wright, Hannah Wand, Bryan Wulf, Helio A. Costa, Ronak Y. Patel, Carlos D. Bustamante, James Zou

In collaboration with the Clinical Genomic Resource (ClinGen), the flagship NIH program for clinical curation, we propose the first machine learning system, LitGen, that can retrieve papers for a particular variant and filter them by the specific evidence types curators use to assess pathogenicity.

Making AI Forget You: Data Deletion in Machine Learning

3 code implementations NeurIPS 2019 Antonio Ginart, Melody Y. Guan, Gregory Valiant, James Zou

Intense recent discussions have focused on how to provide individuals with control over when their data can and cannot be used --- the EU's Right To Be Forgotten regulation is an example of this effort.

BIG-bench Machine Learning

Gradio: Hassle-Free Sharing and Testing of ML Models in the Wild

1 code implementation 6 Jun 2019 Abubakar Abid, Ali Abdalla, Ali Abid, Dawood Khan, Abdulrahman Alfozan, James Zou

Their feedback identified that Gradio should support a variety of interfaces and frameworks, allow for easy sharing of the interface, allow for input manipulation and interactive inference by the domain expert, as well as allow embedding the interface in iPython notebooks.

BIG-bench Machine Learning

Discovering Conditionally Salient Features with Statistical Guarantees

no code implementations 29 May 2019 Jaime Roquero Gimenez, James Zou

Most of the work in this domain has focused on identifying globally relevant features, which are features that are related to the outcome using evidence across the entire dataset.

Association

A Knowledge Graph-based Approach for Exploring the U.S. Opioid Epidemic

no code implementations 27 May 2019 Maulik R. Kamdar, Tymor Hamamsy, Shea Shelton, Ayin Vala, Tome Eftimov, James Zou, Suzanne Tamang

Statistical learning methods that use data from multiple clinical centers across the US to detect opioid over-prescribing trends and predict possible opioid misuse are required.

Data Shapley: Equitable Valuation of Data for Machine Learning

4 code implementations 5 Apr 2019 Amirata Ghorbani, James Zou

As data becomes the fuel driving technological and economic growth, a fundamental challenge is how to quantify the value of data in algorithmic predictions and decisions.

BIG-bench Machine Learning

Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings

1 code implementation NAACL 2019 Dorottya Demszky, Nikhil Garg, Rob Voigt, James Zou, Matthew Gentzkow, Jesse Shapiro, Dan Jurafsky

We provide an NLP framework to uncover four linguistic dimensions of political polarization in social media: topic choice, framing, affect and illocutionary force.

Contrastive Variational Autoencoder Enhances Salient Features

1 code implementation 12 Feb 2019 Abubakar Abid, James Zou

The cVAE explicitly models latent features that are shared between the datasets, as well as those that are enriched in one dataset relative to the other, which allows the algorithm to isolate and enhance the salient latent features.

Contrastive Learning

Towards Automatic Concept-based Explanations

2 code implementations NeurIPS 2019 Amirata Ghorbani, James Wexler, James Zou, Been Kim

Interpretability has become an important topic of research as more machine learning (ML) models are deployed and widely used to make important decisions.

Feature Importance

Concrete Autoencoders for Differentiable Feature Selection and Reconstruction

2 code implementations 27 Jan 2019 Abubakar Abid, Muhammad Fatih Balin, James Zou

We introduce the concrete autoencoder, an end-to-end differentiable method for global feature selection, which efficiently identifies a subset of the most informative features and simultaneously learns a neural network to reconstruct the input data from the selected features.

General Classification Selection

Large-scale Generative Modeling to Improve Automated Veterinary Disease Coding

no code implementations 29 Nov 2018 Yuhui Zhang, Allen Nie, James Zou

We compare the performance of our model with several baselines in a challenging cross-hospital setting with substantial domain shift.

Minimizing Close-k Aggregate Loss Improves Classification

1 code implementation 1 Nov 2018 Bryan He, James Zou

In classification, the de facto method for aggregating individual losses is the average loss.

Classification General Classification

Contrastive Multivariate Singular Spectrum Analysis

no code implementations 31 Oct 2018 Abdi-Hakin Dirie, Abubakar Abid, James Zou

We introduce Contrastive Multivariate Singular Spectrum Analysis, a novel unsupervised method for dimensionality reduction and signal decomposition of time series data.

Dimensionality Reduction Time Series

Improving the Stability of the Knockoff Procedure: Multiple Simultaneous Knockoffs and Entropy Maximization

no code implementations 26 Oct 2018 Jaime Roquero Gimenez, James Zou

The Model-X knockoff procedure has recently emerged as a powerful approach for feature selection with statistical guarantees.

Autowarp: Learning a Warping Distance from Unlabeled Time Series Using Sequence Autoencoders

no code implementations NeurIPS 2018 Abubakar Abid, James Zou

We define a flexible and differentiable family of warping metrics, which encompasses common metrics such as DTW, Euclidean, and edit distance.

Astronomy Dynamic Time Warping +1

Knockoffs for the mass: new feature importance statistics with false discovery guarantees

no code implementations 17 Jul 2018 Jaime Roquero Gimenez, Amirata Ghorbani, James Zou

This is often impossible to do from purely observational data, and a natural relaxation is to identify features that are correlated with the outcome even conditioned on all other observed features.

Feature Importance

DeepTag: inferring all-cause diagnoses from clinical notes in under-resourced medical domain

1 code implementation 28 Jun 2018 Allen Nie, Ashley Zehnder, Rodney L. Page, Arturo L. Pineda, Manuel A. Rivas, Carlos D. Bustamante, James Zou

However, clinicians lack the time and resources to annotate patient records with standard medical diagnostic codes, and most veterinary visits are captured in free-text notes.

Multiaccuracy: Black-Box Post-Processing for Fairness in Classification

1 code implementation 31 May 2018 Michael P. Kim, Amirata Ghorbani, James Zou

Prediction systems are successfully deployed in applications ranging from disease diagnosis, to predicting credit worthiness, to image recognition.

Classification Fairness +2

Feedback GAN (FBGAN) for DNA: a Novel Feedback-Loop Architecture for Optimizing Protein Functions

no code implementations 5 Apr 2018 Anvita Gupta, James Zou

We propose a novel feedback-loop architecture, called Feedback GAN (FBGAN), to optimize the synthetic gene sequences for desired properties using an external function analyzer.

Stochastic EM for Shuffled Linear Regression

no code implementations 2 Apr 2018 Abubakar Abid, James Zou

We consider the problem of inference in a linear regression model in which the relative ordering of the input features and output labels is not known.

regression

CoVeR: Learning Covariate-Specific Vector Representations with Tensor Decompositions

1 code implementation ICML 2018 Kevin Tian, Teng Zhang, James Zou

However, in addition to the text data itself, we often have additional covariates associated with individual corpus documents---e.g., the demographic of the author, time and venue of publication---and we would like the embedding to naturally capture this information.

Natural Questions Tensor Decomposition

INTERPRETATION OF NEURAL NETWORK IS FRAGILE

no code implementations ICLR 2018 Amirata Ghorbani, Abubakar Abid, James Zou

In this paper, we show that interpretation of deep learning predictions is extremely fragile in the following sense: two perceptively indistinguishable inputs with the same predicted label can be assigned very different interpretations.

BIG-bench Machine Learning Feature Importance

Learning Covariate-Specific Embeddings with Tensor Decompositions

no code implementations ICLR 2018 Kevin Tian, Teng Zhang, James Zou

In addition to the text data itself, we often have additional covariates associated with individual documents in the corpus---e.g., the demographic of the author, time and venue of publication, etc.---and we would like the embedding to naturally capture the information of the covariates.

Natural Questions Tensor Decomposition +1

From Information Bottleneck To Activation Norm Penalty

no code implementations ICLR 2018 Allen Nie, Mihir Mongia, James Zou

Recently, a regularization method has been proposed to optimize the variational lower bound of the Information Bottleneck Lagrangian.

General Classification Image Classification +1

Word Embeddings Quantify 100 Years of Gender and Ethnic Stereotypes

1 code implementation 22 Nov 2017 Nikhil Garg, Londa Schiebinger, Dan Jurafsky, James Zou

Word embeddings use vectors to represent words such that the geometry between vectors captures semantic relationship between the words.

Word Embeddings

NeuralFDR: Learning Discovery Thresholds from Hypothesis Features

1 code implementation NeurIPS 2017 Fei Xia, Martin J. Zhang, James Zou, David Tse

For example, in genetic association studies, each hypothesis tests the correlation between a variant and the trait.

Association

Interpretation of Neural Networks is Fragile

2 code implementations 29 Oct 2017 Amirata Ghorbani, Abubakar Abid, James Zou

In this paper, we show that interpretation of deep learning predictions is extremely fragile in the following sense: two perceptively indistinguishable inputs with the same predicted label can be assigned very different interpretations.

BIG-bench Machine Learning Feature Importance

The Effects of Memory Replay in Reinforcement Learning

1 code implementation 18 Oct 2017 Ruishan Liu, James Zou

We show that even in this very simple setting, the amount of memory kept can substantially affect the agent's performance.

Q-Learning reinforcement-learning +1

Contrastive Principal Component Analysis

1 code implementation 20 Sep 2017 Abubakar Abid, Martin J. Zhang, Vivek K. Bagaria, James Zou

We present a new technique called contrastive principal component analysis (cPCA) that is designed to discover low-dimensional structure that is unique to a dataset, or enriched in one dataset relative to other data.

Denoising

Why Adaptively Collected Data Have Negative Bias and How to Correct for It

no code implementations 7 Aug 2017 Xinkun Nie, Xiaoying Tian, Jonathan Taylor, James Zou

In this paper, we prove that when the data collection procedure satisfies natural conditions, then sample means of the data have systematic negative biases.

Learning Latent Space Models with Angular Constraints

no code implementations ICML 2017 Pengtao Xie, Yuntian Deng, Yi Zhou, Abhimanu Kumar, Yao-Liang Yu, James Zou, Eric P. Xing

The large model capacity of latent space models (LSMs) enables them to achieve great performance on various applications, but also makes them prone to overfitting.

Estimating the unseen from multiple populations

2 code implementations ICML 2017 Aditi Raghunathan, Greg Valiant, James Zou

We generalize this extrapolation and related unseen estimation problems to the multiple population setting, where population $j$ has an unknown distribution $D_j$ from which we observe $n_j$ samples.

Beyond Bilingual: Multi-sense Word Embeddings using Multilingual Context

no code implementations WS 2017 Shyam Upadhyay, Kai-Wei Chang, Matt Taddy, Adam Kalai, James Zou

We present a multi-view Bayesian non-parametric algorithm which improves multi-sense word embeddings by (a) using multilingual (i.e., more than two languages) corpora to significantly improve sense embeddings beyond what one achieves with bilingual information, and (b) using a principled approach to learn a variable number of senses per word, in a data-driven manner.

Word Embeddings

Linear Regression with Shuffled Labels

no code implementations 3 May 2017 Abubakar Abid, Ada Poon, James Zou

We study the regimes in which each estimator excels, and generalize the estimators to the setting where partial ordering information is available in the form of experiments replicated independently.

regression

Quantifying and Reducing Stereotypes in Word Embeddings

no code implementations 20 Jun 2016 Tolga Bolukbasi, Kai-Wei Chang, James Zou, Venkatesh Saligrama, Adam Kalai

Machine learning algorithms are optimized to model statistical properties of the training data.

Word Embeddings

Clustering with a Reject Option: Interactive Clustering as Bayesian Prior Elicitation

no code implementations 19 Jun 2016 Akash Srivastava, James Zou, Ryan P. Adams, Charles Sutton

A good clustering can help a data analyst to explore and understand a data set, but what constitutes a good clustering may depend on domain-specific and application-specific criteria.

Quantifying the accuracy of approximate diffusions and Markov chains

no code implementations 20 May 2016 Jonathan H. Huggins, James Zou

As an illustration, we apply our framework to derive finite-sample error bounds of approximate unadjusted Langevin dynamics.

Clustering with a Reject Option: Interactive Clustering as Bayesian Prior Elicitation

no code implementations 22 Feb 2016 Akash Srivastava, James Zou, Charles Sutton

A good clustering can help a data analyst to explore and understand a data set, but what constitutes a good clustering may depend on domain-specific and application-specific criteria.

How much does your data exploration overfit? Controlling bias via information usage

no code implementations 16 Nov 2015 Daniel Russo, James Zou

But while any data-exploration renders standard statistical theory invalid, experience suggests that different types of exploratory analysis can lead to disparate levels of bias, and the degree of bias also depends on the particulars of the data set.

Rich Component Analysis

no code implementations 14 Jul 2015 Rong Ge, James Zou

In this paper, we develop the general framework of Rich Component Analysis (RCA) to model settings where the observations from different views are driven by different sets of latent components, and each component can be a complex, high-dimensional distribution.

Intersecting Faces: Non-negative Matrix Factorization With New Guarantees

no code implementations 8 Jul 2015 Rong Ge, James Zou

A plethora of algorithms have been developed to tackle NMF, but due to the non-convex nature of the problem, there is little guarantee on how well these methods work.
