no code implementations • 21 Mar 2024 • Jonathan Lebensold, Maziar Sanjabi, Pietro Astolfi, Adriana Romero-Soriano, Kamalika Chaudhuri, Mike Rabbat, Chuan Guo
Text-to-image diffusion models have been shown to suffer from sample-level memorization, possibly reproducing near-perfect replicas of images that they were trained on, which may be undesirable.
no code implementations • 7 Mar 2024 • Shengyuan Hu, Saeed Mahloujifar, Virginia Smith, Kamalika Chaudhuri, Chuan Guo
Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset.
no code implementations • 4 Mar 2024 • Tom Sander, Yaodong Yu, Maziar Sanjabi, Alain Durmus, Yi Ma, Kamalika Chaudhuri, Chuan Guo
In this work, we show that effective DP representation learning can be done via image captioning and scaling up to internet-scale multimodal datasets.
no code implementations • 19 Feb 2024 • Chhavi Yadav, Amrita Roy Chowdhury, Dan Boneh, Kamalika Chaudhuri
To this end, we propose FairProof - a system that uses Zero-Knowledge Proofs (a cryptographic primitive) to publicly verify the fairness of a model, while maintaining confidentiality.
no code implementations • 3 Feb 2024 • Bargav Jayaraman, Chuan Guo, Kamalika Chaudhuri
Vision-Language Models (VLMs) have emerged as the state-of-the-art representation learning solution, with a myriad of downstream applications such as image classification, retrieval, and generation.
1 code implementation • 9 Jan 2024 • Amro Abbas, Evgenia Rusak, Kushal Tirumala, Wieland Brendel, Kamalika Chaudhuri, Ari S. Morcos
Using a simple and intuitive complexity measure, we are able to reduce the training cost to a quarter of regular training.
no code implementations • 10 Oct 2023 • Tatsuki Koga, Kamalika Chaudhuri, David Page
In this work, we take a fresh look at federated learning with a focus on causal inference. Specifically, we look at estimating the average treatment effect (ATE), an important task in causal inference for healthcare applications, and provide a federated analytics approach that enables ATE estimation across multiple sites along with differential privacy (DP) guarantees at each site.
1 code implementation • 2 Oct 2023 • Kamalika Chaudhuri, David Lopez-Paz
To build robust, fair, and safe AI systems, we would like our classifiers to say "I don't know" when facing test examples that are difficult or fall outside the training classes. The ubiquitous strategy for predicting under uncertainty is the simplistic reject-or-classify rule: abstain from prediction if epistemic uncertainty is high, and classify otherwise. Unfortunately, this recipe does not allow different sources of uncertainty to communicate with each other, produces miscalibrated predictions, and does not allow correcting for misspecifications in our uncertainty estimates.
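As an illustration, here is a minimal sketch of the reject-or-classify rule being critiqued, assuming a model that exposes per-class probabilities; the entropy-based uncertainty proxy and the threshold are illustrative choices, not the paper's proposal.

```python
import numpy as np

def reject_or_classify(probs: np.ndarray, threshold: float = 1.0):
    """Return the predicted class, or None to abstain ("I don't know").

    probs: shape (num_classes,) predicted class probabilities.
    threshold: abstain when predictive entropy (in nats) exceeds it.
    """
    entropy = -np.sum(probs * np.log(probs + 1e-12))  # crude uncertainty proxy
    if entropy > threshold:
        return None  # abstain
    return int(np.argmax(probs))

print(reject_or_classify(np.array([0.96, 0.02, 0.02])))  # confident -> 0
print(reject_or_classify(np.array([0.34, 0.33, 0.33])))  # uncertain -> None
```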
no code implementations • 4 Aug 2023 • Ruihan Wu, Chuan Guo, Kamalika Chaudhuri
In this work, we look at how to use generic large-scale public data to improve the quality of differentially private image generation in Generative Adversarial Networks (GANs), and provide an improved method that uses public data effectively.
1 code implementation • 15 Jun 2023 • Yaodong Yu, Maziar Sanjabi, Yi Ma, Kamalika Chaudhuri, Chuan Guo
In this work, we propose, as a mitigation measure, a recipe for training foundation vision models with a differential privacy (DP) guarantee.
no code implementations • 18 May 2023 • Zhifeng Kong, Kamalika Chaudhuri
Deep generative models are known to produce undesirable samples such as harmful content.
1 code implementation • NeurIPS 2023 • Casey Meehan, Florian Bordes, Pascal Vincent, Kamalika Chaudhuri, Chuan Guo
Self-supervised learning (SSL) algorithms can produce useful image representations by learning to associate different parts of natural images with one another.
no code implementations • 7 Mar 2023 • Zhifeng Kong, Amrita Roy Chowdhury, Kamalika Chaudhuri
Given a machine learning model, a data point and some auxiliary information, the goal of an MI attack is to determine whether the data point was used to train the model.
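For concreteness, here is a minimal loss-threshold membership-inference baseline in the spirit of Yeom et al. (2018); the attack studied in the paper may differ, and `model_loss` and `tau` are hypothetical stand-ins.

```python
def mi_attack(model_loss: float, tau: float) -> bool:
    """Guess 'member' when the model's loss on the data point is below tau."""
    return model_loss < tau

# tau is often calibrated to the model's average training loss (an assumption here).
tau = 0.15
print(mi_attack(model_loss=0.02, tau=tau))  # low loss -> guess member
print(mi_attack(model_loss=1.30, tau=tau))  # high loss -> guess non-member
```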
no code implementations • 25 Feb 2023 • Robi Bhattacharjee, Sanjoy Dasgupta, Kamalika Chaudhuri
There has been some recent interest in detecting and addressing memorization of training data by deep neural networks.
no code implementations • 19 Nov 2022 • Nick Rittler, Kamalika Chaudhuri
$k$-nearest neighbor classification is a popular non-parametric method because of desirable properties like automatic adaptation to distributional scale changes.
1 code implementation • 8 Nov 2022 • Chuan Guo, Kamalika Chaudhuri, Pierre Stock, Mike Rabbat
In private federated learning (FL), a server aggregates differentially private updates from a large number of clients in order to train a machine learning model.
no code implementations • 2 Oct 2022 • Robi Bhattacharjee, Max Hopkins, Akash Kumar, Hantao Yu, Kamalika Chaudhuri
Developing simple, sample-efficient learning algorithms for robust classification is a pressing issue in today's tech-dominated world, and current theoretical techniques, which require exponential sample complexity and complicated improper learning rules, fall far short of answering the need.
no code implementations • 29 Jun 2022 • Zhifeng Kong, Kamalika Chaudhuri
Large pre-trained generative models are known to occasionally output undesirable samples, which undermines their trustworthiness.
1 code implementation • 17 Jun 2022 • Zhi Wang, Chicheng Zhang, Kamalika Chaudhuri
We study the problem of online multi-task learning where the tasks are performed within similar but not necessarily identical multi-armed bandit environments.
no code implementations • 9 Jun 2022 • Chhavi Yadav, Michal Moshkovitz, Kamalika Chaudhuri
This work formalizes the role of explanations in auditing and investigates if and how model explanations can help audits.
no code implementations • 23 May 2022 • Kamalika Chaudhuri, Kartik Ahuja, Martin Arjovsky, David Lopez-Paz
When facing data with imbalanced classes or groups, practitioners follow an intriguing strategy to achieve the best results.
no code implementations • ACL 2022 • Casey Meehan, Khalil Mrini, Kamalika Chaudhuri
User language data can contain highly sensitive personal content.
1 code implementation • 15 Mar 2022 • Kamalika Chaudhuri, Chuan Guo, Mike Rabbat
Federated data analytics is a framework for distributed data analysis where a server compiles noisy responses from a group of distributed low-bandwidth user devices to estimate aggregate statistics.
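A minimal sketch of this pattern for one aggregate statistic (a mean), assuming each device perturbs its bounded scalar with Laplace noise before reporting; the privacy parameter and value range are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_response(x: float, eps: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """Laplace mechanism on one bounded value (sensitivity hi - lo)."""
    x = min(max(x, lo), hi)                       # clamp to the public range
    return x + rng.laplace(scale=(hi - lo) / eps)

true_values = rng.uniform(0, 1, size=10_000)      # one value per device
responses = [local_response(x, eps=1.0) for x in true_values]
print(np.mean(true_values), np.mean(responses))   # close when devices are many
```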
1 code implementation • 10 Feb 2022 • Yao-Yuan Yang, Chi-Ning Chou, Kamalika Chaudhuri
Neural networks are known to use spurious correlations such as background information for classification.
1 code implementation • 28 Jan 2022 • Chuan Guo, Brian Karrer, Kamalika Chaudhuri, Laurens van der Maaten
Differential privacy is widely accepted as the de facto method for preventing data leakage in ML, and conventional wisdom suggests that it offers strong protection against privacy attacks.
no code implementations • 13 Jan 2022 • Tatsuki Koga, Casey Meehan, Kamalika Chaudhuri
When this is the case, we observe that the influence of a single participant (sensitivity) can be reduced by subsampling and/or filtering in time, while still meeting privacy requirements.
no code implementations • 11 Dec 2021 • Evrard Garcelon, Kamalika Chaudhuri, Vianney Perchet, Matteo Pirotta
Contextual bandit algorithms are widely used in domains where it is desirable to provide a personalized service by leveraging contextual information, which may contain sensitive data that needs to be protected.
no code implementations • ICLR 2022 • Casey Meehan, Amrita Roy Chowdhury, Kamalika Chaudhuri, Somesh Jha
LDP deployments are vulnerable to inference attacks, as an adversary can link the noisy responses to their identity and, subsequently, to auxiliary information using the order of the data.
no code implementations • 14 Sep 2021 • Chhavi Yadav, Kamalika Chaudhuri
We attribute this to training set subsampling for IFs.
no code implementations • 11 Jun 2021 • Casey Meehan, Amrita Roy Chowdhury, Kamalika Chaudhuri, Somesh Jha
LDP deployments are vulnerable to inference attacks, as an adversary can link the noisy responses to their identity and, subsequently, to auxiliary information using the order of the data.
1 code implementation • NeurIPS 2021 • Zhifeng Kong, Kamalika Chaudhuri
Instance-based interpretation methods have been widely studied for supervised learning, as they help explain how black-box neural networks make predictions.
no code implementations • 21 May 2021 • Jacob Imola, Kamalika Chaudhuri
Balancing privacy and accuracy is a major challenge in designing differentially private machine learning algorithms.
no code implementations • ICML Workshop INNF 2021 • Zhifeng Kong, Kamalika Chaudhuri
Normalizing flows are a class of flexible deep generative models that offer easy likelihood computation.
1 code implementation • 23 Feb 2021 • Casey Meehan, Kamalika Chaudhuri
Providing meaningful privacy to users of location based services is particularly challenging when multiple locations are revealed in a short period of time.
no code implementations • NeurIPS 2021 • Robi Bhattacharjee, Kamalika Chaudhuri
Learning classifiers that are robust to adversarial examples has received a great deal of recent attention.
1 code implementation • 14 Feb 2021 • Michal Moshkovitz, Yao-Yuan Yang, Kamalika Chaudhuri
We then show that a tighter bound on the size is possible when the data is linearly separated.
no code implementations • 19 Dec 2020 • Robi Bhattacharjee, Somesh Jha, Kamalika Chaudhuri
This shows that for very well-separated data, convergence rates of $O(\frac{1}{n})$ are achievable, which is not the case otherwise.
1 code implementation • 17 Nov 2020 • Yao-Yuan Yang, Cyrus Rashtchian, Ruslan Salakhutdinov, Kamalika Chaudhuri
Overall, adversarially robust networks resemble a nearest neighbor classifier when it comes to OOD data.
no code implementations • 6 Nov 2020 • Chong Liu, Yuqing Zhu, Kamalika Chaudhuri, Yu-Xiang Wang
The Private Aggregation of Teacher Ensembles (PATE) framework is one of the most promising recent approaches in differentially private learning.
1 code implementation • 29 Oct 2020 • Zhi Wang, Chicheng Zhang, Manish Kumar Singh, Laurel D. Riek, Kamalika Chaudhuri
In many real-world applications, multiple agents seek to learn how to perform highly related yet slightly different tasks in an online bandit learning protocol.
no code implementations • 10 Aug 2020 • Rosario Cammarota, Matthias Schunter, Anand Rajan, Fabian Boemer, Ágnes Kiss, Amos Treiber, Christian Weinert, Thomas Schneider, Emmanuel Stapf, Ahmad-Reza Sadeghi, Daniel Demmler, Joshua Stock, Huili Chen, Siam Umar Hussain, Sadegh Riazi, Farinaz Koushanfar, Saransh Gupta, Tajan Simunic Rosing, Kamalika Chaudhuri, Hamid Nejatollahi, Nikil Dutt, Mohsen Imani, Kim Laine, Anuj Dubey, Aydin Aysu, Fateme Sadat Hosseini, Chengmo Yang, Eric Wallace, Pamela Norton
Additionally, such systems should also use Privacy-Enhancing Technologies (PETs) to protect customers' data at any time.
no code implementations • 31 May 2020 • Zhifeng Kong, Kamalika Chaudhuri
Normalizing flows have received a great deal of recent attention as they allow flexible generative modeling as well as easy likelihood computation.
no code implementations • 24 May 2020 • Antonious M. Girgis, Deepesh Data, Kamalika Chaudhuri, Christina Fragouli, Suhas Diggavi
This work examines a novel question: how much randomness is needed to achieve local differential privacy (LDP)?
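To make the question concrete, binary randomized response is the textbook $\epsilon$-LDP mechanism: a single biased coin flip per user is the entire randomness budget. Reporting the true bit with probability $e^{\epsilon}/(e^{\epsilon}+1)$ yields a likelihood ratio of exactly $e^{\epsilon}$ between any two inputs.

```python
import math
import random

def randomized_response(bit: int, eps: float) -> int:
    """Report the true bit w.p. e^eps / (e^eps + 1); otherwise flip it."""
    p_truth = math.exp(eps) / (math.exp(eps) + 1.0)
    return bit if random.random() < p_truth else 1 - bit

print(randomized_response(1, eps=1.0))  # noisy report of a sensitive bit
```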
1 code implementation • 12 Apr 2020 • Casey Meehan, Kamalika Chaudhuri, Sanjoy Dasgupta
Detecting overfitting in generative models is an important challenge in machine learning.
no code implementations • ICML 2020 • Robi Bhattacharjee, Kamalika Chaudhuri
A growing body of research has shown that many classifiers are susceptible to adversarial examples -- small strategic modifications to test inputs that lead to misclassification.
1 code implementation • NeurIPS 2020 • Yao-Yuan Yang, Cyrus Rashtchian, Hongyang Zhang, Ruslan Salakhutdinov, Kamalika Chaudhuri
Current methods for training robust networks lead to a drop in test accuracy, which has led prior works to posit that a robustness-accuracy tradeoff may be inevitable in deep learning.
no code implementations • 24 Feb 2020 • Zachary Izzo, Mary Anne Smart, Kamalika Chaudhuri, James Zou
Deleting data from a trained machine learning (ML) model is a critical task in many applications.
no code implementations • 9 Dec 2019 • Casey Meehan, Kamalika Chaudhuri
Providing meaningful privacy to users of location based services is particularly challenging when multiple locations are revealed in a short period of time.
no code implementations • NeurIPS 2019 • Kamalika Chaudhuri, Jacob Imola, Ashwin Machanavajjhala
Differential privacy, a notion of algorithmic stability, is a gold standard for measuring the additional risk an algorithm's output poses to the privacy of a single record in the dataset.
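For reference, the standard $(\epsilon, \delta)$-differential privacy guarantee requires, for every pair of datasets $D, D'$ differing in a single record and every set of outputs $S$, that $\Pr[M(D) \in S] \le e^{\epsilon} \Pr[M(D') \in S] + \delta$; pure $\epsilon$-DP is the special case $\delta = 0$.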
1 code implementation • 7 Jun 2019 • Yao-Yuan Yang, Cyrus Rashtchian, Yizhen Wang, Kamalika Chaudhuri
To test our defense, we provide a novel attack that applies to a wide range of non-parametric classifiers.
1 code implementation • NeurIPS 2019 • Songbai Yan, Kamalika Chaudhuri, Tara Javidi
We provably demonstrate that the result is an algorithm that is statistically consistent as well as more label-efficient than prior work.
no code implementations • 28 May 2019 • Yizhen Wang, Somesh Jha, Kamalika Chaudhuri
Data poisoning attacks -- where an adversary can modify a small fraction of training data with the goal of forcing the trained classifier to incur high loss -- are an important threat to machine learning in many applications.
no code implementations • 21 Jan 2019 • Joseph Geumlek, Kamalika Chaudhuri
Differential privacy has emerged as a gold standard in privacy-preserving data analysis.
no code implementations • 5 Nov 2018 • Varun Chandrasekaran, Kamalika Chaudhuri, Irene Giacomelli, Somesh Jha, Songbai Yan
This has resulted in a surge of Machine Learning-as-a-Service (MLaaS) - cloud services that provide (a) tools and resources to learn the model, and (b) a user-friendly query interface to access the model.
no code implementations • 12 Sep 2018 • Shuang Liu, Kamalika Chaudhuri
Generative adversarial networks are a novel method for statistical inference that have achieved much empirical success; however, the factors contributing to this success remain ill-understood.
no code implementations • 27 Aug 2018 • Yizhen Wang, Kamalika Chaudhuri
While there has been much prior work on data poisoning, most of it is in the offline setting, and attacks for online learning, where training data arrives in a streaming manner, are not well understood.
no code implementations • ICML 2018 • Songbai Yan, Kamalika Chaudhuri, Tara Javidi
We consider active learning with logged data, where labeled examples are drawn conditioned on a predetermined logging policy, and the goal is to learn a classifier on the entire population, not just conditioned on the logging policy.
no code implementations • 7 Feb 2018 • Chicheng Zhang, Eran A. Mukamel, Kamalika Chaudhuri
We consider learning parameters of Binomial Hidden Markov Models, which may be used to model DNA methylation data.
no code implementations • NeurIPS 2017 • Joseph Geumlek, Shuang Song, Kamalika Chaudhuri
With the newly proposed privacy definition of Rényi Differential Privacy (RDP) in (Mironov, 2017), we re-examine the inherent privacy of releasing a single sample from a posterior distribution.
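For context, (Mironov, 2017) defines a mechanism $M$ to be $(\alpha, \epsilon)$-RDP if, for all neighboring datasets $D$ and $D'$, the Rényi divergence of order $\alpha$ satisfies $D_\alpha\big(M(D) \,\|\, M(D')\big) \le \epsilon$; letting $\alpha \to \infty$ recovers pure $\epsilon$-DP.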
no code implementations • 2 Oct 2017 • Joseph Geumlek, Shuang Song, Kamalika Chaudhuri
Using a recently proposed privacy definition of Rényi Differential Privacy (RDP), we re-examine the inherent privacy of releasing a single sample from a posterior distribution.
no code implementations • ICML 2017 • Kamalika Chaudhuri, Prateek Jain, Nagarajan Natarajan
In this work, we consider a theoretical analysis of the label requirement of active learning for regression under a heteroscedastic noise model, where the noise depends on the instance.
no code implementations • 10 Jul 2017 • Shuang Song, Kamalika Chaudhuri
With the proliferation of mobile devices and the internet of things, developing principled solutions for privacy in time series applications has become increasingly important.
1 code implementation • ICML 2018 • Yizhen Wang, Somesh Jha, Kamalika Chaudhuri
Our analysis shows that its robustness properties depend critically on the value of $k$: the classifier may be inherently non-robust for small $k$, but its robustness approaches that of the Bayes optimal classifier for fast-growing $k$. We propose a novel modified 1-nearest-neighbor classifier, and guarantee its robustness in the large-sample limit.
no code implementations • NeurIPS 2017 • Shuang Liu, Olivier Bousquet, Kamalika Chaudhuri
In this paper, we address these questions in a broad and unified setting by defining a notion of adversarial divergences that includes a number of recently proposed objective functions.
1 code implementation • 1 Nov 2016 • Mijung Park, James Foulds, Kamalika Chaudhuri, Max Welling
Many applications of Bayesian data analysis involve sensitive information, motivating methods which ensure that privacy is protected.
no code implementations • NeurIPS 2016 • Songbai Yan, Kamalika Chaudhuri, Tara Javidi
We study active learning where the labeler can not only return incorrect labels but also abstain from labeling.
no code implementations • 14 Sep 2016 • Mijung Park, James Foulds, Kamalika Chaudhuri, Max Welling
We develop a privatised stochastic variational inference method for Latent Dirichlet Allocation (LDA).
1 code implementation • 15 Jun 2016 • Xi Wu, Fengan Li, Arun Kumar, Kamalika Chaudhuri, Somesh Jha, Jeffrey F. Naughton
This paper takes a first step to remedy this disconnect and proposes a private SGD algorithm to address \emph{both} issues in an integrated manner.
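For intuition, here is one common clip-and-noise recipe for private SGD (in the style of DP-SGD); it is a sketch only, the paper's own mechanism may differ, and the clipping norm C and noise scale sigma are assumptions.

```python
import numpy as np

def private_grad(per_example_grads: np.ndarray, C: float, sigma: float,
                 rng=np.random.default_rng(0)) -> np.ndarray:
    """Clip each example's gradient to L2 norm C, sum, then add Gaussian noise."""
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, C / np.maximum(norms, 1e-12))
    noise = rng.normal(0.0, sigma * C, size=per_example_grads.shape[1])
    return (clipped.sum(axis=0) + noise) / len(per_example_grads)

grads = np.random.default_rng(1).normal(size=(32, 4))  # toy per-example gradients
print(private_grad(grads, C=1.0, sigma=1.0))           # noisy average gradient
```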
1 code implementation • 23 May 2016 • Mijung Park, Jimmy Foulds, Kamalika Chaudhuri, Max Welling
The iterative nature of the expectation maximization (EM) algorithm presents a challenge for privacy-preserving estimation, as each iteration increases the amount of noise needed.
no code implementations • 21 Apr 2016 • Chicheng Zhang, Kamalika Chaudhuri
In this paper, we address both challenges.
no code implementations • 23 Mar 2016 • James Foulds, Joseph Geumlek, Max Welling, Kamalika Chaudhuri
Bayesian inference has great promise for the privacy-preserving analysis of sensitive data, as posterior sampling automatically preserves differential privacy, an algorithmic notion of data privacy, under certain conditions (Dimitrakakis et al., 2014; Wang et al., 2015).
no code implementations • 13 Mar 2016 • Shuang Song, Yizhen Wang, Kamalika Chaudhuri
Since this mechanism may be computationally inefficient, we provide an additional mechanism that applies to some practical cases such as physical activity measurements across time, and is computationally efficient.
no code implementations • 14 Feb 2016 • Julian Yarkony, Kamalika Chaudhuri
We apply column generation to approximating complex structured objects via a set of primitive structured objects under either the cross entropy or L2 loss.
no code implementations • NeurIPS 2015 • Kamalika Chaudhuri, Sham M. Kakade, Praneeth Netrapalli, Sujay Sanghavi
Provided certain conditions hold on the model class, we provide a two-stage active learning algorithm for this problem.
no code implementations • NeurIPS 2015 • Chicheng Zhang, Kamalika Chaudhuri
This work addresses active learning with labels obtained from strong and weak labelers, where in addition to the standard active learning setting, we have an extra weak labeler which may occasionally provide incorrect labels.
no code implementations • NeurIPS 2015 • Chicheng Zhang, Jimin Song, Kevin C Chen, Kamalika Chaudhuri
We develop a latent variable model and an efficient spectral algorithm motivated by the recent emergence of very large data sets of chromatin marks from multiple human cell types.
no code implementations • 31 Mar 2015 • James Y. Zou, Kamalika Chaudhuri, Adam Tauman Kalai
In addition, we ask the crowd to provide binary labels for the remaining examples based on the discovered features.
no code implementations • 17 Dec 2014 • Shuang Song, Kamalika Chaudhuri, Anand D. Sarwate
In this paper, we adopt instead a model in which data is observed through heterogeneous noise, where the noise level reflects the quality of the data source.
no code implementations • NeurIPS 2014 • Kamalika Chaudhuri, Daniel Hsu, Shuang Song
A basic problem in the design of privacy-preserving algorithms is the private maximization problem: the goal is to pick an item from a universe that (approximately) maximizes a data-dependent function, all under the constraint of differential privacy.
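The canonical tool for this problem is the exponential mechanism, which picks item $i$ with probability proportional to $\exp(\epsilon\, q_i / (2\Delta))$ for quality scores $q_i$ with sensitivity $\Delta$; a small sketch follows, with illustrative scores.

```python
import numpy as np

def exponential_mechanism(scores: np.ndarray, eps: float, sensitivity: float,
                          rng=np.random.default_rng(0)) -> int:
    """Sample an index with probability proportional to exp(eps * q / (2 * Delta))."""
    logits = eps * scores / (2.0 * sensitivity)
    logits -= logits.max()                          # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(rng.choice(len(scores), p=probs))

scores = np.array([3.0, 7.0, 6.5, 1.0])             # data-dependent qualities
print(exponential_mechanism(scores, eps=1.0, sensitivity=1.0))
```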
no code implementations • NeurIPS 2014 • Chicheng Zhang, Kamalika Chaudhuri
We study agnostic active learning, where the goal is to learn a classifier in a pre-specified hypothesis class interactively with as few label queries as possible, while making no assumptions on the true function generating the labels.
no code implementations • NeurIPS 2014 • Kamalika Chaudhuri, Sanjoy Dasgupta
Nearest neighbor methods are a popular class of nonparametric estimators with several desirable properties, such as adaptivity to different distance scales in different regions of space.
no code implementations • 5 Jun 2014 • Kamalika Chaudhuri, Sanjoy Dasgupta, Samory Kpotufe, Ulrike Von Luxburg
For a density $f$ on ${\mathbb R}^d$, a {\it high-density cluster} is any connected component of $\{x: f(x) \geq \lambda\}$, for some $\lambda > 0$.
no code implementations • NeurIPS 2013 • Kamalika Chaudhuri, Staal A. Vinterbo
Differential privacy is a cryptographically motivated definition of privacy which has gained considerable attention in the algorithms, machine-learning and data-mining communities.
no code implementations • NeurIPS 2012 • Kamalika Chaudhuri, Anand Sarwate, Kaushik Sinha
In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output.
no code implementations • 12 Jul 2012 • Kamalika Chaudhuri, Anand D. Sarwate, Kaushik Sinha
In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output.
no code implementations • NeurIPS 2011 • Animashree Anandkumar, Kamalika Chaudhuri, Daniel J. Hsu, Sham M. Kakade, Le Song, Tong Zhang
The setting is one where we only have samples from certain observed variables in the tree, and our goal is to estimate the tree structure (i.e., the graph of how the underlying hidden variables are connected to each other and to the observed variables).
no code implementations • NeurIPS 2010 • Kamalika Chaudhuri, Sanjoy Dasgupta
For a density $f$ on ${\mathbb R}^d$, a {\it high-density cluster} is any connected component of $\{x: f(x) \geq c\}$, for some $c > 0$.
no code implementations • NeurIPS 2009 • Kamalika Chaudhuri, Yoav Freund, Daniel J. Hsu
Previous algorithms for learning in this framework have a tunable learning rate parameter, and a major barrier to using online-learning in practical applications is that it is not understood how to set this parameter optimally, particularly when the number of actions is large.
no code implementations • NeurIPS 2008 • Kamalika Chaudhuri, Claire Monteleoni
This paper addresses the important tradeoff between privacy and learnability, when designing algorithms for learning from private databases.