Search Results for author: Kamalika Chaudhuri

Found 95 papers, 24 papers with code

Beyond Discrepancy: A Closer Look at the Theory of Distribution Shift

no code implementations 29 May 2024 Robi Bhattacharjee, Nick Rittler, Kamalika Chaudhuri

Instead of relying on the discrepancy, we adopt an Invariant-Risk-Minimization (IRM)-like assumption connecting the distributions, and characterize conditions under which data from a source distribution is sufficient for accurate classification of the target.

Learning Theory

Better Membership Inference Privacy Measurement through Discrepancy

no code implementations 24 May 2024 Ruihan Wu, Pengrun Huang, Kamalika Chaudhuri

A major barrier to the practical deployment of these attacks is that they do not scale to large well-generalized models -- either the advantage is relatively low, or the attack involves training multiple models which is highly compute-intensive.

Uncertainty-Based Abstention in LLMs Improves Safety and Reduces Hallucinations

no code implementations 16 Apr 2024 Christian Tomani, Kamalika Chaudhuri, Ivan Evtimov, Daniel Cremers, Mark Ibrahim

A major barrier towards the practical deployment of large language models (LLMs) is their lack of reliability.

Question Answering

Guarantees of confidentiality via Hammersley-Chapman-Robbins bounds

1 code implementation 3 Apr 2024 Kamalika Chaudhuri, Chuan Guo, Laurens van der Maaten, Saeed Mahloujifar, Mark Tygert

The HCR bounds appear to be insufficient on their own to guarantee confidentiality of the inputs to inference with standard deep neural nets, "ResNet-18" and "Swin-T," pre-trained on the data set, "ImageNet-1000," which contains 1000 classes.

Image Classification

DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning

1 code implementation 21 Mar 2024 Jonathan Lebensold, Maziar Sanjabi, Pietro Astolfi, Adriana Romero-Soriano, Kamalika Chaudhuri, Mike Rabbat, Chuan Guo

Text-to-image diffusion models have been shown to suffer from sample-level memorization, possibly reproducing near-perfect replicas of images that they are trained on, which may be undesirable.

Memorization, Retrieval

Privacy Amplification for the Gaussian Mechanism via Bounded Support

no code implementations 7 Mar 2024 Shengyuan Hu, Saeed Mahloujifar, Virginia Smith, Kamalika Chaudhuri, Chuan Guo

Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset.

Differentially Private Representation Learning via Image Captioning

no code implementations 4 Mar 2024 Tom Sander, Yaodong Yu, Maziar Sanjabi, Alain Durmus, Yi Ma, Kamalika Chaudhuri, Chuan Guo

In this work, we show that effective DP representation learning can be done via image captioning and scaling up to internet-scale multimodal datasets.

Image Captioning, Representation Learning

FairProof : Confidential and Certifiable Fairness for Neural Networks

no code implementations 19 Feb 2024 Chhavi Yadav, Amrita Roy Chowdhury, Dan Boneh, Kamalika Chaudhuri

To this end, we propose FairProof - a system that uses Zero-Knowledge Proofs (a cryptographic primitive) to publicly verify the fairness of a model, while maintaining confidentiality.


Déjà Vu Memorization in Vision-Language Models

no code implementations 3 Feb 2024 Bargav Jayaraman, Chuan Guo, Kamalika Chaudhuri

Vision-Language Models (VLMs) have emerged as the state-of-the-art representation learning solution, with myriads of downstream applications such as image classification, retrieval and generation.

Image Classification, Memorization, +2

Effective pruning of web-scale datasets based on complexity of concept clusters

1 code implementation 9 Jan 2024 Amro Abbas, Evgenia Rusak, Kushal Tirumala, Wieland Brendel, Kamalika Chaudhuri, Ari S. Morcos

Using a simple and intuitive complexity measure, we are able to reduce the training cost to a quarter of regular training.

Differentially Private Multi-Site Treatment Effect Estimation

no code implementations 10 Oct 2023 Tatsuki Koga, Kamalika Chaudhuri, David Page

In this work, we take a fresh look at federated learning with a focus on causal inference; specifically, we look at estimating the average treatment effect (ATE), an important task in causal inference for healthcare applications, and provide a federated analytics approach to enable ATE estimation across multiple sites along with differential privacy (DP) guarantees at each site.

Causal Inference, Federated Learning

Unified Uncertainty Calibration

1 code implementation 2 Oct 2023 Kamalika Chaudhuri, David Lopez-Paz

To build robust, fair, and safe AI systems, we would like our classifiers to say "I don't know" when facing test examples that are difficult or fall outside of the training classes. The ubiquitous strategy to predict under uncertainty is the simplistic reject-or-classify rule: abstain from prediction if epistemic uncertainty is high, classify otherwise. Unfortunately, this recipe does not allow different sources of uncertainty to communicate with each other, produces miscalibrated predictions, and does not allow correcting for misspecifications in our uncertainty estimates.

Large-Scale Public Data Improves Differentially Private Image Generation Quality

no code implementations 4 Aug 2023 Ruihan Wu, Chuan Guo, Kamalika Chaudhuri

In this work, we look at how to use generic large-scale public data to improve the quality of differentially private image generation in Generative Adversarial Networks (GANs), and provide an improved method that uses public data effectively.

Image Generation

ViP: A Differentially Private Foundation Model for Computer Vision

1 code implementation 15 Jun 2023 Yaodong Yu, Maziar Sanjabi, Yi Ma, Kamalika Chaudhuri, Chuan Guo

In this work, we propose as a mitigation measure a recipe to train foundation vision models with differential privacy (DP) guarantee.

Data Redaction from Conditional Generative Models

no code implementations 18 May 2023 Zhifeng Kong, Kamalika Chaudhuri

Deep generative models are known to produce undesirable samples such as harmful content.

Do SSL Models Have Déjà Vu? A Case of Unintended Memorization in Self-supervised Learning

1 code implementation NeurIPS 2023 Casey Meehan, Florian Bordes, Pascal Vincent, Kamalika Chaudhuri, Chuan Guo

Self-supervised learning (SSL) algorithms can produce useful image representations by learning to associate different parts of natural images with one another.

Memorization, Self-Supervised Learning

Can Membership Inferencing be Refuted?

no code implementations 7 Mar 2023 Zhifeng Kong, Amrita Roy Chowdhury, Kamalika Chaudhuri

Given a machine learning model, a data point and some auxiliary information, the goal of an MI attack is to determine whether the data point was used to train the model.

Data-Copying in Generative Models: A Formal Framework

no code implementations 25 Feb 2023 Robi Bhattacharjee, Sanjoy Dasgupta, Kamalika Chaudhuri

There has been some recent interest in detecting and addressing memorization of training data by deep neural networks.


A Two-Stage Active Learning Algorithm for $k$-Nearest Neighbors

no code implementations 19 Nov 2022 Nick Rittler, Kamalika Chaudhuri

$k$-nearest neighbor classification is a popular non-parametric method because of desirable properties like automatic adaption to distributional scale changes.

Active Learning, Vocal Bursts Valence Prediction

Privacy-Aware Compression for Federated Learning Through Numerical Mechanism Design

1 code implementation 8 Nov 2022 Chuan Guo, Kamalika Chaudhuri, Pierre Stock, Mike Rabbat

In private federated learning (FL), a server aggregates differentially private updates from a large number of clients in order to train a machine learning model.

Federated Learning

Robust Empirical Risk Minimization with Tolerance

no code implementations 2 Oct 2022 Robi Bhattacharjee, Max Hopkins, Akash Kumar, Hantao Yu, Kamalika Chaudhuri

Developing simple, sample-efficient learning algorithms for robust classification is a pressing issue in today's tech-dominated world, and current theoretical techniques requiring exponential sample complexity and complicated improper learning rules fall far from answering the need.

Robust classification

Data Redaction from Pre-trained GANs

no code implementations 29 Jun 2022 Zhifeng Kong, Kamalika Chaudhuri

Large pre-trained generative models are known to occasionally output undesirable samples, which undermines their trustworthiness.

Thompson Sampling for Robust Transfer in Multi-Task Bandits

1 code implementation 17 Jun 2022 Zhi Wang, Chicheng Zhang, Kamalika Chaudhuri

We study the problem of online multi-task learning where the tasks are performed within similar but not necessarily identical multi-armed bandit environments.

Multi-Task Learning, Thompson Sampling

XAudit : A Theoretical Look at Auditing with Explanations

no code implementations 9 Jun 2022 Chhavi Yadav, Michal Moshkovitz, Kamalika Chaudhuri

This work formalizes the role of explanations in auditing and investigates if and how model explanations can help audits.

BIG-bench Machine Learning, counterfactual

Why does Throwing Away Data Improve Worst-Group Error?

no code implementations 23 May 2022 Kamalika Chaudhuri, Kartik Ahuja, Martin Arjovsky, David Lopez-Paz

When facing data with imbalanced classes or groups, practitioners follow an intriguing strategy to achieve best results.

Fairness, imbalanced classification, +1

Privacy-Aware Compression for Federated Data Analysis

1 code implementation 15 Mar 2022 Kamalika Chaudhuri, Chuan Guo, Mike Rabbat

Federated data analytics is a framework for distributed data analysis where a server compiles noisy responses from a group of distributed low-bandwidth user devices to estimate aggregate statistics.

Federated Learning

Understanding Rare Spurious Correlations in Neural Networks

1 code implementation 10 Feb 2022 Yao-Yuan Yang, Chi-Ning Chou, Kamalika Chaudhuri

Neural networks are known to use spurious correlations such as background information for classification.

Bounding Training Data Reconstruction in Private (Deep) Learning

1 code implementation 28 Jan 2022 Chuan Guo, Brian Karrer, Kamalika Chaudhuri, Laurens van der Maaten

Differential privacy is widely accepted as the de facto method for preventing data leakage in ML, and conventional wisdom suggests that it offers strong protection against privacy attacks.

Privacy Amplification by Subsampling in Time Domain

no code implementations 13 Jan 2022 Tatsuki Koga, Casey Meehan, Kamalika Chaudhuri

When this is the case, we observe that the influence of a single participant (sensitivity) can be reduced by subsampling and/or filtering in time, while still meeting privacy requirements.

Time Series, Time Series Analysis

Privacy Amplification via Shuffling for Linear Contextual Bandits

no code implementations 11 Dec 2021 Evrard Garcelon, Kamalika Chaudhuri, Vianney Perchet, Matteo Pirotta

Contextual bandit algorithms are widely used in domains where it is desirable to provide a personalized service by leveraging contextual information that may contain sensitive information that needs to be protected.

Multi-Armed Bandits

Privacy Implications of Shuffling

no code implementations ICLR 2022 Casey Meehan, Amrita Roy Chowdhury, Kamalika Chaudhuri, Somesh Jha

LDP deployments are vulnerable to inference attacks, as an adversary can link the noisy responses to their identity and, subsequently, to auxiliary information using the order of the data.

A Shuffling Framework for Local Differential Privacy

no code implementations 11 Jun 2021 Casey Meehan, Amrita Roy Chowdhury, Kamalika Chaudhuri, Somesh Jha

LDP deployments are vulnerable to inference attacks, as an adversary can link the noisy responses to their identity and, subsequently, to auxiliary information using the order of the data.

Understanding Instance-based Interpretability of Variational Auto-Encoders

1 code implementation NeurIPS 2021 Zhifeng Kong, Kamalika Chaudhuri

Instance-based interpretation methods have been widely studied for supervised learning methods as they help explain how black box neural networks predict.

Privacy Amplification Via Bernoulli Sampling

no code implementations 21 May 2021 Jacob Imola, Kamalika Chaudhuri

Balancing privacy and accuracy is a major challenge in designing differentially private machine learning algorithms.

Bayesian Inference, Data Compression

Universal Approximation of Residual Flows in Maximum Mean Discrepancy

no code implementations ICML Workshop INNF 2021 Zhifeng Kong, Kamalika Chaudhuri

Normalizing flows are a class of flexible deep generative models that offer easy likelihood computation.

Location Trace Privacy Under Conditional Priors

1 code implementation 23 Feb 2021 Casey Meehan, Kamalika Chaudhuri

Providing meaningful privacy to users of location based services is particularly challenging when multiple locations are revealed in a short period of time.

Consistent Non-Parametric Methods for Maximizing Robustness

no code implementations NeurIPS 2021 Robi Bhattacharjee, Kamalika Chaudhuri

Learning classifiers that are robust to adversarial examples has received a great deal of recent attention.

Connecting Interpretability and Robustness in Decision Trees through Separation

1 code implementation 14 Feb 2021 Michal Moshkovitz, Yao-Yuan Yang, Kamalika Chaudhuri

We then show that a tighter bound on the size is possible when the data is linearly separated.

Sample Complexity of Adversarially Robust Linear Classification on Separated Data

no code implementations 19 Dec 2020 Robi Bhattacharjee, Somesh Jha, Kamalika Chaudhuri

This shows that for very well-separated data, convergence rates of $O(\frac{1}{n})$ are achievable, which is not the case otherwise.

Adversarial Robustness, Classification, +1

Revisiting Model-Agnostic Private Learning: Faster Rates and Active Learning

no code implementations 6 Nov 2020 Chong Liu, Yuqing Zhu, Kamalika Chaudhuri, Yu-Xiang Wang

The Private Aggregation of Teacher Ensembles (PATE) framework is one of the most promising recent approaches in differentially private learning.

Active Learning, Majority Voting Classifier

Multitask Bandit Learning Through Heterogeneous Feedback Aggregation

1 code implementation 29 Oct 2020 Zhi Wang, Chicheng Zhang, Manish Kumar Singh, Laurel D. Riek, Kamalika Chaudhuri

In many real-world applications, multiple agents seek to learn how to perform highly related yet slightly different tasks in an online bandit learning protocol.

The Expressive Power of a Class of Normalizing Flow Models

no code implementations 31 May 2020 Zhifeng Kong, Kamalika Chaudhuri

Normalizing flows have received a great deal of recent attention as they allow flexible generative modeling as well as easy likelihood computation.

Successive Refinement of Privacy

no code implementations 24 May 2020 Antonious M. Girgis, Deepesh Data, Kamalika Chaudhuri, Christina Fragouli, Suhas Diggavi

This work examines a novel question: how much randomness is needed to achieve local differential privacy (LDP)?

A Non-Parametric Test to Detect Data-Copying in Generative Models

1 code implementation 12 Apr 2020 Casey Meehan, Kamalika Chaudhuri, Sanjoy Dasgupta

Detecting overfitting in generative models is an important challenge in machine learning.

BIG-bench Machine Learning

When are Non-Parametric Methods Robust?

no code implementations ICML 2020 Robi Bhattacharjee, Kamalika Chaudhuri

A growing body of research has shown that many classifiers are susceptible to adversarial examples -- small strategic modifications to test inputs that lead to misclassification.

A Closer Look at Accuracy vs. Robustness

1 code implementation NeurIPS 2020 Yao-Yuan Yang, Cyrus Rashtchian, Hongyang Zhang, Ruslan Salakhutdinov, Kamalika Chaudhuri

Current methods for training robust networks lead to a drop in test accuracy, which has led prior works to posit that a robustness-accuracy tradeoff may be inevitable in deep learning.

Approximate Data Deletion from Machine Learning Models

no code implementations 24 Feb 2020 Zachary Izzo, Mary Anne Smart, Kamalika Chaudhuri, James Zou

Deleting data from a trained machine learning (ML) model is a critical task in many applications.

BIG-bench Machine Learning

Location Trace Privacy Under Conditional Priors

no code implementations 9 Dec 2019 Casey Meehan, Kamalika Chaudhuri

Providing meaningful privacy to users of location based services is particularly challenging when multiple locations are revealed in a short period of time.

Capacity Bounded Differential Privacy

no code implementations NeurIPS 2019 Kamalika Chaudhuri, Jacob Imola, Ashwin Machanavajjhala

Differential privacy, a notion of algorithmic stability, is a gold standard for measuring the additional risk an algorithm's output poses to the privacy of a single record in the dataset.

The Label Complexity of Active Learning from Observational Data

1 code implementation NeurIPS 2019 Songbai Yan, Kamalika Chaudhuri, Tara Javidi

We provably demonstrate that the result of this is an algorithm which is statistically consistent as well as more label-efficient than prior work.

Active Learning, counterfactual

An Investigation of Data Poisoning Defenses for Online Learning

no code implementations 28 May 2019 Yizhen Wang, Somesh Jha, Kamalika Chaudhuri

Data poisoning attacks -- where an adversary can modify a small fraction of training data, with the goal of forcing the trained classifier to high loss -- are an important threat for machine learning in many applications.

Data Poisoning, General Classification

Profile-Based Privacy for Locally Private Computations

no code implementations 21 Jan 2019 Joseph Geumlek, Kamalika Chaudhuri

Differential privacy has emerged as a gold standard in privacy-preserving data analysis.

Privacy Preserving

Exploring Connections Between Active Learning and Model Extraction

no code implementations 5 Nov 2018 Varun Chandrasekaran, Kamalika Chaudhuri, Irene Giacomelli, Somesh Jha, Songbai Yan

This has resulted in the surge of Machine Learning-as-a-Service (MLaaS) - cloud services that provide (a) tools and resources to learn the model, and (b) a user-friendly query interface to access the model.

Active Learning, BIG-bench Machine Learning, +1

The Inductive Bias of Restricted f-GANs

no code implementations 12 Sep 2018 Shuang Liu, Kamalika Chaudhuri

Generative adversarial networks are a novel method for statistical inference that have achieved much empirical success; however, the factors contributing to this success remain ill-understood.

Inductive Bias

Data Poisoning Attacks against Online Learning

no code implementations 27 Aug 2018 Yizhen Wang, Kamalika Chaudhuri

While there has been much prior work on data poisoning, most of it is in the offline setting, and attacks for online learning, where training data arrives in a streaming manner, are not well understood.

Data Poisoning

Active Learning with Logged Data

no code implementations ICML 2018 Songbai Yan, Kamalika Chaudhuri, Tara Javidi

We consider active learning with logged data, where labeled examples are drawn conditioned on a predetermined logging policy, and the goal is to learn a classifier on the entire population, not just conditioned on the logging policy.

Active Learning

Spectral Learning of Binomial HMMs for DNA Methylation Data

no code implementations 7 Feb 2018 Chicheng Zhang, Eran A. Mukamel, Kamalika Chaudhuri

We consider learning parameters of Binomial Hidden Markov Models, which may be used to model DNA methylation data.

Computational Efficiency, Tensor Decomposition

Renyi Differential Privacy Mechanisms for Posterior Sampling

no code implementations NeurIPS 2017 Joseph Geumlek, Shuang Song, Kamalika Chaudhuri

With the newly proposed privacy definition of Rényi Differential Privacy (RDP) in (Mironov, 2017), we re-examine the inherent privacy of releasing a single sample from a posterior distribution.


Rényi Differential Privacy Mechanisms for Posterior Sampling

no code implementations 2 Oct 2017 Joseph Geumlek, Shuang Song, Kamalika Chaudhuri

Using a recently proposed privacy definition of Rényi Differential Privacy (RDP), we re-examine the inherent privacy of releasing a single sample from a posterior distribution.


Active Heteroscedastic Regression

no code implementations ICML 2017 Kamalika Chaudhuri, Prateek Jain, Nagarajan Natarajan

In this work, we consider a theoretical analysis of the label requirement of active learning for regression under a heteroscedastic noise model, where the noise depends on the instance.

Active Learning, Binary Classification, +1

Composition Properties of Inferential Privacy for Time-Series Data

no code implementations 10 Jul 2017 Shuang Song, Kamalika Chaudhuri

With the proliferation of mobile devices and the internet of things, developing principled solutions for privacy in time series applications has become increasingly important.

Time Series, Time Series Analysis

Analyzing the Robustness of Nearest Neighbors to Adversarial Examples

1 code implementation ICML 2018 Yizhen Wang, Somesh Jha, Kamalika Chaudhuri

Our analysis shows that its robustness properties depend critically on the value of k - the classifier may be inherently non-robust for small k, but its robustness approaches that of the Bayes Optimal classifier for fast-growing k. We propose a novel modified 1-nearest neighbor classifier, and guarantee its robustness in the large sample limit.

Approximation and Convergence Properties of Generative Adversarial Learning

no code implementations NeurIPS 2017 Shuang Liu, Olivier Bousquet, Kamalika Chaudhuri

In this paper, we address these questions in a broad and unified setting by defining a notion of adversarial divergences that includes a number of recently proposed objective functions.

Variational Bayes In Private Settings (VIPS)

1 code implementation 1 Nov 2016 Mijung Park, James Foulds, Kamalika Chaudhuri, Max Welling

Many applications of Bayesian data analysis involve sensitive information, motivating methods which ensure that privacy is protected.

Bayesian Inference, Data Augmentation, +1

Active Learning from Imperfect Labelers

no code implementations NeurIPS 2016 Songbai Yan, Kamalika Chaudhuri, Tara Javidi

We study active learning where the labeler can not only return incorrect labels but also abstain from labeling.

Active Learning

Private Topic Modeling

no code implementations 14 Sep 2016 Mijung Park, James Foulds, Kamalika Chaudhuri, Max Welling

We develop a privatised stochastic variational inference method for Latent Dirichlet Allocation (LDA).

Variational Inference

Bolt-on Differential Privacy for Scalable Stochastic Gradient Descent-based Analytics

1 code implementation 15 Jun 2016 Xi Wu, Fengan Li, Arun Kumar, Kamalika Chaudhuri, Somesh Jha, Jeffrey F. Naughton

This paper takes a first step to remedy this disconnect and proposes a private SGD algorithm to address both issues in an integrated manner.

DP-EM: Differentially Private Expectation Maximization

1 code implementation 23 May 2016 Mijung Park, Jimmy Foulds, Kamalika Chaudhuri, Max Welling

The iterative nature of the expectation maximization (EM) algorithm presents a challenge for privacy-preserving estimation, as each iteration increases the amount of noise needed.

Privacy Preserving

On the Theory and Practice of Privacy-Preserving Bayesian Data Analysis

no code implementations 23 Mar 2016 James Foulds, Joseph Geumlek, Max Welling, Kamalika Chaudhuri

Bayesian inference has great promise for the privacy-preserving analysis of sensitive data, as posterior sampling automatically preserves differential privacy, an algorithmic notion of data privacy, under certain conditions (Dimitrakakis et al., 2014; Wang et al., 2015).

Bayesian Inference, Privacy Preserving, +2

Pufferfish Privacy Mechanisms for Correlated Data

no code implementations 13 Mar 2016 Shuang Song, Yizhen Wang, Kamalika Chaudhuri

Since this mechanism may be computationally inefficient, we provide an additional mechanism that applies to some practical cases such as physical activity measurements across time, and is computationally efficient.

Convex Optimization For Non-Convex Problems via Column Generation

no code implementations 14 Feb 2016 Julian Yarkony, Kamalika Chaudhuri

We apply column generation to approximating complex structured objects via a set of primitive structured objects under either the cross entropy or L2 loss.

Active Learning from Weak and Strong Labelers

no code implementations NeurIPS 2015 Chicheng Zhang, Kamalika Chaudhuri

This work addresses active learning with labels obtained from strong and weak labelers, where in addition to the standard active learning setting, we have an extra weak labeler which may occasionally provide incorrect labels.

Active Learning

Spectral Learning of Large Structured HMMs for Comparative Epigenomics

no code implementations NeurIPS 2015 Chicheng Zhang, Jimin Song, Kevin C Chen, Kamalika Chaudhuri

We develop a latent variable model and an efficient spectral algorithm motivated by the recent emergence of very large data sets of chromatin marks from multiple human cell types.

Crowdsourcing Feature Discovery via Adaptively Chosen Comparisons

no code implementations 31 Mar 2015 James Y. Zou, Kamalika Chaudhuri, Adam Tauman Kalai

In addition we also ask the crowd to provide binary labels to the remaining examples based on the discovered features.

Learning from Data with Heterogeneous Noise using SGD

no code implementations 17 Dec 2014 Shuang Song, Kamalika Chaudhuri, Anand D. Sarwate

In this paper, we adopt instead a model in which data is observed through heterogeneous noise, where the noise level reflects the quality of the data source.

The Large Margin Mechanism for Differentially Private Maximization

no code implementations NeurIPS 2014 Kamalika Chaudhuri, Daniel Hsu, Shuang Song

A basic problem in the design of privacy-preserving algorithms is the private maximization problem: the goal is to pick an item from a universe that (approximately) maximizes a data-dependent function, all under the constraint of differential privacy.

BIG-bench Machine Learning, Privacy Preserving

Beyond Disagreement-based Agnostic Active Learning

no code implementations NeurIPS 2014 Chicheng Zhang, Kamalika Chaudhuri

We study agnostic active learning, where the goal is to learn a classifier in a pre-specified hypothesis class interactively with as few label queries as possible, while making no assumptions on the true function generating the labels.

Active Learning, General Classification

Rates of Convergence for Nearest Neighbor Classification

no code implementations NeurIPS 2014 Kamalika Chaudhuri, Sanjoy Dasgupta

Nearest neighbor methods are a popular class of nonparametric estimators with several desirable properties, such as adaptivity to different distance scales in different regions of space.

Classification, General Classification

Consistent procedures for cluster tree estimation and pruning

no code implementations 5 Jun 2014 Kamalika Chaudhuri, Sanjoy Dasgupta, Samory Kpotufe, Ulrike Von Luxburg

For a density $f$ on $\mathbb{R}^d$, a high-density cluster is any connected component of $\{x: f(x) \geq \lambda\}$, for some $\lambda > 0$.


A Stability-based Validation Procedure for Differentially Private Machine Learning

no code implementations NeurIPS 2013 Kamalika Chaudhuri, Staal A. Vinterbo

Differential privacy is a cryptographically motivated definition of privacy which has gained considerable attention in the algorithms, machine-learning and data-mining communities.

BIG-bench Machine Learning

Near-optimal Differentially Private Principal Components

no code implementations NeurIPS 2012 Kamalika Chaudhuri, Anand Sarwate, Kaushik Sinha

In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output.

Near-Optimal Algorithms for Differentially-Private Principal Components

no code implementations 12 Jul 2012 Kamalika Chaudhuri, Anand D. Sarwate, Kaushik Sinha

In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output.

Spectral Methods for Learning Multivariate Latent Tree Structure

no code implementations NeurIPS 2011 Animashree Anandkumar, Kamalika Chaudhuri, Daniel J. Hsu, Sham M. Kakade, Le Song, Tong Zhang

The setting is one where we only have samples from certain observed variables in the tree, and our goal is to estimate the tree structure (i.e., the graph of how the underlying hidden variables are connected to each other and to the observed variables).

Rates of convergence for the cluster tree

no code implementations NeurIPS 2010 Kamalika Chaudhuri, Sanjoy Dasgupta

For a density f on R^d, a high-density cluster is any connected component of {x: f(x) >= c}, for some c > 0.

A Parameter-free Hedging Algorithm

no code implementations NeurIPS 2009 Kamalika Chaudhuri, Yoav Freund, Daniel J. Hsu

Previous algorithms for learning in this framework have a tunable learning rate parameter, and a major barrier to using online-learning in practical applications is that it is not understood how to set this parameter optimally, particularly when the number of actions is large.

Privacy-preserving logistic regression

no code implementations NeurIPS 2008 Kamalika Chaudhuri, Claire Monteleoni

This paper addresses the important tradeoff between privacy and learnability, when designing algorithms for learning from private databases.

BIG-bench Machine Learning, Privacy Preserving, +1
