Search Results for author: Lunjia Hu

Found 17 papers, 2 papers with code

Testing Calibration in Subquadratic Time

1 code implementation • 20 Feb 2024 • Lunjia Hu, Kevin Tian, Chutong Yang

Motivated by [BGHN23], which proposed a rigorous framework for measuring distances to calibration, we initiate the algorithmic study of calibration through the lens of property testing.

Decision Making

On Computationally Efficient Multi-Class Calibration

no code implementations • 12 Feb 2024 • Parikshit Gopalan, Lunjia Hu, Guy N. Rothblum

Projected smooth calibration gives strong guarantees for all downstream decision makers who want to use the predictor for binary classification problems of the form: does the label belong to a subset $T \subseteq [k]$ (e.g., is this an image of an animal?).

Binary Classification • Computational Efficiency
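As a rough illustration of these induced binary problems, a $k$-class probability vector can be collapsed onto any label subset $T$ by summing its mass; a minimal sketch (the function and example are ours, not the paper's):

```python
import numpy as np

def project_to_binary(probs: np.ndarray, T: set) -> np.ndarray:
    """Collapse a k-class prediction onto the binary question
    "does the label lie in T?" by summing the probabilities of T.

    probs: array of shape (n_examples, k), rows summing to 1.
    Returns the predicted probability that the label is in T.
    """
    T_idx = sorted(T)
    return probs[:, T_idx].sum(axis=1)

# Example: k = 4 classes, T = {0, 2} (say, the "animal" classes).
probs = np.array([[0.5, 0.1, 0.3, 0.1],
                  [0.1, 0.6, 0.1, 0.2]])
print(project_to_binary(probs, {0, 2}))  # [0.8 0.2]
```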

When Does Optimizing a Proper Loss Yield Calibration?

no code implementations • NeurIPS 2023 • Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Preetum Nakkiran

Optimizing proper loss functions is popularly believed to yield predictors with good calibration properties; the intuition being that for such losses, the global optimum is to predict the ground-truth probabilities, which is indeed calibrated.
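For the squared loss, this intuition is a one-line computation (a standard bias-variance identity, not taken from the paper's abstract): conditionally on the features $x$,

$$\mathbb{E}\bigl[(p - y)^2 \mid x\bigr] \;=\; \bigl(p - \mathbb{E}[y \mid x]\bigr)^2 + \mathrm{Var}(y \mid x),$$

so the pointwise minimizer is $p = \mathbb{E}[y \mid x]$, which is calibrated by definition; the question is what happens when optimization is restricted to a family of predictors that cannot represent this minimizer.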

Loss Minimization Yields Multicalibration for Large Neural Networks

no code implementations • 19 Apr 2023 • Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Adam Tauman Kalai, Preetum Nakkiran

We show that minimizing the squared loss over all neural nets of size $n$ implies multicalibration for all but a bounded number of unlucky values of $n$.

Fairness
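For intuition about what multicalibration demands, here is a hedged sketch of an empirical audit over groups and prediction bins (the binning scheme and weighting are our choices, not the paper's definitions):

```python
import numpy as np

def multicalibration_violation(preds, labels, groups, n_bins=10):
    """Largest calibration gap over (group, prediction-bin) cells.

    preds: array of shape (n,) with values in [0, 1]; labels: shape (n,).
    groups: boolean array of shape (n_groups, n), one indicator row
    per subpopulation. A small return value is (loosely) consistent
    with multicalibration over the given group collection.
    """
    bins = np.clip((preds * n_bins).astype(int), 0, n_bins - 1)
    worst = 0.0
    for g in groups:
        for b in range(n_bins):
            cell = g & (bins == b)
            if cell.sum() == 0:
                continue
            gap = abs((labels[cell] - preds[cell]).mean())
            # Weight by cell mass so tiny cells don't dominate.
            worst = max(worst, gap * cell.mean())
    return worst
```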

Generative Models of Huge Objects

no code implementations • 24 Feb 2023 • Lunjia Hu, Inbal Livni-Navon, Omer Reingold

In this paper, we extend the work of Goldreich, Goldwasser, and Nussboim (SICOMP 2010), which focused on the implementation of huge objects that are indistinguishable from the uniform distribution while satisfying some global properties (which they coined truthfulness).

Fairness • Learning Theory +1

A Unifying Theory of Distance from Calibration

no code implementations • 30 Nov 2022 • Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Preetum Nakkiran

We study the fundamental question of how to define and measure the distance from calibration for probabilistic predictors.
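For context, the measure most widely used in practice is the binned expected calibration error, which this line of work argues is discontinuous in the predictor and sensitive to binning; a minimal sketch of that baseline:

```python
import numpy as np

def binned_ece(preds, labels, n_bins=10):
    """Binned expected calibration error: the bin-mass-weighted
    average of |mean prediction - mean label| within each bin.
    Its dependence on n_bins is one motivation for seeking a
    well-behaved notion of distance to calibration.
    """
    bins = np.clip((preds * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(preds[mask].mean() - labels[mask].mean())
    return ece
```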

Comparative Learning: A Sample Complexity Theory for Two Hypothesis Classes

no code implementations • 16 Nov 2022 • Lunjia Hu, Charlotte Peale

We show that the sample complexity of comparative learning is characterized by the mutual VC dimension $\mathsf{VC}(S, B)$ which we define to be the maximum size of a subset shattered by both $S$ and $B$.

Learning Theory • PAC learning +1
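For tiny finite classes, the mutual VC dimension can be computed directly from the definition; a brute-force sketch (exponential in the domain size, for illustration only):

```python
from itertools import combinations

def shatters(H, subset):
    """True if hypothesis class H (a set of tuples in {0,1}^n)
    realizes all 2^|subset| labelings on the given index subset."""
    patterns = {tuple(h[i] for i in subset) for h in H}
    return len(patterns) == 2 ** len(subset)

def mutual_vc(S, B, n):
    """Max size of an index set shattered by both S and B."""
    for k in range(n, 0, -1):
        for subset in combinations(range(n), k):
            if shatters(S, subset) and shatters(B, subset):
                return k
    return 0

# Example: on n = 2 bits, S = all four strings, B = {00, 01, 10}.
# Both shatter {0} and {1}; only S shatters {0, 1}, so mutual VC = 1.
S = {(0, 0), (0, 1), (1, 0), (1, 1)}
B = {(0, 0), (0, 1), (1, 0)}
print(mutual_vc(S, B, 2))  # 1
```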

Subspace Recovery from Heterogeneous Data with Non-isotropic Noise

no code implementations • 24 Oct 2022 • John Duchi, Vitaly Feldman, Lunjia Hu, Kunal Talwar

Our goal is to recover the linear subspace shared by $\mu_1,\ldots,\mu_n$ using the data points from all users, where every data point from user $i$ is formed by adding an independent mean-zero noise vector to $\mu_i$.

Federated Learning
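A toy instance of this data model, using plain PCA on per-user empirical means as a naive baseline (the synthetic setup is ours; the paper's estimator is designed for non-isotropic noise, which this baseline does not handle):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_users, m = 20, 3, 100, 5  # ambient dim, subspace dim, users, points/user

U = np.linalg.qr(rng.normal(size=(d, r)))[0]    # shared subspace basis
mus = (U @ rng.normal(size=(r, n_users))).T     # one mean per user, in span(U)
# Each observation is the user's mean plus an independent mean-zero
# noise vector (isotropic here, which is the easy case for naive PCA).
X = mus[:, None, :] + 0.5 * rng.normal(size=(n_users, m, d))

user_means = X.mean(axis=1)                     # average out noise per user
_, _, Vt = np.linalg.svd(user_means, full_matrices=False)
U_hat = Vt[:r].T                                # top-r right singular vectors

# Projection distance between recovered and true subspaces.
print(np.linalg.norm(U_hat @ U_hat.T - U @ U.T))
```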

Loss Minimization through the Lens of Outcome Indistinguishability

no code implementations • 16 Oct 2022 • Parikshit Gopalan, Lunjia Hu, Michael P. Kim, Omer Reingold, Udi Wieder

This decomposition highlights the utility of a new multi-group fairness notion that we call calibrated multiaccuracy, which lies in between multiaccuracy and multicalibration.

Fairness

Omnipredictors for Constrained Optimization

no code implementations • 15 Sep 2022 • Lunjia Hu, Inbal Livni-Navon, Omer Reingold, Chutong Yang

In this paper, we introduce omnipredictors for constrained optimization and study their complexity and implications.

Fairness

Metric Entropy Duality and the Sample Complexity of Outcome Indistinguishability

no code implementations • 9 Mar 2022 • Lunjia Hu, Charlotte Peale, Omer Reingold

In this setting, we show that the sample complexity of outcome indistinguishability is characterized by the fat-shattering dimension of $D$.

PAC learning

Near-Optimal Explainable $k$-Means for All Dimensions

no code implementations • 29 Jun 2021 • Moses Charikar, Lunjia Hu

Given $d$-dimensional data points, we show an efficient algorithm that finds an explainable clustering whose $k$-means cost is at most $k^{1 - 2/d}\,\mathrm{poly}(d\log k)$ times the minimum cost achievable by a clustering without the explainability constraint, assuming $k, d\ge 2$.

Clustering
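The object behind "explainable" here is a threshold tree over the input coordinates; a greedy sketch of that idea (a simple heuristic split rule, not the paper's near-optimal algorithm):

```python
import numpy as np

def build_tree(centers):
    """Recursively separate distinct k-means centers (a (k, d) array)
    with axis-aligned cuts, yielding a decision tree whose leaves
    each hold one center. Split rule: the coordinate with the widest
    gap between adjacent center values, cut at the gap's midpoint.
    """
    if len(centers) == 1:
        return {"center": centers[0]}
    best = None  # (gap width, coordinate, threshold)
    for i in range(centers.shape[1]):
        vals = np.unique(centers[:, i])
        if len(vals) < 2:
            continue
        gaps = np.diff(vals)
        j = int(np.argmax(gaps))
        if best is None or gaps[j] > best[0]:
            best = (gaps[j], i, (vals[j] + vals[j + 1]) / 2)
    _, i, t = best
    return {"feature": i, "threshold": t,
            "left": build_tree(centers[centers[:, i] <= t]),
            "right": build_tree(centers[centers[:, i] > t])}
```

Each internal node asks a single "is coordinate $i$ at most $t$?" question, which is what makes the resulting clustering explainable; the paper's contribution is bounding how much $k$-means cost such trees must sacrifice.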

Robust Mean Estimation on Highly Incomplete Data with Arbitrary Outliers

no code implementations • 18 Aug 2020 • Lunjia Hu, Omer Reingold

We study the problem of robustly estimating the mean of a $d$-dimensional distribution given $N$ examples, where most coordinates of every example may be missing and $\varepsilon N$ examples may be arbitrarily corrupted.
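A natural naive baseline in this setting is the coordinate-wise median over observed entries, which tolerates corruptions but ignores cross-coordinate structure and falls well short of the paper's guarantees; a sketch:

```python
import numpy as np

def coordwise_median(X):
    """Per-coordinate median ignoring missing entries (NaN).

    X: array of shape (N, d) with NaN marking missing coordinates.
    Robust to a constant fraction of arbitrarily corrupted examples
    in each coordinate, but treats coordinates independently.
    """
    return np.nanmedian(X, axis=0)
```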

Robust and On-the-fly Dataset Denoising for Image Classification

no code implementations • ECCV 2020 • Jiaming Song, Lunjia Hu, Michael Auli, Yann Dauphin, Tengyu Ma

We address this problem by reasoning counterfactually about the loss distribution of examples with uniform random labels had they been trained with the real examples, and use this information to remove noisy examples from the training set.

Classification • counterfactual +4
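A rough sketch of the counterfactual filtering step (the percentile rule and names are ours; the paper's procedure is more involved):

```python
import numpy as np

def flag_noisy(real_losses, probe_losses, quantile=0.05):
    """Flag training examples whose loss looks like that of the
    random-label probes rather than like a clean example.

    real_losses:  per-example losses of the actual training set.
    probe_losses: losses of auxiliary examples given uniform random
                  labels and trained alongside the real examples
                  (the counterfactual reference distribution).
    An example is flagged if its loss exceeds the `quantile`-th
    quantile of the probe losses, i.e., it is indistinguishable
    from the bulk of the random-label examples.
    """
    threshold = np.quantile(probe_losses, quantile)
    return real_losses >= threshold  # True = suspected noisy label
```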

Towards Understanding Learning Representations: To What Extent Do Different Neural Networks Learn the Same Representation

1 code implementation • NeurIPS 2018 • Liwei Wang, Lunjia Hu, Jiayuan Gu, Yue Wu, Zhiqiang Hu, Kun He, John Hopcroft

The theory gives a complete characterization of the structure of neuron activation subspace matches, where the core concepts are maximum match and simple match, which describe the overall and the finest similarity between sets of neurons in two networks, respectively.
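One generic way to compare activation subspaces is via principal angles; a sketch (standard linear algebra, not the paper's maximum/simple match algorithms):

```python
import numpy as np

def subspace_overlap(A, B):
    """Cosines of the principal angles between the column spans of
    two activation matrices A and B (shape: n_inputs x n_neurons).
    Values near 1 mean the two sets of neurons span nearly the same
    directions over the given inputs.
    """
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.clip(s, 0.0, 1.0)
```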

Active Tolerant Testing

no code implementations • 1 Nov 2017 • Avrim Blum, Lunjia Hu

In this work, we give the first algorithms for tolerant testing of nontrivial classes in the active model: estimating the distance of a target function to a hypothesis class $C$ with respect to some arbitrary distribution $D$, using only a small number of label queries to a polynomial-sized pool of unlabeled examples drawn from $D$. Specifically, we show that for the class $C$ of unions of $d$ intervals on the line, we can estimate the error rate of the best hypothesis in the class to an additive error $\epsilon$ using only $O(\frac{1}{\epsilon^6}\log \frac{1}{\epsilon})$ label queries to an unlabeled pool of size $O(\frac{d}{\epsilon^2}\log \frac{1}{\epsilon})$.
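On a fully labeled sample, the quantity being estimated, the error of the best union of at most $d$ intervals, can be computed exactly by dynamic programming; a sketch of that reference computation (the paper's point is achieving this with far fewer label queries):

```python
import numpy as np

def best_d_intervals_error(xs, ys, d):
    """Empirical error of the best union of at most d intervals on a
    fully labeled sample: a dynamic program over sorted points with
    state (intervals opened so far, currently inside an interval).
    """
    ys = np.asarray(ys)[np.argsort(xs)]
    INF = float("inf")
    # dp[j][s]: min mistakes so far using j intervals, s = 1 iff inside one.
    dp = [[INF, INF] for _ in range(d + 1)]
    dp[0][0] = 0
    for y in ys:
        new = [[INF, INF] for _ in range(d + 1)]
        for j in range(d + 1):
            # Stay/become outside an interval: predict 0 here.
            new[j][0] = min(dp[j][0], dp[j][1]) + (y != 0)
            # Stay inside, or open the j-th interval: predict 1 here.
            opened = dp[j - 1][0] if j >= 1 else INF
            new[j][1] = min(dp[j][1], opened) + (y != 1)
        dp = new
    return min(min(row) for row in dp) / len(ys)
```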

Quadratic Upper Bound for Recursive Teaching Dimension of Finite VC Classes

no code implementations • 18 Feb 2017 • Lunjia Hu, Ruihan Wu, Tianhong Li, Li-Wei Wang

The RTD of a concept class $\mathcal C \subseteq \{0, 1\}^n$, introduced by Zilles et al. (2011), is a combinatorial complexity measure characterized by the worst-case number of examples necessary to identify a concept in $\mathcal C$ according to the recursive teaching model.
