Search Results for author: Alon Orlitsky

Found 29 papers, 1 paper with code

Optimal Sequential Maximization: One Interview is Enough!

no code implementations • ICML 2020 • Moein Falahatgar, Alon Orlitsky, Venkatadheeraj Pichapati

To derive these results, we consider a probabilistic setting in which several candidates for a position are asked multiple questions, with the goal of finding the candidate with the highest probability of answering interview questions correctly.
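A minimal simulation of that interview setting, with made-up answer probabilities and a naive fixed-budget strategy (an illustrative sketch, not the paper's adaptive, sample-optimal procedure):

```python
import random

# Toy version of the setting described above: candidate i answers any question
# correctly with unknown probability q[i]; the goal is to find the candidate
# with the highest q[i].  The values of q and the question budget are
# illustrative assumptions.
q = [0.6, 0.7, 0.85, 0.8]

def interview(candidate, num_questions=200):
    """Ask one candidate num_questions questions and return the fraction
    answered correctly (a naive strategy, not the paper's sequential one)."""
    correct = sum(random.random() < q[candidate] for _ in range(num_questions))
    return correct / num_questions

best = max(range(len(q)), key=interview)
print("selected candidate:", best)
```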

Linear Regression using Heterogeneous Data Batches

no code implementations • 5 Sep 2023 • Ayush Jain, Rajat Sen, Weihao Kong, Abhimanyu Das, Alon Orlitsky

A common approach assumes that the sources fall in one of several unknown subgroups, each with an unknown input distribution and input-output relationship.

Regression
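A minimal sketch of the batch model described in this entry, with illustrative sizes and Gaussian inputs assumed for concreteness (the paper allows unknown, subgroup-specific input distributions):

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_groups, num_sources, batch_size = 5, 3, 60, 4  # illustrative sizes

# Each unknown subgroup has its own regression vector; every data source
# belongs to one subgroup and contributes a small batch of (x, y) pairs.
w = rng.normal(size=(num_groups, d))
batches = []
for _ in range(num_sources):
    g = rng.integers(num_groups)                      # hidden subgroup label
    X = rng.normal(size=(batch_size, d))              # assumed Gaussian inputs
    y = X @ w[g] + 0.1 * rng.normal(size=batch_size)  # noisy linear responses
    batches.append((X, y))

# Recovering the subgroup regressors w from such unlabeled batches is the
# problem studied in the paper.
print(len(batches), batches[0][0].shape)
```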

TURF: A Two-factor, Universal, Robust, Fast Distribution Learning Algorithm

no code implementations • 15 Feb 2022 • Yi Hao, Ayush Jain, Alon Orlitsky, Vaishakh Ravindrakumar

We derive a near-linear-time and essentially sample-optimal estimator that establishes $c_{t, d}=2$ for all $(t, d)\ne(1, 0)$.

Robust estimation algorithms don't need to know the corruption level

no code implementations • 11 Feb 2022 • Ayush Jain, Alon Orlitsky, Vaishakh Ravindrakumar

However, the vast majority of them approach optimal accuracy only when given a tight upper bound on the fraction of corrupt data.

Linear-Sample Learning of Low-Rank Distributions

no code implementations • NeurIPS 2020 • Ayush Jain, Alon Orlitsky

Many latent-variable applications, including community detection, collaborative filtering, genomic analysis, and NLP, model data as generated by low-rank matrices.

Collaborative Filtering • Community Detection
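A minimal sketch of the low-rank model described in this entry, with truncated SVD of the empirical matrix as a natural baseline (not the paper's linear-sample estimator); the sizes and Dirichlet priors are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
k, r, n = 50, 3, 10_000  # alphabet size, rank, sample size (illustrative)

# Rank-r joint distribution over symbol pairs: a mixture of r product
# distributions, P[x, y] = sum_z w[z] * u[z, x] * v[z, y].
w = rng.dirichlet(np.ones(r))
u = rng.dirichlet(np.ones(k), size=r)
v = rng.dirichlet(np.ones(k), size=r)
P = np.einsum("z,zx,zy->xy", w, u, v)

# Draw n pairs and form the empirical joint-frequency matrix.
flat = rng.choice(k * k, size=n, p=P.ravel())
P_hat = np.bincount(flat, minlength=k * k).reshape(k, k) / n

# Baseline estimator: project the empirical matrix back to rank r via SVD.
U, s, Vt = np.linalg.svd(P_hat)
P_r = (U[:, :r] * s[:r]) @ Vt[:r]
print("L1 error, rank-r projection vs empirical:",
      np.abs(P - P_r).sum(), np.abs(P - P_hat).sum())
```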

SURF: A Simple, Universal, Robust, Fast Distribution Learning Algorithm

no code implementations • NeurIPS 2020 • Yi Hao, Ayush Jain, Alon Orlitsky, Vaishakh Ravindrakumar

Sample- and computationally-efficient distribution estimation is a fundamental tenet in statistics and machine learning.

Optimal Robust Learning of Discrete Distributions from Batches

no code implementations • ICML 2020 • Ayush Jain, Alon Orlitsky

Previous estimators for this setting ran in exponential time, and for some regimes required a suboptimal number of batches.

Collaborative Filtering • Federated Learning

Unified Sample-Optimal Property Estimation in Near-Linear Time

no code implementations • NeurIPS 2019 • Yi Hao, Alon Orlitsky

We consider the fundamental learning problem of estimating properties of distributions over large domains.

The Broad Optimality of Profile Maximum Likelihood

1 code implementation • NeurIPS 2019 • Yi Hao, Alon Orlitsky

In particular, for every alphabet size $k$ and desired accuracy $\varepsilon$:
$\textbf{Distribution estimation}$ Under $\ell_1$ distance, PML yields optimal $\Theta(k/(\varepsilon^2\log k))$ sample complexity for sorted-distribution estimation, and a PML-based estimator empirically outperforms the Good-Turing estimator on the actual distribution;
$\textbf{Additive property estimation}$ For a broad class of additive properties, the PML plug-in estimator uses just four times the sample size required by the best estimator to achieve roughly twice its error, with exponentially higher confidence;
$\boldsymbol{\alpha}\textbf{-R\'enyi entropy estimation}$ For integer $\alpha>1$, the PML plug-in estimator has optimal $k^{1-1/\alpha}$ sample complexity; for non-integer $\alpha>3/4$, the PML plug-in estimator has sample complexity lower than the state of the art;
$\textbf{Identity testing}$ In testing whether an unknown distribution is equal to, or at least $\varepsilon$ far from, a given distribution in $\ell_1$ distance, a PML-based tester achieves the optimal sample complexity up to logarithmic factors of $k$.
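A brute-force illustration of profile maximum likelihood on a toy example: PML picks the distribution maximizing the probability of the sample's profile (the multiset of symbol multiplicities), and property estimates are then plugged in from that distribution. The two-symbol grid search below is only feasible for tiny inputs and is an illustrative sketch, not an efficient PML computation:

```python
import itertools
import math
from collections import Counter

def profile(sample):
    """Profile of a sample: the multiset of nonzero symbol multiplicities."""
    return sorted(Counter(sample).values(), reverse=True)

def profile_probability(phi, p):
    """P(observing profile phi) under distribution p, by brute force:
    sum the multinomial probability over every distinct assignment of the
    counts in phi to the symbols of p (requires len(p) >= len(phi);
    feasible only for tiny examples)."""
    n, k = sum(phi), len(p)
    padded = list(phi) + [0] * (k - len(phi))
    total = 0.0
    for counts in set(itertools.permutations(padded)):  # distinct count vectors
        coeff = math.factorial(n)
        for c in counts:
            coeff //= math.factorial(c)
        total += coeff * math.prod(pi ** c for pi, c in zip(p, counts))
    return total

# Toy example: the sample "aab" has profile [2, 1].
phi = profile("aab")
# Brute-force PML over two-symbol distributions on a coarse grid.
grid = [(q / 100, 1 - q / 100) for q in range(1, 100)]
p_pml = max(grid, key=lambda p: profile_probability(phi, p))
print(phi, p_pml)
```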

Data Amplification: A Unified and Competitive Approach to Property Estimation

no code implementations • NeurIPS 2018 • Yi Hao, Alon Orlitsky, Ananda T. Suresh, Yihong Wu

We design the first unified, linear-time, competitive, property estimator that for a wide class of properties and for all underlying distributions uses just $2n$ samples to achieve the performance attained by the empirical estimator with $n\sqrt{\log n}$ samples.
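For reference, a minimal sketch of the empirical (plug-in) estimator that serves as the benchmark above, applied to an additive property $\sum_x f(p_x)$; Shannon entropy is used here as the example property:

```python
import math
from collections import Counter

def empirical_plugin(sample, f):
    """Empirical plug-in estimate of an additive property sum_x f(p_x):
    replace each unknown probability p_x by its empirical frequency."""
    n = len(sample)
    return sum(f(c / n) for c in Counter(sample).values())

# Example property: Shannon entropy, f(p) = -p log p.
print(empirical_plugin("abracadabra", lambda p: -p * math.log(p)))
```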

Data Amplification: Instance-Optimal Property Estimation

no code implementations • ICML 2020 • Yi Hao, Alon Orlitsky

For a large variety of distribution properties including four of the most popular ones and for every underlying distribution, they achieve the accuracy that the empirical-frequency plug-in estimators would attain using a logarithmic-factor more samples.

On Learning Markov Chains

no code implementations • NeurIPS 2018 • Yi Hao, Alon Orlitsky, Venkatadheeraj Pichapati

We consider two problems related to the min-max risk (expected loss) of estimating an unknown $k$-state Markov chain from its $n$ sequential samples: predicting the conditional distribution of the next sample with respect to the KL-divergence, and estimating the transition matrix with respect to a natural loss induced by KL or a more general $f$-divergence measure.
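A minimal sketch of the prediction problem, assuming a small toy chain and add-one smoothing as the baseline estimator (a simple choice, not the min-max-optimal estimator analyzed in the paper):

```python
import numpy as np

def estimate_transitions(path, k, alpha=1.0):
    """Add-alpha smoothed estimate of a k-state transition matrix
    from a single sample path."""
    counts = np.full((k, k), alpha)
    for a, b in zip(path[:-1], path[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def kl(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(0)
k, n = 3, 1_000
M = rng.dirichlet(np.ones(k), size=k)        # true transition matrix (toy)
path = [0]
for _ in range(n - 1):                       # n sequential samples
    path.append(int(rng.choice(k, p=M[path[-1]])))

M_hat = estimate_transitions(path, k)
# Loss for predicting the next sample's conditional distribution.
print(kl(M[path[-1]], M_hat[path[-1]]))
```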

The Limits of Maxing, Ranking, and Preference Learning

no code implementations • ICML 2018 • Moein Falahatgar, Ayush Jain, Alon Orlitsky, Venkatadheeraj Pichapati, Vaishakh Ravindrakumar

We present a comprehensive understanding of three important problems in PAC preference learning: maximum selection (maxing), ranking, and estimating all pairwise preference probabilities, in the adaptive setting.

Maxing and Ranking with Few Assumptions

no code implementations • NeurIPS 2017 • Moein Falahatgar, Yi Hao, Alon Orlitsky, Venkatadheeraj Pichapati, Vaishakh Ravindrakumar

PAC maximum selection (maxing) and ranking of $n$ elements via random pairwise comparisons have diverse applications and have been studied under many models and assumptions.

A Unified Maximum Likelihood Approach for Estimating Symmetric Properties of Discrete Distributions

no code implementations • ICML 2017 • Jayadev Acharya, Hirakendu Das, Alon Orlitsky, Ananda Theertha Suresh

Symmetric distribution properties such as support size, support coverage, entropy, and proximity to uniformity, arise in many applications.
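For concreteness, a sketch of the symmetric properties mentioned above, computed from a known distribution vector (they depend only on the multiset of probabilities; support coverage, which also involves a sample-size parameter, is omitted, and L1 distance is used as one notion of proximity to uniformity):

```python
import numpy as np

def symmetric_properties(p):
    """Support size, Shannon entropy, and L1 distance to uniformity of a
    known distribution p; all are invariant to relabeling the symbols."""
    p = np.asarray(p, dtype=float)
    k = len(p)
    nz = p[p > 0]
    return {
        "support size": int(np.count_nonzero(p)),
        "entropy": float(-(nz * np.log(nz)).sum()),
        "L1 distance to uniform": float(np.abs(p - 1.0 / k).sum()),
    }

print(symmetric_properties([0.5, 0.25, 0.25, 0.0]))
```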

Maximum Selection and Ranking under Noisy Comparisons

no code implementations • ICML 2017 • Moein Falahatgar, Alon Orlitsky, Venkatadheeraj Pichapati, Ananda Theertha Suresh

We consider $(\epsilon,\delta)$-PAC maximum-selection and ranking for general probabilistic models whose comparison probabilities satisfy strong stochastic transitivity and the stochastic triangle inequality.
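A minimal sketch of the comparison model with a naive sequential-elimination maxing baseline; the Bradley-Terry-style probabilities below satisfy strong stochastic transitivity, and the repeated-majority strategy is only an illustration, not the paper's sample-optimal $(\epsilon,\delta)$-PAC algorithm:

```python
import random

def noisy_compare(i, j, p, m):
    """Compare items i and j m times; return True if i wins a majority.
    p[i][j] is the probability that i beats j in a single comparison."""
    wins = sum(random.random() < p[i][j] for _ in range(m))
    return wins > m / 2

def sequential_maxing(items, p, m=101):
    """Keep a running winner and challenge it with each remaining item.
    A naive baseline with a fixed comparison budget per pair, not the
    paper's adaptive (epsilon, delta)-PAC algorithm."""
    best = items[0]
    for challenger in items[1:]:
        if noisy_compare(challenger, best, p, m):
            best = challenger
    return best

# Bradley-Terry-style comparison probabilities p[i][j] = s_i / (s_i + s_j),
# which satisfy strong stochastic transitivity (strengths are illustrative).
strengths = [1.0, 1.5, 2.0, 3.0]
p = [[si / (si + sj) for sj in strengths] for si in strengths]
print(sequential_maxing(list(range(len(strengths))), p))
```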

A Unified Maximum Likelihood Approach for Optimal Distribution Property Estimation

no code implementations • 9 Nov 2016 • Jayadev Acharya, Hirakendu Das, Alon Orlitsky, Ananda Theertha Suresh

The advent of data science has spurred interest in estimating properties of distributions over large alphabets.

Competitive Distribution Estimation: Why is Good-Turing Good

no code implementations • NeurIPS 2015 • Alon Orlitsky, Ananda Theertha Suresh

Second, they estimate every distribution nearly as well as the best estimator designed with prior knowledge of the exact distribution but, like all natural estimators, restricted to assigning the same probability to all symbols appearing the same number of times. Specifically, for distributions over $k$ symbols and $n$ samples, we show that for both comparisons, a simple variant of the Good-Turing estimator is always within KL divergence $(3+o(1))/n^{1/3}$ of the best estimator, and that a more involved estimator is within $\tilde{\mathcal{O}}(\min(k/n, 1/\sqrt n))$.
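A minimal sketch of a Good-Turing-style estimator of the kind compared above: symbols seen $t$ times share total probability roughly $(t+1)N_{t+1}/n$, where $N_t$ is the number of symbols seen exactly $t$ times. This is an illustrative variant with a crude fallback, not the exact estimator analyzed in the paper:

```python
from collections import Counter

def good_turing(sample):
    """Assign each symbol seen t times probability (t+1) * N_{t+1} / (n * N_t),
    falling back to the empirical frequency t / n when N_{t+1} = 0."""
    n = len(sample)
    counts = Counter(sample)
    N = Counter(counts.values())   # N[t] = number of symbols seen t times
    est = {}
    for x, t in counts.items():
        if N.get(t + 1, 0) > 0:
            est[x] = (t + 1) * N[t + 1] / (n * N[t])
        else:
            est[x] = t / n
    return est

print(good_turing("abracadabra"))
```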

Estimating the number of unseen species: A bird in the hand is worth $\log n $ in the bush

no code implementations • 23 Nov 2015 • Alon Orlitsky, Ananda Theertha Suresh, Yihong Wu

We derive a class of estimators that $\textit{provably}$ predict $U$ not just for constant $t>1$, but all the way up to $t$ proportional to $\log n$.
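For context, a minimal sketch of the classical Good-Toulmin estimator of the number $U$ of new species seen if the sample were enlarged by a factor $t$; the paper's estimators smooth this alternating series so that it remains accurate for $t$ as large as order $\log n$:

```python
from collections import Counter

def good_toulmin(sample, t):
    """Good-Toulmin estimate U ~ -sum_i (-t)^i * N_i, where N_i is the number
    of species observed exactly i times in the sample."""
    prevalences = Counter(Counter(sample).values())   # N_i
    return -sum((-t) ** i * n_i for i, n_i in prevalences.items())

print(good_toulmin("aabbbcddddde", 0.5))
```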

Faster Algorithms for Testing under Conditional Sampling

no code implementations • 16 Apr 2015 • Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky, Venkatadheeraj Pichapathi, Ananda Theertha Suresh

There has been considerable recent interest in distribution tests whose run-time and sample requirements are sublinear in the domain size $k$.
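A minimal sketch of the conditional-sampling oracle behind such tests: the tester names a subset of the domain and receives a sample drawn from the unknown distribution restricted to that subset (the distribution below is an illustrative stand-in):

```python
import random

def conditional_sample(p, S):
    """Draw one symbol from the unknown distribution p conditioned on the
    query set S, i.e. with probability p[x] / sum_{y in S} p[y] for x in S."""
    S = list(S)
    return random.choices(S, weights=[p[x] for x in S], k=1)[0]

# Illustrative unknown distribution over a domain of size k = 4.
p = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
print(conditional_sample(p, {1, 3}))
```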

Competitive Distribution Estimation

no code implementations • 27 Mar 2015 • Alon Orlitsky, Ananda Theertha Suresh

We also provide an estimator that runs in linear time and incurs competitive regret of $\tilde{\mathcal{O}}(\min(k/n, 1/\sqrt n))$, and show that for natural estimators this competitive regret is inevitable.

Estimating Renyi Entropy of Discrete Distributions

no code implementations • 2 Aug 2014 • Jayadev Acharya, Alon Orlitsky, Ananda Theertha Suresh, Himanshu Tyagi

It was recently shown that estimating the Shannon entropy $H({\rm p})$ of a discrete $k$-symbol distribution ${\rm p}$ requires $\Theta(k/\log k)$ samples, a number that grows near-linearly in the support size.
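A minimal sketch of the plug-in estimator for the order-$\alpha$ Rényi entropy $H_\alpha(p)=\frac{1}{1-\alpha}\log\sum_x p_x^\alpha$ (for $\alpha\neq 1$); the plug-in is a natural baseline, while the paper determines how many samples are actually needed for each $\alpha$:

```python
import math
from collections import Counter

def renyi_plugin(sample, alpha):
    """Plug-in estimate of H_alpha: replace each p_x by its empirical
    frequency in the power sum, then take log / (1 - alpha)."""
    n = len(sample)
    power_sum = sum((c / n) ** alpha for c in Counter(sample).values())
    return math.log(power_sum) / (1 - alpha)

print(renyi_plugin("abracadabra", alpha=2))
```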

Universal Compression of Envelope Classes: Tight Characterization via Poisson Sampling

no code implementations • 29 May 2014 • Jayadev Acharya, Ashkan Jafarpour, Alon Orlitsky, Ananda Theertha Suresh

The Poisson-sampling technique eliminates dependencies among symbol appearances in a random sequence.
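A minimal sketch of the technique: drawing the sequence length from a Poisson distribution makes the per-symbol counts independent Poisson variables, unlike fixed-length (multinomial) sampling; the toy distribution below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = np.array([0.5, 0.3, 0.2]), 1_000   # toy distribution and target length

# Fixed-length sampling: counts are multinomial, hence dependent (they sum to n).
fixed_counts = rng.multinomial(n, p)

# Poisson sampling: first draw the length N ~ Poisson(n); the resulting count
# of each symbol x is then an independent Poisson(n * p[x]) variable.
N = rng.poisson(n)
poisson_counts = rng.multinomial(N, p)

print(fixed_counts, poisson_counts)
```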

Near-optimal-sample estimators for spherical Gaussian mixtures

no code implementations • NeurIPS 2014 • Jayadev Acharya, Ashkan Jafarpour, Alon Orlitsky, Ananda Theertha Suresh

For mixtures of any $k$ $d$-dimensional spherical Gaussians, we derive an intuitive spectral-estimator that uses $\mathcal{O}_k\bigl(\frac{d\log^2d}{\epsilon^4}\bigr)$ samples and runs in time $\mathcal{O}_{k,\epsilon}(d^3\log^5 d)$, both significantly lower than previously known.
