Fast and Sample Near-Optimal Algorithms for Learning Multidimensional Histograms

23 Feb 2018  ·  Ilias Diakonikolas, Jerry Li, Ludwig Schmidt ·

We study the problem of robustly learning multi-dimensional histograms. A $d$-dimensional function $h: D \rightarrow \mathbb{R}$ is called a $k$-histogram if there exists a partition of the domain $D \subseteq \mathbb{R}^d$ into $k$ axis-aligned rectangles such that $h$ is constant within each such rectangle. Let $f: D \rightarrow \mathbb{R}$ be a $d$-dimensional probability density function and suppose that $f$ is $\mathrm{OPT}$-close, in $L_1$-distance, to an unknown $k$-histogram (with unknown partition). Our goal is to output a hypothesis that is $O(\mathrm{OPT}) + \epsilon$ close to $f$, in $L_1$-distance. We give an algorithm for this learning problem that uses $n = \tilde{O}_d(k/\epsilon^2)$ samples and runs in time $\tilde{O}_d(n)$. For any fixed dimension, our algorithm has optimal sample complexity, up to logarithmic factors, and runs in near-linear time. Prior to our work, the time complexity of the $d=1$ case was well-understood, but significant gaps in our understanding remained even for $d=2$.

PDF Abstract
No code implementations yet. Submit your code now

Tasks


Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here