Tree density estimation

23 Nov 2021  ·  László Györfi, Aryeh Kontorovich, Roi Weiss

We study the problem of estimating the density $f(\boldsymbol x)$ of a random vector ${\boldsymbol X}$ in $\mathbb R^d$. For a spanning tree $T$ defined on the vertex set $\{1,\dots ,d\}$, the tree density $f_{T}$ is a product of bivariate conditional densities. An optimal spanning tree minimizes the Kullback-Leibler divergence between $f$ and $f_{T}$. From i.i.d. data we identify an optimal tree $T^*$ and efficiently construct a tree density estimate $f_n$ such that, without any regularity conditions on the density $f$, one has $\lim_{n\to \infty} \int |f_n(\boldsymbol x)-f_{T^*}(\boldsymbol x)|d\boldsymbol x=0$ a.s. For Lipschitz $f$ with bounded support, $\mathbb E \left\{ \int |f_n(\boldsymbol x)-f_{T^*}(\boldsymbol x)|d\boldsymbol x\right\}=O\big(n^{-1/4}\big)$, a dimension-free rate.
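
The abstract does not spell out how the optimal tree is found, so the following is only an illustrative sketch, not the authors' estimator. It relies on the classical Chow-Liu observation that, when the relevant quantities are finite, minimizing the Kullback-Leibler divergence between $f$ and $f_T$ over spanning trees amounts to maximizing the sum of pairwise mutual informations along the edges of $T$. The histogram plug-in estimate of mutual information, the `bins` parameter, and all function names below are assumptions introduced for illustration.

```python
import numpy as np


def histogram_mutual_information(x, y, bins=8):
    """Plug-in mutual information (in nats) from a 2-D histogram.

    Assumption: a fixed equal-width grid; the paper may partition differently.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    mask = pxy > 0                        # skip empty cells to avoid log(0)
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))


def max_weight_spanning_tree(weights):
    """Prim's algorithm on a dense symmetric weight matrix, started at vertex 0."""
    d = weights.shape[0]
    in_tree = [0]
    edges = []
    while len(in_tree) < d:
        best = None
        for i in in_tree:
            for j in range(d):
                if j not in in_tree and (best is None or weights[i, j] > best[0]):
                    best = (weights[i, j], i, j)
        _, i, j = best
        edges.append((i, j))
        in_tree.append(j)
    return edges


def chow_liu_tree(data, bins=8):
    """Return the edges of a maximum mutual-information spanning tree."""
    d = data.shape[1]
    mi = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            mi[i, j] = mi[j, i] = histogram_mutual_information(
                data[:, i], data[:, j], bins
            )
    return max_weight_spanning_tree(mi)


# Synthetic chain X0 -> X1 -> X2 (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
x0 = rng.normal(size=5000)
x1 = x0 + 0.5 * rng.normal(size=5000)
x2 = x1 + 0.5 * rng.normal(size=5000)
edges = chow_liu_tree(np.column_stack([x0, x1, x2]))
print(edges)
```

Given the selected edges, a tree density estimate can then be assembled from estimated univariate and bivariate marginals, for example as $\prod_{i} \hat f_i(x_i) \prod_{(i,j)\in T} \hat f_{ij}(x_i,x_j)/\big(\hat f_i(x_i)\hat f_j(x_j)\big)$; how the paper builds $f_n$ in detail is not stated in the abstract.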
