no code implementations • 22 Mar 2022 • Frank Nielsen, Ke Sun

A key technique of machine learning and computer vision is to embed discrete weighted graphs into continuous spaces for further downstream processing.

no code implementations • 7 Dec 2021 • Pascal Mattia Esser, Frank Nielsen

We empirically show that using (natural) gradient descent on the smooth manifold approximation instead of the singular space allows us to avoid the attractor behavior and therefore improve the convergence speed in learning.

no code implementations • 22 Jul 2021 • Wu Lin, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt

In this paper, we propose new structured second-order methods and structured adaptive-gradient methods obtained by performing natural-gradient descent on structured parameter spaces.

no code implementations • 22 Jul 2021 • Gautier Marti, Victor Goubet, Frank Nielsen

We propose a methodology to approximate conditional distributions in the elliptope of correlation matrices based on conditional generative adversarial networks.

no code implementations • 13 Jul 2021 • Frank Nielsen

Since the Jeffreys divergence between Gaussian mixture models is not available in closed-form, various techniques with pros and cons have been proposed in the literature to either estimate, approximate, or lower and upper bound this divergence.

1 code implementation • 1 Jul 2021 • Vaden Masrani, Rob Brekelmans, Thang Bui, Frank Nielsen, Aram Galstyan, Greg Ver Steeg, Frank Wood

Many common machine learning methods involve the geometric annealing path, a sequence of intermediate densities between two distributions of interest constructed using the geometric average.
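
As a rough illustration of the geometric annealing path mentioned above, the sketch below interpolates between two discrete distributions by taking the normalized geometric average $\pi_\beta(x) \propto \pi_0(x)^{1-\beta}\,\pi_1(x)^{\beta}$. The function name `geometric_path` and the restriction to strictly positive discrete distributions are assumptions made for this example; they are not taken from the paper.

```python
def geometric_path(p0, p1, beta):
    """Intermediate distribution on the geometric path between p0 and p1.

    Hypothetical helper: p0 and p1 are discrete distributions given as
    lists of strictly positive probabilities over the same outcomes.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if len(p0) != len(p1):
        raise ValueError("p0 and p1 must have the same length")
    if min(p0) <= 0 or min(p1) <= 0:
        raise ValueError("this sketch assumes strictly positive probabilities")
    # Unnormalized geometric average of the two distributions.
    weights = [a ** (1.0 - beta) * b ** beta for a, b in zip(p0, p1)]
    z = sum(weights)  # normalizing constant at this value of beta
    return [w / z for w in weights]


if __name__ == "__main__":
    base, target = [0.5, 0.5], [0.9, 0.1]
    for beta in (0.0, 0.5, 1.0):
        print(beta, geometric_path(base, target, beta))
```

At `beta = 0` the path returns the base distribution and at `beta = 1` the target; intermediate values of `beta` move smoothly between the two.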

no code implementations • 19 Feb 2021 • Frank Nielsen

We generalize the Jensen-Shannon divergence by considering a variational definition with respect to a generic mean, thereby extending the notion of Sibson's information radius.

Quantization, Information Theory

no code implementations • 15 Feb 2021 • Wu Lin, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt

Natural-gradient descent (NGD) on structured parameter spaces (e.g., low-rank covariances) is computationally challenging due to difficult Fisher-matrix computations.

no code implementations • 29 Jan 2021 • Frank Nielsen, Kazuki Okamura

We prove that the $f$-divergences between univariate Cauchy distributions are all symmetric, and can be expressed as strictly increasing scalar functions of the symmetric chi-squared divergence.

Information Theory, Statistics Theory

no code implementations • 11 Jan 2021 • Frank Nielsen

We study information projections with respect to statistical $f$-divergences between any two location-scale families.

Information Theory

no code implementations • NeurIPS Workshop DL-IG 2020 • Rob Brekelmans, Frank Nielsen, Alireza Makhzani, Aram Galstyan, Greg Ver Steeg

The exponential family is well known in machine learning and statistical physics as the maximum entropy distribution subject to a set of observed constraints, while the geometric mixture path is common in MCMC methods such as annealed importance sampling.

1 code implementation • NeurIPS Workshop DL-IG 2020 • Rob Brekelmans, Vaden Masrani, Thang Bui, Frank Wood, Aram Galstyan, Greg Ver Steeg, Frank Nielsen

Annealed importance sampling (AIS) is the gold standard for estimating partition functions or marginal likelihoods, corresponding to importance sampling over a path of distributions between a tractable base and an unnormalized target.

no code implementations • 12 Jun 2020 • Frank Nielsen

We prove that the Voronoi diagrams of the Fisher-Rao distance, the chi square divergence, and the Kullback-Leibler divergences all coincide with a hyperbolic Voronoi diagram on the corresponding Cauchy location-scale parameters, and that the dual Cauchy hyperbolic Delaunay complexes are Fisher orthogonal to the Cauchy hyperbolic Voronoi diagrams.

no code implementations • 5 Mar 2020 • Frank Nielsen, Richard Nock

It is well known that the Bhattacharyya, Hellinger, Kullback-Leibler, $\alpha$-divergences, and Jeffreys' divergences between densities belonging to the same exponential family have generic closed-form formulas relying on the strictly convex and real-analytic cumulant function characterizing the exponential family.

1 code implementation • 19 Feb 2020 • Gaëtan Hadjeres, Frank Nielsen

Distances between probability distributions that take into account the geometry of their sample space, like the Wasserstein or the Maximum Mean Discrepancy (MMD) distances, have received a lot of attention in machine learning as they can, for instance, be used to compare probability distributions with disjoint supports.

no code implementations • 27 Nov 2019 • Ke Sun, Frank Nielsen

This letter introduces an abstract learning problem called the "set embedding": The objective is to map sets into probability distributions so as to lose less information.

no code implementations • 9 Oct 2019 • Frank Nielsen

The dualistic structure of statistical manifolds in information geometry yields eight types of geodesic triangles passing through three given points, the triangle vertices.

no code implementations • 19 Sep 2019 • Frank Nielsen, Gaëtan Hadjeres

We then define the strictly quasiconvex Bregman divergences as the limit case of scaled and skewed quasiconvex Jensen divergences, and report a simple closed-form formula which shows that these divergences are only pseudo-divergences at countably many inflection points of the generators.

no code implementations • 27 May 2019 • Ke Sun, Frank Nielsen

Why do deep neural networks (DNNs) benefit from very high dimensional parameter spaces?

no code implementations • 8 Apr 2019 • Frank Nielsen

The Jensen-Shannon divergence is a renowned bounded symmetrization of the unbounded Kullback-Leibler divergence which measures the total Kullback-Leibler divergence to the average mixture distribution.
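
To make the definition above concrete, here is a minimal sketch for discrete distributions. It assumes the common convention $\mathrm{JS}(p, q) = \tfrac{1}{2}\mathrm{KL}(p \,\|\, m) + \tfrac{1}{2}\mathrm{KL}(q \,\|\, m)$ with $m = \tfrac{1}{2}(p + q)$, measured in nats; the helper names are chosen for this example only.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions.

    Terms with p_i == 0 contribute 0; q_i must be positive wherever p_i is.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (assumed convention: average of the two KLs)."""
    # The average mixture m is positive wherever p or q is, so both
    # KL terms below are finite even when p and q have disjoint supports.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    # Disjoint supports: KL(p || q) is infinite, JS attains its bound log 2.
    print(jensen_shannon([1.0, 0.0], [0.0, 1.0]), math.log(2))
```

Under this convention the divergence is symmetric and never exceeds $\log 2$, which it reaches when the two distributions have disjoint supports.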

no code implementations • 14 Mar 2019 • Frank Nielsen, Gaëtan Hadjeres

We consider both finite and infinite power chi expansions of $f$-divergences derived from Taylor's expansions of smooth generators, and elaborate on cases where these expansions yield closed-form formulas, bounded approximations, or analytic divergence series expressions of $f$-divergences.

1 code implementation • ICLR 2019 • Gaëtan Hadjeres, Frank Nielsen

This paper presents the Variation Network (VarNet), a generative model providing means to manipulate the high-level attributes of a given input.

no code implementations • 9 Jan 2019 • Frank Nielsen

The traditional Minkowski distances are induced by the corresponding Minkowski norms in real-valued vector spaces.

no code implementations • 19 Dec 2018 • Frank Nielsen, Ke Sun

We experimentally evaluate our new family of distances by quantifying the upper bounds of several jointly convex distances between statistical mixtures, and by proposing a novel efficient method to learn Gaussian mixture models (GMMs) by simplifying kernel density estimators with respect to our distance.

no code implementations • 25 Oct 2018 • Erika Gomes-Gonçalves, Henryk Gzyl, Frank Nielsen

Separable Bregman divergences induce Riemannian metric spaces that are isometric to the Euclidean space after monotone embeddings.

no code implementations • 22 Oct 2018 • Frank Nielsen, Richard Nock

Distances are fundamental primitives whose choice significantly impacts the performances of algorithms in machine learning and signal processing.

2 code implementations • ICLR 2019 • Giorgio Patrini, Rianne van den Berg, Patrick Forré, Marcello Carioni, Samarth Bhargav, Max Welling, Tim Genewein, Frank Nielsen

We show that minimizing the p-Wasserstein distance between the generator and the true data distribution is equivalent to the unconstrained min-min optimization of the p-Wasserstein distance between the encoder aggregated posterior and the prior in latent space, plus a reconstruction error.

no code implementations • 17 Aug 2018 • Frank Nielsen

In this survey, we describe the fundamental differential-geometric structures of information manifolds, state the fundamental theorem of information geometry, and illustrate some use cases of these information manifolds in information sciences.

no code implementations • 29 Jun 2018 • Frank Nielsen, Ke Sun

The total variation distance is a core statistical distance between probability measures that satisfies the metric axioms, with value always falling in $[0, 1]$.
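
For discrete distributions on a common finite set, the total variation distance reduces to half the $\ell_1$ distance between the probability vectors. The short sketch below illustrates that special case only; the function name is an assumption made for this example.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    Illustrative helper: p and q are lists of probabilities over the same
    outcomes, so the distance is half the sum of absolute differences.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    print(total_variation([0.2, 0.8], [0.5, 0.5]))
    print(total_variation([1.0, 0.0], [0.0, 1.0]))  # disjoint supports
```

The value is 0 for identical distributions and 1 for distributions with disjoint supports, matching the $[0, 1]$ range stated above.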

1 code implementation • 1 Jun 2018 • Frank Nielsen, Ke Sun

We propose a new generic type of stochastic neurons, called $q$-neurons, that considers activation functions based on Jackson's $q$-derivatives with stochastic parameters $q$.
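
Jackson's $q$-derivative of a function $f$ is the difference quotient $D_q f(x) = \frac{f(qx) - f(x)}{(q - 1)\,x}$ for $x \neq 0$ and $q \neq 1$. The sketch below only evaluates this quotient; it does not reproduce the paper's $q$-neuron construction or its stochastic choice of $q$, and the function name is an assumption made for this example.

```python
def q_derivative(f, x, q):
    """Jackson's q-derivative of f at x (requires x != 0 and q != 1)."""
    if x == 0 or q == 1:
        raise ValueError("the difference quotient needs x != 0 and q != 1")
    return (f(q * x) - f(x)) / ((q - 1) * x)


if __name__ == "__main__":
    def square(t):
        return t * t

    # For f(x) = x^2 the q-derivative equals (q + 1) * x.
    print(q_derivative(square, 3.0, 0.5))
    # As q approaches 1, the ordinary derivative 2x is recovered.
    print(q_derivative(square, 3.0, 1.0 + 1e-6))
```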

no code implementations • 20 Mar 2018 • Frank Nielsen, Gaëtan Hadjeres

When equipping a statistical manifold with the KL divergence, the induced manifold structure is dually flat, and the KL divergence between distributions amounts to an equivalent Bregman divergence on their corresponding parameters.

no code implementations • 29 Sep 2017 • Frank Nielsen

We introduce a novel family of distances, called the chord gap divergences, that generalizes the Jensen divergences (also called the Burbea-Rao distances), and study its properties.

no code implementations • 19 Sep 2017 • Gaëtan Hadjeres, Frank Nielsen

We demonstrate its efficiency on the task of generating melodies satisfying positional constraints in the style of the soprano parts of the J. S.

no code implementations • 3 Sep 2017 • Gaëtan Hadjeres, Frank Nielsen

These musical sequences belong to a given corpus (or style) and it is obvious that a good distance on musical sequences should take this information into account; being able to define a distance ex nihilo which could be applicable to all music styles seems implausible.

Information Retrieval, Sound

no code implementations • 2 Aug 2017 • Frank Nielsen, Richard Nock

The information geometry induced by the Bregman generator set to the Shannon negentropy on this space yields a dually flat space called the mixture family manifold.

no code implementations • ICML 2017 • Ke Sun, Frank Nielsen

Fisher information and natural gradient provided deep insights and powerful tools to artificial neural networks.

no code implementations • 14 Jul 2017 • Gaëtan Hadjeres, Frank Nielsen, François Pachet

VAEs (Variational AutoEncoders) have proved to be powerful in the context of density modeling and have been used in a variety of contexts for creative purposes.

no code implementations • 10 Apr 2017 • Richard Nock, Frank Nielsen

In Valiant's model of evolution, a class of representations is evolvable iff a polynomial-time process of random mutations guided by selection converges with high probability to a representation as $\epsilon$-close as desired from the optimal one, for any required $\epsilon>0$.

no code implementations • 3 Apr 2017 • Frank Nielsen, Ke Sun

In the Hilbert simplex geometry, the distance is the non-separable Hilbert's metric distance which satisfies the property of information monotonicity with distance level set functions described by polytope boundaries.

no code implementations • 1 Mar 2017 • Gautier Marti, Frank Nielsen, Mikołaj Bińkowski, Philippe Donnat

We review the state of the art of clustering financial time series and the study of their correlations alongside other interaction networks.

no code implementations • 16 Feb 2017 • Frank Nielsen, Richard Nock

Comparative convexity is a generalization of convexity relying on abstract notions of means.

no code implementations • 14 Jan 2017 • Frank Nielsen, Ke Sun, Stéphane Marchand-Maillet

We describe a framework to build distances by measuring the tightness of inequalities, and introduce the notion of proper statistical divergences and improper pseudo-divergences.

no code implementations • 9 Dec 2016 • Frank Nielsen, Richard Nock

We present a series of closed-form maximum entropy upper bounds for the differential entropy of a continuous univariate random variable and study the properties of that series.

5 code implementations • ICML 2017 • Gaëtan Hadjeres, François Pachet, Frank Nielsen

This paper introduces DeepBach, a graphical model aimed at modeling polyphonic music and specifically hymn-like pieces.

1 code implementation • 30 Oct 2016 • Gautier Marti, Sebastien Andler, Frank Nielsen, Philippe Donnat

We propose a methodology to explore and measure the pairwise correlations that exist between variables in a dataset.

no code implementations • 22 Sep 2016 • Frank Nielsen, Boris Muzellec, Richard Nock

We consider the supervised classification problem of machine learning in Cayley-Klein projective geometries: We show how to learn a curved Mahalanobis metric distance corresponding to either the hyperbolic geometry or the elliptic geometry using the Large Margin Nearest Neighbor (LMNN) framework.

1 code implementation • 15 Sep 2016 • Boris Muzellec, Richard Nock, Giorgio Patrini, Frank Nielsen

We also present the first application of optimal transport to the problem of ecological inference, that is, the reconstruction of joint distributions from their marginals, a problem of large interest in the social sciences.

no code implementations • 20 Jun 2016 • Ke Sun, Frank Nielsen

Fisher information and natural gradient provided deep insights and powerful tools to artificial neural networks.

no code implementations • 19 Jun 2016 • Frank Nielsen, Ke Sun

Information-theoretic measures such as the entropy, cross-entropy and the Kullback-Leibler divergence between two mixture models are core primitives in many signal processing tasks.

no code implementations • 28 Apr 2016 • Gautier Marti, Sébastien Andler, Frank Nielsen, Philippe Donnat

This clustering methodology leverages copulas which are distributions encoding the dependence structure between several random variables.

no code implementations • 6 Apr 2016 • Frank Nielsen, Richard Nock

Matrix data sets are now common, for example in biomedical imaging, where the Diffusion Tensor Magnetic Resonance Imaging (DT-MRI) modality produces data sets of 3D symmetric positive definite matrices anchored at voxel positions capturing the anisotropic diffusion properties of water molecules in biological tissues.

no code implementations • 14 Mar 2016 • Junlin Yao, Frank Nielsen

State-of-the-art methods via subspace clustering seek to solve the problem in two steps: First, an affinity matrix is built from data, with appearance features or motion patterns.

no code implementations • 13 Mar 2016 • Gautier Marti, Sébastien Andler, Frank Nielsen, Philippe Donnat

Researchers have used from 30 days to several years of daily returns as source data for clustering financial time series based on their correlations.

no code implementations • 8 Feb 2016 • Giorgio Patrini, Frank Nielsen, Richard Nock, Marcello Carioni

We prove that the empirical risk of most well-known loss functions factors into a linear term aggregating all labels with a term that is label free, and can further be expressed by sums of the loss.

no code implementations • 3 Feb 2016 • Richard Nock, Raphaël Canyasse, Roksana Boreli, Frank Nielsen

For either the specific frameworks considered here, or for the differential privacy setting, there are few to no prior results on the direct application of k-means++ and its approximation bounds; state-of-the-art contenders appear to be significantly more complex and/or display less favorable (approximation) properties.

no code implementations • 3 Feb 2016 • Frank Nielsen

But more precisely, what do we mean by information in images?

no code implementations • 27 Sep 2015 • Gautier Marti, Frank Nielsen, Philippe Donnat

This paper presents a new methodology for clustering multivariate time series leveraging optimal transport between copulas.

no code implementations • 23 Jun 2014 • Frank Nielsen, Richard Nock

This novel heuristic can improve Hartigan's $k$-means when it has converged to a local minimum.

no code implementations • 11 Mar 2014 • Frank Nielsen, Richard Nock

We present a generic dynamic programming method to compute the optimal clustering of $n$ scalar elements into $k$ pairwise disjoint intervals.
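
One way such a dynamic program can be set up is sketched below. It assumes the cost of an interval is the sum of squared deviations from its mean (the $k$-means objective) and uses a straightforward $O(n^2 k)$ recurrence with prefix sums; the paper's method is generic in the cost, so both the cost choice and the function name are assumptions made for this example.

```python
def optimal_interval_clustering(values, k):
    """Optimal split of sorted scalars into k contiguous intervals.

    Returns (total_cost, clusters). The interval cost used here is the sum
    of squared deviations from the interval mean (an assumed choice).
    """
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of values")

    # Prefix sums of x and x^2 give any interval cost in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i, j):
        """Sum of squared deviations of xs[i:j] from its mean (i < j)."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[c][j]: minimal cost of splitting xs[:j] into c intervals.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # the last interval is xs[i:j]
                candidate = best[c - 1][i] + cost(i, j)
                if candidate < best[c][j]:
                    best[c][j] = candidate
                    cut[c][j] = i

    # Backtrack through the stored cut positions to recover the intervals.
    clusters, j = [], n
    for c in range(k, 0, -1):
        i = cut[c][j]
        clusters.append(xs[i:j])
        j = i
    clusters.reverse()
    return best[k][n], clusters


if __name__ == "__main__":
    print(optimal_interval_clustering([20, 1, 11, 2, 10], 3))
```

Because an optimal clustering of scalars under this cost consists of contiguous runs of the sorted values, the recurrence only has to choose where the last interval starts.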

no code implementations • 20 Jan 2014 • Frank Nielsen

When no cost incurs for correct classification and unit cost is charged for misclassification, Bayes' test reduces to the maximum a posteriori decision rule, and Bayes risk simplifies to Bayes' error, the probability of error.
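
The sentence above can be illustrated on a small discrete problem. Under the 0-1 cost, the maximum a posteriori (MAP) rule picks the class with the largest joint probability $p(c)\,p(x \mid c)$, and the Bayes error is one minus the probability of a correct MAP decision. The function names and the toy numbers below are assumptions made for this example.

```python
def map_decision(priors, likelihoods, x):
    """Index of the class maximizing prior * likelihood at outcome x."""
    joint = [pc * lik[x] for pc, lik in zip(priors, likelihoods)]
    return max(range(len(joint)), key=joint.__getitem__)


def bayes_error(priors, likelihoods):
    """Probability of error of the MAP rule for discrete observations.

    likelihoods[c][x] is p(x | class c); every row covers the same outcomes.
    """
    n_outcomes = len(likelihoods[0])
    # Probability of a correct decision: for each outcome, the MAP rule
    # keeps the largest joint probability among the classes.
    correct = sum(
        max(pc * lik[x] for pc, lik in zip(priors, likelihoods))
        for x in range(n_outcomes)
    )
    return 1.0 - correct


if __name__ == "__main__":
    priors = [0.5, 0.5]
    likelihoods = [[0.9, 0.1], [0.2, 0.8]]  # toy class-conditional tables
    print([map_decision(priors, likelihoods, x) for x in range(2)])
    print(bayes_error(priors, likelihoods))
```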

no code implementations • 29 Mar 2013 • Frank Nielsen

Clustering histograms can be performed using the celebrated $k$-means centroid-based algorithm.

no code implementations • NeurIPS 2008 • Richard Nock, Frank Nielsen

Bartlett et al. (2006) recently proved that a ground condition for convex surrogates, classification calibration, ties up the minimization of the surrogates and classification risks, and left as an important problem the algorithmic questions about the minimization of these surrogates.
