Search Results for author: Frank Nielsen

Found 73 papers, 10 papers with code

Approximation and bounding techniques for the Fisher-Rao distances

no code implementations15 Mar 2024 Frank Nielsen

Uniparametric and biparametric statistical models always have Fisher Hessian metrics, and in general a simple test allows one to check whether the Fisher information matrix yields a Hessian metric or not.

Tempered Calculus for ML: Application to Hyperbolic Model Embedding

no code implementations6 Feb 2024 Richard Nock, Ehsan Amid, Frank Nielsen, Alexander Soen, Manfred K. Warmuth

Most mathematical distortions used in ML are fundamentally integral in nature: $f$-divergences, Bregman divergences, (regularized) optimal transport distances, integral probability metrics, geodesic distances, etc.

Divergences induced by dual subtractive and divisive normalizations of exponential families and their convex deformations

no code implementations20 Dec 2023 Frank Nielsen

Exponential families are statistical models that serve as workhorses in statistics, information theory, and machine learning, among other fields.

The Tempered Hilbert Simplex Distance and Its Application To Non-linear Embeddings of TEMs

no code implementations22 Nov 2023 Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth

Tempered Exponential Measures (TEMs) are a parametric generalization of the exponential family of distributions maximizing the tempered entropy function among positive measures subject to a probability normalization of their power densities.

Optimal Transport with Tempered Exponential Measures

no code implementations7 Sep 2023 Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth

In the field of optimal transport, two prominent subfields face each other: (i) unregularized optimal transport, "à la Kantorovich", which leads to extremely sparse plans but with algorithms that scale poorly, and (ii) entropic-regularized optimal transport, "à la Sinkhorn-Cuturi", which gets near-linear approximation algorithms but leads to maximally un-sparse plans.
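
For context, a minimal numpy sketch of the Sinkhorn side of this trade-off (the scaling iterations of entropic-regularized OT); the cost matrix C, marginals a and b, regularization eps, and iteration count are illustrative placeholders, not the paper's tempered-exponential-measure algorithm:

import numpy as np

def sinkhorn_plan(a, b, C, eps=0.1, n_iters=500):
    # Entropic-regularized OT ("a la Sinkhorn-Cuturi"): alternately rescale
    # K = exp(-C/eps) so the plan's row/column sums match the marginals a, b.
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)   # fit column marginals
        u = a / (K @ v)     # fit row marginals
    return u[:, None] * K * v[None, :]  # dense ("maximally un-sparse") plan

As eps -> 0 the plan approaches a sparse Kantorovich solution, but the iterations converge ever more slowly, which is exactly the tension the abstract describes.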

Fisher-Rao distance and pullback SPD cone distances between multivariate normal distributions

no code implementations20 Jul 2023 Frank Nielsen

We show that the projective Hilbert distance on the cone yields a metric on the embedded normal submanifold, and we pull back that cone distance with its associated straight-line Hilbert cone geodesics to obtain a distance and smooth paths between normal distributions.

Clustering

Product Jacobi-Theta Boltzmann machines with score matching

no code implementations10 Mar 2023 Andrea Pasquale, Daniel Krefl, Stefano Carrazza, Frank Nielsen

The estimation of probability density functions is a non-trivial task that has, in recent years, been tackled with machine learning techniques.

Simplifying Momentum-based Positive-definite Submanifold Optimization with Applications to Deep Learning

1 code implementation20 Feb 2023 Wu Lin, Valentin Duruisseaux, Melvin Leok, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt

Riemannian submanifold optimization with momentum is computationally challenging because, to ensure that the iterates remain on the submanifold, we often need to solve difficult differential equations.

A numerical approximation method for the Fisher-Rao distance between multivariate normal distributions

no code implementations16 Feb 2023 Frank Nielsen

We consider experimentally the linear interpolation curves in the ordinary, natural and expectation parameterizations of the normal distributions, and compare these curves with a curve derived from Calvo and Oller's isometric embedding of the Fisher-Rao $d$-variate normal manifold into the cone of $(d+1)\times (d+1)$ symmetric positive-definite matrices [Journal of Multivariate Analysis 35.2 (1990): 223-242].
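
A small numpy/scipy sketch of the Calvo-Oller embedding mentioned above; the distance below is the affine-invariant SPD distance on the embedded matrices, which lower-bounds the Fisher-Rao distance up to a metric-scaling constant that is omitted here (conventions vary):

import numpy as np
from scipy.linalg import sqrtm, logm

def calvo_oller_embed(mu, Sigma):
    # Embed N(mu, Sigma) as an SPD matrix of size (d+1) x (d+1).
    d = len(mu)
    P = np.empty((d + 1, d + 1))
    P[:d, :d] = Sigma + np.outer(mu, mu)
    P[:d, d] = P[d, :d] = mu
    P[d, d] = 1.0
    return P

def spd_affine_invariant_distance(P1, P2):
    # d(P1, P2) = || log(P1^{-1/2} P2 P1^{-1/2}) ||_F on the SPD cone.
    S = sqrtm(np.linalg.inv(P1))
    return np.linalg.norm(logm(S @ P2 @ S), "fro")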

Variational Representations of Annealing Paths: Bregman Information under Monotonic Embedding

no code implementations15 Sep 2022 Rob Brekelmans, Frank Nielsen

Markov Chain Monte Carlo methods for sampling from complex distributions and estimating normalization constants often simulate samples from a sequence of intermediate distributions along an annealing path, which bridges between a tractable initial distribution and a target density of interest.

On the Influence of Enforcing Model Identifiability on Learning dynamics of Gaussian Mixture Models

no code implementations17 Jun 2022 Pascal Mattia Esser, Frank Nielsen

A common way to learn and analyze statistical models is to consider operations in the model parameter space.

Non-linear Embeddings in Hilbert Simplex Geometry

no code implementations22 Mar 2022 Frank Nielsen, Ke Sun

A key technique of machine learning and computer vision is to embed discrete weighted graphs into continuous spaces for further downstream processing.

Towards Modeling and Resolving Singular Parameter Spaces using Stratifolds

no code implementations7 Dec 2021 Pascal Mattia Esser, Frank Nielsen

We empirically show that using (natural) gradient descent on the smooth manifold approximation instead of the singular space allows us to avoid the attractor behavior and therefore improve the convergence speed in learning.

cCorrGAN: Conditional Correlation GAN for Learning Empirical Conditional Distributions in the Elliptope

no code implementations22 Jul 2021 Gautier Marti, Victor Goubet, Frank Nielsen

We propose a methodology to approximate conditional distributions in the elliptope of correlation matrices based on conditional generative adversarial networks.

Structured second-order methods via natural gradient descent

no code implementations22 Jul 2021 Wu Lin, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt

In this paper, we propose new structured second-order methods and structured adaptive-gradient methods obtained by performing natural-gradient descent on structured parameter spaces.

Second-order methods

Fast approximations of the Jeffreys divergence between univariate Gaussian mixture models via exponential polynomial densities

no code implementations13 Jul 2021 Frank Nielsen

Since the Jeffreys divergence between Gaussian mixture models is not available in closed-form, various techniques with pros and cons have been proposed in the literature to either estimate, approximate, or lower and upper bound this divergence.

Model Selection
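
One of the baseline techniques alluded to is plain Monte Carlo estimation; below is a self-contained sketch for univariate Gaussian mixtures, assuming scipy (the paper's contribution is faster approximations via exponential polynomial densities, not this estimator). Each mixture is a (weights, means, standard deviations) triple:

import numpy as np
from scipy.stats import norm

def gmm_pdf(x, w, mu, s):
    return sum(wi * norm.pdf(x, mi, si) for wi, mi, si in zip(w, mu, s))

def gmm_sample(n, w, mu, s, rng):
    idx = rng.choice(len(w), size=n, p=w)
    return rng.normal(np.asarray(mu)[idx], np.asarray(s)[idx])

def jeffreys_mc(gmm1, gmm2, n=100_000, seed=0):
    # J(p, q) = KL(p:q) + KL(q:p), each KL estimated from samples.
    rng = np.random.default_rng(seed)
    x, y = gmm_sample(n, *gmm1, rng), gmm_sample(n, *gmm2, rng)
    return (np.mean(np.log(gmm_pdf(x, *gmm1) / gmm_pdf(x, *gmm2)))
            + np.mean(np.log(gmm_pdf(y, *gmm2) / gmm_pdf(y, *gmm1))))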

q-Paths: Generalizing the Geometric Annealing Path using Power Means

1 code implementation1 Jul 2021 Vaden Masrani, Rob Brekelmans, Thang Bui, Frank Nielsen, Aram Galstyan, Greg Ver Steeg, Frank Wood

Many common machine learning methods involve the geometric annealing path, a sequence of intermediate densities between two distributions of interest constructed using the geometric average.

Bayesian Inference
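
The q-path replaces the geometric average of the endpoint densities by a power mean of order $1-q$; a numpy sketch in log space, assuming precomputed log-densities logp0 and logp1 and $t \in (0, 1)$:

import numpy as np

def q_path_logdensity(logp0, logp1, t, q):
    # Unnormalized q-path: [(1-t) p0^(1-q) + t p1^(1-q)]^(1/(1-q)).
    # As q -> 1 this recovers the geometric path (1-t) log p0 + t log p1.
    if np.isclose(q, 1.0):
        return (1 - t) * logp0 + t * logp1
    a = 1.0 - q
    return np.logaddexp(np.log1p(-t) + a * logp0, np.log(t) + a * logp1) / a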

On a Variational Definition for the Jensen-Shannon Symmetrization of Distances based on the Information Radius

no code implementations19 Feb 2021 Frank Nielsen

We generalize the Jensen-Shannon divergence by considering a variational definition with respect to a generic mean, thereby extending the notion of Sibson's information radius.

Quantization Information Theory

Tractable structured natural gradient descent using local parameterizations

no code implementations15 Feb 2021 Wu Lin, Frank Nielsen, Mohammad Emtiyaz Khan, Mark Schmidt

Natural-gradient descent (NGD) on structured parameter spaces (e.g., low-rank covariances) is computationally challenging due to difficult Fisher-matrix computations.

Variational Inference

On $f$-divergences between Cauchy distributions

no code implementations29 Jan 2021 Frank Nielsen, Kazuki Okamura

We prove that the $f$-divergences between univariate Cauchy distributions are all symmetric, and can be expressed as strictly increasing scalar functions of the symmetric chi-squared divergence.

Information Theory Statistics Theory
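
A worked instance, assuming the closed forms reported in this line of work: the chi-squared divergence between Cauchy densities with location-scale parameters $(l_1, s_1)$ and $(l_2, s_2)$ is symmetric, and the KL divergence is the strictly increasing function $\log(1 + \chi^2/2)$ of it:

import numpy as np

def chi2_cauchy(l1, s1, l2, s2):
    # Symmetric chi-squared divergence between two Cauchy distributions.
    return ((l1 - l2) ** 2 + (s1 - s2) ** 2) / (2 * s1 * s2)

def kl_cauchy(l1, s1, l2, s2):
    # KL = log(1 + chi2/2): increasing in chi2, hence also symmetric.
    return np.log1p(chi2_cauchy(l1, s1, l2, s2) / 2)

assert np.isclose(kl_cauchy(0, 1, 2, 3), kl_cauchy(2, 3, 0, 1))  # symmetry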

On information projections between multivariate elliptical and location-scale families

no code implementations11 Jan 2021 Frank Nielsen

We study information projections with respect to statistical $f$-divergences between any two location-scale families.

Information Theory

Likelihood Ratio Exponential Families

no code implementations NeurIPS Workshop DL-IG 2020 Rob Brekelmans, Frank Nielsen, Alireza Makhzani, Aram Galstyan, Greg Ver Steeg

The exponential family is well known in machine learning and statistical physics as the maximum entropy distribution subject to a set of observed constraints, while the geometric mixture path is common in MCMC methods such as annealed importance sampling.

LEMMA

Annealed Importance Sampling with q-Paths

2 code implementations NeurIPS Workshop DL-IG 2020 Rob Brekelmans, Vaden Masrani, Thang Bui, Frank Wood, Aram Galstyan, Greg Ver Steeg, Frank Nielsen

Annealed importance sampling (AIS) is the gold standard for estimating partition functions or marginal likelihoods, corresponding to importance sampling over a path of distributions between a tractable base and an unnormalized target.

On Voronoi diagrams and dual Delaunay complexes on the information-geometric Cauchy manifolds

no code implementations12 Jun 2020 Frank Nielsen

We prove that the Voronoi diagrams of the Fisher-Rao distance, the chi-squared divergence, and the Kullback-Leibler divergence all coincide with a hyperbolic Voronoi diagram on the corresponding Cauchy location-scale parameters, and that the dual Cauchy hyperbolic Delaunay complexes are Fisher orthogonal to the Cauchy hyperbolic Voronoi diagrams.

Cumulant-free closed-form formulas for some common (dis)similarities between densities of an exponential family

no code implementations5 Mar 2020 Frank Nielsen, Richard Nock

It is well-known that the Bhattacharyya, Hellinger, Kullback-Leibler, $\alpha$-divergences, and Jeffreys' divergences between densities belonging to a same exponential family have generic closed-form formulas relying on the strictly convex and real-analytic cumulant function characterizing the exponential family.
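
As a concrete instance of these classical closed forms: for two densities $p_{\theta_1}$ and $p_{\theta_2}$ of the same exponential family with cumulant function $F$, the Bhattacharyya distance reduces to a Jensen divergence on the natural parameters, $D_B(p_{\theta_1}, p_{\theta_2}) = \frac{F(\theta_1)+F(\theta_2)}{2} - F\left(\frac{\theta_1+\theta_2}{2}\right)$, and the Kullback-Leibler divergence reduces to the dual Bregman divergence $B_F(\theta_2 : \theta_1)$; the paper's point is to bypass $F$ when it is not available in closed form.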

Schoenberg-Rao distances: Entropy-based and geometry-aware statistical Hilbert distances

1 code implementation19 Feb 2020 Gaëtan Hadjeres, Frank Nielsen

Distances between probability distributions that take into account the geometry of their sample space, like the Wasserstein or Maximum Mean Discrepancy (MMD) distances, have received a lot of attention in machine learning because they can, for instance, be used to compare probability distributions with disjoint supports.

BIG-bench Machine Learning Density Estimation

Information-Geometric Set Embeddings (IGSE): From Sets to Probability Distributions

no code implementations27 Nov 2019 Ke Sun, Frank Nielsen

This letter introduces an abstract learning problem called the "set embedding": the objective is to map sets into probability distributions while losing as little information as possible.

On geodesic triangles with right angles in a dually flat space

no code implementations9 Oct 2019 Frank Nielsen

The dualistic structure of statistical manifolds in information geometry yields eight types of geodesic triangles passing through three given points, the triangle vertices.

A note on the quasiconvex Jensen divergences and the quasiconvex Bregman divergences derived thereof

no code implementations19 Sep 2019 Frank Nielsen, Gaëtan Hadjeres

We then define the strictly quasiconvex Bregman divergences as the limit case of scaled and skewed quasiconvex Jensen divergences, and report a simple closed-form formula which shows that these divergences are only pseudo-divergences at countably many inflection points of the generators.

A Geometric Modeling of Occam's Razor in Deep Learning

no code implementations27 May 2019 Ke Sun, Frank Nielsen

Why do deep neural networks (DNNs) benefit from very high dimensional parameter spaces?

On a generalization of the Jensen-Shannon divergence and the JS-symmetrization of distances relying on abstract means

no code implementations8 Apr 2019 Frank Nielsen

The Jensen-Shannon divergence is a renown bounded symmetrization of the unbounded Kullback-Leibler divergence which measures the total Kullback-Leibler divergence to the average mixture distribution.

Clustering
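
For reference, the baseline definition being generalized, as a small numpy sketch (the paper's variational definition swaps the arithmetic mixture $m$ for a generic abstract mean):

import numpy as np

def kl(p, q):
    # Discrete KL divergence between probability vectors p and q.
    return np.sum(p * np.log(p / q))

def jsd(p, q):
    # JSD(p, q) = (KL(p:m) + KL(q:m)) / 2 with m the arithmetic mixture;
    # it is bounded (by log 2, in nats) even though KL is unbounded.
    m = (p + q) / 2
    return (kl(p, m) + kl(q, m)) / 2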

On power chi expansions of $f$-divergences

no code implementations14 Mar 2019 Frank Nielsen, Gaëtan Hadjeres

We consider both finite and infinite power chi expansions of $f$-divergences derived from Taylor expansions of smooth generators, and elaborate on cases where these expansions yield closed-form formulas, bounded approximations, or analytic divergence series expressions of $f$-divergences.

Variation Network: Learning High-level Attributes for Controlled Input Manipulation

1 code implementation ICLR 2019 Gaëtan Hadjeres, Frank Nielsen

This paper presents the Variation Network (VarNet), a generative model providing means to manipulate the high-level attributes of a given input.

Vocal Bursts Intensity Prediction

The statistical Minkowski distances: Closed-form formula for Gaussian Mixture Models

no code implementations9 Jan 2019 Frank Nielsen

The traditional Minkowski distances are induced by the corresponding Minkowski norms in real-valued vector spaces.

On The Chain Rule Optimal Transport Distance

no code implementations19 Dec 2018 Frank Nielsen, Ke Sun

We experimentally evaluate our new family of distances by quantifying the upper bounds of several jointly convex distances between statistical mixtures, and by proposing a novel efficient method to learn Gaussian mixture models (GMMs) by simplifying kernel density estimators with respect to our distance.

Geometry and clustering with metrics derived from separable Bregman divergences

no code implementations25 Oct 2018 Erika Gomes-Gonçalves, Henryk Gzyl, Frank Nielsen

Separable Bregman divergences induce Riemannian metric spaces that are isometric to the Euclidean space after monotone embeddings.

Clustering Quantization

The Bregman chord divergence

no code implementations22 Oct 2018 Frank Nielsen, Richard Nock

Distances are fundamental primitives whose choice significantly impacts the performances of algorithms in machine learning and signal processing.

Sinkhorn AutoEncoders

2 code implementations ICLR 2019 Giorgio Patrini, Rianne van den Berg, Patrick Forré, Marcello Carioni, Samarth Bhargav, Max Welling, Tim Genewein, Frank Nielsen

We show that minimizing the p-Wasserstein distance between the generator and the true data distribution is equivalent to the unconstrained min-min optimization of the p-Wasserstein distance between the encoder aggregated posterior and the prior in latent space, plus a reconstruction error.

Probabilistic Programming

An elementary introduction to information geometry

no code implementations17 Aug 2018 Frank Nielsen

In this survey, we describe the fundamental differential-geometric structures of information manifolds, state the fundamental theorem of information geometry, and illustrate some use cases of these information manifolds in information sciences.

Guaranteed Deterministic Bounds on the Total Variation Distance between Univariate Mixtures

no code implementations29 Jun 2018 Frank Nielsen, Ke Sun

The total variation distance is a core statistical distance between probability measures that satisfies the metric axioms, with value always falling in $[0, 1]$.

Two-sample testing
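
For univariate mixtures, the total variation distance can be brute-forced by quadrature; the paper instead derives guaranteed deterministic bounds that avoid this numerical integration. A scipy sketch with illustrative integration limits:

import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def tv_distance(pdf1, pdf2, lo=-50.0, hi=50.0):
    # TV(p, q) = (1/2) * integral of |p(x) - q(x)| dx, always in [0, 1].
    val, _ = quad(lambda x: abs(pdf1(x) - pdf2(x)), lo, hi, limit=200)
    return 0.5 * val

tv = tv_distance(lambda x: norm.pdf(x, 0, 1), lambda x: norm.pdf(x, 1, 2))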

q-Neurons: Neuron Activations based on Stochastic Jackson's Derivative Operators

1 code implementation1 Jun 2018 Frank Nielsen, Ke Sun

We propose a new generic type of stochastic neurons, called $q$-neurons, that considers activation functions based on Jackson's $q$-derivatives with stochastic parameters $q$.
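
The underlying operator, as a sketch: Jackson's $q$-derivative of a function $f$ (a $q$-neuron draws $q$ stochastically around 1, which is not reproduced here):

import numpy as np

def jackson_q_derivative(f, x, q):
    # Jackson's q-derivative: D_q f(x) = (f(qx) - f(x)) / ((q - 1) x),
    # which tends to the ordinary derivative f'(x) as q -> 1.
    return (f(q * x) - f(x)) / ((q - 1) * x)

# Sanity check: D_q of x^2 equals (q + 1) x, recovering 2x as q -> 1.
assert np.isclose(jackson_q_derivative(lambda t: t ** 2, 1.5, 1.001), 3.0, atol=1e-2)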

Monte Carlo Information Geometry: The dually flat case

no code implementations20 Mar 2018 Frank Nielsen, Gaëtan Hadjeres

When equipping a statistical manifold with the KL divergence, the induced manifold structure is dually flat, and the KL divergence between distributions amounts to an equivalent Bregman divergence on their corresponding parameters.

Clustering
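
A numerical check of the KL-Bregman identity stated above, using the Bernoulli family (natural parameter $\theta = \log(p/(1-p))$, cumulant $F(\theta) = \log(1+e^\theta)$), assuming numpy:

import numpy as np

def F(theta):                 # Bernoulli cumulant (log-normalizer)
    return np.log1p(np.exp(theta))

def bregman_F(t2, t1):        # B_F(t2 : t1) = F(t2) - F(t1) - (t2 - t1) F'(t1)
    return F(t2) - F(t1) - (t2 - t1) / (1 + np.exp(-t1))  # F'(t1) = sigmoid(t1)

def kl_bernoulli(p, q):
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

p, q = 0.3, 0.7
t1, t2 = np.log(p / (1 - p)), np.log(q / (1 - q))
assert np.isclose(kl_bernoulli(p, q), bregman_F(t2, t1))  # KL = dual Bregman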

A generalization of the Jensen divergence: The chord gap divergence

no code implementations29 Sep 2017 Frank Nielsen

We introduce a novel family of distances, called the chord gap divergences, that generalizes the Jensen divergences (also called the Burbea-Rao distances), and study its properties.

Clustering

Interactive Music Generation with Positional Constraints using Anticipation-RNNs

no code implementations19 Sep 2017 Gaëtan Hadjeres, Frank Nielsen

We demonstrate its efficiency on the task of generating melodies satisfying positional constraints in the style of the soprano parts of the J. S. Bach chorale harmonizations.

Music Generation

Deep rank-based transposition-invariant distances on musical sequences

no code implementations3 Sep 2017 Gaëtan Hadjeres, Frank Nielsen

These musical sequences belong to a given corpus (or style), and a good distance on musical sequences should take this information into account; defining a distance ex nihilo that applies to all music styles seems implausible.

Information Retrieval Sound

On $w$-mixtures: Finite convex combinations of prescribed component distributions

no code implementations2 Aug 2017 Frank Nielsen, Richard Nock

The information geometry induced by the Bregman generator set to the Shannon negentropy on this space yields a dually flat space called the mixture family manifold.

Relative Fisher Information and Natural Gradient for Learning Large Modular Models

no code implementations ICML 2017 Ke Sun, Frank Nielsen

Fisher information and natural gradient have provided deep insights and powerful tools for artificial neural networks.

GLSR-VAE: Geodesic Latent Space Regularization for Variational AutoEncoder Architectures

no code implementations14 Jul 2017 Gaëtan Hadjeres, Frank Nielsen, François Pachet

VAEs (Variational AutoEncoders) have proved to be powerful in the context of density modeling and have been used in a variety of contexts for creative purposes.

Music Generation

Evolving a Vector Space with any Generating Set

no code implementations10 Apr 2017 Richard Nock, Frank Nielsen

In Valiant's model of evolution, a class of representations is evolvable iff a polynomial-time process of random mutations guided by selection converges with high probability to a representation as $\epsilon$-close as desired from the optimal one, for any required $\epsilon>0$.

Clustering in Hilbert simplex geometry

no code implementations3 Apr 2017 Frank Nielsen, Ke Sun

In the Hilbert simplex geometry, the distance is the non-separable Hilbert's metric distance which satisfies the property of information monotonicity with distance level set functions described by polytope boundaries.

Clustering
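
The distance in question has a simple expression on the open probability simplex; a numpy sketch (note it is non-separable: the max and min couple all coordinates):

import numpy as np

def hilbert_simplex_distance(p, q):
    # Hilbert's projective metric on the open simplex:
    # d(p, q) = log(max_i p_i/q_i) - log(min_i p_i/q_i).
    r = np.asarray(p) / np.asarray(q)
    return np.log(r.max() / r.min())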

A review of two decades of correlations, hierarchies, networks and clustering in financial markets

no code implementations1 Mar 2017 Gautier Marti, Frank Nielsen, Mikołaj Bińkowski, Philippe Donnat

We review the state of the art of clustering financial time series and the study of their correlations alongside other interaction networks.

BIG-bench Machine Learning Clustering +3

On Hölder projective divergences

no code implementations14 Jan 2017 Frank Nielsen, Ke Sun, Stéphane Marchand-Maillet

We describe a framework to build distances by measuring the tightness of inequalities, and introduce the notion of proper statistical divergences and improper pseudo-divergences.

Clustering

A series of maximum entropy upper bounds of the differential entropy

no code implementations9 Dec 2016 Frank Nielsen, Richard Nock

We present a series of closed-form maximum entropy upper bounds for the differential entropy of a continuous univariate random variable and study the properties of that series.

BIG-bench Machine Learning
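
The first bound of such a series is the classical Gaussian maximum-entropy bound: any continuous random variable $X$ with variance $\sigma^2$ satisfies $h(X) \leq \frac{1}{2}\log(2\pi e \sigma^2)$, with equality exactly for Gaussians; the paper's series tightens this by constraining additional moments.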

DeepBach: a Steerable Model for Bach Chorales Generation

5 code implementations ICML 2017 Gaëtan Hadjeres, François Pachet, Frank Nielsen

This paper introduces DeepBach, a graphical model aimed at modeling polyphonic music and specifically hymn-like pieces.

Exploring and measuring non-linear correlations: Copulas, Lightspeed Transportation and Clustering

1 code implementation30 Oct 2016 Gautier Marti, Sebastien Andler, Frank Nielsen, Philippe Donnat

We propose a methodology to explore and measure the pairwise correlations that exist between variables in a dataset.

Clustering

Large Margin Nearest Neighbor Classification using Curved Mahalanobis Distances

no code implementations22 Sep 2016 Frank Nielsen, Boris Muzellec, Richard Nock

We consider the supervised classification problem of machine learning in Cayley-Klein projective geometries: We show how to learn a curved Mahalanobis metric distance corresponding to either the hyperbolic geometry or the elliptic geometry using the Large Margin Nearest Neighbor (LMNN) framework.

BIG-bench Machine Learning Classification +1

Tsallis Regularized Optimal Transport and Ecological Inference

1 code implementation15 Sep 2016 Boris Muzellec, Richard Nock, Giorgio Patrini, Frank Nielsen

We also present the first application of optimal transport to the problem of ecological inference, that is, the reconstruction of joint distributions from their marginals, a problem of large interest in the social sciences.

Relative Natural Gradient for Learning Large Complex Models

no code implementations20 Jun 2016 Ke Sun, Frank Nielsen

Fisher information and natural gradient have provided deep insights and powerful tools for artificial neural networks.

Guaranteed bounds on the Kullback-Leibler divergence of univariate mixtures using piecewise log-sum-exp inequalities

no code implementations19 Jun 2016 Frank Nielsen, Ke Sun

Information-theoretic measures such as the entropy, cross-entropy, and the Kullback-Leibler divergence between two mixture models are core primitives in many signal processing tasks.

Fast $(1+\epsilon)$-approximation of the Löwner extremal matrices of high-dimensional symmetric matrices

no code implementations6 Apr 2016 Frank Nielsen, Richard Nock

Matrix data sets are common nowadays, for instance in biomedical imaging, where the Diffusion Tensor Magnetic Resonance Imaging (DT-MRI) modality produces data sets of 3D symmetric positive-definite matrices anchored at voxel positions, capturing the anisotropic diffusion properties of water molecules in biological tissues.

Clustering

SSSC-AM: A Unified Framework for Video Co-Segmentation by Structured Sparse Subspace Clustering with Appearance and Motion Features

no code implementations14 Mar 2016 Junlin Yao, Frank Nielsen

State-of-the-art methods via subspace clustering seek to solve the problem in two steps: First, an affinity matrix is built from data, with appearance features or motion patterns.

Clustering Segmentation

Clustering Financial Time Series: How Long is Enough?

no code implementations13 Mar 2016 Gautier Marti, Sébastien Andler, Frank Nielsen, Philippe Donnat

Researchers have used from 30 days to several years of daily returns as source data for clustering financial time series based on their correlations.

Clustering Time Series +1

Loss factorization, weakly supervised learning and label noise robustness

no code implementations8 Feb 2016 Giorgio Patrini, Frank Nielsen, Richard Nock, Marcello Carioni

We prove that the empirical risk of most well-known loss functions factors into a linear term aggregating all labels with a term that is label-free, and can further be expressed by sums of the loss.

Generalization Bounds Weakly-supervised Learning

Image and Information

no code implementations3 Feb 2016 Frank Nielsen

But more precisely, what do we mean by information in images?

k-variates++: more pluses in the k-means++

no code implementations3 Feb 2016 Richard Nock, Raphaël Canyasse, Roksana Boreli, Frank Nielsen

For either the specific frameworks considered here, or for the differential privacy setting, there are few to no prior results on the direct application of k-means++ and its approximation bounds: state-of-the-art contenders appear to be significantly more complex and/or display less favorable (approximation) properties.

Clustering

Optimal Copula Transport for Clustering Multivariate Time Series

no code implementations27 Sep 2015 Gautier Marti, Frank Nielsen, Philippe Donnat

This paper presents a new methodology for clustering multivariate time series leveraging optimal transport between copulas.

Clustering Clustering Multivariate Time Series +2

Further heuristics for $k$-means: The merge-and-split heuristic and the $(k,l)$-means

no code implementations23 Jun 2014 Frank Nielsen, Richard Nock

This novel heuristic can improve Hartigan's $k$-means when it has converged to a local minimum.

Clustering

Optimal interval clustering: Application to Bregman clustering and statistical mixture learning

no code implementations11 Mar 2014 Frank Nielsen, Richard Nock

We present a generic dynamic programming method to compute the optimal clustering of $n$ scalar elements into $k$ pairwise disjoint intervals.

Clustering Model Selection
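
A compact sketch of such a dynamic program, specialized for illustration (not the paper's generic Bregman version) to the squared-error cost, assuming numpy; it runs in $O(n^2 k)$ time with $O(1)$ per-interval costs via prefix sums:

import numpy as np

def optimal_interval_clustering_cost(x, k):
    # D[i][j] = best cost of clustering the first i sorted points into j
    # contiguous intervals, here under the squared-error (k-means) cost.
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    ps = np.concatenate(([0.0], np.cumsum(x)))
    ps2 = np.concatenate(([0.0], np.cumsum(x ** 2)))
    def sse(i, j):  # squared error of the interval x[i:j]
        s, m = ps[j] - ps[i], j - i
        return (ps2[j] - ps2[i]) - s * s / m
    D = np.full((n + 1, k + 1), np.inf)
    D[0, 0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            D[i, j] = min(D[t, j - 1] + sse(t, i) for t in range(j - 1, i))
    return D[n, k]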

Generalized Bhattacharyya and Chernoff upper bounds on Bayes error using quasi-arithmetic means

no code implementations20 Jan 2014 Frank Nielsen

When no cost is incurred for correct classification and a unit cost is charged for misclassification, Bayes' test reduces to the maximum a posteriori decision rule, and the Bayes risk simplifies to Bayes' error, the probability of misclassification.

General Classification

On the symmetrical Kullback-Leibler Jeffreys centroids

no code implementations29 Mar 2013 Frank Nielsen

Clustering histograms can be performed using the celebrated $k$-means centroid-based algorithm.

Clustering
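
A scipy sketch of the closed-form Jeffreys centroid for positive histograms associated with this paper; this is my reading of the paper's coordinate-wise Lambert $W$ formula, so treat the exact constant inside W as an assumption. Rows of H are the histograms:

import numpy as np
from scipy.special import lambertw

def jeffreys_positive_centroid(H):
    # Coordinate-wise closed form c_i = a_i / W(e * a_i / g_i), with a and g
    # the arithmetic and geometric means of the (strictly positive) bins.
    H = np.asarray(H, dtype=float)
    a = H.mean(axis=0)
    g = np.exp(np.log(H).mean(axis=0))
    return a / lambertw(np.e * a / g).real

For frequency (normalized) histograms, normalizing this positive centroid yields a guaranteed tight approximation, per the paper.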

On the Efficient Minimization of Classification Calibrated Surrogates

no code implementations NeurIPS 2008 Richard Nock, Frank Nielsen

Bartlett et al. (2006) recently proved that a ground condition for convex surrogates, classification calibration, ties together the minimization of the surrogate and classification risks, and left open as an important problem the algorithmic questions about the minimization of these surrogates.

Classification General Classification
