Search Results for author: Romain Couillet

Found 43 papers, 13 papers with code

Asymptotic Bayes risk of semi-supervised learning with uncertain labeling

no code implementations 26 Mar 2024 Victor Leger, Romain Couillet

This article considers a semi-supervised classification setting on a Gaussian mixture model, where the data are not given hard labels as usual, but instead carry uncertain labels.

A Large Dimensional Analysis of Multi-task Semi-Supervised Learning

no code implementations 21 Feb 2024 Victor Leger, Romain Couillet

This article conducts a large dimensional study of a simple yet quite versatile classification model, encompassing at once multi-task and semi-supervised learning, and taking into account uncertain labeling.

Asymptotic Gaussian Fluctuations of Eigenvectors in Spectral Clustering

no code implementations 19 Feb 2024 Hugo Lebeau, Florent Chatelain, Romain Couillet

The performance of spectral clustering relies on the fluctuations of the entries of the eigenvectors of a similarity matrix, which has been left uncharacterized until now.

Clustering
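
To fix ideas, here is a minimal sketch of the spectral clustering pipeline on a toy two-class Gaussian mixture: the class assignment is read off the entries of the leading eigenvector of the similarity matrix, i.e. precisely the quantities whose Gaussian fluctuations the paper characterizes. The Gram-type similarity, dimensions, and mixture strength below are illustrative assumptions, not the paper's exact setting.

```python
# Minimal spectral clustering sketch (illustrative, not the paper's code):
# clusters are read off the sign of the entries of the leading eigenvector
# of the similarity matrix.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)

# Toy two-class Gaussian mixture in dimension p (assumed setup).
p, n = 100, 200
mu = 2 * np.ones(p) / np.sqrt(p)
labels = rng.integers(0, 2, n)
X = np.where(labels[:, None] == 1, mu, -mu) + rng.standard_normal((n, p))

# Inner-product (Gram) similarity matrix.
K = X @ X.T / p

# Leading eigenvector of the similarity matrix.
_, vecs = eigh(K)
u = vecs[:, -1]

# Cluster by the sign of its entries.
pred = (u > 0).astype(int)
acc = max(np.mean(pred == labels), np.mean(pred != labels))
print(f"clustering accuracy: {acc:.2f}")
```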

A Random Matrix Approach to Low-Multilinear-Rank Tensor Approximation

no code implementations 5 Feb 2024 Hugo Lebeau, Florent Chatelain, Romain Couillet

This work presents a comprehensive understanding of the estimation of a planted low-rank signal from a general spiked tensor model near the computational threshold.

Asymptotic Bayes risk of semi-supervised multitask learning on Gaussian mixture

1 code implementation 3 Mar 2023 Minh-Toan Nguyen, Romain Couillet

In the supervised case, we derive a simple algorithm that attains the Bayes optimal performance.

When Random Tensors meet Random Matrices

no code implementations 23 Dec 2021 Mohamed El Amine Seddik, Maxime Guillaud, Romain Couillet

Relying on random matrix theory (RMT), this paper studies asymmetric order-$d$ spiked tensor models with Gaussian noise.


PCA-based Multi Task Learning: a Random Matrix Approach

no code implementations 1 Nov 2021 Malik Tiomoko, Romain Couillet, Frédéric Pascal

The article proposes and theoretically analyses a \emph{computationally efficient} multi-task learning (MTL) extension of popular principal component analysis (PCA)-based supervised learning schemes (Barshan et al., 2011; Bair et al., 2006).

Multi-Task Learning

Multi-task learning on the edge: cost-efficiency and theoretical optimality

1 code implementation 9 Oct 2021 Sami Fakhry, Romain Couillet, Malik Tiomoko

This article proposes a distributed multi-task learning (MTL) algorithm based on supervised principal component analysis (SPCA) which is: (i) theoretically optimal for Gaussian mixtures, (ii) computationally cheap and scalable.

Multi-Task Learning

Random matrices in service of ML footprint: ternary random features with no performance loss

2 code implementations ICLR 2022 Hafiz Tiomoko Ali, Zhenyu Liao, Romain Couillet

As a result, for any kernel matrix ${\bf K}$ of the form above, we propose a novel random features technique, called Ternary Random Feature (TRF), that (i) asymptotically yields the same limiting kernel as the original ${\bf K}$ in a spectral sense and (ii) can be computed and stored much more efficiently, by wisely tuning (in a data-dependent manner) the function $\sigma$ and the random vector ${\bf w}$, both taking values in $\{-1, 0, 1\}$.

Quantization
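
As a rough illustration of the idea, the sketch below builds features in which both the random weights and the activation outputs take values in $\{-1, 0, 1\}$. The sparsity level and dead-zone threshold are placeholder values; the paper instead tunes $\sigma$ and ${\bf w}$ in a data-dependent manner so that the limiting kernel matches the original ${\bf K}$.

```python
# Sketch of ternary random features (TRF): weights W and activation outputs
# both live in {-1, 0, 1}, so the feature matrix is storable in 2 bits per
# entry. The sparsity and threshold below are illustrative placeholders, not
# the paper's data-dependent tuning.
import numpy as np

rng = np.random.default_rng(0)

def ternary_weights(n_features, p, sparsity=0.5):
    """Random matrix with i.i.d. entries in {-1, 0, 1}."""
    return rng.choice([-1, 0, 1], size=(n_features, p),
                      p=[(1 - sparsity) / 2, sparsity, (1 - sparsity) / 2])

def ternary_activation(Z, thresh=1.0):
    """Entry-wise sign with a dead zone: maps any real entry to {-1, 0, 1}."""
    return np.sign(Z) * (np.abs(Z) > thresh)

p, n, N = 50, 100, 512
X = rng.standard_normal((n, p)) / np.sqrt(p)

W = ternary_weights(N, p)
S = ternary_activation(W @ X.T)  # N x n ternary feature matrix
K_trf = S.T @ S / N              # kernel approximated by the ternary features
print(K_trf.shape)               # (n, n)
```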

Spectral properties of sample covariance matrices arising from random matrices with independent non identically distributed columns

no code implementations 6 Sep 2021 Cosme Louart, Romain Couillet

Given a random matrix $X= (x_1,\ldots, x_n)\in \mathcal M_{p, n}$ with independent columns and satisfying concentration of measure hypotheses and a parameter $z$ whose distance to the spectrum of $\frac{1}{n} XX^T$ should not depend on $p, n$, it was previously shown that the functionals $\text{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T- zI_p)^{-1}$ and $A\in \mathcal M_{p}$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt n)$.

A Random Matrix Perspective on Random Tensors

no code implementations 2 Aug 2021 José Henrique de Morais Goulart, Romain Couillet, Pierre Comon

A numerical verification provides evidence that the same holds for orders 4 and 5, leading us to conjecture that, for any order, our fixed-point equation is equivalent to the known characterization of the ML estimation performance that had been obtained by relying on spin glasses.

Community Detection

Nishimori meets Bethe: a spectral method for node classification in sparse weighted graphs

1 code implementation 5 Mar 2021 Lorenzo Dall'Amico, Romain Couillet, Nicolas Tremblay

This article unveils a new relation between the Nishimori temperature parametrizing a distribution P and the Bethe free energy on random Erdos-Renyi graphs with edge weights distributed according to P. Since estimating the Nishimori temperature is a task of major importance in Bayesian inference problems, a numerical method is proposed, as a practical corollary of this new relation, to accurately estimate it from the eigenvalues of the Bethe Hessian matrix of the weighted graph.

Bayesian Inference General Classification
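
For intuition, the sketch below performs spectral node classification with the standard unweighted Bethe Hessian $H(r) = (r^2 - 1)I + D - rA$ and the heuristic choice $r \approx \sqrt{\text{average degree}}$; the paper instead works with a weighted variant of the Bethe Hessian whose parameter is set via the estimated Nishimori temperature.

```python
# Minimal Bethe Hessian community detection sketch. H(r) = (r^2 - 1) I + D - r A
# is the standard unweighted form; r below is a heuristic placeholder, whereas
# the paper sets the parameter of a weighted variant via the Nishimori
# temperature.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)

# Toy sparse two-community graph (stochastic block model).
n, c_in, c_out = 400, 8.0, 2.0
labels = rng.integers(0, 2, n)
P = np.where(labels[:, None] == labels[None, :], c_in / n, c_out / n)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T  # symmetric adjacency, no self-loops

D = np.diag(A.sum(axis=1))
r = np.sqrt(A.sum() / n)  # heuristic: square root of the average degree

H = (r ** 2 - 1) * np.eye(n) + D - r * A
_, vecs = eigh(H)
u = vecs[:, 1]  # second smallest eigenvector carries the community structure

pred = (u > 0).astype(int)
acc = max(np.mean(pred == labels), np.mean(pred != labels))
print(f"community recovery accuracy: {acc:.2f}")
```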

Two-way kernel matrix puncturing: towards resource-efficient PCA and spectral clustering

1 code implementation 24 Feb 2021 Romain Couillet, Florent Chatelain, Nicolas Le Bihan

The article introduces an elementary cost and storage reduction method for spectral clustering and principal component analysis.

Clustering
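
A minimal sketch of the two-way puncturing idea, assuming Bernoulli masks on both the data entries and the off-diagonal kernel entries: the keep-rates and mixture parameters below are illustrative, while the article quantifies how aggressively one can puncture at (asymptotically) no performance loss.

```python
# Sketch of two-way puncturing: keep each data entry w.p. eps_d and each
# off-diagonal kernel entry w.p. eps_k before spectral clustering. All rates
# and mixture parameters are illustrative placeholders.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)

p, n = 200, 400
labels = rng.integers(0, 2, n)
mu = 4 * np.ones(p) / np.sqrt(p)
X = np.where(labels[:, None] == 1, mu, -mu) + rng.standard_normal((n, p))

eps_d, eps_k = 0.5, 0.5  # fractions of entries kept (illustrative)

# Puncture the data matrix entry-wise.
X_p = X * (rng.random((n, p)) < eps_d)

# Gram kernel, then symmetric puncturing of its off-diagonal entries.
K = X_p @ X_p.T / p
B = np.triu(rng.random((n, n)) < eps_k, 1)
K_p = K * (B + B.T + np.eye(n, dtype=bool))

# Spectral clustering on the punctured kernel.
_, vecs = eigh(K_p)
pred = (vecs[:, -1] > 0).astype(int)
acc = max(np.mean(pred == labels), np.mean(pred != labels))
print(f"accuracy after puncturing: {acc:.2f}")
```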

Concentration of measure and generalized product of random vectors with an application to Hanson-Wright-like inequalities

no code implementations 16 Feb 2021 Cosme Louart, Romain Couillet

Starting from concentration of measure hypotheses on $m$ random vectors $Z_1,\ldots, Z_m$, this article provides an expression of the concentration of functionals $\phi(Z_1,\ldots, Z_m)$ where the variations of $\phi$ on each variable depend on the product of the norms (or semi-norms) of the other variables (as if $\phi$ were a product).


Deciphering and Optimizing Multi-Task Learning: a Random Matrix Approach

no code implementations ICLR 2021 Malik Tiomoko, Hafiz Tiomoko Ali, Romain Couillet

This article provides theoretical insights into the inner workings of multi-task and transfer learning methods, by studying the tractable least-square support vector machine multi-task learning (LS-SVM MTL) method, in the limit of large ($p$) and numerous ($n$) data.

Multi-Task Learning

Word Representations Concentrate and This is Good News!

1 code implementation CONLL 2020 Romain Couillet, Yagmur Gizem Cinar, Eric Gaussier, Muhammad Imran

This article establishes that, unlike the legacy tf*idf representation, recent natural language representations (word embedding vectors) tend to exhibit a so-called \textit{concentration of measure phenomenon}, in the sense that, as the representation size $p$ and database size $n$ are both large, their behavior is similar to that of large dimensional Gaussian random vectors.

Sparse Quantized Spectral Clustering

no code implementations ICLR 2021 Zhenyu Liao, Romain Couillet, Michael W. Mahoney

Given a large data matrix, sparsifying, quantizing, and/or performing other entry-wise nonlinear operations can have numerous benefits, ranging from speeding up iterative algorithms for core numerical linear algebra problems to providing nonlinear filters to design state-of-the-art neural network models.

Clustering Quantization

Large Dimensional Analysis and Improvement of Multi Task Learning

no code implementations 3 Sep 2020 Malik Tiomoko, Romain Couillet, Hafiz Tiomoko

Multi Task Learning (MTL) efficiently leverages useful information contained in multiple related tasks to help improve the generalization performance of all tasks.

Multi-Task Learning

A Concentration of Measure and Random Matrix Approach to Large Dimensional Robust Statistics

no code implementations 17 Jun 2020 Cosme Louart, Romain Couillet

This article studies the \emph{robust covariance matrix estimation} of a data collection $X = (x_1,\ldots, x_n)$ with $x_i = \sqrt{\tau_i} z_i + m$, where $z_i \in \mathbb R^p$ is a \textit{concentrated vector} (e.g., an elliptical random vector), $m\in \mathbb R^p$ a deterministic signal and $\tau_i\in \mathbb R$ a scalar perturbation of possibly large amplitude, under the assumption that both $n$ and $p$ are large.

Consistent Semi-Supervised Graph Regularization for High Dimensional Data

no code implementations 13 Jun 2020 Xiaoyi Mai, Romain Couillet

Semi-supervised Laplacian regularization, a standard graph-based approach for learning from both labelled and unlabelled data, was recently demonstrated to extract vanishingly little information from unlabelled data in high dimensions (Mai and Couillet 2018), causing it to be outperformed by its unsupervised counterpart, spectral clustering, given sufficient unlabelled data.

Clustering
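
For reference, the classical baseline under study is the harmonic solution of Laplacian regularization, sketched below on toy data. This is the standard algorithm the article analyzes, not its new consistent variant; the heat-kernel graph and bandwidth are illustrative choices.

```python
# Classical Laplacian-regularized semi-supervised learning: propagate labels
# by solving the harmonic system L_uu f_u = -L_ul y_l. This is the baseline
# the article studies, not its proposed consistent variant.
import numpy as np

rng = np.random.default_rng(0)

n, p, n_lab = 300, 100, 30
labels = rng.integers(0, 2, n)
mu = 2 * np.ones(p) / np.sqrt(p)
X = np.where(labels[:, None] == 1, mu, -mu) + rng.standard_normal((n, p))

# Heat-kernel similarity graph and its Laplacian (bandwidth 2p, illustrative).
norms = (X ** 2).sum(1)
sq = norms[:, None] + norms[None, :] - 2 * X @ X.T
W = np.exp(-sq / (2 * p))
np.fill_diagonal(W, 0)
L = np.diag(W.sum(1)) - W

lab = np.arange(n_lab)         # indices of labelled points
unl = np.arange(n_lab, n)      # indices of unlabelled points
y_lab = 2.0 * labels[lab] - 1  # +-1 targets on the labelled points

# Harmonic solution on the unlabelled points.
f_u = np.linalg.solve(L[np.ix_(unl, unl)], -L[np.ix_(unl, lab)] @ y_lab)
pred = (f_u > 0).astype(int)
print(f"accuracy on unlabelled: {np.mean(pred == labels[unl]):.2f}")
```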

A Random Matrix Analysis of Random Fourier Features: Beyond the Gaussian Kernel, a Precise Phase Transition, and the Corresponding Double Descent

no code implementations NeurIPS 2020 Zhenyu Liao, Romain Couillet, Michael W. Mahoney

This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples $n$, their dimension $p$, and the dimension of feature space $N$ are all large and comparable.

Regression
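
A minimal sketch of RFF ridge regression in this $n, p, N$ regime, assuming Gaussian frequencies (approximating a Gaussian kernel) and an illustrative ridge parameter:

```python
# Random Fourier feature (RFF) ridge regression sketch: cos/sin features of
# Gaussian frequencies approximate a Gaussian kernel. Dimensions, data model,
# and ridge parameter lambda are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n, p, N = 400, 100, 200
X = rng.standard_normal((n, p)) / np.sqrt(p)
beta = rng.standard_normal(p)
y = X @ beta + 0.1 * rng.standard_normal(n)  # toy linear target

W = rng.standard_normal((N, p))  # frequencies of a Gaussian kernel
Z = np.concatenate([np.cos(W @ X.T), np.sin(W @ X.T)]) / np.sqrt(N)  # 2N x n

lam = 1e-2
# Ridge regression in feature space: w = (Z Z^T + lam I)^{-1} Z y.
w = np.linalg.solve(Z @ Z.T + lam * np.eye(2 * N), Z @ y)
y_hat = Z.T @ w
print(f"training MSE: {np.mean((y - y_hat) ** 2):.4f}")
```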

A unified framework for spectral clustering in sparse graphs

1 code implementation 20 Mar 2020 Lorenzo Dall'Amico, Romain Couillet, Nicolas Tremblay

This article considers spectral community detection in the regime of sparse networks with heterogeneous degree distributions, for which we devise an algorithm to efficiently retrieve communities.

Clustering Community Detection

Random Matrix Theory Proves that Deep Learning Representations of GAN-data Behave as Gaussian Mixtures

no code implementations ICML 2020 Mohamed El Amine Seddik, Cosme Louart, Mohamed Tamaazousti, Romain Couillet

This paper shows that deep learning (DL) representations of data produced by generative adversarial nets (GANs) are random vectors which fall within the class of so-called \textit{concentrated} random vectors.

Optimal Laplacian regularization for sparse spectral community detection

no code implementations 3 Dec 2019 Lorenzo Dall'Amico, Romain Couillet, Nicolas Tremblay

Regularization of the classical Laplacian matrices was empirically shown to improve spectral clustering in sparse networks.

Clustering Community Detection

Inner-product Kernels are Asymptotically Equivalent to Binary Discrete Kernels

no code implementations 15 Sep 2019 Zhenyu Liao, Romain Couillet

This article investigates the eigenspectrum of the inner product-type kernel matrix $\sqrt{p} \mathbf{K}=\{f( \mathbf{x}_i^{\sf T} \mathbf{x}_j/\sqrt{p})\}_{i, j=1}^n $ under a binary mixture model in the high dimensional regime where the number of data $n$ and their dimension $p$ are both large and comparable.
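
Concretely, the matrix under study can be formed as below; the choice $f = \tanh$ and the zeroed diagonal are illustrative simplifications of the paper's setting.

```python
# Building sqrt(p) K = { f(x_i^T x_j / sqrt(p)) } under a two-class (binary)
# Gaussian mixture; f = tanh and the zeroed diagonal are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

p, n = 400, 200
labels = rng.integers(0, 2, n)
mu = np.ones(p) / np.sqrt(p)
X = np.where(labels[:, None] == 1, mu, -mu) + rng.standard_normal((n, p))

sqrtpK = np.tanh(X @ X.T / np.sqrt(p))  # entry-wise f, equals sqrt(p) K
np.fill_diagonal(sqrtpK, 0)             # focus on the off-diagonal entries

eigs = np.linalg.eigvalsh(sqrtpK)
print("top eigenvalues of sqrt(p) K:", np.round(eigs[-3:], 2))
```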

A Kernel Random Matrix-Based Approach for Sparse PCA

no code implementations ICLR 2019 Mohamed El Amine Seddik, Mohamed Tamaazousti, Romain Couillet

In this paper, we present a random matrix approach to recover sparse principal components from $n$ $p$-dimensional vectors.

Random Matrix-Improved Estimation of the Wasserstein Distance between two Centered Gaussian Distributions

1 code implementation 8 Mar 2019 Malik Tiomoko, Romain Couillet

This article proposes a method to consistently estimate functionals $\frac1p\sum_{i=1}^pf(\lambda_i(C_1C_2))$ of the eigenvalues of the product of two covariance matrices $C_1, C_2\in\mathbb{R}^{p\times p}$ based on the empirical estimates $\lambda_i(\hat C_1\hat C_2)$ ($\hat C_a=\frac1{n_a}\sum_{i=1}^{n_a} x_i^{(a)}x_i^{(a){{\sf T}}}$), when the size $p$ and number $n_a$ of the (zero mean) samples $x_i^{(a)}$ are similar.
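
For context, with $f(t) = \sqrt{t}$ this functional recovers the squared Wasserstein distance between centered Gaussians, $W_2^2 = \operatorname{tr} C_1 + \operatorname{tr} C_2 - 2\sum_i \sqrt{\lambda_i(C_1 C_2)}$. The sketch below only compares the population value with the naive plug-in of empirical covariances, i.e. the biased quantity that the article's estimator corrects when $p \sim n_a$; it does not implement the corrected estimator itself.

```python
# W_2^2 between centered Gaussians as an eigenvalue functional of C1 C2 with
# f(t) = sqrt(t). Only the naive plug-in baseline is sketched; the article's
# random-matrix-corrected estimator is not reproduced here.
import numpy as np

rng = np.random.default_rng(0)

p, n1, n2 = 100, 200, 200
C1 = np.eye(p)
C2 = np.diag(np.linspace(0.5, 2.0, p))

X1 = rng.multivariate_normal(np.zeros(p), C1, n1)
X2 = rng.multivariate_normal(np.zeros(p), C2, n2)
C1_hat, C2_hat = X1.T @ X1 / n1, X2.T @ X2 / n2

def w2_squared(A, B):
    lam = np.linalg.eigvals(A @ B).real  # eigenvalues of C1 C2 (>= 0)
    return np.trace(A) + np.trace(B) - 2 * np.sqrt(np.clip(lam, 0, None)).sum()

print(f"population W2^2 / p: {w2_squared(C1, C2) / p:.4f}")
print(f"plug-in    W2^2 / p: {w2_squared(C1_hat, C2_hat) / p:.4f}")
```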

Random Matrix Improved Covariance Estimation for a Large Class of Metrics

no code implementations 7 Feb 2019 Malik Tiomoko, Florent Bouchard, Guillaume Ginholac, Romain Couillet

Relying on recent advances in statistical estimation of covariance distances based on random matrix theory, this article proposes an improved covariance and precision matrix estimation for a wide family of metrics.


A Geometric Approach of Gradient Descent Algorithms in Linear Neural Networks

no code implementations 8 Nov 2018 Yacine Chitour, Zhenyu Liao, Romain Couillet

We translate a well-known empirical observation of linear neural nets into a conjecture that we call the \emph{overfitting conjecture} which states that, for almost all training data and initial conditions, the trajectory of the corresponding gradient descent system converges to a global minimum.

Random matrix-improved estimation of covariance matrix distances

no code implementations 10 Oct 2018 Romain Couillet, Malik Tiomoko, Steeve Zozor, Eric Moisan

Given two sets $x_1^{(1)},\ldots, x_{n_1}^{(1)}$ and $x_1^{(2)},\ldots, x_{n_2}^{(2)}\in\mathbb{R}^p$ (or $\mathbb{C}^p$) of random vectors with zero mean and positive definite covariance matrices $C_1$ and $C_2\in\mathbb{R}^{p\times p}$ (or $\mathbb{C}^{p\times p}$), respectively, this article provides novel estimators for a wide range of distances between $C_1$ and $C_2$ (along with divergences between some zero mean and covariance $C_1$ or $C_2$ probability measures) of the form $\frac1p\sum_{i=1}^p f(\lambda_i(C_1^{-1}C_2))$ (with $\lambda_i(X)$ the eigenvalues of matrix $X$).
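
As one concrete instance, taking $f(t) = \log^2(t)$ yields the (squared) Fisher distance between covariance matrices. The sketch below only computes the naive plug-in of empirical covariances, i.e. the biased baseline that the proposed estimators improve upon when $p$ is commensurate with $n_1, n_2$:

```python
# Plug-in version of the functional (1/p) sum_i f(lambda_i(C1^{-1} C2)) with
# f(t) = log^2(t), the squared Fisher distance. Only the biased plug-in
# baseline is shown, not the article's corrected estimators.
import numpy as np

rng = np.random.default_rng(0)

p, n1, n2 = 100, 300, 300
C1, C2 = np.eye(p), np.diag(np.linspace(0.5, 2.0, p))
X1 = rng.standard_normal((n1, p)) @ np.linalg.cholesky(C1).T
X2 = rng.standard_normal((n2, p)) @ np.linalg.cholesky(C2).T
C1_hat, C2_hat = X1.T @ X1 / n1, X2.T @ X2 / n2

def fisher_dist2(A, B):
    lam = np.linalg.eigvals(np.linalg.solve(A, B)).real  # eigs of A^{-1} B
    return np.mean(np.log(lam) ** 2)

print(f"population: {fisher_dist2(C1, C2):.4f}")
print(f"plug-in   : {fisher_dist2(C1_hat, C2_hat):.4f}")
```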

Latent heterogeneous multilayer community detection

no code implementations 16 Jun 2018 Hafiz Tiomoko Ali, Sijia Liu, Yasin Yilmaz, Romain Couillet, Indika Rajapakse, Alfred Hero

We propose a method for simultaneously detecting shared and unshared communities in heterogeneous multilayer weighted and undirected networks.

Community Detection

The Dynamics of Learning: A Random Matrix Approach

no code implementations ICML 2018 Zhenyu Liao, Romain Couillet

Understanding the learning dynamics of neural networks is one of the key issues for the improvement of optimization algorithms as well as for the theoretical comprehension of why deep neural nets work so well today.

Binary Classification General Classification

On the Spectrum of Random Features Maps of High Dimensional Data

1 code implementation ICML 2018 Zhenyu Liao, Romain Couillet

Random feature maps are ubiquitous in modern statistical machine learning, where they generalize random projections by means of powerful, yet often difficult to analyze nonlinear operators.


A random matrix analysis and improvement of semi-supervised learning for large dimensional data

no code implementations 9 Nov 2017 Xiaoyi Mai, Romain Couillet

This article provides an original understanding of the behavior of a class of graph-oriented semi-supervised learning algorithms in the limit of large and numerous data.

General Classification

A Large Dimensional Study of Regularized Discriminant Analysis Classifiers

1 code implementation 1 Nov 2017 Khalil Elkhalil, Abla Kammoun, Romain Couillet, Tareq Y. Al-Naffouri, Mohamed-Slim Alouini

This article carries out a large dimensional analysis of standard regularized discriminant analysis classifiers designed on the assumption that data arise from a Gaussian mixture model with different means and covariances.


A Random Matrix Approach to Neural Networks

1 code implementation 17 Feb 2017 Cosme Louart, Zhenyu Liao, Romain Couillet

This article studies the Gram random matrix model $G=\frac1T\Sigma^{\rm T}\Sigma$, $\Sigma=\sigma(WX)$, classically found in the analysis of random feature maps and random neural networks, where $X=[x_1,\ldots, x_T]\in{\mathbb R}^{p\times T}$ is a (data) matrix of bounded norm, $W\in{\mathbb R}^{n\times p}$ is a matrix of independent zero-mean unit variance entries, and $\sigma:{\mathbb R}\to{\mathbb R}$ is a Lipschitz continuous (activation) function --- $\sigma(WX)$ being understood entry-wise.
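
The model is direct to instantiate; below is a minimal sketch with $\sigma = \mathrm{ReLU}$ as one admissible Lipschitz activation (the dimensions and data are illustrative):

```python
# Direct instantiation of the Gram model G = (1/T) Sigma^T Sigma with
# Sigma = sigma(W X): X of bounded norm, W with i.i.d. zero-mean unit-variance
# entries, sigma Lipschitz (ReLU below, one admissible choice).
import numpy as np

rng = np.random.default_rng(0)

p, n, T = 100, 200, 300
X = rng.standard_normal((p, T)) / np.sqrt(p)  # data matrix of bounded norm
W = rng.standard_normal((n, p))               # zero-mean unit-variance entries
Sigma = np.maximum(W @ X, 0)                  # sigma = ReLU, applied entry-wise

G = Sigma.T @ Sigma / T                       # T x T Gram matrix
eigs = np.linalg.eigvalsh(G)
print("spectrum range:", round(eigs[0], 3), "to", round(eigs[-1], 3))
```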


A Large Dimensional Analysis of Least Squares Support Vector Machines

1 code implementation 11 Jan 2017 Zhenyu Liao, Romain Couillet

In this article, a large dimensional performance analysis of kernel least squares support vector machines (LS-SVMs) is provided under the assumption of a two-class Gaussian mixture model for the input data.
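
For reference, a minimal kernel LS-SVM trainer is sketched below: unlike the standard SVM, the dual solution follows from a single linear system. The Gaussian kernel, bandwidth, and regularization $\gamma$ are illustrative choices.

```python
# Minimal kernel LS-SVM: solve the bordered linear system
# [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]. Kernel, bandwidth, and
# gamma are illustrative; the input follows a two-class Gaussian mixture.
import numpy as np

rng = np.random.default_rng(0)

p, n = 50, 100
y = np.where(rng.random(n) < 0.5, -1.0, 1.0)
X = y[:, None] * 2 * np.ones(p) / np.sqrt(p) + rng.standard_normal((n, p))

# Gaussian (RBF) kernel matrix with bandwidth 2p (illustrative).
norms = (X ** 2).sum(1)
sq = norms[:, None] + norms[None, :] - 2 * X @ X.T
K = np.exp(-sq / (2 * p))

gamma = 1.0
A = np.zeros((n + 1, n + 1))
A[0, 1:] = 1.0
A[1:, 0] = 1.0
A[1:, 1:] = K + np.eye(n) / gamma
rhs = np.concatenate([[0.0], y])
sol = np.linalg.solve(A, rhs)
b, alpha = sol[0], sol[1:]

# Decision function evaluated on the training points.
pred = np.sign(K @ alpha + b)
print(f"training accuracy: {np.mean(pred == y):.2f}")
```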

Spectral community detection in heterogeneous large networks

no code implementations 3 Nov 2016 Hafiz Tiomoko Ali, Romain Couillet

The analysis of this equivalent spiked random matrix allows us to improve spectral methods for community detection and assess their performances in the regime under study.

Clustering Community Detection

Random matrices meet machine learning: a large dimensional analysis of LS-SVM

no code implementations 7 Sep 2016 Zhenyu Liao, Romain Couillet

This article proposes a performance analysis of kernel least squares support vector machines (LS-SVMs) based on a random matrix approach, in the regime where both the dimension of data $p$ and their number $n$ grow large at the same rate.


The Asymptotic Performance of Linear Echo State Neural Networks

no code implementations 25 Mar 2016 Romain Couillet, Gilles Wainrib, Harry Sevi, Hafiz Tiomoko Ali

In this article, a study of the mean-square error (MSE) performance of linear echo-state neural networks is performed, both for training and testing tasks.

Large Dimensional Analysis of Robust M-Estimators of Covariance with Outliers

no code implementations 4 Mar 2015 David Morales-Jimenez, Romain Couillet, Matthew R. McKay

A large dimensional characterization of robust M-estimators of covariance (or scatter) is provided under the assumption that the dataset comprises independent (essentially Gaussian) legitimate samples as well as arbitrary deterministic samples, referred to as outliers.
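
For intuition, robust M-estimators of scatter are computed by a fixed-point iteration of the form $C \leftarrow \frac1n \sum_i u\big(\frac1p x_i^{\sf T} C^{-1} x_i\big)\, x_i x_i^{\sf T}$. The sketch below uses the regularized weight $u(t) = (1+\varepsilon)/(\varepsilon + t)$ as one concrete choice; the article covers a general class of such $u$ functions.

```python
# Fixed-point iteration for a robust M-estimator of scatter. The weight
# u(t) = (1 + eps) / (eps + t) is one concrete choice from the general class
# studied in the article; data and outlier model are illustrative.
import numpy as np

rng = np.random.default_rng(0)

p, n = 50, 500
Z = rng.standard_normal((n, p))  # legitimate (Gaussian) samples
Z[:10] *= 20.0                   # a few arbitrary large-amplitude outliers

def m_estimate(X, eps=0.1, n_iter=50):
    n, p = X.shape
    C = np.eye(p)
    for _ in range(n_iter):
        Cinv = np.linalg.inv(C)
        q = np.einsum('ij,jk,ik->i', X, Cinv, X)  # q_i = x_i^T C^{-1} x_i
        w = (1 + eps) / (eps + q / p)             # robust down-weighting
        C = (X * w[:, None]).T @ X / n
    return C

C_hat = m_estimate(Z)
print("max entry-wise deviation from identity:",
      round(np.abs(C_hat - np.eye(p)).max(), 3))
```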
