Search Results for author: Stéphan Clémençon

Found 29 papers, 7 papers with code

On Ranking-based Tests of Independence

1 code implementation 12 Mar 2024 Myrto Limnios, Stéphan Clémençon

In this paper we develop a novel nonparametric framework to test the independence of two random variables $\mathbf{X}$ and $\mathbf{Y}$ with unknown respective marginals $H(dx)$ and $G(dy)$ and joint distribution $F(dx dy)$, based on Receiver Operating Characteristic (ROC) analysis and bipartite ranking.
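
The snippet does not spell out the test statistic, so here is only a rough, hedged illustration of the general idea of ranking-based independence testing: compare the observed pairs with artificially decoupled pairs through a rank statistic, calibrated by permutation. The Spearman-type score and the permutation calibration below are my own simplifications, not the paper's procedure.

```python
# Minimal sketch (assumption-laden) of a rank/permutation-style independence
# test: compare the observed pairs (X_i, Y_i) with decoupled pairs obtained by
# permuting Y, through a simple rank-based association score.
import numpy as np
from scipy.stats import rankdata

def rank_association(x, y):
    """Spearman-type statistic: correlation of the marginal ranks."""
    return np.corrcoef(rankdata(x), rankdata(y))[0, 1]

def independence_test(x, y, n_perm=999, seed=0):
    rng = np.random.default_rng(seed)
    obs = abs(rank_association(x, y))
    # Under H0, permuting y leaves the statistic's law unchanged,
    # which calibrates the p-value.
    perm = np.array([abs(rank_association(x, rng.permutation(y)))
                     for _ in range(n_perm)])
    return obs, (1 + np.sum(perm >= obs)) / (n_perm + 1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=500)
    y = 0.5 * x + rng.normal(size=500)          # dependent toy example
    print(independence_test(x, y))              # small p-value expected
```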

Robust Consensus in Ranking Data Analysis: Definitions, Properties and Computational Issues

1 code implementation 22 Mar 2023 Morgane Goibert, Clément Calauzènes, Ekhine Irurozki, Stéphan Clémençon

As the issue of robustness in AI systems becomes vital, statistical learning techniques that remain reliable even in the presence of partly contaminated data have to be developed.

Assessing Uncertainty in Similarity Scoring: Performance & Fairness in Face Recognition

no code implementations 14 Nov 2022 Jean-Rémy Conti, Stéphan Clémençon

The ROC curve is the major tool for assessing not only the performance but also the fairness properties of a similarity scoring function.

Face Recognition, Fairness, +1
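
To make the subgroup-wise ROC analysis concrete, the sketch below computes one ROC curve per demographic group from similarity scores of genuine/impostor pairs. The synthetic scores, the group attribute and the chosen operating point are illustrative placeholders; this is not the paper's uncertainty-quantification methodology.

```python
# Sketch: per-group ROC curves for a similarity scoring function
# (synthetic data; group "B" is given a weaker score separation on purpose).
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
n = 2000
group = rng.choice(["A", "B"], size=n)          # hypothetical subgroup label
same = rng.integers(0, 2, size=n)               # 1 = genuine pair, 0 = impostor
shift = np.where(group == "A", 1.0, 0.6)
scores = rng.normal(loc=same * shift, scale=1.0)

for g in ["A", "B"]:
    mask = group == g
    fpr, tpr, _ = roc_curve(same[mask], scores[mask])
    auc = roc_auc_score(same[mask], scores[mask])
    # Gaps between per-group curves at a fixed FPR are the kind of
    # fairness discrepancy ROC analysis makes visible.
    tpr_at = tpr[np.searchsorted(fpr, 0.1)]
    print(f"group {g}: AUC = {auc:.3f}, TPR at FPR ~ 0.1 = {tpr_at:.3f}")
```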

Mitigating Gender Bias in Face Recognition Using the von Mises-Fisher Mixture Model

1 code implementation 24 Oct 2022 Jean-Rémy Conti, Nathan Noiry, Vincent Despiegel, Stéphane Gentric, Stéphan Clémençon

In spite of the high performance and reliability of deep learning algorithms in a wide range of everyday applications, many investigations tend to show that numerous models exhibit biases, discriminating against specific subgroups of the population (e.g. gender, ethnicity).

Face Recognition, Face Verification, +1

Statistical Depth Functions for Ranking Distributions: Definitions, Statistical Learning and Applications

no code implementations 20 Jan 2022 Morgane Goibert, Stéphan Clémençon, Ekhine Irurozki, Pavlo Mozharovskyi

The concept of median/consensus has been widely investigated in order to provide a statistical summary of ranking data, i.e. realizations of a random permutation $\Sigma$ of a finite set $\{1, \ldots, n\}$ with $n\geq 1$, say.

Novel Concepts
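
For readers new to the notion of a consensus/median ranking, the sketch below computes a classical Borda-count consensus for a small sample of permutations. Borda count is only one standard aggregation rule, used here purely as illustration; it is not the depth-based machinery introduced in the paper.

```python
# Sketch: Borda-count consensus of ranking data (a classical aggregation rule,
# shown only to illustrate what a "consensus" ranking is).
import numpy as np

# rankings[k, i] = rank given to item i by the k-th observed permutation
# (0 = most preferred).
rankings = np.array([
    [0, 1, 2, 3],
    [1, 0, 2, 3],
    [0, 2, 1, 3],
    [3, 1, 0, 2],
])

mean_rank = rankings.mean(axis=0)      # average rank of each item
consensus = np.argsort(mean_rank)      # items ordered from best to worst
print("consensus order:", consensus)
```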

Affine-Invariant Integrated Rank-Weighted Depth: Definition, Properties and Finite Sample Analysis

no code implementations 21 Jun 2021 Guillaume Staerman, Pavlo Mozharovskyi, Stéphan Clémençon

Because it determines a center-outward ordering of observations in $\mathbb{R}^d$ with $d\geq 2$, the concept of statistical depth makes it possible to define quantiles and ranks for multivariate data and to use them for various statistical tasks (e.g. inference, hypothesis testing).

Anomaly Detection
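
The integrated rank-weighted idea behind this line of work can be illustrated with a small Monte-Carlo approximation: project the sample onto random directions, compute the univariate rank of a query point along each direction, and average a symmetric function of those ranks. The sketch below uses plain random directions and a min(rank, 1 - rank) aggregation; it is a simplified illustration, not the affine-invariant construction analysed in the paper.

```python
# Monte-Carlo sketch of an integrated rank-weighted style depth:
# average over random directions u of min(F_u(<u, x>), 1 - F_u(<u, x>)),
# where F_u is the empirical c.d.f. of the projected sample.
import numpy as np

def irw_depth(x, sample, n_dirs=500, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.normal(size=(n_dirs, sample.shape[1]))
    u /= np.linalg.norm(u, axis=1, keepdims=True)   # directions on the sphere
    proj_sample = sample @ u.T                      # shape (n, n_dirs)
    proj_x = x @ u.T                                # shape (n_dirs,)
    cdf = (proj_sample <= proj_x).mean(axis=0)      # empirical c.d.f. at x
    return np.minimum(cdf, 1.0 - cdf).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sample = rng.normal(size=(1000, 3))
    print("centre :", irw_depth(np.zeros(3), sample))     # large depth
    print("outlier:", irw_depth(5 * np.ones(3), sample))  # near-zero depth
```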

Concentration Inequalities for Two-Sample Rank Processes with Application to Bipartite Ranking

1 code implementation 7 Apr 2021 Stéphan Clémençon, Myrto Limnios, Nicolas Vayatis

The ROC curve is the gold standard for measuring the performance of a test/scoring statistic regarding its capacity to discriminate between two statistical populations in a wide variety of applications, ranging from anomaly detection in signal processing and information retrieval to medical diagnosis.

Anomaly Detection, Information Retrieval, +2
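
As a quick reminder of why rank processes enter the picture: the empirical AUC of a scoring statistic is, up to normalisation, the two-sample Mann-Whitney rank-sum statistic. The sketch below only checks this identity on synthetic scores; it does not reproduce the paper's concentration analysis.

```python
# Sketch: empirical AUC of a score as a two-sample (Mann-Whitney) rank statistic.
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(0)
neg = rng.normal(0.0, 1.0, size=300)    # scores on the "negative" population
pos = rng.normal(1.0, 1.0, size=200)    # scores on the "positive" population

# AUC as the probability that a positive score exceeds a negative one.
auc_pairs = (pos[:, None] > neg[None, :]).mean()

# The same quantity from the ranks of the pooled sample.
ranks = rankdata(np.concatenate([neg, pos]))
rank_sum = ranks[len(neg):].sum()       # sum of the positives' ranks
auc_ranks = (rank_sum - len(pos) * (len(pos) + 1) / 2) / (len(pos) * len(neg))

print(auc_pairs, auc_ranks)             # identical up to tie handling
```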

Concentration bounds for the empirical angular measure with statistical learning applications

no code implementations 7 Apr 2021 Stéphan Clémençon, Hamid Jalalzai, Stéphane Lhaut, Anne Sabourin, Johan Segers

The angular measure on the unit sphere characterizes the first-order dependence structure of the components of a random vector in extreme regions and is defined in terms of standardized margins.

Binary Classification, Unsupervised Anomaly Detection, +1
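
To make the object concrete, the sketch below follows the usual empirical recipe for the angular measure: standardize each margin via its empirical c.d.f. (ranks), keep the observations whose standardized radius exceeds a high threshold, and record their self-normalized directions. The L1 norm, the 95% threshold and the synthetic data are my own choices; the sketch is not the specific estimator or the bounds studied in the paper.

```python
# Sketch: empirical angular measure of a bivariate sample.
# 1) rank-transform each margin to (approximately) unit Pareto,
# 2) keep the points with large radius, 3) project them onto the L1 unit sphere.
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(0)
n = 5000
z = rng.standard_exponential(n)                   # shared factor -> tail dependence
x = np.column_stack([z + rng.standard_exponential(n),
                     z + rng.standard_exponential(n)])

ranks = np.column_stack([rankdata(x[:, j]) for j in range(2)])
v = 1.0 / (1.0 - ranks / (n + 1))                 # standardized (unit Pareto) margins

radius = v.sum(axis=1)                            # L1 radius
keep = radius > np.quantile(radius, 0.95)         # 5% most extreme points
angles = v[keep] / radius[keep, None]             # points on the L1 simplex

# The histogram of angles[:, 0] over (0, 1) approximates the angular measure.
print(np.histogram(angles[:, 0], bins=5, range=(0, 1))[0])
```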

Weighted Empirical Risk Minimization: Sample Selection Bias Correction based on Importance Sampling

no code implementations 12 Feb 2020 Robin Vogel, Mastane Achab, Stéphan Clémençon, Charles Tillier

We consider statistical learning problems in which the distribution $P'$ of the training observations $Z'_1, \ldots, Z'_n$ differs from the distribution $P$ involved in the risk one seeks to minimize (referred to as the test distribution) but is still defined on the same measurable space as $P$ and dominates it.

Selection bias, Transfer Learning
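
When the likelihood ratio dP/dP' is known or can be estimated, weighted ERM simply reweights each training loss by that ratio. The sketch below shows the principle on a toy covariate-shift example where the two Gaussian sampling densities are known by construction; the densities, model and data are illustrative assumptions, not the paper's setting or guarantees.

```python
# Sketch: weighted ERM with importance weights w(z) = dP/dP'(z), on a toy
# covariate-shift problem where both densities are known by construction.
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def label(x):                                   # common conditional law of Y | X
    return (rng.random(len(x)) < 1 / (1 + np.exp(-2 * x))).astype(int)

x_train = rng.normal(-1.0, 1.0, size=2000)      # training marginal P'
y_train = label(x_train)
x_test = rng.normal(1.0, 1.0, size=2000)        # test marginal P
y_test = label(x_test)

weights = norm.pdf(x_train, 1.0, 1.0) / norm.pdf(x_train, -1.0, 1.0)   # dP/dP'

plain = LogisticRegression().fit(x_train[:, None], y_train)
weighted = LogisticRegression().fit(x_train[:, None], y_train,
                                    sample_weight=weights)
print("unweighted test accuracy:", plain.score(x_test[:, None], y_test))
print("weighted   test accuracy:", weighted.score(x_test[:, None], y_test))
```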

Weighted Empirical Risk Minimization: Transfer Learning based on Importance Sampling

no code implementations 25 Sep 2019 Robin Vogel, Mastane Achab, Charles Tillier, Stéphan Clémençon

We consider statistical learning problems in which the distribution $P'$ of the training observations $Z'_1, \ldots, Z'_n$ differs from the distribution $P$ involved in the risk one seeks to minimize (referred to as the test distribution) but is still defined on the same measurable space as $P$ and dominates it.

Transfer Learning

A Multivariate Extreme Value Theory Approach to Anomaly Clustering and Visualization

1 code implementation 17 Jul 2019 Maël Chiapino, Stéphan Clémençon, Vincent Feuillard, Anne Sabourin

In a wide variety of situations, anomalies in the behaviour of a complex system, whose health is monitored through the observation of a random vector $X = (X_1, \ldots, X_d)$.

Clustering, Graph Mining

On Tree-based Methods for Similarity Learning

1 code implementation 21 Jun 2019 Stéphan Clémençon, Robin Vogel

In many situations, the choice of an adequate similarity measure or metric on the feature space dramatically determines the performance of machine learning methods.

Empirical Risk Minimization under Random Censorship: Theory and Practice

no code implementations 5 Jun 2019 Guillaume Ausset, Stéphan Clémençon, François Portier

As ignoring censorship in the risk computation may clearly lead to a severe underestimation of the target duration and jeopardize prediction, we propose to consider a plug-in estimate of the true risk based on a Kaplan-Meier estimator of the conditional survival function of the censorship $C$ given $X$, referred to as Kaplan-Meier risk, in order to perform empirical risk minimization.
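
A standard practical form of this plug-in idea is inverse-probability-of-censoring weighting (IPCW): weight each uncensored loss by the inverse of an estimated survival function of the censoring time. For brevity the sketch below uses an unconditional Kaplan-Meier estimate, whereas the paper's Kaplan-Meier risk is based on the conditional survival function of $C$ given $X$; treat it only as an illustration of the weighting principle.

```python
# Sketch: IPCW-style empirical risk with an unconditional Kaplan-Meier estimate
# of the censoring survival function (the paper uses a conditional estimator).
import numpy as np

def km_survival(times, events):
    """Kaplan-Meier estimate S(t_i) evaluated at each input time."""
    order = np.argsort(times)
    d = events[order]
    at_risk = np.arange(len(times), 0, -1)
    surv = np.empty(len(times))
    surv[order] = np.cumprod(1.0 - d / at_risk)
    return surv

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
duration = np.exp(0.5 * x + rng.normal(scale=0.5, size=n))   # true durations
censor = rng.exponential(scale=3.0, size=n)                  # censoring times
t_obs = np.minimum(duration, censor)
delta = (duration <= censor).astype(float)                   # 1 = uncensored

g_hat = km_survival(t_obs, 1.0 - delta)                      # censoring survival
g_hat = np.clip(g_hat, 0.05, None)                           # avoid huge weights

prediction = np.exp(0.5 * x)                                 # some candidate predictor
ipcw_risk = np.mean(delta * (t_obs - prediction) ** 2 / g_hat)
naive_risk = np.mean((t_obs - prediction) ** 2)
print("IPCW risk:", ipcw_risk, " naive (censoring-ignoring) risk:", naive_risk)
```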

A Probabilistic Theory of Supervised Similarity Learning for Pointwise ROC Curve Optimization

no code implementations ICML 2018 Robin Vogel, Aurélien Bellet, Stéphan Clémençon

In this paper, similarity learning is investigated from the perspective of pairwise bipartite ranking, where the goal is to rank the elements of a database by decreasing order of the probability that they share the same label with some query data point, based on the similarity scores.

Metric Learning
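
To make the pairwise framing tangible: form pairs of observations, label a pair positively when the two points share a class label, fit any probabilistic classifier on pair features, and rank a database against a query by the predicted probability. The absolute-difference pair features and the logistic model below are placeholders for illustration, not the algorithms or guarantees of the paper.

```python
# Sketch: similarity learning as classification of pairs (same label or not),
# then ranking a database by the learned similarity to a query point.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 600, 5
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

i, j = rng.integers(0, n, size=(2, 5000))       # random pairs
pair_feat = np.abs(X[i] - X[j])                 # simple symmetric pair features
pair_label = (y[i] == y[j]).astype(int)         # 1 iff the pair shares a label

clf = LogisticRegression(max_iter=1000).fit(pair_feat, pair_label)

query = X[0]                                    # rank the database w.r.t. this query
sim = clf.predict_proba(np.abs(X - query))[:, 1]
ranking = np.argsort(-sim)
print("top-5 retrieved labels:", y[ranking[:5]], "| query label:", y[0])
```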

SGD Algorithms based on Incomplete U-statistics: Large-Scale Minimization of Empirical Risk

no code implementations NeurIPS 2015 Guillaume Papa, Stéphan Clémençon, Aurélien Bellet

In many learning problems, ranging from clustering and metric learning to ranking, empirical estimates of the risk functional consist of an average over tuples (e.g., pairs or triplets) of observations, rather than over individual observations.

Clustering, Metric Learning
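
The computational idea is easy to sketch: rather than averaging the gradient over all O(n^2) pairs at every iteration, draw a small random batch of pairs and use it as an incomplete-U-statistic gradient estimate. The toy pairwise logistic ranking loss, batch size and step size below are illustrative choices, not the paper's analysis.

```python
# Sketch: SGD where each step averages the gradient over a random batch of
# PAIRS, i.e. an incomplete U-statistic estimate of a pairwise (ranking) risk.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 10
X = rng.normal(size=(n, d))
y = (X @ np.ones(d) + rng.normal(size=n) > 0).astype(float)

w = np.zeros(d)
step = 0.05
for _ in range(2000):
    i, j = rng.integers(0, n, size=(2, 64))      # 64 random pairs, not all n(n-1)/2
    diff = X[i] - X[j]
    sign = np.sign(y[i] - y[j])                  # 0 for ties: no contribution
    margin = sign * (diff @ w)
    # Pairwise logistic loss log(1 + exp(-margin)); gradient averaged over the batch.
    grad = -(sign / (1 + np.exp(margin)))[:, None] * diff
    w -= step * grad.mean(axis=0)

scores = X @ w
auc = (scores[y == 1][:, None] > scores[y == 0][None, :]).mean()
print("pairwise ranking accuracy (AUC):", round(auc, 3))
```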

Extending Gossip Algorithms to Distributed Estimation of U-Statistics

no code implementations NeurIPS 2015 Igor Colin, Aurélien Bellet, Joseph Salmon, Stéphan Clémençon

Efficient and robust algorithms for decentralized estimation in networks are essential to many distributed systems.

Sparsity in Multivariate Extremes with Applications to Anomaly Detection

no code implementations 21 Jul 2015 Nicolas Goix, Anne Sabourin, Stéphan Clémençon

Capturing the dependence structure of multivariate extreme events is a major concern in many fields involving the management of risks stemming from multiple sources, e.g. portfolio monitoring, insurance, environmental risk management and anomaly detection.

Anomaly Detection, Dimensionality Reduction, +1

On Anomaly Ranking and Excess-Mass Curves

no code implementations 5 Feb 2015 Nicolas Goix, Anne Sabourin, Stéphan Clémençon

Extensions to the multivariate setting are far from straightforward and it is precisely the main purpose of this paper to introduce a novel and convenient (functional) criterion for measuring the performance of a scoring function regarding the anomaly ranking task, referred to as the Excess-Mass curve (EM curve).

Anomaly Detection

Scaling-up Empirical Risk Minimization: Optimization of Incomplete U-statistics

no code implementations 12 Jan 2015 Stéphan Clémençon, Aurélien Bellet, Igor Colin

In a wide range of statistical learning problems such as ranking, clustering or metric learning among others, the risk is accurately estimated by $U$-statistics of degree $d\geq 1$, i.e. functionals of the training data with low variance that take the form of averages over $k$-tuples.

Clustering, Metric Learning, +1
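
The estimation side of the same idea is shown below: a complete U-statistic of degree two averages a kernel over all pairs, while an incomplete U-statistic averages it over B pairs drawn at random, trading a little extra variance for a large computational saving. The AUC-type kernel and sampling with replacement are just one convenient instance of this family of estimators.

```python
# Sketch: complete vs incomplete U-statistic of degree 2 with an AUC-type kernel.
import numpy as np

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, size=2000)
neg = rng.normal(0.0, 1.0, size=2000)

# Complete two-sample U-statistic: average over ALL (pos, neg) pairs.
complete = (pos[:, None] > neg[None, :]).mean()

# Incomplete version: average over B pairs drawn at random with replacement.
B = 5000
ip = rng.integers(0, len(pos), size=B)
ineg = rng.integers(0, len(neg), size=B)
incomplete = (pos[ip] > neg[ineg]).mean()

print(f"complete: {complete:.4f}   incomplete (B={B}): {incomplete:.4f}")
```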

Survey schemes for stochastic gradient descent with applications to M-estimation

no code implementations 9 Jan 2015 Stéphan Clémençon, Patrice Bertail, Emilie Chautru, Guillaume Papa

In certain situations that will undoubtedly become more and more common in the Big Data era, the available datasets are so massive that computing statistics over the full sample is hardly feasible, if not infeasible.

Survey Sampling

Functional Bipartite Ranking: a Wavelet-Based Filtering Approach

no code implementations 18 Dec 2013 Stéphan Clémençon, Marine Depecker

It is the main goal of this article to address the bipartite ranking issue from the perspective of functional data analysis (FDA).

An SIR Graph Growth Model for the Epidemics of Communicable Diseases

no code implementations 9 Dec 2013 Charanpal Dhanjal, Stéphan Clémençon

It is the main purpose of this paper to introduce a graph-valued stochastic process in order to model the spread of a communicable infectious disease.

Learning Reputation in an Authorship Network

no code implementations 25 Nov 2013 Charanpal Dhanjal, Stéphan Clémençon

The idea is to use Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA) to perform topic modelling in order to find authors who have worked in a query field.

Topic Models
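
A minimal sketch of the LSI/LDA ingredients mentioned in the snippet, using scikit-learn on a handful of toy abstracts: fit the topic models on the documents, embed a query describing a field in the same space, and retrieve the closest documents (hence their authors). The toy corpus, the number of topics and the cosine-similarity retrieval are illustrative assumptions, not the paper's reputation model.

```python
# Sketch: LSI (truncated SVD on tf-idf) and LDA topic models used to match a
# query field to documents/authors.  Toy corpus and illustrative settings only.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "ranking and roc curve analysis for scoring functions",
    "spectral clustering of large graphs with eigenvalue updates",
    "extreme value theory for multivariate anomaly detection",
    "topic models for document collections and author profiling",
]
query = ["anomaly detection for extreme events"]

# LSI: tf-idf representation followed by a low-rank SVD.
tfidf = TfidfVectorizer().fit(docs)
lsi = TruncatedSVD(n_components=2, random_state=0)
doc_lsi = lsi.fit_transform(tfidf.transform(docs))
q_lsi = lsi.transform(tfidf.transform(query))

# LDA: topic mixtures estimated from raw term counts.
counts = CountVectorizer().fit(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_lda = lda.fit_transform(counts.transform(docs))
q_lda = lda.transform(counts.transform(query))

print("LSI ranking:", np.argsort(-cosine_similarity(q_lsi, doc_lsi)[0]))
print("LDA ranking:", np.argsort(-cosine_similarity(q_lda, doc_lda)[0]))
```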

Efficient Eigen-updating for Spectral Graph Clustering

no code implementations 7 Jan 2013 Charanpal Dhanjal, Romaric Gaudel, Stéphan Clémençon

Namely, the method promoted in this article can be viewed as an incremental eigenvalue solution for the spectral clustering method described by Ng et al.

Clustering, Graph Clustering, +1
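
For context, here is a compact version of the batch Ng et al. spectral clustering pipeline that incremental eigen-updating methods aim to avoid recomputing from scratch: build an affinity matrix, form the symmetric normalized Laplacian, take its bottom eigenvectors, row-normalize and run k-means. The RBF affinity and parameter values are generic defaults; the incremental updating itself, which is the paper's contribution, is not shown.

```python
# Sketch: batch Ng et al. spectral clustering (the eigen-decomposition that the
# paper proposes to update incrementally is simply recomputed here).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def spectral_clustering(X, k, gamma=1.0, seed=0):
    W = rbf_kernel(X, gamma=gamma)                   # affinity matrix
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    L_sym = np.eye(len(X)) - (d_inv_sqrt[:, None] * W) * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)                  # eigenvalues in ascending order
    U = vecs[:, :k]                                  # k smallest eigenvalues of L_sym
    U /= np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(U)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in (-2, 0, 2)])
print(np.bincount(spectral_clustering(X, k=3)))      # roughly [50, 50, 50]
```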
