Search Results for author: Stéphan Clémençon

Found 29 papers, 7 papers with code

On Ranking-based Tests of Independence

1 code implementation 12 Mar 2024 Myrto Limnios, Stéphan Clémençon

In this paper we develop a novel nonparametric framework to test the independence of two random variables $\mathbf{X}$ and $\mathbf{Y}$ with unknown respective marginals $H(dx)$ and $G(dy)$ and joint distribution $F(dx dy)$, based on Receiver Operating Characteristic (ROC) analysis and bipartite ranking.
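
The snippet does not spell out the test statistic, so here is only a rough, hedged illustration of the general idea of ranking-based independence testing: compare the observed pairs with artificially decoupled pairs through a rank statistic, calibrated by permutation. The Spearman-type score and the permutation calibration below are my own simplifications, not the paper's procedure.

```python
# Minimal sketch (assumption-laden) of a rank/permutation-style independence
# test: compare the observed pairs (X_i, Y_i) with decoupled pairs obtained by
# permuting Y, through a simple rank-based association score.
import numpy as np
from scipy.stats import rankdata

def rank_association(x, y):
    """Spearman-type statistic: correlation of the marginal ranks."""
    return np.corrcoef(rankdata(x), rankdata(y))[0, 1]

def independence_test(x, y, n_perm=999, seed=0):
    rng = np.random.default_rng(seed)
    obs = abs(rank_association(x, y))
    # Under H0, permuting y leaves the statistic's law unchanged,
    # which calibrates the p-value.
    perm = np.array([abs(rank_association(x, rng.permutation(y)))
                     for _ in range(n_perm)])
    return obs, (1 + np.sum(perm >= obs)) / (n_perm + 1)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=500)
    y = 0.5 * x + rng.normal(size=500)          # dependent toy example
    print(independence_test(x, y))              # small p-value expected
```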

Robust Consensus in Ranking Data Analysis: Definitions, Properties and Computational Issues

1 code implementation 22 Mar 2023 Morgane Goibert, Clément Calauzènes, Ekhine Irurozki, Stéphan Clémençon

As the issue of robustness in AI systems becomes vital, statistical learning techniques that remain reliable even in the presence of partly contaminated data have to be developed.

Assessing Uncertainty in Similarity Scoring: Performance & Fairness in Face Recognition

no code implementations 14 Nov 2022 Jean-Rémy Conti, Stéphan Clémençon

The ROC curve is the major tool for assessing not only the performance but also the fairness properties of a similarity scoring function.

Face Recognition, Fairness, +1
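
To make the subgroup-wise ROC analysis concrete, the sketch below computes one ROC curve per demographic group from similarity scores of genuine/impostor pairs. The synthetic scores, the group attribute and the chosen operating point are illustrative placeholders; this is not the paper's uncertainty-quantification methodology.

```python
# Sketch: per-group ROC curves for a similarity scoring function
# (synthetic data; group "B" is given a weaker score separation on purpose).
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
n = 2000
group = rng.choice(["A", "B"], size=n)          # hypothetical subgroup label
same = rng.integers(0, 2, size=n)               # 1 = genuine pair, 0 = impostor
shift = np.where(group == "A", 1.0, 0.6)
scores = rng.normal(loc=same * shift, scale=1.0)

for g in ["A", "B"]:
    mask = group == g
    fpr, tpr, _ = roc_curve(same[mask], scores[mask])
    auc = roc_auc_score(same[mask], scores[mask])
    # Gaps between per-group curves at a fixed FPR are the kind of
    # fairness discrepancy ROC analysis makes visible.
    tpr_at = tpr[np.searchsorted(fpr, 0.1)]
    print(f"group {g}: AUC = {auc:.3f}, TPR at FPR ~ 0.1 = {tpr_at:.3f}")
```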

Mitigating Gender Bias in Face Recognition Using the von Mises-Fisher Mixture Model

1 code implementation 24 Oct 2022 Jean-Rémy Conti, Nathan Noiry, Vincent Despiegel, Stéphane Gentric, Stéphan Clémençon

In spite of the high performance and reliability of deep learning algorithms in a wide range of everyday applications, many investigations tend to show that numerous models exhibit biases, discriminating against specific subgroups of the population (e.g. gender, ethnicity).

Face Recognition, Face Verification, +1

Statistical Depth Functions for Ranking Distributions: Definitions, Statistical Learning and Applications

no code implementations 20 Jan 2022 Morgane Goibert, Stéphan Clémençon, Ekhine Irurozki, Pavlo Mozharovskyi

The concept of median/consensus has been widely investigated in order to provide a statistical summary of ranking data, i.e. realizations of a random permutation $\Sigma$ of a finite set $\{1, \ldots, n\}$ with $n\geq 1$, say.

Novel Concepts
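
For readers new to the notion of a consensus/median ranking, the sketch below computes a classical Borda-count consensus for a small sample of permutations. Borda count is only one standard aggregation rule, used here purely as illustration; it is not the depth-based machinery introduced in the paper.

```python
# Sketch: Borda-count consensus of ranking data (a classical aggregation rule,
# shown only to illustrate what a "consensus" ranking is).
import numpy as np

# rankings[k, i] = rank given to item i by the k-th observed permutation
# (0 = most preferred).
rankings = np.array([
    [0, 1, 2, 3],
    [1, 0, 2, 3],
    [0, 2, 1, 3],
    [3, 1, 0, 2],
])

mean_rank = rankings.mean(axis=0)      # average rank of each item
consensus = np.argsort(mean_rank)      # items ordered from best to worst
print("consensus order:", consensus)
```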

Affine-Invariant Integrated Rank-Weighted Depth: Definition, Properties and Finite Sample Analysis

no code implementations 21 Jun 2021 Guillaume Staerman, Pavlo Mozharovskyi, Stéphan Clémençon

Because it determines a center-outward ordering of observations in $\mathbb{R}^d$ with $d\geq 2$, the concept of statistical depth makes it possible to define quantiles and ranks for multivariate data and to use them for various statistical tasks (e.g. inference, hypothesis testing).

Anomaly Detection
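
The integrated rank-weighted idea behind this line of work can be illustrated with a small Monte-Carlo approximation: project the sample onto random directions, compute the univariate rank of a query point along each direction, and average a symmetric function of those ranks. The sketch below uses plain random directions and a min(rank, 1 - rank) aggregation; it is a simplified illustration, not the affine-invariant construction analysed in the paper.

```python
# Monte-Carlo sketch of an integrated rank-weighted style depth:
# average over random directions u of min(F_u(<u, x>), 1 - F_u(<u, x>)),
# where F_u is the empirical c.d.f. of the projected sample.
import numpy as np

def irw_depth(x, sample, n_dirs=500, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.normal(size=(n_dirs, sample.shape[1]))
    u /= np.linalg.norm(u, axis=1, keepdims=True)   # directions on the sphere
    proj_sample = sample @ u.T                      # shape (n, n_dirs)
    proj_x = x @ u.T                                # shape (n_dirs,)
    cdf = (proj_sample <= proj_x).mean(axis=0)      # empirical c.d.f. at x
    return np.minimum(cdf, 1.0 - cdf).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sample = rng.normal(size=(1000, 3))
    print("centre :", irw_depth(np.zeros(3), sample))     # large depth
    print("outlier:", irw_depth(5 * np.ones(3), sample))  # near-zero depth
```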

Concentration Inequalities for Two-Sample Rank Processes with Application to Bipartite Ranking

1 code implementation 7 Apr 2021 Stéphan Clémençon, Myrto Limnios, Nicolas Vayatis

The ROC curve is the gold standard for measuring the performance of a test/scoring statistic regarding its capacity to discriminate between two statistical populations in a wide variety of applications, ranging from anomaly detection in signal processing and information retrieval to medical diagnosis.

Anomaly Detection, Information Retrieval, +2
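
As a quick reminder of why rank processes enter the picture: the empirical AUC of a scoring statistic is, up to normalisation, the two-sample Mann-Whitney rank-sum statistic. The sketch below only checks this identity on synthetic scores; it does not reproduce the paper's concentration analysis.

```python
# Sketch: empirical AUC of a score as a two-sample (Mann-Whitney) rank statistic.
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(0)
neg = rng.normal(0.0, 1.0, size=300)    # scores on the "negative" population
pos = rng.normal(1.0, 1.0, size=200)    # scores on the "positive" population

# AUC as the probability that a positive score exceeds a negative one.
auc_pairs = (pos[:, None] > neg[None, :]).mean()

# The same quantity from the ranks of the pooled sample.
ranks = rankdata(np.concatenate([neg, pos]))
rank_sum = ranks[len(neg):].sum()       # sum of the positives' ranks
auc_ranks = (rank_sum - len(pos) * (len(pos) + 1) / 2) / (len(pos) * len(neg))

print(auc_pairs, auc_ranks)             # identical up to tie handling
```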

Concentration bounds for the empirical angular measure with statistical learning applications

no code implementations 7 Apr 2021 Stéphan Clémençon, Hamid Jalalzai, Stéphane Lhaut, Anne Sabourin, Johan Segers

The angular measure on the unit sphere characterizes the first-order dependence structure of the components of a random vector in extreme regions and is defined in terms of standardized margins.

Binary Classification, Unsupervised Anomaly Detection, +1
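
To make the object concrete, the sketch below follows the usual empirical recipe for the angular measure: standardize each margin via its empirical c.d.f. (ranks), keep the observations whose standardized radius exceeds a high threshold, and record their self-normalized directions. The L1 norm, the 95% threshold and the synthetic data are my own choices; the sketch is not the specific estimator or the bounds studied in the paper.

```python
# Sketch: empirical angular measure of a bivariate sample.
# 1) rank-transform each margin to (approximately) unit Pareto,
# 2) keep the points with large radius, 3) project them onto the L1 unit sphere.
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(0)
n = 5000
z = rng.standard_exponential(n)                   # shared factor -> tail dependence
x = np.column_stack([z + rng.standard_exponential(n),
                     z + rng.standard_exponential(n)])

ranks = np.column_stack([rankdata(x[:, j]) for j in range(2)])
v = 1.0 / (1.0 - ranks / (n + 1))                 # standardized (unit Pareto) margins

radius = v.sum(axis=1)                            # L1 radius
keep = radius > np.quantile(radius, 0.95)         # 5% most extreme points
angles = v[keep] / radius[keep, None]             # points on the L1 simplex

# The histogram of angles[:, 0] over (0, 1) approximates the angular measure.
print(np.histogram(angles[:, 0], bins=5, range=(0, 1))[0])
```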

Weighted Empirical Risk Minimization: Sample Selection Bias Correction based on Importance Sampling

no code implementations 12 Feb 2020 Robin Vogel, Mastane Achab, Stéphan Clémençon, Charles Tillier

We consider statistical learning problems in which the distribution $P'$ of the training observations $Z'_1, \ldots, Z'_n$ differs from the distribution $P$ involved in the risk one seeks to minimize (referred to as the test distribution) but is still defined on the same measurable space as $P$ and dominates it.

Selection bias, Transfer Learning
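
When the likelihood ratio dP/dP' is known or can be estimated, weighted ERM simply reweights each training loss by that ratio. The sketch below shows the principle on a toy covariate-shift example where the two Gaussian sampling densities are known by construction; the densities, model and data are illustrative assumptions, not the paper's setting or guarantees.

```python
# Sketch: weighted ERM with importance weights w(z) = dP/dP'(z), on a toy
# covariate-shift problem where both densities are known by construction.
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def label(x):                                   # common conditional law of Y | X
    return (rng.random(len(x)) < 1 / (1 + np.exp(-2 * x))).astype(int)

x_train = rng.normal(-1.0, 1.0, size=2000)      # training marginal P'
y_train = label(x_train)
x_test = rng.normal(1.0, 1.0, size=2000)        # test marginal P
y_test = label(x_test)

weights = norm.pdf(x_train, 1.0, 1.0) / norm.pdf(x_train, -1.0, 1.0)   # dP/dP'

plain = LogisticRegression().fit(x_train[:, None], y_train)
weighted = LogisticRegression().fit(x_train[:, None], y_train,
                                    sample_weight=weights)
print("unweighted test accuracy:", plain.score(x_test[:, None], y_test))
print("weighted   test accuracy:", weighted.score(x_test[:, None], y_test))
```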

Weighted Empirical Risk Minimization: Transfer Learning based on Importance Sampling

no code implementations 25 Sep 2019 Robin Vogel, Mastane Achab, Charles Tillier, Stéphan Clémençon

We consider statistical learning problems in which the distribution $P'$ of the training observations $Z'_1, \ldots, Z'_n$ differs from the distribution $P$ involved in the risk one seeks to minimize (referred to as the test distribution) but is still defined on the same measurable space as $P$ and dominates it.

Transfer Learning

A Multivariate Extreme Value Theory Approach to Anomaly Clustering and Visualization

1 code implementation 17 Jul 2019 Maël Chiapino, Stéphan Clémençon, Vincent Feuillard, Anne Sabourin

In a wide variety of situations, anomalies in the behaviour of a complex system, whose health is monitored through the observation of a random vector $X = (X_1, \ldots, X_d)$.

Clustering, Graph Mining

On Tree-based Methods for Similarity Learning

1 code implementation 21 Jun 2019 Stéphan Clémençon, Robin Vogel

In many situations, the choice of an adequate similarity measure or metric on the feature space dramatically determines the performance of machine learning methods.

Empirical Risk Minimization under Random Censorship: Theory and Practice

no code implementations 5 Jun 2019 Guillaume Ausset, Stéphan Clémençon, François Portier

As ignoring censorship in the risk computation may clearly lead to a severe underestimation of the target duration and jeopardize prediction, we propose to consider a plug-in estimate of the true risk based on a Kaplan-Meier estimator of the conditional survival function of the censorship $C$ given $X$, referred to as Kaplan-Meier risk, in order to perform empirical risk minimization.
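
A standard practical form of this plug-in idea is inverse-probability-of-censoring weighting (IPCW): weight each uncensored loss by the inverse of an estimated survival function of the censoring time. For brevity the sketch below uses an unconditional Kaplan-Meier estimate, whereas the paper's Kaplan-Meier risk is based on the conditional survival function of $C$ given $X$; treat it only as an illustration of the weighting principle.

```python
# Sketch: IPCW-style empirical risk with an unconditional Kaplan-Meier estimate
# of the censoring survival function (the paper uses a conditional estimator).
import numpy as np

def km_survival(times, events):
    """Kaplan-Meier estimate S(t_i) evaluated at each input time."""
    order = np.argsort(times)
    d = events[order]
    at_risk = np.arange(len(times), 0, -1)
    surv = np.empty(len(times))
    surv[order] = np.cumprod(1.0 - d / at_risk)
    return surv

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
duration = np.exp(0.5 * x + rng.normal(scale=0.5, size=n))   # true durations
censor = rng.exponential(scale=3.0, size=n)                  # censoring times
t_obs = np.minimum(duration, censor)
delta = (duration <= censor).astype(float)                   # 1 = uncensored

g_hat = km_survival(t_obs, 1.0 - delta)                      # censoring survival
g_hat = np.clip(g_hat, 0.05, None)                           # avoid huge weights

prediction = np.exp(0.5 * x)                                 # some candidate predictor
ipcw_risk = np.mean(delta * (t_obs - prediction) ** 2 / g_hat)
naive_risk = np.mean((t_obs - prediction) ** 2)
print("IPCW risk:", ipcw_risk, " naive (censoring-ignoring) risk:", naive_risk)
```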

A Probabilistic Theory of Supervised Similarity Learning for Pointwise ROC Curve Optimization

no code implementations ICML 2018 Robin Vogel, Aurélien Bellet, Stéphan Clémençon

In this paper, similarity learning is investigated from the perspective of pairwise bipartite ranking, where the goal is to rank the elements of a database by decreasing order of the probability that they share the same label with some query data point, based on the similarity scores.

Metric Learning
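
To make the pairwise framing tangible: form pairs of observations, label a pair positively when the two points share a class label, fit any probabilistic classifier on pair features, and rank a database against a query by the predicted probability. The absolute-difference pair features and the logistic model below are placeholders for illustration, not the algorithms or guarantees of the paper.

```python
# Sketch: similarity learning as classification of pairs (same label or not),
# then ranking a database by the learned similarity to a query point.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 600, 5
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

i, j = rng.integers(0, n, size=(2, 5000))       # random pairs
pair_feat = np.abs(X[i] - X[j])                 # simple symmetric pair features
pair_label = (y[i] == y[j]).astype(int)         # 1 iff the pair shares a label

clf = LogisticRegression(max_iter=1000).fit(pair_feat, pair_label)

query = X[0]                                    # rank the database w.r.t. this query
sim = clf.predict_proba(np.abs(X - query))[:, 1]
ranking = np.argsort(-sim)
print("top-5 retrieved labels:", y[ranking[:5]], "| query label:", y[0])
```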

SGD Algorithms based on Incomplete U-statistics: Large-Scale Minimization of Empirical Risk

no code implementations NeurIPS 2015 Guillaume Papa, Stéphan Clémençon, Aurélien Bellet

In many learning problems, ranging from clustering and metric learning to ranking, empirical estimates of the risk functional consist of an average over tuples (e.g., pairs or triplets) of observations, rather than over individual observations.

Clustering, Metric Learning
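
The computational idea is easy to sketch: rather than averaging the gradient over all O(n^2) pairs at every iteration, draw a small random batch of pairs and use it as an incomplete-U-statistic gradient estimate. The toy pairwise logistic ranking loss, batch size and step size below are illustrative choices, not the paper's analysis.

```python
# Sketch: SGD where each step averages the gradient over a random batch of
# PAIRS, i.e. an incomplete U-statistic estimate of a pairwise (ranking) risk.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 10
X = rng.normal(size=(n, d))
y = (X @ np.ones(d) + rng.normal(size=n) > 0).astype(float)

w = np.zeros(d)
step = 0.05
for _ in range(2000):
    i, j = rng.integers(0, n, size=(2, 64))      # 64 random pairs, not all n(n-1)/2
    diff = X[i] - X[j]
    sign = np.sign(y[i] - y[j])                  # 0 for ties: no contribution
    margin = sign * (diff @ w)
    # Pairwise logistic loss log(1 + exp(-margin)); gradient averaged over the batch.
    grad = -(sign / (1 + np.exp(margin)))[:, None] * diff
    w -= step * grad.mean(axis=0)

scores = X @ w
auc = (scores[y == 1][:, None] > scores[y == 0][None, :]).mean()
print("pairwise ranking accuracy (AUC):", round(auc, 3))
```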

Extending Gossip Algorithms to Distributed Estimation of U-Statistics

no code implementations NeurIPS 2015 Igor Colin, Aurélien Bellet, Joseph Salmon, Stéphan Clémençon

Efficient and robust algorithms for decentralized estimation in networks are essential to many distributed systems.

Sparsity in Multivariate Extremes with Applications to Anomaly Detection

no code implementations 21 Jul 2015 Nicolas Goix, Anne Sabourin, Stéphan Clémençon

Capturing the dependence structure of multivariate extreme events is a major concern in many fields involving the management of risks stemming from multiple sources, e.g. portfolio monitoring, insurance, environmental risk management and anomaly detection.

Anomaly Detection, Dimensionality Reduction, +1

On Anomaly Ranking and Excess-Mass Curves

no code implementations 5 Feb 2015 Nicolas Goix, Anne Sabourin, Stéphan Clémençon

Extensions to the multivariate setting are far from straightforward and it is precisely the main purpose of this paper to introduce a novel and convenient (functional) criterion for measuring the performance of a scoring function regarding the anomaly ranking task, referred to as the Excess-Mass curve (EM curve).

Anomaly Detection

Scaling-up Empirical Risk Minimization: Optimization of Incomplete U-statistics

no code implementations 12 Jan 2015 Stéphan Clémençon, Aurélien Bellet, Igor Colin

In a wide range of statistical learning problems such as ranking, clustering or metric learning among others, the risk is accurately estimated by $U$-statistics of degree $d\geq 1$, i.e. functionals of the training data with low variance that take the form of averages over $k$-tuples.

Clustering, Metric Learning, +1
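
The estimation side of the same idea is shown below: a complete U-statistic of degree two averages a kernel over all pairs, while an incomplete U-statistic averages it over B pairs drawn at random, trading a little extra variance for a large computational saving. The AUC-type kernel and sampling with replacement are just one convenient instance of this family of estimators.

```python
# Sketch: complete vs incomplete U-statistic of degree 2 with an AUC-type kernel.
import numpy as np

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, size=2000)
neg = rng.normal(0.0, 1.0, size=2000)

# Complete two-sample U-statistic: average over ALL (pos, neg) pairs.
complete = (pos[:, None] > neg[None, :]).mean()

# Incomplete version: average over B pairs drawn at random with replacement.
B = 5000
ip = rng.integers(0, len(pos), size=B)
ineg = rng.integers(0, len(neg), size=B)
incomplete = (pos[ip] > neg[ineg]).mean()

print(f"complete: {complete:.4f}   incomplete (B={B}): {incomplete:.4f}")
```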

Survey schemes for stochastic gradient descent with applications to M-estimation

no code implementations 9 Jan 2015 Stéphan Clémençon, Patrice Bertail, Emilie Chautru, Guillaume Papa

In certain situations that will undoubtedly become more and more common in the Big Data era, the available datasets are so massive that computing statistics over the full sample is hardly feasible, if not infeasible.

Survey Sampling

Functional Bipartite Ranking: a Wavelet-Based Filtering Approach

no code implementations 18 Dec 2013 Stéphan Clémençon, Marine Depecker

It is the main goal of this article to address the bipartite ranking issue from the perspective of functional data analysis (FDA).

An SIR Graph Growth Model for the Epidemics of Communicable Diseases

no code implementations 9 Dec 2013 Charanpal Dhanjal, Stéphan Clémençon

It is the main purpose of this paper to introduce a graph-valued stochastic process in order to model the spread of a communicable infectious disease.

Learning Reputation in an Authorship Network

no code implementations 25 Nov 2013 Charanpal Dhanjal, Stéphan Clémençon

The idea is to use Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA) to perform topic modelling in order to find authors who have worked in a query field.

Topic Models
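
A minimal sketch of the LSI/LDA ingredients mentioned in the snippet, using scikit-learn on a handful of toy abstracts: fit the topic models on the documents, embed a query describing a field in the same space, and retrieve the closest documents (hence their authors). The toy corpus, the number of topics and the cosine-similarity retrieval are illustrative assumptions, not the paper's reputation model.

```python
# Sketch: LSI (truncated SVD on tf-idf) and LDA topic models used to match a
# query field to documents/authors.  Toy corpus and illustrative settings only.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import TruncatedSVD, LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "ranking and roc curve analysis for scoring functions",
    "spectral clustering of large graphs with eigenvalue updates",
    "extreme value theory for multivariate anomaly detection",
    "topic models for document collections and author profiling",
]
query = ["anomaly detection for extreme events"]

# LSI: tf-idf representation followed by a low-rank SVD.
tfidf = TfidfVectorizer().fit(docs)
lsi = TruncatedSVD(n_components=2, random_state=0)
doc_lsi = lsi.fit_transform(tfidf.transform(docs))
q_lsi = lsi.transform(tfidf.transform(query))

# LDA: topic mixtures estimated from raw term counts.
counts = CountVectorizer().fit(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_lda = lda.fit_transform(counts.transform(docs))
q_lda = lda.transform(counts.transform(query))

print("LSI ranking:", np.argsort(-cosine_similarity(q_lsi, doc_lsi)[0]))
print("LDA ranking:", np.argsort(-cosine_similarity(q_lda, doc_lda)[0]))
```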

Efficient Eigen-updating for Spectral Graph Clustering

no code implementations 7 Jan 2013 Charanpal Dhanjal, Romaric Gaudel, Stéphan Clémençon

Namely, the method promoted in this article can be viewed as an incremental eigenvalue solution for the spectral clustering method described by Ng et al.

Clustering, Graph Clustering, +1
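
For context, here is a compact version of the batch Ng et al. spectral clustering pipeline that incremental eigen-updating methods aim to avoid recomputing from scratch: build an affinity matrix, form the symmetric normalized Laplacian, take its bottom eigenvectors, row-normalize and run k-means. The RBF affinity and parameter values are generic defaults; the incremental updating itself, which is the paper's contribution, is not shown.

```python
# Sketch: batch Ng et al. spectral clustering (the eigen-decomposition that the
# paper proposes to update incrementally is simply recomputed here).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

def spectral_clustering(X, k, gamma=1.0, seed=0):
    W = rbf_kernel(X, gamma=gamma)                   # affinity matrix
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    L_sym = np.eye(len(X)) - (d_inv_sqrt[:, None] * W) * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)                  # eigenvalues in ascending order
    U = vecs[:, :k]                                  # k smallest eigenvalues of L_sym
    U /= np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(U)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in (-2, 0, 2)])
print(np.bincount(spectral_clustering(X, k=3)))      # roughly [50, 50, 50]
```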
