1 code implementation • 12 Mar 2024 • Myrto Limnios, Stéphan Clémençon
In this paper we develop a novel nonparametric framework to test the independence of two random variables $\mathbf{X}$ and $\mathbf{Y}$ with unknown respective marginals $H(dx)$ and $G(dy)$ and joint distribution $F(dx dy)$, based on Receiver Operating Characteristic (ROC) analysis and bipartite ranking.
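The ranking-based idea can be illustrated with a minimal sketch under simplifying assumptions. Observed pairs $(X_i, Y_i)$ act as positives drawn from $F$. Pairs formed after shuffling the $Y$ sample act as negatives drawn (approximately) from $H \otimes G$. The empirical AUC of a scoring function separating the two serves as a test statistic, calibrated by permutation. The scoring function used below (the product $xy$) is a hypothetical fixed choice for illustration only, whereas the paper learns the score by bipartite ranking. The function names are likewise illustrative.

```python
import random
from typing import Callable, List, Sequence, Tuple

Score = Callable[[float, float], float]

def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Empirical AUC (Mann-Whitney statistic); ties count as 1/2."""
    total = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                total += 1.0
            elif p == q:
                total += 0.5
    return total / (len(pos) * len(neg))

def auc_statistic(xs: List[float], ys: List[float], score: Score,
                  rng: random.Random) -> float:
    # Positives: observed pairs, i.e. a sample from the joint law F.
    pos = [score(x, y) for x, y in zip(xs, ys)]
    # Negatives: Y shuffled, approximately a sample from the product of marginals.
    ys_shuffled = list(ys)
    rng.shuffle(ys_shuffled)
    neg = [score(x, y) for x, y in zip(xs, ys_shuffled)]
    return auc(pos, neg)

def independence_test(xs: List[float], ys: List[float], score: Score,
                      n_perm: int = 200, seed: int = 0) -> Tuple[float, float]:
    """One-sided permutation test: a large AUC is evidence against independence."""
    rng = random.Random(seed)
    observed = auc_statistic(xs, ys, score, rng)
    null_stats = []
    for _ in range(n_perm):
        # Breaking the pairing first simulates data under the null hypothesis.
        ys_null = list(ys)
        rng.shuffle(ys_null)
        null_stats.append(auc_statistic(xs, ys_null, score, rng))
    p_value = (1 + sum(s >= observed for s in null_stats)) / (1 + n_perm)
    return observed, p_value
```

For positively dependent data, a call such as `independence_test(xs, ys, lambda x, y: x * y)` should return an AUC well above 1/2 and a small p-value. A poorly chosen score only costs power, not validity, because the permutation calibration does not depend on it.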
1 code implementation • 22 Mar 2023 • Morgane Goibert, Clément Calauzènes, Ekhine Irurozki, Stéphan Clémençon
As the issue of robustness in AI systems becomes vital, statistical learning techniques that are reliable even in the presence of partly contaminated data have to be developed.
no code implementations • 14 Nov 2022 • Jean-Rémy Conti, Stéphan Clémençon
The ROC curve is the major tool for assessing not only the performance but also the fairness properties of a similarity scoring function.
1 code implementation • 24 Oct 2022 • Jean-Rémy Conti, Nathan Noiry, Vincent Despiegel, Stéphane Gentric, Stéphan Clémençon
In spite of the high performance and reliability of deep learning algorithms in a wide range of everyday applications, many studies tend to show that numerous models exhibit biases, discriminating against specific subgroups of the population (e.g., gender, ethnicity).
Ranked #1 on Face Verification on LFW
no code implementations • 20 Jan 2022 • Morgane Goibert, Stéphan Clémençon, Ekhine Irurozki, Pavlo Mozharovskyi
The concept of median/consensus has been widely investigated in order to provide a statistical summary of ranking data, i.e., realizations of a random permutation $\Sigma$ of a finite set, say $\{1,\; \ldots,\; n\}$ with $n\geq 1$.
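To make "median of ranking data" concrete, here is a minimal sketch that assumes the Kendall tau distance as the metric. The Kemeny median is the permutation minimizing the total Kendall distance to the observed rankings. It is computed here by brute force, which is only feasible for small $n$. The Borda count is a classical, cheaper approximation. These are standard consensus rules chosen for illustration, not the approach developed in the paper. Rankings are represented as tuples of items listed from best to worst, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations
from typing import Hashable, Sequence, Tuple

Ranking = Tuple[Hashable, ...]  # items listed from best to worst

def kendall_tau(a: Ranking, b: Ranking) -> int:
    """Number of item pairs ordered differently by rankings a and b."""
    pos_a = {item: r for r, item in enumerate(a)}
    pos_b = {item: r for r, item in enumerate(b)}
    return sum(
        1 for i, j in combinations(a, 2)
        if (pos_a[i] - pos_a[j]) * (pos_b[i] - pos_b[j]) < 0
    )

def kemeny_median(rankings: Sequence[Ranking]) -> Ranking:
    """Brute-force Kemeny median: O(n!) candidates, small n only."""
    items = rankings[0]
    return min(
        permutations(items),
        key=lambda cand: sum(kendall_tau(cand, r) for r in rankings),
    )

def borda(rankings: Sequence[Ranking]) -> Ranking:
    """Borda count: an item gets n - 1 - position points per ranking."""
    n = len(rankings[0])
    points: Counter = Counter()
    for r in rankings:
        for pos, item in enumerate(r):
            points[item] += n - 1 - pos
    # Decreasing points; ties broken by the item's position in rankings[0].
    first = {item: k for k, item in enumerate(rankings[0])}
    return tuple(sorted(points, key=lambda item: (-points[item], first[item])))
```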
no code implementations • 20 Sep 2021 • Myrto Limnios, Nathan Noiry, Stéphan Clémençon
The ability to collect and store ever more massive databases has been accompanied by the need to process them efficiently.
no code implementations • 21 Jun 2021 • Guillaume Staerman, Pavlo Mozharovskyi, Stéphan Clémençon
Because it determines a center-outward ordering of observations in $\mathbb{R}^d$ with $d\geq 2$, the concept of statistical depth makes it possible to define quantiles and ranks for multivariate data and to use them for various statistical tasks (e.g., inference, hypothesis testing).
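One classical instance of statistical depth is Tukey's halfspace depth. The depth of a point $x$ is the smallest proportion of observations contained in a closed halfspace whose boundary passes through $x$. The sketch below approximates it by minimizing over randomly drawn directions, which yields an upper bound on the exact depth. The number of directions and the function name are illustrative assumptions, and the paper may consider other depth functions.

```python
import random
from typing import Sequence

def halfspace_depth(x: Sequence[float], sample: Sequence[Sequence[float]],
                    n_dirs: int = 500, seed: int = 0) -> float:
    """Random-direction approximation of the Tukey depth of x w.r.t. a sample."""
    rng = random.Random(seed)
    depth = 1.0
    for _ in range(n_dirs):
        # Gaussian direction; only its orientation matters, so no normalization.
        u = [rng.gauss(0.0, 1.0) for _ in x]
        # Count points in the closed halfspace {z : <u, z - x> >= 0}.
        inside = sum(
            1 for z in sample
            if sum(ui * (zi - xi) for ui, zi, xi in zip(u, z, x)) >= 0
        )
        depth = min(depth, inside / len(sample))
    return depth
```

Sorting observations by decreasing depth gives the center-outward ordering mentioned above. Points near the center of the sample get depth close to 1/2, and points outside its convex hull get depth 0.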
no code implementations • 7 Apr 2021 • Stéphan Clémençon, Hamid Jalalzai, Stéphane Lhaut, Anne Sabourin, Johan Segers
The angular measure on the unit sphere characterizes the first-order dependence structure of the components of a random vector in extreme regions and is defined in terms of standardized margins.
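A common empirical counterpart of the angular measure can be sketched as follows, assuming continuous margins (no ties). First, each margin is standardized to an approximately standard Pareto scale through its ranks. Next, the $k$ observations with the largest $L_1$ norm are kept as the extreme ones. Their projections onto the unit simplex (the unit sphere for the $L_1$ norm on the positive orthant) then form the support of the empirical angular measure, each point carrying mass $1/k$. The rank transform, the choice of norm, $k$ and the function names are illustrative assumptions rather than the paper's exact estimator.

```python
from typing import List, Sequence, Tuple

def rank_pareto(column: Sequence[float]) -> List[float]:
    """Map one margin to an approximately standard Pareto scale via ranks."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    # 1 / (1 - F_hat) with F_hat = rank / (n + 1)
    return [(n + 1) / (n + 1 - r) for r in ranks]

def empirical_angles(data: Sequence[Sequence[float]], k: int) -> List[Tuple[float, ...]]:
    """Angles of the k most extreme points after rank standardization."""
    d = len(data[0])
    columns = [rank_pareto([row[j] for row in data]) for j in range(d)]
    points = list(zip(*columns))
    radii = [sum(p) for p in points]  # L1 norm; all coordinates are positive
    top = sorted(range(len(points)), key=lambda i: radii[i], reverse=True)[:k]
    return [tuple(c / radii[i] for c in points[i]) for i in top]
```

Under perfect dependence, all extreme angles concentrate at the center of the simplex. Under asymptotic independence, they concentrate near its vertices.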
1 code implementation • 7 Apr 2021 • Stéphan Clémençon, Myrto Limnios, Nicolas Vayatis
The ROC curve is the gold standard for measuring the performance of a test/scoring statistic regarding its capacity to discriminate between two statistical populations in a wide variety of applications, from anomaly detection in signal processing and medical diagnosis to information retrieval.
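For concreteness, the empirical ROC curve of a scoring statistic can be computed by sweeping a threshold over the observed scores and recording the false positive rate and true positive rate at each step. The area under this piecewise-linear curve coincides with the Mann-Whitney statistic, i.e., the proportion of (positive, negative) pairs ranked correctly, with ties counted as 1/2. The function names are illustrative.

```python
from typing import List, Sequence, Tuple

def roc_curve(scores_pos: Sequence[float],
              scores_neg: Sequence[float]) -> List[Tuple[float, float]]:
    """Empirical ROC points (FPR, TPR), from (0, 0) to (1, 1)."""
    thresholds = sorted(set(scores_pos) | set(scores_neg), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        # Classify as positive every score at or above the threshold.
        tpr = sum(s >= t for s in scores_pos) / len(scores_pos)
        fpr = sum(s >= t for s in scores_neg) / len(scores_neg)
        points.append((fpr, tpr))
    return points

def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

With positive scores `[0.9, 0.8, 0.4]` and negative scores `[0.7, 0.3]`, five of the six pairs are correctly ordered, and the trapezoidal area is 5/6.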
1 code implementation • 23 Mar 2021 • Guillaume Staerman, Pavlo Mozharovskyi, Pierre Colombo, Stéphan Clémençon, Florence d'Alché-Buc
a probability distribution or a data set.
no code implementations • 12 Feb 2020 • Robin Vogel, Mastane Achab, Stéphan Clémençon, Charles Tillier
We consider statistical learning problems, when the distribution $P'$ of the training observations $Z'_1,\; \ldots,\; Z'_n$ differs from the distribution $P$ involved in the risk one seeks to minimize (referred to as the test distribution) but is still defined on the same measurable space as $P$ and dominates it.
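A minimal sketch of the reweighting idea follows, assuming the data are stratified and the stratum proportions under the test distribution $P$ are known. Since $P'$ dominates $P$, the test risk can be rewritten as $\mathbb{E}_P[\ell(Z)] = \mathbb{E}_{P'}[\Phi(Z')\,\ell(Z')]$ with $\Phi = dP/dP'$, and in the stratified case $\Phi$ reduces to the ratio of stratum proportions. The stratified setting, the known test proportions and the function names are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence

def stratum_weights(train_strata: Sequence[Hashable],
                    test_proportions: Dict[Hashable, float]) -> List[float]:
    """Likelihood ratio dP/dP' evaluated at each training point, by stratum."""
    n = len(train_strata)
    train_proportions = {k: c / n for k, c in Counter(train_strata).items()}
    return [test_proportions[k] / train_proportions[k] for k in train_strata]

def weighted_empirical_risk(losses: Sequence[float],
                            weights: Sequence[float]) -> float:
    """Estimate of the test risk E_P[loss] from losses on training data."""
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)
```

For example, if stratum `a` makes up 75% of the training sample but only 50% of the test population, its points get weight 2/3 and the under-represented stratum gets weight 2. Minimizing the weighted risk instead of the plain average then targets the test distribution.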
no code implementations • 25 Sep 2019 • Robin Vogel, Mastane Achab, Charles Tillier, Stéphan Clémençon
We consider statistical learning problems, when the distribution $P'$ of the training observations $Z'_1,\; \ldots,\; Z'_n$ differs from the distribution $P$ involved in the risk one seeks to minimize (referred to as the test distribution) but is still defined on the same measurable space as $P$ and dominates it.
1 code implementation • 17 Jul 2019 • Maël Chiapino, Stéphan Clémençon, Vincent Feuillard, Anne Sabourin
In a wide variety of situations, anomalies in the behaviour of a complex system, whose health is monitored through the observation of a random vector X = (X1,.
1 code implementation • 21 Jun 2019 • Stéphan Clémençon, Robin Vogel
In many situations, the choice of an adequate similarity measure or metric on the feature space dramatically determines the performance of machine learning methods.
no code implementations • 5 Jun 2019 • Guillaume Ausset, Stéphan Clémençon, François Portier
As ignoring censorship in the risk computation may clearly lead to a severe underestimation of the target duration and jeopardize prediction, we propose to consider a plug-in estimate of the true risk based on a Kaplan-Meier estimator of the conditional survival function of the censorship $C$ given $X$, referred to as Kaplan-Meier risk, in order to perform empirical risk minimization.
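The weighting behind such a plug-in risk can be sketched in a simplified, unconditional form. A Kaplan-Meier estimate $\hat{G}$ of the censoring survival function $t \mapsto \mathbb{P}(C > t)$ is computed from the observed durations. Each uncensored observation then receives weight $1/\hat{G}(T_i-)$, and censored ones receive weight zero (inverse probability of censoring weighting). The paper conditions on $X$. Ignoring $X$ here, assuming distinct observation times, and the function names are simplifications made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

def km_censoring_survival(times: Sequence[float],
                          events: Sequence[int]) -> Callable[[float], float]:
    """Kaplan-Meier estimate of G(t) = P(C > t), returned as t -> G(t-).

    events[i] is 1 if the duration was observed and 0 if it was censored;
    observation times are assumed to be distinct.
    """
    data = sorted(zip(times, events))
    n = len(data)
    steps: List[Tuple[float, float]] = []  # (censoring time, G just after it)
    surv = 1.0
    for i, (t, e) in enumerate(data):
        at_risk = n - i
        if e == 0:  # a censoring "event" for the censoring distribution
            surv *= 1.0 - 1.0 / at_risk
            steps.append((t, surv))

    def g_left(t: float) -> float:
        value = 1.0
        for s, v in steps:
            if s >= t:
                break
            value = v
        return value

    return g_left

def kaplan_meier_risk(times: Sequence[float], events: Sequence[int],
                      predictions: Sequence[float],
                      loss: Callable[[float, float], float]) -> float:
    """IPCW estimate of the risk; censored observations get weight 0."""
    g = km_censoring_survival(times, events)
    total = 0.0
    for t, e, p in zip(times, events, predictions):
        if e == 1:
            total += loss(p, t) / g(t)
    return total / len(times)
```

With a constant loss equal to 1, the weights of the uncensored points average to 1. This is a quick sanity check that the reweighting compensates for the mass lost to censoring.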
no code implementations • ICML 2018 • Robin Vogel, Aurélien Bellet, Stéphan Clémençon
In this paper, similarity learning is investigated from the perspective of pairwise bipartite ranking, where the goal is to rank the elements of a database by decreasing order of the probability that they share the same label with some query data point, based on the similarity scores.
no code implementations • 8 Jun 2016 • Igor Colin, Aurélien Bellet, Joseph Salmon, Stéphan Clémençon
In decentralized networks (of sensors, connected objects, etc.
no code implementations • 31 Mar 2016 • Nicolas Goix, Anne Sabourin, Stéphan Clémençon
Extremes play a special role in Anomaly Detection.
no code implementations • NeurIPS 2015 • Guillaume Papa, Stéphan Clémençon, Aurélien Bellet
In many learning problems, such as clustering, ranking and metric learning, empirical estimates of the risk functional consist of an average over tuples (e.g., pairs or triplets) of observations, rather than over individual observations.
no code implementations • NeurIPS 2015 • Igor Colin, Aurélien Bellet, Joseph Salmon, Stéphan Clémençon
Efficient and robust algorithms for decentralized estimation in networks are essential to many distributed systems.
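The basic building block of many such methods is randomized pairwise gossip averaging. At each step, a random edge $(i, j)$ of the network is activated and both endpoints replace their values with their average. This preserves the network sum and, on a connected graph, drives every node to the global mean. The standard averaging protocol is shown here as an illustration. The paper's algorithms for estimating more general statistics may differ, and the function name and iteration count are assumptions.

```python
import random
from typing import List, Sequence, Tuple

def gossip_average(values: Sequence[float], edges: Sequence[Tuple[int, int]],
                   n_iter: int = 2000, seed: int = 0) -> List[float]:
    """Randomized pairwise gossip: each step averages the two ends of a random edge."""
    rng = random.Random(seed)
    x = list(values)
    for _ in range(n_iter):
        i, j = rng.choice(edges)
        avg = (x[i] + x[j]) / 2
        # Only the two endpoints communicate; the total sum is unchanged.
        x[i] = x[j] = avg
    return x
```

Each node only ever talks to one neighbor at a time, which is what makes the protocol suitable for sensor or peer-to-peer networks without a central coordinator.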
no code implementations • 21 Jul 2015 • Nicolas Goix, Anne Sabourin, Stéphan Clémençon
Capturing the dependence structure of multivariate extreme events is a major concern in many fields involving the management of risks stemming from multiple sources, e.g., portfolio monitoring, insurance, environmental risk management and anomaly detection.
no code implementations • 5 Feb 2015 • Nicolas Goix, Anne Sabourin, Stéphan Clémençon
Extensions to the multivariate setting are far from straightforward and it is precisely the main purpose of this paper to introduce a novel and convenient (functional) criterion for measuring the performance of a scoring function regarding the anomaly ranking task, referred to as the Excess-Mass curve (EM curve).
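The EM curve of a scoring function $s$ at level $t$ is $\mathrm{EM}_s(t) = \sup_{u > 0} \{ \mathbb{P}(s(\mathbf{X}) \geq u) - t\, \mathrm{Leb}(\{s \geq u\}) \}$. A rough empirical version can be sketched in three steps. The probability is replaced by an empirical frequency, and the Lebesgue measure by a Monte Carlo estimate over a bounding box assumed to contain the support of the data. The supremum is then restricted to observed score values, together with $u \to \infty$, which contributes $0$. The box, the Monte Carlo sample size and the function name are illustrative assumptions.

```python
import random
from bisect import bisect_left
from typing import Callable, List, Sequence

def em_curve(score: Callable[[Sequence[float]], float],
             data: Sequence[Sequence[float]],
             lower: Sequence[float], upper: Sequence[float],
             ts: Sequence[float], n_mc: int = 20000, seed: int = 0) -> List[float]:
    """Empirical Excess-Mass curve evaluated at the levels in ts."""
    rng = random.Random(seed)
    box_volume = 1.0
    for a, b in zip(lower, upper):
        box_volume *= b - a
    # Scores of the data and of uniform points drawn in the bounding box.
    data_scores = sorted(score(x) for x in data)
    mc_scores = sorted(
        score([rng.uniform(a, b) for a, b in zip(lower, upper)]) for _ in range(n_mc)
    )
    curve = []
    for t in ts:
        best = 0.0  # contribution of thresholds above every observed score
        for u in set(data_scores):
            mass = (len(data_scores) - bisect_left(data_scores, u)) / len(data_scores)
            volume = box_volume * (n_mc - bisect_left(mc_scores, u)) / n_mc
            best = max(best, mass - t * volume)
        curve.append(best)
    return curve
```

The estimated curve equals 1 at $t = 0$ and is nonincreasing in $t$. A scoring function whose curve stays higher is preferred, since its level sets capture more mass with less volume.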
no code implementations • 12 Jan 2015 • Stéphan Clémençon, Aurélien Bellet, Igor Colin
In a wide range of statistical learning problems such as ranking, clustering or metric learning among others, the risk is accurately estimated by $U$-statistics of degree $d\geq 1$, i.e., functionals of the training data with low variance that take the form of averages over $d$-tuples.
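The computational issue is easiest to see at degree $2$. A complete $U$-statistic averages a symmetric kernel over all $n(n-1)/2$ pairs, whereas an incomplete version averages it over $B$ pairs drawn at random, at a cost independent of $n$. The sketch below uses the kernel $h(x, y) = (x - y)^2 / 2$, whose complete $U$-statistic is the unbiased sample variance. Degree $2$, this kernel, sampling pairs with replacement and the function names are illustrative assumptions.

```python
import random
from itertools import combinations
from typing import Callable, Sequence

Kernel = Callable[[float, float], float]

def complete_u_stat(sample: Sequence[float], kernel: Kernel) -> float:
    """Average of the kernel over all n(n - 1)/2 pairs."""
    pairs = list(combinations(sample, 2))
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)

def incomplete_u_stat(sample: Sequence[float], kernel: Kernel,
                      n_pairs: int, seed: int = 0) -> float:
    """Average of the kernel over n_pairs pairs drawn uniformly with replacement."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        a, b = rng.sample(list(sample), 2)  # two distinct observations
        total += kernel(a, b)
    return total / n_pairs

def variance_kernel(x: float, y: float) -> float:
    return (x - y) ** 2 / 2
```

On the sample `[1, 2, 3, 4]` the complete statistic equals 5/3. The incomplete one fluctuates around that value, with a variance that shrinks as `n_pairs` grows.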
no code implementations • 9 Jan 2015 • Stéphan Clémençon, Patrice Bertail, Emilie Chautru, Guillaume Papa
In certain situations that will undoubtedly become more and more common in the Big Data era, the available datasets are so massive that computing statistics over the full sample is hardly feasible, if not infeasible.
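One standard way to avoid computing a statistic over the full sample is to draw a subsample according to a survey design and reweight it. The sketch below uses Poisson sampling, where unit $i$ is kept independently with inclusion probability $\pi_i$. It then applies the Horvitz-Thompson estimator $\frac{1}{N}\sum_{i \in S} y_i / \pi_i$ of the population mean, which is unbiased whenever all $\pi_i > 0$. The choice of design and the function names are assumptions made for illustration.

```python
import random
from typing import List, Sequence

def poisson_sample(pi: Sequence[float], seed: int = 0) -> List[int]:
    """Poisson sampling: index i is kept independently with probability pi[i]."""
    rng = random.Random(seed)
    return [i for i, p in enumerate(pi) if rng.random() < p]

def horvitz_thompson_mean(values: Sequence[float], pi: Sequence[float],
                          sample: Sequence[int]) -> float:
    """Unbiased estimate of the population mean from the selected indices."""
    return sum(values[i] / pi[i] for i in sample) / len(values)
```

With all $\pi_i = 1$ the estimator returns the exact mean. With $\pi_i = 0.2$ only about a fifth of the data is touched, at the cost of extra variance.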
no code implementations • 10 Jan 2014 • Charanpal Dhanjal, Romaric Gaudel, Stéphan Clémençon
It is the main goal of this paper to propose a novel method to perform matrix completion on-line.
no code implementations • 18 Dec 2013 • Stéphan Clémençon, Marine Depecker
It is the main goal of this article to address the bipartite ranking issue from the perspective of functional data analysis (FDA).
no code implementations • 9 Dec 2013 • Charanpal Dhanjal, Stéphan Clémençon
It is the main purpose of this paper to introduce a graph-valued stochastic process in order to model the spread of a communicable infectious disease.
no code implementations • 25 Nov 2013 • Charanpal Dhanjal, Stéphan Clémençon
The idea is to use Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA) to perform topic modelling in order to find authors who have worked in a query field.
no code implementations • 7 Jan 2013 • Charanpal Dhanjal, Romaric Gaudel, Stéphan Clémençon
Namely, the method promoted in this article can be viewed as an incremental eigenvalue solution for the spectral clustering method described by Ng.