1 code implementation • ECCV 2020 • Christian Simon, Piotr Koniusz, Richard Nock, Mehrtash Harandi
Inspired by optimization techniques, we propose a novel meta-learning algorithm with gradient modulation to encourage fast adaptation of neural networks in the absence of abundant data.
no code implementations • 2 Jul 2024 • Richard Nock, Yishay Mansour
Boosting is a highly successful, ML-born optimization setting in which one is required to learn arbitrarily good models in a computationally efficient way, given access to a weak learner oracle providing classifiers that perform at least slightly differently from random guessing.
no code implementations • 13 Jun 2024 • Nathan Stromberg, Rohan Ayyagari, Sanmi Koyejo, Richard Nock, Lalitha Sankar
Last-layer retraining methods have emerged as an efficient framework for correcting existing base models.
no code implementations • 22 Feb 2024 • Mathieu Guillame-Bert, Richard Nock
Second, what has been learned propagates back bottom-up via attention and aggregation mechanisms, progressively crafting new features that ultimately complete the set of observation features over which a single tree is learned; boosting's iteration clock is then incremented and new class residuals are computed.
no code implementations • 16 Feb 2024 • Nathan Stromberg, Rohan Ayyagari, Monica Welfert, Sanmi Koyejo, Richard Nock, Lalitha Sankar
Existing methods for last layer retraining that aim to optimize worst-group accuracy (WGA) rely heavily on well-annotated groups in the training data.
no code implementations • 6 Feb 2024 • Richard Nock, Ehsan Amid, Frank Nielsen, Alexander Soen, Manfred K. Warmuth
Most mathematical distortions used in ML are fundamentally integral in nature: $f$-divergences, Bregman divergences, (regularized) optimal transport distances, integral probability metrics, geodesic distances, etc.
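As a concrete reminder of the first two families mentioned above (standard textbook definitions, not notation specific to this paper), an $f$-divergence is an integral functional of a likelihood ratio, while a Bregman divergence is generated by a strictly convex function $F$:
$$ I_f(P, Q) \;=\; \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) \mathrm{d}x, \qquad D_F(x, y) \;=\; F(x) - F(y) - \langle x - y, \nabla F(y) \rangle, $$
with $f$ convex and $f(1) = 0$.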
no code implementations • 22 Nov 2023 • Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth
Tempered Exponential Measures (TEMs) are a parametric generalization of the exponential family of distributions maximizing the tempered entropy function among positive measures subject to a probability normalization of their power densities.
no code implementations • 7 Sep 2023 • Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth
In the field of optimal transport, two prominent subfields face each other: (i) unregularized optimal transport, "à la Kantorovich", which leads to extremely sparse plans but with algorithms that scale poorly, and (ii) entropic-regularized optimal transport, "à la Sinkhorn-Cuturi", which admits near-linear-time approximation algorithms but leads to maximally un-sparse plans.
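For context on the second subfield, here is a minimal sketch of the Sinkhorn scaling iterations for entropic-regularized transport (a standard algorithm, not this paper's contribution; the regularization strength, iteration count and toy data below are arbitrary illustrative choices):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    """Entropic-regularized OT ('a la Sinkhorn-Cuturi'): alternately rescale
    u, v so that diag(u) K diag(v) has marginals a and b, with K = exp(-C/eps)."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # transport plan (dense, not sparse)

# Toy example: two uniform histograms on 5 points with squared-distance cost.
a = b = np.full(5, 0.2)
C = (np.arange(5)[:, None] - np.arange(5)[None, :]) ** 2.0
P = sinkhorn(a, b, C)
print(P.sum(axis=1), P.sum(axis=0))  # both close to a and b
```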
no code implementations • 7 Aug 2023 • Richard Nock, Mathieu Guillame-Bert
We focus on generative AI for a type of data that still represents one of the most prevalent forms of data: tabular data.
no code implementations • 17 Feb 2023 • Tyler Sypherd, Nathan Stromberg, Richard Nock, Visar Berisha, Lalitha Sankar
There is a growing need for models that are interpretable and have reduced energy and computational cost (e.g., in health care analytics and federated learning).
no code implementations • 27 Jan 2023 • Kevin Lam, Christian Walder, Spiridon Penev, Richard Nock
Existing methods do this by fitting an inverse canonical link function which monotonically maps $\mathbb{R}$ to $[0, 1]$ to estimate probabilities for binary problems.
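A standard instance of such an inverse canonical link (a textbook example, not specific to the method above) is the sigmoid attached to the logistic loss:
$$ z \;=\; \log\frac{\eta}{1-\eta} \quad\Longleftrightarrow\quad \eta \;=\; \sigma(z) \;=\; \frac{1}{1 + e^{-z}}, $$
a monotone map from $\mathbb{R}$ onto $(0, 1)$ that turns real-valued scores into class-probability estimates.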
no code implementations • 4 Nov 2022 • Ehsan Amid, Richard Nock, Manfred Warmuth
The link with exponential families has allowed $k$-means clustering to be generalized to a wide variety of data-generating distributions in exponential families and to clustering distortions among Bregman divergences.
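A minimal sketch of Bregman hard clustering in that spirit (following the classical Banerjee et al. scheme, not necessarily the algorithm of the paper above; the divergences and data below are illustrative):

```python
import numpy as np

def bregman_kmeans(X, k, divergence, n_iter=50, seed=0):
    """Bregman hard clustering sketch: assignments use the chosen Bregman
    divergence, while the centroid update remains the arithmetic mean,
    which is optimal for every Bregman divergence."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        D = np.array([[divergence(x, c) for c in centers] for x in X])
        labels = D.argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

# Example divergences: squared Euclidean, and generalized KL for positive data.
sq_euclid = lambda x, c: np.sum((x - c) ** 2)
gen_kl    = lambda x, c: np.sum(x * np.log(x / c) - x + c)

X = np.abs(np.random.default_rng(1).normal(5.0, 1.0, size=(100, 3)))
labels, centers = bregman_kmeans(X, k=3, divergence=gen_kl)
```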
no code implementations • 19 May 2022 • Yishay Mansour, Richard Nock, Robert C. Williamson
A landmark negative result of Long and Servedio established a worst-case spectacular failure of a supervised learning trio (loss, algorithm, model) otherwise praised for its high precision machinery.
1 code implementation • 31 Jan 2022 • Alexander Soen, Ibrahim Alabdulmohsin, Sanmi Koyejo, Yishay Mansour, Nyalleng Moorosi, Richard Nock, Ke Sun, Lexing Xie
We introduce a new family of techniques to post-process ("wrap") a black-box classifier in order to reduce its bias.
no code implementations • 26 Jan 2022 • Richard Nock, Mathieu Guillame-Bert
While Generative Adversarial Networks (GANs) achieve spectacular results on unstructured data like images, there is still a gap on tabular data, for which state-of-the-art supervised learning still largely favours decision tree (DT)-based models.
no code implementations • CVPR 2022 • Yao Ni, Piotr Koniusz, Richard Hartley, Richard Nock
In our design, the manifold learning and coding steps are intertwined with layers of the discriminator, with the goal of attracting intermediate feature representations onto manifolds.
no code implementations • 18 Jun 2021 • Tyler Sypherd, Richard Nock, Lalitha Sankar
Hence, optimizing a proper loss function on twisted data could perilously lead the learning algorithm towards the twisted posterior, rather than to the desired clean posterior.
1 code implementation • 1 Dec 2020 • Alexander Soen, Hisham Husain, Richard Nock
Furthermore, when the weak learners are specified to be decision trees, the sufficient statistics of the learned distribution can be examined to provide clues on sources of (un)fairness.
1 code implementation • NeurIPS 2020 • Christian Walder, Richard Nock
Loss functions are a cornerstone of machine learning and the starting point of most algorithms.
no code implementations • 5 Mar 2020 • Frank Nielsen, Richard Nock
It is well-known that the Bhattacharyya, Hellinger, Kullback-Leibler, $\alpha$-divergences, and Jeffreys' divergences between densities belonging to a same exponential family have generic closed-form formulas relying on the strictly convex and real-analytic cumulant function characterizing the exponential family.
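One well-known instance of such a closed form (a classical identity, recalled here for illustration): for two members $p_{\theta_1}, p_{\theta_2}$ of the same exponential family with cumulant function $F$, the Kullback-Leibler divergence is a Bregman divergence between the swapped natural parameters,
$$ \mathrm{KL}(p_{\theta_1} \,\|\, p_{\theta_2}) \;=\; B_F(\theta_2 : \theta_1) \;=\; F(\theta_2) - F(\theta_1) - \langle \theta_2 - \theta_1, \nabla F(\theta_1) \rangle. $$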
no code implementations • 11 Feb 2020 • Zac Cranko, Zhan Shi, Xinhua Zhang, Richard Nock, Simon Kornblith
The problem of adversarial examples has highlighted the need for a theory of regularisation that is general enough to apply to exotic function classes, such as universal approximators.
no code implementations • ICML 2020 • Richard Nock, Aditya Krishna Menon
In detail, we cast SLIsotron as learning a loss from a family of composite square losses.
no code implementations • 26 Jan 2020 • Richard Nock, Wilko Henecka
To address this, we craft a new parameterized proper loss, called the M$\alpha$-loss, which, as we show, allows us to finely tune the tradeoff in the complete spectrum of sensitivity vs. boosting guarantees.
9 code implementations • 10 Dec 2019 • Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D'Oliveira, Hubert Eichner, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson, Justin Hsu, Martin Jaggi, Tara Javidi, Gauri Joshi, Mikhail Khodak, Jakub Konečný, Aleksandra Korolova, Farinaz Koushanfar, Sanmi Koyejo, Tancrède Lepoint, Yang Liu, Prateek Mittal, Mehryar Mohri, Richard Nock, Ayfer Özgür, Rasmus Pagh, Mariana Raykova, Hang Qi, Daniel Ramage, Ramesh Raskar, Dawn Song, Weikang Song, Sebastian U. Stich, Ziteng Sun, Ananda Theertha Suresh, Florian Tramèr, Praneeth Vepakomma, Jianyu Wang, Li Xiong, Zheng Xu, Qiang Yang, Felix X. Yu, Han Yu, Sen Zhao
FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches.
1 code implementation • NeurIPS 2019 • Amir Dezfouli, Hassan Ashtiani, Omar Ghattas, Richard Nock, Peter Dayan, Cheng Soon Ong
Individual characteristics in human decision-making are often quantified by fitting a parametric cognitive model to subjects' behavior and then studying differences between them in the associated parameter space.
no code implementations • NeurIPS 2019 • Hisham Husain, Richard Nock, Robert C. Williamson
First, we find that the $f$-GAN and WAE objectives partake in a primal-dual relationship and are equivalent under some assumptions, which then allows us to explicate the success of WAE.
no code implementations • 25 Sep 2019 • Zac Cranko, Zhan Shi, Xinhua Zhang, Simon Kornblith, Richard Nock
Distributional robust risk (DRR) minimisation has arisen as a flexible and effective framework for machine learning.
no code implementations • 19 Feb 2019 • Zac Cranko, Robert C. Williamson, Richard Nock
The study of a machine learning problem is in many ways difficult to separate from the study of the loss function being used.
no code implementations • 3 Feb 2019 • Hisham Husain, Richard Nock, Robert C. Williamson
First, we find that the $f$-GAN and WAE objectives partake in a primal-dual relationship and are equivalent under some assumptions, which then allows us to explicate the success of WAE.
no code implementations • 31 Jan 2019 • Christian J. Walder, Paul Roussel, Richard Nock, Cheng Soon Ong, Masashi Sugiyama
We introduce a family of pairwise stochastic gradient estimators for gradients of expectations, which are related to the log-derivative trick, but involve pairwise interactions between samples.
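For reference, the single-sample log-derivative trick the abstract alludes to is the classical identity
$$ \nabla_\theta\, \mathbb{E}_{x \sim p_\theta}[f(x)] \;=\; \mathbb{E}_{x \sim p_\theta}\!\left[ f(x)\, \nabla_\theta \log p_\theta(x) \right], $$
on top of which the paper's estimators introduce pairwise interactions between samples.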
2 code implementations • NeurIPS 2018 • Marta Avalos, Richard Nock, Cheng Soon Ong, Julien Rouar, Ke Sun
Our approach combines the benefits of the log-ratio transformation from compositional data analysis and exponential family PCA.
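A minimal sketch of the centered log-ratio (CLR) transform from compositional data analysis mentioned above (the exact log-ratio transform used in the paper may differ, e.g., additive or isometric variants):

```python
import numpy as np

def clr(x):
    """Centered log-ratio transform: divide each part by the geometric mean,
    then take logs (assumes strictly positive parts)."""
    x = np.asarray(x, dtype=float)
    g = np.exp(np.mean(np.log(x), axis=-1, keepdims=True))  # geometric mean
    return np.log(x / g)

composition = np.array([0.2, 0.3, 0.5])   # parts summing to 1
print(clr(composition))                    # CLR coordinates sum to 0
```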
no code implementations • 22 Oct 2018 • Frank Nielsen, Richard Nock
Distances are fundamental primitives whose choice significantly impacts the performances of algorithms in machine learning and signal processing.
no code implementations • 4 Sep 2018 • Zac Cranko, Simon Kornblith, Zhan Shi, Richard Nock
Robust risk minimisation has several advantages: it has been studied with regard to improving the generalisation properties of models and their robustness to adversarial perturbation.
1 code implementation • 1 Sep 2018 • Kelvin Hsu, Richard Nock, Fabio Ramos
Conditional kernel mean embeddings are nonparametric models that encode conditional expectations in a reproducing kernel Hilbert space.
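For background, the standard regularized estimator of a conditional kernel mean embedding from a sample $\{(x_i, y_i)\}_{i=1}^n$ (a textbook formula, with $\lambda > 0$ a regularization parameter) reads
$$ \hat\mu_{Y \mid X = x} \;=\; \sum_{i=1}^n \beta_i(x)\, \phi(y_i), \qquad \beta(x) \;=\; (K_X + \lambda I)^{-1} k_X(x), $$
where $(K_X)_{ij} = k(x_i, x_j)$ and $(k_X(x))_i = k(x_i, x)$.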
no code implementations • 13 Aug 2018 • Qiongkai Xu, Juyan Zhang, Lizhen Qu, Lexing Xie, Richard Nock
In this paper, we investigate the diversity aspect of paraphrase generation.
no code implementations • ICML 2018 • Amir Dezfouli, Edwin Bonilla, Richard Nock
Traditional methods for the discovery of latent network structures are limited in two ways: they either assume that all the signal comes from the network (i.e., there is no source of signal outside the network) or they place constraints on the network parameters to ensure model or algorithmic stability.
no code implementations • 19 Jun 2018 • Leif W. Hanlen, Richard Nock, Hanna Suominen, Neil Bacon
Confidential text corpora exist in many forms, but do not allow arbitrary sharing.
1 code implementation • 13 Jun 2018 • Hisham Husain, Zac Cranko, Richard Nock
Privacy enforces an information theoretic barrier on approximation, and we show how to reach this barrier with guarantees on the approximation of the target non-private density.
no code implementations • 8 Jun 2018 • Zac Cranko, Aditya Krishna Menon, Richard Nock, Cheng Soon Ong, Zhan Shi, Christian Walder
A key feature of our result is that it holds for all proper losses, and for a popular subset of these, the optimisation of this central measure appears to be independent of the loss.
no code implementations • 22 Mar 2018 • Zac Cranko, Richard Nock
There has recently been a steady increase in the number of iterative approaches to density estimation.
no code implementations • 11 Mar 2018 • Richard Nock, Stephen Hardy, Wilko Henecka, Hamish Ivey-Law, Giorgio Patrini, Guillaume Smith, Brian Thorne
In our experiments, we modify a simple token-based entity resolution algorithm so that it indeed aims at avoiding matching rows belonging to different classes, and perform experiments in the setting where entity resolution relies on noisy data, which is very relevant to real-world domains.
no code implementations • 29 Nov 2017 • Stephen Hardy, Wilko Henecka, Hamish Ivey-Law, Richard Nock, Giorgio Patrini, Guillaume Smith, Brian Thorne
Our results bring clear and strong support for federated learning: under reasonable assumptions on the number and magnitude of entity resolution's mistakes, it can be extremely beneficial to carry out federated learning in the setting where each peer's data provides a significant uplift to the other.
no code implementations • 2 Aug 2017 • Frank Nielsen, Richard Nock
The information geometry induced by the Bregman generator set to the Shannon negentropy on this space yields a dually flat space called the mixture family manifold.
1 code implementation • NeurIPS 2017 • Richard Nock, Zac Cranko, Aditya Krishna Menon, Lizhen Qu, Robert C. Williamson
In this paper, we unveil a broad class of distributions for which such convergence happens --- namely, deformed exponential families, a wide superset of exponential families --- and show tight connections with the three other key GAN parameters: loss, game and architecture.
no code implementations • 10 Apr 2017 • Richard Nock, Frank Nielsen
In Valiant's model of evolution, a class of representations is evolvable iff a polynomial-time process of random mutations guided by selection converges with high probability to a representation as $\epsilon$-close as desired to the optimal one, for any required $\epsilon>0$.
no code implementations • 27 Feb 2017 • Amir Dezfouli, Edwin V. Bonilla, Richard Nock
We propose a network structure discovery model for continuous observations that generalizes linear causal models by incorporating a Gaussian process (GP) prior on a network-independent component, and random sparsity and weight matrices as the network-dependent parameters.
no code implementations • 16 Feb 2017 • Frank Nielsen, Richard Nock
Comparative convexity is a generalization of convexity relying on abstract notions of means.
no code implementations • 9 Dec 2016 • Frank Nielsen, Richard Nock
We present a series of closed-form maximum entropy upper bounds for the differential entropy of a continuous univariate random variable and study the properties of that series.
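The classic example of such a maximum entropy bound (recalled here for illustration, not one of the paper's new bounds): among all densities with variance $\sigma^2$, the Gaussian maximizes differential entropy, hence
$$ h(X) \;\le\; \tfrac{1}{2} \log\!\left(2 \pi e\, \sigma^2\right). $$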
no code implementations • NeurIPS 2016 • Richard Nock
It has recently been shown that the supervised learning of linear classifiers with two of the most popular losses, the logistic and square losses, is equivalent to optimizing an equivalent loss over sufficient statistics about the class: Rademacher observations (rados).
no code implementations • 22 Sep 2016 • Frank Nielsen, Boris Muzellec, Richard Nock
We consider the supervised classification problem of machine learning in Cayley-Klein projective geometries: We show how to learn a curved Mahalanobis metric distance corresponding to either the hyperbolic geometry or the elliptic geometry using the Large Margin Nearest Neighbor (LMNN) framework.
1 code implementation • 15 Sep 2016 • Boris Muzellec, Richard Nock, Giorgio Patrini, Frank Nielsen
We also present the first application of optimal transport to the problem of ecological inference, that is, the reconstruction of joint distributions from their marginals, a problem of large interest in the social sciences.
2 code implementations • CVPR 2017 • Giorgio Patrini, Alessandro Rozza, Aditya Menon, Richard Nock, Lizhen Qu
We present a theoretically grounded approach to train deep neural networks, including recurrent networks, subject to class-dependent label noise.
Ranked #2 on Image Classification on Clothing1M (using clean data) (using extra training data)
no code implementations • NeurIPS 2016 • Richard Nock, Aditya Krishna Menon, Cheng Soon Ong
Experiments on each of these domains validate the analyses and suggest that the scaled Bregman theorem might be a worthy addition to the popular handful of Bregman divergence properties that have been pervasive in machine learning.
no code implementations • 13 Jun 2016 • Richard Nock, Giorgio Patrini, Finnian Lattimore, Tiberio Caetano
It is usual to consider data protection and learnability as conflicting objectives.
no code implementations • 6 Apr 2016 • Frank Nielsen, Richard Nock
Matrix data sets are common nowadays, as in biomedical imaging, where the Diffusion Tensor Magnetic Resonance Imaging (DT-MRI) modality produces data sets of 3D symmetric positive definite matrices anchored at voxel positions, capturing the anisotropic diffusion properties of water molecules in biological tissues.
no code implementations • 13 Mar 2016 • Giorgio Patrini, Richard Nock, Stephen Hardy, Tiberio Caetano
Our goal is to learn a classifier in the cross product space of the two domains, in the hard case in which no shared ID is available -- e.g., due to anonymization.
no code implementations • 8 Feb 2016 • Giorgio Patrini, Frank Nielsen, Richard Nock, Marcello Carioni
We prove that the empirical risk of most well-known loss functions factors into a linear term aggregating all labels with a term that is label free, and can further be expressed by sums of the loss.
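A concrete instance of this factorization (a simple identity, for labels $y \in \{-1, +1\}$ and score $f(x)$, given here for illustration): the logistic loss splits as
$$ \log\!\left(1 + e^{-y f(x)}\right) \;=\; -\tfrac{1}{2}\, y f(x) \;+\; \log\!\left(2 \cosh\!\tfrac{f(x)}{2}\right), $$
so that, summed over examples, only the linear first term depends on the labels while the second term is label free.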
no code implementations • 3 Feb 2016 • Richard Nock, Raphaël Canyasse, Roksana Boreli, Frank Nielsen
For either the specific frameworks considered here, or for the differential privacy setting, there is little to no prior work on the direct application of k-means++ and its approximation bounds; state-of-the-art contenders appear to be significantly more complex and/or display less favorable (approximation) properties.
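For reference, a minimal sketch of the vanilla k-means++ seeding step that the discussion refers to (Arthur-Vassilvitskii style; the paper's privacy-preserving variants are not reproduced here, and the toy data are illustrative):

```python
import numpy as np

def kmeanspp_seeding(X, k, rng=None):
    """k-means++ seeding: each new center is sampled with probability
    proportional to its squared distance to the closest center chosen so far."""
    rng = rng or np.random.default_rng(0)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

centers = kmeanspp_seeding(np.random.default_rng(2).normal(size=(200, 2)), k=4)
```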
no code implementations • 16 Dec 2015 • Richard Nock
We first show that this unexpected equivalence can actually be generalized to other example/rado losses, with necessary and sufficient conditions for the equivalence, exemplified on five losses that bear popular names in various fields: exponential (boosting), mean-variance (finance), Linear Hinge (on-line learning), ReLU (deep learning), and unhinged (statistics).
no code implementations • 9 Feb 2015 • Richard Nock, Giorgio Patrini, Arik Friedman
We show that rados comply with various privacy requirements that make them good candidates for machine learning in a privacy framework.
no code implementations • NeurIPS 2014 • Giorgio Patrini, Richard Nock, Paul Rivera, Tiberio Caetano
In Learning with Label Proportions (LLP), the objective is to learn a supervised classifier when, instead of labels, only label proportions for bags of observations are known.
no code implementations • 23 Jun 2014 • Frank Nielsen, Richard Nock
This novel heuristic can improve Hartigan's $k$-means when it has converged to a local minimum.
no code implementations • 11 Mar 2014 • Frank Nielsen, Richard Nock
We present a generic dynamic programming method to compute the optimal clustering of $n$ scalar elements into $k$ pairwise disjoint intervals.
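A minimal $O(k n^2)$ dynamic-programming sketch of this kind of interval clustering (illustrative only; the paper's method and its complexity may differ), using the sum of squared errors as the clustering cost:

```python
import numpy as np

def optimal_1d_clustering(x, k):
    """Cluster n sorted scalars into k contiguous intervals minimizing the
    within-cluster sum of squared errors, via dynamic programming."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    s1 = np.concatenate(([0.0], np.cumsum(x)))        # prefix sums
    s2 = np.concatenate(([0.0], np.cumsum(x ** 2)))   # prefix sums of squares

    def cost(i, j):  # SSE of x[i:j] (j exclusive)
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    D = np.full((k + 1, n + 1), np.inf)   # D[m, j]: best cost of first j points in m intervals
    arg = np.zeros((k + 1, n + 1), dtype=int)
    D[0, 0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = D[m - 1, i] + cost(i, j)
                if c < D[m, j]:
                    D[m, j], arg[m, j] = c, i
    # backtrack the interval boundaries
    cuts, j = [], n
    for m in range(k, 0, -1):
        cuts.append((arg[m, j], j))
        j = arg[m, j]
    return D[k, n], cuts[::-1]

print(optimal_1d_clustering([1.0, 1.1, 5.0, 5.2, 9.0], k=3))
```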
no code implementations • NeurIPS 2008 • Richard Nock, Frank Nielsen
Bartlett et al. (2006) recently proved that a ground condition for convex surrogates, classification calibration, ties up the minimization of the surrogate and classification risks, and left the algorithmic questions about the minimization of these surrogates as an important open problem.