Search Results for author: Richard Nock

Found 64 papers, 12 papers with code

On Modulating the Gradient for Meta-Learning

1 code implementation ECCV 2020 Christian Simon, Piotr Koniusz, Richard Nock, Mehrtash Harandi

Inspired by optimization techniques, we propose a novel meta-learning algorithm with gradient modulation to encourage fast-adaptation of neural networks in the absence of abundant data.

Meta-Learning

How to Boost Any Loss Function

no code implementations 2 Jul 2024 Richard Nock, Yishay Mansour

Boosting is a highly successful ML-born optimization setting in which one must efficiently learn arbitrarily good models given access to a weak learner oracle providing classifiers that perform at least slightly differently from random guessing.
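
For orientation, the following is a minimal sketch of the classical exponential-loss special case of boosting (AdaBoost with decision stumps), which illustrates the "weak learner oracle" loop; it is not the paper's method, which boosts arbitrary losses, and all function names and the toy data are illustrative.

```python
import numpy as np

def adaboost(X, y, T=20):
    """Minimal AdaBoost with axis-aligned decision stumps as the weak learner.
    Labels y must be in {-1, +1}. Returns a list of ((feature, threshold, sign), alpha)."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                     # example weights
    ensemble = []
    for _ in range(T):
        # Weak learner oracle: exhaustively pick the stump with smallest weighted error.
        best = None
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for sign in (+1, -1):
                    pred = sign * np.where(X[:, j] <= thr, 1, -1)
                    err = np.sum(w[pred != y])
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # leveraging coefficient
        pred = sign * np.where(X[:, j] <= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)          # reweight misclassified examples upward
        w /= w.sum()
        ensemble.append(((j, thr, sign), alpha))
    return ensemble

def predict(ensemble, X):
    score = sum(alpha * s * np.where(X[:, j] <= t, 1, -1) for (j, t, s), alpha in ensemble)
    return np.sign(score)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
ens = adaboost(X, y)
print((predict(ens, X) == y).mean())            # training accuracy of the boosted ensemble
```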

Boosting gets full Attention for Relational Learning

no code implementations 22 Feb 2024 Mathieu Guillame-Bert, Richard Nock

Second, what has been learned progresses back bottom-up via attention and aggregation mechanisms, progressively crafting new features that, at the end, complete the set of observation features over which a single tree is learned; boosting's iteration clock is then incremented and new class residuals are computed.

Relational Reasoning

Robustness to Subpopulation Shift with Domain Label Noise via Regularized Annotation of Domains

no code implementations 16 Feb 2024 Nathan Stromberg, Rohan Ayyagari, Monica Welfert, Sanmi Koyejo, Richard Nock, Lalitha Sankar

Existing methods for last layer retraining that aim to optimize worst-group accuracy (WGA) rely heavily on well-annotated groups in the training data.

Tempered Calculus for ML: Application to Hyperbolic Model Embedding

no code implementations 6 Feb 2024 Richard Nock, Ehsan Amid, Frank Nielsen, Alexander Soen, Manfred K. Warmuth

Most mathematical distortions used in ML are fundamentally integral in nature: $f$-divergences, Bregman divergences, (regularized) optimal transport distances, integral probability metrics, geodesic distances, etc.
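
As one illustration of the "integral in nature" point, the $f$-divergence family is defined by an integral against a convex generator (a standard definition, not specific to this paper):

$$D_f(P\|Q) \;=\; \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right)\mathrm{d}x, \qquad f \text{ convex},\; f(1)=0,$$

with $f(t)=t\log t$ recovering the Kullback-Leibler divergence.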

The Tempered Hilbert Simplex Distance and Its Application To Non-linear Embeddings of TEMs

no code implementations 22 Nov 2023 Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth

Tempered Exponential Measures (TEMs) are a parametric generalization of the exponential family of distributions maximizing the tempered entropy function among positive measures subject to a probability normalization of their power densities.

Optimal Transport with Tempered Exponential Measures

no code implementations 7 Sep 2023 Ehsan Amid, Frank Nielsen, Richard Nock, Manfred K. Warmuth

In the field of optimal transport, two prominent subfields face each other: (i) unregularized optimal transport, "\`a-la-Kantorovich", which leads to extremely sparse plans but with algorithms that scale poorly, and (ii) entropic-regularized optimal transport, "\`a-la-Sinkhorn-Cuturi", which gets near-linear approximation algorithms but leads to maximally un-sparse plans.
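
For reference, a minimal sketch of the standard Sinkhorn iteration for the entropic-regularized side contrasted above; this is purely illustrative of "à-la-Sinkhorn-Cuturi" transport, not the paper's tempered-exponential-measure variant, and the function name and toy data are mine.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, iters=500):
    """Entropic-regularized OT (Sinkhorn-Cuturi): returns a dense transport plan.
    a, b: source/target marginals (each summing to 1); C: cost matrix."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]          # plan P = diag(u) K diag(v)

a = np.ones(5) / 5
b = np.ones(4) / 4
C = np.abs(np.arange(5)[:, None] - np.arange(4)[None, :]).astype(float)
P = sinkhorn(a, b, C)
print(P.sum(axis=1), P.sum(axis=0))             # marginals approximately equal a and b
```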

Generative Forests

no code implementations 7 Aug 2023 Richard Nock, Mathieu Guillame-Bert

We focus on generative AI for a type of data that still represents one of the most prevalent forms of data: tabular data.

Density Estimation, Imputation +2

Smoothly Giving up: Robustness for Simple Models

no code implementations 17 Feb 2023 Tyler Sypherd, Nathan Stromberg, Richard Nock, Visar Berisha, Lalitha Sankar

There is a growing need for models that are interpretable and have reduced energy and computational cost (e.g., in health care analytics and federated learning).

Federated Learning, regression

LegendreTron: Uprising Proper Multiclass Loss Learning

no code implementations 27 Jan 2023 Kevin Lam, Christian Walder, Spiridon Penev, Richard Nock

Existing methods do this by fitting an inverse canonical link function which monotonically maps $\mathbb{R}$ to $[0, 1]$ to estimate probabilities for binary problems.
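
For the binary case referenced above, the textbook example of an inverse canonical link is the sigmoid, which inverts the logit link associated with the log loss (included only to ground the terminology; not the paper's learned links):

$$g(p) = \log\frac{p}{1-p}, \qquad g^{-1}(z) = \frac{1}{1+e^{-z}} \in [0, 1],$$

so composing $g^{-1}$ with a real-valued predictor yields probability estimates for the binary problem.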

Clustering above Exponential Families with Tempered Exponential Measures

no code implementations 4 Nov 2022 Ehsan Amid, Richard Nock, Manfred Warmuth

The link with exponential families has allowed $k$-means clustering to be generalized to a wide variety of data generating distributions in exponential families and clustering distortions among Bregman divergences.

Clustering
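
For context on the Bregman distortions mentioned above, the standard definitions (not specific to this paper): a Bregman divergence generated by a strictly convex, differentiable $\varphi$ is

$$D_\varphi(x, y) \;=\; \varphi(x) - \varphi(y) - \langle \nabla\varphi(y),\, x - y\rangle,$$

with $\varphi(x)=\|x\|_2^2$ recovering the squared Euclidean distance of ordinary $k$-means; for any such $D_\varphi$, the minimizer of $\sum_i D_\varphi(x_i, c)$ over $c$ is the arithmetic mean, which is what lets Lloyd-style centroid updates carry over.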

What killed the Convex Booster?

no code implementations 19 May 2022 Yishay Mansour, Richard Nock, Robert C. Williamson

A landmark negative result of Long and Servedio established a worst-case spectacular failure of a supervised learning trio (loss, algorithm, model) otherwise praised for its high precision machinery.

Fair Wrapping for Black-box Predictions

1 code implementation 31 Jan 2022 Alexander Soen, Ibrahim Alabdulmohsin, Sanmi Koyejo, Yishay Mansour, Nyalleng Moorosi, Richard Nock, Ke Sun, Lexing Xie

We introduce a new family of techniques to post-process ("wrap") a black-box classifier in order to reduce its bias.

Fairness

Generative Trees: Adversarial and Copycat

no code implementations 26 Jan 2022 Richard Nock, Mathieu Guillame-Bert

While Generative Adversarial Networks (GANs) achieve spectacular results on unstructured data like images, there is still a gap on tabular data, data for which state-of-the-art supervised learning still largely favours decision tree (DT)-based models.

Imputation

Manifold Learning Benefits GANs

no code implementations CVPR 2022 Yao Ni, Piotr Koniusz, Richard Hartley, Richard Nock

In our design, the manifold learning and coding steps are intertwined with layers of the discriminator, with the goal of attracting intermediate feature representations onto manifolds.

Denoising

Being Properly Improper

no code implementations 18 Jun 2021 Tyler Sypherd, Richard Nock, Lalitha Sankar

Hence, optimizing a proper loss function on twisted data could perilously lead the learning algorithm towards the twisted posterior, rather than to the desired clean posterior.

Fair Densities via Boosting the Sufficient Statistics of Exponential Families

1 code implementation 1 Dec 2020 Alexander Soen, Hisham Husain, Richard Nock

Furthermore, when the weak learners are specified to be decision trees, the sufficient statistics of the learned distribution can be examined to provide clues on sources of (un)fairness.

Fairness

All your loss are belong to Bayes

1 code implementation NeurIPS 2020 Christian Walder, Richard Nock

Loss functions are a cornerstone of machine learning and the starting point of most algorithms.

Gaussian Processes

Cumulant-free closed-form formulas for some common (dis)similarities between densities of an exponential family

no code implementations 5 Mar 2020 Frank Nielsen, Richard Nock

It is well-known that the Bhattacharyya, Hellinger, Kullback-Leibler, $\alpha$-divergences, and Jeffreys' divergences between densities belonging to the same exponential family have generic closed-form formulas relying on the strictly convex and real-analytic cumulant function characterizing the exponential family.
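
For reference, the classical cumulant-based closed form that such results build on (a standard fact, not the paper's new formulas): for densities $p_\theta(x)=\exp(\langle\theta, t(x)\rangle - F(\theta))\,h(x)$ in the same exponential family with cumulant function $F$,

$$\mathrm{KL}(p_{\theta_1}\,\|\,p_{\theta_2}) \;=\; B_F(\theta_2 : \theta_1) \;=\; F(\theta_2) - F(\theta_1) - \langle \theta_2 - \theta_1,\, \nabla F(\theta_1)\rangle,$$

i.e., a Bregman divergence of the cumulant on the swapped natural parameters.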

Generalised Lipschitz Regularisation Equals Distributional Robustness

no code implementations 11 Feb 2020 Zac Cranko, Zhan Shi, Xinhua Zhang, Richard Nock, Simon Kornblith

The problem of adversarial examples has highlighted the need for a theory of regularisation that is general enough to apply to exotic function classes, such as universal approximators.

Supervised Learning: No Loss No Cry

no code implementations ICML 2020 Richard Nock, Aditya Krishna Menon

In detail, we cast {\sc SLIsotron} as learning a loss from a family of composite square losses.

Boosted and Differentially Private Ensembles of Decision Trees

no code implementations 26 Jan 2020 Richard Nock, Wilko Henecka

To address this, we craft a new parameterized proper loss, called the M$\alpha$-loss, which, as we show, allows one to finely tune the tradeoff across the complete spectrum of sensitivity vs boosting guarantees.

Disentangled behavioural representations

1 code implementation NeurIPS 2019 Amir Dezfouli, Hassan Ashtiani, Omar Ghattas, Richard Nock, Peter Dayan, Cheng Soon Ong

Individual characteristics in human decision-making are often quantified by fitting a parametric cognitive model to subjects' behavior and then studying differences between them in the associated parameter space.

Decision Making

A Primal-Dual link between GANs and Autoencoders

no code implementations NeurIPS 2019 Hisham Husain, Richard Nock, Robert C. Williamson

First, we find that the $f$-GAN and WAE objectives partake in a primal-dual relationship and are equivalent under some assumptions, which then allows us to explicate the success of WAE.

Generalization Bounds

Certifying Distributional Robustness using Lipschitz Regularisation

no code implementations 25 Sep 2019 Zac Cranko, Zhan Shi, Xinhua Zhang, Simon Kornblith, Richard Nock

Distributional robust risk (DRR) minimisation has arisen as a flexible and effective framework for machine learning.

Proper-Composite Loss Functions in Arbitrary Dimensions

no code implementations 19 Feb 2019 Zac Cranko, Robert C. Williamson, Richard Nock

The study of a machine learning problem is in many ways difficult to separate from the study of the loss function being used.

Density Estimation, scoring rule

Adversarial Networks and Autoencoders: The Primal-Dual Relationship and Generalization Bounds

no code implementations 3 Feb 2019 Hisham Husain, Richard Nock, Robert C. Williamson

First, we find that the $f$-GAN and WAE objectives partake in a primal-dual relationship and are equivalent under some assumptions, which then allows us to explicate the success of WAE.

Generalization Bounds

New Tricks for Estimating Gradients of Expectations

no code implementations 31 Jan 2019 Christian J. Walder, Paul Roussel, Richard Nock, Cheng Soon Ong, Masashi Sugiyama

We introduce a family of pairwise stochastic gradient estimators for gradients of expectations, which are related to the log-derivative trick, but involve pairwise interactions between samples.
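
The log-derivative trick referred to above is the standard score-function identity (the paper's pairwise estimators build on it but are not reproduced here):

$$\nabla_\theta\, \mathbb{E}_{x\sim p_\theta}\!\left[f(x)\right] \;=\; \mathbb{E}_{x\sim p_\theta}\!\left[f(x)\,\nabla_\theta \log p_\theta(x)\right],$$

estimated in practice by Monte Carlo averaging over samples drawn from $p_\theta$.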

Representation Learning of Compositional Data

2 code implementations NeurIPS 2018 Marta Avalos, Richard Nock, Cheng Soon Ong, Julien Rouar, Ke Sun

Our approach combines the benefits of the log-ratio transformation from compositional data analysis and exponential family PCA.

Representation Learning
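
A minimal sketch of the log-ratio side mentioned above, here the centered log-ratio (CLR) transform from compositional data analysis; the pseudo-count and function name are illustrative choices, not the paper's exact pipeline.

```python
import numpy as np

def clr(X, pseudo=1e-6):
    """Centered log-ratio transform for compositional rows of X (rows sum to 1).
    A small pseudo-count guards against zero components, a common practical workaround."""
    Z = np.log(X + pseudo)
    return Z - Z.mean(axis=1, keepdims=True)

comp = np.array([[0.2, 0.3, 0.5],
                 [0.1, 0.1, 0.8]])
print(clr(comp))   # rows now live in a zero-sum Euclidean subspace, ready for PCA
```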

The Bregman chord divergence

no code implementations 22 Oct 2018 Frank Nielsen, Richard Nock

Distances are fundamental primitives whose choice significantly impacts the performance of algorithms in machine learning and signal processing.

Lipschitz Networks and Distributional Robustness

no code implementations 4 Sep 2018 Zac Cranko, Simon Kornblith, Zhan Shi, Richard Nock

Robust risk minimisation has several advantages: it has been studied with regard to improving the generalisation properties of models and robustness to adversarial perturbation.

Hyperparameter Learning for Conditional Kernel Mean Embeddings with Rademacher Complexity Bounds

1 code implementation 1 Sep 2018 Kelvin Hsu, Richard Nock, Fabio Ramos

Conditional kernel mean embeddings are nonparametric models that encode conditional expectations in a reproducing kernel Hilbert space.
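
A small numerical sketch of an empirical conditional mean embedding, assuming the usual kernel-ridge form $\beta(x) = (K_X + n\lambda I)^{-1} k_X(x)$ (regularization conventions vary across papers); the kernel, bandwidth, helper names and toy data are mine.

```python
import numpy as np

def rbf_gram(A, B, gamma=1.0):
    """RBF Gram matrix k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def conditional_mean_weights(X, x_query, lam=1e-2, gamma=1.0):
    """Weights beta(x) of the empirical conditional mean embedding
    mu_{Y|X=x} ~ sum_i beta_i(x) phi(y_i), with beta(x) = (K_X + n*lam*I)^{-1} k_X(x)."""
    n = X.shape[0]
    K = rbf_gram(X, X, gamma)
    kx = rbf_gram(X, x_query, gamma)
    return np.linalg.solve(K + n * lam * np.eye(n), kx)

# E[g(Y) | X = x] is then approximated by beta(x)^T g(Y_train) for suitable g.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
Y = np.sin(X) + 0.1 * rng.normal(size=(200, 1))
beta = conditional_mean_weights(X, np.array([[0.5]]))
print(float(beta[:, 0] @ Y[:, 0]))   # roughly E[Y | X = 0.5], i.e. close to sin(0.5)
```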

Variational Network Inference: Strong and Stable with Concrete Support

no code implementations ICML 2018 Amir Dezfouli, Edwin Bonilla, Richard Nock

Traditional methods for the discovery of latent network structures are limited in two ways: they either assume that all the signal comes from the network (i.e., there is no source of signal outside the network) or they place constraints on the network parameters to ensure model or algorithmic stability.

Integral Privacy for Sampling

1 code implementation 13 Jun 2018 Hisham Husain, Zac Cranko, Richard Nock

Privacy enforces an information theoretic barrier on approximation, and we show how to reach this barrier with guarantees on the approximation of the target non private density.

Density Estimation, Fairness

Monge blunts Bayes: Hardness Results for Adversarial Training

no code implementations 8 Jun 2018 Zac Cranko, Aditya Krishna Menon, Richard Nock, Cheng Soon Ong, Zhan Shi, Christian Walder

A key feature of our result is that it holds for all proper losses, and for a popular subset of these, the optimisation of this central measure appears to be independent of the loss.

Boosted Density Estimation Remastered

no code implementations 22 Mar 2018 Zac Cranko, Richard Nock

There has recently been a steady increase in the number of iterative approaches to density estimation.

Density Estimation, Generative Adversarial Network

Entity Resolution and Federated Learning get a Federated Resolution

no code implementations 11 Mar 2018 Richard Nock, Stephen Hardy, Wilko Henecka, Hamish Ivey-Law, Giorgio Patrini, Guillaume Smith, Brian Thorne

In our experiments, we modify a simple token-based entity resolution algorithm so that it indeed aims at avoiding matching rows belonging to different classes, and perform experiments in the setting where entity resolution relies on noisy data, which is very relevant to real world domains.

Entity Resolution, Federated Learning +1

Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption

no code implementations 29 Nov 2017 Stephen Hardy, Wilko Henecka, Hamish Ivey-Law, Richard Nock, Giorgio Patrini, Guillaume Smith, Brian Thorne

Our results bring a clear and strong support for federated learning: under reasonable assumptions on the number and magnitude of entity resolution's mistakes, it can be extremely beneficial to carry out federated learning in the setting where each peer's data provides a significant uplift to the other.

Entity Resolution, Federated Learning +1

On $w$-mixtures: Finite convex combinations of prescribed component distributions

no code implementations 2 Aug 2017 Frank Nielsen, Richard Nock

The information geometry induced by the Bregman generator set to the Shannon negentropy on this space yields a dually flat space called the mixture family manifold.

f-GANs in an Information Geometric Nutshell

1 code implementation NeurIPS 2017 Richard Nock, Zac Cranko, Aditya Krishna Menon, Lizhen Qu, Robert C. Williamson

In this paper, we unveil a broad class of distributions for which such convergence happens (namely, deformed exponential families, a wide superset of exponential families) and show tight connections with the three other key GAN parameters: loss, game and architecture.

Evolving a Vector Space with any Generating Set

no code implementations 10 Apr 2017 Richard Nock, Frank Nielsen

In Valiant's model of evolution, a class of representations is evolvable iff a polynomial-time process of random mutations guided by selection converges with high probability to a representation as $\epsilon$-close as desired from the optimal one, for any required $\epsilon>0$.

Semi-parametric Network Structure Discovery Models

no code implementations 27 Feb 2017 Amir Dezfouli, Edwin V. Bonilla, Richard Nock

We propose a network structure discovery model for continuous observations that generalizes linear causal models by incorporating a Gaussian process (GP) prior on a network-independent component, and random sparsity and weight matrices as the network-dependent parameters.

Uncertainty Quantification, Variational Inference

A series of maximum entropy upper bounds of the differential entropy

no code implementations 9 Dec 2016 Frank Nielsen, Richard Nock

We present a series of closed-form maximum entropy upper bounds for the differential entropy of a continuous univariate random variable and study the properties of that series.

BIG-bench Machine Learning
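
The best-known bound of this kind, included only as an illustration of what a maximum-entropy upper bound on the differential entropy looks like (a textbook fact, not one of the paper's new bounds): for any continuous real random variable $X$ with variance $\sigma^2$,

$$h(X) \;\le\; \frac{1}{2}\log\!\left(2\pi e\,\sigma^2\right),$$

with equality iff $X$ is Gaussian, since the Gaussian maximizes differential entropy under a variance constraint.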

On Regularizing Rademacher Observation Losses

no code implementations NeurIPS 2016 Richard Nock

It has recently been shown that supervised learning linear classifiers with two of the most popular losses, the logistic and square loss, is equivalent to optimizing an equivalent loss over sufficient statistics about the class: Rademacher observations (rados).

Entity Resolution

Large Margin Nearest Neighbor Classification using Curved Mahalanobis Distances

no code implementations 22 Sep 2016 Frank Nielsen, Boris Muzellec, Richard Nock

We consider the supervised classification problem of machine learning in Cayley-Klein projective geometries: We show how to learn a curved Mahalanobis metric distance corresponding to either the hyperbolic geometry or the elliptic geometry using the Large Margin Nearest Neighbor (LMNN) framework.

BIG-bench Machine Learning, Classification +1

Tsallis Regularized Optimal Transport and Ecological Inference

1 code implementation 15 Sep 2016 Boris Muzellec, Richard Nock, Giorgio Patrini, Frank Nielsen

We also present the first application of optimal transport to the problem of ecological inference, that is, the reconstruction of joint distributions from their marginals, a problem of large interest in the social sciences.

Making Deep Neural Networks Robust to Label Noise: a Loss Correction Approach

2 code implementations CVPR 2017 Giorgio Patrini, Alessandro Rozza, Aditya Menon, Richard Nock, Lizhen Qu

We present a theoretically grounded approach to train deep neural networks, including recurrent networks, subject to class-dependent label noise.

Ranked #2 on Image Classification on Clothing1M (using clean data), using extra training data

Diversity, Learning with noisy labels +1
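
A hedged sketch of the forward loss-correction idea for class-dependent label noise, assuming a known row-stochastic noise matrix $T_{ij}=p(\tilde y=j\mid y=i)$; this is only a schematic of the correction step, not the paper's full training procedure, and the helper name and toy numbers are mine.

```python
import numpy as np

def forward_corrected_nll(probs, noisy_labels, T):
    """Forward correction (sketch): compose the model's clean-class posterior with a
    known row-stochastic noise matrix T, then apply the usual negative log-likelihood
    on the observed noisy labels. Row-wise, p(noisy | x) = T^T p(clean | x)."""
    noisy_probs = probs @ T
    return -np.log(noisy_probs[np.arange(len(noisy_labels)), noisy_labels]).mean()

probs = np.array([[0.9, 0.1], [0.2, 0.8]])   # model's clean-class posteriors for 2 examples
T = np.array([[0.8, 0.2], [0.3, 0.7]])       # class-dependent label-noise transition matrix
print(forward_corrected_nll(probs, np.array([0, 1]), T))
```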

A scaled Bregman theorem with applications

no code implementations NeurIPS 2016 Richard Nock, Aditya Krishna Menon, Cheng Soon Ong

Experiments on each of these domains validate the analyses and suggest that the scaled Bregman theorem might be a worthy addition to the popular handful of Bregman divergence properties that have been pervasive in machine learning.

BIG-bench Machine Learning, Clustering

Fast $(1+ε)$-approximation of the Löwner extremal matrices of high-dimensional symmetric matrices

no code implementations 6 Apr 2016 Frank Nielsen, Richard Nock

Matrix data sets are common nowadays like in biomedical imaging where the Diffusion Tensor Magnetic Resonance Imaging (DT-MRI) modality produces data sets of 3D symmetric positive definite matrices anchored at voxel positions capturing the anisotropic diffusion properties of water molecules in biological tissues.

Clustering

Fast Learning from Distributed Datasets without Entity Matching

no code implementations 13 Mar 2016 Giorgio Patrini, Richard Nock, Stephen Hardy, Tiberio Caetano

Our goal is to learn a classifier in the cross product space of the two domains, in the hard case in which no shared ID is available -- e.g., due to anonymization.

Entity Resolution

Loss factorization, weakly supervised learning and label noise robustness

no code implementations 8 Feb 2016 Giorgio Patrini, Frank Nielsen, Richard Nock, Marcello Carioni

We prove that the empirical risk of most well-known loss functions factors into a linear term aggregating all labels with a term that is label free, and can further be expressed by sums of the loss.

Generalization Bounds, Weakly-supervised Learning
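
A concrete instance of the factorization claim above, using the square loss with labels $y_i\in\{-1,+1\}$ (an illustration, not the paper's general statement): the labels enter the empirical risk only through the linear "mean operator" term,

$$\sum_{i=1}^n \left(y_i - w^\top x_i\right)^2 \;=\; \underbrace{n}_{\text{label-free}} \;-\; 2\, w^\top\! \underbrace{\sum_{i=1}^n y_i x_i}_{\text{linear in the labels}} \;+\; \underbrace{\sum_{i=1}^n \left(w^\top x_i\right)^2}_{\text{label-free}}.$$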

k-variates++: more pluses in the k-means++

no code implementations 3 Feb 2016 Richard Nock, Raphaël Canyasse, Roksana Boreli, Frank Nielsen

For either of the specific frameworks considered here, or for the differential privacy setting, there are few to no prior results on the direct application of k-means++ and its approximation bounds; state-of-the-art contenders appear to be significantly more complex and/or display less favorable (approximation) properties.

Clustering
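
For reference, a minimal sketch of the standard k-means++ ($D^2$) seeding that the paper generalizes; this is the textbook baseline, not the proposed k-variates++, and the helper names and toy data are mine.

```python
import numpy as np

def kmeans_pp_seeding(X, k, rng=np.random.default_rng(0)):
    """Standard k-means++ (D^2) seeding: each new center is sampled with probability
    proportional to the squared distance to the nearest already-chosen center."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min(((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1), axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=m, size=(50, 2)) for m in (0.0, 5.0, 10.0)])
print(kmeans_pp_seeding(X, 3))   # one seed per well-separated cluster, with high probability
```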

Learning Games and Rademacher Observations Losses

no code implementations 16 Dec 2015 Richard Nock

We first show that this unexpected equivalence can actually be generalized to other example/rado losses, with necessary and sufficient conditions for the equivalence, exemplified on five losses that bear popular names in various fields: exponential (boosting), mean-variance (finance), Linear Hinge (on-line learning), ReLU (deep learning), and unhinged (statistics).

Rademacher Observations, Private Data, and Boosting

no code implementations 9 Feb 2015 Richard Nock, Giorgio Patrini, Arik Friedman

We show that rados comply with various privacy requirements that make them good candidates for machine learning in a privacy framework.

(Almost) No Label No Cry

no code implementations NeurIPS 2014 Giorgio Patrini, Richard Nock, Paul Rivera, Tiberio Caetano

In Learning with Label Proportions (LLP), the objective is to learn a supervised classifier when, instead of labels, only label proportions for bags of observations are known.

Generalization Bounds, Privacy Preserving +1

Further heuristics for $k$-means: The merge-and-split heuristic and the $(k,l)$-means

no code implementations 23 Jun 2014 Frank Nielsen, Richard Nock

This novel heuristic can improve Hartigan's $k$-means when it has converged to a local minimum.

Clustering

Optimal interval clustering: Application to Bregman clustering and statistical mixture learning

no code implementations 11 Mar 2014 Frank Nielsen, Richard Nock

We present a generic dynamic programming method to compute the optimal clustering of $n$ scalar elements into $k$ pairwise disjoint intervals.

Clustering, Model Selection
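
A minimal $O(kn^2)$ dynamic-programming sketch for the 1D case with the k-means (sum of squared deviations) cost, to make the "optimal clustering into $k$ pairwise disjoint intervals" statement concrete; the paper's method covers more general Bregman costs and model selection, and all names here are mine.

```python
import numpy as np

def optimal_interval_clustering(x, k):
    """Partition sorted scalars into k contiguous intervals minimizing total
    within-interval sum of squared deviations, by dynamic programming."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    p1 = np.concatenate(([0.0], np.cumsum(x)))       # prefix sums
    p2 = np.concatenate(([0.0], np.cumsum(x * x)))   # prefix sums of squares

    def cost(i, j):  # SSE of interval x[i..j-1], in O(1) via prefix sums
        s, s2, m = p1[j] - p1[i], p2[j] - p2[i], j - i
        return s2 - s * s / m

    INF = float("inf")
    dp = np.full((k + 1, n + 1), INF)                # dp[c][j]: best cost of first j points in c intervals
    cut = np.zeros((k + 1, n + 1), dtype=int)
    dp[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = dp[c - 1][i] + cost(i, j)
                if cand < dp[c][j]:
                    dp[c][j], cut[c][j] = cand, i
    bounds, j = [], n                                # recover the optimal interval boundaries
    for c in range(k, 0, -1):
        i = cut[c][j]
        bounds.append((i, j))
        j = i
    return dp[k][n], bounds[::-1]

print(optimal_interval_clustering([1, 2, 9, 10, 11, 30], 3))   # cost 2.5, intervals (0,2), (2,5), (5,6)
```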

On the Efficient Minimization of Classification Calibrated Surrogates

no code implementations NeurIPS 2008 Richard Nock, Frank Nielsen

Bartlett et al. (2006) recently proved that a ground condition for convex surrogates, classification calibration, ties up the minimization of the surrogates and classification risks, and left as an important problem the algorithmic questions about the minimization of these surrogates.

Classification, General Classification
