Search Results for author: Aditya Krishna Menon

Found 35 papers, 13 papers with code

When in Doubt, Summon the Titans: Efficient Inference with Large Models

no code implementations19 Oct 2021 Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Amr Ahmed, Sanjiv Kumar

In a nutshell, we use the large teacher models to guide the lightweight student models to make correct predictions only on a subset of "easy" examples; for the "hard" examples, we fall back to the teacher.

Image Classification
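The easy/hard routing the abstract describes can be sketched as a confidence-based cascade. This is a minimal NumPy illustration, not the paper's method: the function name, the max-probability confidence rule, and the threshold are all our assumptions.

```python
import numpy as np

def cascade_predict(student_probs, teacher_probs, threshold=0.9):
    """Keep the student's prediction when it is confident (an "easy"
    example); otherwise fall back to the teacher (a "hard" example)."""
    student_conf = student_probs.max(axis=1)
    use_student = student_conf >= threshold
    preds = np.where(use_student,
                     student_probs.argmax(axis=1),
                     teacher_probs.argmax(axis=1))
    return preds, use_student

# Two toy examples: one confident student, one uncertain.
student = np.array([[0.97, 0.02, 0.01],   # easy: student handles it
                    [0.40, 0.35, 0.25]])  # hard: defer to teacher
teacher = np.array([[0.90, 0.05, 0.05],
                    [0.10, 0.85, 0.05]])
preds, used = cascade_predict(student, teacher)
```

In this sketch the expensive teacher is only invoked on the deferred subset, which is where the inference savings would come from.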

Training Over-parameterized Models with Non-decomposable Objectives

no code implementations NeurIPS 2021 Harikrishna Narasimhan, Aditya Krishna Menon

Many modern machine learning applications come with complex and nuanced design goals such as minimizing the worst-case error, satisfying a given precision or recall target, or enforcing group-fairness constraints.


Teacher's pet: understanding and mitigating biases in distillation

no code implementations19 Jun 2021 Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

Knowledge distillation is widely used as a means of improving the performance of a relatively simple student model using the predictions from a complex teacher model.

Knowledge Distillation

Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces

no code implementations12 May 2021 Ankit Singh Rawat, Aditya Krishna Menon, Wittawat Jitkrittum, Sadeep Jayasumana, Felix X. Yu, Sashank Reddi, Sanjiv Kumar

Negative sampling schemes enable efficient training given a large number of classes, by offering a means to approximate a computationally expensive loss function that takes all labels into account.
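A generic negative-sampling approximation of the full softmax cross-entropy can be sketched as follows. This is only an illustrative baseline for the scheme the abstract describes; the bias issues the paper analyses, and any correction terms, are omitted, and all names here are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def sampled_softmax_loss(scores, positive, num_neg=5):
    """Approximate softmax cross-entropy over a large label space by
    normalising over the positive label plus a few uniformly sampled
    negative labels, instead of over all classes."""
    num_classes = scores.shape[0]
    candidates = np.delete(np.arange(num_classes), positive)
    negs = rng.choice(candidates, size=num_neg, replace=False)
    idx = np.concatenate(([positive], negs))
    z = scores[idx] - scores[idx].max()   # stabilised log-softmax
    return -(z[0] - np.log(np.exp(z).sum()))

scores = rng.normal(size=1000)            # scores for 1000 classes
loss = sampled_softmax_loss(scores, positive=7)
```

The point of such schemes is that each update touches only `num_neg + 1` of the (potentially billions of) labels.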

Interval-censored Hawkes processes

no code implementations16 Apr 2021 Marian-Andrei Rizoiu, Alexander Soen, Shidi Li, Pio Calderon, Leanne Dong, Aditya Krishna Menon, Lexing Xie

We propose the multi-impulse exogenous function when the exogenous events are observed as event time and the latent homogeneous Poisson process exogenous function when the exogenous events are presented as interval-censored volumes.

RankDistil: Knowledge Distillation for Ranking

no code implementations AISTATS 2021 Sashank J. Reddi, Rama Kumar Pasumarthi, Aditya Krishna Menon, Ankit Singh Rawat, Felix Yu, Seungyeon Kim, Andreas Veit, Sanjiv Kumar

Knowledge distillation is an approach to improve the performance of a student model by using the knowledge of a complex teacher. Despite its success in several deep learning applications, the study of distillation is mostly confined to classification settings.

Document Ranking Knowledge Distillation

Distilling Double Descent

no code implementations13 Feb 2021 Andrew Cotter, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sashank J. Reddi, Yichen Zhou

Distillation is the technique of training a "student" model based on examples that are labeled by a separate "teacher" model, which itself is trained on a labeled dataset.
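The generic distillation objective the abstract defines, training the student against teacher-provided soft labels, can be written as a cross-entropy. A minimal NumPy sketch (temperature scaling and hard-label mixing omitted; names are ours):

```python
import numpy as np

def distillation_loss(student_logits, teacher_probs):
    """Cross-entropy between the teacher's soft labels and the
    student's softmax output, averaged over the batch."""
    z = student_logits - student_logits.max(axis=1, keepdims=True)
    log_student = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -(teacher_probs * log_student).sum(axis=1).mean()

teacher = np.array([[0.7, 0.2, 0.1]])
loss = distillation_loss(np.array([[2.0, 1.0, 0.0]]), teacher)
```

The loss is minimised (and equals the teacher's entropy) exactly when the student's output distribution matches the teacher's.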

Overparameterisation and worst-case generalisation: friend or foe?

no code implementations ICLR 2021 Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar

Overparameterised neural networks have demonstrated the remarkable ability to perfectly fit training samples, while still generalising to unseen test samples.

Structured Prediction

Semantic Label Smoothing for Sequence to Sequence Problems

no code implementations EMNLP 2020 Michal Lukasik, Himanshu Jain, Aditya Krishna Menon, Seungyeon Kim, Srinadh Bhojanapalli, Felix Yu, Sanjiv Kumar

Label smoothing has been shown to be an effective regularization strategy in classification, that prevents overfitting and helps in label de-noising.

Machine Translation Translation

Long-tail learning via logit adjustment

2 code implementations ICLR 2021 Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

Real-world classification problems typically exhibit an imbalanced or long-tailed label distribution, wherein many labels are associated with only a few samples.

Long-tail Learning
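The post-hoc variant of logit adjustment can be sketched in a few lines: subtract a scaled log-prior from each class logit before taking the argmax, which boosts rare classes. The formula follows the paper's description of post-hoc adjustment, but the function names and toy numbers are ours.

```python
import numpy as np

def logit_adjusted_predict(logits, class_priors, tau=1.0):
    """Predict argmax_y [ f_y(x) - tau * log(pi_y) ], where pi_y is
    the empirical class prior; tau=0 recovers the plain argmax."""
    return np.argmax(logits - tau * np.log(class_priors), axis=1)

# Head-heavy prior: class 0 is 9x more frequent than class 1.
priors = np.array([0.9, 0.1])
logits = np.array([[2.0, 1.0]])           # raw argmax picks class 0
pred = logit_adjusted_predict(logits, priors)
```

Here the adjustment flips the decision to the rare class 1, since its small prior contributes a large `-log(pi_y)` bonus.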

Why distillation helps: a statistical perspective

no code implementations21 May 2020 Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Seungyeon Kim, Sanjiv Kumar

In this paper, we present a statistical perspective on distillation which addresses this question, and provides a novel connection to extreme multiclass retrieval techniques.

Knowledge Distillation

Can gradient clipping mitigate label noise?

1 code implementation ICLR 2020 Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar

Gradient clipping is a widely-used technique in the training of deep networks, and is generally motivated from an optimisation lens: informally, it controls the dynamics of iterates, thus enhancing the rate of convergence to a local minimum.
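The standard global-norm clipping the abstract refers to is simple to state: rescale the gradient when its L2 norm exceeds a cap, leave it unchanged otherwise. A generic sketch (not the paper's loss-gradient analysis):

```python
import numpy as np

def clip_gradient(grad, max_norm=1.0):
    """Global L2-norm gradient clipping: if ||grad|| > max_norm,
    rescale grad to have norm exactly max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad

g = np.array([3.0, 4.0])                  # norm 5.0
clipped = clip_gradient(g, max_norm=1.0)  # rescaled to norm 1.0
```

Clipping preserves the gradient's direction while bounding its magnitude, which is the property the paper examines through a label-noise lens.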

Doubly-stochastic mining for heterogeneous retrieval

no code implementations23 Apr 2020 Ankit Singh Rawat, Aditya Krishna Menon, Andreas Veit, Felix Yu, Sashank J. Reddi, Sanjiv Kumar

Modern retrieval problems are characterised by training sets with potentially billions of labels, and heterogeneous data distributions across subpopulations (e.g., users of a retrieval system may be from different countries), each of which poses a challenge.

Stochastic Optimization

Federated Learning with Only Positive Labels

no code implementations ICML 2020 Felix X. Yu, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

We consider learning a multi-class classification model in the federated setting, where each user has access to the positive data associated with only a single class.

Federated Learning Multi-class Classification

Does label smoothing mitigate label noise?

no code implementations ICML 2020 Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

Label smoothing is commonly used in training deep learning models, wherein one-hot training labels are mixed with uniform label vectors.

Learning with noisy labels
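The mixing operation the abstract describes is a one-liner: interpolate the one-hot target with the uniform distribution. A minimal NumPy sketch (the smoothing weight `alpha` and names are ours):

```python
import numpy as np

def smooth_labels(one_hot, alpha=0.1):
    """Label smoothing: (1 - alpha) * one_hot + alpha * uniform,
    where uniform puts mass 1/num_classes on every class."""
    num_classes = one_hot.shape[-1]
    return (1.0 - alpha) * one_hot + alpha / num_classes

y = np.eye(4)[[2]]                 # one-hot label for class 2 of 4
smoothed = smooth_labels(y, alpha=0.1)
```

Each smoothed target is still a valid distribution (rows sum to 1), with the true class keeping most of the mass.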

Supervised Learning: No Loss No Cry

no code implementations ICML 2020 Richard Nock, Aditya Krishna Menon

In detail, we cast SLIsotron as learning a loss from a family of composite square losses.

Online Hierarchical Clustering Approximations

no code implementations20 Sep 2019 Aditya Krishna Menon, Anand Rajagopalan, Baris Sumengen, Gui Citovsky, Qin Cao, Sanjiv Kumar

The second algorithm, OHAC, is an online counterpart to offline HAC, which is known to yield a 1/3-approximation to the MW revenue, and produce good quality clusters in practice.

Noise-tolerant fair classification

1 code implementation NeurIPS 2019 Alexandre Louis Lamy, Ziyuan Zhong, Aditya Krishna Menon, Nakul Verma

We finally show that our procedure is empirically effective on two case-studies involving sensitive feature censoring.

Classification Fairness +1

Fairness risk measures

1 code implementation24 Jan 2019 Robert C. Williamson, Aditya Krishna Menon

In this paper, we propose a new definition of fairness that generalises some existing proposals, while allowing for generic sensitive features and resulting in a convex objective.


Comparative Document Summarisation via Classification

1 code implementation6 Dec 2018 Umanga Bista, Alexander Mathews, Minjeong Shin, Aditya Krishna Menon, Lexing Xie

This paper considers extractive summarisation in a comparative setting: given two or more document groups (e.g., separated by publication time), the goal is to select a small number of documents that are representative of each group, and also maximally distinguishable from other groups.

Classification General Classification +1

Complementary-Label Learning for Arbitrary Losses and Models

1 code implementation Proceedings of the 36th International Conference on Machine Learning, 2019 Takashi Ishida, Gang Niu, Aditya Krishna Menon, Masashi Sugiyama

In contrast to the standard classification paradigm where the true class is given to each training pattern, complementary-label learning only uses training patterns each equipped with a complementary label, which only specifies one of the classes that the pattern does not belong to.

General Classification Image Classification

On the Minimal Supervision for Training Any Binary Classifier from Only Unlabeled Data

1 code implementation ICLR 2019 Nan Lu, Gang Niu, Aditya Krishna Menon, Masashi Sugiyama

In this paper, we study training arbitrary (from linear to deep) binary classifiers from only unlabeled (U) data by empirical risk minimisation (ERM).

Monge blunts Bayes: Hardness Results for Adversarial Training

no code implementations8 Jun 2018 Zac Cranko, Aditya Krishna Menon, Richard Nock, Cheng Soon Ong, Zhan Shi, Christian Walder

A key feature of our result is that it holds for all proper losses, and for a popular subset of these, the optimisation of this central measure appears to be independent of the loss.

Anomaly Detection using One-Class Neural Networks

4 code implementations18 Feb 2018 Raghavendra Chalapathy, Aditya Krishna Menon, Sanjay Chawla

We propose a one-class neural network (OC-NN) model to detect anomalies in complex data sets.

Anomaly Detection

Revisiting revisits in trajectory recommendation

no code implementations17 Aug 2017 Aditya Krishna Menon, Dawei Chen, Lexing Xie, Cheng Soon Ong

Trajectory recommendation is the problem of recommending a sequence of places in a city for a tourist to visit.

f-GANs in an Information Geometric Nutshell

1 code implementation NeurIPS 2017 Richard Nock, Zac Cranko, Aditya Krishna Menon, Lizhen Qu, Robert C. Williamson

In this paper, we unveil a broad class of distributions for which such convergence happens (namely, deformed exponential families, a wide superset of exponential families) and show tight connections with the three other key GAN parameters: loss, game and architecture.

The cost of fairness in classification

no code implementations25 May 2017 Aditya Krishna Menon, Robert C. Williamson

We study the problem of learning classifiers with a fairness constraint, with three main contributions towards the goal of quantifying the problem's inherent tradeoffs.

Classification Fairness +1

Robust, Deep and Inductive Anomaly Detection

5 code implementations22 Apr 2017 Raghavendra Chalapathy, Aditya Krishna Menon, Sanjay Chawla

PCA is a classical statistical technique whose simplicity and maturity have seen it find widespread use as an anomaly detection technique.

Anomaly Detection
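The classical PCA baseline the paper builds on scores each point by its reconstruction error after projecting onto the top principal components. A minimal NumPy sketch (the toy data and names are ours):

```python
import numpy as np

def pca_anomaly_scores(X, k=1):
    """Score each row of X by its residual norm after projecting the
    centred data onto the top-k principal components; points far from
    the principal subspace get high anomaly scores."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ Vt[:k].T @ Vt[:k]          # rank-k reconstruction
    return np.linalg.norm(Xc - proj, axis=1)

# Six points near the line y = x, plus one clear off-line outlier.
X = np.array([[-3., -3.], [-2., -2.], [-1., -1.],
              [1., 1.], [2., 2.], [3., 3.],
              [0., -3.]])                  # the outlier
scores = pca_anomaly_scores(X, k=1)
```

The outlier's reconstruction error dominates the inliers', so thresholding the scores flags it.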

A scaled Bregman theorem with applications

no code implementations NeurIPS 2016 Richard Nock, Aditya Krishna Menon, Cheng Soon Ong

Experiments on each of these domains validate the analyses and suggest that the scaled Bregman theorem might be a worthy addition to the popular handful of Bregman divergence properties that have been pervasive in machine learning.

Learning from Binary Labels with Instance-Dependent Corruption

no code implementations3 May 2016 Aditya Krishna Menon, Brendan van Rooyen, Nagarajan Natarajan

Suppose we have a sample of instances paired with binary labels corrupted by arbitrary instance- and label-dependent noise.

An Average Classification Algorithm

no code implementations4 Jun 2015 Brendan van Rooyen, Aditya Krishna Menon, Robert C. Williamson

When working with a high or infinite dimensional kernel, it is imperative for speed of evaluation and storage issues that as few training samples as possible are used in the kernel expansion.

Classification General Classification

Learning with Symmetric Label Noise: The Importance of Being Unhinged

1 code implementation NeurIPS 2015 Brendan van Rooyen, Aditya Krishna Menon, Robert C. Williamson

However, Long and Servedio [2010] proved that under symmetric label noise (SLN), minimisation of any convex potential over a linear function class can result in classification performance equivalent to random guessing.

Classification General Classification
