no code implementations • 3 Feb 2023 • Zitong Yang, Michal Lukasik, Vaishnavh Nagarajan, Zonglin Li, Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Sanjiv Kumar
The impressive generalization performance of modern neural networks is attributed in part to their ability to implicitly memorize complex training patterns.
no code implementations • 30 Jan 2023 • Vaishnavh Nagarajan, Aditya Krishna Menon, Srinadh Bhojanapalli, Hossein Mobahi, Sanjiv Kumar
Knowledge distillation has been widely used to improve the performance of a "student" network by training it to mimic the soft probabilities of a "teacher" network.
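For context, the soft-probability matching that distillation relies on is typically implemented as below; this is a minimal sketch of a standard temperature-scaled distillation loss, with `T` and `alpha` as illustrative hyperparameters rather than values taken from the paper.

```python
# Minimal sketch of a standard distillation loss (illustrative, not the paper's exact method).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Soft targets: the student mimics the teacher's temperature-smoothed probabilities.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: the usual cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```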
no code implementations • 29 Jan 2023 • Harikrishna Narasimhan, Aditya Krishna Menon, Wittawat Jitkrittum, Sanjiv Kumar
In this paper, we formally relate these problems, and show how they may be jointly solved.
no code implementations • 28 Jan 2023 • Hrayr Harutyunyan, Ankit Singh Rawat, Aditya Krishna Menon, Seungyeon Kim, Sanjiv Kumar
Despite the popularity and efficacy of knowledge distillation, there is limited understanding of why it helps.
no code implementations • 27 Jan 2023 • Seungyeon Kim, Ankit Singh Rawat, Manzil Zaheer, Sadeep Jayasumana, Veeranjaneyulu Sadhanala, Wittawat Jitkrittum, Aditya Krishna Menon, Rob Fergus, Sanjiv Kumar
Our distillation approach is theoretically justified and applies to both dual encoder (DE) and cross-encoder (CE) models.
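The two model families named above score query-document pairs differently; the toy sketch below shows the contrast. The encoder modules and dimensions are placeholders, not the paper's architecture.

```python
# Toy contrast between dual-encoder (DE) and cross-encoder (CE) scoring. Illustrative only.
import torch
import torch.nn as nn

dim = 64
query_enc = nn.Linear(128, dim)   # stand-in for a real query encoder
doc_enc = nn.Linear(128, dim)     # stand-in for a real document encoder
joint_enc = nn.Linear(256, 1)     # stand-in for a real cross-encoder

def dual_encoder_score(q_feats, d_feats):
    # Encode query and document independently, then take a dot product.
    return (query_enc(q_feats) * doc_enc(d_feats)).sum(dim=-1)

def cross_encoder_score(q_feats, d_feats):
    # Encode the concatenated (query, document) pair jointly.
    return joint_enc(torch.cat([q_feats, d_feats], dim=-1)).squeeze(-1)
```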
no code implementations • 28 Oct 2022 • Arslan Chaudhry, Aditya Krishna Menon, Andreas Veit, Sadeep Jayasumana, Srikumar Ramalingam, Sanjiv Kumar
Towards this, we study two questions: (1) how does the Mixup loss, which enforces linearity in the last network layer, propagate that linearity to earlier layers?
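As a reminder of the mechanism being analysed, Mixup trains on convex combinations of input pairs and their labels; here is a minimal sketch, with the Beta parameter as an illustrative default.

```python
# Minimal Mixup sketch: train on convex combinations of examples and their labels.
import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=np.random.default_rng(0)):
    lam = rng.beta(alpha, alpha)          # mixing coefficient
    perm = rng.permutation(len(x))        # random pairing within the batch
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix
```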
no code implementations • 13 Jun 2022 • Serena Wang, Harikrishna Narasimhan, Yichen Zhou, Sara Hooker, Michal Lukasik, Aditya Krishna Menon
We show empirically that our robust distillation techniques not only achieve better worst-class performance, but also lead to Pareto improvement in the tradeoff between overall performance and worst-class performance compared to other baseline methods.
no code implementations • 27 Apr 2022 • Wittawat Jitkrittum, Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar
Long-tail learning is the problem of learning under skewed label distributions, which pose a challenge for standard learners.
no code implementations • 19 Oct 2021 • Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Amr Ahmed, Sanjiv Kumar
In a nutshell, we use the large teacher models to guide the lightweight student models to only make correct predictions on a subset of "easy" examples; for the "hard" examples, we fall back to the teacher.
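A schematic of this easy-versus-hard routing is sketched below; the confidence threshold is a hypothetical illustration of one way to decide when to fall back to the teacher, not necessarily the paper's rule.

```python
# Sketch of cascade routing: let the student answer confident ("easy") examples,
# and fall back to the teacher otherwise. The threshold is illustrative.
import torch
import torch.nn.functional as F

def cascade_predict(student, teacher, x, threshold=0.9):
    student_probs = F.softmax(student(x), dim=-1)
    conf, student_pred = student_probs.max(dim=-1)
    teacher_pred = teacher(x).argmax(dim=-1)
    # Use the student's prediction only where it is confident enough.
    return torch.where(conf >= threshold, student_pred, teacher_pred)
```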
no code implementations • 29 Sep 2021 • Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Amr Ahmed, Sanjiv Kumar
In a nutshell, we use the large teacher models to guide the lightweight student models to only make correct predictions on a subset of "easy" examples; for the "hard" examples, we fall back to the teacher.
no code implementations • 29 Sep 2021 • Aditya Krishna Menon, Sadeep Jayasumana, Seungyeon Kim, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar
Transformer-based models such as BERT have proven successful in information retrieval problems, which seek to identify relevant documents for a given query.
no code implementations • NeurIPS 2021 • Harikrishna Narasimhan, Aditya Krishna Menon
Many modern machine learning applications come with complex and nuanced design goals such as minimizing the worst-case error, satisfying a given precision or recall target, or enforcing group-fairness constraints.
no code implementations • 19 Jun 2021 • Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar
Knowledge distillation is widely used as a means of improving the performance of a relatively simple student model using the predictions from a complex teacher model.
no code implementations • 12 May 2021 • Ankit Singh Rawat, Aditya Krishna Menon, Wittawat Jitkrittum, Sadeep Jayasumana, Felix X. Yu, Sashank Reddi, Sanjiv Kumar
Negative sampling schemes enable efficient training given a large number of classes, by offering a means to approximate a computationally expensive loss function that takes all labels into account.
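To illustrate the general idea, the sketch below approximates a full softmax cross-entropy by scoring only the positive label plus a handful of uniformly sampled negatives; the uniform sampler is a simple placeholder, not one of the schemes analysed in the paper.

```python
# Sketch of negative sampling: approximate the softmax over all classes by using
# the positive label plus a few uniformly sampled negatives (collisions ignored).
import torch
import torch.nn.functional as F

def sampled_softmax_loss(embeddings, class_weights, labels, num_negatives=10):
    num_classes = class_weights.shape[0]
    negatives = torch.randint(0, num_classes, (embeddings.shape[0], num_negatives))
    candidates = torch.cat([labels.unsqueeze(1), negatives], dim=1)  # positive in column 0
    logits = torch.einsum("bd,bkd->bk", embeddings, class_weights[candidates])
    # The positive label sits in column 0 of the candidate set.
    return F.cross_entropy(logits, torch.zeros(len(labels), dtype=torch.long))
```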
no code implementations • 16 Apr 2021 • Marian-Andrei Rizoiu, Alexander Soen, Shidi Li, Pio Calderon, Leanne Dong, Aditya Krishna Menon, Lexing Xie
We propose the multi-impulse exogenous function - for when the exogenous events are observed as event time - and the latent homogeneous Poisson process exogenous function - for when the exogenous events are presented as interval-censored volumes.
no code implementations • AISTATS 2021 • Sashank J. Reddi, Rama Kumar Pasumarthi, Aditya Krishna Menon, Ankit Singh Rawat, Felix Yu, Seungyeon Kim, Andreas Veit, Sanjiv Kumar
Knowledge distillation is an approach to improve the performance of a student model by using the knowledge of a complex teacher. Despite its success in several deep learning applications, distillation has mostly been studied in classification settings.
no code implementations • 13 Feb 2021 • Andrew Cotter, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sashank J. Reddi, Yichen Zhou
Distillation is the technique of training a "student" model based on examples that are labeled by a separate "teacher" model, which itself is trained on a labeled dataset.
no code implementations • ICLR 2021 • Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar
Overparameterised neural networks have demonstrated the remarkable ability to perfectly fit training samples, while still generalising to unseen test samples.
no code implementations • EMNLP 2020 • Michal Lukasik, Himanshu Jain, Aditya Krishna Menon, Seungyeon Kim, Srinadh Bhojanapalli, Felix Yu, Sanjiv Kumar
Label smoothing has been shown to be an effective regularization strategy in classification that prevents overfitting and helps with label de-noising.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Umanga Bista, Alexander Patrick Mathews, Aditya Krishna Menon, Lexing Xie
Most work on multi-document summarization has focused on generic summarization of information present in each individual document set.
3 code implementations • ICLR 2021 • Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar
Real-world classification problems typically exhibit an imbalanced or long-tailed label distribution, wherein many labels are associated with only a few samples.
Ranked #41 on Long-tail Learning on ImageNet-LT
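One standard remedy for the long-tailed label distributions described in the entry above is to offset the logits by the log of the empirical class priors, so that rare classes are not penalised simply for being rare. The sketch below is a generic illustration of this idea, with `tau` as a placeholder hyperparameter, and is not claimed to reproduce the paper's training procedure.

```python
# Sketch of logit adjustment for long-tailed classification. Illustrative only.
import torch
import torch.nn.functional as F

def logit_adjusted_loss(logits, labels, class_counts, tau=1.0):
    priors = class_counts.float() / class_counts.sum()
    adjusted = logits + tau * torch.log(priors + 1e-12)  # broadcast over the batch
    return F.cross_entropy(adjusted, labels)
```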
no code implementations • 21 May 2020 • Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Seungyeon Kim, Sanjiv Kumar
In this paper, we present a statistical perspective on distillation which addresses this question, and provides a novel connection to extreme multiclass retrieval techniques.
1 code implementation • ICLR 2020 • Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar
Gradient clipping is a widely-used technique in the training of deep networks, and is generally motivated from an optimisation lens: informally, it controls the dynamics of iterates, thus enhancing the rate of convergence to a local minimum.
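For reference, a typical training step with global-norm gradient clipping looks like the sketch below; the max-norm value is illustrative.

```python
# Typical use of global-norm gradient clipping inside a training step.
import torch

def train_step(model, optimizer, loss_fn, x, y, max_norm=1.0):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # Rescale gradients so that their global norm is at most max_norm.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item()
```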
no code implementations • 23 Apr 2020 • Ankit Singh Rawat, Aditya Krishna Menon, Andreas Veit, Felix Yu, Sashank J. Reddi, Sanjiv Kumar
Modern retrieval problems are characterised by training sets with potentially billions of labels, and heterogeneous data distributions across subpopulations (e.g., users of a retrieval system may be from different countries), each of which poses a challenge.
1 code implementation • ICML 2020 • Felix X. Yu, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar
We consider learning a multi-class classification model in the federated setting, where each user has access to the positive data associated with only a single class.
no code implementations • ICML 2020 • Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar
Label smoothing is commonly used in training deep learning models, wherein one-hot training labels are mixed with uniform label vectors.
Ranked #10 on Learning with noisy labels on CIFAR-10N-Random3
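The mixing described in the entry above is simple to write down; here is a minimal sketch, with `alpha` as an illustrative smoothing weight.

```python
# Label smoothing sketch: mix one-hot targets with the uniform distribution.
import numpy as np

def smooth_labels(labels, num_classes, alpha=0.1):
    one_hot = np.eye(num_classes)[labels]
    uniform = np.full((len(labels), num_classes), 1.0 / num_classes)
    return (1.0 - alpha) * one_hot + alpha * uniform
```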
no code implementations • ICML 2020 • Richard Nock, Aditya Krishna Menon
In detail, we cast SLIsotron as learning a loss from a family of composite square losses.
no code implementations • 20 Sep 2019 • Aditya Krishna Menon, Anand Rajagopalan, Baris Sumengen, Gui Citovsky, Qin Cao, Sanjiv Kumar
The second algorithm, OHAC, is an online counterpart to offline HAC, which is known to yield a 1/3-approximation to the MW revenue and to produce good-quality clusters in practice.
1 code implementation • NeurIPS 2019 • Alexandre Louis Lamy, Ziyuan Zhong, Aditya Krishna Menon, Nakul Verma
We finally show that our procedure is empirically effective on two case studies involving sensitive feature censoring.
1 code implementation • 24 Jan 2019 • Robert C. Williamson, Aditya Krishna Menon
In this paper, we propose a new definition of fairness that generalises some existing proposals, while allowing for generic sensitive features and resulting in a convex objective.
no code implementations • 18 Jan 2019 • Dawei Chen, Cheng Soon Ong, Aditya Krishna Menon
Playlist recommendation involves producing a set of songs that a user might enjoy.
1 code implementation • 6 Dec 2018 • Umanga Bista, Alexander Mathews, Minjeong Shin, Aditya Krishna Menon, Lexing Xie
This paper considers extractive summarisation in a comparative setting: given two or more document groups (e.g., separated by publication time), the goal is to select a small number of documents that are representative of each group, and also maximally distinguishable from other groups.
1 code implementation • Proceedings of the 36th International Conference on Machine Learning, 2019 • Takashi Ishida, Gang Niu, Aditya Krishna Menon, Masashi Sugiyama
In contrast to the standard classification paradigm where the true class is given to each training pattern, complementary-label learning only uses training patterns each equipped with a complementary label, which only specifies one of the classes that the pattern does not belong to.
Ranked #21 on Image Classification on Kuzushiji-MNIST
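As a naive illustration of learning from complementary labels only (and not the unbiased estimator developed in this line of work), one can simply push probability mass away from the class the pattern is known not to belong to, as in the sketch below.

```python
# Naive complementary-label loss: penalise probability on the class the example
# is known NOT to belong to. An illustration, not the paper's estimator.
import torch
import torch.nn.functional as F

def naive_complementary_loss(logits, complementary_labels):
    probs = F.softmax(logits, dim=-1)
    p_forbidden = probs.gather(1, complementary_labels.unsqueeze(1)).squeeze(1)
    return -torch.log(1.0 - p_forbidden + 1e-12).mean()
```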
1 code implementation • ICLR 2019 • Nan Lu, Gang Niu, Aditya Krishna Menon, Masashi Sugiyama
In this paper, we study training arbitrary (from linear to deep) binary classifiers from only unlabeled (U) data by ERM.
no code implementations • 8 Jun 2018 • Zac Cranko, Aditya Krishna Menon, Richard Nock, Cheng Soon Ong, Zhan Shi, Christian Walder
A key feature of our result is that it holds for all proper losses, and for a popular subset of these, the optimisation of this central measure appears to be independent of the loss.
4 code implementations • 18 Feb 2018 • Raghavendra Chalapathy, Aditya Krishna Menon, Sanjay Chawla
We propose a one-class neural network (OC-NN) model to detect anomalies in complex data sets.
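The sketch below gives a one-class, OC-SVM-style objective on learned neural features, in the spirit of the abstract above; treat it as an assumption-laden illustration rather than the paper's exact formulation (weight regularisation, e.g. via the optimiser's weight decay, is omitted for brevity).

```python
# Sketch of a one-class objective on neural features: push scores of (presumed
# normal) training points above a learned margin r. Illustrative only.
import torch
import torch.nn as nn

class OneClassNet(nn.Module):
    def __init__(self, in_dim, hidden=32):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(in_dim, hidden), nn.Sigmoid())
        self.w = nn.Linear(hidden, 1, bias=False)
        self.r = nn.Parameter(torch.zeros(1))  # margin term

    def forward(self, x):
        return self.w(self.features(x)).squeeze(-1)

def one_class_loss(model, x, nu=0.1):
    scores = model(x)
    hinge = torch.clamp(model.r - scores, min=0).mean() / nu
    return hinge - model.r.squeeze()
```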
no code implementations • 17 Aug 2017 • Aditya Krishna Menon, Dawei Chen, Lexing Xie, Cheng Soon Ong
Trajectory recommendation is the problem of recommending a sequence of places in a city for a tourist to visit.
1 code implementation • NeurIPS 2017 • Richard Nock, Zac Cranko, Aditya Krishna Menon, Lizhen Qu, Robert C. Williamson
In this paper, we unveil a broad class of distributions for which such convergence happens, namely deformed exponential families (a wide superset of exponential families), and show tight connections with the three other key GAN parameters: loss, game and architecture.
no code implementations • 25 May 2017 • Aditya Krishna Menon, Robert C. Williamson
We study the problem of learning classifiers with a fairness constraint, with three main contributions towards the goal of quantifying the problem's inherent tradeoffs.
5 code implementations • 22 Apr 2017 • Raghavendra Chalapathy, Aditya Krishna Menon, Sanjay Chawla
PCA is a classical statistical technique whose simplicity and maturity have seen it find widespread use as an anomaly detection technique.
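For reference, the classical PCA approach mentioned above scores anomalies by reconstruction error in a low-dimensional subspace; this is a minimal sketch of that baseline, not the method proposed in the paper.

```python
# Classical PCA anomaly scoring by reconstruction error (the baseline described
# above, not the paper's proposed method).
import numpy as np
from sklearn.decomposition import PCA

def pca_anomaly_scores(X, n_components=2):
    pca = PCA(n_components=n_components).fit(X)
    reconstructed = pca.inverse_transform(pca.transform(X))
    # Points poorly captured by the principal subspace receive high scores.
    return np.square(X - reconstructed).sum(axis=1)
```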
no code implementations • NeurIPS 2016 • Richard Nock, Aditya Krishna Menon, Cheng Soon Ong
Experiments on each of these domains validate the analyses and suggest that the scaled Bregman theorem might be a worthy addition to the popular handful of Bregman divergence properties that have been pervasive in machine learning.
no code implementations • 3 May 2016 • Aditya Krishna Menon, Brendan van Rooyen, Nagarajan Natarajan
Suppose we have a sample of instances paired with binary labels corrupted by arbitrary instance- and label-dependent noise.
no code implementations • 4 Jun 2015 • Brendan van Rooyen, Aditya Krishna Menon, Robert C. Williamson
When working with a high- or infinite-dimensional kernel, it is imperative, for speed of evaluation and to limit storage, that as few training samples as possible be used in the kernel expansion.
1 code implementation • NeurIPS 2015 • Brendan van Rooyen, Aditya Krishna Menon, Robert C. Williamson
However, Long and Servedio [2010] proved that under symmetric label noise (SLN), minimisation of any convex potential over a linear function class can result in classification performance equivalent to random guessing.
2 code implementations • Proceedings of the 24th International Conference on World Wide Web 2015 • Suvash Sedhain, Aditya Krishna Menon, Scott Sanner, Lexing Xie
This paper proposes AutoRec, a novel autoencoder framework for collaborative filtering (CF).
Ranked #5 on Recommendation Systems on MovieLens 1M
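A minimal sketch of an AutoRec-style autoencoder for the entry above: each partially observed rating vector is reconstructed, and the loss is computed only over observed entries. Layer sizes and the mask handling are illustrative simplifications, not the paper's exact configuration.

```python
# Sketch of an AutoRec-style autoencoder for collaborative filtering:
# reconstruct a partially observed rating vector, training only on observed entries.
import torch
import torch.nn as nn

class AutoRecSketch(nn.Module):
    def __init__(self, num_items, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(num_items, hidden), nn.Sigmoid())
        self.decoder = nn.Linear(hidden, num_items)

    def forward(self, ratings):
        return self.decoder(self.encoder(ratings))

def masked_mse(predicted, ratings, observed_mask):
    # Only observed ratings contribute to the reconstruction loss.
    diff = (predicted - ratings) * observed_mask
    return diff.pow(2).sum() / observed_mask.sum().clamp(min=1)
```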