Search Results for author: Aditya Krishna Menon

Found 49 papers, 14 papers with code

Metric-aware LLM inference

no code implementations 7 Mar 2024 Michal Lukasik, Harikrishna Narasimhan, Aditya Krishna Menon, Felix Yu, Sanjiv Kumar

Large language models (LLMs) have demonstrated strong results on a range of NLP tasks.

DistillSpec: Improving Speculative Decoding via Knowledge Distillation

no code implementations 12 Oct 2023 Yongchao Zhou, Kaifeng Lyu, Ankit Singh Rawat, Aditya Krishna Menon, Afshin Rostamizadeh, Sanjiv Kumar, Jean-François Kagy, Rishabh Agarwal

Finally, in practical scenarios with models of varying sizes, first using distillation to boost the performance of the target model and then applying DistillSpec to train a well-aligned draft model can reduce decoding latency by 6-10x with minimal performance drop, compared to standard decoding without distillation.

Knowledge Distillation Language Modelling +1
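For intuition, here is a toy greedy sketch of the speculative decoding loop that a distilled draft model plugs into. The `target_logits` and `draft_logits` callables are hypothetical stand-ins for the large target and small draft models, and the greedy agree-or-correct rule below is a simplification of the sampling-based verification usually described; DistillSpec itself concerns how to train the draft so more proposals get accepted, which is not shown.

```python
import numpy as np

def speculative_decode(target_logits, draft_logits, prompt, steps=20, k=4):
    """Toy greedy speculative decoding over a small vocabulary.

    target_logits / draft_logits: callables mapping a token-id sequence to
    next-token logits (stand-ins for the target and draft models).
    """
    seq = list(prompt)
    while len(seq) < len(prompt) + steps:
        # 1) The cheap draft model proposes k tokens autoregressively.
        ctx, proposal = list(seq), []
        for _ in range(k):
            tok = int(np.argmax(draft_logits(ctx)))
            proposal.append(tok)
            ctx.append(tok)
        # 2) The target verifies the block (one parallel pass in practice):
        #    keep draft tokens while the target agrees; on the first
        #    disagreement, substitute the target's own token and stop.
        ctx = list(seq)
        for tok in proposal:
            best = int(np.argmax(target_logits(ctx)))
            ctx.append(best)
            if best != tok:
                break
        seq = ctx
    return seq[:len(prompt) + steps]
```

The better aligned the draft is with the target, the longer the accepted prefixes and the fewer target invocations per token, which is where the latency savings come from.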

What do larger image classifiers memorise?

no code implementations 9 Oct 2023 Michal Lukasik, Vaishnavh Nagarajan, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

The success of modern neural networks has prompted study of the connection between memorisation and generalisation: overparameterised models generalise well, despite being able to perfectly fit (memorise) completely random labels.

Image Classification Knowledge Distillation +2

Think before you speak: Training Language Models With Pause Tokens

no code implementations 3 Oct 2023 Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan

Language models generate responses by producing a series of tokens in immediate succession: the $(K+1)^{th}$ token is an outcome of manipulating $K$ hidden vectors per layer, one vector per preceding token.

GSM8K Question Answering
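The input-side change can be sketched as below. The token id and pause count are hypothetical assumptions: the idea is that appending learnable <pause> tokens gives the model extra hidden-vector computation before it commits to an answer, with outputs at pause positions ignored during training and decoding.

```python
PAUSE_ID = 50_000   # assumed id for a new learnable <pause> token
M_PAUSES = 10       # number of pauses appended (a tunable choice)

def with_pauses(prompt_ids, m=M_PAUSES, pause_id=PAUSE_ID):
    """Append m pause tokens after the prompt. The model produces its answer
    only after the final pause, so the (K+1)th answer token is computed from
    K + m hidden vectors per layer instead of K."""
    return prompt_ids + [pause_id] * m
```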

The importance of feature preprocessing for differentially private linear optimization

no code implementations 19 Jul 2023 Ziteng Sun, Ananda Theertha Suresh, Aditya Krishna Menon

Training machine learning models with differential privacy (DP) has received increasing interest in recent years.

Image Classification

When Does Confidence-Based Cascade Deferral Suffice?

no code implementations NeurIPS 2023 Wittawat Jitkrittum, Neha Gupta, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sanjiv Kumar

Cascades are a classical strategy to enable inference cost to vary adaptively across samples, wherein a sequence of classifiers is invoked in turn.
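Confidence-based deferral, the baseline rule the paper analyses, is simple enough to state in a few lines; `small_model` and `large_model` are assumed callables returning logits, and the threshold is a tunable knob.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def cascade_predict(x, small_model, large_model, threshold=0.9):
    """Confidence-based deferral: answer with the cheap model when its
    top softmax probability clears `threshold`, else defer to the big model."""
    p = softmax(small_model(x))
    if p.max() >= threshold:
        return int(np.argmax(p))
    return int(np.argmax(softmax(large_model(x))))
```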

Plugin estimators for selective classification with out-of-distribution detection

no code implementations 29 Jan 2023 Harikrishna Narasimhan, Aditya Krishna Menon, Wittawat Jitkrittum, Sanjiv Kumar

Recent work on selective classification with OOD detection (SCOD) has argued for the unified study of these problems; however, the formal underpinnings of this problem are still nascent, and existing techniques are heuristic in nature.

Out-of-Distribution Detection Out of Distribution (OOD) Detection

When does mixup promote local linearity in learned representations?

no code implementations 28 Oct 2022 Arslan Chaudhry, Aditya Krishna Menon, Andreas Veit, Sadeep Jayasumana, Srikumar Ramalingam, Sanjiv Kumar

Towards this, we study two questions: (1) how does the Mixup loss that enforces linearity in the last network layer propagate the linearity to the earlier layers?

Representation Learning

Robust Distillation for Worst-class Performance

no code implementations 13 Jun 2022 Serena Wang, Harikrishna Narasimhan, Yichen Zhou, Sara Hooker, Michal Lukasik, Aditya Krishna Menon

We show empirically that our robust distillation techniques not only achieve better worst-class performance, but also lead to Pareto improvement in the tradeoff between overall performance and worst-class performance compared to other baseline methods.

Knowledge Distillation

ELM: Embedding and Logit Margins for Long-Tail Learning

no code implementations 27 Apr 2022 Wittawat Jitkrittum, Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar

Long-tail learning is the problem of learning under skewed label distributions, which pose a challenge for standard learners.

Contrastive Learning Long-tail Learning

When in Doubt, Summon the Titans: Efficient Inference with Large Models

no code implementations 19 Oct 2021 Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Amr Ahmed, Sanjiv Kumar

In a nutshell, we use the large teacher models to guide the lightweight student models to only make correct predictions on a subset of "easy" examples; for the "hard" examples, we fall back to the teacher.

Image Classification
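A rough sketch of the selective-distillation idea described above: train the student only on examples the teacher finds "easy", and route the rest to the teacher at inference. The confidence-and-correctness rule below is an illustrative assumption, not necessarily the paper's exact criterion.

```python
import numpy as np

def easy_example_mask(teacher_probs, labels, confidence=0.9):
    """Mark examples the teacher both gets right and is confident about --
    one plausible notion of 'easy' (an assumption for this sketch)."""
    correct = teacher_probs.argmax(axis=1) == labels
    confident = teacher_probs.max(axis=1) >= confidence
    return correct & confident

# The student is then distilled only on rows where the mask is True;
# at inference, examples outside the student's competence are routed
# back to the teacher.
```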

In defense of dual-encoders for neural ranking

no code implementations 29 Sep 2021 Aditya Krishna Menon, Sadeep Jayasumana, Seungyeon Kim, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar

Transformer-based models such as BERT have proven successful in information retrieval, which seeks to identify relevant documents for a given query.

Information Retrieval Natural Questions +1

Training Over-parameterized Models with Non-decomposable Objectives

no code implementations NeurIPS 2021 Harikrishna Narasimhan, Aditya Krishna Menon

Many modern machine learning applications come with complex and nuanced design goals such as minimizing the worst-case error, satisfying a given precision or recall target, or enforcing group-fairness constraints.

Fairness

Teacher's pet: understanding and mitigating biases in distillation

no code implementations 19 Jun 2021 Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

Knowledge distillation is widely used as a means of improving the performance of a relatively simple student model using the predictions from a complex teacher model.

Image Classification Knowledge Distillation

Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces

no code implementations 12 May 2021 Ankit Singh Rawat, Aditya Krishna Menon, Wittawat Jitkrittum, Sadeep Jayasumana, Felix X. Yu, Sashank Reddi, Sanjiv Kumar

Negative sampling schemes enable efficient training given a large number of classes, by offering a means to approximate a computationally expensive loss function that takes all labels into account.

Retrieval
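As a concrete instance, a uniform sampled-softmax loss is sketched below. Note that uniform sampling without any correction is precisely the kind of biased approximation whose sampling and labeling biases the paper disentangles; the correction itself is not shown.

```python
import numpy as np

def sampled_softmax_loss(scores, pos, num_neg=5, rng=np.random):
    """Approximate the full cross-entropy over C classes using the positive
    class plus a few uniformly sampled negatives (no bias correction)."""
    C = scores.shape[0]
    negs = rng.choice(np.delete(np.arange(C), pos), size=num_neg, replace=False)
    z = scores[np.concatenate(([pos], negs))]
    # -log softmax of the positive, restricted to the sampled subset.
    return -z[0] + z.max() + np.log(np.exp(z - z.max()).sum())
```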

Interval-censored Hawkes processes

no code implementations 16 Apr 2021 Marian-Andrei Rizoiu, Alexander Soen, Shidi Li, Pio Calderon, Leanne Dong, Aditya Krishna Menon, Lexing Xie

We propose the multi-impulse exogenous function, for when the exogenous events are observed as event times, and the latent homogeneous Poisson process exogenous function, for when the exogenous events are presented as interval-censored volumes.

Point Processes

RankDistil: Knowledge Distillation for Ranking

no code implementations AISTATS 2021 Sashank J. Reddi, Rama Kumar Pasumarthi, Aditya Krishna Menon, Ankit Singh Rawat, Felix Yu, Seungyeon Kim, Andreas Veit, Sanjiv Kumar

Knowledge distillation is an approach to improve the performance of a student model by using the knowledge of a complex teacher. Despite its success in several deep learning applications, the study of distillation is mostly confined to classification settings.

Document Ranking Knowledge Distillation

Distilling Double Descent

no code implementations 13 Feb 2021 Andrew Cotter, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sashank J. Reddi, Yichen Zhou

Distillation is the technique of training a "student" model based on examples that are labeled by a separate "teacher" model, which itself is trained on a labeled dataset.
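The teacher-labels-student recipe in that sentence amounts to the following generic sketch, with a standard temperature knob; the paper studies how this recipe interacts with double descent, which the sketch does not capture.

```python
import numpy as np

def distillation_targets(teacher_logits, temperature=2.0):
    """Soften teacher logits into soft training targets for the student."""
    z = teacher_logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def distill_loss(student_log_probs, teacher_probs):
    """Cross-entropy of student predictions against the soft teacher targets."""
    return -(teacher_probs * student_log_probs).sum(axis=1).mean()
```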

Overparameterisation and worst-case generalisation: friend or foe?

no code implementations ICLR 2021 Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar

Overparameterised neural networks have demonstrated the remarkable ability to perfectly fit training samples, while still generalising to unseen test samples.

Structured Prediction

Semantic Label Smoothing for Sequence to Sequence Problems

no code implementations EMNLP 2020 Michal Lukasik, Himanshu Jain, Aditya Krishna Menon, Seungyeon Kim, Srinadh Bhojanapalli, Felix Yu, Sanjiv Kumar

Label smoothing has been shown to be an effective regularization strategy in classification that prevents overfitting and helps in label de-noising.

Machine Translation Translation

Long-tail learning via logit adjustment

3 code implementations ICLR 2021 Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

Real-world classification problems typically exhibit an imbalanced or long-tailed label distribution, wherein many labels are associated with only a few samples.

Long-tail Learning
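As the title suggests, the method adjusts classifier logits using the label distribution; a minimal sketch of the post-hoc form is below (the paper also has a loss-based variant, not shown), where `class_priors` are the empirical label frequencies and `tau` a scaling hyperparameter.

```python
import numpy as np

def logit_adjusted_predict(logits, class_priors, tau=1.0):
    """Post-hoc logit adjustment: subtract tau * log(prior) per class so
    that rare (tail) classes are not crowded out by head classes."""
    return np.argmax(logits - tau * np.log(class_priors), axis=-1)
```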

Why distillation helps: a statistical perspective

no code implementations 21 May 2020 Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Seungyeon Kim, Sanjiv Kumar

In this paper, we present a statistical perspective on distillation which addresses this question, and provides a novel connection to extreme multiclass retrieval techniques.

Knowledge Distillation Retrieval

Can gradient clipping mitigate label noise?

1 code implementation ICLR 2020 Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar

Gradient clipping is a widely-used technique in the training of deep networks, and is generally motivated from an optimisation lens: informally, it controls the dynamics of iterates, thus enhancing the rate of convergence to a local minimum.
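For reference, the clipping operation in question is the standard norm-rescaling rule below (a generic sketch, not the paper's analysis of whether it helps under label noise).

```python
import numpy as np

def clip_by_norm(grad, tau):
    """Rescale the gradient so its norm is at most tau; gradients already
    inside the ball are left untouched."""
    norm = np.linalg.norm(grad)
    return grad if norm <= tau else grad * (tau / norm)
```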

Doubly-stochastic mining for heterogeneous retrieval

no code implementations 23 Apr 2020 Ankit Singh Rawat, Aditya Krishna Menon, Andreas Veit, Felix Yu, Sashank J. Reddi, Sanjiv Kumar

Modern retrieval problems are characterised by training sets with potentially billions of labels, and heterogeneous data distributions across subpopulations (e.g., users of a retrieval system may be from different countries), each of which poses a challenge.

Retrieval Stochastic Optimization

Federated Learning with Only Positive Labels

1 code implementation ICML 2020 Felix X. Yu, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

We consider learning a multi-class classification model in the federated setting, where each user has access to the positive data associated with only a single class.

Federated Learning Multi-class Classification

Does label smoothing mitigate label noise?

no code implementations ICML 2020 Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

Label smoothing is commonly used in training deep learning models, wherein one-hot training labels are mixed with uniform label vectors.

Learning with noisy labels
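The mixing described above is just y_ls = (1 - alpha) * y + alpha / K; a minimal sketch:

```python
import numpy as np

def smooth_labels(onehot, alpha=0.1):
    """Mix one-hot targets with the uniform distribution over K classes:
    y_ls = (1 - alpha) * y + alpha / K."""
    K = onehot.shape[-1]
    return (1.0 - alpha) * onehot + alpha / K
```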

Supervised Learning: No Loss No Cry

no code implementations ICML 2020 Richard Nock, Aditya Krishna Menon

In detail, we cast SLIsotron as learning a loss from a family of composite square losses.

Online Hierarchical Clustering Approximations

no code implementations 20 Sep 2019 Aditya Krishna Menon, Anand Rajagopalan, Baris Sumengen, Gui Citovsky, Qin Cao, Sanjiv Kumar

The second algorithm, OHAC, is an online counterpart to offline HAC, which is known to yield a 1/3-approximation to the Moseley-Wang (MW) revenue, and produce good quality clusters in practice.

Clustering

Noise-tolerant fair classification

1 code implementation NeurIPS 2019 Alexandre Louis Lamy, Ziyuan Zhong, Aditya Krishna Menon, Nakul Verma

We finally show that our procedure is empirically effective on two case-studies involving sensitive feature censoring.

Classification Fairness +1

Fairness risk measures

1 code implementation 24 Jan 2019 Robert C. Williamson, Aditya Krishna Menon

In this paper, we propose a new definition of fairness that generalises some existing proposals, while allowing for generic sensitive features and resulting in a convex objective.

Fairness

Comparative Document Summarisation via Classification

1 code implementation 6 Dec 2018 Umanga Bista, Alexander Mathews, Minjeong Shin, Aditya Krishna Menon, Lexing Xie

This paper considers extractive summarisation in a comparative setting: given two or more document groups (e.g., separated by publication time), the goal is to select a small number of documents that are representative of each group, and also maximally distinguishable from other groups.

Binary Classification Classification +2

Complementary-Label Learning for Arbitrary Losses and Models

1 code implementation Proceedings of the 36th International Conference on Machine Learning, 2019 Takashi Ishida, Gang Niu, Aditya Krishna Menon, Masashi Sugiyama

In contrast to the standard classification paradigm where the true class is given to each training pattern, complementary-label learning only uses training patterns each equipped with a complementary label, which only specifies one of the classes that the pattern does not belong to.

General Classification Image Classification
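To make the setting concrete, here is a sketch of drawing a complementary label and one naive surrogate loss that pushes down the probability of the ruled-out class. The paper's contribution is an unbiased risk estimator valid for arbitrary losses and models, which this toy loss is not.

```python
import numpy as np

def complementary_label(true_label, num_classes, rng=np.random):
    """Draw a complementary label: a class the pattern does NOT belong to."""
    choices = np.delete(np.arange(num_classes), true_label)
    return int(rng.choice(choices))

def naive_complementary_loss(probs, comp_label, eps=1e-12):
    """Simple surrogate: penalise probability mass on the ruled-out class
    (illustrative only; not the paper's unbiased risk estimator)."""
    return -np.log(1.0 - probs[comp_label] + eps)
```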

On the Minimal Supervision for Training Any Binary Classifier from Only Unlabeled Data

1 code implementation ICLR 2019 Nan Lu, Gang Niu, Aditya Krishna Menon, Masashi Sugiyama

In this paper, we study training an arbitrary binary classifier (from linear models to deep networks) from only unlabeled (U) data by empirical risk minimization (ERM).

Monge blunts Bayes: Hardness Results for Adversarial Training

no code implementations 8 Jun 2018 Zac Cranko, Aditya Krishna Menon, Richard Nock, Cheng Soon Ong, Zhan Shi, Christian Walder

A key feature of our result is that it holds for all proper losses, and for a popular subset of these, the optimisation of this central measure appears to be independent of the loss.

Anomaly Detection using One-Class Neural Networks

4 code implementations 18 Feb 2018 Raghavendra Chalapathy, Aditya Krishna Menon, Sanjay Chawla

We propose a one-class neural network (OC-NN) model to detect anomalies in complex data sets.

Anomaly Detection

Revisiting revisits in trajectory recommendation

no code implementations 17 Aug 2017 Aditya Krishna Menon, Dawei Chen, Lexing Xie, Cheng Soon Ong

Trajectory recommendation is the problem of recommending a sequence of places in a city for a tourist to visit.

f-GANs in an Information Geometric Nutshell

1 code implementation NeurIPS 2017 Richard Nock, Zac Cranko, Aditya Krishna Menon, Lizhen Qu, Robert C. Williamson

In this paper, we unveil a broad class of distributions for which such convergence happens, namely deformed exponential families (a wide superset of exponential families), and show tight connections with the three other key GAN parameters: loss, game and architecture.

The cost of fairness in classification

no code implementations 25 May 2017 Aditya Krishna Menon, Robert C. Williamson

We study the problem of learning classifiers with a fairness constraint, with three main contributions towards the goal of quantifying the problem's inherent tradeoffs.

Classification Fairness +1

Robust, Deep and Inductive Anomaly Detection

5 code implementations 22 Apr 2017 Raghavendra Chalapathy, Aditya Krishna Menon, Sanjay Chawla

PCA is a classical statistical technique whose simplicity and maturity have seen it find widespread use as an anomaly detection technique.

Anomaly Detection
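The classical baseline being referenced scores points by PCA reconstruction error; a minimal numpy sketch:

```python
import numpy as np

def pca_anomaly_scores(X, k=2):
    """Score each row of X by its reconstruction error from the top-k
    principal components; large error = poorly explained = anomalous."""
    Xc = X - X.mean(axis=0)
    # Top-k principal directions via SVD of the centred data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T
    recon = Xc @ V @ V.T
    return np.sum((Xc - recon) ** 2, axis=1)
```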

A scaled Bregman theorem with applications

no code implementations NeurIPS 2016 Richard Nock, Aditya Krishna Menon, Cheng Soon Ong

Experiments on each of these domains validate the analyses and suggest that the scaled Bregman theorem might be a worthy addition to the popular handful of Bregman divergence properties that have been pervasive in machine learning.

BIG-bench Machine Learning Clustering

Learning from Binary Labels with Instance-Dependent Corruption

no code implementations 3 May 2016 Aditya Krishna Menon, Brendan van Rooyen, Nagarajan Natarajan

Suppose we have a sample of instances paired with binary labels corrupted by arbitrary instance- and label-dependent noise.

An Average Classification Algorithm

no code implementations 4 Jun 2015 Brendan van Rooyen, Aditya Krishna Menon, Robert C. Williamson

When working with a high- or infinite-dimensional kernel, it is imperative for speed of evaluation and storage reasons that as few training samples as possible are used in the kernel expansion.

Classification General Classification

Learning with Symmetric Label Noise: The Importance of Being Unhinged

1 code implementation NeurIPS 2015 Brendan van Rooyen, Aditya Krishna Menon, Robert C. Williamson

However, Long and Servedio [2010] proved that under symmetric label noise (SLN), minimisation of any convex potential over a linear function class can result in classification performance equivalent to random guessing.

Binary Classification Classification +1
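The titular unhinged loss is the linear loss ell(y, v) = 1 - y*v, shown below next to the hinge for contrast (a sketch; the paper's point is that this linearity, unlike convex potentials such as the hinge, confers robustness to symmetric label noise).

```python
import numpy as np

def unhinged_loss(y, score):
    """Unhinged loss: linear and unbounded below, ell(y, v) = 1 - y*v."""
    return 1.0 - y * score

def hinge_loss(y, score):
    """Standard hinge, for contrast: max(0, 1 - y*v)."""
    return np.maximum(0.0, 1.0 - y * score)
```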
