Search Results for author: Aditya Krishna Menon

Found 49 papers, 14 papers with code

Metric-aware LLM inference

no code implementations 7 Mar 2024 Michal Lukasik, Harikrishna Narasimhan, Aditya Krishna Menon, Felix Yu, Sanjiv Kumar

Large language models (LLMs) have demonstrated strong results on a range of NLP tasks.

DistillSpec: Improving Speculative Decoding via Knowledge Distillation

no code implementations 12 Oct 2023 Yongchao Zhou, Kaifeng Lyu, Ankit Singh Rawat, Aditya Krishna Menon, Afshin Rostamizadeh, Sanjiv Kumar, Jean-François Kagy, Rishabh Agarwal

Finally, in practical scenarios with models of varying sizes, first using distillation to boost the performance of the target model and then applying DistillSpec to train a well-aligned draft model can reduce decoding latency by 6-10x with minimal performance drop, compared to standard decoding without distillation.

Knowledge Distillation Language Modelling +1
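For intuition, here is a toy greedy sketch of the speculative decoding loop that a distilled draft model plugs into. The `target_logits` and `draft_logits` callables are hypothetical stand-ins for the large target and small draft models, and the greedy agree-or-correct rule below is a simplification of the sampling-based verification usually described; DistillSpec itself concerns how to train the draft so more proposals get accepted, which is not shown.

```python
import numpy as np

def speculative_decode(target_logits, draft_logits, prompt, steps=20, k=4):
    """Toy greedy speculative decoding over a small vocabulary.

    target_logits / draft_logits: callables mapping a token-id sequence to
    next-token logits (stand-ins for the target and draft models).
    """
    seq = list(prompt)
    while len(seq) < len(prompt) + steps:
        # 1) The cheap draft model proposes k tokens autoregressively.
        ctx, proposal = list(seq), []
        for _ in range(k):
            tok = int(np.argmax(draft_logits(ctx)))
            proposal.append(tok)
            ctx.append(tok)
        # 2) The target verifies the block (one parallel pass in practice):
        #    keep draft tokens while the target agrees; on the first
        #    disagreement, substitute the target's own token and stop.
        ctx = list(seq)
        for tok in proposal:
            best = int(np.argmax(target_logits(ctx)))
            ctx.append(best)
            if best != tok:
                break
        seq = ctx
    return seq[:len(prompt) + steps]
```

The better aligned the draft is with the target, the longer the accepted prefixes and the fewer target invocations per token, which is where the latency savings come from.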

What do larger image classifiers memorise?

no code implementations 9 Oct 2023 Michal Lukasik, Vaishnavh Nagarajan, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

The success of modern neural networks has prompted study of the connection between memorisation and generalisation: overparameterised models generalise well, despite being able to perfectly fit (memorise) completely random labels.

Image Classification Knowledge Distillation +2

Think before you speak: Training Language Models With Pause Tokens

no code implementations 3 Oct 2023 Sachin Goyal, Ziwei Ji, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar, Vaishnavh Nagarajan

Language models generate responses by producing a series of tokens in immediate succession: the $(K+1)^{th}$ token is an outcome of manipulating $K$ hidden vectors per layer, one vector per preceding token.

GSM8K Question Answering
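The input-side change can be sketched as below. The token id and pause count are hypothetical assumptions: the idea is that appending learnable <pause> tokens gives the model extra hidden-vector computation before it commits to an answer, with outputs at pause positions ignored during training and decoding.

```python
PAUSE_ID = 50_000   # assumed id for a new learnable <pause> token
M_PAUSES = 10       # number of pauses appended (a tunable choice)

def with_pauses(prompt_ids, m=M_PAUSES, pause_id=PAUSE_ID):
    """Append m pause tokens after the prompt. The model produces its answer
    only after the final pause, so the (K+1)th answer token is computed from
    K + m hidden vectors per layer instead of K."""
    return prompt_ids + [pause_id] * m
```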

The importance of feature preprocessing for differentially private linear optimization

no code implementations 19 Jul 2023 Ziteng Sun, Ananda Theertha Suresh, Aditya Krishna Menon

Training machine learning models with differential privacy (DP) has received increasing interest in recent years.

Image Classification

When Does Confidence-Based Cascade Deferral Suffice?

no code implementations NeurIPS 2023 Wittawat Jitkrittum, Neha Gupta, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sanjiv Kumar

Cascades are a classical strategy to enable inference cost to vary adaptively across samples, wherein a sequence of classifiers is invoked in turn.
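Confidence-based deferral, the baseline rule the paper analyses, is simple enough to state in a few lines; `small_model` and `large_model` are assumed callables returning logits, and the threshold is a tunable knob.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def cascade_predict(x, small_model, large_model, threshold=0.9):
    """Confidence-based deferral: answer with the cheap model when its
    top softmax probability clears `threshold`, else defer to the big model."""
    p = softmax(small_model(x))
    if p.max() >= threshold:
        return int(np.argmax(p))
    return int(np.argmax(softmax(large_model(x))))
```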

Plugin estimators for selective classification with out-of-distribution detection

no code implementations 29 Jan 2023 Harikrishna Narasimhan, Aditya Krishna Menon, Wittawat Jitkrittum, Sanjiv Kumar

Recent work on selective classification with OOD detection (SCOD) has argued for the unified study of these problems; however, the formal underpinnings of this problem are still nascent, and existing techniques are heuristic in nature.

Out-of-Distribution Detection Out of Distribution (OOD) Detection

When does mixup promote local linearity in learned representations?

no code implementations 28 Oct 2022 Arslan Chaudhry, Aditya Krishna Menon, Andreas Veit, Sadeep Jayasumana, Srikumar Ramalingam, Sanjiv Kumar

Towards this, we study two questions: (1) how does the Mixup loss that enforces linearity in the last network layer propagate the linearity to the earlier layers?

Representation Learning

Robust Distillation for Worst-class Performance

no code implementations 13 Jun 2022 Serena Wang, Harikrishna Narasimhan, Yichen Zhou, Sara Hooker, Michal Lukasik, Aditya Krishna Menon

We show empirically that our robust distillation techniques not only achieve better worst-class performance, but also lead to Pareto improvement in the tradeoff between overall performance and worst-class performance compared to other baseline methods.

Knowledge Distillation

ELM: Embedding and Logit Margins for Long-Tail Learning

no code implementations 27 Apr 2022 Wittawat Jitkrittum, Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar

Long-tail learning is the problem of learning under skewed label distributions, which pose a challenge for standard learners.

Contrastive Learning Long-tail Learning

When in Doubt, Summon the Titans: Efficient Inference with Large Models

no code implementations 19 Oct 2021 Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Amr Ahmed, Sanjiv Kumar

In a nutshell, we use the large teacher models to guide the lightweight student models to only make correct predictions on a subset of "easy" examples; for the "hard" examples, we fall back to the teacher.

Image Classification
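A rough sketch of the selective-distillation idea described above: train the student only on examples the teacher finds "easy", and route the rest to the teacher at inference. The confidence-and-correctness rule below is an illustrative assumption, not necessarily the paper's exact criterion.

```python
import numpy as np

def easy_example_mask(teacher_probs, labels, confidence=0.9):
    """Mark examples the teacher both gets right and is confident about --
    one plausible notion of 'easy' (an assumption for this sketch)."""
    correct = teacher_probs.argmax(axis=1) == labels
    confident = teacher_probs.max(axis=1) >= confidence
    return correct & confident

# The student is then distilled only on rows where the mask is True;
# at inference, examples outside the student's competence are routed
# back to the teacher.
```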

In defense of dual-encoders for neural ranking

no code implementations 29 Sep 2021 Aditya Krishna Menon, Sadeep Jayasumana, Seungyeon Kim, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar

Transformer-based models such as BERT have proven successful in information retrieval, which seeks to identify relevant documents for a given query.

Information Retrieval Natural Questions +1

Training Over-parameterized Models with Non-decomposable Objectives

no code implementations NeurIPS 2021 Harikrishna Narasimhan, Aditya Krishna Menon

Many modern machine learning applications come with complex and nuanced design goals such as minimizing the worst-case error, satisfying a given precision or recall target, or enforcing group-fairness constraints.

Fairness

Teacher's pet: understanding and mitigating biases in distillation

no code implementations 19 Jun 2021 Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

Knowledge distillation is widely used as a means of improving the performance of a relatively simple student model using the predictions from a complex teacher model.

Image Classification Knowledge Distillation

Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces

no code implementations 12 May 2021 Ankit Singh Rawat, Aditya Krishna Menon, Wittawat Jitkrittum, Sadeep Jayasumana, Felix X. Yu, Sashank Reddi, Sanjiv Kumar

Negative sampling schemes enable efficient training given a large number of classes, by offering a means to approximate a computationally expensive loss function that takes all labels into account.

Retrieval
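As a concrete instance, a uniform sampled-softmax loss is sketched below. Note that uniform sampling without any correction is precisely the kind of biased approximation whose sampling and labeling biases the paper disentangles; the correction itself is not shown.

```python
import numpy as np

def sampled_softmax_loss(scores, pos, num_neg=5, rng=np.random):
    """Approximate the full cross-entropy over C classes using the positive
    class plus a few uniformly sampled negatives (no bias correction)."""
    C = scores.shape[0]
    negs = rng.choice(np.delete(np.arange(C), pos), size=num_neg, replace=False)
    z = scores[np.concatenate(([pos], negs))]
    # -log softmax of the positive, restricted to the sampled subset.
    return -z[0] + z.max() + np.log(np.exp(z - z.max()).sum())
```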

Interval-censored Hawkes processes

no code implementations 16 Apr 2021 Marian-Andrei Rizoiu, Alexander Soen, Shidi Li, Pio Calderon, Leanne Dong, Aditya Krishna Menon, Lexing Xie

We propose the multi-impulse exogenous function, for when the exogenous events are observed as event times, and the latent homogeneous Poisson process exogenous function, for when the exogenous events are presented as interval-censored volumes.

Point Processes

RankDistil: Knowledge Distillation for Ranking

no code implementations AISTATS 2021 Sashank J. Reddi, Rama Kumar Pasumarthi, Aditya Krishna Menon, Ankit Singh Rawat, Felix Yu, Seungyeon Kim, Andreas Veit, Sanjiv Kumar

Knowledge distillation is an approach to improve the performance of a student model by using the knowledge of a complex teacher. Despite its success in several deep learning applications, the study of distillation is mostly confined to classification settings.

Document Ranking Knowledge Distillation

Distilling Double Descent

no code implementations 13 Feb 2021 Andrew Cotter, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sashank J. Reddi, Yichen Zhou

Distillation is the technique of training a "student" model based on examples that are labeled by a separate "teacher" model, which itself is trained on a labeled dataset.
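The teacher-labels-student recipe in that sentence amounts to the following generic sketch, with a standard temperature knob; the paper studies how this recipe interacts with double descent, which the sketch does not capture.

```python
import numpy as np

def distillation_targets(teacher_logits, temperature=2.0):
    """Soften teacher logits into soft training targets for the student."""
    z = teacher_logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def distill_loss(student_log_probs, teacher_probs):
    """Cross-entropy of student predictions against the soft teacher targets."""
    return -(teacher_probs * student_log_probs).sum(axis=1).mean()
```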

Overparameterisation and worst-case generalisation: friend or foe?

no code implementations ICLR 2021 Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar

Overparameterised neural networks have demonstrated the remarkable ability to perfectly fit training samples, while still generalising to unseen test samples.

Structured Prediction

Semantic Label Smoothing for Sequence to Sequence Problems

no code implementations EMNLP 2020 Michal Lukasik, Himanshu Jain, Aditya Krishna Menon, Seungyeon Kim, Srinadh Bhojanapalli, Felix Yu, Sanjiv Kumar

Label smoothing has been shown to be an effective regularization strategy in classification that prevents overfitting and helps in label de-noising.

Machine Translation Translation

Long-tail learning via logit adjustment

3 code implementations ICLR 2021 Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

Real-world classification problems typically exhibit an imbalanced or long-tailed label distribution, wherein many labels are associated with only a few samples.

Long-tail Learning
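As the title suggests, the method adjusts classifier logits using the label distribution; a minimal sketch of the post-hoc form is below (the paper also has a loss-based variant, not shown), where `class_priors` are the empirical label frequencies and `tau` a scaling hyperparameter.

```python
import numpy as np

def logit_adjusted_predict(logits, class_priors, tau=1.0):
    """Post-hoc logit adjustment: subtract tau * log(prior) per class so
    that rare (tail) classes are not crowded out by head classes."""
    return np.argmax(logits - tau * np.log(class_priors), axis=-1)
```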

Why distillation helps: a statistical perspective

no code implementations 21 May 2020 Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Seungyeon Kim, Sanjiv Kumar

In this paper, we present a statistical perspective on distillation which addresses this question, and provides a novel connection to extreme multiclass retrieval techniques.

Knowledge Distillation Retrieval

Can gradient clipping mitigate label noise?

1 code implementation ICLR 2020 Aditya Krishna Menon, Ankit Singh Rawat, Sashank J. Reddi, Sanjiv Kumar

Gradient clipping is a widely-used technique in the training of deep networks, and is generally motivated from an optimisation lens: informally, it controls the dynamics of iterates, thus enhancing the rate of convergence to a local minimum.
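For reference, the clipping operation in question is the standard norm-rescaling rule below (a generic sketch, not the paper's analysis of whether it helps under label noise).

```python
import numpy as np

def clip_by_norm(grad, tau):
    """Rescale the gradient so its norm is at most tau; gradients already
    inside the ball are left untouched."""
    norm = np.linalg.norm(grad)
    return grad if norm <= tau else grad * (tau / norm)
```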

Doubly-stochastic mining for heterogeneous retrieval

no code implementations 23 Apr 2020 Ankit Singh Rawat, Aditya Krishna Menon, Andreas Veit, Felix Yu, Sashank J. Reddi, Sanjiv Kumar

Modern retrieval problems are characterised by training sets with potentially billions of labels, and heterogeneous data distributions across subpopulations (e.g., users of a retrieval system may be from different countries), each of which poses a challenge.

Retrieval Stochastic Optimization

Federated Learning with Only Positive Labels

1 code implementation ICML 2020 Felix X. Yu, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

We consider learning a multi-class classification model in the federated setting, where each user has access to the positive data associated with only a single class.

Federated Learning Multi-class Classification

Does label smoothing mitigate label noise?

no code implementations ICML 2020 Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

Label smoothing is commonly used in training deep learning models, wherein one-hot training labels are mixed with uniform label vectors.

Learning with noisy labels
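The mixing described above is just y_ls = (1 - alpha) * y + alpha / K; a minimal sketch:

```python
import numpy as np

def smooth_labels(onehot, alpha=0.1):
    """Mix one-hot targets with the uniform distribution over K classes:
    y_ls = (1 - alpha) * y + alpha / K."""
    K = onehot.shape[-1]
    return (1.0 - alpha) * onehot + alpha / K
```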

Supervised Learning: No Loss No Cry

no code implementations ICML 2020 Richard Nock, Aditya Krishna Menon

In detail, we cast SLIsotron as learning a loss from a family of composite square losses.

Online Hierarchical Clustering Approximations

no code implementations 20 Sep 2019 Aditya Krishna Menon, Anand Rajagopalan, Baris Sumengen, Gui Citovsky, Qin Cao, Sanjiv Kumar

The second algorithm, OHAC, is an online counterpart to offline HAC, which is known to yield a 1/3-approximation to the Moseley-Wang (MW) revenue, and produce good quality clusters in practice.

Clustering

Noise-tolerant fair classification

1 code implementation NeurIPS 2019 Alexandre Louis Lamy, Ziyuan Zhong, Aditya Krishna Menon, Nakul Verma

We finally show that our procedure is empirically effective on two case-studies involving sensitive feature censoring.

Classification Fairness +1

Fairness risk measures

1 code implementation 24 Jan 2019 Robert C. Williamson, Aditya Krishna Menon

In this paper, we propose a new definition of fairness that generalises some existing proposals, while allowing for generic sensitive features and resulting in a convex objective.

Fairness

Comparative Document Summarisation via Classification

1 code implementation 6 Dec 2018 Umanga Bista, Alexander Mathews, Minjeong Shin, Aditya Krishna Menon, Lexing Xie

This paper considers extractive summarisation in a comparative setting: given two or more document groups (e.g., separated by publication time), the goal is to select a small number of documents that are representative of each group, and also maximally distinguishable from other groups.

Binary Classification Classification +2

Complementary-Label Learning for Arbitrary Losses and Models

1 code implementation Proceedings of the 36th International Conference on Machine Learning, 2019 Takashi Ishida, Gang Niu, Aditya Krishna Menon, Masashi Sugiyama

In contrast to the standard classification paradigm where the true class is given to each training pattern, complementary-label learning only uses training patterns each equipped with a complementary label, which only specifies one of the classes that the pattern does not belong to.

General Classification Image Classification
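To make the setting concrete, here is a sketch of drawing a complementary label and one naive surrogate loss that pushes down the probability of the ruled-out class. The paper's contribution is an unbiased risk estimator valid for arbitrary losses and models, which this toy loss is not.

```python
import numpy as np

def complementary_label(true_label, num_classes, rng=np.random):
    """Draw a complementary label: a class the pattern does NOT belong to."""
    choices = np.delete(np.arange(num_classes), true_label)
    return int(rng.choice(choices))

def naive_complementary_loss(probs, comp_label, eps=1e-12):
    """Simple surrogate: penalise probability mass on the ruled-out class
    (illustrative only; not the paper's unbiased risk estimator)."""
    return -np.log(1.0 - probs[comp_label] + eps)
```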

On the Minimal Supervision for Training Any Binary Classifier from Only Unlabeled Data

1 code implementation ICLR 2019 Nan Lu, Gang Niu, Aditya Krishna Menon, Masashi Sugiyama

In this paper, we study training an arbitrary binary classifier (from linear models to deep networks) from only unlabeled (U) data by empirical risk minimization (ERM).

Monge blunts Bayes: Hardness Results for Adversarial Training

no code implementations 8 Jun 2018 Zac Cranko, Aditya Krishna Menon, Richard Nock, Cheng Soon Ong, Zhan Shi, Christian Walder

A key feature of our result is that it holds for all proper losses, and for a popular subset of these, the optimisation of this central measure appears to be independent of the loss.

Anomaly Detection using One-Class Neural Networks

4 code implementations 18 Feb 2018 Raghavendra Chalapathy, Aditya Krishna Menon, Sanjay Chawla

We propose a one-class neural network (OC-NN) model to detect anomalies in complex data sets.

Anomaly Detection

Revisiting revisits in trajectory recommendation

no code implementations 17 Aug 2017 Aditya Krishna Menon, Dawei Chen, Lexing Xie, Cheng Soon Ong

Trajectory recommendation is the problem of recommending a sequence of places in a city for a tourist to visit.

f-GANs in an Information Geometric Nutshell

1 code implementation NeurIPS 2017 Richard Nock, Zac Cranko, Aditya Krishna Menon, Lizhen Qu, Robert C. Williamson

In this paper, we unveil a broad class of distributions for which such convergence happens, namely deformed exponential families (a wide superset of exponential families), and show tight connections with the three other key GAN parameters: loss, game and architecture.

The cost of fairness in classification

no code implementations 25 May 2017 Aditya Krishna Menon, Robert C. Williamson

We study the problem of learning classifiers with a fairness constraint, with three main contributions towards the goal of quantifying the problem's inherent tradeoffs.

Classification Fairness +1

Robust, Deep and Inductive Anomaly Detection

5 code implementations 22 Apr 2017 Raghavendra Chalapathy, Aditya Krishna Menon, Sanjay Chawla

PCA is a classical statistical technique whose simplicity and maturity have seen it find widespread use as an anomaly detection technique.

Anomaly Detection
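The classical baseline being referenced scores points by PCA reconstruction error; a minimal numpy sketch:

```python
import numpy as np

def pca_anomaly_scores(X, k=2):
    """Score each row of X by its reconstruction error from the top-k
    principal components; large error = poorly explained = anomalous."""
    Xc = X - X.mean(axis=0)
    # Top-k principal directions via SVD of the centred data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T
    recon = Xc @ V @ V.T
    return np.sum((Xc - recon) ** 2, axis=1)
```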

A scaled Bregman theorem with applications

no code implementations NeurIPS 2016 Richard Nock, Aditya Krishna Menon, Cheng Soon Ong

Experiments on each of these domains validate the analyses and suggest that the scaled Bregman theorem might be a worthy addition to the popular handful of Bregman divergence properties that have been pervasive in machine learning.

BIG-bench Machine Learning Clustering

Learning from Binary Labels with Instance-Dependent Corruption

no code implementations 3 May 2016 Aditya Krishna Menon, Brendan van Rooyen, Nagarajan Natarajan

Suppose we have a sample of instances paired with binary labels corrupted by arbitrary instance- and label-dependent noise.

An Average Classification Algorithm

no code implementations 4 Jun 2015 Brendan van Rooyen, Aditya Krishna Menon, Robert C. Williamson

When working with a high- or infinite-dimensional kernel, it is imperative for speed of evaluation and storage reasons that as few training samples as possible are used in the kernel expansion.

Classification General Classification

Learning with Symmetric Label Noise: The Importance of Being Unhinged

1 code implementation NeurIPS 2015 Brendan van Rooyen, Aditya Krishna Menon, Robert C. Williamson

However, Long and Servedio [2010] proved that under symmetric label noise (SLN), minimisation of any convex potential over a linear function class can result in classification performance equivalent to random guessing.

Binary Classification Classification +1
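The titular unhinged loss is the linear loss ell(y, v) = 1 - y*v, shown below next to the hinge for contrast (a sketch; the paper's point is that this linearity, unlike convex potentials such as the hinge, confers robustness to symmetric label noise).

```python
import numpy as np

def unhinged_loss(y, score):
    """Unhinged loss: linear and unbounded below, ell(y, v) = 1 - y*v."""
    return 1.0 - y * score

def hinge_loss(y, score):
    """Standard hinge, for contrast: max(0, 1 - y*v)."""
    return np.maximum(0.0, 1.0 - y * score)
```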
