1 code implementation • 25 Jul 2023 • Rahul Vashisht, Harish G. Ramaswamy
Attention models are typically learned by optimizing one of three standard loss functions, variously called soft attention, hard attention, and latent variable marginal likelihood (LVML) attention.
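A minimal sketch of how these three objectives differ, assuming a toy setup (chunk features h, attention logits, a linear classifier W; none of this is the paper's code):

```python
# Toy contrast of soft, LVML, and hard attention losses (illustrative sketch).
import torch
import torch.nn.functional as F

def attention_losses(h, scores, W, y):
    """h: (T, d) chunk features, scores: (T,) attention logits,
    W: (d, n_classes) classifier, y: () target label."""
    a = F.softmax(scores, dim=0)                     # attention weights

    # Soft attention: attend first, classify the mixed context vector.
    context = (a.unsqueeze(1) * h).sum(dim=0)        # (d,)
    soft_loss = F.cross_entropy((context @ W).unsqueeze(0), y.unsqueeze(0))

    # LVML attention: classify each chunk, then marginalize the likelihood
    # over the latent chunk choice: -log sum_t a_t * p(y | chunk t).
    logp = F.log_softmax(h @ W, dim=1)[:, y]         # (T,) log p(y | chunk t)
    lvml_loss = -torch.logsumexp(torch.log(a) + logp, dim=0)

    # Hard attention: sample one chunk; in practice this loss is combined
    # with a score-function (REINFORCE) gradient estimator.
    t = torch.multinomial(a, 1).item()
    hard_loss = F.cross_entropy((h[t] @ W).unsqueeze(0), y.unsqueeze(0))
    return soft_loss, lvml_loss, hard_loss

T, d, C = 5, 8, 3
h, scores = torch.randn(T, d), torch.randn(T)
W, y = torch.randn(d, C), torch.tensor(2)
print(attention_losses(h, scores, W, y))
```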
1 code implementation • 30 Dec 2022 • Lakshmi Narayan Pandey, Rahul Vashisht, Harish G. Ramaswamy
In trained models with an attention mechanism, the outputs of an intermediate module that encodes the segment of input responsible for the output are often used as a way to peek into the "reasoning" of the network.
1 code implementation • 18 Oct 2022 • Harikrishna Narasimhan, Harish G. Ramaswamy, Shiv Kumar Tavker, Drona Khurana, Praneeth Netrapalli, Shivani Agarwal
We present consistent algorithms for multiclass learning with complex performance metrics and constraints, where the objective and constraints are defined by arbitrary functions of the confusion matrix.
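A minimal illustration (not the paper's algorithm) of such a metric, evaluated as a function of the normalized confusion matrix; here the G-mean of per-class recalls, which is not a plain average of per-example losses:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n):
    C = np.zeros((n, n))
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1
    return C / len(y_true)          # normalized: entries sum to 1

def g_mean(C):
    # Geometric mean of per-class recalls, a non-decomposable
    # function of the confusion matrix.
    recalls = np.diag(C) / C.sum(axis=1)
    return recalls.prod() ** (1.0 / len(recalls))

y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2])
print(g_mean(confusion_matrix(y_true, y_pred, n=3)))   # ~0.69
```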
no code implementations • 25 Nov 2021 • Umangi Jain, Harish G. Ramaswamy
Despite the massive success of deep neural networks, training them successfully still largely relies on experimentally chosen architectures, hyper-parameters, initializations, and training mechanisms.
1 code implementation • 16 Dec 2020 • Depen Morwani, Rahul Vashisht, Harish G. Ramaswamy
Recent papers have shown that sufficiently overparameterized neural networks can perfectly fit even random labels.
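A quick empirical check of this phenomenon (a minimal sketch on toy data, not the paper's experiments): an overparameterized MLP driven to near-zero loss on random labels.

```python
import torch

torch.manual_seed(0)
X = torch.randn(64, 10)                       # 64 random inputs
y = torch.randint(0, 2, (64,))                # random binary labels
model = torch.nn.Sequential(                  # far more parameters than samples
    torch.nn.Linear(10, 512), torch.nn.ReLU(), torch.nn.Linear(512, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    loss = torch.nn.functional.cross_entropy(model(X), y)
    opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())   # typically ~0: the random labels are memorized
```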
1 code implementation • 24 Oct 2020 • Depen Morwani, Harish G. Ramaswamy
We analyse both standard weight normalization (SWN) and exponential weight normalization (EWN), and show that the gradient flow path with EWN is equivalent to gradient flow on standard networks with an adaptive learning rate.
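A minimal sketch of the two parametrizations. SWN is standard weight normalization, w = g · v/‖v‖; for EWN we show one plausible exponential reparametrization of the scale, w = exp(s) · v/‖v‖, which is an assumption here; check the paper for the exact form it analyses.

```python
import torch

class SWNLinear(torch.nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.v = torch.nn.Parameter(torch.randn(d_out, d_in))
        self.g = torch.nn.Parameter(torch.ones(d_out))
    def forward(self, x):
        # standard weight normalization: direction v, explicit scale g
        w = self.g.unsqueeze(1) * self.v / self.v.norm(dim=1, keepdim=True)
        return x @ w.t()

class EWNLinear(SWNLinear):
    def forward(self, x):
        # exponential scale: gradient steps on g act multiplicatively on
        # the weight magnitude, mimicking an adaptive per-neuron step size
        w = self.g.exp().unsqueeze(1) * self.v / self.v.norm(dim=1, keepdim=True)
        return x @ w.t()

print(EWNLinear(4, 3)(torch.randn(2, 4)).shape)   # torch.Size([2, 3])
```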
no code implementations • ICML 2020 • Mingyuan Zhang, Harish G. Ramaswamy, Shivani Agarwal
In particular, the F-measure explicitly balances recall (fraction of active labels predicted to be active) and precision (fraction of labels predicted to be active that are actually so), both of which are important in evaluating the overall performance of a multi-label classifier.
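A small worked example of these quantities on a single multi-label prediction:

```python
# Recall, precision, and F-measure for one multi-label prediction
# (a label is "active" when its entry is 1).
import numpy as np

y_true = np.array([1, 1, 0, 1, 0])   # 3 active labels
y_pred = np.array([1, 0, 1, 1, 0])   # 3 predicted active, 2 of them correct

tp = np.sum((y_true == 1) & (y_pred == 1))        # 2
recall = tp / y_true.sum()                        # 2/3: active labels recovered
precision = tp / y_pred.sum()                     # 2/3: predictions that are right
f1 = 2 * precision * recall / (precision + recall)
print(recall, precision, f1)                      # 0.667 0.667 0.667
```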
3 code implementations • WACV 2020 • Saurabh Desai, Harish G. Ramaswamy
In response to recent criticism of gradient-based visualization techniques, we propose a new methodology to generate visual explanations for deep Convolutional Neural Network (CNN)-based models.
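A minimal sketch of a gradient-free, ablation-style class activation map in the spirit of this approach; the exact weighting and normalization here are assumptions, not the paper's published recipe.

```python
import torch

@torch.no_grad()
def ablation_cam(head, feats, cls):
    """feats: (K, H, W) activations of the last conv layer,
    head: maps (1, K, H, W) -> class scores, cls: target class index."""
    base = head(feats.unsqueeze(0))[0, cls]
    weights = torch.zeros(feats.shape[0])
    for k in range(feats.shape[0]):            # ablate one map at a time
        ablated = feats.clone()
        ablated[k] = 0.0
        drop = base - head(ablated.unsqueeze(0))[0, cls]
        weights[k] = drop / (base + 1e-8)      # relative score drop
    cam = torch.relu((weights[:, None, None] * feats).sum(dim=0))
    return cam / (cam.max() + 1e-8)            # (H, W) saliency map

head = torch.nn.Sequential(torch.nn.AdaptiveAvgPool2d(1),
                           torch.nn.Flatten(), torch.nn.Linear(8, 5))
feats = torch.relu(torch.randn(8, 7, 7))
print(ablation_cam(head, feats, cls=3).shape)  # torch.Size([7, 7])
```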
no code implementations • NAACL 2019 • Siddhartha Arora, Mitesh M. Khapra, Harish G. Ramaswamy
To overcome this, we use simple, standard models which do not capture all pairwise interactions, but learn to emulate certain characteristics of a complex teacher network.
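A minimal sketch of an emulation objective in the standard knowledge-distillation style (illustrative; the paper emulates specific characteristics of the teacher, which may differ from plain logit matching):

```python
import torch.nn.functional as F

def emulation_loss(student_logits, teacher_logits, T=2.0):
    # Match the student's softened distribution to the teacher's;
    # the T*T factor keeps gradient scale comparable across temperatures.
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    logp_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(logp_student, p_teacher, reduction="batchmean") * T * T
```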
no code implementations • NeurIPS 2018 • Anirban Laha, Saneem A. Chemmengath, Priyanka Agrawal, Mitesh M. Khapra, Karthik Sankaranarayanan, Harish G. Ramaswamy
Converting an n-dimensional vector to a probability distribution over n objects is a commonly used component in many machine learning tasks like multiclass classification, multilabel classification, attention mechanisms, etc.
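Softmax is the classical such map; sparsemax (Martins & Astudillo, 2016) is a sparse alternative of the kind this line of work generalizes. A minimal sketch contrasting the two:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sparsemax(z):
    # Euclidean projection of z onto the probability simplex.
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > np.cumsum(z_sorted)
    k_z = k[support][-1]                          # size of the support
    tau = (np.cumsum(z_sorted)[k_z - 1] - 1) / k_z
    return np.maximum(z - tau, 0.0)

z = np.array([1.2, 1.0, -1.0])
print(softmax(z))     # all coordinates strictly positive
print(sparsemax(z))   # [0.6 0.4 0. ]: exact zeros allowed
```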
no code implementations • 8 Mar 2016 • Harish G. Ramaswamy, Clayton Scott, Ambuj Tewari
Mixture proportion estimation (MPE) is the problem of estimating the weight of a component distribution in a mixture, given samples from the mixture and component.
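A minimal sketch of the setup, writing the mixture density as f = κ·h + (1-κ)·g: since f(x)/h(x) = κ + (1-κ)·g(x)/h(x) ≥ κ, the ratio's infimum upper-bounds and (under irreducibility) equals κ. The naive histogram estimator below is purely illustrative, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
kappa = 0.3
h = rng.normal(0, 1, 5000)                         # component samples
f = np.where(rng.random(20000) < kappa,            # mixture samples
             rng.normal(0, 1, 20000), rng.normal(4, 1, 20000))

bins = np.linspace(-4, 8, 40)
pf, _ = np.histogram(f, bins, density=True)
ph, _ = np.histogram(h, bins, density=True)
mask = ph > 0.05                                   # avoid unstable ratios
print((pf[mask] / ph[mask]).min())                 # rough estimate of kappa (0.3)
```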
no code implementations • 15 May 2015 • Harish G. Ramaswamy, Ambuj Tewari, Shivani Agarwal
We consider the problem of $n$-class classification ($n\geq 2$), where the classifier can choose to abstain from making predictions at a given cost, say, a factor $\alpha$ of the cost of misclassification.
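The classical plug-in answer in this setting is Chow's rule: predict the argmax posterior, but abstain when its confidence falls below $1-\alpha$. A minimal sketch, assuming misclassification cost 1 and estimated posteriors as input:

```python
import numpy as np

def predict_with_abstain(posteriors, alpha):
    """posteriors: (n_classes,) estimated p(y|x); returns a class or None."""
    y = int(np.argmax(posteriors))
    # Abstaining costs alpha; a wrong prediction costs 1, so predicting
    # pays off only when the expected error 1 - max_y p(y|x) <= alpha.
    return y if posteriors[y] >= 1 - alpha else None

print(predict_with_abstain(np.array([0.5, 0.3, 0.2]), alpha=0.3))  # None
print(predict_with_abstain(np.array([0.8, 0.1, 0.1]), alpha=0.3))  # 0
```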
no code implementations • 1 Jan 2015 • Harish G. Ramaswamy, Harikrishna Narasimhan, Shivani Agarwal
In this paper, we provide a unified framework for analysing a multi-class non-decomposable performance metric, where the problem of finding the optimal classifier for the performance metric is viewed as an optimization problem over the space of all confusion matrices achievable under the given distribution.
no code implementations • 12 Aug 2014 • Harish G. Ramaswamy, Shivani Agarwal
We extend the notion of classification calibration, which has been studied for binary and multiclass 0-1 classification problems (and for certain other specific learning problems), to the general multiclass setting, and derive necessary and sufficient conditions for a surrogate loss to be calibrated with respect to a loss matrix in this setting.
no code implementations • NeurIPS 2013 • Harish G. Ramaswamy, Shivani Agarwal, Ambuj Tewari
The design of convex, calibrated surrogate losses, whose minimization entails consistency with respect to a desired target loss, has emerged as an important theme in the theory of machine learning in recent years.
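For concreteness, one standard way to state the calibration condition (a sketch; $\mathrm{pred}$ denotes the decoding map from surrogate predictions to target actions, and the notation is assumed rather than taken from the paper): a surrogate $\psi$ is calibrated with respect to a target loss $\ell$ if, for every label distribution, any action that is suboptimal for $\ell$ incurs strictly more conditional $\psi$-risk than the infimum.

```latex
\forall\, p \in \Delta_n:\quad
\inf_{u \,:\, \mathrm{pred}(u) \,\notin\, \arg\min_{t} \sum_{y} p_y\, \ell(y, t)}
  \sum_{y} p_y \, \psi(y, u)
\;>\;
\inf_{u} \sum_{y} p_y \, \psi(y, u)
```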
no code implementations • NeurIPS 2012 • Harish G. Ramaswamy, Shivani Agarwal
We extend the notion of classification calibration, which has been studied for binary and multiclass 0-1 classification problems (and for certain other specific learning problems), to the general multiclass setting, and derive necessary and sufficient conditions for a surrogate loss to be classification calibrated with respect to a loss matrix in this setting.