Search Results for author: Harish G. Ramaswamy

Found 16 papers, 6 papers with code

On the Learning Dynamics of Attention Networks

1 code implementation • 25 Jul 2023 • Rahul Vashisht, Harish G. Ramaswamy

Attention models are typically learned by optimizing one of three standard loss functions, variously called soft attention, hard attention, and latent variable marginal likelihood (LVML) attention.

Hard Attention
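The abstract above contrasts three training objectives for attention models. The following is a minimal sketch of how the three losses could look for a single example in a toy setup with per-position features, attention weights, and a shared classifier; the variable names and this exact parameterization are assumptions for illustration, not the paper's code.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attention_losses(attn, feats, W, y):
    """attn: (T,) attention weights over T positions (non-negative, sums to 1).
    feats: (T, d) per-position features, W: (C, d) classifier, y: true class index.
    Returns the three standard objectives for a single example."""
    # Soft attention: mix the features first, then classify once.
    soft = -np.log(softmax(W @ (attn @ feats))[y] + 1e-12)
    # Hard attention: expected loss over positions; each position is classified
    # on its own (in practice estimated by sampling, e.g. with REINFORCE).
    per_pos_nll = np.array([-np.log(softmax(W @ f)[y] + 1e-12) for f in feats])
    hard = attn @ per_pos_nll
    # LVML: treat the attended position as a latent variable and maximize the
    # marginal likelihood of the label, i.e. mix the per-position probabilities.
    per_pos_prob = np.array([softmax(W @ f)[y] for f in feats])
    lvml = -np.log(attn @ per_pos_prob + 1e-12)
    return soft, hard, lvml
```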

On the Interpretability of Attention Networks

1 code implementation • 30 Dec 2022 • Lakshmi Narayan Pandey, Rahul Vashisht, Harish G. Ramaswamy

In trained models with an attention mechanism, the outputs of an intermediate module that encodes the segment of input responsible for the output are often used as a way to peek into the `reasoning` of the network.

Image Captioning

Consistent Multiclass Algorithms for Complex Metrics and Constraints

1 code implementation • 18 Oct 2022 • Harikrishna Narasimhan, Harish G. Ramaswamy, Shiv Kumar Tavker, Drona Khurana, Praneeth Netrapalli, Shivani Agarwal

We present consistent algorithms for multiclass learning with complex performance metrics and constraints, where the objective and constraints are defined by arbitrary functions of the confusion matrix.

Fairness
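Since the objective here is an arbitrary function of the confusion matrix, a quick illustration of what that means, using macro-F1 as one assumed example of such a metric (a sketch, not the paper's algorithm):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """C[i, j] = fraction of examples with true class i predicted as class j."""
    C = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1
    return C / len(y_true)

def macro_f1(C):
    """One example of a complex (non-decomposable) metric: a nonlinear
    function of the entries of the confusion matrix."""
    f1s = []
    for k in range(C.shape[0]):
        tp = C[k, k]
        prec = tp / max(C[:, k].sum(), 1e-12)
        rec = tp / max(C[k, :].sum(), 1e-12)
        f1s.append(2 * prec * rec / max(prec + rec, 1e-12))
    return float(np.mean(f1s))
```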

Predicting the success of Gradient Descent for a particular Dataset-Architecture-Initialization (DAI)

no code implementations • 25 Nov 2021 • Umangi Jain, Harish G. Ramaswamy

Despite the massive success of deep neural networks, training them successfully still largely relies on experimentally choosing an architecture, hyper-parameters, an initialization, and a training mechanism.

Using noise resilience for ranking generalization of deep neural networks

1 code implementation • 16 Dec 2020 • Depen Morwani, Rahul Vashisht, Harish G. Ramaswamy

Recent papers have shown that sufficiently overparameterized neural networks can perfectly fit even random labels.

Position

Inductive Bias of Gradient Descent for Weight Normalized Smooth Homogeneous Neural Nets

1 code implementation • 24 Oct 2020 • Depen Morwani, Harish G. Ramaswamy

We analyse both standard weight normalization (SWN) and exponential weight normalization (EWN), and show that the gradient flow path with EWN is equivalent to gradient flow on standard networks with an adaptive learning rate.

Inductive Bias
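The abstract refers to two ways of factoring a weight vector into a direction and a scale. A rough sketch of the two parameterizations as the names suggest them; the exact form, particularly of EWN, is an assumption and not taken from the paper's code:

```python
import numpy as np

def swn_weight(gamma, v):
    """Standard weight normalization (SWN): w = gamma * v / ||v||,
    with the scale parameterized directly."""
    return gamma * v / np.linalg.norm(v)

def ewn_weight(alpha, v):
    """Exponential weight normalization (EWN): the scale is parameterized
    through an exponential, w = exp(alpha) * v / ||v||."""
    return np.exp(alpha) * v / np.linalg.norm(v)
```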

Convex Calibrated Surrogates for the Multi-Label F-Measure

no code implementations • ICML 2020 • Mingyuan Zhang, Harish G. Ramaswamy, Shivani Agarwal

In particular, the F-measure explicitly balances recall (fraction of active labels predicted to be active) and precision (fraction of labels predicted to be active that are actually so), both of which are important in evaluating the overall performance of a multi-label classifier.

Multi-Label Classification
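To make the recall/precision trade-off in the abstract concrete, here is a small sketch of the instance-wise multi-label F-measure computed from binary label vectors; the paper's exact averaging convention is not assumed.

```python
import numpy as np

def multilabel_f1(y_true, y_pred):
    """y_true, y_pred: binary arrays of shape (n_labels,) for one example.
    Recall    = fraction of active labels that are predicted active.
    Precision = fraction of predicted-active labels that are truly active.
    F1 is their harmonic mean."""
    tp = np.sum(y_true * y_pred)
    n_active = y_true.sum()
    n_pred = y_pred.sum()
    if n_active == 0 and n_pred == 0:
        return 1.0  # convention for the all-empty case
    recall = tp / max(n_active, 1)
    precision = tp / max(n_pred, 1)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```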

Ablation-CAM: Visual Explanations for Deep Convolutional Network via Gradient-free Localization

3 code implementations • WACV 2020 • Saurabh Desai, Harish G. Ramaswamy

In response to recent criticism of gradient-based visualization techniques, we propose a new methodology to generate visual explanations for deep Convolutional Neural Network (CNN)-based models.
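As a rough illustration of the gradient-free idea: ablate each feature map, record the drop in the class score, and use the relative drops as weights for a class activation map. The sketch below uses assumed function names and is not the authors' implementation.

```python
import numpy as np

def ablation_cam(score_fn, activations, class_idx):
    """score_fn(acts) -> class scores for one image, given feature maps
    `acts` of shape (K, H, W) from the chosen convolutional layer.
    Returns an (H, W) localization map."""
    base = score_fn(activations)[class_idx]
    weights = np.zeros(activations.shape[0])
    for k in range(activations.shape[0]):
        ablated = activations.copy()
        ablated[k] = 0.0                      # remove feature map k
        drop = base - score_fn(ablated)[class_idx]
        weights[k] = drop / (base + 1e-12)    # relative drop in the class score
    # Weighted sum of feature maps, keeping only positive evidence.
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    return cam
```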

On Knowledge distillation from complex networks for response prediction

no code implementations • NAACL 2019 • Siddhartha Arora, Mitesh M. Khapra, Harish G. Ramaswamy

In order to overcome this, we use standard simple models which do not capture all pairwise interactions, but learn to emulate certain characteristics of a complex teacher network.

Knowledge Distillation • Question Answering
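The entry above is about distilling a complex teacher into simpler student models. For context, a minimal sketch of the standard temperature-scaled distillation objective (the Hinton-style loss; not necessarily the exact loss used in this paper):

```python
import numpy as np

def softmax(z, T=1.0):
    z = (z - z.max()) / T
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, y, T=2.0, alpha=0.5):
    """Weighted sum of (a) cross-entropy with the true label and
    (b) cross-entropy with the teacher's temperature-softened predictions."""
    p_student = softmax(student_logits)
    hard_loss = -np.log(p_student[y] + 1e-12)
    p_teacher_T = softmax(teacher_logits, T)
    p_student_T = softmax(student_logits, T)
    soft_loss = -np.sum(p_teacher_T * np.log(p_student_T + 1e-12))
    return alpha * hard_loss + (1 - alpha) * (T ** 2) * soft_loss
```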

On Controllable Sparse Alternatives to Softmax

no code implementations • NeurIPS 2018 • Anirban Laha, Saneem A. Chemmengath, Priyanka Agrawal, Mitesh M. Khapra, Karthik Sankaranarayanan, Harish G. Ramaswamy

Converting an n-dimensional vector into a probability distribution over n objects is a commonly used operation in many machine learning tasks such as multiclass classification, multilabel classification, and attention mechanisms.

Abstractive Text Summarization • Classification • +3
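For reference, a sketch of the baseline softmax alongside sparsemax (Martins and Astudillo), one of the sparse probability mappings in this line of work; the paper's own controllable variants are not reproduced here.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex; unlike softmax,
    it can assign exactly zero probability to low-scoring entries."""
    z_sorted = np.sort(z)[::-1]
    cssv = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cssv     # positions kept in the support
    k_z = k[support][-1]
    tau = (cssv[support][-1] - 1) / k_z   # threshold so the output sums to 1
    return np.maximum(z - tau, 0.0)
```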

Mixture Proportion Estimation via Kernel Embedding of Distributions

no code implementations • 8 Mar 2016 • Harish G. Ramaswamy, Clayton Scott, Ambuj Tewari

Mixture proportion estimation (MPE) is the problem of estimating the weight of a component distribution in a mixture, given samples from the mixture and component.

Anomaly Detection • Weakly-supervised Learning
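The problem statement in the abstract can be written compactly. The notation below (mixture $F$, observed component $H$, weight $\kappa^*$) is an assumed convention, not copied from the paper.

```latex
% Mixture proportion estimation (MPE), schematically: we observe i.i.d. samples
% from the mixture F and from the component H; the other component G is
% unobserved, and the goal is to estimate the component's weight \kappa^*.
\[
  F \;=\; (1 - \kappa^*)\, G \;+\; \kappa^* H,
  \qquad \kappa^* \in [0, 1].
\]
```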

Consistent Algorithms for Multiclass Classification with a Reject Option

no code implementations • 15 May 2015 • Harish G. Ramaswamy, Ambuj Tewari, Shivani Agarwal

We consider the problem of $n$-class classification ($n\geq 2$), where the classifier can choose to abstain from making predictions at a given cost, say, a factor $\alpha$ of the cost of misclassification.

Classification • General Classification
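For intuition, the plug-in form of the abstain rule (the classical generalized Chow rule, with which consistent surrogates of this kind aim to agree): predict the most likely class, but abstain when even that class is not confident enough relative to the abstention cost. A sketch assuming estimated conditional class probabilities as input:

```python
import numpy as np

ABSTAIN = -1

def predict_with_reject(class_probs, alpha):
    """class_probs: estimated conditional probabilities p(y|x), shape (n,).
    alpha: cost of abstaining, as a fraction of the misclassification cost.
    Generalized Chow rule: abstain when max_y p(y|x) < 1 - alpha, since the
    expected cost of predicting is 1 - max_y p(y|x) versus alpha for abstaining."""
    y_hat = int(np.argmax(class_probs))
    if class_probs[y_hat] < 1.0 - alpha:
        return ABSTAIN
    return y_hat
```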

Consistent Classification Algorithms for Multi-class Non-Decomposable Performance Metrics

no code implementations • 1 Jan 2015 • Harish G. Ramaswamy, Harikrishna Narasimhan, Shivani Agarwal

In this paper, we provide a unified framework for analysing a multi-class non-decomposable performance metric, where the problem of finding the optimal classifier for the performance metric is viewed as an optimization problem over the space of all confusion matrices achievable under the given distribution.

Classification • General Classification • +2

Convex Calibration Dimension for Multiclass Loss Matrices

no code implementations • 12 Aug 2014 • Harish G. Ramaswamy, Shivani Agarwal

We extend the notion of classification calibration, which has been studied for binary and multiclass 0-1 classification problems (and for certain other specific learning problems), to the general multiclass setting, and derive necessary and sufficient conditions for a surrogate loss to be calibrated with respect to a loss matrix in this setting.

General Classification
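The calibration condition alluded to above can be stated compactly. The formulation below is the standard one for a multiclass loss matrix, with notation assumed rather than quoted from the paper: a surrogate is calibrated with respect to a loss matrix if near-optimal surrogate predictions, once decoded, are automatically optimal for that loss.

```latex
% A surrogate \psi : \mathcal{T} \to \mathbb{R}_+^n is calibrated w.r.t. the
% loss matrix L (rows indexed by true class y, columns by prediction t) if
% there is a decoding map  pred : \mathcal{T} \to [k]  such that for every
% class-probability vector p in the simplex \Delta_n,
\[
  \inf_{\substack{u \in \mathcal{T} \\ \mathrm{pred}(u) \,\notin\, \arg\min_{t} \, p^\top \ell_t}}
     p^\top \psi(u)
  \;>\;
  \inf_{u \in \mathcal{T}} p^\top \psi(u),
\]
% where \ell_t denotes the t-th column of L. Driving the surrogate risk to its
% minimum then forces the decoded predictions toward Bayes-optimal ones for L.
```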

Convex Calibrated Surrogates for Low-Rank Loss Matrices with Applications to Subset Ranking Losses

no code implementations • NeurIPS 2013 • Harish G. Ramaswamy, Shivani Agarwal, Ambuj Tewari

The design of convex, calibrated surrogate losses, whose minimization entails consistency with respect to a desired target loss, is an important concept to have emerged in the theory of machine learning in recent years.

Classification Calibration Dimension for General Multiclass Losses

no code implementations • NeurIPS 2012 • Harish G. Ramaswamy, Shivani Agarwal

We extend the notion of classification calibration, which has been studied for binary and multiclass 0-1 classification problems (and for certain other specific learning problems), to the general multiclass setting, and derive necessary and sufficient conditions for a surrogate loss to be classification calibrated with respect to a loss matrix in this setting.

Classification • General Classification
