Search Results for author: Hossein Mobahi

Found 22 papers, 5 papers with code

Neglected Hessian component explains mysteries in Sharpness regularization

no code implementations • 19 Jan 2024 • Yann N. Dauphin, Atish Agarwala, Hossein Mobahi

We find that regularizing feature exploitation but not feature exploration yields performance similar to gradient penalties.

On the Foundations of Shortcut Learning

no code implementations • 24 Oct 2023 • Katherine L. Hermann, Hossein Mobahi, Thomas Fel, Michael C. Mozer

Deep-learning models can extract a rich assortment of features from data.

On student-teacher deviations in distillation: does it pay to disobey?

no code implementations • NeurIPS 2023 • Vaishnavh Nagarajan, Aditya Krishna Menon, Srinadh Bhojanapalli, Hossein Mobahi, Sanjiv Kumar

Knowledge distillation (KD) has been widely used to improve the test accuracy of a "student" network, by training it to mimic the soft probabilities of a trained "teacher" network.

Knowledge Distillation
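
The soft-probability matching described in the entry above is commonly implemented as a temperature-scaled KL term added to the usual cross-entropy. A minimal sketch follows; the temperature T, mixing weight alpha, and function name are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard KD loss: soften teacher probabilities, match them with the student,
    and mix in hard-label cross-entropy. T and alpha are illustrative hyperparameters."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # KL between softened teacher and student distributions, scaled by T^2 as in Hinton et al.
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```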

Sharpness-Aware Minimization Improves Language Model Generalization

no code implementations • ACL 2022 • Dara Bahri, Hossein Mobahi, Yi Tay

The allure of superhuman-level capabilities has led to considerable interest in language models like GPT-3 and T5, wherein the research has, by and large, revolved around new model architectures, training tasks, and loss objectives, along with substantial engineering efforts to scale up model capacity and dataset size.

Language Modelling • Natural Questions

The Low-Rank Simplicity Bias in Deep Networks

1 code implementation • 18 Mar 2021 • Minyoung Huh, Hossein Mobahi, Richard Zhang, Brian Cheung, Pulkit Agrawal, Phillip Isola

We show empirically that our claim holds for finite-width linear and non-linear models under practical learning paradigms, and that on natural data these are often the solutions that generalize well.

Image Classification

Data Augmentation via Structured Adversarial Perturbations

no code implementations • 5 Nov 2020 • Calvin Luo, Hossein Mobahi, Samy Bengio

The advantage of adversarial augmentation is that it replaces sampling with the use of a single, calculated perturbation that maximally increases the loss.

Data Augmentation
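
A one-step, loss-maximizing perturbation in the spirit of the idea above can be sketched as follows. This is a generic FGSM-style pixel perturbation shown only for illustration; the paper's perturbations are structured (applied to transformation parameters rather than raw pixels), and the function name and eps value are assumptions.

```python
import torch

def adversarial_augment(model, loss_fn, x, y, eps=0.03):
    """Illustrative single calculated perturbation that increases the loss.
    Note: the paper perturbs structured transformation parameters; perturbing
    raw pixels here is only a simplified stand-in."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    # Move each input a small step in the direction that maximally increases the loss.
    return (x_adv + eps * grad.sign()).detach()
```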

A Unifying View on Implicit Bias in Training Linear Neural Networks

no code implementations • ICLR 2021 • Chulhee Yun, Shankar Krishnan, Hossein Mobahi

For $L$-layer linear tensor networks that are orthogonally decomposable, we show that gradient flow on separable classification finds a stationary point of the $\ell_{2/L}$ max-margin problem in a "transformed" input space defined by the network.

Tensor Networks
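
For reference, the $\ell_{2/L}$ max-margin problem mentioned above can be written in the standard $\ell_p$ max-margin form below, with $\tilde{x}_i$ denoting the inputs in the "transformed" space. The notation is illustrative rather than the paper's exact statement.

```latex
\min_{w} \; \|w\|_{2/L}
\quad \text{subject to} \quad
y_i \, \langle w, \tilde{x}_i \rangle \ge 1 \quad \text{for all } i.
```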

Sharpness-Aware Minimization for Efficiently Improving Generalization

14 code implementations • ICLR 2021 • Pierre Foret, Ariel Kleiner, Hossein Mobahi, Behnam Neyshabur

In today's heavily overparameterized models, the value of the training loss provides few guarantees on model generalization ability.

 Ranked #1 on Image Classification on CIFAR-100 (using extra training data)

Fine-Grained Image Classification • Learning with noisy labels
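
SAM seeks parameters whose entire neighborhood has uniformly low loss, which it approximates with a two-step update: first ascend to a worst-case nearby point, then apply the gradient computed there to the original weights. A minimal sketch of that update is below; the neighborhood radius rho and the optimizer wiring are illustrative, and the 14 linked code implementations contain the authors' full versions.

```python
import torch

def sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    """Two-step SAM update: perturb weights toward the worst case, then descend."""
    # First pass: gradient at the current weights.
    loss_fn(model(x), y).backward()
    grad_norm = torch.norm(torch.stack(
        [p.grad.norm(p=2) for p in model.parameters() if p.grad is not None]), p=2)
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)          # climb to the (approximate) worst-case neighbor
            eps.append(e)
    model.zero_grad()
    # Second pass: gradient at the perturbed weights.
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)      # return to the original weights
    base_optimizer.step()      # apply the sharpness-aware gradient
    model.zero_grad()
```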

Self-Distillation Amplifies Regularization in Hilbert Space

no code implementations • NeurIPS 2020 • Hossein Mobahi, Mehrdad Farajtabar, Peter L. Bartlett

Knowledge distillation, introduced in the deep learning context, is a method for transferring knowledge from one architecture to another.

Knowledge Distillation • L2 Regularization

A Closed-Form Learned Pooling for Deep Classification Networks

no code implementations • 10 Jun 2019 • Vighnesh Birodkar, Hossein Mobahi, Dilip Krishnan, Samy Bengio

This operator can learn a strict super-set of what can be learned by average pooling or convolutions.

Classification • Foveation • +2

Semantic Redundancies in Image-Classification Datasets: The 10% You Don't Need

no code implementations • 29 Jan 2019 • Vighnesh Birodkar, Hossein Mobahi, Samy Bengio

Large datasets have been crucial to the success of deep learning models in recent years, and these models keep performing better as they are trained with more labelled data.

General Classification • Image Classification • +1

Predicting the Generalization Gap in Deep Networks with Margin Distributions

2 code implementations • ICLR 2019 • Yiding Jiang, Dilip Krishnan, Hossein Mobahi, Samy Bengio

In this paper, we propose such a measure, and conduct extensive empirical studies on how well it can predict the generalization gap.
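
The proposed measure is built from the distribution of classification margins. A minimal sketch of its simplest ingredient, the output-layer margin of each example, is below; the per-layer normalization used in the paper is omitted, and the function name is an assumption.

```python
import torch

def output_margins(logits, labels):
    """Per-example margin: score of the true class minus the best competing class."""
    true_scores = logits.gather(1, labels.unsqueeze(1)).squeeze(1)
    masked = logits.clone()
    masked.scatter_(1, labels.unsqueeze(1), float("-inf"))
    return true_scores - masked.max(dim=1).values
```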

Homotopy Analysis for Tensor PCA

no code implementations • 28 Oct 2016 • Anima Anandkumar, Yuan Deng, Rong Ge, Hossein Mobahi

For the challenging problem of tensor PCA, we prove global convergence of the homotopy method in the "high noise" regime.
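
Tensor PCA as referenced here is usually posed as recovering a planted rank-one component from a noisy symmetric tensor. A standard, illustrative form of the spiked model (not necessarily the paper's exact normalization) is:

```latex
T \;=\; \beta\, v^{\otimes 3} + Z, \qquad \|v\|_2 = 1, \qquad Z_{ijk} \sim \mathcal{N}(0, 1),
\qquad \text{recover } v \text{ via } \max_{\|u\|_2 = 1} \langle T,\, u^{\otimes 3} \rangle .
```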

A Theory of Local Matching: SIFT and Beyond

no code implementations • 19 Jan 2016 • Hossein Mobahi, Stefano Soatto

Can it suggest new algorithms with reduced computational complexity or new descriptors with better accuracy for matching?

Training Recurrent Neural Networks by Diffusion

no code implementations • 16 Jan 2016 • Hossein Mobahi

This work presents a new algorithm for training recurrent neural networks (although the ideas are applicable to feedforward networks as well).

A Compositional Model for Low-Dimensional Image Set Representation

no code implementations • CVPR 2014 • Hossein Mobahi, Ce Liu, William T. Freeman

Learning a low-dimensional representation of images is useful for various applications in graphics and computer vision.
