Search Results for author: Hossein Mobahi

Found 22 papers, 5 papers with code

Neglected Hessian component explains mysteries in Sharpness regularization

no code implementations • 19 Jan 2024 • Yann N. Dauphin, Atish Agarwala, Hossein Mobahi

We find that regularizing feature exploitation but not feature exploration yields performance similar to gradient penalties.

On the Foundations of Shortcut Learning

no code implementations • 24 Oct 2023 • Katherine L. Hermann, Hossein Mobahi, Thomas Fel, Michael C. Mozer

Deep-learning models can extract a rich assortment of features from data.

On student-teacher deviations in distillation: does it pay to disobey?

no code implementations • NeurIPS 2023 • Vaishnavh Nagarajan, Aditya Krishna Menon, Srinadh Bhojanapalli, Hossein Mobahi, Sanjiv Kumar

Knowledge distillation (KD) has been widely used to improve the test accuracy of a "student" network, by training it to mimic the soft probabilities of a trained "teacher" network.

Knowledge Distillation
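
The soft-probability matching described in the entry above is commonly implemented as a temperature-scaled KL term added to the usual cross-entropy. A minimal sketch follows; the temperature T, mixing weight alpha, and function name are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard KD loss: soften teacher probabilities, match them with the student,
    and mix in hard-label cross-entropy. T and alpha are illustrative hyperparameters."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # KL between softened teacher and student distributions, scaled by T^2 as in Hinton et al.
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```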

Sharpness-Aware Minimization Improves Language Model Generalization

no code implementations • ACL 2022 • Dara Bahri, Hossein Mobahi, Yi Tay

The allure of superhuman-level capabilities has led to considerable interest in language models like GPT-3 and T5, wherein the research has, by and large, revolved around new model architectures, training tasks, and loss objectives, along with substantial engineering efforts to scale up model capacity and dataset size.

Language Modelling • Natural Questions

The Low-Rank Simplicity Bias in Deep Networks

1 code implementation • 18 Mar 2021 • Minyoung Huh, Hossein Mobahi, Richard Zhang, Brian Cheung, Pulkit Agrawal, Phillip Isola

We show empirically that our claim holds for finite-width linear and non-linear models under practical learning paradigms, and that on natural data these are often the solutions that generalize well.

Image Classification

Data Augmentation via Structured Adversarial Perturbations

no code implementations • 5 Nov 2020 • Calvin Luo, Hossein Mobahi, Samy Bengio

The advantage of adversarial augmentation is that it replaces sampling with the use of a single, calculated perturbation that maximally increases the loss.

Data Augmentation
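
A one-step, loss-maximizing perturbation in the spirit of the idea above can be sketched as follows. This is a generic FGSM-style pixel perturbation shown only for illustration; the paper's perturbations are structured (applied to transformation parameters rather than raw pixels), and the function name and eps value are assumptions.

```python
import torch

def adversarial_augment(model, loss_fn, x, y, eps=0.03):
    """Illustrative single calculated perturbation that increases the loss.
    Note: the paper perturbs structured transformation parameters; perturbing
    raw pixels here is only a simplified stand-in."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    # Move each input a small step in the direction that maximally increases the loss.
    return (x_adv + eps * grad.sign()).detach()
```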

A Unifying View on Implicit Bias in Training Linear Neural Networks

no code implementations • ICLR 2021 • Chulhee Yun, Shankar Krishnan, Hossein Mobahi

For $L$-layer linear tensor networks that are orthogonally decomposable, we show that gradient flow on separable classification finds a stationary point of the $\ell_{2/L}$ max-margin problem in a "transformed" input space defined by the network.

Tensor Networks
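
For reference, the $\ell_{2/L}$ max-margin problem mentioned above can be written in the standard $\ell_p$ max-margin form below, with $\tilde{x}_i$ denoting the inputs in the "transformed" space. The notation is illustrative rather than the paper's exact statement.

```latex
\min_{w} \; \|w\|_{2/L}
\quad \text{subject to} \quad
y_i \, \langle w, \tilde{x}_i \rangle \ge 1 \quad \text{for all } i.
```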

Sharpness-Aware Minimization for Efficiently Improving Generalization

14 code implementations • ICLR 2021 • Pierre Foret, Ariel Kleiner, Hossein Mobahi, Behnam Neyshabur

In today's heavily overparameterized models, the value of the training loss provides few guarantees on model generalization ability.

 Ranked #1 on Image Classification on CIFAR-100 (using extra training data)

Fine-Grained Image Classification • Learning with noisy labels
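
SAM seeks parameters whose entire neighborhood has uniformly low loss, which it approximates with a two-step update: first ascend to a worst-case nearby point, then apply the gradient computed there to the original weights. A minimal sketch of that update is below; the neighborhood radius rho and the optimizer wiring are illustrative, and the 14 linked code implementations contain the authors' full versions.

```python
import torch

def sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    """Two-step SAM update: perturb weights toward the worst case, then descend."""
    # First pass: gradient at the current weights.
    loss_fn(model(x), y).backward()
    grad_norm = torch.norm(torch.stack(
        [p.grad.norm(p=2) for p in model.parameters() if p.grad is not None]), p=2)
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)          # climb to the (approximate) worst-case neighbor
            eps.append(e)
    model.zero_grad()
    # Second pass: gradient at the perturbed weights.
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)      # return to the original weights
    base_optimizer.step()      # apply the sharpness-aware gradient
    model.zero_grad()
```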

Self-Distillation Amplifies Regularization in Hilbert Space

no code implementations • NeurIPS 2020 • Hossein Mobahi, Mehrdad Farajtabar, Peter L. Bartlett

Knowledge distillation, introduced in the deep learning context, is a method for transferring knowledge from one architecture to another.

Knowledge Distillation • L2 Regularization

A Closed-Form Learned Pooling for Deep Classification Networks

no code implementations • 10 Jun 2019 • Vighnesh Birodkar, Hossein Mobahi, Dilip Krishnan, Samy Bengio

This operator can learn a strict super-set of what can be learned by average pooling or convolutions.

Classification • Foveation • +2

Semantic Redundancies in Image-Classification Datasets: The 10% You Don't Need

no code implementations • 29 Jan 2019 • Vighnesh Birodkar, Hossein Mobahi, Samy Bengio

Large datasets have been crucial to the success of deep learning models in recent years, and these models keep performing better as they are trained with more labelled data.

General Classification • Image Classification • +1

Predicting the Generalization Gap in Deep Networks with Margin Distributions

2 code implementations • ICLR 2019 • Yiding Jiang, Dilip Krishnan, Hossein Mobahi, Samy Bengio

In this paper, we propose such a measure, and conduct extensive empirical studies on how well it can predict the generalization gap.
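
The proposed measure is built from the distribution of classification margins. A minimal sketch of its simplest ingredient, the output-layer margin of each example, is below; the per-layer normalization used in the paper is omitted, and the function name is an assumption.

```python
import torch

def output_margins(logits, labels):
    """Per-example margin: score of the true class minus the best competing class."""
    true_scores = logits.gather(1, labels.unsqueeze(1)).squeeze(1)
    masked = logits.clone()
    masked.scatter_(1, labels.unsqueeze(1), float("-inf"))
    return true_scores - masked.max(dim=1).values
```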

Homotopy Analysis for Tensor PCA

no code implementations • 28 Oct 2016 • Anima Anandkumar, Yuan Deng, Rong Ge, Hossein Mobahi

For the challenging problem of tensor PCA, we prove global convergence of the homotopy method in the "high noise" regime.
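
Tensor PCA as referenced here is usually posed as recovering a planted rank-one component from a noisy symmetric tensor. A standard, illustrative form of the spiked model (not necessarily the paper's exact normalization) is:

```latex
T \;=\; \beta\, v^{\otimes 3} + Z, \qquad \|v\|_2 = 1, \qquad Z_{ijk} \sim \mathcal{N}(0, 1),
\qquad \text{recover } v \text{ via } \max_{\|u\|_2 = 1} \langle T,\, u^{\otimes 3} \rangle .
```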

A Theory of Local Matching: SIFT and Beyond

no code implementations • 19 Jan 2016 • Hossein Mobahi, Stefano Soatto

Can it suggest new algorithms with reduced computational complexity or new descriptors with better accuracy for matching?

Training Recurrent Neural Networks by Diffusion

no code implementations • 16 Jan 2016 • Hossein Mobahi

This work presents a new algorithm for training recurrent neural networks (although the ideas are applicable to feedforward networks as well).

A Compositional Model for Low-Dimensional Image Set Representation

no code implementations • CVPR 2014 • Hossein Mobahi, Ce Liu, William T. Freeman

Learning a low-dimensional representation of images is useful for various applications in graphics and computer vision.
