Search Results for author: Mahdi Soltanolkotabi

Found 63 papers, 14 papers with code

Compressive sensing with un-trained neural networks: Gradient descent finds a smooth approximation

1 code implementation ICML 2020 Reinhard Heckel, Mahdi Soltanolkotabi

For signal recovery from a few measurements, however, un-trained convolutional networks have an intriguing self-regularizing property: Even though the network can perfectly fit any image, the network recovers a natural image from few measurements when trained with gradient descent until convergence.

Compressive Sensing Denoising
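
A minimal sketch of the recovery procedure described above (not the authors' implementation): fit a randomly initialized, untrained convolutional generator to a handful of linear measurements by gradient descent, with no training data involved. The architecture, measurement model, and hyperparameters below are illustrative choices.

```python
# Illustrative sketch of compressive sensing with an un-trained network:
# fit a randomly initialized convolutional generator to measurements y = A x
# via gradient descent. Architecture and hyperparameters are arbitrary choices.
import torch
import torch.nn as nn

class TinyConvGenerator(nn.Module):
    def __init__(self, channels=64, out_size=32):
        super().__init__()
        # fixed random seed tensor; only the convolution weights are optimized
        self.register_buffer("seed", torch.randn(1, channels, out_size // 4, out_size // 4))
        self.net = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self):
        return self.net(self.seed)

n, m = 32 * 32, 300                        # image dimension and number of measurements (m << n)
A = torch.randn(m, n) / m ** 0.5           # random measurement matrix
x_true = torch.rand(n)                     # unknown image (flattened), used only to simulate y
y = A @ x_true

G = TinyConvGenerator()
opt = torch.optim.Adam(G.parameters(), lr=1e-3)
for step in range(2000):                   # "gradient descent until convergence"
    opt.zero_grad()
    loss = ((A @ G().reshape(-1) - y) ** 2).mean()   # measurement consistency only
    loss.backward()
    opt.step()
x_rec = G().detach().reshape(32, 32)       # recovered image
```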

Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models

no code implementations 26 Mar 2024 Mohammad Shahab Sepehri, Zalan Fabian, Mahdi Soltanolkotabi

The landscape of computational building blocks of efficient image restoration architectures is dominated by a combination of convolutional processing and various attention mechanisms.

Image Restoration

Gradient Descent Provably Solves Nonlinear Tomographic Reconstruction

no code implementations 6 Oct 2023 Sara Fridovich-Keil, Fabrizio Valdivia, Gordon Wetzstein, Benjamin Recht, Mahdi Soltanolkotabi

We show that this approach reduces metal artifacts compared to a commercial reconstruction of a human skull with metal dental crowns.

Computed Tomography (CT)

Learning A Disentangling Representation For PU Learning

no code implementations 5 Oct 2023 Omar Zamzam, Haleh Akrami, Mahdi Soltanolkotabi, Richard Leahy

In this paper we propose to learn a neural network-based data representation using a loss function that projects the unlabeled data into two (positive and negative) clusters which can be easily identified with simple clustering techniques, effectively emulating the phenomenon observed in low-dimensional settings.

Clustering Density Estimation +2

Adapt and Diffuse: Sample-adaptive Reconstruction via Latent Diffusion Models

no code implementations 12 Sep 2023 Zalan Fabian, Berk Tınaz, Mahdi Soltanolkotabi

Our framework acts as a wrapper that can be combined with any latent diffusion-based baseline solver, imbuing it with sample-adaptivity and acceleration.

Computational Efficiency

mL-BFGS: A Momentum-based L-BFGS for Distributed Large-Scale Neural Network Optimization

no code implementations 25 Jul 2023 Yue Niu, Zalan Fabian, Sunwoo Lee, Mahdi Soltanolkotabi, Salman Avestimehr

Quasi-Newton methods still face significant challenges in training large-scale neural networks due to the additional compute cost of Hessian-related computations and instability issues in stochastic training.

Stochastic Optimization

Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks

no code implementations 13 Jul 2023 Liam Collins, Hamed Hassani, Mahdi Soltanolkotabi, Aryan Mokhtari, Sanjay Shakkottai

An increasingly popular machine learning paradigm is to pretrain a neural network (NN) on many tasks offline, then adapt it to downstream tasks, often by re-training only the last linear layer of the network.

Binary Classification Multi-Task Learning +1

On the Role of Attention in Prompt-tuning

no code implementations 6 Jun 2023 Samet Oymak, Ankit Singh Rawat, Mahdi Soltanolkotabi, Christos Thrampoulidis

Despite its success in LLMs, there is limited theoretical understanding of the power of prompt-tuning and the role of the attention mechanism in prompting.

DiracDiffusion: Denoising and Incremental Reconstruction with Assured Data-Consistency

no code implementations 25 Mar 2023 Zalan Fabian, Berk Tınaz, Mahdi Soltanolkotabi

In this work, we propose a novel framework for inverse problem solving: we assume that the observation comes from a stochastic degradation process that gradually degrades and adds noise to the original clean image.

Denoising Image Restoration

Implicit Balancing and Regularization: Generalization and Convergence Guarantees for Overparameterized Asymmetric Matrix Sensing

no code implementations 24 Mar 2023 Mahdi Soltanolkotabi, Dominik Stöger, Changzhi Xie

We show that in this setting, factorized gradient descent enjoys two implicit properties: (1) coupling of the trajectory of gradient descent where the factors are coupled in various ways throughout the gradient update trajectory and (2) an algorithmic regularization property where the iterates show a propensity towards low-rank models despite the overparameterized nature of the factorized model.
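
To make the setting concrete, here is a small numpy sketch (not the paper's code) of factorized gradient descent for asymmetric matrix sensing from a small random initialization; the dimensions, step size, and initialization scale are arbitrary illustrative choices.

```python
# Factorized gradient descent for asymmetric matrix sensing (illustrative sketch):
# recover a rank-r matrix M* from linear measurements y_k = <A_k, M*> by running
# gradient descent on overparameterized factors U, V with a small random init.
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r, m = 30, 20, 2, 500
M_star = (rng.standard_normal((n1, r)) / np.sqrt(n1)) @ (rng.standard_normal((r, n2)) / np.sqrt(n2))
A = rng.standard_normal((m, n1, n2)) / np.sqrt(m)      # (roughly isometric) Gaussian measurements
y = np.einsum("kij,ij->k", A, M_star)                  # y_k = <A_k, M*>

k = 5                                                  # factor rank k > r: overparameterized
alpha = 1e-3                                           # small random initialization scale
U = alpha * rng.standard_normal((n1, k))
V = alpha * rng.standard_normal((n2, k))

eta = 0.2
for _ in range(2000):
    residual = np.einsum("kij,ij->k", A, U @ V.T) - y
    G = np.einsum("k,kij->ij", residual, A)            # gradient w.r.t. the product U V^T
    U, V = U - eta * G @ V, V - eta * G.T @ U          # coupled updates of the two factors
print(np.linalg.norm(U @ V.T - M_star) / np.linalg.norm(M_star))   # relative recovery error
```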

SMILE: Scaling Mixture-of-Experts with Efficient Bi-level Routing

no code implementations 10 Dec 2022 Chaoyang He, Shuai Zheng, Aston Zhang, George Karypis, Trishul Chilimbi, Mahdi Soltanolkotabi, Salman Avestimehr

Mixture-of-Experts (MoE) parallelism is a recent advancement that scales up the model size with constant computational cost.

The Geometry of Self-supervised Learning Models and its Impact on Transfer Learning

no code implementations 18 Sep 2022 Romain Cosentino, Sarath Shekkizhar, Mahdi Soltanolkotabi, Salman Avestimehr, Antonio Ortega

Self-supervised learning (SSL) has emerged as a desirable paradigm in computer vision due to the inability of supervised models to learn representations that can generalize in domains with limited labels.

Data Augmentation Self-Supervised Learning +1

Neural Networks can Learn Representations with Gradient Descent

no code implementations 30 Jun 2022 Alex Damian, Jason D. Lee, Mahdi Soltanolkotabi

Furthermore, in a transfer learning setup where the data distributions in the source and target domain share the same representation $U$ but have different polynomial heads we show that a popular heuristic for transfer learning has a target sample complexity independent of $d$.

Transfer Learning

Toward a Geometrical Understanding of Self-supervised Contrastive Learning

no code implementations 13 May 2022 Romain Cosentino, Anirvan Sengupta, Salman Avestimehr, Mahdi Soltanolkotabi, Antonio Ortega, Ted Willke, Mariano Tepper

When used for transfer learning, the projector is discarded since empirical results show that its representation generalizes more poorly than the encoder's.

Contrastive Learning Data Augmentation +2

Learning from many trajectories

no code implementations 31 Mar 2022 Stephen Tu, Roy Frostig, Mahdi Soltanolkotabi

Specifically, we establish that the worst-case error rate of this problem is $\Theta(n/(mT))$ whenever $m \gtrsim n$.

Learning Theory

HUMUS-Net: Hybrid unrolled multi-scale network architecture for accelerated MRI reconstruction

2 code implementations 15 Mar 2022 Zalan Fabian, Berk Tınaz, Mahdi Soltanolkotabi

These models split input images into non-overlapping patches, embed the patches into lower-dimensional tokens and utilize a self-attention mechanism that does not suffer from the aforementioned weaknesses of convolutional architectures.

 Ranked #1 on MRI Reconstruction on fastMRI Knee 8x (using extra training data)

Anatomy MRI Reconstruction
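
For readers unfamiliar with this pattern, the snippet below is a generic patch-embedding plus self-attention block of the kind the description refers to; it is not the HUMUS-Net implementation, and all sizes are illustrative.

```python
# Generic "split into non-overlapping patches -> embed into tokens -> self-attention"
# building block (illustrative sketch, not the HUMUS-Net code).
import torch
import torch.nn as nn

class PatchSelfAttentionBlock(nn.Module):
    def __init__(self, in_ch=1, img_size=64, patch=8, dim=128, heads=4):
        super().__init__()
        self.patch_embed = nn.Conv2d(in_ch, dim, kernel_size=patch, stride=patch)  # non-overlapping patches
        num_tokens = (img_size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, num_tokens, dim))                   # learned positional embedding
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                       # x: (B, C, H, W)
        tokens = self.patch_embed(x)            # (B, dim, H/patch, W/patch)
        tokens = tokens.flatten(2).transpose(1, 2) + self.pos   # (B, num_tokens, dim)
        h = self.norm(tokens)
        out, _ = self.attn(h, h, h)             # global self-attention across patch tokens
        return tokens + out                     # residual connection

block = PatchSelfAttentionBlock()
img = torch.randn(2, 1, 64, 64)                 # e.g. a batch of two undersampled image slices
print(block(img).shape)                         # torch.Size([2, 64, 128])
```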

FedCV: A Federated Learning Framework for Diverse Computer Vision Tasks

1 code implementation 22 Nov 2021 Chaoyang He, Alay Dilipbhai Shah, Zhenheng Tang, Di Fan, Adarshan Naiynar Sivashunmugam, Keerti Bhogaraju, Mita Shimpi, Li Shen, Xiaowen Chu, Mahdi Soltanolkotabi, Salman Avestimehr

To bridge the gap and facilitate the development of FL for computer vision tasks, in this work, we propose a federated learning library and benchmarking framework, named FedCV, to evaluate FL on the three most representative computer vision tasks: image classification, image segmentation, and object detection.

Benchmarking Federated Learning +5

SSFL: Tackling Label Deficiency in Federated Learning via Personalized Self-Supervision

no code implementations 6 Oct 2021 Chaoyang He, Zhengyu Yang, Erum Mushtaq, Sunwoo Lee, Mahdi Soltanolkotabi, Salman Avestimehr

In this paper we propose self-supervised federated learning (SSFL), a unified self-supervised and personalized federated learning framework, and a series of algorithms under this framework which work towards addressing these challenges.

Personalized Federated Learning Self-Supervised Learning

SLIM-QN: A Stochastic, Light, Momentumized Quasi-Newton Optimizer for Deep Neural Networks

no code implementations 29 Sep 2021 Yue Niu, Zalan Fabian, Sunwoo Lee, Mahdi Soltanolkotabi, Salman Avestimehr

SLIM-QN addresses two key barriers in existing second-order methods for large-scale DNNs: 1) the high computational cost of obtaining the Hessian matrix and its inverse in every iteration (e.g., KFAC); 2) convergence instability due to stochastic training (e.g., L-BFGS).

Second-order methods

Fundamental Limits of Transfer Learning in Binary Classifications

no code implementations 29 Sep 2021 Mohammadreza Mousavi Kalan, Salman Avestimehr, Mahdi Soltanolkotabi

Transfer learning is gaining traction as a promising technique to alleviate this barrier by utilizing the data of a related but different source task to compensate for the lack of data in a target task, where only a few labeled training examples are available.

Action Recognition Binary Classification +2

Outlier-Robust Sparse Estimation via Non-Convex Optimization

1 code implementation 23 Sep 2021 Yu Cheng, Ilias Diakonikolas, Rong Ge, Shivam Gupta, Daniel M. Kane, Mahdi Soltanolkotabi

We explore the connection between outlier-robust high-dimensional statistics and non-convex optimization in the presence of sparsity constraints, with a focus on the fundamental tasks of robust sparse mean estimation and robust sparse PCA.

Small random initialization is akin to spectral learning: Optimization and generalization guarantees for overparameterized low-rank matrix reconstruction

no code implementations NeurIPS 2021 Dominik Stöger, Mahdi Soltanolkotabi

Recently there has been significant theoretical progress on understanding the convergence and generalization of gradient-based methods on nonconvex losses with overparameterized models.

Generalization Guarantees for Neural Architecture Search with Train-Validation Split

no code implementations 29 Apr 2021 Samet Oymak, Mingchen Li, Mahdi Soltanolkotabi

In this approach, it is common to use bilevel optimization where one optimizes the model weights over the training data (inner problem) and various hyperparameters such as the configuration of the architecture over the validation data (outer problem).

Bilevel Optimization Generalization Bounds +2

FedNLP: Benchmarking Federated Learning Methods for Natural Language Processing Tasks

1 code implementation Findings (NAACL) 2022 Bill Yuchen Lin, Chaoyang He, Zihang Zeng, Hulin Wang, Yufen Huang, Christophe Dupuy, Rahul Gupta, Mahdi Soltanolkotabi, Xiang Ren, Salman Avestimehr

Increasing concerns and regulations about data privacy and sparsity necessitate the study of privacy-preserving, decentralized learning methods for natural language processing (NLP) tasks.

Benchmarking Federated Learning +5

Understanding Overparameterization in Generative Adversarial Networks

no code implementations 12 Apr 2021 Yogesh Balaji, Mohammadmahdi Sajedi, Neha Mukund Kalibhat, Mucong Ding, Dominik Stöger, Mahdi Soltanolkotabi, Soheil Feizi

We also empirically study the role of model overparameterization in GANs using several large-scale experiments on CIFAR-10 and Celeb-A datasets.

PipeTransformer: Automated Elastic Pipelining for Distributed Training of Transformers

1 code implementation 5 Feb 2021 Chaoyang He, Shen Li, Mahdi Soltanolkotabi, Salman Avestimehr

PipeTransformer automatically adjusts the pipelining and data parallelism by identifying and freezing some layers during the training, and instead allocates resources for training of the remaining active layers.

Understanding Over-parameterization in Generative Adversarial Networks

no code implementations ICLR 2021 Yogesh Balaji, Mohammadmahdi Sajedi, Neha Mukund Kalibhat, Mucong Ding, Dominik Stöger, Mahdi Soltanolkotabi, Soheil Feizi

In this work, we present a comprehensive analysis of the importance of model over-parameterization in GANs both theoretically and empirically.

Data augmentation for deep learning based accelerated MRI reconstruction

no code implementations 1 Jan 2021 Zalan Fabian, Reinhard Heckel, Mahdi Soltanolkotabi

Inspired by the success of Data Augmentation (DA) for classification problems, in this paper, we propose a pipeline for data augmentation for image reconstruction tasks arising in medical imaging and explore its effectiveness at reducing the required training data in a variety of settings.

Data Augmentation Image Restoration +1

Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View

no code implementations NeurIPS 2020 Christos Thrampoulidis, Samet Oymak, Mahdi Soltanolkotabi

Our theoretical analysis allows us to precisely characterize how the test error varies over different training algorithms, data distributions, problem dimensions as well as number of classes, inter/intra class correlations and class priors.

Binary Classification Classification +2

Precise Statistical Analysis of Classification Accuracies for Adversarial Training

no code implementations 21 Oct 2020 Adel Javanmard, Mahdi Soltanolkotabi

Despite the wide empirical success of modern machine learning algorithms and models in a multitude of applications, they are known to be highly susceptible to seemingly small indiscernible perturbations to the input data known as adversarial attacks.

Binary Classification Classification +1

Minimax Lower Bounds for Transfer Learning with Linear and One-hidden Layer Neural Networks

2 code implementations NeurIPS 2020 Seyed Mohammadreza Mousavi Kalan, Zalan Fabian, A. Salman Avestimehr, Mahdi Soltanolkotabi

In this approach, a model trained for a source task, where plenty of labeled training data is available, is used as a starting point for training a model on a related target task with only a few labeled training examples.

Transfer Learning

Learning the model-free linear quadratic regulator via random search

no code implementations L4DC 2020 Hesameddin Mohammadi, Mihailo R. Jovanović, Mahdi Soltanolkotabi

Model-free reinforcement learning attempts to find an optimal control action for an unknown dynamical system by directly searching over the parameter space of controllers.

Reinforcement Learning (RL)

Approximation Schemes for ReLU Regression

no code implementations 26 May 2020 Ilias Diakonikolas, Surbhi Goel, Sushrut Karmalkar, Adam R. Klivans, Mahdi Soltanolkotabi

We consider the fundamental problem of ReLU regression, where the goal is to output the best fitting ReLU with respect to square loss given access to draws from some unknown distribution.

regression

Compressive sensing with un-trained neural networks: Gradient descent finds the smoothest approximation

1 code implementation 7 May 2020 Reinhard Heckel, Mahdi Soltanolkotabi

For signal recovery from a few measurements, however, un-trained convolutional networks have an intriguing self-regularizing property: Even though the network can perfectly fit any image, the network recovers a natural image from few measurements when trained with gradient descent until convergence.

Compressive Sensing Denoising

High-Dimensional Robust Mean Estimation via Gradient Descent

no code implementations ICML 2020 Yu Cheng, Ilias Diakonikolas, Rong Ge, Mahdi Soltanolkotabi

We study the problem of high-dimensional robust mean estimation in the presence of a constant fraction of adversarial outliers.

LEMMA Vocal Bursts Intensity Prediction

Precise Tradeoffs in Adversarial Training for Linear Regression

no code implementations 24 Feb 2020 Adel Javanmard, Mahdi Soltanolkotabi, Hamed Hassani

Furthermore, we precisely characterize the standard/robust accuracy and the corresponding tradeoff achieved by a contemporary mini-max adversarial training approach in a high-dimensional regime where the number of data points and the parameters of the model grow in proportion to each other.

regression
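
As a simplified illustration of such a min-max objective, for a linear model with $\ell_2$-bounded input perturbations the inner maximization has the closed form $\max_{\|\delta\| \le \epsilon} (\langle x + \delta, \theta \rangle - y)^2 = (|\langle x, \theta \rangle - y| + \epsilon \|\theta\|)^2$, so adversarial training reduces to minimizing the resulting robust loss. The sketch below uses this closed form on synthetic Gaussian data; it illustrates the kind of objective studied rather than the paper's exact regime.

```python
# Adversarial training for linear regression with l2-bounded perturbations
# (illustrative sketch). The inner max is available in closed form:
#   max_{||d|| <= eps} (<x + d, theta> - y)^2 = (|<x, theta> - y| + eps * ||theta||)^2
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 200, 50, 0.5
theta_star = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))
y = X @ theta_star + 0.1 * rng.standard_normal(n)

theta = np.zeros(d)
eta = 0.05
for _ in range(2000):
    r = X @ theta - y
    margin = np.abs(r) + eps * np.linalg.norm(theta)          # worst-case residual per sample
    direction = theta / max(np.linalg.norm(theta), 1e-12)
    # (sub)gradient of mean(margin^2) with respect to theta
    grad = 2 * np.mean(margin[:, None] * (np.sign(r)[:, None] * X + eps * direction[None, :]), axis=0)
    theta -= eta * grad

standard_risk = np.mean((X @ theta - y) ** 2)                 # error on clean inputs
robust_risk = np.mean((np.abs(X @ theta - y) + eps * np.linalg.norm(theta)) ** 2)
print(standard_risk, robust_risk)
```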

Convergence and sample complexity of gradient methods for the model-free linear quadratic regulator problem

no code implementations 26 Dec 2019 Hesameddin Mohammadi, Armin Zare, Mahdi Soltanolkotabi, Mihailo R. Jovanović

Model-free reinforcement learning attempts to find an optimal control action for an unknown dynamical system by directly searching over the parameter space of controllers.

Denoising and Regularization via Exploiting the Structural Bias of Convolutional Generators

1 code implementation ICLR 2020 Reinhard Heckel, Mahdi Soltanolkotabi

A surprising experiment that highlights this architectural bias towards natural images is that one can remove noise and corruptions from a natural image without using any training data, by simply fitting (via gradient descent) a randomly initialized, over-parameterized convolutional generator to the corrupted image.

Attribute Denoising +1

Generalization Guarantees for Neural Nets via Harnessing the Low-Rankness of Jacobian

no code implementations 25 Sep 2019 Samet Oymak, Zalan Fabian, Mingchen Li, Mahdi Soltanolkotabi

We show that over the information space learning is fast and one can quickly train a model with zero training loss that can also generalize well.

Generalization Guarantees for Neural Networks via Harnessing the Low-rank Structure of the Jacobian

no code implementations 12 Jun 2019 Samet Oymak, Zalan Fabian, Mingchen Li, Mahdi Soltanolkotabi

We show that over the information space learning is fast and one can quickly train a model with zero training loss that can also generalize well.

Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks

1 code implementation 27 Mar 2019 Mingchen Li, Mahdi Soltanolkotabi, Samet Oymak

In particular, we prove that: (i) in the first few iterations, where the updates are still in the vicinity of the initialization, gradient descent only fits the correct labels, essentially ignoring the noisy ones.
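
A toy experiment (not the paper's) makes the claimed behavior easy to see: train an overparameterized network on labels with a fraction flipped and track accuracy against the clean labels; it typically peaks in early iterations, before the noisy labels are memorized. All sizes and rates below are arbitrary.

```python
# Toy illustration of early stopping under label noise: the network is trained
# only on noisy labels, and we monitor accuracy against the (hidden) clean labels.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d, hidden = 400, 20, 2000                       # hidden >> n: overparameterized
X = torch.randn(n, d)
w_true = torch.randn(d)
y_clean = (X @ w_true > 0).float()
y_noisy = y_clean.clone()
flip = torch.rand(n) < 0.2                         # flip 20% of the labels
y_noisy[flip] = 1.0 - y_noisy[flip]

net = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.05)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(1, 4001):
    opt.zero_grad()
    logits = net(X).squeeze(1)
    loss_fn(logits, y_noisy).backward()            # training never sees the clean labels
    opt.step()
    if step % 500 == 0:
        clean_acc = ((logits > 0).float() == y_clean).float().mean().item()
        print(step, round(clean_acc, 3))           # often highest early, then degrades as noise is fit
```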

Towards moderate overparameterization: global convergence guarantees for training shallow neural networks

no code implementations 12 Feb 2019 Samet Oymak, Mahdi Soltanolkotabi

However, in practice much more moderate levels of overparameterization seem to be sufficient, and in many cases overparameterized models perfectly interpolate the training data as soon as the number of parameters exceeds the size of the training data by a constant factor.

Fitting ReLUs via SGD and Quantized SGD

no code implementations 19 Jan 2019 Seyed Mohammadreza Mousavi Kalan, Mahdi Soltanolkotabi, A. Salman Avestimehr

Perhaps unexpectedly, we show that QSGD maintains the fast convergence of SGD to a globally optimal model while significantly reducing the communication cost.

Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path?

no code implementations 25 Dec 2018 Samet Oymak, Mahdi Soltanolkotabi

In this paper we demonstrate that when the loss has certain properties over a minimally small neighborhood of the initial point, first order methods such as (stochastic) gradient descent have a few intriguing properties: (1) the iterates converge at a geometric rate to a global optima even when the loss is nonconvex, (2) among all global optima of the loss the iterates converge to one with a near minimal distance to the initial point, (3) the iterates take a near direct route from the initial point to this global optima.

Compressed Sensing with Deep Image Prior and Learned Regularization

1 code implementation 17 Jun 2018 Dave Van Veen, Ajil Jalal, Mahdi Soltanolkotabi, Eric Price, Sriram Vishwanath, Alexandros G. Dimakis

We propose a novel method for compressed sensing recovery using untrained deep generative models.

Lagrange Coded Computing: Optimal Design for Resiliency, Security and Privacy

no code implementations 4 Jun 2018 Qian Yu, Songze Li, Netanel Raviv, Seyed Mohammadreza Mousavi Kalan, Mahdi Soltanolkotabi, Salman Avestimehr

We consider a scenario involving computations over a massive dataset stored distributedly across multiple workers, which is at the core of distributed learning algorithms.

Polynomially Coded Regression: Optimal Straggler Mitigation via Data Encoding

no code implementations 24 May 2018 Songze Li, Seyed Mohammadreza Mousavi Kalan, Qian Yu, Mahdi Soltanolkotabi, A. Salman Avestimehr

In particular, PCR requires a recovery threshold that scales inversely proportionally with the amount of computation/storage available at each worker.

regression

End-to-end Learning of a Convolutional Neural Network via Deep Tensor Decomposition

no code implementations 16 May 2018 Samet Oymak, Mahdi Soltanolkotabi

In this paper we study the problem of learning the weights of a deep convolutional neural network.

Tensor Decomposition

Fundamental Resource Trade-offs for Encoded Distributed Optimization

no code implementations 31 Mar 2018 A. Salman Avestimehr, Seyed Mohammadreza Mousavi Kalan, Mahdi Soltanolkotabi

We also analyze the convergence behavior of iterative encoded optimization algorithms, allowing us to characterize fundamental trade-offs between convergence rate, size of data set, accuracy, computational load (or data redundancy), and straggler toleration in this framework.

Distributed Computing Distributed Optimization

Gradient Methods for Submodular Maximization

no code implementations NeurIPS 2017 Hamed Hassani, Mahdi Soltanolkotabi, Amin Karbasi

Despite the apparent lack of convexity in such functions, we prove that stochastic projected gradient methods can provide strong approximation guarantees for maximizing continuous submodular functions with convex constraints.

Active Learning
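
The algorithmic template behind these guarantees is simple; the sketch below runs stochastic projected gradient ascent on a toy monotone DR-submodular (soft-coverage) objective over a box with a budget constraint. The objective, constraint set, and step size are illustrative choices, not from the paper.

```python
# Stochastic projected gradient ascent for continuous (DR-)submodular maximization
# over a convex set -- here a toy soft-coverage objective on {x in [0,1]^n : sum(x) <= B}.
import numpy as np

rng = np.random.default_rng(0)
n, B = 10, 3.0
W = rng.random((5, n))                        # weights of the toy coverage objective

def grad_F(x):
    # F(x) = sum_j [1 - prod_i (1 - W[j,i] * x[i])] is monotone DR-submodular on [0,1]^n
    prods = np.prod(1.0 - W * x, axis=1, keepdims=True)
    return np.sum(W * prods / np.maximum(1.0 - W * x, 1e-12), axis=0)

def project(y):
    # Euclidean projection onto {x in [0,1]^n : sum(x) <= B}, by bisection on the dual variable
    x = np.clip(y, 0.0, 1.0)
    if x.sum() <= B:
        return x
    lo, hi = 0.0, y.max()
    for _ in range(50):
        lam = 0.5 * (lo + hi)
        lo, hi = (lam, hi) if np.clip(y - lam, 0.0, 1.0).sum() > B else (lo, lam)
    return np.clip(y - hi, 0.0, 1.0)

x, eta = np.zeros(n), 0.1
for _ in range(200):
    g = grad_F(x) + 0.01 * rng.standard_normal(n)   # noisy (stochastic) gradient estimate
    x = project(x + eta * g)                        # projected ascent step
print(x.round(2), x.sum())
```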

Theoretical insights into the optimization landscape of over-parameterized shallow neural networks

no code implementations 16 Jul 2017 Mahdi Soltanolkotabi, Adel Javanmard, Jason D. Lee

In this paper we study the problem of learning a shallow artificial neural network that best fits a training data set.

Learning ReLUs via Gradient Descent

no code implementations NeurIPS 2017 Mahdi Soltanolkotabi

In this paper we study the problem of learning Rectified Linear Units (ReLUs), which are functions of the form $\max(0, \langle w, x \rangle)$ with $w$ denoting the weight vector.
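
A minimal numpy sketch of this learning problem (illustrative setup, not the paper's analysis): generate labels from a planted ReLU and run gradient descent on the empirical square loss from a small random initialization.

```python
# Fitting a ReLU max(0, <w, x>) via gradient descent on the square loss (sketch).
import numpy as np

rng = np.random.default_rng(1)
d, n = 10, 500
w_star = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = np.maximum(X @ w_star, 0.0)                    # labels from the planted ReLU

w = 1e-3 * rng.standard_normal(d)                  # small random initialization
eta = 0.5
for _ in range(500):
    z = X @ w
    residual = np.maximum(z, 0.0) - y
    grad = (X * ((z > 0) * residual)[:, None]).mean(axis=0)   # grad of 0.5 * mean squared error
    w -= eta * grad
print(np.linalg.norm(w - w_star))                  # distance to the planted weight vector
```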

Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization

no code implementations 20 Feb 2017 Mahdi Soltanolkotabi

We focus on the under-determined setting where the number of measurements is significantly smaller than the dimension of the signal ($m \ll n$).

Image Reconstruction

Fast and Reliable Parameter Estimation from Nonlinear Observations

no code implementations 23 Oct 2016 Samet Oymak, Mahdi Soltanolkotabi

In this paper we study the problem of recovering a structured but unknown parameter $\theta^*$ from $n$ nonlinear observations of the form $y_i = f(\langle x_i, \theta^* \rangle)$ for $i = 1, 2, \ldots, n$.

Sharp Time--Data Tradeoffs for Linear Inverse Problems

no code implementations 16 Jul 2015 Samet Oymak, Benjamin Recht, Mahdi Soltanolkotabi

We sharply characterize the convergence rate associated with a wide variety of random measurement ensembles in terms of the number of measurements and structural complexity of the signal with respect to the chosen penalty function.

Isometric sketching of any set via the Restricted Isometry Property

no code implementations 11 Jun 2015 Samet Oymak, Benjamin Recht, Mahdi Soltanolkotabi

In this paper we show that for the purposes of dimensionality reduction certain class of structured random matrices behave similarly to random Gaussian matrices.

Dimensionality Reduction

Approximate Subspace-Sparse Recovery with Corrupted Data via Constrained $\ell_1$-Minimization

no code implementations 23 Dec 2014 Ehsan Elhamifar, Mahdi Soltanolkotabi, Shankar Sastry

High-dimensional data often lie in low-dimensional subspaces corresponding to different classes they belong to.

Clustering

Robust subspace clustering

no code implementations 11 Jan 2013 Mahdi Soltanolkotabi, Ehsan Elhamifar, Emmanuel J. Candès

Subspace clustering refers to the task of finding a multi-subspace representation that best fits a collection of points taken from a high-dimensional space.

Clustering
