no code implementations • 10 Jul 2012 • Mehrdad J. Gangeh, Ali Ghodsi, Mohamed S. Kamel
In this paper, we propose supervised dictionary learning (SDL) by incorporating information on class labels into the learning of the dictionary.
no code implementations • 12 Jul 2012 • Mehrdad J. Gangeh, Ali Ghodsi, Mohamed S. Kamel
To this end, by design, it solely uses P-frame coding to find the (dis)similarity among patches/images.
no code implementations • 1 Feb 2013 • Peter Bailis, Aaron Davidson, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica
To minimize network latency and remain online during server failures and network partitions, many modern distributed data storage systems eschew transactional functionality, which provides strong semantic guarantees for groups of multiple operations over multiple data items.
no code implementations • 24 Dec 2013 • Ahmed K. Farahat, Ali Ghodsi, Mohamed S. Kamel
This paper defines a generalized column subset selection problem which is concerned with the selection of a few columns from a source matrix A that best approximate the span of a target matrix B.
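The selection criterion amounts to minimizing the residual ||B - P_S B||_F^2, where P_S is the orthogonal projection onto the chosen columns of A. A minimal greedy sketch of this criterion in Python (an illustrative baseline; the function name and the greedy strategy are assumptions, not the paper's algorithm):

```python
import numpy as np

def greedy_gcss(A, B, k):
    """Greedily pick k columns of A whose span best approximates the
    span of B, minimizing ||B - P_S B||_F^2 where P_S projects onto
    the selected columns (illustrative baseline, not the paper's
    recursive algorithm)."""
    selected = []
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in range(A.shape[1]):
            if j in selected:
                continue
            S = A[:, selected + [j]]
            # Project B onto span(S) via least squares, measure the residual
            resid = B - S @ np.linalg.lstsq(S, B, rcond=None)[0]
            err = np.linalg.norm(resid) ** 2
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
    return selected
```

Setting B = A recovers the standard column subset selection problem as a special case.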
no code implementations • 24 Dec 2013 • Ahmed K. Farahat, Ahmed Elgohary, Ali Ghodsi, Mohamed S. Kamel
The algorithm first learns a concise representation of all columns using random projection; each machine then solves a generalized column subset selection problem, selecting from its local sub-matrix the subset of columns that minimizes the reconstruction error of the concise representation.
2 code implementations • 12 Sep 2014 • Daniel Crankshaw, Peter Bailis, Joseph E. Gonzalez, Haoyuan Li, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, Michael I. Jordan
In this work, we present Velox, a new component of the Berkeley Data Analytics Stack.
no code implementations • 20 Feb 2015 • Mehrdad J. Gangeh, Ahmed K. Farahat, Ali Ghodsi, Mohamed S. Kamel
This review provides a broad, yet deep, view of the state-of-the-art methods for S-DLSR and allows for the advancement of research and development in this emerging area of research.
no code implementations • 6 Mar 2015 • Mehrdad J. Gangeh, Ali Ghodsi
In this paper, it is proved that dictionary learning and sparse representation are invariant to a linear transformation.
no code implementations • 25 Apr 2016 • Mehrdad J. Gangeh, Safaa M. A. Bedawi, Ali Ghodsi, Fakhri Karray
The proposed method benefits from the supervisory information by learning the dictionary in a space where the dependency between the data and class labels is maximized.
no code implementations • 10 May 2016 • Ershad Banijamali, Ali Ghodsi
Then, we map the data to lower-dimensional space using a linear transformation such that the dependency between the transformed data and the assigned labels is maximized.
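That dependency is typically quantified with the Hilbert-Schmidt Independence Criterion (HSIC). A minimal sketch of an HSIC-maximizing linear projection in the spirit of supervised PCA (the function name and the delta-kernel choice for labels are assumptions, not necessarily the paper's exact formulation):

```python
import numpy as np

def hsic_projection(X, L, d):
    """Find a d-dimensional linear map U maximizing the empirical HSIC
    tr(U^T X H L H X^T U) between projected data and a label kernel L.
    X is (features, samples); L is (samples, samples) PSD, e.g.
    L[i, j] = 1 if samples i and j share a label, else 0."""
    n = X.shape[1]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    Q = X @ H @ L @ H @ X.T               # symmetric objective matrix
    vals, vecs = np.linalg.eigh(Q)        # ascending eigenvalues
    return vecs[:, np.argsort(vals)[::-1][:d]]  # top-d eigenvectors; embed as U.T @ X
```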
no code implementations • 10 Feb 2017 • Ershad Banijamali, Ali Ghodsi, Pascal Poupart
The model consists of K networks that are trained together to learn the underlying distribution of a given data set.
no code implementations • 7 Apr 2017 • Ershad Banijamali, Ali Ghodsi
Spectral clustering is a powerful clustering algorithm, but it suffers from high computational complexity due to its eigendecomposition step.
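For context, a minimal sketch of standard spectral clustering, where the eigendecomposition of the normalized Laplacian is the expensive step (assumes scipy >= 1.5; this is the baseline, not the paper's accelerated method):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(W, k):
    """Cluster n points from an n x n affinity matrix W. The O(n^3)
    eigendecomposition of the normalized Laplacian dominates the cost."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(d)) - D_inv_sqrt @ W @ D_inv_sqrt
    # k smallest eigenvectors of the normalized Laplacian
    _, V = eigh(L_sym, subset_by_index=[0, k - 1])
    V /= np.linalg.norm(V, axis=1, keepdims=True) + 1e-12  # row-normalize
    return KMeans(n_clusters=k, n_init=10).fit_predict(V)
```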
no code implementations • 25 Aug 2017 • Shima Kamyab, Ali Ghodsi, S. Zohreh Azimifar
Inverse rendering in 3D refers to recovering the 3D properties of a scene given 2D input image(s), and is typically done using 3D Morphable Model (3DMM) based methods on single-view images.
no code implementations • 15 Oct 2017 • Ershad Banijamali, Rui Shu, Mohammad Ghavamzadeh, Hung Bui, Ali Ghodsi
We also propose a principled variational approximation of the embedding posterior that takes the future observation into account, and thus, makes the variational approximation more robust against the noise.
1 code implementation • 23 Nov 2017 • Seyed Mahdi Rezaeinia, Ali Ghodsi, Rouhollah Rahmani
In this paper we propose a novel method, Improved Word Vectors (IWV), which increases the accuracy of pre-trained word embeddings in sentiment analysis.
no code implementations • 24 Nov 2017 • Ershad Banijamali, Amir-Hossein Karimi, Alexander Wong, Ali Ghodsi
The problem of feature disentanglement has been explored in the literature, for the purpose of image and video processing and text analysis.
no code implementations • 24 Nov 2017 • Ershad Banijamali, Ahmad Khajenezhad, Ali Ghodsi, Mohammad Ghavamzadeh
In this paper, we study the problem of learning a controllable representation for high-dimensional observations of dynamical systems.
no code implementations • 15 Dec 2017 • Ion Stoica, Dawn Song, Raluca Ada Popa, David Patterson, Michael W. Mahoney, Randy Katz, Anthony D. Joseph, Michael Jordan, Joseph M. Hellerstein, Joseph E. Gonzalez, Ken Goldberg, Ali Ghodsi, David Culler, Pieter Abbeel
With the increasing commoditization of computer vision, speech recognition and machine translation systems and the widespread deployment of learning-based back-end technologies such as digital advertising and intelligent infrastructures, AI (Artificial Intelligence) has moved from research labs to production.
no code implementations • 23 Jul 2018 • Seyed Mahdi Rezaeinia, Ali Ghodsi, Rouhollah Rahmani
In the Text Classification areas of Sentiment Analysis, Subjectivity/Objectivity Analysis, and Opinion Polarity, Convolutional Neural Networks have gained special attention because of their performance and accuracy.
1 code implementation • 7 Nov 2018 • Amir-Hossein Karimi, Alexander Wong, Ali Ghodsi
While stochastic approximation strategies have been explored for unsupervised dimensionality reduction to tackle this challenge, such approaches are not well-suited for accelerating computational speed for supervised dimensionality reduction.
no code implementations • 18 Dec 2018 • Ershad Banijamali, Amir-Hossein Karimi, Ali Ghodsi
We consider the problem of sufficient dimensionality reduction (SDR), where the high-dimensional observation is transformed to a low-dimensional sub-space in which the information of the observations regarding the label variable is preserved.
1 code implementation • Nature Methods 2018 • Ngoc Hieu Tran, Rui Qiao, Lei Xin, Xin Chen, Chuyi Liu, Xianglilan Zhang, Baozhen Shan, Ali Ghodsi, Ming Li
We present DeepNovo-DIA, a de novo peptide-sequencing method for data-independent acquisition (DIA) mass spectrometry data.
1 code implementation • 17 Apr 2019 • Rui Qiao, Ngoc Hieu Tran, Lei Xin, Baozhen Shan, Ming Li, Ali Ghodsi
Personalized cancer vaccines are envisioned as the next generation of rational cancer immunotherapy.
no code implementations • 30 Jun 2020 • Aref Jafari, Ali Ghodsi
This has been accomplished by defining an embedding method for the position of all members of a coreference cluster in a document and resolving all of them for a given mention.
1 code implementation • 17 Sep 2020 • Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley
Then, Sammon mapping, Isomap, and kernel Isomap are explained.
1 code implementation • 22 Sep 2020 • Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley
Stochastic Neighbor Embedding (SNE) is a manifold learning and dimensionality reduction method with a probabilistic approach.
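As a reminder of that probabilistic view, here is a minimal sketch of SNE's input-space neighborhood probabilities with a fixed bandwidth (real SNE calibrates a per-point sigma to a target perplexity, omitted here for brevity):

```python
import numpy as np

def sne_input_probabilities(X, sigma=1.0):
    """p_{j|i} proportional to exp(-||x_i - x_j||^2 / (2 sigma^2)) for
    j != i; SNE fits an embedding whose analogous probabilities match
    these under a KL-divergence cost."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    logits = -sq_dists / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)  # exclude j == i
    P = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    return P / P.sum(axis=1, keepdims=True)
```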
no code implementations • 4 Nov 2020 • Maysum Panju, Ali Ghodsi
When neural networks are used to solve differential equations, they usually produce solutions in the form of black-box functions that are not directly mathematically interpretable.
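A minimal sketch of that black-box approach (the baseline these papers build on, not their symbolic method): train a small network u(x) so the equation's residual vanishes at collocation points, shown for u' = u with u(0) = 1 in PyTorch:

```python
import torch

# Trial solution u(x) represented by a small neural network
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x = torch.linspace(0, 1, 64).reshape(-1, 1).requires_grad_(True)
for _ in range(2000):
    u = net(x)
    # du/dx at the collocation points via autograd
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    residual = (du - u).pow(2).mean()                        # enforce u' = u
    boundary = (net(torch.zeros(1, 1)) - 1.0).pow(2).mean()  # enforce u(0) = 1
    loss = residual + boundary
    opt.zero_grad()
    loss.backward()
    opt.step()
# net now approximates exp(x) on [0, 1], but only as a black-box function
```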
no code implementations • 12 Nov 2020 • Maysum Panju, Kourosh Parand, Ali Ghodsi
We describe a neural-based method for generating exact or approximate solutions to differential equations in the form of mathematical expressions.
no code implementations • 17 Nov 2020 • Benyamin Ghojogh, Ali Ghodsi
Thereafter, we introduce the Bidirectional Encoder Representations from Transformers (BERT) and the Generative Pre-trained Transformer (GPT) as stacks of the transformer's encoders and decoders, respectively.
1 code implementation • 22 Nov 2020 • Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley
In this paper, we first cover LLE, kernel LLE, inverse LLE, and feature fusion with LLE.
no code implementations • 1 Jan 2021 • Kourosh Parand, Zeinab Hajimohammadi, Ali Ghodsi
In particular, Volterra–Fredholm–Hammerstein integral equations are the main class of such integral equations, and solving them has attracted considerable research interest.
no code implementations • 4 Jan 2021 • Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley
Finally, VAE is explained where the encoder, decoder and sampling from the latent space are introduced.
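For reference, a minimal sketch of that sampling step via the standard reparameterization trick (generic VAE machinery, not specific to this survey's notation):

```python
import torch

def vae_sample(mu, log_var):
    """Reparameterization trick: draw z ~ N(mu, sigma^2) as
    z = mu + sigma * eps with eps ~ N(0, I), keeping the sampling
    step differentiable with respect to the encoder outputs."""
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * log_var) * eps
```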
2 code implementations • Nature Machine Intelligence 2021 • Rui Qiao, Ngoc Hieu Tran, Lei Xin, Xin Chen, Ming Li, Baozhen Shan, Ali Ghodsi
De novo peptide sequencing is the key technology for finding novel peptides from mass spectra.
1 code implementation • 4 Apr 2021 • Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley
In this work, we propose two novel generative versions of LLE, named Generative LLE (GLLE), whose linear reconstruction steps are stochastic rather than deterministic.
1 code implementation • EACL 2021 • Aref Jafari, Mehdi Rezagholizadeh, Pranav Sharma, Ali Ghodsi
Knowledge distillation (KD) is a prominent model compression technique for deep neural networks in which the knowledge of a trained large teacher model is transferred to a smaller student model.
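The standard KD objective blends a hard-label loss with a temperature-softened match to the teacher's output distribution; a minimal PyTorch sketch (the values of T and alpha are illustrative):

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target distillation: KL divergence between temperature-scaled
    teacher and student distributions, mixed with cross-entropy on the
    true labels; T^2 rescales the soft-loss gradients."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```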
1 code implementation • Findings (ACL) 2021 • Ehsan Kamalloo, Mehdi Rezagholizadeh, Peyman Passban, Ali Ghodsi
We exploit a semi-supervised approach based on KD to train a model on augmented data.
no code implementations • 3 Jun 2021 • Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley
Versions of graph embedding are then explained which are generalized versions of Laplacian eigenmap and locality preserving projection.
no code implementations • 15 Jun 2021 • Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley
We start with reviewing the history of kernels in functional analysis and machine learning.
2 code implementations • 27 Jun 2021 • Mojtaba Valipour, Bowen You, Maysum Panju, Ali Ghodsi
Symbolic regression is the task of identifying a mathematical expression that best fits a provided dataset of input and output values.
no code implementations • 27 Jun 2021 • Zeinab Hajimohammadi, Kourosh Parand, Ali Ghodsi
In this paper, we propose Legendre Deep Neural Network (LDNN) for solving nonlinear Volterra Fredholm Hammerstein integral equations (VFHIEs).
no code implementations • 29 Jun 2021 • Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley
This is a tutorial and survey paper on the unification of spectral dimensionality reduction methods, kernel learning by Semidefinite Programming (SDP), Maximum Variance Unfolding (MVU) or Semidefinite Embedding (SDE), and its variants.
1 code implementation • Software Impacts 2021 • Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley
One can unfold the nonlinear manifold of a dataset for low-dimensional visualization and feature extraction.
no code implementations • 26 Jul 2021 • Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley
Then, we introduce the structures of BM and RBM.
no code implementations • 9 Aug 2021 • Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley
This is a tutorial and survey paper on the Johnson-Lindenstrauss (JL) lemma and linear and nonlinear random projections.
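A minimal sketch of a linear Gaussian random projection, whose pairwise-distance guarantee is exactly what the JL lemma supplies (the constant in the target dimension is one common choice, not necessarily the survey's):

```python
import numpy as np

def random_projection(X, eps=0.1, seed=0):
    """Project n points in R^d down to k = O(log(n) / eps^2) dimensions
    with a Gaussian matrix; by the JL lemma, all pairwise distances are
    preserved within a (1 +/- eps) factor with high probability."""
    n, d = X.shape
    k = int(np.ceil(4 * np.log(n) / eps ** 2))
    rng = np.random.default_rng(seed)
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))
    return X @ R
```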
no code implementations • 25 Aug 2021 • Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley
We start with UMAP algorithm where we explain probabilities of neighborhood in the input and embedding spaces, optimization of cost function, training algorithm, derivation of gradients, and supervised and semi-supervised embedding by UMAP.
no code implementations • 13 Sep 2021 • Marzieh S. Tahaei, Ella Charlaix, Vahid Partovi Nia, Ali Ghodsi, Mehdi Rezagholizadeh
We present our KroneckerBERT, a compressed version of the BERT_BASE model obtained using this framework.
1 code implementation • 13 Sep 2021 • Tianda Li, Ahmad Rashid, Aref Jafari, Pranav Sharma, Ali Ghodsi, Mehdi Rezagholizadeh
Knowledge Distillation (KD) is a model compression algorithm that helps transfer the knowledge of a large neural network into a smaller one.
no code implementations • WNUT (ACL) 2021 • Shivendra Bhardwaj, Abbas Ghaddar, Ahmad Rashid, Khalil Bibi, Chengyang Li, Ali Ghodsi, Philippe Langlais, Mehdi Rezagholizadeh
Knowledge Distillation (KD) is extensively used to compress and deploy large pre-trained language models on edge devices for real-world applications.
no code implementations • 5 Oct 2021 • Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley
Then, we explain second-order methods including Newton's method for unconstrained, equality constrained, and inequality constrained problems....
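As an illustration of the unconstrained case, a minimal Newton's-method sketch (illustrative only; the equality- and inequality-constrained variants covered in the paper additionally require KKT systems and interior-point machinery):

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-8, max_iter=50):
    """Unconstrained Newton's method: x <- x - H(x)^{-1} g(x), where
    grad(x) returns the gradient and hess(x) the Hessian."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - np.linalg.solve(hess(x), g)  # Newton step
    return x
```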
no code implementations • COLING 2022 • Mehdi Rezagholizadeh, Aref Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, Ali Ghodsi
A case in point is that the best performing checkpoint of the teacher might not necessarily be the best teacher for training the student in KD.
no code implementations • 18 Oct 2021 • Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley
Finally, we explain Kernel Dimension Reduction (KDR) both for supervised and unsupervised learning.
no code implementations • 26 Nov 2021 • Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley
Finally, we explain the autoencoders based on adversarial learning including adversarial autoencoder, PixelGAN, and implicit autoencoder.
no code implementations • 23 Jan 2022 • Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley
In deep learning methods, we first introduce reconstruction autoencoders and supervised loss functions for metric learning.
1 code implementation • Findings (ACL) 2022 • Ehsan Kamalloo, Mehdi Rezagholizadeh, Ali Ghodsi
From a pre-generated pool of augmented samples, Glitter adaptively selects a subset of worst-case samples with maximal loss, analogous to adversarial DA.
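A minimal sketch of that selection rule (a hypothetical helper, not the released Glitter implementation): score every augmented candidate by its loss under the current model and keep the top-k:

```python
import torch
import torch.nn.functional as F

def select_worst_case(model, aug_inputs, aug_labels, k):
    """From a pool of augmented samples, keep the k with maximal loss
    under the current model, analogous to adversarial data augmentation
    restricted to a finite candidate pool."""
    model.eval()
    with torch.no_grad():
        losses = F.cross_entropy(model(aug_inputs), aug_labels,
                                 reduction="none")
    top = torch.topk(losses, k).indices
    return aug_inputs[top], aug_labels[top]
```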
no code implementations • 25 Mar 2022 • Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley
Locally Linear Embedding (LLE) is a nonlinear spectral dimensionality reduction and manifold learning method.
no code implementations • 25 May 2022 • Ivan Kobyzev, Aref Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Pascal Poupart, Ali Ghodsi
Knowledge Distillation (KD) is a prominent neural model compression technique that heavily relies on teacher network predictions to guide the training of a student model.
2 code implementations • 14 Oct 2022 • Mojtaba Valipour, Mehdi Rezagholizadeh, Ivan Kobyzev, Ali Ghodsi
Our DyLoRA method trains LoRA blocks for a range of ranks instead of a single rank by sorting the representation learned by the adapter module at different ranks during training.
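A simplified sketch of the rank-sampling idea (illustrative; the released DyLoRA code is the authoritative implementation): keep full LoRA factors but truncate both to a sampled rank b at each training step, so every leading prefix of the factors works as a stand-alone adapter:

```python
import torch

class DyLoRALinear(torch.nn.Module):
    """Frozen weight W plus LoRA factors A (r x in) and B (out x r);
    each training forward pass truncates both factors to a sampled
    rank b, so any leading b rows/columns form a usable rank-b
    adapter at inference time."""
    def __init__(self, in_dim, out_dim, max_rank=8, alpha=16.0):
        super().__init__()
        # In practice self.weight would be the pretrained (frozen) matrix
        self.weight = torch.nn.Parameter(
            torch.randn(out_dim, in_dim), requires_grad=False)
        self.A = torch.nn.Parameter(torch.randn(max_rank, in_dim) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(out_dim, max_rank))
        self.max_rank, self.alpha = max_rank, alpha

    def forward(self, x):
        b = (torch.randint(1, self.max_rank + 1, (1,)).item()
             if self.training else self.max_rank)
        delta = self.B[:, :b] @ self.A[:b, :]  # rank-b low-rank update
        return x @ (self.weight + self.alpha / b * delta).T
```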
no code implementations • 12 Dec 2022 • Aref Jafari, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart, Ali Ghodsi
Knowledge Distillation (KD) has been extensively used for natural language understanding (NLU) tasks to improve a small model's (a student) generalization by transferring the knowledge from a larger model (a teacher).
no code implementations • 12 Dec 2022 • Peng Lu, Ivan Kobyzev, Mehdi Rezagholizadeh, Ahmad Rashid, Ali Ghodsi, Philippe Langlais
Moreover, we observe that this simple optimization technique is able to outperform the state-of-the-art KD methods for compact models.
no code implementations • 27 Jan 2023 • Aref Jafari, Mehdi Rezagholizadeh, Ali Ghodsi
Augmenting the training set with this auxiliary data improves the performance of KD significantly and leads to a closer match between the student and the teacher.
no code implementations • 22 Apr 2023 • Benyamin Ghojogh, Ali Ghodsi
Then, we introduce LSTM gates and cells, history and variants of LSTM, and Gated Recurrent Units (GRU).
no code implementations • 1 Sep 2023 • Mojtaba Valipour, Mehdi Rezagholizadeh, Hossein Rajabzadeh, Parsa Kavehzadeh, Marzieh Tahaei, Boxing Chen, Ali Ghodsi
Deep neural networks (DNNs) must cater to a variety of users with different performance needs and budgets, leading to the costly practice of training, storing, and maintaining numerous specific models.
no code implementations • 16 Sep 2023 • Parsa Kavehzadeh, Mojtaba Valipour, Marzieh Tahaei, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh
We extend SortedNet to generative NLP tasks, making large language models dynamic without any Pre-Training and by only replacing Standard Fine-Tuning (SFT) with Sorted Fine-Tuning (SoFT).
no code implementations • 14 Feb 2024 • Ali Saheb Pasand, Reza Moravej, Mahdi Biparva, Ali Ghodsi
A common phenomenon confining the representation quality in Self-Supervised Learning (SSL) is dimensional collapse (also known as rank degeneration), where the learned representations are mapped to a low-dimensional subspace of the representation space.
no code implementations • 14 Feb 2024 • Ali Saheb Pasand, Reza Moravej, Mahdi Biparva, Raika Karimi, Ali Ghodsi
Our experiments demonstrate that the cost associated with the loss computation can be reduced via node or dimension sampling without lowering the downstream performance.
no code implementations • 16 Feb 2024 • Hossein Rajabzadeh, Mojtaba Valipour, Tianshu Zhu, Marzieh Tahaei, Hyock Ju Kwon, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh
Fine-tuning large language models requires huge GPU memory, restricting the choice of larger models.
no code implementations • 28 Feb 2024 • Mahdi Karami, Ali Ghodsi
In the rapidly evolving landscape of deep learning, the quest for models that balance expressivity with computational efficiency has never been more critical.
no code implementations • 11 Apr 2024 • Lena Podina, Ali Ghodsi, Mohammad Kohandel
Quantitative systems pharmacology (QSP) is widely used to assess drug effects and toxicity before the drug goes to clinical trial.
no code implementations • EMNLP 2021 • Yimeng Wu, Mehdi Rezagholizadeh, Abbas Ghaddar, Md Akmal Haidar, Ali Ghodsi
Intermediate layer matching has been shown to be an effective approach for improving knowledge distillation (KD).
no code implementations • Findings (EMNLP) 2021 • Tianda Li, Ahmad Rashid, Aref Jafari, Pranav Sharma, Ali Ghodsi, Mehdi Rezagholizadeh
Knowledge Distillation (KD) is a model compression algorithm that helps transfer the knowledge in a large neural network into a smaller one.
no code implementations • Findings (EMNLP) 2021 • Peng Lu, Abbas Ghaddar, Ahmad Rashid, Mehdi Rezagholizadeh, Ali Ghodsi, Philippe Langlais
Knowledge Distillation (KD) is extensively used in Natural Language Processing to compress the pre-training and task-specific fine-tuning phases of large neural language models.
no code implementations • NAACL 2022 • Marzieh Tahaei, Ella Charlaix, Vahid Nia, Ali Ghodsi, Mehdi Rezagholizadeh
We push the limits of state-of-the-art Transformer-based pre-trained language model compression using Kronecker decomposition.