Search Results for author: Shiwei Liu

Found 42 papers, 29 papers with code

FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping

no code implementations • 5 Apr 2024 • Ajay Jaiswal, Bodun Hu, Lu Yin, Yeonju Ro, Shiwei Liu, Tianlong Chen, Aditya Akella

In this work, we observe the saturation of the computationally expensive feed-forward blocks across LLM layers and propose FFN-SkipLLM, a novel fine-grained skip strategy for autoregressive LLMs.

Attribute Hallucination +1
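
A minimal sketch of the general idea, assuming a decoder whose layers expose `self_attn` and `mlp` submodules (illustrative names, not the released code): attention is always computed, while a layer's feed-forward block is skipped once the hidden states appear saturated, approximated here by a cosine-similarity check between consecutive layer inputs and outputs.

```python
import torch
import torch.nn.functional as F

def forward_with_ffn_skipping(decoder_layers, hidden, threshold=0.99):
    """Run decoder layers, skipping the FFN block once hidden states saturate."""
    prev_sim = 0.0
    for layer in decoder_layers:
        residual = hidden
        hidden = hidden + layer.self_attn(hidden)   # attention is never skipped
        if prev_sim < threshold:                    # hidden states not yet saturated
            hidden = hidden + layer.mlp(hidden)     # run the expensive FFN block
        # track how much this layer changed the hidden state (saturation signal)
        prev_sim = F.cosine_similarity(hidden, residual, dim=-1).mean().item()
    return hidden
```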

Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding

1 code implementation • 5 Mar 2024 • Zhenyu Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase, Beidi Chen, Xiaoxia Wu, Zhangyang Wang

To address this problem, this paper introduces Multi-scale Positional Encoding (Ms-PoE), a simple yet effective plug-and-play approach that enhances the capacity of LLMs to handle relevant information located in the middle of the context, without fine-tuning or introducing any additional overhead.

Language Modelling
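
A minimal sketch of the plug-and-play ingredient, assuming Ms-PoE rescales position indices with a different ratio per attention head before rotary embeddings are applied; the ratio range below is an illustrative choice, not the paper's tuned values.

```python
import torch

def multiscale_position_ids(seq_len, num_heads, min_ratio=1.2, max_ratio=1.8):
    """Per-head rescaled position indices: each head sees the same context
    'compressed' by a different factor before RoPE is applied."""
    ratios = torch.linspace(min_ratio, max_ratio, num_heads)    # one ratio per head
    positions = torch.arange(seq_len, dtype=torch.float32)      # 0, 1, ..., L-1
    return positions.unsqueeze(0) / ratios.unsqueeze(1)         # (num_heads, seq_len)
```

Each head would then build its rotary embedding from its own fractional positions, so the context is effectively compressed to a different degree per head.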

ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters

no code implementations • 31 Jan 2024 • Shiwei Liu, Guanchen Tao, Yifei Zou, Derek Chow, Zichen Fan, Kauna Lei, Bangfei Pan, Dennis Sylvester, Gregory Kielian, Mehdi Saligane

Compared to state-of-the-art Softmax hardware, ConSmax achieves 14.5x energy and 14.0x area savings with comparable accuracy on a GPT-2 model and the WikiText103 dataset.

Language Modelling Large Language Model
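
A hedged sketch of the core substitution as described here: the row-wise max subtraction and sum normalization of Softmax are replaced with learnable constants, so no reduction over the sequence dimension is needed at inference time. The parameter shapes and initialization are assumptions; the paper's exact parameterization and hardware mapping are not reproduced.

```python
import torch
import torch.nn as nn

class ConSmaxLike(nn.Module):
    """Softmax substitute with learnable normalization constants (sketch only)."""
    def __init__(self):
        super().__init__()
        self.beta = nn.Parameter(torch.zeros(1))    # learnable offset replacing the row max
        self.gamma = nn.Parameter(torch.ones(1))    # learnable scale replacing the row sum

    def forward(self, scores):
        # element-wise: no max or sum over the sequence dimension is required
        return torch.exp(scores - self.beta) / self.gamma
```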

The Counterattack of CNNs in Self-Supervised Learning: Larger Kernel Size might be All You Need

no code implementations • 9 Dec 2023 • Tianjin Huang, Tianlong Chen, Zhangyang Wang, Shiwei Liu

Therefore, it remains unclear whether the self-attention operation is crucial for the recent advances in SSL, or whether CNNs can deliver the same excellence with more advanced designs.

Self-Supervised Learning

E2ENet: Dynamic Sparse Feature Fusion for Accurate and Efficient 3D Medical Image Segmentation

1 code implementation • 7 Dec 2023 • Boqian Wu, Qiao Xiao, Shiwei Liu, Lu Yin, Mykola Pechenizkiy, Decebal Constantin Mocanu, Maurice van Keulen, Elena Mocanu

E2ENet achieves comparable accuracy on the large-scale challenge AMOS-CT, while saving over 68% of the parameter count and 29% of the FLOPs in the inference phase, compared with the previous best-performing method.

Brain Tumor Segmentation Image Segmentation +2

REST: Enhancing Group Robustness in DNNs through Reweighted Sparse Training

1 code implementation • 5 Dec 2023 • Jiaxu Zhao, Lu Yin, Shiwei Liu, Meng Fang, Mykola Pechenizkiy

These bias attributes are strongly spuriously correlated with the target variable, causing the models to be biased towards spurious correlations (i.e., bias-conflicting).

Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective

1 code implementation • 3 Dec 2023 • Can Jin, Tianjin Huang, Yihua Zhang, Mykola Pechenizkiy, Sijia Liu, Shiwei Liu, Tianlong Chen

The rapid development of large-scale deep learning models calls into question the affordability of hardware platforms, which necessitates pruning to reduce their computational and memory footprints.

Image Classification Visual Prompting

Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs

1 code implementation • 13 Oct 2023 • Yuxin Zhang, Lirui Zhao, Mingbao Lin, Yunyun Sun, Yiwu Yao, Xingjia Han, Jared Tanner, Shiwei Liu, Rongrong Ji

Inspired by Dynamic Sparse Training, DSnoT minimizes the reconstruction error between the dense and sparse LLMs by iteratively performing weight pruning-and-growing on top of the sparse LLMs.

Network Pruning
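
A simplified sketch of one pruning-and-growing step in that spirit, not the paper's exact criteria: per output row, grow the pruned weight whose expected contribution best offsets the dense-vs-sparse reconstruction error, and prune the weakest active weight so the sparsity level stays fixed. Tensor layouts and the scoring rule are assumptions.

```python
import torch

def dsnot_like_step(w_dense, mask, x):
    """w_dense, mask: (out_features, in_features); x: calibration inputs (n_samples, in_features)."""
    w_sparse = w_dense * mask
    err = (x @ w_dense.T - x @ w_sparse.T).mean(dim=0)          # per-output reconstruction error
    contrib = w_dense * x.mean(dim=0)                           # expected contribution of each weight
    # grow: among pruned weights, pick the one that pushes the output toward the dense model
    grow_score = (contrib * err.sign().unsqueeze(1)).masked_fill(mask.bool(), float("-inf"))
    grow_idx = grow_score.argmax(dim=1)
    # prune: among active weights, drop the smallest expected contributor
    prune_score = contrib.abs().masked_fill(~mask.bool(), float("inf"))
    prune_idx = prune_score.argmin(dim=1)
    rows = torch.arange(mask.size(0))
    mask[rows, grow_idx], mask[rows, prune_idx] = 1.0, 0.0
    return mask
```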

Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity

1 code implementation • 8 Oct 2023 • Lu Yin, You Wu, Zhenyu Zhang, Cheng-Yu Hsieh, Yaqing Wang, Yiling Jia, Mykola Pechenizkiy, Yi Liang, Zhangyang Wang, Shiwei Liu

Large Language Models (LLMs), renowned for their remarkable performance across diverse domains, present a challenge when it comes to practical deployment due to their colossal model size.

Network Pruning
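
The title's recipe, sketched under loose assumptions: compute a per-layer outlier ratio (the fraction of importance scores exceeding several times the layer mean) and allocate lower sparsity to layers with more outliers, keeping ratios within a small band around the global target. The normalization below is illustrative, not the paper's exact formula.

```python
import torch

def owl_like_layer_sparsities(layer_scores, target=0.7, lam=0.08, m=5.0):
    """layer_scores: list of per-layer importance tensors (e.g. |weight| * input norm)."""
    ratios = torch.tensor([(s > m * s.mean()).float().mean().item() for s in layer_scores])
    shift = ratios - ratios.mean()                       # zero-mean so the average stays near target
    shift = lam * shift / (shift.abs().max() + 1e-12)    # scale into the allowed band
    # layers with more outliers receive lower sparsity (are pruned less)
    return (target - shift).clamp(target - lam, target + lam)
```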

AdaMerging: Adaptive Model Merging for Multi-Task Learning

1 code implementation • 4 Oct 2023 • Enneng Yang, Zhenyi Wang, Li Shen, Shiwei Liu, Guibing Guo, Xingwei Wang, DaCheng Tao

This approach aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.

Multi-Task Learning
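
A minimal sketch of the task-wise variant of that idea: the merged model is the pre-trained weights plus a learnable-coefficient combination of task vectors, with the coefficients tuned without the original training data, e.g. by minimizing prediction entropy on unlabeled test batches. Names, initialization, and the tuning loop are assumptions.

```python
import torch

def merge_with_coefficients(pretrained, task_vectors, lambdas):
    """pretrained: dict name -> tensor; task_vectors: list of such dicts; lambdas: (num_tasks,)."""
    return {name: w0 + sum(l * tv[name] for l, tv in zip(lambdas, task_vectors))
            for name, w0 in pretrained.items()}

# Sketch of data-free tuning via entropy minimization on unlabeled inputs:
# lambdas = torch.nn.Parameter(torch.full((num_tasks,), 0.3))
# probs = model_with(merge_with_coefficients(pretrained, task_vectors, lambdas))(x).softmax(-1)
# loss = -(probs * probs.clamp_min(1e-12).log()).sum(-1).mean(); loss.backward()
```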

Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs

1 code implementation • 29 Sep 2023 • Lu Yin, Ajay Jaiswal, Shiwei Liu, Souvik Kundu, Zhangyang Wang

Contrary to this belief, this paper presents a counter-argument: small-magnitude weights of pre-trained models encode vital knowledge essential for tackling difficult downstream tasks, manifested as a monotonically increasing performance drop of downstream tasks across the difficulty spectrum as more pre-trained weights are pruned by magnitude.

Quantization
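
For reference, a minimal sketch of the pruning operation whose downstream effect is studied here: plain one-shot global magnitude pruning that zeroes out the smallest pre-trained weights.

```python
import torch

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-|w| fraction of weights. weights: dict name -> tensor."""
    all_w = torch.cat([w.abs().flatten() for w in weights.values()])
    k = int(sparsity * all_w.numel())
    threshold = all_w.kthvalue(k).values if k > 0 else all_w.min() - 1
    return {name: w * (w.abs() > threshold) for name, w in weights.items()}
```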

Enhancing Adversarial Training via Reweighting Optimization Trajectory

1 code implementation • 25 Jun 2023 • Tianjin Huang, Shiwei Liu, Tianlong Chen, Meng Fang, Li Shen, Vlado Menkovski, Lu Yin, Yulong Pei, Mykola Pechenizkiy

Although adversarial training has become the de facto method for improving the robustness of deep neural networks, it is well known that vanilla adversarial training suffers from daunting robust overfitting, resulting in unsatisfactory robust generalization.

Adversarial Robustness

Graph Ladling: Shockingly Simple Parallel GNN Training without Intermediate Communication

1 code implementation • 18 Jun 2023 • Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Ying Ding, Zhangyang Wang

By dividing giant graph data, we build multiple weaker GNNs (soup ingredients) trained independently and in parallel without any intermediate communication, and combine their strengths using a greedy interpolation soup procedure to achieve state-of-the-art performance.

graph partitioning Graph Sampling
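
A sketch of the greedy interpolation soup step under simple assumptions (state-dict ingredients sorted by validation score, and a provided `evaluate` function): each candidate is folded into the soup by interpolation only if validation improves.

```python
import copy

def greedy_interpolation_soup(models, evaluate, alphas=(0.3, 0.5, 0.7)):
    """models: list of state_dicts sorted by validation score; evaluate(state_dict) -> float."""
    soup, best = copy.deepcopy(models[0]), evaluate(models[0])
    for candidate in models[1:]:
        for a in alphas:  # try a few interpolation ratios
            trial = {k: (1 - a) * soup[k] + a * candidate[k] for k in soup}
            score = evaluate(trial)
            if score > best:            # keep the candidate only if it helps
                soup, best = trial, score
                break
    return soup
```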

Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models

1 code implementation • 18 Jun 2023 • Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Ying Ding, Zhangyang Wang

Motivated by the recent observations of model soups, which suggest that fine-tuned weights of multiple models can be merged to a better minima, we propose Instant Soup Pruning (ISP) to generate lottery ticket quality subnetworks, using a fraction of the original IMP cost by replacing the expensive intermediate pruning stages of IMP with computationally efficient weak mask generation and aggregation routine.
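
A simplified sketch of the mask-aggregation half of that routine, assuming several cheaply obtained "weak" binary masks for the same layer; the voting rule below is an illustrative stand-in for the paper's procedure.

```python
import torch

def aggregate_weak_masks(weak_masks, keep_ratio):
    """weak_masks: list of {0,1} tensors of identical shape from short, cheap pruning passes."""
    votes = torch.stack(weak_masks).float().mean(dim=0)   # fraction of masks keeping each weight
    k = max(1, int(keep_ratio * votes.numel()))
    threshold = votes.flatten().topk(k).values.min()      # keep the most agreed-upon weights
    return (votes >= threshold).float()
```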

Are Large Kernels Better Teachers than Transformers for ConvNets?

1 code implementation • 30 May 2023 • Tianjin Huang, Lu Yin, Zhenyu Zhang, Li Shen, Meng Fang, Mykola Pechenizkiy, Zhangyang Wang, Shiwei Liu

We hereby carry out a first-of-its-kind study unveiling that modern large-kernel ConvNets, a compelling competitor to Vision Transformers, are remarkably more effective teachers for small-kernel ConvNets, due to more similar architectures.

Knowledge Distillation

Supervised Feature Selection with Neuron Evolution in Sparse Neural Networks

1 code implementation • 10 Mar 2023 • Zahra Atashgahi, Xuhao Zhang, Neil Kichler, Shiwei Liu, Lu Yin, Mykola Pechenizkiy, Raymond Veldhuis, Decebal Constantin Mocanu

Feature selection, which selects an informative subset of variables from data, not only enhances model interpretability and performance but also alleviates resource demands.

feature selection

Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!

1 code implementation • 3 Mar 2023 • Shiwei Liu, Tianlong Chen, Zhenyu Zhang, Xuxi Chen, Tianjin Huang, Ajay Jaiswal, Zhangyang Wang

In pursuit of a more general evaluation and unveiling the true potential of sparse algorithms, we introduce the "Sparsity May Cry" Benchmark (SMC-Bench), a collection of 4 carefully curated, diverse tasks with 10 datasets, which captures a wide range of domain-specific and sophisticated knowledge.

Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers

1 code implementation • 2 Mar 2023 • Tianlong Chen, Zhenyu Zhang, Ajay Jaiswal, Shiwei Liu, Zhangyang Wang

Despite their remarkable achievement, gigantic transformers encounter significant drawbacks, including exorbitant computational and memory footprints during training, as well as severe collapse evidenced by a high degree of parameter redundancy.

Ten Lessons We Have Learned in the New "Sparseland": A Short Handbook for Sparse Neural Network Researchers

no code implementations • 6 Feb 2023 • Shiwei Liu, Zhangyang Wang

In response, we summarize ten Q&As of SNNs from many key aspects, including dense vs. sparse, unstructured sparse vs. structured sparse, pruning vs. sparse training, dense-to-sparse training vs. sparse-to-sparse training, static sparsity vs. dynamic sparsity, before-training/during-training vs. post-training sparsity, and many more.

General Knowledge

Data Augmented Flatness-aware Gradient Projection for Continual Learning

no code implementations • ICCV 2023 • Enneng Yang, Li Shen, Zhenyi Wang, Shiwei Liu, Guibing Guo, Xingwei Wang

In this paper, we first revisit the gradient projection method from the perspective of the flatness of the loss surface, and find that an unflat loss surface leads to catastrophic forgetting of old tasks when the projection constraint is relaxed to improve the performance on new tasks.

Continual Learning

Dynamic Sparse Network for Time Series Classification: Learning What to "see"

1 code implementation • 19 Dec 2022 • Qiao Xiao, Boqian Wu, Yu Zhang, Shiwei Liu, Mykola Pechenizkiy, Elena Mocanu, Decebal Constantin Mocanu

The receptive field (RF), which determines the region of the time series to be "seen" and used, is critical to improving the performance of time series classification (TSC).

Time Series Time Series Analysis +1

You Can Have Better Graph Neural Networks by Not Training Weights at All: Finding Untrained GNNs Tickets

1 code implementation • 28 Nov 2022 • Tianjin Huang, Tianlong Chen, Meng Fang, Vlado Menkovski, Jiaxu Zhao, Lu Yin, Yulong Pei, Decebal Constantin Mocanu, Zhangyang Wang, Mykola Pechenizkiy, Shiwei Liu

Recent works have impressively demonstrated that there exists a subnetwork in randomly initialized convolutional neural networks (CNNs) that can match the performance of the fully trained dense networks at initialization, without any optimization of the weights of the network (i.e., untrained networks).

Out-of-Distribution Detection

Superposing Many Tickets into One: A Performance Booster for Sparse Neural Network Training

no code implementations • 30 May 2022 • Lu Yin, Vlado Menkovski, Meng Fang, Tianjin Huang, Yulong Pei, Mykola Pechenizkiy, Decebal Constantin Mocanu, Shiwei Liu

Recent works on sparse neural network training (sparse training) have shown that a compelling trade-off between performance and efficiency can be achieved by training intrinsically sparse neural networks from scratch.

Don't Be So Dense: Sparse-to-Sparse GAN Training Without Sacrificing Performance

no code implementations • 5 Mar 2022 • Shiwei Liu, Yuesong Tian, Tianlong Chen, Li Shen

Even more unconventionally, our proposed method enables directly training sparse unbalanced GANs with an extremely sparse generator from scratch.

Model Compression

The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training

1 code implementation • ICLR 2022 • Shiwei Liu, Tianlong Chen, Xiaohan Chen, Li Shen, Decebal Constantin Mocanu, Zhangyang Wang, Mykola Pechenizkiy

In this paper, we focus on sparse training and highlight a perhaps counter-intuitive finding: random pruning at initialization can be quite powerful for the sparse training of modern neural networks.

Adversarial Robustness Out-of-Distribution Detection
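
The baseline itself fits in a few lines. A sketch assuming a uniform layer-wise sparsity (the paper also studies non-uniform ratios such as ERK):

```python
import torch

def random_masks_at_init(weights, sparsity):
    """weights: dict name -> tensor; each layer keeps a uniformly random (1 - sparsity) fraction."""
    masks = {}
    for name, w in weights.items():
        k = max(1, int((1 - sparsity) * w.numel()))
        scores = torch.rand_like(w).flatten()            # random score per weight
        threshold = scores.topk(k).values.min()
        masks[name] = (scores >= threshold).float().view_as(w)
    return masks
```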

Achieving Personalized Federated Learning with Sparse Local Models

no code implementations • 27 Jan 2022 • Tiansheng Huang, Shiwei Liu, Li Shen, Fengxiang He, Weiwei Lin, DaCheng Tao

To counter this issue, personalized FL (PFL) was proposed to produce dedicated local models for each individual user.

Personalized Federated Learning

On Heterogeneously Distributed Data, Sparsity Matters

no code implementations • 29 Sep 2021 • Tiansheng Huang, Shiwei Liu, Li Shen, Fengxiang He, Weiwei Lin, DaCheng Tao

Federated learning (FL) is particularly vulnerable to heterogeneously distributed data, since a common global model in FL may not adapt to the heterogeneous data distribution of each user.

Personalized Federated Learning

Sparse Unbalanced GAN Training with In-Time Over-Parameterization

no code implementations • 29 Sep 2021 • Shiwei Liu, Yuesong Tian, Tianlong Chen, Li Shen

Perhaps most importantly, we find that, instead of inheriting parameters from expensive pre-trained GANs, directly training sparse GANs from scratch can be a much more efficient solution.

Model Compression

Hierarchical Semantic Segmentation using Psychometric Learning

no code implementations • 7 Jul 2021 • Lu Yin, Vlado Menkovski, Shiwei Liu, Mykola Pechenizkiy

One of the major challenges in supervised learning approaches is expressing and collecting the rich knowledge that experts have about the meaning present in image data.

Image Segmentation Metric Learning +2

Sparse Training via Boosting Pruning Plasticity with Neuroregeneration

2 code implementations • NeurIPS 2021 • Shiwei Liu, Tianlong Chen, Xiaohan Chen, Zahra Atashgahi, Lu Yin, Huanyu Kou, Li Shen, Mykola Pechenizkiy, Zhangyang Wang, Decebal Constantin Mocanu

Works on the lottery ticket hypothesis (LTH) and single-shot network pruning (SNIP) have recently drawn considerable attention to post-training pruning (iterative magnitude pruning) and before-training pruning (pruning at initialization).

Network Pruning Sparse Learning

Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training

4 code implementations • 4 Feb 2021 • Shiwei Liu, Lu Yin, Decebal Constantin Mocanu, Mykola Pechenizkiy

By starting from a random sparse network and continuously exploring sparse connectivities during training, we can perform an Over-Parameterization in the space-time manifold, closing the gap in the expressibility between sparse training and dense training.

Image Classification Sparse Learning
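
A sketch of the topology-exploration step this relies on (dynamic sparse training in the style of SET/RigL, not the paper's exact schedule): periodically drop the smallest-magnitude active weights and grow the same number of new connections, so that over training the sparse model "visits" far more parameters than it holds at any one moment.

```python
import torch

def prune_and_grow(weight, mask, update_fraction=0.3):
    """One prune-and-grow step on a contiguous weight/mask pair; sparsity level stays fixed."""
    active = mask.bool()
    n_update = int(update_fraction * active.sum().item())
    # drop: smallest |w| among active connections
    drop_scores = weight.abs().masked_fill(~active, float("inf")).flatten()
    drop_idx = drop_scores.topk(n_update, largest=False).indices
    mask.view(-1)[drop_idx] = 0.0
    # grow: random inactive positions (random growth as in SET; gradient-based growth is the RigL variant)
    grow_scores = torch.rand_like(weight).masked_fill(mask.bool(), -1.0).flatten()
    grow_idx = grow_scores.topk(n_update).indices
    mask.view(-1)[grow_idx] = 1.0
    weight.data.view(-1)[grow_idx] = 0.0   # newly grown weights start at zero
    return mask
```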

Mitigating the effect of atmospheric turbulence on orbital angular momentum-based quantum key distribution using real-time adaptive optics with phase unwrapping

no code implementations • 2 Feb 2021 • Zhiwei Tao, Yichong Ren, Azezigul Abdukirim, Shiwei Liu, Ruizhong Rao

Quantum key distribution (QKD) employing orbital angular momentum (OAM) for high-dimensional encoding enhances system security and information capacity between the two communicating parties.

Quantum Physics

Selfish Sparse RNN Training

1 code implementation • 22 Jan 2021 • Shiwei Liu, Decebal Constantin Mocanu, Yulong Pei, Mykola Pechenizkiy

Sparse neural networks have been widely applied to reduce the computational demands of training and deploying over-parameterized deep neural networks.

Topological Insights into Sparse Neural Networks

3 code implementations • 24 Jun 2020 • Shiwei Liu, Tim Van der Lee, Anil Yaman, Zahra Atashgahi, Davide Ferraro, Ghada Sokar, Mykola Pechenizkiy, Decebal Constantin Mocanu

However, comparing different sparse topologies and determining how sparse topologies evolve during training, especially when sparse structure optimization is involved, remain challenging open questions.

A Brain-inspired Algorithm for Training Highly Sparse Neural Networks

2 code implementations • 17 Mar 2019 • Zahra Atashgahi, Joost Pieterse, Shiwei Liu, Decebal Constantin Mocanu, Raymond Veldhuis, Mykola Pechenizkiy

Concretely, by exploiting the cosine similarity metric to measure the importance of connections, our proposed method, Cosine similarity-based and Random Topology Exploration (CTRE), evolves the topology of sparse neural networks by adding the most important connections to the network without calculating dense gradients in the backward pass.

Learning Theory
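
A sketch of that growth criterion under assumed shapes: candidate connections are scored by the absolute cosine similarity between the activations of the two neurons they would join, computed from a mini-batch, and the top-scoring inactive connections are added, with no dense backward pass required.

```python
import torch
import torch.nn.functional as F

def grow_by_cosine_similarity(acts_in, acts_out, mask, n_grow):
    """acts_in: (batch, n_in); acts_out: (batch, n_out); mask: (n_out, n_in), contiguous."""
    # cosine similarity between every output unit's and input unit's activation vectors
    sim = F.normalize(acts_out, dim=0).T @ F.normalize(acts_in, dim=0)     # (n_out, n_in)
    scores = sim.abs().masked_fill(mask.bool(), -1.0).flatten()            # only inactive candidates
    grow_idx = scores.topk(n_grow).indices
    mask.view(-1)[grow_idx] = 1.0
    return mask
```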

Intrinsically Sparse Long Short-Term Memory Networks

no code implementations • 26 Jan 2019 • Shiwei Liu, Decebal Constantin Mocanu, Mykola Pechenizkiy

However, LSTMs are prone to being memory-bandwidth limited in realistic applications and require unbearably long training and inference times as model sizes keep increasing.

Model Compression Sentiment Analysis

Sparse evolutionary Deep Learning with over one million artificial neurons on commodity hardware

4 code implementations • 26 Jan 2019 • Shiwei Liu, Decebal Constantin Mocanu, Amarsagar Reddy Ramapuram Matavalam, Yulong Pei, Mykola Pechenizkiy

Despite the success of ANNs, it is challenging to train and deploy modern ANNs on commodity hardware due to the ever-increasing model size and the unprecedented growth in data volumes.
