Search Results for author: Shiwei Liu

Found 68 papers, 47 papers with code

Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning

no code implementations 26 Jun 2025 Xin Xu, Tianhao Chen, Fan Zhang, Wanlong Liu, Pengxiang Li, Ajay Kumar Jaiswal, Yuchen Yan, Jishan Hu, Yang Wang, Hao Chen, Shiwei Liu, Shizhe Diao, Can Yang, Lu Yin

While slow-thinking large language models (LLMs) exhibit reflection-like reasoning, commonly referred to as the "aha moment", their ability to generate informative critiques and refine prior solutions remains limited.

AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs

1 code implementation 17 Jun 2025 Di He, Ajay Jaiswal, Songjun Tu, Li Shen, Ganzhao Yuan, Shiwei Liu, Lu Yin

While it is common to assign a uniform decay rate to every layer, this approach overlooks the structural diversity of LLMs and the varying spectral properties across modules.
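
A minimal sketch of the module-wise idea described above, assuming a crude Hill-style tail-exponent estimate and a linear mapping from that exponent to a decay range; the helper names (estimate_tail_exponent, module_wise_decay), the metric, and the direction of the mapping are illustrative assumptions, not the paper's procedure.

```python
import torch
import torch.nn as nn

def estimate_tail_exponent(weight: torch.Tensor, k: int = 10) -> float:
    """Crude Hill-style estimate of how heavy-tailed the singular-value
    spectrum of a weight matrix is (illustrative only)."""
    s = torch.linalg.svdvals(weight.detach().float())
    if s.numel() < 2:
        return 10.0  # treat tiny matrices as light-tailed
    k = min(k, s.numel() - 1)
    return 1.0 / torch.log(s[:k] / s[k]).mean().clamp_min(1e-8).item()

def module_wise_decay(model: nn.Module, wd_min: float = 0.01, wd_max: float = 0.1):
    """Build AdamW parameter groups whose weight decay varies per module,
    rescaling each module's tail exponent linearly into [wd_min, wd_max].
    Whether heavier tails should get more or less decay is an assumption here."""
    mats = [(n, p) for n, p in model.named_parameters() if p.dim() == 2]
    alphas = {n: estimate_tail_exponent(p) for n, p in mats}
    lo, hi = min(alphas.values()), max(alphas.values())
    groups = []
    for n, p in mats:
        t = 0.0 if hi == lo else (alphas[n] - lo) / (hi - lo)
        groups.append({"params": [p], "weight_decay": wd_min + t * (wd_max - wd_min)})
    # non-matrix parameters (biases, norms) keep zero decay
    groups.append({"params": [p for _, p in model.named_parameters() if p.dim() != 2],
                   "weight_decay": 0.0})
    return torch.optim.AdamW(groups, lr=1e-3)
```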

Diversity

A Technical Study into Small Reasoning Language Models

no code implementations 16 Jun 2025 Xialie Zhuang, Peixian Ma, Zhikai Jia, Zheng Cao, Shiwei Liu

The ongoing evolution of language models has led to the development of large-scale architectures that demonstrate exceptional performance across a wide range of tasks.

Code Generation Computational Efficiency +3

Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution

1 code implementation 29 May 2025 Qiao Xiao, Alan Ansell, Boqian Wu, Lu Yin, Mykola Pechenizkiy, Shiwei Liu, Decebal Constantin Mocanu

Large language models (LLMs) have achieved remarkable success across various tasks but face deployment challenges due to their massive computational demands.

Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam

1 code implementation 24 Feb 2025 Tianjin Huang, Haotian Hu, Zhenyu Zhang, Gaojie Jin, Xiang Li, Li Shen, Tianlong Chen, Lu Liu, Qingsong Wen, Zhangyang Wang, Shiwei Liu

This paper comprehensively evaluates several recently proposed optimizers for 4-bit training, revealing that low-bit precision amplifies sensitivity to learning rates and often causes unstable gradient norms, leading to divergence at higher learning rates.

Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More

1 code implementation 11 Feb 2025 Xialie Zhuang, Zhikai Jia, Jianjin Li, Zhenyu Zhang, Li Shen, Zheng Cao, Shiwei Liu

To address this, we propose Mask-Enhanced Autoregressive Prediction (MEAP), a simple yet effective training paradigm that seamlessly integrates Masked Language Modeling (MLM) into Next-Token Prediction (NTP) to enhance the latter's in-context retrieval capabilities.
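
A minimal sketch of the paradigm described above, assuming a Hugging-Face-style causal LM interface: a random fraction of input tokens is replaced by a mask token while the next-token targets stay untouched, so the loss remains plain NTP. The mask ratio and the helper name meap_loss are illustrative, not the released recipe.

```python
import torch
import torch.nn.functional as F

def meap_loss(model, input_ids, mask_token_id, mask_ratio=0.15):
    """Mask-enhanced next-token prediction (illustrative sketch): corrupt a random
    subset of *input* tokens, but keep the targets as the original next tokens,
    so the objective stays pure NTP while the model must retrieve from context."""
    inputs = input_ids[:, :-1].clone()
    targets = input_ids[:, 1:]                      # standard shifted NTP targets
    mask = torch.rand_like(inputs, dtype=torch.float) < mask_ratio
    inputs[mask] = mask_token_id                    # masked tokens carry no content
    logits = model(inputs).logits                   # assumes a HF-style causal LM
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
```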

Decoder Information Retrieval +4

The Curse of Depth in Large Language Models

no code implementations 9 Feb 2025 Wenfang Sun, Xinyuan Song, Pengxiang Li, Lu Yin, Yefeng Zheng, Shiwei Liu

While Pre-LN stabilizes the training of Transformer LLMs, its output variance grows exponentially with model depth, which undesirably drives the derivative of the deep Transformer blocks toward an identity matrix, so that these blocks barely contribute to training.
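
The variance claim is easy to reproduce numerically: in a toy Pre-LN residual stack x_{l+1} = x_l + f_l(LayerNorm(x_l)), the residual-stream variance keeps accumulating with depth. The dimensions and the random linear sub-blocks below are arbitrary choices made only to visualize the trend.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, depth = 512, 64
norms = nn.ModuleList(nn.LayerNorm(d) for _ in range(depth))
blocks = nn.ModuleList(nn.Linear(d, d) for _ in range(depth))

x = torch.randn(1024, d)          # a batch of residual-stream vectors
for layer, (ln, f) in enumerate(zip(norms, blocks), start=1):
    x = x + f(ln(x))              # Pre-LN residual update
    if layer % 16 == 0:
        print(f"layer {layer:3d}: residual variance = {x.var().item():.1f}")
# The printed variance grows steadily with depth, matching the observation
# that Pre-LN outputs blow up and deep blocks act nearly as identities.
```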

O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning

1 code implementation 22 Jan 2025 Haotian Luo, Li Shen, Haiying He, Yibo Wang, Shiwei Liu, Wei Li, Naiqiang Tan, Xiaochun Cao, DaCheng Tao

Experiments on various mathematical reasoning benchmarks show that O1-Pruner not only significantly reduces inference overhead but also achieves higher accuracy, providing a novel and promising solution to this challenge.

Mathematical Reasoning

SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training

1 code implementation 12 Jan 2025 Tianjin Huang, Ziquan Zhu, Gaojie Jin, Lu Liu, Zhangyang Wang, Shiwei Liu

Large Language Models (LLMs) have demonstrated exceptional performance across diverse tasks, yet their training remains highly resource-intensive and susceptible to critical challenges such as training instability.
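
The title points at two mechanisms, spike-aware handling of gradients and momentum reset; a rough sketch of how such a wrapper could look is given below. The spike threshold, the reset interval, and the clamping rule are assumptions for illustration, not the paper's exact algorithm.

```python
import torch

def spam_style_step(optimizer, step, spike_factor=50.0, reset_every=500):
    """Illustrative spike-aware update for an Adam-family optimizer:
    1) clamp gradient entries that dwarf the running second-moment estimate
       (a gradient 'spike'), 2) periodically reset the moment buffers so a
       past spike does not keep distorting later updates."""
    for group in optimizer.param_groups:
        for p in group["params"]:
            if p.grad is None:
                continue
            v = optimizer.state.get(p, {}).get("exp_avg_sq")
            if v is not None:
                limit = spike_factor * (v.sqrt() + 1e-8)   # per-element spike threshold
                clipped = torch.minimum(torch.maximum(p.grad.data, -limit), limit)
                p.grad.data.copy_(clipped)
    optimizer.step()
    if step % reset_every == 0:                            # periodic momentum reset
        for state in optimizer.state.values():
            for key in ("exp_avg", "exp_avg_sq"):
                if key in state:
                    state[key].zero_()
```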

Time Series Forecasting

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN

1 code implementation 18 Dec 2024 Pengxiang Li, Lu Yin, Shiwei Liu

In contrast, Post-Layer Normalization (Post-LN) preserves larger gradient norms in deeper layers but suffers from vanishing gradients in earlier layers.
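
A sketch of how the two placements can be mixed inside one stack; which layers receive Post-LN versus Pre-LN, and the boundary index post_ln_layers, are assumptions chosen only to show the wiring, not the paper's exact assignment.

```python
import torch.nn as nn

class MixedBlock(nn.Module):
    """One Transformer-style sub-block whose normalization placement depends on
    its position in the stack: Post-LN for the first few layers, Pre-LN for the
    rest (the split used here is an illustrative assumption)."""
    def __init__(self, dim, layer_idx, post_ln_layers=4):
        super().__init__()
        self.use_post_ln = layer_idx < post_ln_layers
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        if self.use_post_ln:              # Post-LN: normalize after the residual add
            return self.norm(x + self.ff(x))
        return x + self.ff(self.norm(x))  # Pre-LN: normalize before the sub-layer

stack = nn.Sequential(*[MixedBlock(dim=256, layer_idx=i) for i in range(12)])
```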

Model Compression

Condense, Don't Just Prune: Enhancing Efficiency and Performance in MoE Layer Pruning

1 code implementation 26 Nov 2024 Mingyu Cao, Gen Li, Jie Ji, JiaQi Zhang, Xiaolong Ma, Shiwei Liu, Lu Yin

Mixture-of-Experts (MoE) has garnered significant attention for its ability to scale up neural networks while utilizing the same or even fewer active parameters.

Mixture-of-Experts

AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models

1 code implementation 14 Oct 2024 Haiquan Lu, Yefan Zhou, Shiwei Liu, Zhangyang Wang, Michael W. Mahoney, Yaoqing Yang

Existing LLM pruning strategies typically assign uniform pruning ratios across layers, limiting overall pruning ability; and recent work on layerwise pruning of LLMs is often based on heuristics that can easily lead to suboptimal performance.
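
Reading the title together with the snippet, the idea is to replace uniform per-layer ratios with ratios derived from each layer's heavy-tailed spectral statistics. The sketch below rescales a cheap per-layer tail proxy into a sparsity range; the proxy, the direction of the mapping, and the helper names are illustrative assumptions rather than the paper's allocation rule.

```python
import torch
import torch.nn as nn

def spectral_tail_metric(weight: torch.Tensor) -> float:
    """Cheap proxy for how heavy-tailed a layer's spectrum is: the share of
    spectral mass carried by the top ~5% of singular values (illustrative)."""
    s = torch.linalg.svdvals(weight.detach().float())
    k = max(1, s.numel() // 20)
    return (s[:k].sum() / s.sum()).item()

def layerwise_sparsities(model: nn.Module, mean_sparsity=0.7, spread=0.2):
    """Turn the per-layer metric into non-uniform pruning ratios that average
    to roughly `mean_sparsity`; heavier-tailed layers are pruned less here,
    but that direction is an assumption, not the paper's rule."""
    layers = [(n, p) for n, p in model.named_parameters() if p.dim() == 2]
    metrics = torch.tensor([spectral_tail_metric(p) for _, p in layers])
    z = (metrics - metrics.mean()) / (metrics.std() + 1e-8)
    sparsities = (mean_sparsity - spread * z).clamp(0.05, 0.95)
    return {name: s.item() for (name, _), s in zip(layers, sparsities)}

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 256))
print(layerwise_sparsities(model))
```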

Full-Rank No More: Low-Rank Weight Training for Modern Speech Recognition Models

no code implementations 10 Oct 2024 Adriana Fernandez-Lopez, Shiwei Liu, Lu Yin, Stavros Petridis, Maja Pantic

This paper investigates the under-explored area of low-rank weight training for large-scale Conformer-based speech recognition models from scratch.

speech-recognition Speech Recognition

Is C4 Dataset Optimal for Pruning? An Investigation of Calibration Data for LLM Pruning

1 code implementation 9 Oct 2024 Abhinav Bandari, Lu Yin, Cheng-Yu Hsieh, Ajay Kumar Jaiswal, Tianlong Chen, Li Shen, Ranjay Krishna, Shiwei Liu

In this study, we evaluate the choice of calibration data for LLM pruning across a wide range of datasets that are most commonly used in LLM training and evaluation, including four pretraining datasets as well as three categories of downstream tasks encompassing nine datasets.

In-Context Learning Network Pruning

(PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork

no code implementations 24 Jul 2024 Tianjin Huang, Fang Meng, Li Shen, Fan Liu, Yulong Pei, Mykola Pechenizkiy, Shiwei Liu, Tianlong Chen

In this paper, we investigate an intriguing possibility: leveraging visual prompts to capture channel importance and derive high-quality structural sparsity.

From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients

1 code implementation 15 Jul 2024 Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu, Jiawei Zhao, Yuandong Tian, Zhangyang Wang

Modern Large Language Models (LLMs) are composed of matrices with billions of elements, making their storage and processing quite demanding in terms of computational resources and memory usage.

GPU

Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients

2 code implementations 11 Jul 2024 Zhenyu Zhang, Ajay Jaiswal, Lu Yin, Shiwei Liu, Jiawei Zhao, Yuandong Tian, Zhangyang Wang

To address these limitations, we introduce Q-GaLore, a novel approach that substantially reduces memory usage by combining quantization and low-rank projection, surpassing the benefits of GaLore.

Quantization

Composable Interventions for Language Models

1 code implementation 9 Jul 2024 Arinbjorn Kolbeinsson, Kyle O'Brien, Tianjin Huang, ShangHua Gao, Shiwei Liu, Jonathan Richard Schwarz, Anurag Vaidya, Faisal Mahmood, Marinka Zitnik, Tianlong Chen, Thomas Hartvigsen

Test-time interventions for language models can enhance factual accuracy, mitigate harmful outputs, and improve model efficiency without costly retraining.

knowledge editing Machine Unlearning +1

Multi-branch CNN and grouping cascade attention for medical image classification

no code implementations Sci Rep 14, 15013 2024 Shiwei Liu, Wenwen Yue, Zhiqing Guo, Liejun Wang

In addition, we propose an efficient CNN (EC) module to enhance the ability of the model and extract the local detail information in medical images.

image-classification Image Classification +2

MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization

no code implementations 25 Jun 2024 Adriana Fernandez-Lopez, Honglie Chen, Pingchuan Ma, Lu Yin, Qiao Xiao, Stavros Petridis, Shiwei Liu, Maja Pantic

In this study, we propose a regularization technique that facilitates the training of visual and audio-visual speech recognition models (VSR and AVSR) from scratch.

Audio-Visual Speech Recognition speech-recognition +1

Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion

1 code implementation 14 Jun 2024 Anke Tang, Li Shen, Yong Luo, Shiwei Liu, Han Hu, Bo Du

Once the routers are learned and a preference vector is set, the MoE module can be unloaded, thus no additional computational cost is introduced during inference.

Mixture-of-Experts Multi-Task Learning

OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning

2 code implementations 28 May 2024 Pengxiang Li, Lu Yin, Xiaowei Gao, Shiwei Liu

The rapid advancements in Large Language Models (LLMs) have revolutionized various natural language processing tasks.

MMLU

FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping

no code implementations 5 Apr 2024 Ajay Jaiswal, Bodun Hu, Lu Yin, Yeonju Ro, Shiwei Liu, Tianlong Chen, Aditya Akella

In this work, we observe the saturation of the computationally expensive feed-forward blocks in LLM layers and propose FFN-SkipLLM, a novel fine-grained skip strategy for autoregressive LLMs.

Attribute Hallucination +1

Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding

1 code implementation 5 Mar 2024 Zhenyu Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase, Beidi Chen, Xiaoxia Wu, Zhangyang Wang

To address this problem, this paper introduces Multi-scale Positional Encoding (Ms-PoE), a simple yet effective plug-and-play approach that enhances the capacity of LLMs to handle relevant information located in the middle of the context, without fine-tuning or introducing any additional overhead.
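
One way to picture a plug-and-play multi-scale positional encoding is to give each attention head its own rescaling of the rotary position indices, as sketched below; the ratio range, the per-head assignment, and the function names are assumptions, so the exact formulation should be taken from the paper's released code.

```python
import torch

def rotary(x, positions, base=10000.0):
    """Standard RoPE applied to the last dimension of x, shape (seq, head_dim)."""
    d = x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2, dtype=torch.float) / d))
    angles = positions[:, None].float() * inv_freq[None, :]        # (seq, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    return torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1).flatten(-2)

def multiscale_rope(q, ratios=None):
    """q: (heads, seq, head_dim). Each head sees position indices shrunk by its
    own ratio, so different heads attend at different positional 'scales'."""
    heads, seq, _ = q.shape
    if ratios is None:
        ratios = torch.linspace(1.2, 1.8, heads)    # illustrative per-head ratios
    out = []
    for h in range(heads):
        positions = torch.arange(seq, dtype=torch.float) / ratios[h]
        out.append(rotary(q[h], positions))
    return torch.stack(out)

q = torch.randn(8, 128, 64)
print(multiscale_rope(q).shape)    # torch.Size([8, 128, 64])
```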

Language Modeling Language Modelling

ConSmax: Hardware-Friendly Alternative Softmax with Learnable Parameters

1 code implementation 31 Jan 2024 Shiwei Liu, Guanchen Tao, Yifei Zou, Derek Chow, Zichen Fan, Kauna Lei, Bangfei Pan, Dennis Sylvester, Gregory Kielian, Mehdi Saligane

Experimental results show that ConSmax achieves a minuscule power consumption of 0.2 mW and an area of 0.0008 mm^2 at 1250 MHz working frequency in 16 nm FinFET technology.

Language Modelling Large Language Model

The Counterattack of CNNs in Self-Supervised Learning: Larger Kernel Size might be All You Need

no code implementations 9 Dec 2023 Tianjin Huang, Tianlong Chen, Zhangyang Wang, Shiwei Liu

Therefore, it remains unclear whether the self-attention operation is crucial for the recent advances in SSL, or whether CNNs with more advanced designs can deliver the same excellence.

All Self-Supervised Learning

E2ENet: Dynamic Sparse Feature Fusion for Accurate and Efficient 3D Medical Image Segmentation

1 code implementation 7 Dec 2023 Boqian Wu, Qiao Xiao, Shiwei Liu, Lu Yin, Mykola Pechenizkiy, Decebal Constantin Mocanu, Maurice van Keulen, Elena Mocanu

E2ENet achieves comparable accuracy on the large-scale challenge AMOS-CT, while saving over 68% of the parameter count and 29% of FLOPs in the inference phase, compared with the previous best-performing method.

Brain Tumor Segmentation Image Segmentation +2

REST: Enhancing Group Robustness in DNNs through Reweighted Sparse Training

1 code implementation 5 Dec 2023 Jiaxu Zhao, Lu Yin, Shiwei Liu, Meng Fang, Mykola Pechenizkiy

These bias attributes are strongly spuriously correlated with the target variable, causing the models to be biased towards spurious correlations (i.e., bias-conflicting).

Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective

1 code implementation 3 Dec 2023 Can Jin, Tianjin Huang, Yihua Zhang, Mykola Pechenizkiy, Sijia Liu, Shiwei Liu, Tianlong Chen

The rapid development of large-scale deep learning models calls into question the affordability of hardware platforms and necessitates pruning to reduce the models' computational and memory footprints.

Image Classification Visual Prompting

Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs

1 code implementation 13 Oct 2023 Yuxin Zhang, Lirui Zhao, Mingbao Lin, Yunyun Sun, Yiwu Yao, Xingjia Han, Jared Tanner, Shiwei Liu, Rongrong Ji

Inspired by Dynamic Sparse Training, DSnoT minimizes the reconstruction error between the dense and sparse LLMs by performing iterative weight pruning-and-growing on top of sparse LLMs.
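
A toy variant of the training-free pruning-and-growing loop described above: per output row, repeatedly swap the kept weight with the smallest contribution proxy for the pruned weight with the largest, keeping sparsity fixed. The |w|*mean|x| scoring rule and the helper name are simplifications for illustration, not the paper's reconstruction-error criterion.

```python
import torch

def prune_and_grow(w_dense, mask, X, n_swaps=5):
    """w_dense: (out, in) dense weights; mask: bool keep-mask of the same shape;
    X: (in, n_samples) calibration activations. Per output row, swap the
    'least useful' kept weight for the 'most useful' pruned one, where
    usefulness is |w| * mean|x_j| -- a toy stand-in for reconstruction error."""
    score = w_dense.abs() * X.abs().mean(dim=1)         # (out, in) contribution proxy
    mask = mask.clone()
    rows = torch.arange(w_dense.size(0))
    for _ in range(n_swaps):
        kept = score.masked_fill(~mask, float("inf"))
        pruned = score.masked_fill(mask, float("-inf"))
        drop = kept.argmin(dim=1)                       # weakest kept weight per row
        grow = pruned.argmax(dim=1)                     # strongest pruned weight per row
        mask[rows, drop] = False
        mask[rows, grow] = True
    return w_dense * mask                               # sparse weights, sparsity unchanged

w = torch.randn(16, 64)
keep = torch.rand_like(w) > 0.5
Xc = torch.randn(64, 32)
w_sparse = prune_and_grow(w, keep, Xc)
```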

Network Pruning

Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity

1 code implementation 8 Oct 2023 Lu Yin, You Wu, Zhenyu Zhang, Cheng-Yu Hsieh, Yaqing Wang, Yiling Jia, Gen Li, Ajay Jaiswal, Mykola Pechenizkiy, Yi Liang, Michael Bendersky, Zhangyang Wang, Shiwei Liu

Large Language Models (LLMs), renowned for their remarkable performance across diverse domains, present a challenge when it comes to practical deployment due to their colossal model size.

Network Pruning

AdaMerging: Adaptive Model Merging for Multi-Task Learning

1 code implementation 4 Oct 2023 Enneng Yang, Zhenyi Wang, Li Shen, Shiwei Liu, Guibing Guo, Xingwei Wang, DaCheng Tao

This approach aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data.

model Task Arithmetic

Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs

1 code implementation 29 Sep 2023 Lu Yin, Ajay Jaiswal, Shiwei Liu, Souvik Kundu, Zhangyang Wang

Contrary to this belief, this paper presents a counter-argument: small-magnitude weights of pre-trained models encode vital knowledge essential for tackling difficult downstream tasks; as we prune more pre-trained weights by magnitude, the performance drop across downstream tasks grows monotonically with task difficulty.

Quantization

Enhancing Adversarial Training via Reweighting Optimization Trajectory

1 code implementation 25 Jun 2023 Tianjin Huang, Shiwei Liu, Tianlong Chen, Meng Fang, Li Shen, Vlado Menkovski, Lu Yin, Yulong Pei, Mykola Pechenizkiy

Despite the fact that adversarial training has become the de facto method for improving the robustness of deep neural networks, it is well-known that vanilla adversarial training suffers from daunting robust overfitting, resulting in unsatisfactory robust generalization.

Adversarial Robustness

Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models

1 code implementation 18 Jun 2023 Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Ying Ding, Zhangyang Wang

Motivated by the recent observations of model soups, which suggest that the fine-tuned weights of multiple models can be merged into a better minimum, we propose Instant Soup Pruning (ISP) to generate lottery-ticket-quality subnetworks at a fraction of the original IMP cost, replacing the expensive intermediate pruning stages of IMP with a computationally efficient weak mask generation and aggregation routine.

Graph Ladling: Shockingly Simple Parallel GNN Training without Intermediate Communication

1 code implementation 18 Jun 2023 Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Ying Ding, Zhangyang Wang

By dividing giant graph data, we build multiple weaker GNNs (soup ingredients) trained independently and in parallel without any intermediate communication, and combine their strength using a greedy interpolation soup procedure to achieve state-of-the-art performance.
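
The greedy interpolation soup mentioned above can be sketched generically: rank the independently trained checkpoints by validation score and keep adding one to a running weight average only if the average improves. The uniform averaging, the evaluate hook, and the function names are assumptions; the GNN-specific procedure may differ in details.

```python
import copy
import torch

def average_state_dicts(state_dicts):
    """Uniformly average a list of compatible state_dicts."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return avg

def greedy_soup(model, checkpoints, evaluate):
    """checkpoints: list of state_dicts from independently trained models.
    evaluate(model) -> validation score (higher is better)."""
    def score(sd):
        model.load_state_dict(sd)
        return evaluate(model)

    ranked = sorted(checkpoints, key=score, reverse=True)
    soup, best = [ranked[0]], score(ranked[0])
    for sd in ranked[1:]:
        candidate = average_state_dicts(soup + [sd])
        if score(candidate) >= best:          # keep the ingredient only if it helps
            soup.append(sd)
            best = score(average_state_dicts(soup))
    model.load_state_dict(average_state_dicts(soup))
    return model
```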

graph partitioning Graph Sampling

The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter

1 code implementation NeurIPS 2023 Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Zhangyang Wang

Large pre-trained transformers are show-stealers in modern-day deep learning, and it becomes crucial to comprehend the parsimonious patterns that exist within them as they grow in scale.

Self-Supervised Learning

Are Large Kernels Better Teachers than Transformers for ConvNets?

1 code implementation 30 May 2023 Tianjin Huang, Lu Yin, Zhenyu Zhang, Li Shen, Meng Fang, Mykola Pechenizkiy, Zhangyang Wang, Shiwei Liu

We hereby carry out a first-of-its-kind study unveiling that modern large-kernel ConvNets, a compelling competitor to Vision Transformers, are remarkably more effective teachers for small-kernel ConvNets, due to more similar architectures.

Knowledge Distillation

Supervised Feature Selection with Neuron Evolution in Sparse Neural Networks

1 code implementation 10 Mar 2023 Zahra Atashgahi, Xuhao Zhang, Neil Kichler, Shiwei Liu, Lu Yin, Mykola Pechenizkiy, Raymond Veldhuis, Decebal Constantin Mocanu

Feature selection that selects an informative subset of variables from data not only enhances the model interpretability and performance but also alleviates the resource demands.

feature selection

Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!

1 code implementation 3 Mar 2023 Shiwei Liu, Tianlong Chen, Zhenyu Zhang, Xuxi Chen, Tianjin Huang, Ajay Jaiswal, Zhangyang Wang

In pursuit of a more general evaluation and to unveil the true potential of sparse algorithms, we introduce the "Sparsity May Cry" Benchmark (SMC-Bench), a collection of 4 carefully curated, diverse tasks with 10 datasets that captures a wide range of domain-specific and sophisticated knowledge.

Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers

1 code implementation 2 Mar 2023 Tianlong Chen, Zhenyu Zhang, Ajay Jaiswal, Shiwei Liu, Zhangyang Wang

Despite their remarkable achievement, gigantic transformers encounter significant drawbacks, including exorbitant computational and memory footprints during training, as well as severe collapse evidenced by a high degree of parameter redundancy.

Mixture-of-Experts

Ten Lessons We Have Learned in the New "Sparseland": A Short Handbook for Sparse Neural Network Researchers

no code implementations 6 Feb 2023 Shiwei Liu, Zhangyang Wang

In response, we summarize ten Q&As of SNNs from many key aspects, including dense vs. sparse, unstructured sparse vs. structured sparse, pruning vs. sparse training, dense-to-sparse training vs. sparse-to-sparse training, static sparsity vs. dynamic sparsity, before-training/during-training vs. post-training sparsity, and many more.

General Knowledge

Data Augmented Flatness-aware Gradient Projection for Continual Learning

no code implementations ICCV 2023 Enneng Yang, Li Shen, Zhenyi Wang, Shiwei Liu, Guibing Guo, Xingwei Wang

In this paper, we first revisit the gradient projection method from the perspective of flatness of loss surface, and find that unflatness of the loss surface leads to catastrophic forgetting of the old tasks when the projection constraint is reduced to improve the performance of new tasks.

Continual Learning

Dynamic Sparse Network for Time Series Classification: Learning What to "see"

1 code implementation 19 Dec 2022 Qiao Xiao, Boqian Wu, Yu Zhang, Shiwei Liu, Mykola Pechenizkiy, Elena Mocanu, Decebal Constantin Mocanu

The receptive field (RF), which determines the region of the time series to be "seen" and used, is critical for improving performance on time series classification (TSC).

Time Series Time Series Analysis +1

You Can Have Better Graph Neural Networks by Not Training Weights at All: Finding Untrained GNNs Tickets

1 code implementation 28 Nov 2022 Tianjin Huang, Tianlong Chen, Meng Fang, Vlado Menkovski, Jiaxu Zhao, Lu Yin, Yulong Pei, Decebal Constantin Mocanu, Zhangyang Wang, Mykola Pechenizkiy, Shiwei Liu

Recent works have impressively demonstrated that there exists a subnetwork in randomly initialized convolutional neural networks (CNNs) that can match the performance of the fully trained dense networks at initialization, without any optimization of the weights of the network (i.e., untrained networks).

All Out-of-Distribution Detection

Superposing Many Tickets into One: A Performance Booster for Sparse Neural Network Training

no code implementations 30 May 2022 Lu Yin, Vlado Menkovski, Meng Fang, Tianjin Huang, Yulong Pei, Mykola Pechenizkiy, Decebal Constantin Mocanu, Shiwei Liu

Recent works on sparse neural network training (sparse training) have shown that a compelling trade-off between performance and efficiency can be achieved by training intrinsically sparse neural networks from scratch.

Don't Be So Dense: Sparse-to-Sparse GAN Training Without Sacrificing Performance

no code implementations 5 Mar 2022 Shiwei Liu, Yuesong Tian, Tianlong Chen, Li Shen

Even more unconventionally, our proposed method enables directly training sparse unbalanced GANs with an extremely sparse generator from scratch.

Model Compression

The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training

1 code implementation ICLR 2022 Shiwei Liu, Tianlong Chen, Xiaohan Chen, Li Shen, Decebal Constantin Mocanu, Zhangyang Wang, Mykola Pechenizkiy

In this paper, we focus on sparse training and highlight a perhaps counter-intuitive finding, that random pruning at initialization can be quite powerful for the sparse training of modern neural networks.
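
The baseline highlighted above is nothing more elaborate than the sketch below: fix random binary masks at initialization at the desired sparsity and keep them fixed throughout training. The uniform layerwise ratio and the mask-reapplication hook are illustrative choices.

```python
import torch
import torch.nn as nn

def random_prune_at_init(model: nn.Module, sparsity=0.9):
    """Attach a fixed random mask to every 2D weight and zero the pruned entries.
    Re-applying the masks after each optimizer step keeps training sparse."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() == 2:
            masks[name] = (torch.rand_like(p) > sparsity).float()
            p.data.mul_(masks[name])
    return masks

def reapply_masks(model, masks):
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
masks = random_prune_at_init(model, sparsity=0.9)
# ...inside the training loop, call reapply_masks(model, masks) after optimizer.step()
```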

Adversarial Robustness Out-of-Distribution Detection

Achieving Personalized Federated Learning with Sparse Local Models

no code implementations 27 Jan 2022 Tiansheng Huang, Shiwei Liu, Li Shen, Fengxiang He, Weiwei Lin, DaCheng Tao

To counter this issue, personalized FL (PFL) was proposed to produce dedicated local models for each individual user.

Personalized Federated Learning

Sparse Unbalanced GAN Training with In-Time Over-Parameterization

no code implementations 29 Sep 2021 Shiwei Liu, Yuesong Tian, Tianlong Chen, Li Shen

Perhaps most importantly, we find instead of inheriting parameters from expensive pre-trained GANs, directly training sparse GANs from scratch can be a much more efficient solution.

Model Compression

On Heterogeneously Distributed Data, Sparsity Matters

no code implementations 29 Sep 2021 Tiansheng Huang, Shiwei Liu, Li Shen, Fengxiang He, Weiwei Lin, DaCheng Tao

Federated learning (FL) is particularly vulnerable to heterogeneously distributed data, since a common global model in FL may not adapt to the heterogeneous data distribution of each user.

Personalized Federated Learning

Hierarchical Semantic Segmentation using Psychometric Learning

no code implementations 7 Jul 2021 Lu Yin, Vlado Menkovski, Shiwei Liu, Mykola Pechenizkiy

One of the major challenges in the supervised learning approaches is expressing and collecting the rich knowledge that experts have with respect to the meaning present in the image data.

Image Segmentation Metric Learning +2

Sparse Training via Boosting Pruning Plasticity with Neuroregeneration

2 code implementations NeurIPS 2021 Shiwei Liu, Tianlong Chen, Xiaohan Chen, Zahra Atashgahi, Lu Yin, Huanyu Kou, Li Shen, Mykola Pechenizkiy, Zhangyang Wang, Decebal Constantin Mocanu

Work on the lottery ticket hypothesis (LTH) and single-shot network pruning (SNIP) has recently drawn considerable attention to post-training pruning (iterative magnitude pruning) and before-training pruning (pruning at initialization).

Network Pruning Sparse Learning

Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training

4 code implementations 4 Feb 2021 Shiwei Liu, Lu Yin, Decebal Constantin Mocanu, Mykola Pechenizkiy

By starting from a random sparse network and continuously exploring sparse connectivities during training, we can perform an Over-Parameterization in the space-time manifold, closing the gap in the expressibility between sparse training and dense training.
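
"Continuously exploring sparse connectivities during training" is the prune-and-regrow loop of dynamic sparse training; a minimal sketch follows. The magnitude-based drop, random regrowth, and the function name explore_connectivity are generic choices for illustration rather than the paper's exact settings.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def explore_connectivity(param: torch.Tensor, mask: torch.Tensor, drop_frac=0.3):
    """One prune-and-regrow step on a 2D weight: drop the smallest-magnitude
    active weights, regrow the same number of inactive connections at random,
    so total sparsity is preserved while the topology keeps moving."""
    active = mask.nonzero(as_tuple=False)
    inactive = (mask == 0).nonzero(as_tuple=False)
    n_swap = min(int(drop_frac * active.size(0)), inactive.size(0))
    if n_swap == 0:
        return mask
    magnitudes = param[mask.bool()].abs()
    drop = active[magnitudes.argsort()[:n_swap]]                 # weakest active weights
    grow = inactive[torch.randperm(inactive.size(0))[:n_swap]]   # random new connections
    mask[drop[:, 0], drop[:, 1]] = 0
    mask[grow[:, 0], grow[:, 1]] = 1
    param[grow[:, 0], grow[:, 1]] = 0.0                          # regrown weights start at zero
    param.mul_(mask)
    return mask

w = nn.Parameter(torch.randn(256, 256))
mask = (torch.rand_like(w) > 0.9).float()   # start ~90% sparse at random
w.data.mul_(mask)
# ...every few hundred training steps:
mask = explore_connectivity(w.data, mask)
```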

Image Classification Sparse Learning

Mitigating the effect of atmospheric turbulence on orbital angular momentum-based quantum key distribution using real-time adaptive optics with phase unwrapping

no code implementations 2 Feb 2021 Zhiwei Tao, Yichong Ren, Azezigul Abdukirim, Shiwei Liu, Ruizhong Rao

Quantum key distribution (QKD) employing orbital angular momentum (OAM) for high-dimensional encoding enhances system security and information capacity between two communicating parties.

Quantum Physics

Selfish Sparse RNN Training

1 code implementation 22 Jan 2021 Shiwei Liu, Decebal Constantin Mocanu, Yulong Pei, Mykola Pechenizkiy

Sparse neural networks have been widely applied to reduce the computational demands of training and deploying over-parameterized deep neural networks.

Topological Insights into Sparse Neural Networks

2 code implementations 24 Jun 2020 Shiwei Liu, Tim Van der Lee, Anil Yaman, Zahra Atashgahi, Davide Ferraro, Ghada Sokar, Mykola Pechenizkiy, Decebal Constantin Mocanu

However, comparing different sparse topologies and determining how sparse topologies evolve during training, especially when sparse structure optimization is involved, remain challenging open questions.

A Brain-inspired Algorithm for Training Highly Sparse Neural Networks

2 code implementations 17 Mar 2019 Zahra Atashgahi, Joost Pieterse, Shiwei Liu, Decebal Constantin Mocanu, Raymond Veldhuis, Mykola Pechenizkiy

Concretely, by exploiting the cosine similarity metric to measure the importance of connections, our proposed method, Cosine similarity-based and Random Topology Exploration (CTRE), evolves the topology of sparse neural networks by adding the most important connections to the network without calculating dense gradients in the backward pass.
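
The cosine-similarity rule can be sketched for one sparse layer: measure, over a batch, how strongly each input neuron's activation aligns with each output neuron's activation, and activate the missing connections with the highest |similarity|. Which activations the paper actually compares, and how many connections are added per step, are assumptions here.

```python
import torch

def add_connections_by_cosine(mask, pre_act, post_act, n_add=64):
    """mask: bool (out, in) current connectivity. pre_act: (batch, in) activations
    entering the layer; post_act: (batch, out) activations leaving it. Activate the
    n_add inactive connections whose input/output neurons are most cosine-aligned,
    a gradient-free guess at which new weights would be useful."""
    pre = torch.nn.functional.normalize(pre_act, dim=0)    # unit-normalize over the batch
    post = torch.nn.functional.normalize(post_act, dim=0)
    sim = (post.t() @ pre).abs()                           # (out, in) |cosine similarity|
    sim[mask] = -1.0                                       # ignore existing connections
    flat_idx = sim.flatten().topk(n_add).indices
    new_mask = mask.clone()
    new_mask.view(-1)[flat_idx] = True
    return new_mask

mask = torch.rand(128, 256) > 0.95                         # a very sparse layer
pre = torch.randn(512, 256)
post = torch.randn(512, 128)
mask = add_connections_by_cosine(mask, pre, post)
```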

Learning Theory

Sparse evolutionary Deep Learning with over one million artificial neurons on commodity hardware

4 code implementations 26 Jan 2019 Shiwei Liu, Decebal Constantin Mocanu, Amarsagar Reddy Ramapuram Matavalam, Yulong Pei, Mykola Pechenizkiy

Despite the success of ANNs, it is challenging to train and deploy modern ANNs on commodity hardware due to the ever-increasing model size and the unprecedented growth in the data volumes.

GPU

Intrinsically Sparse Long Short-Term Memory Networks

no code implementations 26 Jan 2019 Shiwei Liu, Decebal Constantin Mocanu, Mykola Pechenizkiy

However, LSTMs are prone to be memory-bandwidth limited in realistic applications and require prohibitively long training and inference times as the model size keeps increasing.

Model Compression Sentiment Analysis
