Search Results for author: Ajay Jaiswal

Found 24 papers, 12 papers with code

FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping

no code implementations · 5 Apr 2024 · Ajay Jaiswal, Bodun Hu, Lu Yin, Yeonju Ro, Shiwei Liu, Tianlong Chen, Aditya Akella

In this work, we observe the saturation of the computationally expensive feed-forward blocks across LLM layers and propose FFN-SkipLLM, a novel fine-grained skip strategy for autoregressive LLMs.

Attribute Hallucination +1
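
The paper's exact skipping criterion is not reproduced here; as a rough sketch of the general idea — marking a layer's feed-forward block as saturated when it barely changes the hidden state on a calibration prompt, then skipping it during decoding — one might write the following, where the layer attributes (`attn`, `ffn`, `norm1`, `norm2`) and the cosine-similarity threshold are assumptions, not the paper's method:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def calibrate_ffn_skips(layers, hidden, threshold=0.99):
    """Mark decoder layers whose FFN block is saturated on a calibration prompt.

    A layer is considered saturated (and hence skippable) when adding its FFN
    output barely moves the hidden state, measured by cosine similarity.
    """
    skip = []
    for layer in layers:
        hidden = hidden + layer.attn(layer.norm1(hidden))   # attention sub-block
        ffn_out = layer.ffn(layer.norm2(hidden))            # feed-forward sub-block
        updated = hidden + ffn_out
        sim = F.cosine_similarity(hidden, updated, dim=-1).mean().item()
        skip.append(sim > threshold)
        hidden = updated
    return skip

def decode_step(layers, hidden, skip):
    """One autoregressive decoding step that omits the FFN of saturated layers."""
    for layer, skip_ffn in zip(layers, skip):
        hidden = hidden + layer.attn(layer.norm1(hidden))
        if not skip_ffn:
            hidden = hidden + layer.ffn(layer.norm2(hidden))
    return hidden
```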

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

no code implementations · 18 Mar 2024 · Junyuan Hong, Jinhao Duan, Chenhui Zhang, Zhangheng Li, Chulin Xie, Kelsey Lieberman, James Diffenderfer, Brian Bartoldson, Ajay Jaiswal, Kaidi Xu, Bhavya Kailkhura, Dan Hendrycks, Dawn Song, Zhangyang Wang, Bo Li

While state-of-the-art (SoTA) compression methods boast impressive advancements in preserving benign task performance, the potential risks of compression in terms of safety and trustworthiness have been largely neglected.

Ethics Fairness +1

LLaGA: Large Language and Graph Assistant

2 code implementations · 13 Feb 2024 · Runjin Chen, Tong Zhao, Ajay Jaiswal, Neil Shah, Zhangyang Wang

Graph Neural Networks (GNNs) have empowered advances in graph-structured data analysis.

Compressing LLMs: The Truth is Rarely Pure and Never Simple

1 code implementation · 2 Oct 2023 · Ajay Jaiswal, Zhe Gan, Xianzhi Du, BoWen Zhang, Zhangyang Wang, Yinfei Yang

Recently, several works have shown significant success in training-free and data-free compression (pruning and quantization) of LLMs, achieving 50-60% sparsity and reducing the bit width to 3 or 4 bits per weight with negligible perplexity degradation over the uncompressed baseline.

Quantization Retrieval
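
The paper scrutinizes existing training-free compression methods rather than proposing a new one; for readers unfamiliar with the setup, a minimal data-free, per-channel round-to-nearest weight quantizer (a simplification of the RTN-style baselines being evaluated; the function name is mine) looks like this:

```python
import torch

def quantize_rtn(weight: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Data-free, per-output-channel round-to-nearest fake quantization.

    Each row is scaled onto the signed integer grid for the given bit width,
    rounded, and dequantized; the returned tensor is used to measure the
    perplexity impact of the reduced bit width.
    """
    qmax = 2 ** (bits - 1) - 1                               # e.g. 7 for 4-bit signed
    scale = weight.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
    return q * scale

# Example: fake-quantize one linear layer's weights to 4 bits.
w = torch.randn(4096, 4096)
w_q = quantize_rtn(w, bits=4)
print((w - w_q).abs().mean())   # average quantization error
```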

Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs

1 code implementation · 29 Sep 2023 · Lu Yin, Ajay Jaiswal, Shiwei Liu, Souvik Kundu, Zhangyang Wang

Contrary to this belief, this paper presents a counter-argument: small-magnitude weights of pre-trained models encode vital knowledge essential for tackling difficult downstream tasks, manifested as a monotonic relationship between task difficulty and the performance drop observed as more pre-trained weights are pruned by magnitude.

Quantization
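
As background for the setup studied above, one-shot magnitude pruning of a pre-trained weight tensor — removing the smallest-magnitude fraction of weights — can be sketched as follows (the helper name is mine; the paper's experiments also cover more elaborate pruning schedules):

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight.clone()
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    return weight * mask

# Example: prune 60% of a layer's weights and check the achieved sparsity.
w = torch.randn(1024, 1024)
w_sparse = magnitude_prune(w, sparsity=0.6)
print((w_sparse == 0).float().mean())   # ~0.6
```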

Physics-Driven Turbulence Image Restoration with Stochastic Refinement

1 code implementation · ICCV 2023 · Ajay Jaiswal, Xingguang Zhang, Stanley H. Chan, Zhangyang Wang

Although fast, physics-grounded simulation tools have recently been introduced to help deep-learning models adapt to real-world turbulence conditions, the training of such models relies only on synthetic data and ground-truth pairs.

Image Restoration

Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models

1 code implementation · 18 Jun 2023 · Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Ying Ding, Zhangyang Wang

Motivated by recent observations on model soups, which suggest that the fine-tuned weights of multiple models can be merged into a better minimum, we propose Instant Soup Pruning (ISP) to generate lottery-ticket-quality subnetworks at a fraction of the original IMP cost, by replacing the expensive intermediate pruning stages of IMP with a computationally efficient weak mask generation and aggregation routine.
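
The full ISP routine is in the paper's repository; the core "weak mask generation and aggregation" step might be sketched as below, where several cheap weight snapshots each vote for the weights they would keep and the majority forms the final mask (the snapshot construction and voting rule here are illustrative assumptions):

```python
import torch

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Binary keep-mask retaining the largest-magnitude (1 - sparsity) weights."""
    k = max(int(weight.numel() * sparsity), 1)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

def soup_mask(snapshots, sparsity: float) -> torch.Tensor:
    """Aggregate cheap "weak" masks from several weight snapshots by majority vote."""
    votes = torch.stack([magnitude_mask(w, sparsity) for w in snapshots]).sum(dim=0)
    return (votes >= (len(snapshots) + 1) // 2).float()

# Example: three lightly perturbed snapshots of one layer vote on a 50% mask.
base = torch.randn(512, 512)
snapshots = [base + 0.01 * torch.randn_like(base) for _ in range(3)]
print(soup_mask(snapshots, sparsity=0.5).mean())   # fraction of weights kept
```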

Graph Ladling: Shockingly Simple Parallel GNN Training without Intermediate Communication

1 code implementation · 18 Jun 2023 · Ajay Jaiswal, Shiwei Liu, Tianlong Chen, Ying Ding, Zhangyang Wang

By dividing giant graph data, we build multiple weaker GNNs (soup ingredients) trained independently and in parallel, without any intermediate communication, and combine their strengths using a greedy interpolation soup procedure to achieve state-of-the-art performance.

Graph Partitioning Graph Sampling
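
The greedy interpolation soup step can be sketched as follows: candidate GNNs trained in parallel on different partitions are folded into a running weight average only when they improve validation accuracy (`ingredients` as a list of state dicts and the `evaluate` callable are placeholders):

```python
import copy

def greedy_interpolation_soup(ingredients, evaluate):
    """Greedily average independently trained GNN "ingredients" into one model.

    `ingredients` is a list of state_dicts (ideally sorted by validation
    accuracy); `evaluate` maps a state_dict to validation accuracy. A
    candidate is kept only if interpolating it in does not hurt the soup.
    """
    soup = copy.deepcopy(ingredients[0])
    n_in_soup, best_acc = 1, evaluate(soup)
    for candidate in ingredients[1:]:
        trial = {k: (soup[k] * n_in_soup + candidate[k]) / (n_in_soup + 1)
                 for k in soup}
        acc = evaluate(trial)
        if acc >= best_acc:                 # keep the candidate only if it helps
            soup, best_acc, n_in_soup = trial, acc, n_in_soup + 1
    return soup
```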

CancerGPT: Few-shot Drug Pair Synergy Prediction using Large Pre-trained Language Models

no code implementations · 18 Apr 2023 · TianHao Li, Sandesh Shetty, Advaith Kamath, Ajay Jaiswal, Xianqian Jiang, Ying Ding, Yejin Kim

Large pre-trained language models (LLMs) have been shown to have significant potential in few-shot learning across various fields, even with minimal training data.

Few-Shot Learning

Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!

1 code implementation · 3 Mar 2023 · Shiwei Liu, Tianlong Chen, Zhenyu Zhang, Xuxi Chen, Tianjin Huang, Ajay Jaiswal, Zhangyang Wang

In pursuit of a more general evaluation and to unveil the true potential of sparse algorithms, we introduce the "Sparsity May Cry" Benchmark (SMC-Bench), a collection of 4 carefully curated, diverse tasks with 10 datasets, designed to capture a wide range of domain-specific and sophisticated knowledge.

Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers

1 code implementation · 2 Mar 2023 · Tianlong Chen, Zhenyu Zhang, Ajay Jaiswal, Shiwei Liu, Zhangyang Wang

Despite their remarkable achievements, gigantic transformers encounter significant drawbacks, including exorbitant computational and memory footprints during training, as well as severe collapse evidenced by a high degree of parameter redundancy.

RoS-KD: A Robust Stochastic Knowledge Distillation Approach for Noisy Medical Imaging

no code implementations · 15 Oct 2022 · Ajay Jaiswal, Kumar Ashutosh, Justin F Rousseau, Yifan Peng, Zhangyang Wang, Ying Ding

Our extensive experiments on popular medical imaging classification tasks (cardiopulmonary disease and lesion classification) using real-world datasets show the performance benefit of RoS-KD, its ability to distill knowledge from several popular large networks (ResNet-50, DenseNet-121, MobileNet-V2) into a comparatively small network, and its robustness to adversarial attacks (PGD, FGSM).

Classification Knowledge Distillation +1
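
RoS-KD's smoothed-supervision construction is specific to the paper; as a generic illustration of distilling several large teachers into one small student, a standard multi-teacher distillation loss (averaging the teachers' temperature-softened predictions, which is my simplification rather than the RoS-KD objective) is:

```python
import torch
import torch.nn.functional as F

def multi_teacher_kd_loss(student_logits, teacher_logits_list, labels,
                          temperature=4.0, alpha=0.7):
    """Distill an ensemble of teachers into a small student.

    The soft target is the mean of the teachers' temperature-softened
    predictions; the total loss mixes the distillation KL term with the
    usual cross-entropy on the ground-truth labels.
    """
    t = temperature
    soft_targets = torch.stack(
        [F.softmax(tl / t, dim=-1) for tl in teacher_logits_list]
    ).mean(dim=0)
    kd = F.kl_div(F.log_softmax(student_logits / t, dim=-1),
                  soft_targets, reduction="batchmean") * (t * t)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```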

Old can be Gold: Better Gradient Flow can Make Vanilla-GCNs Great Again

1 code implementation · 14 Oct 2022 · Ajay Jaiswal, Peihao Wang, Tianlong Chen, Justin F. Rousseau, Ying Ding, Zhangyang Wang

In this paper, we first provide a new gradient-flow perspective to understand the substandard performance of deep GCNs and hypothesize that, by facilitating healthy gradient flow, we can significantly improve their trainability as well as achieve state-of-the-art (SOTA) performance from vanilla GCNs.
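
One simple way to inspect the gradient flow the paper analyzes is to log per-layer gradient norms of a deep GCN after a backward pass; vanishing norms in early layers are the symptom of poor trainability (`model` and `loss` here are placeholders):

```python
def layerwise_gradient_norms(model, loss):
    """Return the L2 gradient norm of every parameter after backpropagation."""
    model.zero_grad()
    loss.backward()
    return {name: p.grad.norm().item()
            for name, p in model.named_parameters()
            if p.grad is not None}
```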

Single Frame Atmospheric Turbulence Mitigation: A Benchmark Study and A New Physics-Inspired Transformer Model

1 code implementation · 20 Jul 2022 · Zhiyuan Mao, Ajay Jaiswal, Zhangyang Wang, Stanley H. Chan

Image restoration algorithms for atmospheric turbulence are known to be much more challenging to design than those for traditional degradations such as blur or noise, because the distortion caused by turbulence is an entanglement of spatially varying blur, geometric distortion, and sensor noise.

Image Restoration SSIM

Training Your Sparse Neural Network Better with Any Mask

1 code implementation · 26 Jun 2022 · Ajay Jaiswal, Haoyu Ma, Tianlong Chen, Ying Ding, Zhangyang Wang

Pruning large neural networks to create high-quality, independently trainable sparse masks, which can maintain similar performance to their dense counterparts, is very desirable due to the reduced space and time complexity.
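
In this setting, "training with any mask" amounts to keeping a fixed binary mask applied to the weights throughout optimization; a minimal sketch of that baseline (the paper's recipe adds further training techniques on top) is:

```python
import torch

def apply_masks(model, masks):
    """Re-apply fixed binary masks after each optimizer step so that pruned
    connections stay exactly zero during sparse training."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])

# Inside a training loop (model, masks, optimizer, loss are placeholders):
#   loss.backward()
#   optimizer.step()
#   apply_masks(model, masks)
```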

RadBERT-CL: Factually-Aware Contrastive Learning For Radiology Report Classification

no code implementations · 28 Oct 2021 · Ajay Jaiswal, Liyan Tang, Meheli Ghosh, Justin Rousseau, Yifan Peng, Ying Ding

Radiology reports are unstructured and contain the imaging findings and corresponding diagnoses transcribed by radiologists, which include clinical facts as well as negated and/or uncertain statements.

Classification Contrastive Learning

SCALP -- Supervised Contrastive Learning for Cardiopulmonary Disease Classification and Localization in Chest X-rays using Patient Metadata

no code implementations · 27 Oct 2021 · Ajay Jaiswal, TianHao Li, Cyprian Zander, Yan Han, Justin F. Rousseau, Yifan Peng, Ying Ding

In this paper, we propose a novel and simple data augmentation method based on patient metadata and supervised knowledge to create clinically accurate positive and negative augmentations for chest X-rays.

Contrastive Learning Data Augmentation

Solomon at SemEval-2020 Task 11: Ensemble Architecture for Fine-Tuned Propaganda Detection in News Articles

no code implementations · SEMEVAL 2020 · Mayank Raj, Ajay Jaiswal, Rohit R. R, Ankita Gupta, Sudeep Kumar Sahoo, Vertika Srivastava, Yeon Hyang Kim

This paper describes the details of our system (Solomon) and the results of our participation in SemEval 2020 Task 11, "Detection of Propaganda Techniques in News Articles" (Da San Martino et al., 2020).

Classification General Classification +2
