Search Results for author: Peijie Dong

Found 22 papers, 11 papers with code

Can LLMs Maintain Fundamental Abilities under KV Cache Compression?

no code implementations4 Feb 2025 Xiang Liu, Zhenheng Tang, Hong Chen, Peijie Dong, Zeyu Li, Xiuze Zhou, Bo Li, Xuming Hu, Xiaowen Chu

We present a comprehensive empirical study evaluating prominent KV cache compression methods across diverse tasks, spanning world knowledge, commonsense reasoning, arithmetic reasoning, code generation, safety, and long-context understanding and generation. Our analysis reveals that KV cache compression methods exhibit task-specific performance degradation.

FuseFL: One-Shot Federated Learning through the Lens of Causality with Progressive Model Fusion

1 code implementation27 Oct 2024 Zhenheng Tang, Yonggang Zhang, Peijie Dong, Yiu-ming Cheung, Amelie Chi Zhou, Bo Han, Xiaowen Chu

In this work, we provide a causal view and find that this performance drop of OFL methods stems from the isolation problem: locally trained models in OFL are learned in isolation and therefore easily fit spurious correlations arising from data heterogeneity.

Federated Learning

Should We Really Edit Language Models? On the Evaluation of Edited Language Models

1 code implementation24 Oct 2024 Qi Li, Xiang Liu, Zhenheng Tang, Peijie Dong, Zeyu Li, Xinglin Pan, Xiaowen Chu

Our findings indicate that current editing methods are only suitable for small-scale knowledge updates within language models, which motivates further research on more practical and reliable editing methods.

General Knowledge · Model Editing

LPZero: Language Model Zero-cost Proxy Search from Zero

no code implementations7 Oct 2024 Peijie Dong, Lujun Li, Xiang Liu, Zhenheng Tang, Xuebo Liu, Qiang Wang, Xiaowen Chu

Specifically, we model the ZC proxy as a symbolic equation and incorporate a unified proxy search space that encompasses existing ZC proxies, which are composed of a predefined set of mathematical symbols.

Language Modeling · Language Modelling +1
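
A minimal sketch of the idea described in the LPZero entry above: a zero-cost (ZC) proxy treated as a symbolic expression composed from a small set of mathematical symbols and evaluated on cheap per-layer statistics. The candidate symbol set and the statistics used here are illustrative assumptions, not the paper's actual search space.

```python
# Illustrative sketch only: a ZC proxy as a symbolic equation over network statistics.
import torch
import torch.nn as nn

# Candidate unary/binary symbols a proxy equation could be composed from (assumed set).
UNARY = {"abs": torch.abs, "log1p": torch.log1p, "sqrt": lambda x: torch.sqrt(torch.abs(x))}
BINARY = {"add": torch.add, "mul": torch.mul}

def layer_statistics(layer: nn.Linear) -> dict:
    """Cheap per-layer statistics a ZC proxy might consume."""
    return {
        "weight": layer.weight.detach(),
        "grad": layer.weight.grad.detach() if layer.weight.grad is not None
                else torch.zeros_like(layer.weight),
    }

def example_proxy(stats: dict) -> torch.Tensor:
    """One point in the symbolic search space: sum(abs(weight) * abs(grad))."""
    expr = BINARY["mul"](UNARY["abs"](stats["weight"]), UNARY["abs"](stats["grad"]))
    return expr.sum()

# Score a toy model with no training beyond a single backward pass.
model = nn.Linear(16, 4)
loss = model(torch.randn(8, 16)).pow(2).mean()
loss.backward()
print(float(example_proxy(layer_statistics(model))))
```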

LongGenBench: Long-context Generation Benchmark

1 code implementation5 Oct 2024 Xiang Liu, Peijie Dong, Xuming Hu, Xiaowen Chu

Current long-context benchmarks primarily focus on retrieval-based tests, requiring Large Language Models (LLMs) to locate specific information within extensive input contexts, such as the needle-in-a-haystack (NIAH) benchmark.

Language Modelling · Retrieval

Multi-Task Domain Adaptation for Language Grounding with 3D Objects

no code implementations3 Jul 2024 Penglei Sun, Yaoxian Song, Xinglin Pan, Peijie Dong, Xiaofei Yang, Qiang Wang, Zhixu Li, Tiefeng Li, Xiaowen Chu

However, they have not explored cross-modal representations for language-vision alignment in the cross-domain setting.

Domain Adaptation · Multi-Task Learning

Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for Large Language Models

1 code implementation5 Jun 2024 Peijie Dong, Lujun Li, Zhenheng Tang, Xiang Liu, Xinglin Pan, Qiang Wang, Xiaowen Chu

In particular, we devise an elaborate search space encompassing the existing pruning metrics to discover the potential symbolic pruning metric.

Diversity · Language Modeling +1
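
As a rough illustration of the Pruner-Zero entry above, the sketch below applies one hypothetical symbolic pruning metric, |weight| × |gradient| (chosen for illustration only, not necessarily the metric the paper discovers), to unstructured pruning of a single linear layer.

```python
# Illustrative sketch: score weights with an assumed symbolic metric, zero out the lowest.
import torch
import torch.nn as nn

def prune_by_symbolic_metric(layer: nn.Linear, sparsity: float = 0.5) -> None:
    grad = layer.weight.grad
    assert grad is not None, "run a backward pass first so gradients exist"
    score = layer.weight.detach().abs() * grad.detach().abs()  # assumed symbolic metric
    k = int(score.numel() * sparsity)
    threshold = score.flatten().kthvalue(k).values  # k-th smallest score
    mask = (score > threshold).float()
    with torch.no_grad():
        layer.weight.mul_(mask)  # zero out low-scoring weights

layer = nn.Linear(32, 8)
layer(torch.randn(4, 32)).sum().backward()
prune_by_symbolic_metric(layer, sparsity=0.5)
print((layer.weight == 0).float().mean())  # roughly 0.5 sparsity
```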

VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting

1 code implementation25 Mar 2024 Yujin Tang, Peijie Dong, Zhenheng Tang, Xiaowen Chu, Junwei Liang

Combining CNNs or ViTs with RNNs for spatiotemporal forecasting has yielded unparalleled results in predicting temporal and spatial dynamics.

Mamba

ParZC: Parametric Zero-Cost Proxies for Efficient NAS

no code implementations3 Feb 2024 Peijie Dong, Lujun Li, Xinglin Pan, Zimian Wei, Xiang Liu, Qiang Wang, Xiaowen Chu

Recent advancements in Zero-shot Neural Architecture Search (NAS) highlight the efficacy of zero-cost proxies in various NAS benchmarks.

Neural Architecture Search

TVT: Training-Free Vision Transformer Search on Tiny Datasets

no code implementations24 Nov 2023 Zimian Wei, Hengyue Pan, Lujun Li, Peijie Dong, Zhiliang Tian, Xin Niu, Dongsheng Li

In this paper, for the first time, we investigate how to search in a training-free manner with the help of teacher models and devise an effective Training-free ViT (TVT) search framework.

Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models

no code implementations7 Nov 2023 Longteng Zhang, Xiang Liu, Zeyu Li, Xinglin Pan, Peijie Dong, Ruibo Fan, Rui Guo, Xin Wang, Qiong Luo, Shaohuai Shi, Xiaowen Chu

For end users, our benchmark and findings help them better understand different optimization techniques, training and inference frameworks, and hardware platforms when choosing configurations for deploying LLMs.

Quantization

EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization

1 code implementation ICCV 2023 Peijie Dong, Lujun Li, Zimian Wei, Xin Niu, Zhiliang Tian, Hengyue Pan

In particular, we devise an elaborate search space involving the existing proxies and perform an evolution search to discover the best correlated MQ proxy.

Quantization
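
A minimal sketch of the fitness signal an evolutionary proxy search such as the one in the EMQ entry above could use: rank correlation (Kendall's tau here) between a candidate proxy's scores and the measured accuracy of quantized configurations. The numbers below are placeholders, not results from the paper.

```python
# Illustrative sketch: rate a candidate MQ proxy by how well it ranks configurations.
from scipy.stats import kendalltau

def proxy_fitness(proxy_scores, measured_accuracies):
    """Higher Kendall tau means the proxy ranks bit-width configs more faithfully."""
    tau, _ = kendalltau(proxy_scores, measured_accuracies)
    return tau

# Hypothetical scores for five mixed-precision configurations.
proxy_scores = [0.12, 0.45, 0.30, 0.80, 0.55]
measured_acc = [61.2, 68.4, 65.1, 71.9, 69.0]
print(proxy_fitness(proxy_scores, measured_acc))
```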

DisWOT: Student Architecture Search for Distillation WithOut Training

1 code implementation CVPR 2023 Peijie Dong, Lujun Li, Zimian Wei

In this way, our student architecture search for Distillation WithOut Training (DisWOT) significantly improves the performance of the model in the distillation stage with at least 180× training acceleration.

Knowledge Distillation

Progressive Meta-Pooling Learning for Lightweight Image Classification Model

no code implementations24 Jan 2023 Peijie Dong, Xin Niu, Zhiliang Tian, Lujun Li, Xiaodong Wang, Zimian Wei, Hengyue Pan, Dongsheng Li

Practical networks for edge devices adopt shallow depth and small convolutional kernels to save memory and computational cost, which leads to a restricted receptive field.

Classification · Image Classification

RD-NAS: Enhancing One-shot Supernet Ranking Ability via Ranking Distillation from Zero-cost Proxies

1 code implementation24 Jan 2023 Peijie Dong, Xin Niu, Lujun Li, Zhiliang Tian, Xiaodong Wang, Zimian Wei, Hengyue Pan, Dongsheng Li

In this paper, we propose Ranking Distillation one-shot NAS (RD-NAS) to enhance ranking consistency, which utilizes zero-cost proxies as the cheap teacher and adopts the margin ranking loss to distill the ranking knowledge.

Computational Efficiency · Neural Architecture Search
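
In the spirit of the RD-NAS entry above, the sketch below distills ranking knowledge with PyTorch's margin ranking loss, using zero-cost proxy scores as the cheap teacher. The scores are random placeholders; in practice they would come from proxy evaluations and the one-shot supernet.

```python
# Illustrative sketch: margin ranking loss over all pairs of candidate architectures.
import torch
import torch.nn as nn

margin_loss = nn.MarginRankingLoss(margin=0.1)

teacher = torch.tensor([0.9, 0.4, 0.7, 0.2])  # zero-cost proxy scores (cheap teacher)
student = torch.randn(4, requires_grad=True)  # supernet-derived scores (student)

# Compare every pair; target is +1 when the teacher ranks the first candidate higher.
i, j = torch.triu_indices(4, 4, offset=1)
target = (teacher[i] > teacher[j]).float() * 2 - 1
loss = margin_loss(student[i], student[j], target)
loss.backward()  # gradients push the student toward the teacher's ranking
print(float(loss))
```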

Prior-Guided One-shot Neural Architecture Search

1 code implementation27 Jun 2022 Peijie Dong, Xin Niu, Lujun Li, Linzhen Xie, Wenbin Zou, Tian Ye, Zimian Wei, Hengyue Pan

In this paper, we present Prior-Guided One-shot NAS (PGONAS) to strengthen the ranking correlation of supernets.

Neural Architecture Search
