no code implementations • 31 Oct 2024 • Shuyang Yu, Runxue Bao, Parminder Bhatia, Taha Kass-Hout, Jiayu Zhou, Cao Xiao
Large language models (LLMs) can learn vast amounts of knowledge from diverse domains during pre-training.
1 code implementation • 3 Oct 2024 • Guodong Du, Junlin Lee, Jing Li, Runhua Jiang, Yifei Guo, Shuyang Yu, Hanting Liu, Sim Kuan Goh, Ho-Kin Tang, Daojing He, Min Zhang
Recently developed model merging techniques enable the direct integration of multiple models, each fine-tuned for distinct tasks, into a single model.
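As a rough illustration of the general idea (not the specific merging algorithm developed in this work), weight-space merging can be as simple as averaging the task vectors of several fine-tuned checkpoints; the checkpoint names and the scaling factor below are hypothetical:

```python
# Minimal sketch of weight-space model merging via averaged task vectors.
# Assumes all checkpoints share the same architecture and parameter names.
import torch

def merge_state_dicts(base_sd, finetuned_sds, alpha=1.0):
    """Merge fine-tuned models by adding the averaged task vectors
    (finetuned - base) back onto the base weights."""
    merged = {}
    for name, base_param in base_sd.items():
        task_vectors = [sd[name] - base_param for sd in finetuned_sds]
        merged[name] = base_param + alpha * torch.stack(task_vectors).mean(dim=0)
    return merged

# Hypothetical usage:
# base = torch.load("base_model.pt")
# experts = [torch.load(p) for p in ["math_expert.pt", "code_expert.pt"]]
# merged = merge_state_dicts(base, experts, alpha=0.5)
```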
1 code implementation • 18 Jun 2024 • Guodong Du, Jing Li, Hanting Liu, Runhua Jiang, Shuyang Yu, Yifei Guo, Sim Kuan Goh, Ho-Kin Tang
Fine-tuning pre-trained language models, particularly large language models, demands extensive computing resources and can result in varying performance outcomes across different domains and datasets.
1 code implementation • 4 Jun 2024 • Runhua Jiang, Guodong Du, Shuyang Yu, Yifei Guo, Sim Kuan Goh, Ho-Kin Tang
This paper tackles these challenges by introducing Cosine Annealing Differential Evolution (CADE), which modulates the mutation factor (F) and crossover rate (CR) of differential evolution (DE) for the SNN model, i.e., Spiking Element Wise (SEW) ResNet.
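A minimal sketch of the cosine-annealing idea, assuming F and CR are decayed from an upper to a lower bound over the generations; the bounds and the exact schedule here are illustrative, not the paper's settings:

```python
import math

def cosine_anneal(t, T, v_max, v_min):
    """Cosine-annealed value at generation t out of T generations."""
    return v_min + 0.5 * (v_max - v_min) * (1 + math.cos(math.pi * t / T))

def de_parameters(t, T, f_max=0.9, f_min=0.1, cr_max=0.9, cr_min=0.1):
    """Hypothetical schedule for DE's mutation factor F and crossover rate CR."""
    F = cosine_anneal(t, T, f_max, f_min)
    CR = cosine_anneal(t, T, cr_max, cr_min)
    return F, CR
```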
no code implementations • 7 May 2024 • Yijiang Pang, Shuyang Yu, Bao Hoang, Jiayu Zhou
To tackle this challenge, in this paper, we propose a novel parameter-free optimizer, AdamG (Adam with the golden step size), designed to automatically adapt to diverse optimization problems without manual tuning.
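For context, the sketch below is the standard Adam update that such an optimizer builds on; the golden step-size rule that would supply `lr` automatically is the paper's contribution and is not reproduced here:

```python
import torch

def adam_step(param, grad, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update; a parameter-free variant must determine `lr`
    automatically instead of requiring manual tuning."""
    m.mul_(beta1).add_(grad, alpha=1 - beta1)          # first-moment estimate
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                        # bias correction
    v_hat = v / (1 - beta2 ** t)
    param.sub_(lr * m_hat / (v_hat.sqrt() + eps))
    return param, m, v
```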
1 code implementation • 4 Sep 2023 • Shuyang Yu, Junyuan Hong, Haobo Zhang, Haotao Wang, Zhangyang Wang, Jiayu Zhou
Training a high-performance deep neural network requires large amounts of data and computational resources.
1 code implementation • 4 Jun 2023 • Junyuan Hong, Yi Zeng, Shuyang Yu, Lingjuan Lyu, Ruoxi Jia, Jiayu Zhou
Data-free knowledge distillation (KD) helps transfer knowledge from a pre-trained model (known as the teacher model) to a smaller model (known as the student model) without access to the original training data used for training the teacher model.
Backdoor Defense for Data-Free Distillation with Poisoned Teachers
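A minimal sketch of generic data-free distillation, where synthetic inputs from a generator stand in for the original training data; this illustrates the setting only, not the backdoor defense proposed in the paper (all module names are placeholders):

```python
import torch
import torch.nn.functional as F

def data_free_kd_step(generator, teacher, student, optimizer,
                      batch_size=64, z_dim=100, temperature=4.0):
    """One distillation step on synthetic inputs; no real training data is used."""
    z = torch.randn(batch_size, z_dim)
    x_syn = generator(z)                    # synthesize pseudo-inputs from noise
    with torch.no_grad():
        t_logits = teacher(x_syn)           # teacher predictions on synthetic data
    s_logits = student(x_syn)
    # Student matches the teacher's softened output distribution.
    loss = F.kl_div(F.log_softmax(s_logits / temperature, dim=1),
                    F.softmax(t_logits / temperature, dim=1),
                    reduction="batchmean") * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```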
1 code implementation • ICLR 2023 • Shuyang Yu, Junyuan Hong, Haotao Wang, Zhangyang Wang, Jiayu Zhou
We propose to take advantage of such heterogeneity and turn the curse into a blessing that facilitates OoD detection in FL.
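For illustration, a generic energy-based OoD score of the kind commonly used in this setting; it is not the FL-specific detector proposed in the paper:

```python
import torch

def energy_ood_score(logits, temperature=1.0):
    """Generic energy-based OoD score: lower energy indicates a more
    in-distribution sample."""
    return -temperature * torch.logsumexp(logits / temperature, dim=1)

# A sample is flagged OoD when its energy exceeds a threshold calibrated on
# in-distribution validation data, e.g. held locally by each client:
# is_ood = energy_ood_score(model(x)) > threshold
```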
1 code implementation • 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD 2021) • Junyuan Hong, Zhuangdi Zhu, Shuyang Yu, Zhangyang Wang, Hiroko Dodge, Jiayu Zhou
While adversarial learning is commonly used in centralized learning for mitigating bias, there are significant barriers when extending it to the federated framework.
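A minimal sketch of the centralized adversarial-debiasing pattern referenced here, in which a gradient-reversal adversary tries to predict the sensitive attribute from learned features; the federated extension developed in the paper is not shown, and all module names are placeholders:

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer used in adversarial debiasing."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the gradient flowing back into the encoder.
        return -ctx.lambd * grad_output, None

def debias_forward(encoder, task_head, adversary, x, lambd=1.0):
    """The adversary predicts the sensitive attribute from reversed features,
    pushing the encoder toward representations that hide that attribute."""
    features = encoder(x)
    task_logits = task_head(features)
    adv_logits = adversary(GradReverse.apply(features, lambd))
    return task_logits, adv_logits
```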