1 code implementation • 22 Feb 2024 • Xuxi Chen, Zhendong Wang, Daouda Sow, Junjie Yang, Tianlong Chen, Yingbin Liang, Mingyuan Zhou, Zhangyang Wang
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets, with a specific focus on selective retention of samples that incur moderately high losses.
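As a rough illustration of the selection idea, the sketch below keeps only samples whose per-sample loss falls in a moderately high quantile band. It assumes a generic PyTorch model that returns next-token logits and a dataloader yielding (input_ids, labels); the quantile thresholds are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def per_sample_losses(model, loader, device="cpu"):
    """Mean token-level cross-entropy per sample (model returns (B, T, V) logits)."""
    model.eval()
    losses = []
    for input_ids, labels in loader:
        input_ids, labels = input_ids.to(device), labels.to(device)
        logits = model(input_ids)
        token_loss = F.cross_entropy(
            logits.view(-1, logits.size(-1)), labels.view(-1), reduction="none"
        ).view(labels.shape)
        losses.append(token_loss.mean(dim=1).cpu())
    return torch.cat(losses)

def moderately_high_loss_mask(losses, low_q=0.5, high_q=0.9):
    """Keep samples whose loss sits in a moderately high band (quantiles are illustrative)."""
    lo, hi = torch.quantile(losses, low_q), torch.quantile(losses, high_q)
    return (losses >= lo) & (losses <= hi)
```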
1 code implementation • 3 Dec 2023 • Junjie Yang, Tianlong Chen, Xuxi Chen, Zhangyang Wang, Yingbin Liang
Based on that analysis, we further propose a new raw gradient descent (RGD) algorithm that eliminates the use of the sign operation.
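For context, the contrast between a sign-based update and the raw-gradient update can be shown in a few lines of PyTorch; this is a generic sketch of the two update rules, not the paper's full algorithm.

```python
import torch

def sign_step(params, lr):
    """Sign-based update: each coordinate moves by ±lr, ignoring gradient magnitude."""
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p -= lr * p.grad.sign()

def raw_gd_step(params, lr):
    """Raw gradient descent: the update uses the unmodified gradient, no sign operation."""
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p -= lr * p.grad
```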
no code implementations • 18 Nov 2023 • Arindam Mitra, Luciano del Corro, Shweti Mahajan, Andres Codas, Clarisse Simoes, Sahaj Agarwal, Xuxi Chen, Anastasia Razdaibiedina, Erik Jones, Kriti Aggarwal, Hamid Palangi, Guoqing Zheng, Corby Rosset, Hamed Khanpour, Ahmed Awadallah
Research on training small LMs has often relied on imitation learning to replicate the output of more capable models.
Ranked #1 on Crass AI on BIG-bench
no code implementations • 10 Oct 2023 • Xuxi Chen, Yu Yang, Zhangyang Wang, Baharan Mirzasoleiman
Dataset distillation aims to minimize the time and memory needed for training deep networks on large datasets, by creating a small set of synthetic images that has a similar generalization performance to that of the full dataset.
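One common formulation of this setup is gradient matching, where the synthetic images are optimized to induce gradients similar to those of real data. The sketch below is a generic illustration of that idea, not this paper's specific method; it assumes `syn_x` is a leaf tensor with `requires_grad=True` registered in the optimizer `syn_opt`.

```python
import torch
import torch.nn.functional as F

def gradient_matching_step(model, syn_x, syn_y, real_x, real_y, syn_opt):
    """Update the synthetic images so their training gradients resemble real-data gradients."""
    real_grads = torch.autograd.grad(
        F.cross_entropy(model(real_x), real_y), model.parameters())
    syn_grads = torch.autograd.grad(
        F.cross_entropy(model(syn_x), syn_y), model.parameters(), create_graph=True)
    # cosine distance between the two gradient sets, layer by layer
    loss = sum(1 - F.cosine_similarity(s.flatten(), r.flatten(), dim=0)
               for s, r in zip(syn_grads, real_grads))
    syn_opt.zero_grad()
    loss.backward()          # backpropagates through syn_grads into syn_x
    syn_opt.step()
```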
1 code implementation • 3 Mar 2023 • Shiwei Liu, Tianlong Chen, Zhenyu Zhang, Xuxi Chen, Tianjin Huang, Ajay Jaiswal, Zhangyang Wang
In pursuit of a more general evaluation and unveiling the true potential of sparse algorithms, we introduce the "Sparsity May Cry" Benchmark (SMC-Bench), a collection of four carefully curated, diverse tasks with ten datasets, chosen to capture a wide range of domain-specific and sophisticated knowledge.
1 code implementation • 28 Feb 2023 • Junjie Yang, Xuxi Chen, Tianlong Chen, Zhangyang Wang, Yingbin Liang
This data-driven procedure yields an L2O model that can efficiently solve problems similar to those seen in training, that is, drawn from the same "task distribution".
1 code implementation • ICCV 2023 • Tianlong Chen, Xuxi Chen, Xianzhi Du, Abdullah Rashwan, Fan Yang, Huizhong Chen, Zhangyang Wang, Yeqing Li
Instead of compressing multiple tasks' knowledge into a single model, MoE separates the parameter space and only utilizes the relevant model pieces given the task type and its input, which provides stabilized MTL training and ultra-efficient inference.
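The routing idea can be sketched as a layer whose router scores experts from the input plus a task embedding and activates only the top-scoring expert per token. This is a simplified, assumed layout for illustration, not the exact architecture of this paper.

```python
import torch
import torch.nn as nn

class TaskRoutedMoE(nn.Module):
    """Simplified task-conditioned MoE layer: only the selected expert runs per token."""
    def __init__(self, d_model, n_tasks, n_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.task_emb = nn.Embedding(n_tasks, d_model)
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x, task_id):           # x: (tokens, d_model), task_id: scalar LongTensor
        scores = self.router(x + self.task_emb(task_id)).softmax(dim=-1)
        top_w, top_idx = scores.max(dim=-1)  # top-1 routing
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e              # tokens routed to expert e
            if mask.any():
                out[mask] = top_w[mask].unsqueeze(-1) * expert(x[mask])
        return out
```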
1 code implementation • NeurIPS 2022 • Mukund Varma T, Xuxi Chen, Zhenyu Zhang, Tianlong Chen, Subhashini Venugopalan, Zhangyang Wang
Improving the performance of deep networks in data-limited regimes has warranted much attention.
1 code implementation • 27 Jul 2022 • Mukund Varma T, Peihao Wang, Xuxi Chen, Tianlong Chen, Subhashini Venugopalan, Zhangyang Wang
While prior works on NeRFs optimize a scene representation by inverting a handcrafted rendering equation, GNT achieves neural representation and rendering that generalizes across scenes using transformers at two stages.
Ranked #1 on Generalizable Novel View Synthesis on LLFF
1 code implementation • 7 Jul 2022 • Shiwei Liu, Tianlong Chen, Xiaohan Chen, Xuxi Chen, Qiao Xiao, Boqian Wu, Tommi Kärkkäinen, Mykola Pechenizkiy, Decebal Mocanu, Zhangyang Wang
Transformers have quickly shone in the computer vision world since the emergence of Vision Transformers (ViTs).
1 code implementation • 9 Feb 2022 • Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wang, Zhangyang Wang
The lottery ticket hypothesis (LTH) has shown that dense models contain highly sparse subnetworks (i.e., winning tickets) that can be trained in isolation to match full accuracy.
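For readers unfamiliar with the LTH procedure, a minimal iterative magnitude pruning (IMP) sketch is shown below; `train_fn` is a user-supplied training loop, and the pruning rate and number of rounds are illustrative defaults, not the settings used in this paper.

```python
import copy
import torch

def find_winning_ticket(model, train_fn, rounds=3, prune_rate=0.2):
    """Iterative magnitude pruning: train, prune the smallest surviving weights,
    rewind the survivors to their initial values, and repeat."""
    init_state = copy.deepcopy(model.state_dict())
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if p.dim() > 1}

    for _ in range(rounds):
        train_fn(model)  # in a real run, masks should be re-applied after every optimizer step
        for name, p in model.named_parameters():
            if name not in masks:
                continue
            alive = p.detach().abs()[masks[name].bool()]
            threshold = torch.quantile(alive, prune_rate)     # drop the smallest prune_rate
            masks[name] = masks[name] * (p.detach().abs() > threshold).float()
        model.load_state_dict(init_state)                     # rewind to initialization
        with torch.no_grad():
            for name, p in model.named_parameters():
                if name in masks:
                    p.mul_(masks[name])                       # keep only the ticket's weights
    return masks
```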
1 code implementation • CVPR 2024 • Yuyin Zhou, Xianhang Li, Fengze Liu, Qingyue Wei, Xuxi Chen, Lequan Yu, Cihang Xie, Matthew P. Lungren, Lei Xing
Extensive experiments demonstrate that our method effectively mitigates the challenges of noisy labels, often requiring few or no validation samples, and generalizes well to other tasks such as image segmentation.
Ranked #8 on Image Classification on Clothing1M (using clean data as extra training data)
1 code implementation • NeurIPS 2021 • Xuxi Chen, Tianlong Chen, Zhenyu Zhang, Zhangyang Wang
The lottery ticket hypothesis (LTH) has emerged as a promising framework for leveraging a special sparse subnetwork (i.e., a winning ticket) instead of the full model for both training and inference, lowering both costs without sacrificing performance.
1 code implementation • 30 Oct 2021 • Xuxi Chen, Tianlong Chen, Weizhu Chen, Ahmed Hassan Awadallah, Zhangyang Wang, Yu Cheng
To address these pain points, we propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
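As a toy illustration of a sparsity prior on the weight update (the paper's actual framework exploits more structure than this), one can keep only the largest-magnitude entries of the fine-tuning delta; all names below are assumptions, and the state dicts are assumed to contain only floating-point tensors.

```python
import torch

def sparsify_finetuning_delta(pretrained_state, finetuned_state, keep_ratio=0.01):
    """Keep only the top `keep_ratio` entries (by magnitude) of the update delta = W_ft - W_pre."""
    sparse_state = {}
    for name, w0 in pretrained_state.items():
        delta = finetuned_state[name] - w0
        k = max(1, int(keep_ratio * delta.numel()))
        threshold = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
        mask = (delta.abs() >= threshold).float()
        sparse_state[name] = w0 + delta * mask     # pretrained weights plus a sparse update
    return sparse_state
```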
no code implementations • 29 Sep 2021 • Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wang, Zhangyang Wang
The lottery ticket hypothesis (LTH) has shown that dense models contain highly sparse subnetworks (i.e., winning tickets) that can be trained in isolation to match full accuracy.
2 code implementations • NeurIPS 2021 • Xiaolong Ma, Geng Yuan, Xuan Shen, Tianlong Chen, Xuxi Chen, Xiaohan Chen, Ning Liu, Minghai Qin, Sijia Liu, Zhangyang Wang, Yanzhi Wang
Based on our analysis, we summarize a guideline for parameter settings with respect to specific architecture characteristics, which we hope will catalyze research progress on the lottery ticket hypothesis.
1 code implementation • 6 Jun 2021 • Zhenyu Zhang, Xuxi Chen, Tianlong Chen, Zhangyang Wang
We observe that a high-quality winning ticket can be found by training and pruning the dense network on the very compact PrAC set, which substantially saves training iterations in the ticket-finding process.
1 code implementation • ICLR 2021 • Xuxi Chen, Zhenyu Zhang, Yongduo Sui, Tianlong Chen
In this work, we study for the first time the existence of such trainable matching subnetworks in deep GANs.
2 code implementations • 12 Feb 2021 • Tianlong Chen, Yongduo Sui, Xuxi Chen, Aston Zhang, Zhangyang Wang
With graphs rapidly growing in size and deeper graph neural networks (GNNs) emerging, the training and inference of GNNs become increasingly expensive.
1 code implementation • ICML 2020 • Xuxi Chen, Wuyang Chen, Tianlong Chen, Ye Yuan, Chen Gong, Kewei Chen, Zhangyang Wang
Many real-world applications have to tackle the Positive-Unlabeled (PU) learning problem, i.e., learning binary classifiers from a large amount of unlabeled data and a few labeled positive examples.
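For background, a widely used PU objective is the non-negative PU risk of Kiryo et al. (2017); the sketch below shows that generic estimator, not the objective proposed in this paper, where `prior` is the assumed class prior P(y = +1) and the scores are raw classifier logits.

```python
import torch
import torch.nn.functional as F

def nn_pu_risk(scores_pos, scores_unl, prior):
    """Non-negative PU risk with a logistic surrogate loss."""
    loss_pos = F.softplus(-scores_pos).mean()        # positives classified as positive
    loss_pos_as_neg = F.softplus(scores_pos).mean()  # positives classified as negative
    loss_unl_as_neg = F.softplus(scores_unl).mean()  # unlabeled classified as negative
    neg_risk = loss_unl_as_neg - prior * loss_pos_as_neg
    return prior * loss_pos + torch.clamp(neg_risk, min=0.0)  # clamp keeps the estimate non-negative
```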