Search Results for author: Xuxi Chen

Found 20 papers, 17 papers with code

Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization

1 code implementation · 22 Feb 2024 · Xuxi Chen, Zhendong Wang, Daouda Sow, Junjie Yang, Tianlong Chen, Yingbin Liang, Mingyuan Zhou, Zhangyang Wang

Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets, with a specific focus on selective retention of samples that incur moderately high losses.
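The core idea of retaining "moderately hard" samples can be sketched with a simple loss-band filter. The quantile thresholds below are illustrative assumptions, not the paper's actual criterion: too-easy samples add little signal, while the very highest losses are often noisy or mislabeled.

```python
import numpy as np

def select_moderately_hard(losses, low_q=0.5, high_q=0.9):
    """Return indices of samples whose loss falls in a mid-to-high band.

    `low_q`/`high_q` are hypothetical quantile cutoffs: samples below the
    median are treated as too easy, samples above the 90th percentile as
    likely outliers.
    """
    lo = np.quantile(losses, low_q)
    hi = np.quantile(losses, high_q)
    mask = (losses >= lo) & (losses <= hi)
    return np.nonzero(mask)[0]

# Toy per-sample losses: the extremes (0.05 and 8.0) are filtered out.
losses = np.array([0.1, 0.2, 0.5, 0.9, 1.5, 3.0, 8.0, 0.05, 0.7, 1.1])
kept = select_moderately_hard(losses)
```

A continual-training loop would then draw its next batches only from the `kept` subset of the original pre-training data.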

Rethinking PGD Attack: Is Sign Function Necessary?

1 code implementation · 3 Dec 2023 · Junjie Yang, Tianlong Chen, Xuxi Chen, Zhangyang Wang, Yingbin Liang

Based on that, we further propose a new raw gradient descent (RGD) algorithm that eliminates the use of sign.
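The contrast the abstract draws can be sketched in a single update step. Classic PGD ascends along the sign of the gradient; an RGD-style step uses the raw gradient instead (a minimal sketch of the idea, not the paper's full algorithm, so the step size may need rescaling since the raw gradient is unnormalized):

```python
import numpy as np

def pgd_step(x, grad, alpha, x0, eps):
    # Classic PGD: move along the sign of the gradient,
    # then project back into the L-infinity ball around x0.
    x_new = x + alpha * np.sign(grad)
    return np.clip(x_new, x0 - eps, x0 + eps)

def rgd_step(x, grad, alpha, x0, eps):
    # RGD-style update (sketch): use the raw gradient direction
    # directly, eliminating the sign function.
    x_new = x + alpha * grad
    return np.clip(x_new, x0 - eps, x0 + eps)

x0 = np.zeros(3)
grad = np.array([0.5, -2.0, 0.0])
adv_pgd = pgd_step(x0, grad, alpha=0.1, x0=x0, eps=0.3)
adv_rgd = rgd_step(x0, grad, alpha=0.1, x0=x0, eps=0.3)
```

Note how the sign-based step moves every coordinate by the same magnitude, while the raw-gradient step preserves the relative scale of the coordinates.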

Data Distillation Can Be Like Vodka: Distilling More Times For Better Quality

no code implementations · 10 Oct 2023 · Xuxi Chen, Yu Yang, Zhangyang Wang, Baharan Mirzasoleiman

Dataset distillation aims to minimize the time and memory needed for training deep networks on large datasets, by creating a small set of synthetic images that has a similar generalization performance to that of the full dataset.

Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!

1 code implementation · 3 Mar 2023 · Shiwei Liu, Tianlong Chen, Zhenyu Zhang, Xuxi Chen, Tianjin Huang, Ajay Jaiswal, Zhangyang Wang

In pursuit of a more general evaluation and unveiling the true potential of sparse algorithms, we introduce the "Sparsity May Cry" Benchmark (SMC-Bench), a carefully curated collection of 4 diverse tasks with 10 datasets that captures a wide range of domain-specific and sophisticated knowledge.

M-L2O: Towards Generalizable Learning-to-Optimize by Test-Time Fast Self-Adaptation

1 code implementation · 28 Feb 2023 · Junjie Yang, Xuxi Chen, Tianlong Chen, Zhangyang Wang, Yingbin Liang

This data-driven procedure yields L2O that can efficiently solve problems similar to those seen in training, that is, drawn from the same "task distribution".

AdaMV-MoE: Adaptive Multi-Task Vision Mixture-of-Experts

1 code implementation · ICCV 2023 · Tianlong Chen, Xuxi Chen, Xianzhi Du, Abdullah Rashwan, Fan Yang, Huizhong Chen, Zhangyang Wang, Yeqing Li

Instead of compressing multiple tasks' knowledge into a single model, MoE separates the parameter space and activates only the relevant model pieces given the task type and its input, which stabilizes MTL training and enables ultra-efficient inference.

Tasks: Instance Segmentation, Multi-Task Learning, +3
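The "use only the relevant model pieces" idea is the standard sparse MoE routing mechanism, sketched below with top-k gating. This toy version does not reproduce AdaMV-MoE's adaptive expert-count mechanism; all names and shapes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, gate_w, experts, k=1):
    """Route input x to its top-k experts by gating score (toy sketch).

    Only the selected experts are evaluated, so compute scales with k
    rather than with the total number of experts.
    """
    scores = x @ gate_w                    # gating logits, one per expert
    top = np.argsort(scores)[-k:]          # indices of the top-k experts
    weights = np.exp(scores[top])
    weights /= weights.sum()               # softmax over the selected experts
    return sum(w * experts[i](x) for w, i in zip(weights, top))

d, n_exp = 4, 3
gate_w = rng.normal(size=(d, n_exp))
# Each expert is a simple linear map; default-arg trick binds a fresh W per lambda.
experts = [lambda x, W=rng.normal(size=(d, d)): x @ W for _ in range(n_exp)]
x = rng.normal(size=d)
y = moe_forward(x, gate_w, experts, k=1)
```

With `k=1`, exactly one expert runs per input; a multi-task variant would additionally condition the gate on the task identifier.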

Is Attention All That NeRF Needs?

1 code implementation · 27 Jul 2022 · Mukund Varma T, Peihao Wang, Xuxi Chen, Tianlong Chen, Subhashini Venugopalan, Zhangyang Wang

While prior works on NeRFs optimize a scene representation by inverting a handcrafted rendering equation, GNT achieves neural representation and rendering that generalizes across scenes using transformers at two stages.

Tasks: Generalizable Novel View Synthesis, Inductive Bias, +1

L2B: Learning to Bootstrap Robust Models for Combating Label Noise

1 code implementation · 9 Feb 2022 · Yuyin Zhou, Xianhang Li, Fengze Liu, Qingyue Wei, Xuxi Chen, Lequan Yu, Cihang Xie, Matthew P. Lungren, Lei Xing

Extensive experiments demonstrate that our method effectively mitigates the challenges of noisy labels, often necessitating few to no validation samples, and is well generalized to other tasks such as image segmentation.

Ranked #8 on Image Classification on Clothing1M (using clean data, using extra training data)

Tasks: Image Segmentation, Learning with noisy labels, +3

Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets

1 code implementation · 9 Feb 2022 · Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wang, Zhangyang Wang

The lottery ticket hypothesis (LTH) has shown that dense models contain highly sparse subnetworks (i.e., winning tickets) that can be trained in isolation to match full accuracy.
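Winning tickets are typically identified by unstructured magnitude pruning, which this paper then coarsens into structurally sparse (e.g., channel-level) tickets. The sketch below shows only the standard unstructured step, with an illustrative sparsity level:

```python
import numpy as np

def magnitude_mask(weights, sparsity=0.9):
    """Global unstructured magnitude pruning: zero out the smallest
    `sparsity` fraction of weights by absolute value, the usual way
    winning-ticket masks are obtained in LTH experiments."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return np.ones_like(weights)
    # k-th smallest magnitude becomes the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    return (np.abs(weights) > threshold).astype(weights.dtype)

w = np.arange(1.0, 11.0).reshape(2, 5)
mask = magnitude_mask(w, sparsity=0.6)  # keeps the 4 largest of 10 weights
```

The ticket is then retrained from (rewound) initial weights with `w * mask` held fixed; the structural variant instead removes whole channels or blocks so hardware can exploit the sparsity.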

DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models

1 code implementation · 30 Oct 2021 · Xuxi Chen, Tianlong Chen, Weizhu Chen, Ahmed Hassan Awadallah, Zhangyang Wang, Yu Cheng

To address these pain points, we propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
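One half of that sparsity prior, sparsifying the weight update itself, can be sketched by keeping only the largest-magnitude entries of a fine-tuning delta. This is an illustrative simplification: DSEE also pairs the sparse component with a low-rank decomposition, which is omitted here, and `keep_ratio` is an assumed hyperparameter.

```python
import numpy as np

def sparsify_update(delta, keep_ratio=0.05):
    """Keep only the largest-magnitude entries of a fine-tuning update
    delta = W_finetuned - W_pretrained, zeroing the rest."""
    flat = np.abs(delta).ravel()
    k = max(1, int(keep_ratio * flat.size))
    # k-th largest magnitude is the retention threshold.
    thresh = np.partition(flat, -k)[-k]
    return delta * (np.abs(delta) >= thresh)

delta = np.array([[0.1, -2.0],
                  [0.3,  0.01]])
sparse_delta = sparsify_update(delta, keep_ratio=0.5)
```

Storing only the nonzero entries of `sparse_delta` (plus indices) is what makes the tuning parameter-efficient: the pretrained weights are shared and each task ships a tiny sparse patch.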

You are caught stealing my winning lottery ticket! Making a lottery ticket claim its ownership

1 code implementation · NeurIPS 2021 · Xuxi Chen, Tianlong Chen, Zhenyu Zhang, Zhangyang Wang

The lottery ticket hypothesis (LTH) emerges as a promising framework to leverage a special sparse subnetwork (i.e., a winning ticket) instead of a full model for both training and inference, which can lower both costs without sacrificing performance.

Lottery Tickets can have Structural Sparsity

no code implementations · 29 Sep 2021 · Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wang, Zhangyang Wang

The lottery ticket hypothesis (LTH) has shown that dense models contain highly sparse subnetworks (i.e., winning tickets) that can be trained in isolation to match full accuracy.

Sanity Checks for Lottery Tickets: Does Your Winning Ticket Really Win the Jackpot?

2 code implementations · NeurIPS 2021 · Xiaolong Ma, Geng Yuan, Xuan Shen, Tianlong Chen, Xuxi Chen, Xiaohan Chen, Ning Liu, Minghai Qin, Sijia Liu, Zhangyang Wang, Yanzhi Wang

Based on our analysis, we summarize a guideline for parameter settings with regard to specific architecture characteristics, which we hope will catalyze research progress on the lottery ticket hypothesis.

Efficient Lottery Ticket Finding: Less Data is More

1 code implementation · 6 Jun 2021 · Zhenyu Zhang, Xuxi Chen, Tianlong Chen, Zhangyang Wang

We observe that a high-quality winning ticket can be found by training and pruning the dense network on the very compact PrAC set, which can substantially save training iterations in the ticket-finding process.

GANs Can Play Lottery Tickets Too

1 code implementation · ICLR 2021 · Xuxi Chen, Zhenyu Zhang, Yongduo Sui, Tianlong Chen

In this work, we study for the first time the existence of such trainable matching subnetworks in deep GANs.

Tasks: Image-to-Image Translation

A Unified Lottery Ticket Hypothesis for Graph Neural Networks

2 code implementations · 12 Feb 2021 · Tianlong Chen, Yongduo Sui, Xuxi Chen, Aston Zhang, Zhangyang Wang

With graphs rapidly growing in size and deeper graph neural networks (GNNs) emerging, the training and inference of GNNs become increasingly expensive.

Tasks: Link Prediction, Node Classification

Self-PU: Self Boosted and Calibrated Positive-Unlabeled Training

1 code implementation · ICML 2020 · Xuxi Chen, Wuyang Chen, Tianlong Chen, Ye Yuan, Chen Gong, Kewei Chen, Zhangyang Wang

Many real-world applications have to tackle the Positive-Unlabeled (PU) learning problem, i.e., learning binary classifiers from a large amount of unlabeled data and a few labeled positive examples.
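Methods in this family commonly build on a non-negative PU risk estimator, which rewrites the negative-class risk in terms of positive and unlabeled data via an assumed class prior. The sketch below shows that estimator with a logistic loss; it illustrates the PU setup generally, not Self-PU's specific self-paced/self-calibrated training scheme.

```python
import numpy as np

def nn_pu_risk(scores_p, scores_u, pi,
               loss=lambda z: np.log1p(np.exp(-z))):
    """Non-negative PU risk estimate from classifier scores.

    scores_p: scores f(x) on labeled positives
    scores_u: scores f(x) on unlabeled data
    pi:       assumed positive class prior P(y = +1)

    R = pi * E_p[l(f)] + max(0, E_u[l(-f)] - pi * E_p[l(-f)])
    The max(0, .) clamp prevents the estimated negative risk from
    going negative, which would otherwise encourage overfitting.
    """
    r_pos = pi * loss(scores_p).mean()
    r_neg = loss(-scores_u).mean() - pi * loss(-scores_p).mean()
    return r_pos + max(0.0, r_neg)

risk = nn_pu_risk(np.array([1.0]), np.array([-1.0]), pi=0.5)
```

Minimizing this risk with respect to the classifier's parameters trains a binary classifier without any labeled negatives; the class prior `pi` must be known or estimated.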
