no code implementations • 2 Dec 2024 • Cong Xie, Han Zou, Ruiqi Yu, Yan Zhang, Zhenpeng Zhan
In this work, we are interested in achieving both high text controllability and overall appearance consistency in the generation of personalized human characters.
no code implementations • 26 Nov 2024 • Shuhua Yu, Ding Zhou, Cong Xie, An Xu, Zhi Zhang, Xin Liu, Soummya Kar
Pre-training Transformer models is resource-intensive, and recent studies have shown that sign momentum is an efficient technique for training large-scale deep learning models, particularly Transformers.
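As a rough illustration of the sign-momentum idea, the sketch below keeps an exponential moving average of gradients and steps by its sign only. This is a generic sketch; the interpolation, weight decay, and distributed details studied in the paper may differ.

```python
import numpy as np

def sign_momentum_step(param, grad, momentum, lr=1e-3, beta=0.9):
    """One generic sign-momentum update: EMA of gradients, step by its sign."""
    momentum = beta * momentum + (1.0 - beta) * grad   # EMA of gradients
    param = param - lr * np.sign(momentum)             # magnitude-free update
    return param, momentum

# toy usage on the quadratic loss f(w) = 0.5 * ||w||^2, whose gradient is w
w = np.array([1.0, -2.0, 3.0])
m = np.zeros_like(w)
for _ in range(100):
    w, m = sign_momentum_step(w, w, m, lr=0.05)
```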
no code implementations • 20 Oct 2024 • Jinda Jia, Cong Xie, Hanlin Lu, Daoce Wang, Hao Feng, Chengming Zhang, Baixi Sun, Haibin Lin, Zhi Zhang, Xin Liu, Dingwen Tao
Recent years have witnessed a clear trend towards language models with an ever-increasing number of parameters, as well as the growing training overhead and memory usage.
no code implementations • 15 Oct 2024 • Yanyue Xie, Zhi Zhang, Ding Zhou, Cong Xie, Ziang Song, Xin Liu, Yanzhi Wang, Xue Lin, An Xu
Experimental results demonstrate that the Mixtral-8x7B model with 50% sparsity maintains 99% of the performance of the original model after the expert-wise knowledge distillation.
1 code implementation • 23 Feb 2024 • Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao, Liang Xiang, Zherui Liu, Zhe Li, Xiaoying Jia, Jianxi Ye, Xin Jin, Xin Liu
Training LLMs at this scale brings unprecedented challenges to training efficiency and stability.
no code implementations • 28 Jan 2024 • Jianxiang Lu, Cong Xie, Hui Guo
Our proposed method aims to address the challenges of generalizability and fidelity in an object-driven way, using only a single input image and the object-specific regions of interest.
1 code implementation • 12 Oct 2023 • Yite Wang, Jiahao Su, Hanlin Lu, Cong Xie, Tianyi Liu, Jianbo Yuan, Haibin Lin, Ruoyu Sun, Hongxia Yang
Our empirical results demonstrate that LEMON reduces computational costs by 56.7% for Vision Transformers and 33.2% for BERT when compared to training from scratch.
no code implementations • 20 Jan 2023 • Beomyeol Jeon, Linda Cai, Chirag Shetty, Pallavi Srivastava, Jintao Jiang, Xiaolan Ke, Yitao Meng, Cong Xie, Indranil Gupta
While these result in model placements that train fast (i.e., low step times), learning-based model parallelism is time-consuming, taking many hours or days to create a placement plan of operators on devices.
1 code implementation • Algorithms 2022 • Cong Xie, Oluwasanmi Koyejo, Indranil Gupta
Distributed machine learning is primarily motivated by the promise of increased computation power for accelerating training and mitigating privacy concerns.
no code implementations • 23 Apr 2022 • Cong Xie, Hualuo Liu, Shilei Cao, Dong Wei, Kai Ma, Liansheng Wang, Yefeng Zheng
A cosine-similarity-based attention module is proposed to fuse the information from the two encoders, exploiting the prior information encoded by the template encoder and modeling the inter-subject similarity for each foreground class.
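A minimal sketch of what such cosine-similarity attention fusion could look like; the shapes, softmax routing, and concatenation-based fusion below are assumptions for illustration, not the paper's exact module.

```python
import numpy as np

def cosine_attention_fuse(query_feat, template_feat, eps=1e-8):
    """Fuse query-encoder and template-encoder features via cosine-similarity attention.

    query_feat:    (N, C) query features (e.g., flattened spatial positions)
    template_feat: (M, C) template features (e.g., class prototypes)
    """
    q = query_feat / (np.linalg.norm(query_feat, axis=1, keepdims=True) + eps)
    t = template_feat / (np.linalg.norm(template_feat, axis=1, keepdims=True) + eps)
    sim = q @ t.T                                                 # (N, M) cosine similarities
    attn = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)   # softmax over templates
    fused = attn @ template_feat                                  # route template info to queries
    return np.concatenate([query_feat, fused], axis=1)            # simple concat fusion

fused = cosine_attention_fuse(np.random.randn(16, 32), np.random.randn(4, 32))
```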
no code implementations • 19 Jul 2021 • Cong Xie, Shilei Cao, Dong Wei, HongYu Zhou, Kai Ma, Xianli Zhang, Buyue Qian, Liansheng Wang, Yefeng Zheng
Universal lesion detection in computed tomography (CT) images is an important yet challenging task due to the large variations in lesion type, size, shape, and appearance.
1 code implementation • 17 May 2021 • Yuchen Zhong, Cong Xie, Shuai Zheng, Haibin Lin
Recently, there has been a growing interest in using gradient compression to reduce the communication overhead of distributed training.
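For illustration, one common form of gradient compression is top-k sparsification with error feedback. The sketch below is a generic example and not necessarily the compressor or system design proposed in the paper.

```python
import numpy as np

def topk_compress(grad, k):
    """Keep only the k largest-magnitude entries of the gradient."""
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def decompress(idx, values, size):
    out = np.zeros(size)
    out[idx] = values
    return out

# error feedback: accumulate what compression dropped and re-add it next step
grad = np.random.randn(1000)
residual = np.zeros_like(grad)
corrected = grad + residual
idx, vals = topk_compress(corrected, k=10)
sent = decompress(idx, vals, corrected.size)   # what the worker communicates
residual = corrected - sent                    # carried to the next iteration
```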
no code implementations • 28 Sep 2020 • Anjul Tyagi, Cong Xie, Klaus Mueller
To address this problem, we formulate neural network architecture optimization as a graph-space exploration, based on the one-shot architecture search technique.
no code implementations • 30 Jul 2020 • Jiazhi Xia, Tianxiang Chen, Lei Zhang, Wei Chen, Yang Chen, Xiaolong Zhang, Cong Xie, Tobias Schreck
We build a prototype system based on our method, SMAP, to support the organization, computation, and exploration of secure joint embedding.
no code implementations • NeurIPS 2020 • Cong Xie, Shuai Zheng, Oluwasanmi Koyejo, Indranil Gupta, Mu Li, Haibin Lin
The scalability of Distributed Stochastic Gradient Descent (SGD) is today limited by communication bottlenecks.
1 code implementation • 20 Nov 2019 • Cong Xie, Oluwasanmi Koyejo, Indranil Gupta, Haibin Lin
When scaling distributed training, the communication overhead is often the bottleneck.
1 code implementation • ICML 2020 • Cong Xie, Sanmi Koyejo, Indranil Gupta
We propose Zeno++, a new robust asynchronous Stochastic Gradient Descent (SGD) procedure which tolerates Byzantine failures of the workers.
no code implementations • 16 Mar 2019 • Cong Xie, Sanmi Koyejo, Indranil Gupta
We consider distributed on-device learning with limited communication and security requirements.
1 code implementation • 10 Mar 2019 • Cong Xie, Sanmi Koyejo, Indranil Gupta
Federated learning enables training on a massive number of edge devices.
4 code implementations • 10 Mar 2019 • Cong Xie, Sanmi Koyejo, Indranil Gupta
Recently, new defense techniques have been developed to tolerate Byzantine failures for distributed machine learning.
1 code implementation • 25 May 2018 • Cong Xie, Oluwasanmi Koyejo, Indranil Gupta
We present Zeno, a technique to make distributed machine learning, particularly Stochastic Gradient Descent (SGD), tolerant to an arbitrary number of faulty workers.
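A minimal sketch of the ranking-by-score idea behind Zeno, assuming a validation-based descent score of the form f(x) - f(x - lr*g) - rho*||g||^2; the constants and the exact score used in the paper may differ.

```python
import numpy as np

def zeno_aggregate(grads, x, loss_fn, lr=0.1, rho=1e-3, num_trim=2):
    """Rank worker gradients by a validation-based descent score, average the best ones.

    grads:   list of candidate gradients from the workers
    x:       current model parameters
    loss_fn: loss evaluated on a small trusted/validation batch
    """
    base = loss_fn(x)
    scores = [base - loss_fn(x - lr * g) - rho * np.dot(g, g) for g in grads]
    keep = np.argsort(scores)[num_trim:]          # drop the lowest-scoring gradients
    return np.mean([grads[i] for i in keep], axis=0)

# toy usage: quadratic loss, one Byzantine worker sending a flipped, scaled gradient
loss = lambda w: 0.5 * float(np.dot(w, w))
w = np.array([1.0, -2.0])
good = [w + 0.1 * np.random.randn(2) for _ in range(4)]
byzantine = [-10.0 * w]
agg = zeno_aggregate(good + byzantine, w, loss, num_trim=1)
```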
no code implementations • 23 May 2018 • Cong Xie, Oluwasanmi Koyejo, Indranil Gupta
We propose a novel robust aggregation rule for distributed synchronous Stochastic Gradient Descent (SGD) under a general Byzantine failure model.
no code implementations • 27 Feb 2018 • Cong Xie, Oluwasanmi Koyejo, Indranil Gupta
We propose three new robust aggregation rules for distributed synchronous Stochastic Gradient Descent (SGD) under a general Byzantine failure model.
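For context, a classic Byzantine-robust aggregator is the coordinate-wise median; the sketch below illustrates the general idea and does not reproduce the specific rules proposed in the paper.

```python
import numpy as np

def coordinate_wise_median(worker_grads):
    """Aggregate worker gradients with a coordinate-wise median.

    worker_grads: (num_workers, dim) array, some rows possibly Byzantine.
    """
    return np.median(worker_grads, axis=0)

# toy usage: 6 honest workers plus 2 Byzantine workers sending huge values
honest = np.random.randn(6, 3)
byzantine = 1e6 * np.ones((2, 3))
agg = coordinate_wise_median(np.vstack([honest, byzantine]))
```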
1 code implementation • 14 Feb 2018 • Cong Xie
In this manuscript, we briefly introduce several tricks for climbing leaderboards that use RMSE for evaluation, without exploiting any training data.
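As a hypothetical example of such a trick (not necessarily the manuscript's exact procedure), two constant-valued submissions are enough to recover the mean of the hidden test labels from the reported scores, since RMSE(c)^2 = mean(y^2) - 2*c*mean(y) + c^2.

```python
import numpy as np

# hidden test labels, unknown to the submitter; only RMSE scores come back
y = np.random.rand(1000)
rmse = lambda pred: float(np.sqrt(np.mean((y - pred) ** 2)))

r0 = rmse(np.zeros_like(y))                   # probe 1: predict all zeros -> mean(y^2)
c = 1.0
rc = rmse(np.full_like(y, c))                 # probe 2: predict the constant c
mean_estimate = (r0 ** 2 - rc ** 2 + c ** 2) / (2 * c)
assert abs(mean_estimate - y.mean()) < 1e-9   # mean(y) recovered from scores alone
```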
no code implementations • ICLR 2018 • Cong Xie, Oluwasanmi O. Koyejo, Indranil Gupta
Distributed training of deep learning models is widely conducted with large neural networks and large datasets.
no code implementations • 18 Nov 2015 • Wuxuan Jiang, Cong Xie, Zhihua Zhang
We propose a new input perturbation mechanism for publishing a covariance matrix to achieve $(\epsilon, 0)$-differential privacy.
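For intuition only, a generic Laplace-noise version of input perturbation is sketched below, assuming each row of the data matrix lies in [0, 1]^d and using a crude L1 sensitivity bound; the paper's mechanism and its noise calibration may differ.

```python
import numpy as np

def dp_covariance_laplace(X, epsilon):
    """Publish a noisy scatter/covariance matrix with (epsilon, 0)-differential privacy."""
    n, d = X.shape
    cov = X.T @ X                                      # scatter matrix
    sensitivity = d * (d + 1) / 2.0                    # crude bound: changing one row in [0,1]^d
                                                       # moves each of the d(d+1)/2 unique
                                                       # entries by at most 1
    noise = np.random.laplace(scale=sensitivity / epsilon, size=(d, d))
    noise = np.triu(noise)                             # perturb each unique entry once,
    noise = noise + noise.T - np.diag(np.diag(noise))  # then mirror to keep symmetry
    return cov + noise

noisy = dp_covariance_laplace(np.random.rand(500, 5), epsilon=1.0)
```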
no code implementations • 9 Nov 2015 • Cong Xie, Wu-Jun Li, Zhihua Zhang
Normalized graph cut (NGC) has become a popular research topic due to its wide applications in areas such as machine learning and very large scale integration (VLSI) circuit design.
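For reference, the standard two-way normalized cut of a weighted graph $G=(V,E,w)$ into parts $A$ and $B = V \setminus A$ (Shi and Malik's formulation; the paper may target multi-way or large-scale variants) is
$$\mathrm{NCut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)} + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)}, \qquad \mathrm{cut}(A,B) = \sum_{i \in A,\, j \in B} w_{ij}, \qquad \mathrm{assoc}(A,V) = \sum_{i \in A,\, j \in V} w_{ij}.$$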
no code implementations • 8 Sep 2015 • Shenjian Zhao, Cong Xie, Zhihua Zhang
In many learning tasks, structural models usually lead to better interpretability and higher generalization performance.
no code implementations • NeurIPS 2014 • Cong Xie, Ling Yan, Wu-Jun Li, Zhihua Zhang
We theoretically prove that DBH can achieve lower communication cost than existing methods and can simultaneously guarantee good workload balance.
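A minimal sketch of degree-based hashing (DBH) for edge partitioning, assuming the rule is to hash the lower-degree endpoint of each edge; the exact hash function and tie-breaking in the paper may differ.

```python
def degree_based_hash_partition(edges, degrees, num_parts):
    """Assign each edge to a partition by hashing its lower-degree endpoint.

    edges:   iterable of (u, v) vertex pairs
    degrees: dict mapping vertex -> degree
    """
    assignment = {}
    for u, v in edges:
        anchor = u if degrees[u] <= degrees[v] else v   # hash the low-degree end
        assignment[(u, v)] = hash(anchor) % num_parts
    return assignment

# toy usage on a star graph: the hub has high degree, so its edges get spread out
edges = [(0, i) for i in range(1, 6)]
degrees = {0: 5, **{i: 1 for i in range(1, 6)}}
parts = degree_based_hash_partition(edges, degrees, num_parts=3)
```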