1 code implementation • 19 Dec 2024 • Ziteng Wang, Jianfei Chen, Jun Zhu
Sparsely activated Mixture-of-Experts (MoE) models are widely adopted to scale up model capacity without increasing the computation budget.
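To make the phrase "sparsely activated" concrete, here is a minimal top-k routing sketch of a generic MoE layer. It is illustrative only: the layer sizes, expert count, and routing rule are placeholder assumptions, not the construction studied in this paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic sparsely activated MoE layer: each token is routed to its
    top-k experts, so only k of the n_experts MLPs run for that token."""
    def __init__(self, d_model=256, d_ff=1024, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                   # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)      # route each token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out
```

Because each token only passes through top_k of the n_experts feed-forward blocks, the parameter count grows with the number of experts while per-token compute stays roughly constant.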
1 code implementation • 17 Nov 2024 • Jintao Zhang, Haofeng Huang, Pengle Zhang, Jia Wei, Jun Zhu, Jianfei Chen
Second, we propose a method to smooth $Q$, enhancing the accuracy of INT4 $QK$.
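As a rough illustration of why smoothing helps, the sketch below subtracts the per-channel mean of $Q$ (over tokens) before a low-bit matmul and adds the exact correction term back afterwards. The per-tensor quantizer, granularity, and compensation used here are simplified assumptions for illustration, not the paper's actual kernel.

```python
import numpy as np

def int4_quantize(x):
    """Symmetric per-tensor INT4 quantization (illustrative; real kernels
    typically quantize per block or per channel)."""
    scale = np.abs(x).max() / 7.0 + 1e-8
    return np.clip(np.round(x / scale), -8, 7), scale

def int4_matmul(A, B):
    a_q, sa = int4_quantize(A)
    b_q, sb = int4_quantize(B)
    return (a_q @ b_q.T) * (sa * sb)

def qk_with_smoothing(Q, K):
    """Quantize the mean-subtracted Q, then add the exact correction term:
    Q K^T = (Q - 1 m^T) K^T + 1 (K m)^T  with  m = per-channel mean of Q."""
    m = Q.mean(axis=0)
    return int4_matmul(Q - m, K) + (K @ m)[None, :]

rng = np.random.default_rng(0)
Q = rng.normal(size=(16, 64)) + 3.0        # a shared bias inflates Q's range
K = rng.normal(size=(16, 64))
err_plain  = np.abs(int4_matmul(Q, K) - Q @ K.T).mean()
err_smooth = np.abs(qk_with_smoothing(Q, K) - Q @ K.T).mean()
print(err_plain, err_smooth)               # smoothing typically shrinks the error
```

Subtracting the shared bias shrinks the dynamic range that the 4-bit grid has to cover, which is where the accuracy gain comes from; the correction term is exact, so no approximation is introduced beyond the quantization itself.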
no code implementations • 30 Oct 2024 • Guande He, Kaiwen Zheng, Jianfei Chen, Fan Bao, Jun Zhu
Recently, diffusion denoising bridge models (DDBMs), a new formulation of generative modeling that builds stochastic processes between fixed data endpoints based on a reference diffusion process, have achieved empirical success across tasks with coupled data distribution, such as image-to-image translation.
1 code implementation • 25 Oct 2024 • Haocheng Xi, Han Cai, Ligeng Zhu, Yao Lu, Kurt Keutzer, Jianfei Chen, Song Han
FP8 training has emerged as a promising method for improving training efficiency.
no code implementations • 21 Oct 2024 • Kang Zhao, Tao Yuan, Han Bao, Zhenfeng Su, Chang Gao, Zhaofeng Sun, Zichen Liang, Liping Jing, Jianfei Chen
In this study, we thoroughly investigate the application of V:N:M sparsity in vision models and LLMs across multiple tasks, from pretraining to downstream tasks.

no code implementations • 20 Oct 2024 • Yuji Wang, Zehua Chen, Xiaoyu Chen, Jun Zhu, Jianfei Chen
By formulating I2V synthesis as a frames-to-frames generation task and modelling it with a data-to-data process, we fully exploit the information in the input image and help the generative model learn the image animation process.
no code implementations • 7 Oct 2024 • Bingrui Li, Wei Huang, Andi Han, Zhanpeng Zhou, Taiji Suzuki, Jun Zhu, Jianfei Chen
We also show that Adam behaves similarly to SignGD in terms of both optimization and generalization in this setting.
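A degenerate one-step example of the Adam/SignGD connection. This is only a numerical illustration; the paper's claim concerns the full optimizers' optimization and generalization behavior in a specific transformer setting.

```python
import numpy as np

g = np.array([0.3, -2.0, 1e-4, -5.0])      # a toy gradient
lr, eps = 1e-3, 1e-12

# With beta1 = beta2 = 0, Adam's moments are m = g and v = g**2, so one step
# becomes lr * g / (|g| + eps), which is essentially lr * sign(g).
adam_step   = lr * g / (np.sqrt(g**2) + eps)
signgd_step = lr * np.sign(g)
print(adam_step, signgd_step)               # element-wise nearly identical
```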
1 code implementation • 3 Oct 2024 • Jintao Zhang, Jia Wei, Haofeng Huang, Pengle Zhang, Jun Zhu, Jianfei Chen
Although quantization has proven to be an effective method for accelerating model inference, existing quantization methods primarily focus on optimizing the linear layer.
2 code implementations • 13 Sep 2024 • Yuezhou Hu, Jun Zhu, Jianfei Chen
Training deep neural networks (DNNs) is costly.
1 code implementation • 26 Aug 2024 • Chang Gao, Jianfei Chen, Kang Zhao, Jiaqi Wang, Liping Jing
The strategy leverages the heterogeneity of gradients by pruning less informative gradients and enhancing the numerical precision of the remaining gradients to mitigate gradient variance.
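A hedged sketch of the general strategy described above: drop small-magnitude gradient entries and spend the precision budget on the survivors using unbiased stochastic rounding. The keep ratio, bit width, and thresholding rule below are illustrative assumptions, not the paper's exact scheme.

```python
import torch

def prune_and_requantize(grad, keep_ratio=0.1, bits=8):
    """Keep only the largest-magnitude entries of a gradient tensor and
    quantize the survivors with unbiased stochastic rounding, so the sparse
    gradient stays an unbiased (and lower-variance) estimate."""
    k = max(1, int(keep_ratio * grad.numel()))
    thresh = grad.abs().flatten().kthvalue(grad.numel() - k + 1).values
    mask = (grad.abs() >= thresh).float()
    kept = grad * mask
    scale = kept.abs().max() / (2 ** (bits - 1) - 1) + 1e-12
    noise = torch.rand_like(kept)
    q = torch.floor(kept / scale + noise)      # stochastic rounding: E[q*scale] = kept
    return q * scale * mask, mask
```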
1 code implementation • 30 Jul 2024 • Weiyu Huang, Yuezhou Hu, Guohao Jian, Jun Zhu, Jianfei Chen
However, these methods often suffer from considerable performance degradation on complex language understanding tasks, raising concerns about the feasibility of pruning in LLMs.
2 code implementations • 24 May 2024 • Kaiwen Zheng, Guande He, Jianfei Chen, Fan Bao, Jun Zhu
In this work, we take the first step in fast sampling of DDBMs without extra training, motivated by the well-established recipes in diffusion models.
no code implementations • 16 Apr 2024 • Kafeng Wang, Jianfei Chen, He Li, Zhenpeng Mi, Jun Zhu
Diffusion models have been extensively used in data generation tasks and are recognized as one of the best generative models.
2 code implementations • 2 Apr 2024 • Yuezhou Hu, Kang Zhao, Weiyu Huang, Jianfei Chen, Jun Zhu
Utilizing this metric, we propose three techniques to preserve accuracy: modifying the sparse-refined straight-through estimator by applying a masked decay term to gradients, determining a feasible decay factor in the warm-up stage, and enhancing the model's quality with a dense fine-tuning procedure near the end of pre-training.
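For context, a minimal sketch of 2:4 masking with an SR-STE-style masked decay term, which the techniques above build on. The decay placement and hyperparameters here are assumptions for illustration, not the paper's exact rule.

```python
import torch

def two_four_mask(w):
    """2:4 mask: in every group of 4 consecutive weights, keep the 2 with
    largest magnitude (illustrative of N:M sparsity; layout details vary)."""
    groups = w.reshape(-1, 4)
    idx = groups.abs().topk(2, dim=-1).indices
    mask = torch.zeros_like(groups)
    mask.scatter_(1, idx, 1.0)
    return mask.reshape_as(w)

def srste_step(w, grad, lr=1e-2, decay=2e-4):
    """One SR-STE-style update: the straight-through gradient touches all
    weights, while a decay term is applied only to the currently pruned ones."""
    mask = two_four_mask(w)
    return w - lr * (grad + decay * (1.0 - mask) * w)
```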
1 code implementation • 19 Mar 2024 • Haocheng Xi, Yuxiang Chen, Kang Zhao, Kai Jun Teh, Jianfei Chen, Jun Zhu
Pretraining transformers is generally time-consuming.
1 code implementation • 27 Feb 2024 • Ziteng Wang, Jianfei Chen, Jun Zhu
On all the tasks, VCAS can preserve the original training loss trajectory and validation accuracy with up to a 73.87% FLOPs reduction in BP and a 49.58% FLOPs reduction in the whole training process.
no code implementations • 26 Feb 2024 • Tianjiao Luo, Tim Pearce, Huayu Chen, Jianfei Chen, Jun Zhu
Generative Adversarial Imitation Learning (GAIL) trains a generative policy to mimic a demonstrator.
1 code implementation • NeurIPS 2023 • Kaiwen Zheng, Cheng Lu, Jianfei Chen, Jun Zhu
In this work, we propose a novel formulation towards the optimal parameterization during sampling that minimizes the first-order discretization error of the ODE solution.
no code implementations • 18 Oct 2023 • Guande He, Peng Cui, Jianfei Chen, WenBo Hu, Jun Zhu
Despite the significant progress made in practical applications of aligned language models (LMs), they tend to be overconfident in output answers compared to the corresponding pre-trained LMs.
1 code implementation • NeurIPS 2023 • Haocheng Xi, Changhao Li, Jianfei Chen, Jun Zhu
To achieve this, we carefully analyze the specific structures of activation and gradients in transformers to propose dedicated quantizers for them.
no code implementations • 18 Jun 2023 • Tianjiao Luo, Ziyu Zhu, Jianfei Chen, Jun Zhu
We theoretically prove that the training process of DiracGANs-BMC is globally exponentially stable and derive bounds on the rate of convergence.
1 code implementation • 30 May 2023 • Guande He, Jianfei Chen, Jun Zhu
In light of these observations, we evaluate the calibration of several methods that preserve pre-trained features and show that preserving pre-trained features can improve the calibration of fine-tuned language models.
1 code implementation • 6 May 2023 • Kaiwen Zheng, Cheng Lu, Jianfei Chen, Jun Zhu
The probability flow ordinary differential equation (ODE) of diffusion models (i.e., diffusion ODEs) is a particular case of continuous normalizing flows (CNFs), which enables deterministic inference and exact likelihood evaluation.
Ranked #2 on Density Estimation on ImageNet 32x32
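For reference, the probability flow ODE in question is the standard one from the diffusion literature: it shares the marginals $p_t$ of the forward SDE with drift $f$ and diffusion coefficient $g$,

$$\frac{\mathrm{d}\mathbf{x}_t}{\mathrm{d}t} = f(\mathbf{x}_t, t) - \frac{1}{2}\, g(t)^2 \, \nabla_{\mathbf{x}} \log p_t(\mathbf{x}_t).$$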
3 code implementations • 25 Apr 2023 • Cheng Lu, Huayu Chen, Jianfei Chen, Hang Su, Chongxuan Li, Jun Zhu
The main challenge for this setting is that the intermediate guidance during the diffusion sampling procedure, which is jointly defined by the sampling distribution and the energy function, is unknown and is hard to estimate.
3 code implementations • 2 Nov 2022 • Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu
The commonly used fast sampler for guided sampling is DDIM, a first-order diffusion ODE solver that generally needs 100 to 250 steps for high-quality samples.
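For reference, the deterministic DDIM update mentioned above, in the usual $\bar\alpha_t$ notation with noise prediction $\epsilon_\theta$, is

$$\mathbf{x}_{t-1} = \sqrt{\bar\alpha_{t-1}}\;\frac{\mathbf{x}_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(\mathbf{x}_t, t)}{\sqrt{\bar\alpha_t}} + \sqrt{1-\bar\alpha_{t-1}}\,\epsilon_\theta(\mathbf{x}_t, t),$$

which discretizes the diffusion ODE to first order and hence needs many steps for high-quality guided samples.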
1 code implementation • 22 Jun 2022 • Xiaoxuan Liu, Lianmin Zheng, Dequan Wang, Yukuo Cen, Weize Chen, Xu Han, Jianfei Chen, Zhiyuan Liu, Jie Tang, Joey Gonzalez, Michael Mahoney, Alvin Cheung
Training large neural network (NN) models requires extensive memory resources, and Activation Compressed Training (ACT) is a promising approach to reduce training memory footprint.
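A minimal sketch of the ACT idea (not the ActNN/GACT implementation, which compresses to roughly 2 bits with per-group scales and supports many layer types): quantize the activation saved for the backward pass and dequantize it only when the gradient is computed.

```python
import torch

class CompressedReLU(torch.autograd.Function):
    """Activation compressed training sketch: the activation kept for backward
    is stored as int8, so training memory shrinks at the cost of a small,
    quantization-induced error in the gradient."""
    @staticmethod
    def forward(ctx, x):
        y = torch.relu(x)
        scale = (y.abs().max() / 127 + 1e-12).item()
        ctx.save_for_backward((y / scale).round().to(torch.int8))
        ctx.scale = scale
        return y

    @staticmethod
    def backward(ctx, grad_out):
        (y_q,) = ctx.saved_tensors
        y = y_q.float() * ctx.scale           # dequantize only when needed
        return grad_out * (y > 0).float()     # ReLU gradient from the compressed copy
```

Using `y = CompressedReLU.apply(x)` in place of `torch.relu(x)` cuts the memory held for this activation by about 4x (fp32 to int8); the cited work pushes the same idea much further.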
1 code implementation • 17 Jun 2022 • Siyu Wang, Jianfei Chen, Chongxuan Li, Jun Zhu, Bo Zhang
In this work, we propose Integer-only Discrete Flows (IODF), an efficient neural compressor with integer-only arithmetic.
1 code implementation • 16 Jun 2022 • Cheng Lu, Kaiwen Zheng, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu
To fill this gap, we show that the negative likelihood of the ODE can be bounded by controlling the first-, second-, and third-order score matching errors; we further present a novel high-order denoising score matching method to enable maximum likelihood training of score-based diffusion ODEs.
2 code implementations • 2 Jun 2022 • Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu
In this work, we propose an exact formulation of the solution of diffusion ODEs.
no code implementations • 30 Apr 2022 • Zhijie Deng, Feng Zhou, Jianfei Chen, Guoqiang Wu, Jun Zhu
In this way, we relate DE to Bayesian inference to enjoy reliable Bayesian uncertainty.
1 code implementation • 14 Mar 2022 • Ning Ding, Yujia Qin, Guang Yang, Fuchao Wei, Zonghan Yang, Yusheng Su, Shengding Hu, Yulin Chen, Chi-Min Chan, Weize Chen, Jing Yi, Weilin Zhao, Xiaozhi Wang, Zhiyuan Liu, Hai-Tao Zheng, Jianfei Chen, Yang Liu, Jie Tang, Juanzi Li, Maosong Sun
This necessitates a new branch of research focusing on the parameter-efficient adaptation of PLMs, dubbed delta tuning in this paper.
no code implementations • 29 Sep 2021 • Zhijie Deng, Feng Zhou, Jianfei Chen, Guoqiang Wu, Jun Zhu
Deep Ensemble (DE) is a flexible, feasible, and effective alternative to Bayesian neural networks (BNNs) for uncertainty estimation in deep learning.
4 code implementations • 29 Apr 2021 • Jianfei Chen, Lianmin Zheng, Zhewei Yao, Dequan Wang, Ion Stoica, Michael W. Mahoney, Joseph E. Gonzalez
On all these tasks, ActNN compresses the activation to 2 bits on average, with negligible accuracy loss.
1 code implementation • ICLR 2021 • Cheng Lu, Jianfei Chen, Chongxuan Li, Qiuhao Wang, Jun Zhu
Through theoretical analysis, we show that the function space of ImpFlow is strictly richer than that of ResFlows.
no code implementations • 21 Nov 2020 • Tianchen Zhao, Xuefei Ning, Xiangsheng Shi, Songyi Yang, Shuang Liang, Peng Lei, Jianfei Chen, Huazhong Yang, Yu Wang
We also design the micro-level search space to strengthen the information flow for BNN.
2 code implementations • NeurIPS 2020 • Jianfei Chen, Yu Gai, Zhewei Yao, Michael W. Mahoney, Joseph E. Gonzalez
We show that the FQT gradient is an unbiased estimator of the QAT gradient, and we discuss the impact of gradient quantization on its variance.
Ranked #9 on Semantic Textual Similarity on STS Benchmark
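A toy check of the unbiasedness property this analysis rests on: stochastic rounding rounds up with probability equal to the fractional part, so the quantized value matches the real value in expectation. This is a scalar example only; the paper treats full per-tensor gradient quantizers.

```python
import numpy as np

rng = np.random.default_rng(0)
x = 0.37                                   # a value between two integer grid points

# Stochastic rounding: floor(x + U) with U ~ Uniform[0, 1) rounds up with
# probability equal to the fractional part, so E[sr(x)] = x. This is what lets
# quantized (FQT) gradients remain unbiased estimators of QAT gradients.
samples = np.floor(x + rng.random(1_000_000))
print(samples.mean())                      # ~0.37, i.e. unbiased in expectation
```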
no code implementations • 19 Jun 2020 • Mong H. Ng, Kaahan Radia, Jianfei Chen, Dequan Wang, Ionel Gog, Joseph E. Gonzalez
Bird's-eye-view (BEV) is a powerful and widely adopted representation for road scenes that captures surrounding objects and their spatial locations, along with overall context in the scene.
1 code implementation • ICML 2020 • Jianfei Chen, Cheng Lu, Biqi Chenli, Jun Zhu, Tian Tian
Generative flows are promising tractable models for density modeling that define probability distributions with invertible transformations.
1 code implementation • NeurIPS 2018 • Jianfei Chen, Jun Zhu, Yee Whye Teh, Tong Zhang
However, sEM has a slower asymptotic convergence rate than batch EM, and requires a decreasing sequence of step sizes, which is difficult to tune.
no code implementations • 10 Apr 2018 • Zihao Xiao, Jianfei Chen, Jun Zhu
We also propose an extension to train pLSI and a method to prune the network to obey the limited fan-in of some NMSs.
no code implementations • ICLR 2018 • Jianfei Chen, Jun Zhu
Previous attempts at reducing the receptive field size by subsampling neighbors do not have any convergence guarantee, and their receptive field size per node is still on the order of hundreds.
no code implementations • NeurIPS 2017 • Jianfei Chen, Chongxuan Li, Yizhong Ru, Jun Zhu
In this paper, we propose population matching discrepancy (PMD) for estimating the distribution distance based on samples, as well as an algorithm to learn the parameters of the distributions using PMD as an objective.
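A small sketch of PMD as a minimum-weight matching between two equal-size sample populations, assuming an L1 ground metric for illustration; the paper additionally shows how to learn distribution parameters with PMD as the objective.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def pmd(x, y):
    """Population matching discrepancy between two equal-size sample sets:
    the cost of the minimum-weight bipartite matching between them
    (a sketch of the definition, with L1 pairwise distances)."""
    cost = np.abs(x[:, None, :] - y[None, :, :]).sum(-1)   # pairwise L1 distances
    rows, cols = linear_sum_assignment(cost)                # optimal matching
    return cost[rows, cols].mean()

rng = np.random.default_rng(0)
print(pmd(rng.normal(0.0, 1, (128, 2)), rng.normal(0.5, 1, (128, 2))))
```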
2 code implementations • ICML 2018 • Jianfei Chen, Jun Zhu, Le Song
Previous attempts at reducing the receptive field size by subsampling neighbors do not have a convergence guarantee, and their receptive field size per node is still on the order of hundreds.
1 code implementation • 18 Sep 2017 • Jiaxin Shi, Jianfei Chen, Jun Zhu, Shengyang Sun, Yucen Luo, Yihong Gu, Yuhao Zhou
In this paper we introduce ZhuSuan, a Python probabilistic programming library for Bayesian deep learning, which conjoins the complementary advantages of Bayesian methods and deep learning.
no code implementations • 23 Feb 2017 • Jianfei Chen, Jun Zhu, Jie Lu, Shixia Liu
Finally, we propose an efficient distributed implementation of PCGS through vectorization, pre-processing, and a careful design of the concurrent data structures and communication strategy.
no code implementations • 8 Oct 2016 • Kaiwei Li, Jianfei Chen, WenGuang Chen, Jun Zhu
Latent Dirichlet Allocation (LDA) is a popular tool for analyzing discrete count data such as text and images.
1 code implementation • 19 Feb 2016 • Arnab Bhadury, Jianfei Chen, Jun Zhu, Shixia Liu
Dynamic topic models (DTMs) are very effective in discovering topics and capturing their evolution trends in time series data.
no code implementations • 6 Jan 2016 • Yang Gao, Jianfei Chen, Jun Zhu
Streaming variational Bayes (SVB) is successful in learning LDA models in an online manner.
no code implementations • 29 Oct 2015 • Jianfei Chen, Kaiwei Li, Jun Zhu, WenGuang Chen
We then develop WarpLDA, an LDA sampler which achieves both the best O(1) time complexity per token and the best O(K) scope of random access.
no code implementations • 10 Aug 2015 • Ning Chen, Jun Zhu, Jianfei Chen, Ting Chen
Empirical results on several real datasets demonstrate the effectiveness of dropout training on significantly boosting the classification accuracy of both linear and nonlinear SVMs.
no code implementations • 24 Nov 2014 • Jun Zhu, Jianfei Chen, Wen-Bo Hu, Bo Zhang
Explosive growth in data and availability of cheap computing resources have sparked increasing interest in Big learning, an emerging subfield that studies scalable machine learning algorithms, systems, and applications with Big Data.
no code implementations • 16 Apr 2014 • Ning Chen, Jun Zhu, Jianfei Chen, Bo Zhang
To deal with the intractable expectation of the non-smooth hinge loss under corrupting distributions, we develop an iteratively re-weighted least squares (IRLS) algorithm by exploring data augmentation techniques.
no code implementations • NeurIPS 2013 • Jianfei Chen, Jun Zhu, Zi Wang, Xun Zheng, Bo Zhang
Logistic-normal topic models can effectively discover correlation structures among latent topics.