1 code implementation • 22 May 2023 • Long Yang, Zhixiong Huang, Fenghao Lei, Yucun Zhong, Yiming Yang, Cong Fang, Shiting Wen, Binbin Zhou, Zhouchen Lin
Popular reinforcement learning (RL) algorithms tend to produce a unimodal policy distribution, which limits the expressiveness of complex policies and weakens the ability to explore.
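As a rough illustration of this point (not code from the paper), the sketch below contrasts a unimodal diagonal-Gaussian policy head with a small Gaussian-mixture head in PyTorch; all module names, sizes, and the mixture size are hypothetical.

```python
# Hypothetical sketch: unimodal Gaussian policy vs. a Gaussian-mixture policy.
# Not from the paper; names and sizes are illustrative only.
import torch
import torch.nn as nn
import torch.distributions as D

class GaussianPolicy(nn.Module):
    """Unimodal policy: a single Gaussian per state."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Linear(obs_dim, 2 * act_dim)

    def forward(self, obs):
        mean, log_std = self.net(obs).chunk(2, dim=-1)
        return D.Independent(D.Normal(mean, log_std.exp()), 1)

class MixturePolicy(nn.Module):
    """Multimodal policy: a mixture of k Gaussians per state."""
    def __init__(self, obs_dim, act_dim, k=4):
        super().__init__()
        self.k, self.act_dim = k, act_dim
        self.net = nn.Linear(obs_dim, k * (1 + 2 * act_dim))

    def forward(self, obs):
        out = self.net(obs).view(*obs.shape[:-1], self.k, 1 + 2 * self.act_dim)
        logits = out[..., 0]                              # mixture weights
        mean = out[..., 1:1 + self.act_dim]
        log_std = out[..., 1 + self.act_dim:]
        comp = D.Independent(D.Normal(mean, log_std.exp()), 1)
        return D.MixtureSameFamily(D.Categorical(logits=logits), comp)

obs = torch.randn(8, 16)
print(GaussianPolicy(16, 4)(obs).sample().shape)   # (8, 4), single mode per state
print(MixturePolicy(16, 4)(obs).sample().shape)    # (8, 4), up to 4 modes per state
```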
no code implementations • 6 Mar 2023 • Jianqing Fan, Cong Fang, Yihong Gu, Tong Zhang
The joint distribution of the response variable and covariates may vary across different environments, yet the conditional expectation of $y$ given the unknown set of important variables is invariant across environments.
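As a purely illustrative simulation of this setting (not the estimation procedure proposed in the paper), the snippet below builds two synthetic environments in which the covariate distribution shifts, the relationship between $y$ and the important variable stays fixed, and the apparent effect of a spurious variable flips sign.

```python
# Illustrative simulation (not the paper's method): two environments where
# E[y | x1] is invariant but the spurious covariate x2 has an
# environment-dependent relationship with y.
import numpy as np

rng = np.random.default_rng(0)

def sample_env(n, spurious_coef, x_shift):
    x1 = rng.normal(loc=x_shift, scale=1.0, size=n)          # important variable
    y = 2.0 * x1 + rng.normal(scale=0.5, size=n)              # invariant mechanism
    x2 = spurious_coef * y + rng.normal(scale=0.5, size=n)    # spurious variable
    return x1, x2, y

for env, (coef, shift) in enumerate([(1.0, 0.0), (-1.0, 2.0)]):
    x1, x2, y = sample_env(5000, coef, shift)
    # The OLS slope of y on x1 recovers the invariant coefficient (about 2)
    # in both environments, while the slope on x2 changes sign across them.
    b1 = np.polyfit(x1, y, 1)[0]
    b2 = np.polyfit(x2, y, 1)[0]
    print(f"env {env}: slope on x1 = {b1:.2f}, slope on x2 = {b2:.2f}")
```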
no code implementations • 2 Mar 2023 • Shihong Ding, Hanze Dong, Cong Fang, Zhouchen Lin, Tong Zhang
We consider the general nonconvex nonconcave minimax problem over continuous variables.
no code implementations • 26 Mar 2022 • Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han, Zhenghao Liu, Ning Ding, Yongming Rao, Yizhao Gao, Liang Zhang, Ming Ding, Cong Fang, Yisen Wang, Mingsheng Long, Jing Zhang, Yinpeng Dong, Tianyu Pang, Peng Cui, Lingxiao Huang, Zheng Liang, HuaWei Shen, HUI ZHANG, Quanshi Zhang, Qingxiu Dong, Zhixing Tan, Mingxuan Wang, Shuo Wang, Long Zhou, Haoran Li, Junwei Bao, Yingwei Pan, Weinan Zhang, Zhou Yu, Rui Yan, Chence Shi, Minghao Xu, Zuobai Zhang, Guoqiang Wang, Xiang Pan, Mengjie Li, Xiaoyu Chu, Zijun Yao, Fangwei Zhu, Shulin Cao, Weicheng Xue, Zixuan Ma, Zhengyan Zhang, Shengding Hu, Yujia Qin, Chaojun Xiao, Zheni Zeng, Ganqu Cui, Weize Chen, Weilin Zhao, Yuan YAO, Peng Li, Wenzhao Zheng, Wenliang Zhao, Ziyi Wang, Borui Zhang, Nanyi Fei, Anwen Hu, Zenan Ling, Haoyang Li, Boxi Cao, Xianpei Han, Weidong Zhan, Baobao Chang, Hao Sun, Jiawen Deng, Chujie Zheng, Juanzi Li, Lei Hou, Xigang Cao, Jidong Zhai, Zhiyuan Liu, Maosong Sun, Jiwen Lu, Zhiwu Lu, Qin Jin, Ruihua Song, Ji-Rong Wen, Zhouchen Lin, LiWei Wang, Hang Su, Jun Zhu, Zhifang Sui, Jiajun Zhang, Yang Liu, Xiaodong He, Minlie Huang, Jian Tang, Jie Tang
With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks has become a popular paradigm.
1 code implementation • 29 Jan 2021 • Cong Fang, Hangfeng He, Qi Long, Weijie J. Su
More importantly, when moving to the imbalanced case, our analysis of the Layer-Peeled Model reveals a hitherto unknown phenomenon that we term Minority Collapse, which fundamentally limits the performance of deep learning models on the minority classes.
1 code implementation • 27 Dec 2020 • Cong Fang, Hanze Dong, Tong Zhang
Deep learning has achieved considerable empirical success in recent years.
1 code implementation • NeurIPS 2020 • Yihong Gu, Weizhong Zhang, Cong Fang, Jason D. Lee, Tong Zhang
With the help of a new technique called neural network grafting, we demonstrate that the feature distributions of differently initialized networks remain similar at each layer throughout the entire training process.
1 code implementation • NeurIPS 2020 • Bohang Zhang, Jikai Jin, Cong Fang, LiWei Wang
Gradient clipping is commonly used in training deep neural networks, partly because of its practical effectiveness in alleviating the exploding gradient problem.
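For reference, a minimal PyTorch training step showing the kind of global-norm gradient clipping referred to above; the model, data, and clipping threshold are placeholders, not the paper's experimental setup.

```python
# Minimal sketch of gradient clipping in a single training step (placeholder values).
import torch
import torch.nn as nn

model = nn.Linear(32, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x, y = torch.randn(64, 32), torch.randn(64, 1)

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
# Rescale the gradient so its global L2 norm is at most max_norm,
# bounding the effective step size and mitigating exploding gradients.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```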
no code implementations • 3 Jul 2020 • Cong Fang, Jason D. Lee, Pengkun Yang, Tong Zhang
This new representation overcomes the degenerate situation in which each middle layer has essentially only one meaningful hidden unit, and it further leads to a simpler representation of DNNs, for which the training objective can be reformulated as a convex optimization problem via a suitable re-parameterization.
no code implementations • 18 Nov 2019 • Cong Fang, Yihong Gu, Weizhong Zhang, Tong Zhang
This new analysis is consistent with empirical observations that deep neural networks are capable of learning efficient feature representations.
no code implementations • 25 Oct 2019 • Cong Fang, Hanze Dong, Tong Zhang
Recently, over-parameterized neural networks have been extensively analyzed in the literature.
no code implementations • ICLR 2020 • Zebang Shen, Pan Zhou, Cong Fang, Alejandro Ribeiro
We target the problem of finding a local minimum in non-convex finite-sum minimization.
no code implementations • 1 Feb 2019 • Cong Fang, Zhouchen Lin, Tong Zhang
In this paper, we give a sharp analysis for Stochastic Gradient Descent (SGD) and prove that SGD is able to efficiently escape from saddle points and find an $(\epsilon, O(\epsilon^{0.5}))$-approximate second-order stationary point in $\tilde{O}(\epsilon^{-3.5})$ stochastic gradient computations for generic nonconvex optimization problems, when the objective function satisfies gradient-Lipschitz, Hessian-Lipschitz, and dispersive noise assumptions.
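A toy illustration of this escaping behaviour (not the paper's setting or proof): plain SGD with small injected noise, started exactly at a strict saddle point of a simple nonconvex function, drifts away and settles at a point where the gradient is nearly zero and the Hessian is positive definite. All function and hyper-parameter choices below are assumptions for the demo.

```python
# Toy sketch: SGD with isotropic noise escapes the strict saddle of
# f(x, y) = x^4/4 - x^2/2 + y^2/2, which has a saddle at the origin
# and minima at (+/-1, 0). Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def grad(w):
    x, y = w
    return np.array([x**3 - x, y])

def hessian_min_eig(w):
    x, _ = w
    return min(3 * x**2 - 1.0, 1.0)   # Hessian is diag(3x^2 - 1, 1)

w = np.zeros(2)              # start exactly at the saddle point
lr, sigma = 0.05, 0.01       # step size and noise level (stand-in for gradient noise)
for _ in range(2000):
    g = grad(w) + sigma * rng.normal(size=2)   # noisy "stochastic" gradient
    w -= lr * g

print("iterate:", w)                                   # close to (+1, 0) or (-1, 0)
print("grad norm:", np.linalg.norm(grad(w)))           # near 0 (first-order condition)
print("min Hessian eigenvalue:", hessian_min_eig(w))   # about 1 > 0 (second-order)
```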
no code implementations • 29 Dec 2018 • Haishan Ye, Zhichao Huang, Cong Fang, Chris Junchi Li, Tong Zhang
Zeroth-order optimization is an important research topic in machine learning.
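One standard building block in this area is the two-point finite-difference gradient estimator, which uses only function evaluations; the sketch below is a generic version applied to a toy quadratic, and is not tied to the specific algorithm of the paper.

```python
# Generic two-point zeroth-order gradient estimator (illustrative):
# estimate the gradient of f from function values only, then run descent.
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return 0.5 * np.sum(x**2)   # toy smooth objective with minimum at 0

def zo_gradient(f, x, mu=1e-4, num_dirs=20):
    """Average of symmetric finite differences along random Gaussian directions."""
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.normal(size=x.shape[0])
        g += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return g / num_dirs

x = rng.normal(size=10)
for _ in range(200):
    x -= 0.05 * zo_gradient(f, x)   # zeroth-order "gradient" descent step

print("f(x) after ZO descent:", f(x))   # close to 0
```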
no code implementations • NeurIPS 2018 • Cong Fang, Chris Junchi Li, Zhouchen Lin, Tong Zhang
Specifically, we prove that the SPIDER-SFO algorithm achieves a gradient computation cost of $\mathcal{O}\left( \min( n^{1/2} \epsilon^{-2}, \epsilon^{-3} ) \right)$ to find an $\epsilon$-approximate first-order stationary point.
no code implementations • 5 Nov 2018 • Jia Li, Cong Fang, Zhouchen Lin
LPOM is block multi-convex in all layer-wise weights and activations.
no code implementations • NeurIPS 2018 • Cong Fang, Chris Junchi Li, Zhouchen Lin, Tong Zhang
For stochastic first-order optimization, combining SPIDER with normalized gradient descent, we propose two new algorithms, namely SPIDER-SFO and SPIDER-SFO$^+$, that solve non-convex stochastic optimization problems using stochastic gradients only.
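The core of SPIDER is a path-integrated gradient estimator that is periodically refreshed with a large batch and otherwise updated with mini-batch gradient differences along the trajectory. The sketch below is a simplified reading of that construction combined with normalized steps, applied to a toy finite-sum least-squares problem; the hyper-parameters and problem are placeholders, not the authors' implementation.

```python
# Simplified SPIDER-style tracking with normalized gradient steps for a
# finite-sum objective f(x) = (1/n) * sum_i f_i(x). Placeholder toy problem.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_batch(x, idx):
    # Mini-batch gradient of the least-squares components indexed by idx.
    r = A[idx] @ x - b[idx]
    return A[idx].T @ r / len(idx)

x = np.zeros(d)
q, batch, eta = 50, 32, 0.01       # refresh period, mini-batch size, step size
v = np.zeros(d)
for k in range(2000):
    if k % q == 0:
        v = grad_batch(x, np.arange(n))   # periodic full-gradient refresh
    else:
        idx = rng.choice(n, size=batch, replace=False)
        # Path-integrated update: v_k = v_{k-1} + grad_S(x_k) - grad_S(x_{k-1})
        v = v + grad_batch(x, idx) - grad_batch(x_prev, idx)
    x_prev = x.copy()
    x = x - eta * v / max(np.linalg.norm(v), 1e-12)   # normalized gradient step

print("final gradient norm:", np.linalg.norm(grad_batch(x, np.arange(n))))
```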
no code implementations • 27 Feb 2018 • Cong Fang, Yameng Huang, Zhouchen Lin
$O(1/\epsilon)$) convergence rate for non-strongly convex functions, and $O(\sqrt{\kappa}\log(1/\epsilon))$ (vs.