no code implementations • NeurIPS 2021 • Ruosi Wan, Zhanxing Zhu, Xiangyu Zhang, Jian Sun
Specifically, 1) we introduce the assumptions that lead to an equilibrium state in SMD, and prove that equilibrium can be reached at a linear rate under these assumptions; 2) we propose the "angular update" as a substitute for the effective learning rate to characterize the state of SMD, and derive its theoretical value in the equilibrium state; 3) we verify our assumptions and theoretical results on large-scale computer vision tasks, including ImageNet and MSCOCO, with standard settings.
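As a rough illustration of the "angular update" metric described above (the function name and interface below are illustrative assumptions, not the paper's code), the quantity can be measured as the angle between successive weight vectors:

```python
import numpy as np

def angular_update(w_prev: np.ndarray, w_next: np.ndarray) -> float:
    """Angle (in radians) between two successive weight vectors.

    Sketch of the angular-update metric: arccos of the cosine
    similarity between the weights before and after one step.
    """
    cos = np.dot(w_prev, w_next) / (np.linalg.norm(w_prev) * np.linalg.norm(w_next))
    # Clip to guard against floating-point values slightly outside [-1, 1]
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))
```

For example, orthogonal weight vectors give an angular update of pi/2, while a pure rescaling of the weights gives an angular update of zero, which is exactly why the metric is insensitive to the weight-norm changes induced by weight decay and normalization.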
no code implementations • 15 Jun 2020 • Ruosi Wan, Zhanxing Zhu, Xiangyu Zhang, Jian Sun
In this work, we comprehensively characterize the learning dynamics of neural networks trained with normalization, weight decay (WD), and SGD (with momentum), which we name Spherical Motion Dynamics (SMD).
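A key ingredient behind such spherical dynamics is that normalization makes the loss invariant to the scale of the preceding weights, so the gradient is orthogonal to the weight vector and updates move the weights along a sphere while weight decay shrinks their norm. The toy check below (an illustrative assumption, not the papers' setup) verifies this orthogonality numerically for a scale-invariant loss:

```python
import numpy as np

def loss(w: np.ndarray, x: np.ndarray) -> float:
    # Toy scale-invariant loss: the weight vector is normalized before
    # use, mimicking the effect of a normalization layer (assumption).
    v = w / np.linalg.norm(w)
    return float((v @ x) ** 2)

def num_grad(f, w: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Central-difference numerical gradient
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
w = rng.normal(size=5)
x = rng.normal(size=5)
g = num_grad(lambda u: loss(u, x), w)

# Scale invariance implies the gradient is orthogonal to w,
# so gradient steps rotate the weights rather than rescale them.
print(abs(np.dot(g, w)))  # near zero
```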
1 code implementation • ECCV 2020 • Yiming Hu, Yuding Liang, Zichao Guo, Ruosi Wan, Xiangyu Zhang, Yichen Wei, Qingyi Gu, Jian Sun
Comprehensive experiments show that ABS can dramatically enhance existing NAS approaches by providing a promising shrunk search space.
1 code implementation • ICLR 2020 • Junjie Yan, Ruosi Wan, Xiangyu Zhang, Wei zhang, Yichen Wei, Jian Sun
Many modified normalization techniques have therefore been proposed, but they either fail to fully restore the performance of BN, or must introduce additional nonlinear operations into the inference procedure, incurring substantial computational overhead.
no code implementations • 18 Nov 2019 • Ruosi Wan, Haoyi Xiong, Xingjian Li, Zhanxing Zhu, Jun Huan
The empirical results show that the proposed descent-direction estimation strategy, DTNH, consistently improves the performance of deep transfer learning tasks under all of the above regularizers, even when transferring pre-trained weights from ill-suited source networks.
no code implementations • 1 Jun 2018 • Ruosi Wan, Mingjun Zhong, Haoyi Xiong, Zhanxing Zhu
In statistics and machine learning, intractable integrals are often approximated with unbiased Monte Carlo estimators, but in many applications the variance of these estimators is high.
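A standard way to reduce that variance is a control variate: subtract a correlated quantity with a known mean. The sketch below is a generic illustration of this idea under simple assumptions (a uniform integrand and a linear control variate), not necessarily the estimator studied in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.uniform(size=n)

f = np.exp(x)   # estimate E[e^X] for X ~ U(0, 1); true value is e - 1
g = x           # control variate with known mean E[X] = 0.5

# Near-optimal coefficient c = Cov(f, g) / Var(g)
c = np.cov(f, g)[0, 1] / np.var(x)

plain = f.mean()                         # plain Monte Carlo estimate
controlled = (f - c * (g - 0.5)).mean()  # control-variate estimate

# Both estimators are unbiased, but the per-sample variance of the
# controlled version is far lower, so its estimate is much tighter.
print(np.var(f), np.var(f - c * (g - 0.5)))
```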