Search Results for author: Ruosi Wan

Found 6 papers, 2 papers with code

Spherical Motion Dynamics: Learning Dynamics of Normalized Neural Network using SGD and Weight Decay

no code implementations NeurIPS 2021 Ruosi Wan, Zhanxing Zhu, Xiangyu Zhang, Jian Sun

Specifically, 1) we introduce the assumptions that lead to an equilibrium state in SMD, and prove that equilibrium can be reached at a linear rate under the given assumptions; 2) we propose the "angular update" as a substitute for the effective learning rate to characterize the state of SMD, and derive its theoretical value in the equilibrium state; 3) we verify our assumptions and theoretical results on large-scale computer vision tasks, including ImageNet and MSCOCO, under standard settings.
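To make the "angular update" concrete, here is a minimal NumPy sketch (an illustration, not the paper's experimental setup) that simulates SGD with weight decay on a scale-invariant weight vector and measures the angle between consecutive iterates. The hyperparameters `lr` and `wd` and the noise model are assumptions; the closed-form value printed at the end assumes the paper's momentum-free SGD result, sqrt(2 * lr * wd).

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=64)   # weights of a (conceptually) scale-invariant layer
lr, wd = 0.1, 1e-3        # illustrative hyperparameters

def angular_update(w_old, w_new):
    """Angle (in radians) between consecutive weight vectors."""
    cos = w_old @ w_new / (np.linalg.norm(w_old) * np.linalg.norm(w_new))
    return np.arccos(np.clip(cos, -1.0, 1.0))

angles = []
for step in range(20000):
    # For a scale-invariant layer the loss gradient is orthogonal to w;
    # mimic that with a random direction projected off w.
    g = rng.normal(size=w.shape)
    g -= (g @ w) / (w @ w) * w
    w_new = w - lr * (g + wd * w)   # SGD step with weight decay
    angles.append(angular_update(w, w_new))
    w = w_new

# Once equilibrium is reached, the mean angular update stabilizes.
print(f"empirical  : {np.mean(angles[-2000:]):.4f} rad")
print(f"theoretical: {np.sqrt(2 * lr * wd):.4f} rad")
```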

Spherical Motion Dynamics: Learning Dynamics of Neural Network with Normalization, Weight Decay, and SGD

no code implementations 15 Jun 2020 Ruosi Wan, Zhanxing Zhu, Xiangyu Zhang, Jian Sun

In this work, we comprehensively reveal the learning dynamics of neural networks with normalization, weight decay (WD), and SGD (with momentum), which we name Spherical Motion Dynamics (SMD).

Angle-based Search Space Shrinking for Neural Architecture Search

1 code implementation ECCV 2020 Yiming Hu, Yuding Liang, Zichao Guo, Ruosi Wan, Xiangyu Zhang, Yichen Wei, Qingyi Gu, Jian Sun

Comprehensive experiments show that ABS can dramatically enhance existing NAS approaches by providing a promising shrunk search space.
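As a rough illustration of the angle-based metric, the sketch below ranks candidate operators by the angle between a child model's weights at initialization and after supernet training, then keeps the top-ranked half. All candidate names, the synthetic "training" drift, and the keep/drop rule are hypothetical placeholders, not ABS's actual pipeline.

```python
import numpy as np

def angle(w_init, w_now):
    """Angle between a child model's flattened weights at initialization
    and after supernet training."""
    v0, v1 = np.ravel(w_init), np.ravel(w_now)
    cos = v0 @ v1 / (np.linalg.norm(v0) * np.linalg.norm(v1))
    return np.arccos(np.clip(cos, -1.0, 1.0))

# Hypothetical candidate operators with synthetic "training" drift.
rng = np.random.default_rng(0)
scored = []
for name in ["op_a", "op_b", "op_c", "op_d"]:
    w0 = rng.normal(size=1000)
    w1 = w0 + rng.normal(scale=rng.uniform(0.1, 2.0), size=1000)
    scored.append((name, angle(w0, w1)))

# Illustrative shrinking rule: keep the half with the largest angles.
scored.sort(key=lambda t: t[1], reverse=True)
print("shrunk search space:", [name for name, _ in scored[: len(scored) // 2]])
```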

Neural Architecture Search

Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization

1 code implementation ICLR 2020 Junjie Yan, Ruosi Wan, Xiangyu Zhang, Wei Zhang, Yichen Wei, Jian Sun

Therefore, many modified normalization techniques have been proposed; they either fail to fully restore the performance of BN, or must introduce additional nonlinear operations in the inference procedure, substantially increasing computational cost.
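For intuition, here is a minimal NumPy sketch of a normalization layer that normalizes with exponential moving averages of the batch statistics rather than the noisy per-batch values. This only illustrates the forward pass; the paper's method (MABN) also stabilizes the statistics used in backward propagation, and the momentum value here is an assumption.

```python
import numpy as np

class EMANorm:
    """Illustrative layer: normalize with moving-average statistics
    instead of raw per-batch statistics (a sketch, not the paper's
    exact formulation, which also stabilizes the backward pass)."""

    def __init__(self, num_features, momentum=0.98, eps=1e-5):
        self.mu = np.zeros(num_features)
        self.var = np.ones(num_features)
        self.momentum, self.eps = momentum, eps

    def forward(self, x):
        """x: array of shape (batch, num_features)."""
        m = self.momentum
        # Fold the current batch into the running statistics, then
        # normalize with the (smoother) running values.
        self.mu = m * self.mu + (1 - m) * x.mean(axis=0)
        self.var = m * self.var + (1 - m) * x.var(axis=0)
        return (x - self.mu) / np.sqrt(self.var + self.eps)

layer = EMANorm(num_features=8)
out = layer.forward(np.random.default_rng(0).normal(size=(4, 8)))  # tiny batch
```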

Towards Making Deep Transfer Learning Never Hurt

no code implementations 18 Nov 2019 Ruosi Wan, Haoyi Xiong, Xingjian Li, Zhanxing Zhu, Jun Huan

The empirical results show that the proposed descent-direction estimation strategy, DTNH, consistently improves the performance of deep transfer learning tasks under all of the above regularizers, even when transferring pre-trained weights from inappropriate networks.

Knowledge Distillation, Transfer Learning

Neural Control Variates for Variance Reduction

no code implementations 1 Jun 2018 Ruosi Wan, Mingjun Zhong, Haoyi Xiong, Zhanxing Zhu

In statistics and machine learning, an intractable integral is often approximated with an unbiased Monte Carlo estimator, but in many applications the variance of the estimate is high.
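The paper parameterizes control variates with neural networks; the sketch below instead uses a classical linear control variate on a toy integral to show the variance-reduction identity the method builds on. The target E[exp(X)] and the coefficient formula c* = Cov(f, h) / Var(h) are standard results, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)

f = np.exp(x)  # samples of f(X); we estimate E[exp(X)] = e**0.5 for X ~ N(0, 1)
h = x          # control variate with known mean E[X] = 0

# Optimal linear coefficient c* = Cov(f, h) / Var(h).
c = np.cov(f, h)[0, 1] / h.var()

cv = f - c * (h - 0.0)  # control-variate estimator (0.0 is the known mean of h)

print(f"true value       : {np.exp(0.5):.4f}")
print(f"plain MC         : {f.mean():.4f}  (sample var {f.var():.3f})")
print(f"control variates : {cv.mean():.4f}  (sample var {cv.var():.3f})")
```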
