no code implementations • 22 Mar 2024 • Bohan Wang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhi-Ming Ma, Wei Chen
This paper aims to clearly distinguish between Stochastic Gradient Descent with Momentum (SGDM) and Adam in terms of their convergence rates.
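For readers unfamiliar with the two updates being compared, here is a minimal sketch of one SGDM step and one Adam step (standard textbook forms; the hyperparameter values are illustrative defaults, not taken from the paper):

```python
import numpy as np

def sgdm_step(w, g, m, lr=0.01, beta=0.9):
    """One step of SGD with (heavy-ball) momentum."""
    m = beta * m + g          # accumulate the momentum buffer
    w = w - lr * m            # move against the buffered direction
    return w, m

def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One step of Adam with bias correction (t starts at 1)."""
    m = b1 * m + (1 - b1) * g          # first-moment estimate
    v = b2 * v + (1 - b2) * g * g      # second-moment estimate
    m_hat = m / (1 - b1 ** t)          # bias-corrected moments
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

The key structural difference the convergence analysis must handle is Adam's per-coordinate adaptive denominator `sqrt(v_hat)`, which SGDM lacks.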
no code implementations • 4 Mar 2024 • Bowen Gao, Minsi Ren, Yuyan Ni, Yanwen Huang, Bo Qiang, Zhi-Ming Ma, Wei-Ying Ma, Yanyan Lan
In the field of Structure-based Drug Design (SBDD), deep learning-based generative models have achieved outstanding performance in terms of docking score.
no code implementations • 23 Feb 2024 • Jiajun Ma, Shuchen Xue, Tianyang Hu, Wenjia Wang, Zhaoqiang Liu, Zhenguo Li, Zhi-Ming Ma, Kenji Kawaguchi
Surprisingly, the improvement persists when we increase the number of sampling steps and can even surpass the best result from EDM-2 (1.58) with only 39 NFEs (1.57).
2 code implementations • 9 Dec 2023 • Peiyan Hu, Yue Wang, Zhi-Ming Ma
Based on DMM, to efficiently and accurately model dynamic systems, we develop a moving mesh based neural PDE solver (MM-PDE) that embeds the moving mesh with a two-branch architecture and a learnable interpolation framework to preserve information within the data.
1 code implementation • 24 Nov 2023 • Rui Zhang, Qi Meng, Zhi-Ming Ma
To this end, we propose Physical Invariant Attention Neural Operator (PIANO) to decipher and integrate the physical invariants (PI) for operator learning from the PDE series with various physical mechanisms.
no code implementations • 3 Nov 2023 • Yuyan Ni, Shikun Feng, Wei-Ying Ma, Zhi-Ming Ma, Yanyan Lan
By aligning with physical principles, SliDe shows a 42% improvement in the accuracy of estimated force fields compared to current state-of-the-art denoising methods, and thus outperforms traditional baselines on various molecular property prediction tasks.
1 code implementation • NeurIPS 2023 • Shuchen Xue, Mingyang Yi, Weijian Luo, Shifeng Zhang, Jiacheng Sun, Zhenguo Li, Zhi-Ming Ma
Based on our analysis, we propose SA-Solver, an improved, efficient stochastic Adams method for solving diffusion SDEs to generate high-quality data.
Ranked #11 on Image Generation on ImageNet 512x512
1 code implementation • 20 Jul 2023 • Shikun Feng, Yuyan Ni, Yanyan Lan, Zhi-Ming Ma, Wei-Ying Ma
Theoretically, the objective is equivalent to learning the force field, which has been shown to be helpful for downstream tasks.
no code implementations • 16 Jun 2023 • Wei Chen, Weitao Du, Zhi-Ming Ma, Qi Meng
We study a new kind of SDE arising from research on optimization in machine learning; we call it power-law dynamic because its stationary distribution has no sub-Gaussian tail and instead obeys a power law.
no code implementations • 29 May 2023 • Bohan Wang, Huishuai Zhang, Zhi-Ming Ma, Wei Chen
We provide a simple convergence proof for AdaGrad optimizing non-convex objectives under only affine noise variance and bounded smoothness assumptions.
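As context, the AdaGrad update that the analysis concerns can be sketched in a few lines (standard form; the learning rate below is illustrative):

```python
import numpy as np

def adagrad_step(w, g, G, lr=0.5, eps=1e-8):
    """One AdaGrad step: the accumulator G sums squared gradients,
    so each coordinate's effective step size shrinks over time."""
    G = G + g * g
    return w - lr * g / (np.sqrt(G) + eps), G
```

Affine noise variance means, roughly, that the stochastic gradient's second moment is bounded by an affine function of the squared gradient norm; the proof needs only that plus bounded smoothness.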
1 code implementation • NeurIPS 2023 • Weitao Du, Yuanqi Du, Limei Wang, Dieqiao Feng, Guifeng Wang, Shuiwang Ji, Carla Gomes, Zhi-Ming Ma
Geometric deep learning enables the encoding of physical symmetries in modeling 3D objects.
1 code implementation • 10 Feb 2023 • Rui Zhang, Qi Meng, Rongchan Zhu, Yue Wang, Wenlei Shi, Shihua Zhang, Zhi-Ming Ma, Tie-Yan Liu
To address these limitations, we propose the Monte Carlo Neural PDE Solver (MCNP Solver) for training unsupervised neural solvers via the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles.
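The probabilistic representation in question is of Feynman–Kac type; as an illustration of the idea (not the MCNP Solver itself), the heat equation u_t = u_xx can be evaluated pointwise by averaging the initial condition over the endpoints of random particles:

```python
import numpy as np

def mc_heat(u0, x, t, n_particles=200_000, seed=0):
    """Monte Carlo evaluation of u(x, t) for u_t = u_xx via the
    Feynman-Kac representation u(x, t) = E[u0(x + sqrt(2t) Z)],
    Z ~ N(0, 1): the macroscopic field as an ensemble of particles."""
    z = np.random.default_rng(seed).standard_normal(n_particles)
    return u0(x + np.sqrt(2.0 * t) * z).mean()
```

For u0 = sin the exact solution is e^{-t} sin(x), which the estimate matches to Monte Carlo accuracy; no mesh or supervision is needed.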
no code implementations • 21 Aug 2022 • Bohan Wang, Yushun Zhang, Huishuai Zhang, Qi Meng, Zhi-Ming Ma, Tie-Yan Liu, Wei Chen
In particular, the existing analysis of Adam cannot clearly demonstrate the advantage of Adam over SGD.
no code implementations • 14 Jul 2022 • Mingyang Yi, Ruoyu Wang, Jiacheng Sun, Zhenguo Li, Zhi-Ming Ma
The correlation shift is caused by the spurious attributes that correlate to the class label, as the correlation between them may vary in training and test data.
no code implementations • 20 Jun 2022 • Rui Zhang, Peiyan Hu, Qi Meng, Yue Wang, Rongchan Zhu, Bingguang Chen, Zhi-Ming Ma, Tie-Yan Liu
To this end, we propose the \emph{Deep Random Vortex Method} (DRVM), which combines the neural network with a random vortex dynamics system equivalent to the Navier-Stokes equation.
1 code implementation • 13 Apr 2022 • Peiyan Hu, Qi Meng, Bingguang Chen, Shiqi Gong, Yue Wang, Wei Chen, Rongchan Zhu, Zhi-Ming Ma, Tie-Yan Liu
Stochastic partial differential equations (SPDEs) are significant tools for modeling dynamics in many areas including atmospheric sciences and physics.
no code implementations • 8 Oct 2021 • Bohan Wang, Qi Meng, Huishuai Zhang, Ruoyu Sun, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
The momentum acceleration technique is widely adopted in many optimization algorithms.
2 code implementations • ICLR 2022 • Chongchong Li, Yue Wang, Wei Chen, YuTing Liu, Zhi-Ming Ma, Tie-Yan Liu
We then propose a two-model-based learning method to control the prediction error and the gradient error.
no code implementations • 8 Jun 2021 • Shiqi Gong, Qi Meng, Yue Wang, Lijun Wu, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
In this paper, to reduce the reliance on the numerical solver, we propose to enhance the supervised signal in the training of NODE.
no code implementations • 24 May 2021 • Mingyang Yi, Lu Hou, Jiacheng Sun, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma
In this paper, after defining OOD generalization via Wasserstein distance, we theoretically show that a model robust to input perturbation generalizes well on OOD data.
1 code implementation • ICLR 2021 • Mingyang Yi, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma
Inspired by adversarial training, we minimize this maximal expected loss (MMEL) and obtain a simple and interpretable closed-form solution: more attention should be paid to augmented samples with large loss values (i.e., harder examples).
no code implementations • 8 Jan 2021 • Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
However, it has been pointed out that the usual definitions of sharpness, which consider either the maximum or the integral of the loss over a $\delta$-ball of parameters around a minimum, cannot give consistent measurements for scale-invariant neural networks, e.g., networks with batch normalization layers.
1 code implementation • 4 Dec 2020 • Mingyang Yi, Ruoyu Wang, Zhi-Ming Ma
Our bounds underscore that with locally strongly convex population risk, the models trained by any proper iterative algorithm can generalize well, even for non-convex problems and large dimension $d$.
no code implementations • 24 Jun 2020 • Qi Meng, Shiqi Gong, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
Specifically, we show that the covariance of the SGD noise in the local region around a local minimum is a quadratic function of the state.
no code implementations • 1 Jun 2020 • Linfang Hou, Liang Pang, Xin Hong, Yanyan Lan, Zhi-Ming Ma, Dawei Yin
Robust Reinforcement Learning aims to find the optimal policy with a certain degree of robustness to environmental dynamics.
no code implementations • 18 Oct 2019 • Juanping Zhu, Qi Meng, Wei Chen, Zhi-Ming Ma
Based on the basis path set, the G-SGD algorithm significantly outperforms the conventional SGD algorithm in optimizing neural networks.
no code implementations • 25 Sep 2019 • Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
It has been widely shown that adversarial training (Madry et al., 2018) is empirically effective in defending against adversarial attacks.
no code implementations • 25 Sep 2019 • Yue Wang, Qi Meng, Wei Chen, YuTing Liu, Zhi-Ming Ma, Tie-Yan Liu
Optimization algorithms such as stochastic gradient descent optimize neural networks in the vector space of weights, which is not positively scale-invariant.
no code implementations • 23 Jul 2019 • Li He, Long Xia, Wei Zeng, Zhi-Ming Ma, Yihong Zhao, Dawei Yin
To make full use of such historical data, learning policies from multiple loggers becomes necessary.
no code implementations • ICLR 2019 • Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
Then, a natural question is: \emph{can we construct a new vector space that is positively scale-invariant and sufficient to represent ReLU neural networks so as to better facilitate the optimization process }?
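The positive scale invariance behind this question is easy to demonstrate: multiplying one ReLU layer's weights by c > 0 and dividing the next layer's by c leaves the network's function (and the product of weights along each input-output path) unchanged. A small self-contained check:

```python
import numpy as np

def relu_net(x, W1, W2):
    """Two-layer ReLU network f(x) = W2 relu(W1 x)."""
    return W2 @ np.maximum(W1 @ x, 0.0)

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((2, 4))
x = rng.standard_normal(3)
c = 5.0  # any positive rescaling factor

out_original = relu_net(x, W1, W2)
out_rescaled = relu_net(x, c * W1, W2 / c)  # same function, different weights
```

Because relu(c z) = c relu(z) for c > 0, the rescaling cancels exactly, so distinct weight vectors represent the same network, which is what motivates optimizing in a path-based space instead.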
no code implementations • ICLR 2019 • Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
Optimization on manifolds has been widely used in machine learning to handle constrained optimization problems.
no code implementations • 6 Mar 2019 • Mingyang Yi, Qi Meng, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
That is to say, a minimum with balanced values of basis paths is more likely to be flat and to generalize better.
no code implementations • 21 Sep 2018 • Yue Wang, Qi Meng, Wei Chen, YuTing Liu, Zhi-Ming Ma, Tie-Yan Liu
In this paper, we propose to transfer the Q-function learned in the source task to the target of the Q-learning in the new task when certain safe conditions are satisfied.
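As a sketch of the idea (the safe-transfer condition itself is the paper's contribution and is only caricatured here by a hypothetical `task_gap` threshold), tabular Q-learning in the new task can simply start from the source task's Q-table:

```python
import numpy as np

def transfer_init(Q_source, task_gap, threshold=0.1):
    """Warm-start the target task's Q-table from the source task when a
    (hypothetical) measure of task similarity passes the safety check;
    otherwise fall back to the usual zero initialization."""
    return Q_source.copy() if task_gap < threshold else np.zeros_like(Q_source)

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update on the target task."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q
```

When the source and target tasks are close, the warm start shortens the burn-in phase of Q-learning; when they are not, the transferred values can mislead it, hence the safety check.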
no code implementations • NeurIPS 2017 • Yue Wang, Wei Chen, Yu-Ting Liu, Zhi-Ming Ma, Tie-Yan Liu
(2) The convergence rate is determined by the step size, with the mixing time of the Markov process as the coefficient.
no code implementations • 8 May 2018 • Li He, Qi Meng, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
We then conduct a theoretical analysis of the convergence rate of the ASGD algorithm based on the continuous approximation.
no code implementations • 11 Feb 2018 • Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
Then, a natural question is: \emph{can we construct a new vector space that is positively scale-invariant and sufficient to represent ReLU neural networks so as to better facilitate the optimization process }?
no code implementations • 29 Sep 2017 • Qi Meng, Wei Chen, Yue Wang, Zhi-Ming Ma, Tie-Yan Liu
First, we give a mathematical formulation for the practical data processing procedure in distributed machine learning, which we call data partition with global/local shuffling.
no code implementations • NeurIPS 2016 • Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tie-Yan Liu
After partitioning the training data onto a number of (e.g., $M$) machines, this algorithm performs both local voting and global voting in each iteration.
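The two-stage voting can be sketched as follows (an illustrative reconstruction with hypothetical per-machine score dictionaries, standing in for the actual histogram-based split gains):

```python
from collections import Counter

def voting_select(local_scores, k):
    """Each of the M machines votes for its top-2k attributes by local
    score (local voting); the k attributes with the most votes across
    machines win (global voting)."""
    votes = Counter()
    for scores in local_scores:  # one {attribute: score} dict per machine
        votes.update(sorted(scores, key=scores.get, reverse=True)[:2 * k])
    return [attr for attr, _ in votes.most_common(k)]
```

Only the short per-machine vote lists cross the network, which is what makes the scheme communication-efficient compared with exchanging full statistics.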
no code implementations • ICML 2017 • Shuxin Zheng, Qi Meng, Taifeng Wang, Wei Chen, Nenghai Yu, Zhi-Ming Ma, Tie-Yan Liu
We propose a novel technique to compensate for this delay, so as to make the optimization behavior of ASGD closer to that of sequential SGD.
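A minimal sketch of such a delay-compensated step, assuming the gradient-outer-product approximation to the Hessian (the value of λ below is illustrative, not from the paper):

```python
import numpy as np

def dc_asgd_step(w_current, w_backup, g_backup, lr=0.1, lam=0.04):
    """Delay-compensated ASGD: correct the stale gradient g_backup
    (computed by a worker at the old weights w_backup) toward the
    gradient at the current weights, using g*g as a cheap diagonal
    approximation of the Hessian."""
    g_compensated = g_backup + lam * g_backup * g_backup * (w_current - w_backup)
    return w_current - lr * g_compensated
```

With no delay (`w_current == w_backup`) the correction vanishes and the step reduces to plain SGD, which is the intended limiting behavior.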
no code implementations • 27 Sep 2016 • Qi Meng, Yue Wang, Wei Chen, Taifeng Wang, Zhi-Ming Ma, Tie-Yan Liu
Many machine learning tasks can be formulated as Regularized Empirical Risk Minimization (R-ERM), and solved by optimization algorithms such as gradient descent (GD), stochastic gradient descent (SGD), and stochastic variance reduction (SVRG).
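For reference, the SVRG member of that list can be sketched in a few lines (hyperparameters are illustrative):

```python
import numpy as np

def svrg(grad_i, n, w0, lr=0.1, epochs=20, rng=None):
    """Minimal SVRG: each epoch takes a full-gradient snapshot, then runs
    stochastic steps whose noise is reduced by the snapshot correction
    grad_i(w, i) - grad_i(w_snap, i) + full_grad."""
    rng = rng or np.random.default_rng(0)
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(epochs):
        w_snap = w.copy()
        full_grad = sum(grad_i(w_snap, i) for i in range(n)) / n
        for _ in range(n):
            i = int(rng.integers(n))
            w = w - lr * (grad_i(w, i) - grad_i(w_snap, i) + full_grad)
    return w
```

On the least-squares toy problem min_w (1/n) Σ 0.5 (w − a_i)², where grad_i(w, i) = w − a_i, the minimizer is the mean of the a_i, and the variance-reduced steps converge to it with a constant step size.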
no code implementations • 27 Sep 2016 • Qi Meng, Wei Chen, Jingcheng Yu, Taifeng Wang, Zhi-Ming Ma, Tie-Yan Liu
The results verified our theoretical findings and demonstrated the practical efficiency of the asynchronous stochastic proximal algorithms with variance reduction.
no code implementations • NeurIPS 2010 • Wei Chen, Tie-Yan Liu, Zhi-Ming Ma
sampling of queries and the conditional i.i.d. sampling of documents per query.
no code implementations • NeurIPS 2009 • Wei Chen, Tie-Yan Liu, Yanyan Lan, Zhi-Ming Ma, Hang Li
We show that these loss functions are upper bounds of the measure-based ranking errors.