no code implementations • 22 Mar 2024 • Bohan Wang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhi-Ming Ma, Wei Chen
This paper aims to clearly distinguish between Stochastic Gradient Descent with Momentum (SGDM) and Adam in terms of their convergence rates.
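For readers unfamiliar with the two updates being compared, here is a minimal sketch of one SGDM step and one Adam step (standard textbook forms; the hyperparameter values are illustrative defaults, not taken from the paper):

```python
import numpy as np

def sgdm_step(w, g, m, lr=0.01, beta=0.9):
    """One step of SGD with (heavy-ball) momentum."""
    m = beta * m + g          # accumulate the momentum buffer
    w = w - lr * m            # move against the buffered direction
    return w, m

def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One step of Adam with bias correction (t starts at 1)."""
    m = b1 * m + (1 - b1) * g          # first-moment estimate
    v = b2 * v + (1 - b2) * g * g      # second-moment estimate
    m_hat = m / (1 - b1 ** t)          # bias-corrected moments
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

The key structural difference the convergence analysis must handle is Adam's per-coordinate adaptive denominator `sqrt(v_hat)`, which SGDM lacks.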
no code implementations • 4 Mar 2024 • Bowen Gao, Minsi Ren, Yuyan Ni, Yanwen Huang, Bo Qiang, Zhi-Ming Ma, Wei-Ying Ma, Yanyan Lan
In the field of Structure-based Drug Design (SBDD), deep learning-based generative models have achieved outstanding performance in terms of docking score.
no code implementations • 23 Feb 2024 • Jiajun Ma, Shuchen Xue, Tianyang Hu, Wenjia Wang, Zhaoqiang Liu, Zhenguo Li, Zhi-Ming Ma, Kenji Kawaguchi
Surprisingly, the improvement persists when we increase the number of sampling steps and can even surpass the best result from EDM-2 (1.58) with only 39 NFEs (1.57).
2 code implementations • 9 Dec 2023 • Peiyan Hu, Yue Wang, Zhi-Ming Ma
Based on DMM, to efficiently and accurately model dynamic systems, we develop a moving mesh based neural PDE solver (MM-PDE) that embeds the moving mesh with a two-branch architecture and a learnable interpolation framework to preserve information within the data.
1 code implementation • 24 Nov 2023 • Rui Zhang, Qi Meng, Zhi-Ming Ma
To this end, we propose Physical Invariant Attention Neural Operator (PIANO) to decipher and integrate the physical invariants (PI) for operator learning from the PDE series with various physical mechanisms.
no code implementations • 3 Nov 2023 • Yuyan Ni, Shikun Feng, Wei-Ying Ma, Zhi-Ming Ma, Yanyan Lan
By aligning with physical principles, SliDe shows a 42% improvement in the accuracy of estimated force fields compared to current state-of-the-art denoising methods, and thus outperforms traditional baselines on various molecular property prediction tasks.
1 code implementation • NeurIPS 2023 • Shuchen Xue, Mingyang Yi, Weijian Luo, Shifeng Zhang, Jiacheng Sun, Zhenguo Li, Zhi-Ming Ma
Based on our analysis, we propose SA-Solver, an improved, efficient stochastic Adams method for solving diffusion SDEs to generate high-quality data.
Ranked #11 on Image Generation on ImageNet 512x512
1 code implementation • 20 Jul 2023 • Shikun Feng, Yuyan Ni, Yanyan Lan, Zhi-Ming Ma, Wei-Ying Ma
Theoretically, the objective is equivalent to learning the force field, which has been shown to be helpful for downstream tasks.
no code implementations • 16 Jun 2023 • Wei Chen, Weitao Du, Zhi-Ming Ma, Qi Meng
We study a new kind of SDE arising from research on optimization in machine learning; we call it power-law dynamic because its stationary distribution has no sub-Gaussian tail and instead obeys a power law.
no code implementations • 29 May 2023 • Bohan Wang, Huishuai Zhang, Zhi-Ming Ma, Wei Chen
We provide a simple convergence proof for AdaGrad optimizing non-convex objectives under only affine noise variance and bounded smoothness assumptions.
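As context, the AdaGrad update that the analysis concerns can be sketched in a few lines (standard form; the learning rate below is illustrative):

```python
import numpy as np

def adagrad_step(w, g, G, lr=0.5, eps=1e-8):
    """One AdaGrad step: the accumulator G sums squared gradients,
    so each coordinate's effective step size shrinks over time."""
    G = G + g * g
    return w - lr * g / (np.sqrt(G) + eps), G
```

Affine noise variance means, roughly, that the stochastic gradient's second moment is bounded by an affine function of the squared gradient norm; the proof needs only that plus bounded smoothness.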
1 code implementation • NeurIPS 2023 • Weitao Du, Yuanqi Du, Limei Wang, Dieqiao Feng, Guifeng Wang, Shuiwang Ji, Carla Gomes, Zhi-Ming Ma
Geometric deep learning enables the encoding of physical symmetries in modeling 3D objects.
1 code implementation • 10 Feb 2023 • Rui Zhang, Qi Meng, Rongchan Zhu, Yue Wang, Wenlei Shi, Shihua Zhang, Zhi-Ming Ma, Tie-Yan Liu
To address these limitations, we propose the Monte Carlo Neural PDE Solver (MCNP Solver) for training unsupervised neural solvers via the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles.
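The probabilistic representation in question is of Feynman–Kac type; as an illustration of the idea (not the MCNP Solver itself), the heat equation u_t = u_xx can be evaluated pointwise by averaging the initial condition over the endpoints of random particles:

```python
import numpy as np

def mc_heat(u0, x, t, n_particles=200_000, seed=0):
    """Monte Carlo evaluation of u(x, t) for u_t = u_xx via the
    Feynman-Kac representation u(x, t) = E[u0(x + sqrt(2t) Z)],
    Z ~ N(0, 1): the macroscopic field as an ensemble of particles."""
    z = np.random.default_rng(seed).standard_normal(n_particles)
    return u0(x + np.sqrt(2.0 * t) * z).mean()
```

For u0 = sin the exact solution is e^{-t} sin(x), which the estimate matches to Monte Carlo accuracy; no mesh or supervision is needed.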
no code implementations • 21 Aug 2022 • Bohan Wang, Yushun Zhang, Huishuai Zhang, Qi Meng, Zhi-Ming Ma, Tie-Yan Liu, Wei Chen
In particular, the existing analysis of Adam cannot clearly demonstrate the advantage of Adam over SGD.
no code implementations • 14 Jul 2022 • Mingyang Yi, Ruoyu Wang, Jiacheng Sun, Zhenguo Li, Zhi-Ming Ma
The correlation shift is caused by the spurious attributes that correlate to the class label, as the correlation between them may vary in training and test data.
no code implementations • 20 Jun 2022 • Rui Zhang, Peiyan Hu, Qi Meng, Yue Wang, Rongchan Zhu, Bingguang Chen, Zhi-Ming Ma, Tie-Yan Liu
To this end, we propose the \emph{Deep Random Vortex Method} (DRVM), which combines the neural network with a random vortex dynamics system equivalent to the Navier-Stokes equation.
1 code implementation • 13 Apr 2022 • Peiyan Hu, Qi Meng, Bingguang Chen, Shiqi Gong, Yue Wang, Wei Chen, Rongchan Zhu, Zhi-Ming Ma, Tie-Yan Liu
Stochastic partial differential equations (SPDEs) are significant tools for modeling dynamics in many areas including atmospheric sciences and physics.
no code implementations • 8 Oct 2021 • Bohan Wang, Qi Meng, Huishuai Zhang, Ruoyu Sun, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
The momentum acceleration technique is widely adopted in many optimization algorithms.
2 code implementations • ICLR 2022 • Chongchong Li, Yue Wang, Wei Chen, YuTing Liu, Zhi-Ming Ma, Tie-Yan Liu
We then propose a two-model-based learning method to control the prediction error and the gradient error.
no code implementations • 8 Jun 2021 • Shiqi Gong, Qi Meng, Yue Wang, Lijun Wu, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
In this paper, to reduce the reliance on the numerical solver, we propose to enhance the supervised signal in the training of NODE.
no code implementations • 24 May 2021 • Mingyang Yi, Lu Hou, Jiacheng Sun, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma
In this paper, after defining OOD generalization via Wasserstein distance, we theoretically show that a model robust to input perturbation generalizes well on OOD data.
1 code implementation • ICLR 2021 • Mingyang Yi, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Zhi-Ming Ma
Inspired by adversarial training, we minimize this maximal expected loss (MMEL) and obtain a simple and interpretable closed-form solution: more attention should be paid to augmented samples with large loss values (i.e., harder examples).
no code implementations • 8 Jan 2021 • Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
However, it has been pointed out that the usual definitions of sharpness, which consider either the maximum or the integral of the loss over a $\delta$-ball of parameters around a minimum, cannot give consistent measurements for scale-invariant neural networks, e.g., networks with batch normalization layers.
1 code implementation • 4 Dec 2020 • Mingyang Yi, Ruoyu Wang, Zhi-Ming Ma
Our bounds underscore that with locally strongly convex population risk, the models trained by any proper iterative algorithm can generalize well, even for non-convex problems and large dimension $d$.
no code implementations • 24 Jun 2020 • Qi Meng, Shiqi Gong, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
Specifically, we show that the covariance of the SGD noise in the local region around a local minimum is a quadratic function of the state.
no code implementations • 1 Jun 2020 • Linfang Hou, Liang Pang, Xin Hong, Yanyan Lan, Zhi-Ming Ma, Dawei Yin
Robust Reinforcement Learning aims to find the optimal policy with a certain degree of robustness to environmental dynamics.
no code implementations • 18 Oct 2019 • Juanping Zhu, Qi Meng, Wei Chen, Zhi-Ming Ma
Based on the basis path set, the G-SGD algorithm significantly outperforms the conventional SGD algorithm in optimizing neural networks.
no code implementations • 25 Sep 2019 • Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
It has been widely shown that adversarial training (Madry et al., 2018) is empirically effective in defending against adversarial attacks.
no code implementations • 25 Sep 2019 • Yue Wang, Qi Meng, Wei Chen, YuTing Liu, Zhi-Ming Ma, Tie-Yan Liu
Optimization algorithms such as stochastic gradient descent optimize neural networks in the vector space of weights, which is not positively scale-invariant.
no code implementations • 23 Jul 2019 • Li He, Long Xia, Wei Zeng, Zhi-Ming Ma, Yihong Zhao, Dawei Yin
To make full use of such historical data, learning policies from multiple loggers becomes necessary.
no code implementations • ICLR 2019 • Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
Then, a natural question is: \emph{can we construct a new vector space that is positively scale-invariant and sufficient to represent ReLU neural networks so as to better facilitate the optimization process }?
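The positive scale invariance behind this question is easy to demonstrate: multiplying one ReLU layer's weights by c > 0 and dividing the next layer's by c leaves the network's function (and the product of weights along each input-output path) unchanged. A small self-contained check:

```python
import numpy as np

def relu_net(x, W1, W2):
    """Two-layer ReLU network f(x) = W2 relu(W1 x)."""
    return W2 @ np.maximum(W1 @ x, 0.0)

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((2, 4))
x = rng.standard_normal(3)
c = 5.0  # any positive rescaling factor

out_original = relu_net(x, W1, W2)
out_rescaled = relu_net(x, c * W1, W2 / c)  # same function, different weights
```

Because relu(c z) = c relu(z) for c > 0, the rescaling cancels exactly, so distinct weight vectors represent the same network, which is what motivates optimizing in a path-based space instead.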
no code implementations • ICLR 2019 • Mingyang Yi, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
Optimization on manifolds has been widely used in machine learning to handle constrained optimization problems.
no code implementations • 6 Mar 2019 • Mingyang Yi, Qi Meng, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
That is to say, a minimum with balanced values of basis paths is more likely to be flat and to generalize better.
no code implementations • 21 Sep 2018 • Yue Wang, Qi Meng, Wei Chen, YuTing Liu, Zhi-Ming Ma, Tie-Yan Liu
In this paper, we propose to transfer the Q-function learned in the source task to the target of the Q-learning in the new task when certain safe conditions are satisfied.
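As a sketch of the idea (the safe-transfer condition itself is the paper's contribution and is only caricatured here by a hypothetical `task_gap` threshold), tabular Q-learning in the new task can simply start from the source task's Q-table:

```python
import numpy as np

def transfer_init(Q_source, task_gap, threshold=0.1):
    """Warm-start the target task's Q-table from the source task when a
    (hypothetical) measure of task similarity passes the safety check;
    otherwise fall back to the usual zero initialization."""
    return Q_source.copy() if task_gap < threshold else np.zeros_like(Q_source)

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update on the target task."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q
```

When the source and target tasks are close, the warm start shortens the burn-in phase of Q-learning; when they are not, the transferred values can mislead it, hence the safety check.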
no code implementations • NeurIPS 2017 • Yue Wang, Wei Chen, Yu-Ting Liu, Zhi-Ming Ma, Tie-Yan Liu
(2) The convergence rate is determined by the step size, with the mixing time of the Markov process as the coefficient.
no code implementations • 8 May 2018 • Li He, Qi Meng, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
We then conduct a theoretical analysis of the convergence rate of the ASGD algorithm based on the continuous approximation.
no code implementations • 11 Feb 2018 • Qi Meng, Shuxin Zheng, Huishuai Zhang, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
Then, a natural question is: \emph{can we construct a new vector space that is positively scale-invariant and sufficient to represent ReLU neural networks so as to better facilitate the optimization process }?
no code implementations • 29 Sep 2017 • Qi Meng, Wei Chen, Yue Wang, Zhi-Ming Ma, Tie-Yan Liu
First, we give a mathematical formulation for the practical data processing procedure in distributed machine learning, which we call data partition with global/local shuffling.
no code implementations • NeurIPS 2016 • Qi Meng, Guolin Ke, Taifeng Wang, Wei Chen, Qiwei Ye, Zhi-Ming Ma, Tie-Yan Liu
After partitioning the training data onto a number of (e.g., $M$) machines, this algorithm performs both local voting and global voting in each iteration.
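The two-stage voting can be sketched as follows (an illustrative reconstruction with hypothetical per-machine score dictionaries, standing in for the actual histogram-based split gains):

```python
from collections import Counter

def voting_select(local_scores, k):
    """Each of the M machines votes for its top-2k attributes by local
    score (local voting); the k attributes with the most votes across
    machines win (global voting)."""
    votes = Counter()
    for scores in local_scores:  # one {attribute: score} dict per machine
        votes.update(sorted(scores, key=scores.get, reverse=True)[:2 * k])
    return [attr for attr, _ in votes.most_common(k)]
```

Only the short per-machine vote lists cross the network, which is what makes the scheme communication-efficient compared with exchanging full statistics.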
no code implementations • ICML 2017 • Shuxin Zheng, Qi Meng, Taifeng Wang, Wei Chen, Nenghai Yu, Zhi-Ming Ma, Tie-Yan Liu
We propose a novel technique to compensate for this delay, so as to make the optimization behavior of ASGD closer to that of sequential SGD.
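A minimal sketch of such a delay-compensated step, assuming the gradient-outer-product approximation to the Hessian (the value of λ below is illustrative, not from the paper):

```python
import numpy as np

def dc_asgd_step(w_current, w_backup, g_backup, lr=0.1, lam=0.04):
    """Delay-compensated ASGD: correct the stale gradient g_backup
    (computed by a worker at the old weights w_backup) toward the
    gradient at the current weights, using g*g as a cheap diagonal
    approximation of the Hessian."""
    g_compensated = g_backup + lam * g_backup * g_backup * (w_current - w_backup)
    return w_current - lr * g_compensated
```

With no delay (`w_current == w_backup`) the correction vanishes and the step reduces to plain SGD, which is the intended limiting behavior.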
no code implementations • 27 Sep 2016 • Qi Meng, Yue Wang, Wei Chen, Taifeng Wang, Zhi-Ming Ma, Tie-Yan Liu
Many machine learning tasks can be formulated as Regularized Empirical Risk Minimization (R-ERM), and solved by optimization algorithms such as gradient descent (GD), stochastic gradient descent (SGD), and stochastic variance reduction (SVRG).
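For reference, the SVRG member of that list can be sketched in a few lines (hyperparameters are illustrative):

```python
import numpy as np

def svrg(grad_i, n, w0, lr=0.1, epochs=20, rng=None):
    """Minimal SVRG: each epoch takes a full-gradient snapshot, then runs
    stochastic steps whose noise is reduced by the snapshot correction
    grad_i(w, i) - grad_i(w_snap, i) + full_grad."""
    rng = rng or np.random.default_rng(0)
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(epochs):
        w_snap = w.copy()
        full_grad = sum(grad_i(w_snap, i) for i in range(n)) / n
        for _ in range(n):
            i = int(rng.integers(n))
            w = w - lr * (grad_i(w, i) - grad_i(w_snap, i) + full_grad)
    return w
```

On the least-squares toy problem min_w (1/n) Σ 0.5 (w − a_i)², where grad_i(w, i) = w − a_i, the minimizer is the mean of the a_i, and the variance-reduced steps converge to it with a constant step size.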
no code implementations • 27 Sep 2016 • Qi Meng, Wei Chen, Jingcheng Yu, Taifeng Wang, Zhi-Ming Ma, Tie-Yan Liu
The results verified our theoretical findings and demonstrated the practical efficiency of the asynchronous stochastic proximal algorithms with variance reduction.
no code implementations • NeurIPS 2010 • Wei Chen, Tie-Yan Liu, Zhi-Ming Ma
sampling of queries and the conditional i.i.d. sampling of documents per query.
no code implementations • NeurIPS 2009 • Wei Chen, Tie-Yan Liu, Yanyan Lan, Zhi-Ming Ma, Hang Li
We show that these loss functions are upper bounds of the measure-based ranking errors.