Search Results for author: Bohan Wang

Found 26 papers, 5 papers with code

On the Convergence of Adam under Non-uniform Smoothness: Separability from SGDM and Beyond

no code implementations22 Mar 2024 Bohan Wang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhi-Ming Ma, Wei Chen

This paper aims to clearly distinguish between Stochastic Gradient Descent with Momentum (SGDM) and Adam in terms of their convergence rates.

Continuous-time Riemannian SGD and SVRG Flows on Wasserstein Probabilistic Space

no code implementations24 Jan 2024 Mingyang Yi, Bohan Wang

In this paper, we aim to enrich the continuous optimization methods in the Wasserstein space by extending the gradient flow into the stochastic gradient descent (SGD) flow and stochastic variance reduction gradient (SVRG) flow.

Stochastic Optimization

Large Catapults in Momentum Gradient Descent with Warmup: An Empirical Study

no code implementations25 Nov 2023 Prin Phunyaphibarn, Junghyun Lee, Bohan Wang, Huishuai Zhang, Chulhee Yun

Although gradient descent with momentum is widely used in modern deep learning, a concrete understanding of its effects on the training trajectory remains elusive.

Closing the Gap Between the Upper Bound and the Lower Bound of Adam's Iteration Complexity

no code implementations27 Oct 2023 Bohan Wang, Jingwen Fu, Huishuai Zhang, Nanning Zheng, Wei Chen

Recently, Arjevani et al. [1] established a lower bound on the iteration complexity of first-order optimization under an $L$-smoothness condition and a bounded-noise-variance assumption.
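
For context, the referenced lower bound (as commonly stated, up to constants; see the paper for the precise statement) says that any first-order stochastic method on an $L$-smooth objective $f$ with gradient-noise variance at most $\sigma^2$ needs $\Omega\left(\Delta L \sigma^{2} \epsilon^{-4}\right)$ stochastic gradient evaluations to find a point $x$ with $\mathbb{E}\|\nabla f(x)\| \le \epsilon$, where $\Delta = f(x_0) - \inf f$.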

Fast Conditional Mixing of MCMC Algorithms for Non-log-concave Distributions

no code implementations NeurIPS 2023 Xiang Cheng, Bohan Wang, Jingzhao Zhang, Yusong Zhu

On the theory side, however, MCMC algorithms suffer from a slow mixing rate when $\pi(x)$ is non-log-concave.

When and Why Momentum Accelerates SGD: An Empirical Study

no code implementations15 Jun 2023 Jingwen Fu, Bohan Wang, Huishuai Zhang, Zhizheng Zhang, Wei Chen, Nanning Zheng

In a comparison of SGDM and SGD with the same effective learning rate $\eta_{ef}$ and the same batch size, we observe a consistent pattern: when $\eta_{ef}$ is small, SGDM and SGD have almost identical empirical training losses; once $\eta_{ef}$ exceeds a certain threshold, SGDM begins to outperform SGD.
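
As a rough illustration of this kind of comparison (not the paper's experimental setup; the convention $\eta_{ef} = \eta / (1 - \beta)$ for momentum $\beta$ is an assumption here), one could match the effective learning rate across the two optimizers as follows:

```python
# Hypothetical sketch: compare SGD and SGDM at the same effective learning rate
# eta_ef, assuming eta_ef = lr / (1 - momentum). Toy data and model only.
import torch

torch.manual_seed(0)
X, y = torch.randn(256, 10), torch.randn(256, 1)

def final_loss(momentum, eta_ef, steps=200):
    model = torch.nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(),
                          lr=eta_ef * (1 - momentum),  # keep eta_ef fixed across runs
                          momentum=momentum)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(X), y)
        loss.backward()
        opt.step()
    return loss.item()

for eta_ef in (0.01, 0.05, 0.2):
    print(eta_ef, final_loss(0.0, eta_ef), final_loss(0.9, eta_ef))
```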

ALO-VC: Any-to-any Low-latency One-shot Voice Conversion

no code implementations1 Jun 2023 Bohan Wang, Damien Ronssin, Milos Cernak

This paper presents ALO-VC, a non-parallel, low-latency, one-shot voice conversion method based on phonetic posteriorgrams (PPGs).

Voice Conversion

Convergence of AdaGrad for Non-convex Objectives: Simple Proofs and Relaxed Assumptions

no code implementations29 May 2023 Bohan Wang, Huishuai Zhang, Zhi-Ming Ma, Wei Chen

We provide a simple convergence proof for AdaGrad optimizing non-convex objectives under only affine noise variance and bounded smoothness assumptions.
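
For reference, a minimal sketch of the AdaGrad update that the convergence result concerns, on a toy non-convex objective (an illustration only, not the paper's setting or proof):

```python
# Minimal AdaGrad sketch on a toy non-convex objective (illustration only).
import numpy as np

def adagrad(grad_fn, x0, lr=0.5, eps=1e-8, steps=500):
    x = np.asarray(x0, dtype=float)
    g2_sum = np.zeros_like(x)                      # running sum of squared gradients
    for _ in range(steps):
        g = grad_fn(x)
        g2_sum += g * g
        x = x - lr * g / (np.sqrt(g2_sum) + eps)   # coordinate-wise step sizes
    return x

# toy objective: f(x) = sum(x_i^2) + sin(sum(x_i)); its gradient is 2x + cos(sum(x))
grad = lambda x: 2 * x + np.cos(np.sum(x))
print(adagrad(grad, x0=np.ones(5)))
```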

On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training

no code implementations NeurIPS 2023 Jieyu Zhang, Bohan Wang, Zhengyu Hu, Pang Wei Koh, Alexander Ratner

Pre-training datasets are critical for building state-of-the-art machine learning models, motivating rigorous study of their impact on downstream tasks.

O-GNN: Incorporating Ring Priors into Molecular Modeling

1 code implementation ICLR 2023 Jinhua Zhu, Kehan Wu, Bohan Wang, Yingce Xia, Shufang Xie, Qi Meng, Lijun Wu, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

Despite the recent success of molecular modeling with graph neural networks (GNNs), few models explicitly take rings in compounds into consideration, consequently limiting the expressiveness of the models.

 Ranked #1 on Graph Regression on PCQM4M-LSC (Validation MAE metric)

Graph Regression, Molecular Property Prediction +3

Regularization of polynomial networks for image recognition

no code implementations CVPR 2023 Grigorios G Chrysos, Bohan Wang, Jiankang Deng, Volkan Cevher

We introduce a class of polynomial networks (PNs) that reach the performance of ResNet across a range of six benchmarks.

DiGress: Discrete Denoising diffusion for graph generation

1 code implementation29 Sep 2022 Clement Vignac, Igor Krawczuk, Antoine Siraudin, Bohan Wang, Volkan Cevher, Pascal Frossard

This work introduces DiGress, a discrete denoising diffusion model for generating graphs with categorical node and edge attributes.
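
As a minimal sketch of the general mechanism, here is one forward (noising) step over categorical node labels with a uniform transition matrix; DiGress itself uses more structured transition matrices and also corrupts edge types, so treat this purely as an illustration:

```python
# One discrete-diffusion noising step for categorical node labels (illustration).
import numpy as np

rng = np.random.default_rng(0)
K = 4                                                            # node categories
beta_t = 0.1                                                     # corruption strength at step t
Q_t = (1 - beta_t) * np.eye(K) + (beta_t / K) * np.ones((K, K))  # uniform transition matrix

def noise_step(x_onehot):
    probs = x_onehot @ Q_t            # row i: distribution of the noised label of node i
    return np.array([rng.choice(K, p=p) for p in probs])

x0 = np.eye(K)[rng.integers(0, K, size=6)]                       # 6 nodes with one-hot labels
print(noise_step(x0))
```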

Denoising, Edge Classification +1

Provable Adaptivity in Adam

no code implementations21 Aug 2022 Bohan Wang, Yushun Zhang, Huishuai Zhang, Qi Meng, Zhi-Ming Ma, Tie-Yan Liu, Wei Chen

In particular, existing analyses of Adam cannot clearly demonstrate its advantage over SGD.

Attribute

Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD

no code implementations NeurIPS 2021 Bohan Wang, Huishuai Zhang, Jieyu Zhang, Qi Meng, Wei Chen, Tie-Yan Liu

We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance when the prior and the posterior are jointly optimized.
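
A rough sketch of how such anisotropic SGLD noise could be instantiated, setting the noise covariance to the square root of an estimated gradient covariance; this is an assumption-laden illustration, not the paper's algorithm:

```python
# SGLD steps with anisotropic noise whose covariance is sqrt(gradient covariance).
import numpy as np

rng = np.random.default_rng(0)

def sgld_aniso(per_example_grads, theta, lr=1e-3, steps=500):
    for _ in range(steps):
        G = per_example_grads(theta)            # shape (n, d): one gradient per example
        g = G.mean(axis=0)
        C = np.cov(G, rowvar=False) + 1e-8 * np.eye(G.shape[1])   # gradient covariance
        w, V = np.linalg.eigh(C)
        # target noise covariance Sigma = C^{1/2}; sampling N(0, Sigma) uses Sigma^{1/2} = C^{1/4}
        noise = V @ (np.clip(w, 0.0, None) ** 0.25 * rng.standard_normal(len(w)))
        theta = theta - lr * g + np.sqrt(2 * lr) * noise
    return theta

# toy example: per-example gradients of f_i(theta) = 0.5 * (a_i^T theta)^2
A = rng.standard_normal((32, 3))
grads = lambda theta: A * (A @ theta)[:, None]
print(sgld_aniso(grads, theta=np.ones(3)))
```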

Generalization Bounds

Does Momentum Change the Implicit Regularization on Separable Data?

no code implementations8 Oct 2021 Bohan Wang, Qi Meng, Huishuai Zhang, Ruoyu Sun, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu

The momentum acceleration technique is widely adopted in many optimization algorithms.

Robustness, Privacy, and Generalization of Adversarial Training

1 code implementation25 Dec 2020 Fengxiang He, Shaopeng Fu, Bohan Wang, DaCheng Tao

This measure can be approximated empirically by an asymptotically consistent estimator, the empirical robustified intensity.

Generalization Bounds, Privacy Preserving

The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous Neural Networks

1 code implementation11 Dec 2020 Bohan Wang, Qi Meng, Wei Chen, Tie-Yan Liu

Besides GD, adaptive algorithms such as AdaGrad, RMSProp, and Adam are popular owing to their fast training.

Tighter Generalization Bounds for Iterative Differentially Private Learning Algorithms

no code implementations18 Jul 2020 Fengxiang He, Bohan Wang, DaCheng Tao

This paper studies the relationship between generalization and privacy preservation in iterative learning algorithms via two sequential steps.

Federated Learning, Generalization Bounds

Piecewise linear activations substantially shape the loss surfaces of neural networks

no code implementations ICLR 2020 Fengxiang He, Bohan Wang, DaCheng Tao

This result holds for any neural network with arbitrary depth and arbitrary piecewise linear activation functions (excluding linear functions) under most loss functions in practice.
