no code implementations • 22 Mar 2024 • Bohan Wang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhi-Ming Ma, Wei Chen
This paper aims to clearly distinguish between Stochastic Gradient Descent with Momentum (SGDM) and Adam in terms of their convergence rates.
no code implementations • 24 Jan 2024 • Mingyang Yi, Bohan Wang
In this paper, we aim to enrich the continuous optimization methods in the Wasserstein space by extending the gradient flow into the stochastic gradient descent (SGD) flow and stochastic variance reduction gradient (SVRG) flow.
no code implementations • 25 Nov 2023 • Prin Phunyaphibarn, Junghyun Lee, Bohan Wang, Huishuai Zhang, Chulhee Yun
Although gradient descent with momentum is widely used in modern deep learning, a concrete understanding of its effects on the training trajectory remains elusive.
no code implementations • 27 Oct 2023 • Bohan Wang, Jingwen Fu, Huishuai Zhang, Nanning Zheng, Wei Chen
Recently, Arjevani et al. [1] established a lower bound of iteration complexity for the first-order optimization under an $L$-smooth condition and a bounded noise variance assumption.
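For context, the cited lower bound for stochastic first-order methods on $L$-smooth objectives with noise variance $\sigma^2$ takes the following well-known form (constants and the precise constant-factor statement are omitted here; $\Delta$ denotes the initial suboptimality gap):

```latex
% Iteration-complexity lower bound of Arjevani et al. [1]
% (stated schematically, not verbatim from the paper):
T \;=\; \Omega\!\left( \Delta \, L \, \sigma^{2} \, \epsilon^{-4} \right)
% i.e. no first-order method can guarantee E[||nabla f(x)||] <= eps
% in fewer iterations, under L-smoothness and bounded noise variance.
```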
no code implementations • 25 Jul 2023 • Liane Makatura, Michael Foshey, Bohan Wang, Felix Hähnlein, Pingchuan Ma, Bolei Deng, Megan Tjandrasuwita, Andrew Spielberg, Crystal Elaine Owens, Peter Yichen Chen, Allan Zhao, Amy Zhu, Wil J Norton, Edward Gu, Joshua Jacob, Yifei Li, Adriana Schulz, Wojciech Matusik
The advancement of Large Language Models (LLMs), including GPT-4, provides exciting new opportunities for generative design.
no code implementations • NeurIPS 2023 • Xiang Cheng, Bohan Wang, Jingzhao Zhang, Yusong Zhu
However, on the theory side, MCMC algorithms suffer from slow mixing rate when $\pi(x)$ is non-log-concave.
no code implementations • 15 Jun 2023 • Jingwen Fu, Bohan Wang, Huishuai Zhang, Zhizheng Zhang, Wei Chen, Nanning Zheng
Comparing SGDM and SGD with the same effective learning rate $\eta_{ef}$ and the same batch size, we observe a consistent pattern: when $\eta_{ef}$ is small, SGDM and SGD reach almost the same empirical training losses; once $\eta_{ef}$ surpasses a certain threshold, SGDM begins to outperform SGD.
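The effective learning rate is commonly taken as $\eta_{ef} = \eta / (1-\beta)$, where $\beta$ is the momentum coefficient; a minimal sketch under that convention (an assumption here, not necessarily the paper's exact definition):

```python
def effective_lr(lr, momentum):
    """Effective learning rate of SGDM under the common
    eta_ef = eta / (1 - beta) convention."""
    return lr / (1.0 - momentum)

# SGDM(lr=0.01, beta=0.9) and plain SGD(lr=0.1) share eta_ef = 0.1,
# making the two directly comparable at the same batch size.
eta_ef = effective_lr(0.01, 0.9)
```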
no code implementations • 1 Jun 2023 • Bohan Wang, Damien Ronssin, Milos Cernak
This paper presents ALO-VC, a non-parallel, low-latency, one-shot voice conversion method based on phonetic posteriorgrams (PPGs).
no code implementations • 29 May 2023 • Bohan Wang, Huishuai Zhang, Zhi-Ming Ma, Wei Chen
We provide a simple convergence proof for AdaGrad optimizing non-convex objectives under only affine noise variance and bounded smoothness assumptions.
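As a reminder of the update rule being analyzed, here is a minimal diagonal-AdaGrad sketch (illustrative only, not the paper's code; the quadratic objective is a toy example):

```python
import numpy as np

def adagrad(grad_fn, x0, lr=0.1, eps=1e-8, steps=200):
    """Diagonal AdaGrad: each coordinate's step size shrinks with
    its accumulated squared gradients, so no global smoothness- or
    noise-dependent tuning of lr is needed."""
    x = np.asarray(x0, dtype=float)
    acc = np.zeros_like(x)  # running sum of squared gradients
    for _ in range(steps):
        g = grad_fn(x)
        acc += g * g
        x -= lr * g / (np.sqrt(acc) + eps)
    return x

# Toy objective f(x) = 0.5 * ||x||^2, whose gradient is x itself
x_star = adagrad(lambda x: x, x0=[3.0, -2.0])
```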
no code implementations • NeurIPS 2023 • Jieyu Zhang, Bohan Wang, Zhengyu Hu, Pang Wei Koh, Alexander Ratner
Pre-training datasets are critical for building state-of-the-art machine learning models, motivating rigorous study on their impact on downstream tasks.
1 code implementation • ICLR 2023 • Jinhua Zhu, Kehan Wu, Bohan Wang, Yingce Xia, Shufang Xie, Qi Meng, Lijun Wu, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu
Despite the recent success of molecular modeling with graph neural networks (GNNs), few models explicitly take rings in compounds into consideration, consequently limiting the expressiveness of the models.
Ranked #1 on Graph Regression on PCQM4M-LSC (Validation MAE metric)
no code implementations • 29 Apr 2023 • Walter Zimmer, Joseph Birkner, Marcel Brucker, Huu Tung Nguyen, Stefan Petrovski, Bohan Wang, Alois C. Knoll
We evaluate our results on the A9 infrastructure dataset and achieve 68.48 mAP on the test set.
no code implementations • CVPR 2023 • Grigorios G Chrysos, Bohan Wang, Jiankang Deng, Volkan Cevher
We introduce a class of polynomial networks (PNs), which are able to reach the performance of ResNet across a range of six benchmarks.
1 code implementation • 29 Sep 2022 • Clement Vignac, Igor Krawczuk, Antoine Siraudin, Bohan Wang, Volkan Cevher, Pascal Frossard
This work introduces DiGress, a discrete denoising diffusion model for generating graphs with categorical node and edge attributes.
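A minimal sketch of the categorical forward-noising process that discrete denoising diffusion models such as DiGress build on; the uniform transition kernel and all names here are illustrative assumptions, not DiGress's actual (marginal-based) transition matrices:

```python
import numpy as np

def uniform_transition(K, beta):
    """One-step categorical noising kernel: with probability beta,
    resample the label uniformly over K classes; otherwise keep it."""
    return (1 - beta) * np.eye(K) + beta * np.ones((K, K)) / K

# Push a one-hot node-label distribution through t noising steps.
K, beta, t = 4, 0.1, 50
Q = uniform_transition(K, beta)
p = np.zeros(K)
p[0] = 1.0
for _ in range(t):
    p = p @ Q
# After many steps the label distribution approaches uniform (1/K),
# which is the terminal distribution the reverse model denoises from.
```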
no code implementations • 21 Aug 2022 • Bohan Wang, Yushun Zhang, Huishuai Zhang, Qi Meng, Zhi-Ming Ma, Tie-Yan Liu, Wei Chen
In particular, the existing analysis of Adam cannot clearly demonstrate the advantage of Adam over SGD.
no code implementations • NeurIPS 2021 • Gongwei Chen, Xinhang Song, Bohan Wang, Shuqiang Jiang
In this paper, we propose to understand scene images and the scene classification CNN models in terms of the focus area.
no code implementations • NeurIPS 2021 • Bohan Wang, Huishuai Zhang, Jieyu Zhang, Qi Meng, Wei Chen, Tie-Yan Liu
We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance when the prior and the posterior are jointly optimized.
no code implementations • 8 Oct 2021 • Bohan Wang, Qi Meng, Huishuai Zhang, Ruoyu Sun, Wei Chen, Zhi-Ming Ma, Tie-Yan Liu
The momentum acceleration technique is widely adopted in many optimization algorithms.
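For reference, the heavy-ball (Polyak) form of momentum discussed here can be sketched as follows (a toy implementation on an illustrative quadratic, not the paper's code):

```python
import numpy as np

def heavy_ball(grad_fn, x0, lr=0.05, beta=0.9, steps=300):
    """Heavy-ball momentum: the search direction is an exponentially
    weighted sum of past gradients, which damps oscillations and can
    accelerate convergence on ill-conditioned problems."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)  # momentum buffer
    for _ in range(steps):
        m = beta * m + grad_fn(x)
        x -= lr * m
    return x

# Toy objective f(x) = 0.5 * ||x||^2, gradient = x
x_hb = heavy_ball(lambda x: x, x0=[3.0, -2.0])
```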
no code implementations • ICLR 2022 • Jieyu Zhang, Bohan Wang, Xiangchen Song, Yujing Wang, Yaming Yang, Jing Bai, Alexander Ratner
Creating labeled training sets has become one of the major roadblocks in machine learning.
no code implementations • 31 May 2021 • Ziming Liu, Bohan Wang, Qi Meng, Wei Chen, Max Tegmark, Tie-Yan Liu
Energy conservation is a basic physics principle, the breakdown of which often implies new physics.
1 code implementation • CVPR 2021 • Cheng Zou, Bohan Wang, Yue Hu, Junqi Liu, Qian Wu, Yu Zhao, Boxun Li, Chenguang Zhang, Chi Zhang, Yichen Wei, Jian Sun
We propose HOI Transformer to tackle human object interaction (HOI) detection in an end-to-end manner.
Ranked #30 on Human-Object Interaction Detection on HICO-DET (using extra training data)
1 code implementation • 25 Dec 2020 • Fengxiang He, Shaopeng Fu, Bohan Wang, DaCheng Tao
This measure can be approximated empirically by an asymptotically consistent estimator, the empirical robustified intensity.
1 code implementation • 11 Dec 2020 • Bohan Wang, Qi Meng, Wei Chen, Tie-Yan Liu
Beyond GD, adaptive algorithms such as AdaGrad, RMSProp, and Adam are popular owing to their fast training.
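Of the adaptive methods named above, Adam is the most widely used; a minimal sketch of its bias-corrected update (illustrative only, with default-style hyperparameters chosen for the toy quadratic):

```python
import numpy as np

def adam(grad_fn, x0, lr=0.05, b1=0.9, b2=0.999, eps=1e-8, steps=500):
    """Adam: exponential moving averages of the gradient (m) and its
    square (v), bias-corrected, give per-coordinate adaptive steps."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)  # first-moment estimate
    v = np.zeros_like(x)  # second-moment estimate
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)  # bias correction
        v_hat = v / (1 - b2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Toy objective f(x) = 0.5 * ||x||^2, gradient = x
x_adam = adam(lambda x: x, x0=[3.0, -2.0])
```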
no code implementations • 18 Jul 2020 • Fengxiang He, Bohan Wang, DaCheng Tao
This paper studies the relationship between generalization and privacy preservation in iterative learning algorithms by two sequential steps.
no code implementations • ICLR 2020 • Fengxiang He, Bohan Wang, DaCheng Tao
This result holds for any neural network with arbitrary depth and arbitrary piecewise linear activation functions (excluding linear functions) under most loss functions in practice.