1 code implementation • 12 Mar 2024 • Mingze Wang, Lili Su, Cilin Yan, Sheng Xu, Pengcheng Yuan, XiaoLong Jiang, Baochang Zhang
RSBuilding is designed to enhance cross-scene generalization and task universality.
no code implementations • 11 Feb 2024 • Liu Ziyin, Mingze Wang, Lei Wu
For one class of symmetries, SGD naturally converges to solutions with balanced and aligned gradient noise.
no code implementations • 1 Feb 2024 • Mingze Wang, Weinan E
We conduct a systematic study of the approximation properties of Transformers for sequence modeling with long, sparse, and complicated memory.
no code implementations • 24 Nov 2023 • Mingze Wang, Zeping Min, Lei Wu
Inspired by this analysis, we propose a novel algorithm called Progressive Rescaling Gradient Descent (PRGD) and show that PRGD can maximize the margin at an exponential rate.
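A minimal sketch of the progressive-rescaling idea on a toy separable problem, assuming PyTorch; the toy data, rescaling factor, and phase length below are illustrative assumptions, not the schedule analyzed in the paper.

```python
# Hypothetical sketch of progressive norm rescaling on a separable toy problem.
# The rescaling factor and phase length are assumptions, not PRGD's exact hyperparameters.
import torch

torch.manual_seed(0)
X = torch.randn(200, 2)
y = torch.sign(X[:, 0] + 0.5 * X[:, 1])                      # linearly separable labels

w = torch.zeros(2, requires_grad=True)
lr, rescale_every, gamma = 0.1, 100, 1.5                     # assumed schedule

for t in range(1, 2001):
    loss = torch.nn.functional.softplus(-y * (X @ w)).mean()  # logistic loss
    loss.backward()
    with torch.no_grad():
        w -= lr * w.grad
        w.grad.zero_()
        if t % rescale_every == 0:                            # progressive rescaling step
            w *= gamma                                        # inflate the norm; GD then refines the direction

with torch.no_grad():
    margin = (y * (X @ w)).min() / w.norm()
    print(f"normalized margin: {margin:.4f}")
```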
no code implementations • 1 Oct 2023 • Mingze Wang, Lei Wu
In this paper, we provide a theoretical study of noise geometry for minibatch stochastic gradient descent (SGD), a phenomenon where the noise aligns favorably with the geometry of the local landscape.
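A hypothetical way to probe such alignment numerically (the least-squares setup and the particular alignment statistic are assumptions, not the paper's definitions): compare minibatch-gradient noise with the top curvature direction of the loss.

```python
# Hypothetical illustration: on least-squares regression, measure how much of the
# minibatch-gradient noise energy falls along the top Hessian eigendirection.
import numpy as np

rng = np.random.default_rng(0)
n, d, batch = 512, 20, 32
X = rng.normal(size=(n, d)) * np.linspace(1.0, 5.0, d)       # anisotropic features
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)

w = np.zeros(d)
H = X.T @ X / n                                               # Hessian of the least-squares loss (up to a constant factor)
top_dir = np.linalg.eigh(H)[1][:, -1]                         # leading eigendirection

full_grad = X.T @ (X @ w - y) / n
alignments = []
for _ in range(200):
    idx = rng.choice(n, size=batch, replace=False)
    g_mb = X[idx].T @ (X[idx] @ w - y[idx]) / batch
    noise = g_mb - full_grad                                  # minibatch gradient noise
    alignments.append((noise @ top_dir) ** 2 / (noise @ noise))

print(f"mean fraction of noise energy on top Hessian direction: {np.mean(alignments):.3f}")
print(f"fraction expected for isotropic noise: {1.0 / d:.3f}")
```

If the printed fraction exceeds 1/d, the noise is preferentially aligned with the sharpest direction of the landscape, which is the kind of favorable alignment the abstract refers to.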
no code implementations • 1 Jul 2023 • Mingze Wang, Huixin Sun, Jun Shi, Xuhui Liu, Baochang Zhang, Xianbin Cao
Real-time object detection plays a vital role in various computer vision applications.
1 code implementation • NeurIPS 2023 • Mingze Wang, Chao Ma
The training process of ReLU neural networks often exhibits complicated nonlinear phenomena.
no code implementations • 6 Jul 2022 • Lei Wu, Mingze Wang, Weijie Su
In this paper, we provide an explanation of this striking phenomenon by relating the particular noise structure of SGD to its linear stability (Wu et al., 2018).
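A rough numerical sketch of the linear-stability notion (the over-parameterized least-squares setup and the perturbation size are assumptions, not the formal criterion of Wu et al., 2018): start minibatch SGD at an interpolating minimum and check whether small perturbations shrink or grow.

```python
# Hypothetical experiment illustrating linear stability of SGD at a minimum;
# the setup is an illustration, not the formal stability condition from the papers.
import numpy as np

rng = np.random.default_rng(1)
n, d, batch = 40, 100, 4                                      # over-parameterized: d > n
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
w_min = np.linalg.lstsq(X, y, rcond=None)[0]                  # an interpolating minimum (train loss ~ 0)

def distance_after_sgd(lr, steps=2000):
    """Run SGD from a tiny perturbation of the minimum; report the final distance."""
    w = w_min + 1e-6 * rng.normal(size=d)
    for _ in range(steps):
        idx = rng.choice(n, size=batch, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
        w -= lr * grad
        if not np.isfinite(w).all():
            return np.inf
    return np.linalg.norm(w - w_min)

for lr in (0.001, 0.01, 0.1):
    print(f"lr={lr}: distance from minimum after SGD = {distance_after_sgd(lr):.2e}")
```

Small learning rates should keep the iterates near the minimum, while larger ones can drive them away; characterizing that crossover in terms of the noise structure is what a linear-stability analysis does.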
no code implementations • 21 Jun 2022 • Mingze Wang, Ziyang Zhang, Grace Hui Yang
This paper presents a novel approach that supports natural language voice instructions to guide deep reinforcement learning (DRL) algorithms when training self-driving cars.
no code implementations • 7 Jun 2022 • Mingze Wang, Chao Ma
Generalization error bounds for deep neural networks trained by stochastic gradient descent (SGD) are derived by combining dynamical control of an appropriate parameter norm with a Rademacher complexity estimate based on parameter norms.
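A loose illustration of the norm-tracking ingredient, assuming PyTorch; the particular capacity proxy used below (product of layer Frobenius norms) is a common choice in norm-based Rademacher bounds but is an assumption here, not the norm controlled in the paper.

```python
# Hypothetical illustration: track a norm-based capacity proxy along SGD training.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 10)
y = torch.randn(256, 1)

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()

for step in range(501):
    idx = torch.randint(0, 256, (32,))                        # minibatch indices
    loss = loss_fn(model(X[idx]), y[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 100 == 0:
        with torch.no_grad():
            norm_proxy = 1.0
            for layer in model:
                if isinstance(layer, nn.Linear):
                    norm_proxy *= layer.weight.norm().item()   # product of Frobenius norms
        print(f"step {step}: loss {loss.item():.3f}, norm proxy {norm_proxy:.2f}")
```

Norm-based bounds scale with quantities of this kind rather than with the raw parameter count, which is why controlling the norm's dynamics along training matters.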
1 code implementation • 5 Jun 2022 • Mingze Wang, Chao Ma
We study the convergence of GD and SGD when training mildly parameterized neural networks starting from random initialization.