Search Results for author: Mingze Wang

Found 11 papers, 3 papers with code

The Implicit Bias of Gradient Noise: A Symmetry Perspective

no code implementations11 Feb 2024 Liu Ziyin, Mingze Wang, Lei Wu

For one class of symmetry, SGD naturally converges to solutions that have balanced and aligned gradient noise.

Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling

no code implementations1 Feb 2024 Mingze Wang, Weinan E

We conduct a systematic study of the approximation properties of Transformer for sequence modeling with long, sparse and complicated memory.

Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling

no code implementations24 Nov 2023 Mingze Wang, Zeping Min, Lei Wu

Inspired by this analysis, we propose a novel algorithm called Progressive Rescaling Gradient Descent (PRGD) and show that PRGD can maximize the margin at an exponential rate.
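The progressive-rescaling idea lends itself to a small illustration. The sketch below is a minimal, hypothetical example on toy data with exponential loss and an assumed rescaling schedule, not the paper's exact PRGD update: gradient steps are interleaved with periodic enlargement of the parameter norm, and the normalized margin is tracked as the quantity of interest.

```python
# Hypothetical illustration only: gradient descent on exponential loss,
# interleaved with periodic rescaling of the parameter norm. The schedule,
# rescaling factor, and constants are assumptions for this sketch and are
# not the exact PRGD update from the paper.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X @ rng.normal(size=5))          # linearly separable toy labels

def grad(w):
    margins = y * (X @ w)
    return -(np.exp(-margins) * y) @ X / len(y)   # gradient of mean exp-loss

w = 0.01 * rng.normal(size=5)
lr, rescale_every, factor = 0.1, 200, 1.5
for t in range(1, 2001):
    w -= lr * grad(w)
    if t % rescale_every == 0:
        w *= factor                          # progressively enlarge ||w||
print("normalized margin:", (y * (X @ w)).min() / np.linalg.norm(w))
```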

A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent

no code implementations1 Oct 2023 Mingze Wang, Lei Wu

In this paper, we provide a theoretical study of noise geometry for minibatch stochastic gradient descent (SGD), a phenomenon where the noise aligns favorably with the geometry of the local landscape.
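As a rough illustration of the alignment phenomenon, the sketch below estimates the minibatch-gradient noise covariance on a toy anisotropic least-squares problem and compares the noise variance along the sharpest Hessian direction with an isotropic reference; the setup and metric are assumptions for illustration, not the quantities defined in the paper.

```python
# Hypothetical illustration only: on a toy least-squares problem, compare how
# much minibatch gradient noise lies along the sharpest Hessian direction
# versus an isotropic reference (trace/d).
import numpy as np

rng = np.random.default_rng(0)
n, d, batch = 1000, 10, 32
X = rng.normal(size=(n, d)) * np.linspace(0.2, 3.0, d)   # anisotropic features
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)

w = np.zeros(d)
H = X.T @ X / n                                 # Hessian of the quadratic loss
full_grad = X.T @ (X @ w - y) / n

noises = []
for _ in range(2000):                           # empirical noise covariance
    idx = rng.choice(n, size=batch, replace=False)
    g = X[idx].T @ (X[idx] @ w - y[idx]) / batch
    noises.append(g - full_grad)
Sigma = np.cov(np.array(noises).T)

v_sharp = np.linalg.eigh(H)[1][:, -1]           # sharpest curvature direction
print("noise variance along sharpest direction:", v_sharp @ Sigma @ v_sharp)
print("isotropic reference (trace/d):", np.trace(Sigma) / d)
```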


The alignment property of SGD noise and how it helps select flat minima: A stability analysis

no code implementations6 Jul 2022 Lei Wu, Mingze Wang, Weijie Su

In this paper, we provide an explanation of this striking phenomenon by relating the particular noise structure of SGD to its linear stability (Wu et al., 2018).

Incorporating Voice Instructions in Model-Based Reinforcement Learning for Self-Driving Cars

no code implementations21 Jun 2022 Mingze Wang, Ziyang Zhang, Grace Hui Yang

This paper presents a novel approach that supports natural language voice instructions to guide deep reinforcement learning (DRL) algorithms when training self-driving cars.

Tasks: Model-based Reinforcement Learning, Reinforcement Learning, +2

Generalization Error Bounds for Deep Neural Networks Trained by SGD

no code implementations7 Jun 2022 Mingze Wang, Chao Ma

Generalization error bounds for deep neural networks trained by stochastic gradient descent (SGD) are derived by combining dynamical control of an appropriate parameter norm with a Rademacher complexity estimate based on parameter norms.

Early Stage Convergence and Global Convergence of Training Mildly Parameterized Neural Networks

1 code implementation5 Jun 2022 Mingze Wang, Chao Ma

We study the convergence of GD and SGD when training mildly parameterized neural networks from random initialization.
