no code implementations • 20 Dec 2023 • Yifei Duan, Yongqiang Cai
We prove that the control family $\mathcal{F}_1 = \mathcal{F}_0 \cup \{ \text{ReLU}(\cdot)\} $ is enough to generate flow maps that can uniformly approximate diffeomorphisms of $\mathbb{R}^d$ on any compact domain, where $\mathcal{F}_0 = \{x \mapsto Ax+b: A\in \mathbb{R}^{d\times d}, b \in \mathbb{R}^d\}$ is the set of linear maps and the dimension $d\ge2$.
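The flow maps in question arise by composing finite-time flows of vector fields drawn from the control family. A minimal numerical sketch of such a composition, under the assumption that forward-Euler integration of each sub-family's ODE is an acceptable stand-in for the exact flow (all function names here are illustrative, not from the paper):

```python
import numpy as np

def linear_flow(x, A, b, t, steps=100):
    """Approximate the time-t flow of dx/dt = A x + b (a field from F_0)
    by forward Euler."""
    h = t / steps
    for _ in range(steps):
        x = x + h * (A @ x + b)
    return x

def relu_flow(x, t, steps=100):
    """Approximate the time-t flow of dx/dt = ReLU(x), applied
    componentwise (the extra field in F_1)."""
    h = t / steps
    for _ in range(steps):
        x = x + h * np.maximum(x, 0.0)
    return x

# Compose flows from the two sub-families to build one candidate flow map.
x = np.array([1.0, -0.5])
A = np.array([[0.0, 1.0], [-1.0, 0.0]])  # a rotation field in R^2
b = np.zeros(2)
y = relu_flow(linear_flow(x, A, b, t=0.5), t=0.3)
```

The approximation result says that sufficiently many such compositions can uniformly approximate any diffeomorphism of $\mathbb{R}^d$ ($d \ge 2$) on a compact domain.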
no code implementations • 29 May 2023 • Li'ang Li, Yifei Duan, Guanghua Ji, Yongqiang Cai
In contrast, when the depth is unlimited, the width required for UAP must be at least the critical width $w^*_{\min}=\max(d_x, d_y)$, where $d_x$ and $d_y$ are the dimensions of the input and output, respectively.
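To make the critical-width bound concrete, here is a hedged sketch of a deep network whose every hidden layer has exactly width $\max(d_x, d_y)$; the weights are arbitrary placeholders, and the construction is illustrative rather than the paper's:

```python
import numpy as np

def critical_width(d_x, d_y):
    # w*_min = max(d_x, d_y): below this width, deep narrow networks
    # lose the universal approximation property; at or above it,
    # depth can compensate for narrowness.
    return max(d_x, d_y)

def narrow_net(x, weights, biases, d_y):
    """A deep ReLU network whose hidden layers all have width
    max(d_x, d_y). Weights/biases are illustrative placeholders."""
    w = critical_width(len(x), d_y)
    h = np.zeros(w)
    h[: len(x)] = x                      # embed the input into width-w space
    for W, b in zip(weights, biases):    # each W: (w, w), each b: (w,)
        h = np.maximum(W @ h + b, 0.0)   # ReLU layer at the critical width
    return h[:d_y]                       # read off the first d_y coordinates
```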
no code implementations • 20 May 2023 • Yongqiang Cai
In recent years, deep learning-based sequence models, such as language models, have received much attention and achieved great success, prompting researchers to explore the possibility of transforming non-sequential problems into a sequential form.
no code implementations • 23 Sep 2022 • Yongqiang Cai
The universal approximation property (UAP) of neural networks is fundamental for deep learning, and it is well known that wide neural networks are universal approximators of continuous functions within both the $L^p$ norm and the continuous/uniform norm.
no code implementations • 22 Sep 2022 • Yifei Duan, Li'ang Li, Guanghua Ji, Yongqiang Cai
In this paper, we return to the classical network structure and prove that vanilla feedforward networks can also serve as a numerical discretization of dynamical systems, where the width of the network equals the dimension of the input and output.
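The dynamical-systems viewpoint can be sketched in its most familiar form, forward-Euler discretization of an ODE with the state dimension kept fixed across layers. Note this sketch uses the residual/Euler form for clarity; the abstract's claim concerns plain feedforward layers, and all names here are illustrative:

```python
import numpy as np

def vector_field(x):
    # A toy autonomous ODE dx/dt = f(x); here f(x) = tanh(x).
    return np.tanh(x)

def feedforward_as_euler(x0, h=0.1, layers=50):
    """Iterate x_{k+1} = x_k + h * f(x_k), read as forward-Euler steps
    of dx/dt = f(x). The width never changes: it equals the common
    input/output dimension d, mirroring the width condition above."""
    x = x0
    for _ in range(layers):
        x = x + h * vector_field(x)
    return x
```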
no code implementations • 18 Apr 2020 • Yongqiang Cai, Qianxiao Li, Zuowei Shen
We present the viewpoint that optimization problems encountered in machine learning can often be interpreted as minimizing a convex functional over a function space, but with a non-convex constraint set introduced by model parameterization.
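A minimal sketch of this viewpoint: the functional below is convex over the function space, but the parameterization makes the problem non-convex in the parameters, since distinct parameter points realize the same function (the model and symmetry used here are illustrative, not from the paper):

```python
import numpy as np

# Convex functional J(f) = mean (f(x) - g(x))^2 over functions f,
# but the model f_theta(x) = a * tanh(w x) is non-convex in theta = (a, w):
# (a, w) and (-a, -w) realize the same function, a non-convex symmetry.

def model(theta, x):
    a, w = theta
    return a * np.tanh(w * x)

def loss(theta, x, g):
    return np.mean((model(theta, x) - g) ** 2)

x = np.linspace(-1.0, 1.0, 101)
g = np.tanh(x)  # target function, realizable at theta = (1, 1)
# The two parameter points below give the same function, so sublevel
# sets of the loss in parameter space cannot all be convex.
assert np.isclose(loss((1.0, 1.0), x, g), loss((-1.0, -1.0), x, g))
```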
no code implementations • ICLR 2019 • Yongqiang Cai, Qianxiao Li, Zuowei Shen
Despite its empirical success, the theoretical underpinnings of the stability, convergence and acceleration properties of batch normalization (BN) remain elusive.
no code implementations • ICLR 2019 • Yongqiang Cai, Qianxiao Li, Zuowei Shen
Despite its empirical success and recent theoretical progress, a quantitative analysis of the effect of batch normalization (BN) on the convergence and stability of gradient descent is generally lacking.
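For reference, the BN transform whose effect on gradient descent is analyzed here is, per feature, a standardization by batch mean and variance; a minimal sketch without the learnable scale/shift parameters:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Standard batch-normalization transform, applied per feature
    (columns of x) over the batch (rows of x), without the learnable
    gamma/beta parameters."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)
```

After the transform, each feature of the batch has (approximately) zero mean and unit variance, which is the normalization whose interaction with gradient descent the analysis quantifies.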