1 code implementation • 28 Mar 2024 • Xuan Zhang, Jacob Helwig, Yuchao Lin, Yaochen Xie, Cong Fu, Stephan Wojtowytsch, Shuiwang Ji
While the U-Net architecture with skip connections is commonly used by prior studies to enable multi-scale processing, our analysis shows that the need for features to evolve across layers results in temporally misaligned features in skip connections, which limits the model's performance.
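A minimal sketch, assuming a generic PyTorch U-Net block rather than the architecture proposed in the paper, of the skip-connection pattern the analysis refers to: early encoder features are concatenated with further-processed decoder features, which is where a temporal mismatch between the two branches can arise.

```python
# Minimal sketch (assumption: PyTorch, generic U-Net block; not the paper's architecture)
# illustrating the skip connection the analysis refers to.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUNet(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        self.enc = nn.Conv2d(1, channels, 3, padding=1)               # encoder stage
        self.bottleneck = nn.Conv2d(channels, channels, 3, padding=1)  # coarse stage
        self.dec = nn.Conv2d(2 * channels, 1, 3, padding=1)            # decoder sees skip + upsampled

    def forward(self, x):
        e = F.relu(self.enc(x))                            # high-resolution encoder features
        b = F.relu(self.bottleneck(F.avg_pool2d(e, 2)))    # features processed at coarse scale
        up = F.interpolate(b, scale_factor=2, mode="nearest")
        # Skip connection: early encoder features are concatenated with decoder features
        # that have been processed further through the network; for time-dependent PDEs
        # the two branches can correspond to different effective time points.
        return self.dec(torch.cat([e, up], dim=1))

u = TinyUNet()
out = u(torch.randn(1, 1, 32, 32))
print(out.shape)  # torch.Size([1, 1, 32, 32])
```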
no code implementations • 26 Oct 2023 • Jonathan W. Siegel, Stephan Wojtowytsch
In the case of stochastic gradient descent, the summability of $\mathbb E[f(x_n) - \inf f]$ is used to prove that $f(x_n)\to \inf f$ almost surely, an improvement on the almost sure convergence along a subsequence that follows from the $O(1/n)$ decay estimate.
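A sketch of the standard reasoning behind this step, assuming only $f(x_n) \geq \inf f$ and the stated summability:

```latex
% Sketch, assuming f(x_n) >= inf f and the summability stated above.
\[
\sum_{n=1}^{\infty} \mathbb{E}\bigl[f(x_n) - \inf f\bigr] < \infty
\;\stackrel{\text{Tonelli}}{\Longrightarrow}\;
\mathbb{E}\Bigl[\sum_{n=1}^{\infty} \bigl(f(x_n) - \inf f\bigr)\Bigr] < \infty
\;\Longrightarrow\;
\sum_{n=1}^{\infty} \bigl(f(x_n) - \inf f\bigr) < \infty \ \text{a.s.}
\;\Longrightarrow\;
f(x_n) \to \inf f \ \text{a.s.}
\]
```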
1 code implementation • 9 Jun 2023 • Jacob Helwig, Xuan Zhang, Cong Fu, Jerry Kurtin, Stephan Wojtowytsch, Shuiwang Ji
We consider solving partial differential equations (PDEs) with Fourier neural operators (FNOs), which operate in the frequency domain.
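As a minimal illustration of what operating in the frequency domain means for an FNO layer, here is a sketch of a 1-D spectral convolution in NumPy; the function name, single-channel setting, and random weights are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a 1-D spectral convolution, the core operation in an FNO layer
# (assumption: NumPy, single channel, illustrative weights; not the authors' code).
import numpy as np

def spectral_conv_1d(u, weights, n_modes):
    """Multiply the lowest n_modes Fourier coefficients of u by learned weights."""
    u_hat = np.fft.rfft(u)                          # go to the frequency domain
    out_hat = np.zeros_like(u_hat)
    out_hat[:n_modes] = weights * u_hat[:n_modes]   # act only on the retained low modes
    return np.fft.irfft(out_hat, n=len(u))          # back to the spatial domain

x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
u = np.sin(3 * x)
w = np.random.randn(8) + 1j * np.random.randn(8)    # one complex weight per retained mode
v = spectral_conv_1d(u, w, n_modes=8)
print(v.shape)  # (128,)
```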
no code implementations • 10 Feb 2023 • Kanan Gupta, Jonathan Siegel, Stephan Wojtowytsch
We present a generalization of Nesterov's accelerated gradient descent algorithm.
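For reference, a sketch of the classical Nesterov iteration being generalized, in its standard convex-optimization form with a fixed step size; the quadratic test objective and step size below are illustrative assumptions only.

```python
# Hedged sketch of classical Nesterov accelerated gradient descent
# (fixed step size; the objective below is an illustrative example).
import numpy as np

def nesterov_agd(grad, x0, step, n_iter):
    x_prev = x0.copy()
    x = x0.copy()
    for k in range(1, n_iter + 1):
        momentum = (k - 1) / (k + 2)
        y = x + momentum * (x - x_prev)    # extrapolation (look-ahead) point
        x_prev, x = x, y - step * grad(y)  # gradient step taken at the look-ahead point
    return x

# Example: minimize f(x) = 0.5 * ||A x - b||^2.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad = lambda x: A.T @ (A @ x - b)
x_star = nesterov_agd(grad, np.zeros(2), step=0.05, n_iter=200)
print(x_star)  # close to the least-squares solution of A x = b
```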
no code implementations • 2 Sep 2022 • Stephan Wojtowytsch
In this note, we study how neural networks with a single hidden layer and ReLU activation interpolate data drawn from a radially symmetric distribution with target labels 1 at the origin and 0 outside the unit ball, if no labels are known inside the unit ball.
no code implementations • 25 Mar 2022 • Josiah Park, Stephan Wojtowytsch
We prove for both real and complex networks with non-polynomial activation that the closure of the class of neural networks coincides with the closure of the space of polynomials.
no code implementations • 4 Jun 2021 • Stephan Wojtowytsch
The representation of functions by artificial neural networks depends on a large number of parameters in a non-linear fashion.
no code implementations • 4 May 2021 • Stephan Wojtowytsch
Stochastic gradient descent (SGD) is one of the most popular algorithms in modern machine learning.
no code implementations • 10 Dec 2020 • Weinan E, Stephan Wojtowytsch
A recent numerical study observed that neural network classifiers enjoy a large degree of symmetry in the penultimate layer.
no code implementations • 2 Dec 2020 • Weinan E, Stephan Wojtowytsch
We use explicit representation formulas to show that solutions to certain partial differential equations lie in Barron spaces or multilayer spaces if the PDE data lie in such function spaces.
no code implementations • 28 Sep 2020 • Weinan E, Stephan Wojtowytsch
We consider binary and multi-class classification problems using hypothesis classes of neural networks.
no code implementations • 22 Sep 2020 • Weinan E, Chao Ma, Stephan Wojtowytsch, Lei Wu
The purpose of this article is to review the achievements made in the last few years towards the understanding of the reasons behind the success and subtleties of neural network-based machine learning.
no code implementations • 30 Jul 2020 • Weinan E, Stephan Wojtowytsch
The key to this work is a new way of representing functions as certain expectations, motivated by multi-layer neural networks.
no code implementations • 10 Jun 2020 • Weinan E, Stephan Wojtowytsch
We study the natural function space for infinitely wide two-layer neural networks with ReLU activation (Barron space) and establish different representation formulae.
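One common formulation of such a representation formula and the associated norm, stated here as a sketch for ReLU activation $\sigma$ (one of several equivalent versions in the literature):

```latex
% One standard formulation (ReLU activation sigma); the infimum is over all
% probability measures mu for which the representation below holds.
\[
f(x) = \mathbb{E}_{(a,w,b) \sim \mu}\bigl[a\,\sigma(w^{\mathsf T} x + b)\bigr],
\qquad
\|f\|_{\mathcal{B}} = \inf_{\mu} \mathbb{E}_{(a,w,b) \sim \mu}\bigl[|a|\,(\|w\|_{1} + |b|)\bigr].
\]
```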
no code implementations • 27 May 2020 • Stephan Wojtowytsch
The condition does not depend on the initialization of parameters and concerns only the weak convergence of the realization of the neural network, not its parameter distribution.
no code implementations • 21 May 2020 • Weinan E, Stephan Wojtowytsch
We establish a scale separation of Kolmogorov width type between subspaces of a given Banach space under the condition that a sequence of linear maps converges much faster on one of the subspaces.
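For context, the classical Kolmogorov $n$-width this terminology refers to (the paper's scale separation concerns two fixed subspaces, so this definition is background rather than the statement itself):

```latex
% Classical Kolmogorov n-width of a set K in a Banach space X: the best possible
% worst-case error when approximating K by n-dimensional linear subspaces V.
\[
d_n(K, X) = \inf_{\substack{V \subseteq X \\ \dim V \le n}} \; \sup_{f \in K} \; \inf_{g \in V} \|f - g\|_{X}.
\]
```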
no code implementations • 21 May 2020 • Stephan Wojtowytsch, Weinan E
Thus gradient descent training for fitting reasonably smooth but truly high-dimensional data may be subject to the curse of dimensionality.