Search Results for author: Arnulf Jentzen

Found 50 papers, 7 papers with code

An Overview on Machine Learning Methods for Partial Differential Equations: from Physics Informed Neural Networks to Deep Operator Learning

no code implementations • 23 Aug 2024 • Lukas Gonon, Arnulf Jentzen, Benno Kuckuck, Siyu Liang, Adrian Riekert, Philippe von Wurstemberger

While approximation methods for PDEs using ANNs were first proposed in the 1990s, they have only gained wide popularity in the last decade with the rise of deep learning.

Operator learning

Convergence rates for the Adam optimizer

no code implementations • 29 Jul 2024 • Steffen Dereich, Arnulf Jentzen

In practically relevant training problems, the employed optimization scheme is usually not the plain vanilla standard SGD method; instead, suitably accelerated and adaptive SGD optimization methods are applied.
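
For reference, a minimal NumPy sketch of the standard Adam update rule, the main object of this convergence analysis, is given below; the default values of the step size `alpha`, the decay rates `beta1` and `beta2`, and the stabilization constant `eps` are the commonly used ones rather than values prescribed by the paper.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient and its square,
    bias-corrected, followed by a coordinate-wise rescaled gradient step."""
    m = beta1 * m + (1 - beta1) * grad            # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2       # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                  # bias corrections (t = 1, 2, ...)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```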

Stochastic Optimization

Non-convergence of Adam and other adaptive stochastic gradient descent optimization methods for non-vanishing learning rates

no code implementations • 11 Jul 2024 • Steffen Dereich, Robin Graeber, Arnulf Jentzen

Deep learning algorithms - typically consisting of a class of deep neural networks trained by a stochastic gradient descent (SGD) optimization method - are nowadays the key ingredients in many artificial intelligence (AI) systems and have revolutionized our ways of working and living in modern societies.

Learning rate adaptive stochastic gradient descent optimization methods: numerical simulations for deep learning methods for partial differential equations and convergence analyses

1 code implementation • 20 Jun 2024 • Steffen Dereich, Arnulf Jentzen, Adrian Riekert

In this work we propose and study a learning-rate-adaptive approach for SGD optimization methods in which the learning rate is adjusted based on empirical estimates for the values of the objective function of the considered optimization problem (the function that one intends to minimize).
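
A minimal sketch of one possible loop in this spirit is shown below: run SGD and reduce the learning rate whenever an empirical (Monte Carlo) estimate of the objective value stops decreasing. The concrete halving rule and the user-supplied callables `grad` and `objective_estimate` are illustrative assumptions, not the schedule proposed and analyzed in the paper.

```python
import numpy as np

def lr_adaptive_sgd(grad, objective_estimate, theta0, lr=0.1, shrink=0.5,
                    check_every=100, n_steps=2000, seed=0):
    """Hypothetical sketch: plain SGD whose learning rate is reduced whenever an
    empirical estimate of the objective stops decreasing. The shrink-by-one-half
    rule is purely illustrative, not the schedule proposed in the paper."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    best = objective_estimate(theta)
    for step in range(1, n_steps + 1):
        theta -= lr * grad(theta, rng)             # stochastic gradient step
        if step % check_every == 0:
            current = objective_estimate(theta)    # empirical objective value
            if current >= best:                    # no measurable progress
                lr *= shrink                       # decrease the learning rate
            best = min(best, current)
    return theta, lr
```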

Non-convergence to global minimizers for Adam and stochastic gradient descent optimization and constructions of local minimizers in the training of artificial neural networks

no code implementations • 7 Feb 2024 • Arnulf Jentzen, Adrian Riekert

In this work we solve this research problem in the situation of shallow ANNs with the rectified linear unit (ReLU) and related activations with the standard mean square error loss by disproving that, in the training of such ANNs, SGD methods (such as the plain vanilla SGD, the momentum SGD, the AdaGrad, the RMSprop, and the Adam optimizers) can find a global minimizer with high probability.

Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory

1 code implementation • 31 Oct 2023 • Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger

This book aims to provide an introduction to the topic of deep learning algorithms.

On the existence of minimizers in shallow residual ReLU neural network optimization landscapes

no code implementations • 28 Feb 2023 • Steffen Dereich, Arnulf Jentzen, Sebastian Kassing

Many mathematical convergence results for gradient descent (GD) based algorithms employ the assumption that the GD process is (almost surely) bounded and, also in concrete numerical simulations, divergence of the GD process may slow down, or even completely rule out, convergence of the error function.

Algorithmically Designed Artificial Neural Networks (ADANNs): Higher order deep operator learning for parametric partial differential equations

1 code implementation • 7 Feb 2023 • Arnulf Jentzen, Adrian Riekert, Philippe von Wurstemberger

In the tested numerical examples the ADANN methodology significantly outperforms existing traditional approximation algorithms as well as existing deep operator learning methodologies from the literature.

Operator learning

The necessity of depth for artificial neural networks to approximate certain classes of smooth and bounded functions without the curse of dimensionality

no code implementations • 19 Jan 2023 • Lukas Gonon, Robin Graeber, Arnulf Jentzen

In particular, it is a key contribution of this work to reveal that for all $a, b \in \mathbb{R}$ with $b - a \geq 7$ the functions $[a, b]^d \ni x = (x_1, \dots, x_d) \mapsto \prod_{i=1}^d x_i \in \mathbb{R}$ for $d \in \mathbb{N}$ as well as the functions $[a, b]^d \ni x = (x_1, \dots, x_d) \mapsto \sin(\prod_{i=1}^d x_i) \in \mathbb{R}$ for $d \in \mathbb{N}$ can neither be approximated without the curse of dimensionality by shallow ANNs nor by insufficiently deep ANNs with ReLU activation, but can be approximated without the curse of dimensionality by sufficiently deep ANNs with ReLU activation.

Gradient descent provably escapes saddle points in the training of shallow ReLU networks

no code implementations • 3 Aug 2022 • Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek

Dynamical systems theory has recently been applied in optimization to prove that gradient descent algorithms bypass so-called strict saddle points of the loss function.

Normalized gradient flow optimization in the training of ReLU artificial neural networks

no code implementations • 13 Jul 2022 • Simon Eberle, Arnulf Jentzen, Adrian Riekert, Georg Weiss

The training of artificial neural networks (ANNs) is nowadays a highly relevant algorithmic procedure with many applications in science and industry.

On bounds for norms of reparameterized ReLU artificial neural network parameters: sums of fractional powers of the Lipschitz norm control the network parameter vector

no code implementations • 27 Jun 2022 • Arnulf Jentzen, Timo Kröger

Furthermore, we prove that this upper bound only holds for sums of powers of the Lipschitz norm with the exponents $ 1/2 $ and $ 1 $ but does not hold for the Lipschitz norm alone.

On the existence of global minima and convergence analyses for gradient descent methods in the training of deep neural networks

no code implementations • 17 Dec 2021 • Arnulf Jentzen, Adrian Riekert

In this article we study fully-connected feedforward deep ReLU ANNs with an arbitrarily large number of hidden layers, and we prove convergence of the risk of the GD optimization method with random initializations in the training of such ANNs under the assumptions that the unnormalized probability density function of the probability distribution of the input data of the considered supervised learning problem is piecewise polynomial, that the target function (describing the relationship between input data and output data) is piecewise polynomial, and that the risk function of the considered supervised learning problem admits at least one regular global minimum.

Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions

no code implementations • 13 Dec 2021 • Martin Hutzenthaler, Arnulf Jentzen, Katharina Pohl, Adrian Riekert, Luca Scarpa

In many numerical simulations stochastic gradient descent (SGD) type optimization methods perform very effectively in the training of deep neural networks (DNNs), but to this day it remains an open research problem to provide a mathematical convergence analysis which rigorously explains the success of SGD type optimization methods in the training of DNNs.

Existence, uniqueness, and convergence rates for gradient flows in the training of artificial neural networks with ReLU activation

no code implementations • 18 Aug 2021 • Simon Eberle, Arnulf Jentzen, Adrian Riekert, Georg S. Weiss

In the second main result of this article we prove, in the training of such ANNs and under the assumption that the target function and the density function of the probability distribution of the input data are piecewise polynomial, that every non-divergent GF trajectory converges with an appropriate rate of convergence to a critical point and that the risk of the non-divergent GF trajectory converges with rate 1 to the risk of the critical point.

A proof of convergence for the gradient descent optimization method with random initializations in the training of neural networks with ReLU activation for piecewise linear target functions

no code implementations • 10 Aug 2021 • Arnulf Jentzen, Adrian Riekert

Despite the great success of GD type optimization methods in numerical simulations for the training of ANNs with ReLU activation, it remains - even in the simplest situation of the plain vanilla GD optimization method with random initializations and ANNs with one hidden layer - an open problem to prove (or disprove) the conjecture that the risk of the GD optimization method converges in the training of such ANNs to zero as the width of the ANNs, the number of independent random initializations, and the number of GD steps increase to infinity.

Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation

no code implementations • 9 Jul 2021 • Arnulf Jentzen, Adrian Riekert

Finally, in the special situation where there is only one neuron on the hidden layer (1-dimensional hidden layer) we strengthen the above-named result for affine linear target functions by proving that the risk of every (not necessarily bounded) GF trajectory converges to zero if the initial risk is sufficiently small.

A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

no code implementations • 1 Apr 2021 • Arnulf Jentzen, Adrian Riekert

In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully-connected feedforward artificial neural networks with ReLU activation.
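
For orientation, a minimal NumPy sketch of this setting is given below: a shallow ReLU network trained by plain-vanilla SGD on the squared error against a constant target. The width, learning rate, and input distribution are arbitrary illustrative choices; the sketch is not the precise mathematical setup analyzed in the article.

```python
import numpy as np

# Minimal illustration (not the paper's proof setup): a shallow ReLU network
# trained by plain-vanilla SGD on the squared error against a constant target c.
rng = np.random.default_rng(0)
d, width, c, lr = 2, 16, 1.0, 0.01
W1 = rng.normal(size=(width, d)); b1 = np.zeros(width)
W2 = rng.normal(size=width);      b2 = 0.0

for step in range(5000):
    x = rng.uniform(-1.0, 1.0, size=d)        # one sample of the input data
    h = np.maximum(W1 @ x + b1, 0.0)          # ReLU hidden layer
    y = W2 @ h + b2                           # network realization at x
    err = y - c                               # residual against the constant target
    active = (h > 0.0).astype(float)          # ReLU subgradient mask
    grad_h = err * W2 * active                # backpropagated hidden gradient
    W2 -= lr * err * h;             b2 -= lr * err
    W1 -= lr * np.outer(grad_h, x); b1 -= lr * grad_h

print(W2 @ np.maximum(W1 @ np.zeros(d) + b1, 0.0) + b2)   # should be close to c
```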

Landscape analysis for shallow neural networks: complete classification of critical points for affine target functions

no code implementations • 19 Mar 2021 • Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek

In this paper, we analyze the landscape of the true loss of neural networks with one hidden layer and ReLU, leaky ReLU, or quadratic activation.

Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases

no code implementations • 23 Feb 2021 • Arnulf Jentzen, Timo Kröger

In recent years, artificial neural networks have developed into a powerful tool for dealing with a multitude of problems for which classical solution approaches reach their limits.

An overview on deep learning-based approximation methods for partial differential equations

no code implementations • 22 Dec 2020 • Christian Beck, Martin Hutzenthaler, Arnulf Jentzen, Benno Kuckuck

It is one of the most challenging problems in applied mathematics to approximatively solve high-dimensional partial differential equations (PDEs).

Strong overall error analysis for the training of artificial neural networks via random initializations

no code implementations • 15 Dec 2020 • Arnulf Jentzen, Adrian Riekert

Although deep learning based approximation algorithms have been applied very successfully to numerous problems, at the moment the reasons for their performance are not entirely understood from a mathematical point of view.

Stochastic Optimization

Deep learning based numerical approximation algorithms for stochastic partial differential equations and high-dimensional nonlinear filtering problems

no code implementations • 2 Dec 2020 • Christian Beck, Sebastian Becker, Patrick Cheridito, Arnulf Jentzen, Ariel Neufeld

In this article we introduce and study a deep learning based approximation algorithm for solutions of stochastic partial differential equations (SPDEs).

Weak error analysis for stochastic gradient descent optimization algorithms

no code implementations • 3 Jul 2020 • Aritz Bercher, Lukas Gonon, Arnulf Jentzen, Diyora Salimova

In applications one is often not only interested in the size of the error with respect to the objective function but also in the size of the error with respect to a test function which is possibly different from the objective function.
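
The distinction can be made concrete with a small simulation comparing the strong error $\mathbb{E}[|\theta_N - \theta^*|]$ with the weak error $|\mathbb{E}[\varphi(\theta_N)] - \varphi(\theta^*)|$ for a test function $\varphi$; the quadratic toy objective and the choice $\varphi = \cos$ below are illustrative assumptions, not examples taken from the paper.

```python
import numpy as np

# Illustrative toy comparison (not from the paper): strong error E|theta_N - theta*|
# versus weak error |E[phi(theta_N)] - phi(theta*)| for SGD on the one-dimensional
# objective 0.5 * theta^2 with additive gradient noise, minimizer theta* = 0.
rng = np.random.default_rng(0)
n_runs, n_steps, lr = 10_000, 200, 0.05
phi = np.cos                                    # test function

theta = rng.normal(size=n_runs)                 # independent random initializations
for n in range(n_steps):
    theta -= lr * (theta + rng.normal(size=n_runs))   # noisy gradient step

strong_error = np.abs(theta).mean()
weak_error = abs(phi(theta).mean() - phi(0.0))
print(strong_error, weak_error)                 # the weak error is typically smaller
```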

Non-convergence of stochastic gradient descent in the training of deep neural networks

no code implementations • 12 Jun 2020 • Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek

Deep neural networks have successfully been trained in various application areas with stochastic gradient descent.

Space-time deep neural network approximations for high-dimensional partial differential equations

no code implementations • 3 Jun 2020 • Fabian Hornung, Arnulf Jentzen, Diyora Salimova

Each of these results establishes that DNNs overcome the curse of dimensionality in approximating suitable PDE solutions at a fixed time point $T>0$ and on a compact cube $[a, b]^d$ in space but none of these results provides an answer to the question whether the entire PDE solution on $[0, T]\times [a, b]^d$ can be approximated by DNNs without the curse of dimensionality.

Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation

no code implementations • 3 Mar 2020 • Arnulf Jentzen, Timo Welti

In spite of the accomplishments of deep learning based algorithms in numerous applications and very broad corresponding research interest, at the moment there is still no rigorous understanding of the reasons why such algorithms produce useful results in certain situations.

Pricing and hedging American-style options with deep learning

1 code implementation • 23 Dec 2019 • Sebastian Becker, Patrick Cheridito, Arnulf Jentzen

In this paper we introduce a deep learning method for pricing and hedging American-style options.

Efficient approximation of high-dimensional functions with neural networks

no code implementations • 9 Dec 2019 • Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek

In this paper, we develop a framework for showing that neural networks can overcome the curse of dimensionality in different high-dimensional approximation problems.

Numerical Analysis 68T07 I.2.0

Uniform error estimates for artificial neural network approximations for heat equations

no code implementations • 20 Nov 2019 • Lukas Gonon, Philipp Grohs, Arnulf Jentzen, David Kofler, David Šiška

These mathematical results from the scientific literature prove in part that algorithms based on ANNs are capable of overcoming the curse of dimensionality in the numerical approximation of high-dimensional PDEs.

Full error analysis for the training of deep neural networks

no code implementations • 30 Sep 2019 • Christian Beck, Arnulf Jentzen, Benno Kuckuck

In this work we estimate for a certain deep learning algorithm each of these three errors and combine these three error estimates to obtain an overall error analysis for the deep learning algorithm under consideration.

Deep neural network approximations for Monte Carlo algorithms

1 code implementation • 28 Aug 2019 • Philipp Grohs, Arnulf Jentzen, Diyora Salimova

One key argument in most of these results is, first, to use a Monte Carlo approximation scheme which can approximate the solution of the PDE under consideration at a fixed space-time point without the curse of dimensionality and, thereafter, to prove that DNNs are flexible enough to mimic the behaviour of the used approximation scheme.
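
As a toy illustration of the first step of this argument, the sketch below approximates the solution of the $d$-dimensional heat equation $\partial_t u = \Delta u$, $u(0, \cdot) = \varphi$, at a single space-time point by plain Monte Carlo via the Feynman-Kac representation, at a cost that grows only linearly in the dimension; the particular PDE and initial condition are illustrative choices, not examples taken from the paper.

```python
import numpy as np

def heat_mc(phi, x, t, n_samples=100_000, seed=0):
    """Monte Carlo value of the d-dimensional heat equation u_t = Delta u,
    u(0, .) = phi, at the single space-time point (t, x) via the Feynman-Kac
    representation u(t, x) = E[phi(x + sqrt(2 t) Z)], Z standard normal."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_samples, len(x)))
    return phi(x + np.sqrt(2.0 * t) * z).mean()

# Example: phi(x) = |x|^2 gives u(t, x) = |x|^2 + 2 d t; here d = 100, t = 1.
print(heat_mc(lambda y: (y ** 2).sum(axis=1), np.zeros(100), t=1.0))   # ~ 200
```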

Space-time error estimates for deep neural network approximations for differential equations

no code implementations • 11 Aug 2019 • Philipp Grohs, Fabian Hornung, Arnulf Jentzen, Philipp Zimmermann

It is the subject of the main result of this article to provide space-time error estimates for DNN approximations of Euler approximations of certain perturbed differential equations.

Solving high-dimensional optimal stopping problems using deep learning

no code implementations • 5 Aug 2019 • Sebastian Becker, Patrick Cheridito, Arnulf Jentzen, Timo Welti

We present numerical results for a large number of example problems, which include the pricing of many high-dimensional American and Bermudan options, such as Bermudan max-call options in up to 5000 dimensions.

Deep splitting method for parabolic PDEs

no code implementations • 8 Jul 2019 • Christian Beck, Sebastian Becker, Patrick Cheridito, Arnulf Jentzen, Ariel Neufeld

In this paper we introduce a numerical method for nonlinear parabolic PDEs that combines operator splitting with deep learning.

Towards a regularity theory for ReLU networks -- chain rule and global error estimates

no code implementations • 13 May 2019 • Julius Berner, Dennis Elbrächter, Philipp Grohs, Arnulf Jentzen

Although for neural networks with locally Lipschitz continuous activation functions the classical derivative exists almost everywhere, the standard chain rule is in general not applicable.

Convergence rates for the stochastic gradient descent method for non-convex objective functions

no code implementations • 2 Apr 2019 • Benjamin Fehrman, Benjamin Gess, Arnulf Jentzen

We prove local convergence to minima and establish estimates on the rate of convergence for the stochastic gradient descent method for objective functions that are not necessarily globally convex or contracting.

A proof that deep artificial neural networks overcome the curse of dimensionality in the numerical approximation of Kolmogorov partial differential equations with constant diffusion and nonlinear drift coefficients

no code implementations • 19 Sep 2018 • Arnulf Jentzen, Diyora Salimova, Timo Welti

These numerical simulations indicate that DNNs seem to possess the fundamental flexibility to overcome the curse of dimensionality in the sense that the number of real parameters used to describe the DNN grows at most polynomially in both the reciprocal of the prescribed approximation accuracy $ \varepsilon > 0 $ and the dimension $ d \in \mathbb{N}$ of the function which the DNN aims to approximate in such computational problems.

Analysis of the Generalization Error: Empirical Risk Minimization over Deep Artificial Neural Networks Overcomes the Curse of Dimensionality in the Numerical Approximation of Black-Scholes Partial Differential Equations

no code implementations • 9 Sep 2018 • Julius Berner, Philipp Grohs, Arnulf Jentzen

It can be concluded that ERM over deep neural network hypothesis classes overcomes the curse of dimensionality for the numerical solution of linear Kolmogorov equations with affine coefficients.

A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations

no code implementations • 7 Sep 2018 • Philipp Grohs, Fabian Hornung, Arnulf Jentzen, Philippe von Wurstemberger

Such numerical simulations suggest that ANNs have the capacity to very efficiently approximate high-dimensional functions and, especially, indicate that ANNs seem to admit the fundamental power to overcome the curse of dimensionality when approximating the high-dimensional functions appearing in the above named computational problems.

Solving the Kolmogorov PDE by means of deep learning

no code implementations • 1 Jun 2018 • Christian Beck, Sebastian Becker, Philipp Grohs, Nor Jaafari, Arnulf Jentzen

Stochastic differential equations (SDEs) and the Kolmogorov partial differential equations (PDEs) associated to them have been widely used in models from engineering, finance, and the natural sciences.

Strong error analysis for stochastic gradient descent optimization algorithms

no code implementations • 29 Jan 2018 • Arnulf Jentzen, Benno Kuckuck, Ariel Neufeld, Philippe von Wurstemberger

Stochastic gradient descent (SGD) optimization algorithms are key ingredients in a series of machine learning applications.

Numerical Analysis Probability

Solving high-dimensional partial differential equations using deep learning

6 code implementations • 9 Jul 2017 • Jiequn Han, Arnulf Jentzen, Weinan E

Developing algorithms for solving high-dimensional partial differential equations (PDEs) has been an exceedingly difficult task for a long time, due to the notoriously difficult problem known as the "curse of dimensionality".

Reinforcement Learning

Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations

5 code implementations • 15 Jun 2017 • Weinan E, Jiequn Han, Arnulf Jentzen

We propose a new algorithm for solving parabolic partial differential equations (PDEs) and backward stochastic differential equations (BSDEs) in high dimension, by making an analogy between the BSDE and reinforcement learning with the gradient of the solution playing the role of the policy function, and the loss function given by the error between the prescribed terminal condition and the solution of the BSDE.
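
A heavily simplified PyTorch sketch of this construction is given below for a toy linear case (generator $\Delta$, zero nonlinearity $f \equiv 0$, i.e. the terminal value problem $\partial_t u + \Delta u = 0$ with $u(T, \cdot) = g$): the value $u(0, x_0)$ and a per-time-step network approximating the gradient of the solution (the "policy") are trained jointly by minimizing the mean squared mismatch with the prescribed terminal condition. The network sizes, step counts, and the choice of $g$ are illustrative assumptions, not the numerical setup of the paper.

```python
import torch

# Simplified deep-BSDE-style sketch for a toy linear case (generator Delta,
# nonlinearity f = 0, terminal value problem u_t + Delta u = 0, u(T, .) = g):
# u(0, x0) and the per-step "policies" Z_n(x) ~ grad u(t_n, x) are trainable,
# and the loss is the mismatch with the prescribed terminal condition g(X_T).
torch.manual_seed(0)
d, N, T, batch = 10, 20, 1.0, 256
dt = T / N
g = lambda x: (x ** 2).sum(dim=1, keepdim=True)       # terminal condition

y0 = torch.zeros(1, requires_grad=True)               # u(0, x0), to be learned
z_nets = torch.nn.ModuleList(
    [torch.nn.Sequential(torch.nn.Linear(d, 32), torch.nn.ReLU(),
                         torch.nn.Linear(32, d)) for _ in range(N)])
opt = torch.optim.Adam([y0, *z_nets.parameters()], lr=1e-2)

for it in range(2000):
    x = torch.zeros(batch, d)                         # all paths start at x0 = 0
    y = y0.expand(batch, 1)
    for n in range(N):
        dw = dt ** 0.5 * torch.randn(batch, d)        # Brownian increments
        z = z_nets[n](x)                              # Z_n(X_n), the "policy"
        y = y + (2.0 ** 0.5 * z * dw).sum(dim=1, keepdim=True)   # f = 0 here
        x = x + 2.0 ** 0.5 * dw                       # X_{n+1} = X_n + sqrt(2) dW_n
    loss = ((y - g(x)) ** 2).mean()                   # terminal-condition error
    opt.zero_grad(); loss.backward(); opt.step()

print(float(y0))   # for this toy case the exact value is u(0, 0) = 2 * d * T = 20
```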

Reinforcement Learning +1
