Search Results for author: Weinan E

Found 79 papers, 20 papers with code

How Transformers Implement Induction Heads: Approximation and Optimization Analysis

no code implementations 15 Oct 2024 Mingze Wang, Ruoxi Yu, Weinan E, Lei Wu

Transformers have demonstrated exceptional in-context learning capabilities, yet the theoretical understanding of the underlying mechanisms remains limited.

In-Context Learning

$\text{Memory}^3$: Language Modeling with Explicit Memory

no code implementations 1 Jul 2024 Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, Bo Tang, Wenqiang Wei, Jinbo Wang, Zeyun Tang, Shichao Song, Chenyang Xi, Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang, Weinan E

The model is named $\text{Memory}^3$, since explicit memory is the third form of memory in LLMs after implicit memory (model parameters) and working memory (context key-values).

Language Modelling RAG +1

Uni-Mol2: Exploring Molecular Pretraining Model at Scale

1 code implementation 21 Jun 2024 Xiaohong Ji, Zhen Wang, Zhifeng Gao, Hang Zheng, Linfeng Zhang, Guolin Ke, Weinan E

In recent years, pretraining models have made significant advancements in the fields of natural language processing (NLP), computer vision (CV), and life sciences.

Improving Generalization and Convergence by Enhancing Implicit Regularization

1 code implementation 31 May 2024 Mingze Wang, Jinbo Wang, Haotian He, Zilin Wang, Guanhua Huang, Feiyu Xiong, Zhiyu Li, Weinan E, Lei Wu

In this work, we propose an Implicit Regularization Enhancement (IRE) framework to accelerate the discovery of flat solutions in deep learning, thereby improving generalization and convergence.

Image Classification

Coarse-graining conformational dynamics with multi-dimensional generalized Langevin equation: how, when, and why

no code implementations 20 May 2024 Pinchen Xie, Yunrui Qiu, Weinan E

A data-driven ab initio generalized Langevin equation (AIGLE) approach is developed to learn and simulate high-dimensional, heterogeneous, coarse-grained conformational dynamics.

Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling

no code implementations 1 Feb 2024 Mingze Wang, Weinan E

We conduct a systematic study of the approximation properties of Transformer for sequence modeling with long, sparse and complicated memory.

Anchor function: a type of benchmark functions for studying language models

no code implementations 16 Jan 2024 Zhongwang Zhang, Zhiwei Wang, Junjie Yao, Zhangchen Zhou, Xiaolong Li, Weinan E, Zhi-Qin John Xu

However, language model research faces significant challenges, especially for academic research groups with constrained resources.

Language Modelling

Invertible Coarse Graining with Physics-Informed Generative Artificial Intelligence

no code implementations 2 May 2023 Jun Zhang, Xiaohan Lin, Weinan E, Yi Qin Gao

Multiscale molecular modeling is widely applied in scientific research of molecular properties over large time and length scales.

MAC: A unified framework boosting low resource automatic speech recognition

no code implementations 5 Feb 2023 Zeping Min, Qian Ge, Zhong Li, Weinan E

Furthermore, in the ASR task, MAC beats wav2vec2 (with fine-tuning) on the Common Voice datasets of Cantonese and achieves highly competitive results on the Common Voice datasets of Taiwanese and Japanese.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

A multi-scale sampling method for accurate and robust deep neural network to predict combustion chemical kinetics

no code implementations 9 Jan 2022 Tianhan Zhang, Yuxiao Yi, Yifan Xu, Zhi X. Chen, Yaoyu Zhang, Weinan E, Zhi-Qin John Xu

The current work aims to understand two basic questions regarding the deep neural network (DNN) method: what data the DNN needs and how general the DNN method can be.

A deep learning-based model reduction (DeePMR) method for simplifying chemical kinetics

no code implementations 6 Jan 2022 Zhiwei Wang, Yaoyu Zhang, Enhan Zhao, Yiguang Ju, Weinan E, Zhi-Qin John Xu, Tianhan Zhang

The mechanism reduction is modeled as an optimization problem on Boolean space, where a Boolean vector, each entry corresponding to a species, represents a reduced mechanism.

DeePN$^2$: A deep learning-based non-Newtonian hydrodynamic model

no code implementations 29 Dec 2021 Lidong Fang, Pei Ge, Lei Zhang, Weinan E, Huan Lei

A long-standing problem in the modeling of non-Newtonian hydrodynamics of polymeric flows is the availability of reliable and interpretable hydrodynamic models that faithfully encode the underlying micro-scale polymer dynamics.

Deep Learning

DeepHAM: A Global Solution Method for Heterogeneous Agent Models with Aggregate Shocks

no code implementations 29 Dec 2021 Jiequn Han, Yucheng Yang, Weinan E

An efficient, reliable, and interpretable global solution method, the Deep learning-based algorithm for Heterogeneous Agent Models (DeepHAM), is proposed for solving high dimensional heterogeneous agent models with aggregate shocks.

Generalization Error of GAN from the Discriminator's Perspective

no code implementations 8 Jul 2021 Hongkang Yang, Weinan E

The generative adversarial network (GAN) is a well-known model for learning high-dimensional distributions, but the mechanism for its generalization ability is not understood.

Generative Adversarial Network Memorization

MOD-Net: A Machine Learning Approach via Model-Operator-Data Network for Solving PDEs

no code implementations 8 Jul 2021 Lulu Zhang, Tao Luo, Yaoyu Zhang, Weinan E, Zhi-Qin John Xu, Zheng Ma

In this paper, we propose a machine learning approach via model-operator-data network (MOD-Net) for solving PDEs.

An $L^2$ Analysis of Reinforcement Learning in High Dimensions with Kernel and Neural Network Approximation

no code implementations 15 Apr 2021 Jihao Long, Jiequn Han, Weinan E

Reinforcement learning (RL) algorithms based on high-dimensional function approximation have achieved tremendous empirical success in large-scale problems with an enormous number of states.

Reinforcement Learning (RL)

The Phase Diagram of a Deep Potential Water Model

no code implementations 9 Feb 2021 Linfeng Zhang, Han Wang, Roberto Car, Weinan E

Using the Deep Potential methodology, we construct a model that reproduces accurately the potential energy surface of the SCAN approximation of density functional theory for water, from low temperature and pressure to about 2400 K and 50 GPa, excluding the vapor stability region.

Chemical Physics

On the emergence of simplex symmetry in the final and penultimate layers of neural network classifiers

no code implementations 10 Dec 2020 Weinan E, Stephan Wojtowytsch

A recent numerical study observed that neural network classifiers enjoy a large degree of symmetry in the penultimate layer.

Some observations on high-dimensional partial differential equations with Barron data

no code implementations 2 Dec 2020 Weinan E, Stephan Wojtowytsch

We use explicit representation formulas to show that solutions to certain partial differential equations lie in Barron spaces or multilayer spaces if the PDE data lie in such function spaces.

Vocal Bursts Intensity Prediction

Generalization and Memorization: The Bias Potential Model

no code implementations 29 Nov 2020 Hongkang Yang, Weinan E

Models for learning probability distributions such as generative models and density estimators behave quite differently from models for learning functions.

Memorization

Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning

no code implementations NeurIPS 2020 Pan Zhou, Jiashi Feng, Chao Ma, Caiming Xiong, Steven Hoi, Weinan E

The result shows that (1) the escaping time of both SGD and ADAM depends on the Radon measure of the basin positively and the heaviness of gradient noise negatively; (2) for the same basin, SGD enjoys smaller escaping time than ADAM, mainly because (a) the geometry adaptation in ADAM via adaptively scaling each gradient coordinate well diminishes the anisotropic structure in gradient noise and results in larger Radon measure of a basin; (b) the exponential gradient average in ADAM smooths its gradient and leads to lighter gradient noise tails than SGD.

Interpretable Neural Networks for Panel Data Analysis in Economics

no code implementations 11 Oct 2020 Yucheng Yang, Zhong Zheng, Weinan E

In this paper, we propose a class of interpretable neural network models that can achieve both high prediction accuracy and interpretability.

Time Series Time Series Analysis

The Knowledge Graph for Macroeconomic Analysis with Alternative Big Data

no code implementations 11 Oct 2020 Yucheng Yang, Yue Pang, Guanhua Huang, Weinan E

The current knowledge system of macroeconomics is built on interactions among a small number of variables, since traditional macroeconomic models can mostly handle a handful of inputs.

Variable Selection

A priori estimates for classification problems using neural networks

no code implementations 28 Sep 2020 Weinan E, Stephan Wojtowytsch

We consider binary and multi-class classification problems using hypothesis classes of neural networks.

Classification General Classification +1

Towards a Mathematical Understanding of Neural Network-Based Machine Learning: what we know and what we don't

no code implementations 22 Sep 2020 Weinan E, Chao Ma, Stephan Wojtowytsch, Lei Wu

The purpose of this article is to review the achievements made in the last few years towards the understanding of the reasons behind the success and subtleties of neural network-based machine learning.

On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis

no code implementations ICLR 2021 Zhong Li, Jiequn Han, Weinan E, Qianxiao Li

We study the approximation properties and optimization dynamics of recurrent neural networks (RNNs) when applied to learn input-output relationships in temporal data.

A Qualitative Study of the Dynamic Behavior for Adaptive Gradient Algorithms

no code implementations 14 Sep 2020 Chao Ma, Lei Wu, Weinan E

The dynamic behavior of RMSprop and Adam algorithms is studied through a combination of careful numerical experiments and theoretical explanations.

OnsagerNet: Learning Stable and Interpretable Dynamics using a Generalized Onsager Principle

1 code implementation 6 Sep 2020 Haijun Yu, Xinyuan Tian, Weinan E, Qianxiao Li

We further apply this method to study Rayleigh-Bénard convection and learn Lorenz-like low dimensional autonomous reduced order models that capture both qualitative and quantitative properties of the underlying dynamics.

The Slow Deterioration of the Generalization Error of the Random Feature Model

no code implementations 13 Aug 2020 Chao Ma, Lei Wu, Weinan E

The random feature model exhibits a kind of resonance behavior when the number of parameters is close to the training sample size.

On the Banach spaces associated with multi-layer ReLU networks: Function representation, approximation theory and gradient descent dynamics

no code implementations 30 Jul 2020 Weinan E, Stephan Wojtowytsch

The key to this work is a new way of representing functions in some form of expectations, motivated by multi-layer neural networks.

Coarse-grained spectral projection (CGSP): a deep learning-assisted approach to quantum unitary dynamics

1 code implementation 19 Jul 2020 Pinchen Xie, Weinan E

We propose the coarse-grained spectral projection method (CGSP), a deep learning-assisted approach for tackling quantum unitary dynamic problems with an emphasis on quench dynamics.

The Quenching-Activation Behavior of the Gradient Descent Dynamics for Two-layer Neural Network Models

1 code implementation 25 Jun 2020 Chao Ma, Lei Wu, Weinan E

A numerical and phenomenological study of the gradient descent (GD) algorithm for training two-layer neural network models is carried out for different parameter regimes when the target function can be accurately approximated by a relatively small number of neurons.

Representation formulas and pointwise properties for Barron functions

no code implementations 10 Jun 2020 Weinan E, Stephan Wojtowytsch

We study the natural function space for infinitely wide two-layer neural networks with ReLU activation (Barron space) and establish different representation formulae.

Deep Potential generation scheme and simulation protocol for the Li10GeP2S12-type superionic conductors

no code implementations 5 Jun 2020 Jianxing Huang, Linfeng Zhang, Han Wang, Jinbao Zhao, Jun Cheng, Weinan E

It has been a challenge to accurately simulate Li-ion diffusion processes in battery materials at room temperature using ab initio molecular dynamics (AIMD) due to its high computational cost.

Computational Physics Materials Science Chemical Physics

Integrating Machine Learning with Physics-Based Modeling

no code implementations 4 Jun 2020 Weinan E, Jiequn Han, Linfeng Zhang

Machine learning is poised as a very powerful tool that can drastically improve our ability to carry out scientific research.

BIG-bench Machine Learning

Can Shallow Neural Networks Beat the Curse of Dimensionality? A mean field training perspective

no code implementations 21 May 2020 Stephan Wojtowytsch, Weinan E

Thus gradient descent training for fitting reasonably smooth, but truly high-dimensional data may be subject to the curse of dimensionality.

Kolmogorov Width Decay and Poor Approximators in Machine Learning: Shallow Neural Networks, Random Feature Models and Neural Tangent Kernels

no code implementations 21 May 2020 Weinan E, Stephan Wojtowytsch

We establish a scale separation of Kolmogorov width type between subspaces of a given Banach space under the condition that a sequence of linear maps converges much faster on one of the subspaces.

Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning

1 code implementation 1 May 2020 Weile Jia, Han Wang, Mohan Chen, Denghui Lu, Lin Lin, Roberto Car, Weinan E, Linfeng Zhang

For 35 years, ab initio molecular dynamics (AIMD) has been the method of choice for modeling complex atomistic phenomena from first principles.

Computational Physics

Machine learning based non-Newtonian fluid model with molecular fidelity

no code implementations 7 Mar 2020 Huan Lei, Lei Wu, Weinan E

We introduce a machine-learning-based framework for constructing a continuum non-Newtonian fluid dynamics model directly from a micro-scale description.

BIG-bench Machine Learning

Machine Learning from a Continuous Viewpoint

no code implementations 30 Dec 2019 Weinan E, Chao Ma, Lei Wu

We demonstrate that conventional machine learning models and algorithms, such as the random feature model, the two-layer neural network model and the residual neural network model, can all be recovered (in a scaled form) as particular discretizations of different continuous formulations.

BIG-bench Machine Learning

The Generalization Error of the Minimum-norm Solutions for Over-parameterized Neural Networks

no code implementations 15 Dec 2019 Weinan E, Chao Ma, Lei Wu

We study the generalization properties of minimum-norm solutions for three over-parametrized machine learning models including the random feature model, the two-layer neural network model and the residual network model.

BIG-bench Machine Learning

DP-GEN: A concurrent learning platform for the generation of reliable deep learning based potential energy models

1 code implementation 28 Oct 2019 Yuzhi Zhang, Haidi Wang, WeiJie Chen, Jinzhe Zeng, Linfeng Zhang, Han Wang, Weinan E

Materials 3, 023804] and is capable of generating uniformly accurate deep learning based PES models in a way that minimizes human intervention and the computational cost for data generation and model training.

Computational Physics

A mathematical model for universal semantics

1 code implementation 29 Jul 2019 Weinan E, Yajun Zhou

We characterize the meaning of words with language-independent numerical fingerprints, through a mathematical analysis of recurring patterns in texts.

Question Answering Translation +1

Deep neural network for Wannier function centers

1 code implementation 27 Jun 2019 Linfeng Zhang, Mohan Chen, Xifan Wu, Han Wang, Weinan E, Roberto Car

We introduce a deep neural network (DNN) model that assigns the position of the centers of the electronic charge in each atomic configuration on a molecular dynamics trajectory.

Computational Physics Materials Science Chemical Physics

The Barron Space and the Flow-induced Function Spaces for Neural Network Models

no code implementations 18 Jun 2019 Weinan E, Chao Ma, Lei Wu

We define the Barron space and show that it is the right space for two-layer neural network models in the sense that optimal direct and inverse approximation theorems hold for functions in the Barron space.

BIG-bench Machine Learning

A Priori Estimates of the Generalization Error for Two-layer Neural Networks

no code implementations ICLR 2019 Lei Wu, Chao Ma, Weinan E

These new estimates are a priori in nature in the sense that the bounds depend only on some norms of the underlying functions to be fitted, not the parameters in the model.

Monge-Ampère Flow for Generative Modeling

no code implementations ICLR 2019 Linfeng Zhang, Weinan E, Lei Wang

We present a deep generative model, named Monge-Ampère flow, which builds on continuous-time gradient flow arising from the Monge-Ampère equation in optimal transport theory.

Density Estimation

Analysis of the Gradient Descent Algorithm for a Deep Neural Network Model with Skip-connections

no code implementations 10 Apr 2019 Weinan E, Chao Ma, Qingcan Wang, Lei Wu

In addition, it is also shown that the GD path is uniformly close to the functions given by the related random feature model.

A Comparative Analysis of the Optimization and Generalization Property of Two-layer Neural Network and Random Feature Models Under Gradient Descent Dynamics

no code implementations 8 Apr 2019 Weinan E, Chao Ma, Lei Wu

In the over-parametrized regime, it is shown that gradient descent dynamics can achieve zero training loss exponentially fast regardless of the quality of the labels.

A Priori Estimates of the Population Risk for Residual Networks

no code implementations 6 Mar 2019 Weinan E, Chao Ma, Qingcan Wang

An important part of the regularized model is the usage of a new path norm, called the weighted path norm, as the regularization term.

How SGD Selects the Global Minima in Over-parameterized Learning: A Dynamical Stability Perspective

1 code implementation NeurIPS 2018 Lei Wu, Chao Ma, Weinan E

The question of which global minima are accessible by a stochastic gradient descent (SGD) algorithm with specific learning rate and batch size is studied from the perspective of dynamical stability.

Stochastic Modified Equations and Dynamics of Stochastic Gradient Algorithms I: Mathematical Foundations

no code implementations 5 Nov 2018 Qianxiao Li, Cheng Tai, Weinan E

We develop the mathematical foundations of the stochastic modified equations (SME) framework for analyzing the dynamics of stochastic gradient algorithms, where the latter is approximated by a class of stochastic differential equations with small noise parameters.

Active Learning of Uniformly Accurate Inter-atomic Potentials for Materials Simulation

no code implementations 28 Oct 2018 Linfeng Zhang, De-Ye Lin, Han Wang, Roberto Car, Weinan E

An active learning procedure called Deep Potential Generator (DP-GEN) is proposed for the construction of accurate and transferable machine learning-based models of the potential energy surface (PES) for the molecular modeling of materials.

Active Learning BIG-bench Machine Learning

A Priori Estimates of the Population Risk for Two-layer Neural Networks

no code implementations ICLR 2019 Weinan E, Chao Ma, Lei Wu

New estimates for the population risk are established for two-layer neural networks.

Monge-Ampère Flow for Generative Modeling

1 code implementation 26 Sep 2018 Linfeng Zhang, Weinan E, Lei Wang

We present a deep generative model, named Monge-Ampère flow, which builds on continuous-time gradient flow arising from the Monge-Ampère equation in optimal transport theory.

Density Estimation

Model Reduction with Memory and the Machine Learning of Dynamical Systems

no code implementations 10 Aug 2018 Chao Ma, Jianchun Wang, Weinan E

The well-known Mori-Zwanzig theory tells us that model reduction leads to memory effects.

BIG-bench Machine Learning

Solving Many-Electron Schrödinger Equation Using Deep Neural Networks

no code implementations 18 Jul 2018 Jiequn Han, Linfeng Zhang, Weinan E

We introduce a new family of trial wave-functions based on deep neural networks to solve the many-electron Schrödinger equation.

Computational Physics Chemical Physics

A Mean-Field Optimal Control Formulation of Deep Learning

no code implementations 3 Jul 2018 Weinan E, Jiequn Han, Qianxiao Li

This paper introduces the mathematical formulation of the population risk minimization problem in deep learning as a mean-field optimal control problem.

Deep Learning

Exponential Convergence of the Deep Neural Network Approximation for Analytic Functions

no code implementations 1 Jul 2018 Weinan E, Qingcan Wang

We prove that for analytic functions in low dimension, the convergence rate of the deep neural network approximation is exponential.

End-to-end Symmetry Preserving Inter-atomic Potential Energy Model for Finite and Extended Systems

1 code implementation NeurIPS 2018 Linfeng Zhang, Jiequn Han, Han Wang, Wissam A. Saidi, Roberto Car, Weinan E

Machine learning models are changing the paradigm of molecular modeling, which is a fundamental tool for material science, chemistry, and computational biology.

Computational Physics Materials Science Chemical Physics

Understanding and Enhancing the Transferability of Adversarial Examples

no code implementations 27 Feb 2018 Lei Wu, Zhanxing Zhu, Cheng Tai, Weinan E

State-of-the-art deep neural networks are known to be vulnerable to adversarial examples, formed by applying small but malicious perturbations to the original inputs.

DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics

2 code implementations 11 Dec 2017 Han Wang, Linfeng Zhang, Jiequn Han, Weinan E

Here we describe DeePMD-kit, a package written in Python/C++ that has been designed to minimize the effort required to build deep learning based representation of potential energy and force field and to perform molecular dynamics.

Deep Learning

Reinforced dynamics for enhanced sampling in large atomic and molecular systems

no code implementations 10 Dec 2017 Linfeng Zhang, Han Wang, Weinan E

Like metadynamics, it allows for an efficient exploration of the configuration space by adding an adaptively computed biasing potential to the original dynamics.

Efficient Exploration reinforcement-learning +2

Maximum Principle Based Algorithms for Deep Learning

2 code implementations 26 Oct 2017 Qianxiao Li, Long Chen, Cheng Tai, Weinan E

The continuous dynamical system approach to deep learning is explored in order to devise alternative frameworks for training algorithms.

Deep Learning

The Deep Ritz method: A deep learning-based numerical algorithm for solving variational problems

1 code implementation 30 Sep 2017 Weinan E, Bing Yu

We propose a deep learning based method, the Deep Ritz Method, for numerically solving variational problems, particularly the ones that arise from partial differential equations.

Deep Learning

Deep Potential Molecular Dynamics: a scalable model with the accuracy of quantum mechanics

5 code implementations 30 Jul 2017 Linfeng Zhang, Jiequn Han, Han Wang, Roberto Car, Weinan E

We introduce a scheme for molecular simulations, the Deep Potential Molecular Dynamics (DeePMD) method, based on a many-body potential and interatomic forces generated by a carefully crafted deep neural network trained with ab initio data.

Solving high-dimensional partial differential equations using deep learning

6 code implementations 9 Jul 2017 Jiequn Han, Arnulf Jentzen, Weinan E

Developing algorithms for solving high-dimensional partial differential equations (PDEs) has been an exceedingly difficult task for a long time, due to the notoriously difficult problem known as the "curse of dimensionality".

Deep Learning Reinforcement Learning +1

Deep Potential: a general representation of a many-body potential energy surface

1 code implementation 5 Jul 2017 Jiequn Han, Linfeng Zhang, Roberto Car, Weinan E

When tested on a wide variety of examples, Deep Potential is able to reproduce the original model, whether empirical or quantum mechanics based, within chemical accuracy.

Computational Physics

Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes

no code implementations 30 Jun 2017 Lei Wu, Zhanxing Zhu, Weinan E

It is widely observed that deep learning models with learned parameters generalize well, even with many more model parameters than the number of training samples.

Deep Learning

Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations

5 code implementations 15 Jun 2017 Weinan E, Jiequn Han, Arnulf Jentzen

We propose a new algorithm for solving parabolic partial differential equations (PDEs) and backward stochastic differential equations (BSDEs) in high dimension, by making an analogy between the BSDE and reinforcement learning with the gradient of the solution playing the role of the policy function, and the loss function given by the error between the prescribed terminal condition and the solution of the BSDE.

reinforcement-learning Reinforcement Learning +1

Deep Learning Approximation for Stochastic Control Problems

no code implementations 2 Nov 2016 Jiequn Han, Weinan E

Many real world stochastic control problems suffer from the "curse of dimensionality".

Deep Learning

Stochastic modified equations and adaptive stochastic gradient algorithms

no code implementations ICML 2017 Qianxiao Li, Cheng Tai, Weinan E

We develop the method of stochastic modified equations (SME), in which stochastic gradient algorithms are approximated in the weak sense by continuous-time stochastic differential equations.

Functional Frank-Wolfe Boosting for General Loss Functions

no code implementations 9 Oct 2015 Chu Wang, Yingfei Wang, Weinan E, Robert Schapire

Yet, as the number of base hypotheses becomes larger, boosting can lead to a deterioration of test performance.

Binary Classification General Classification +1

Multiscale Adaptive Representation of Signals: I. The Basic Framework

no code implementations 17 Jul 2015 Cheng Tai, Weinan E

The new framework, called AdaFrame, improves over dictionary learning-based techniques in terms of computational efficiency at inference time.

Computational Efficiency Denoising +2
