no code implementations • 19 Jun 2024 • Tan M. Nguyen, Tam Nguyen, Nhat Ho, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher
Self-attention is key to the remarkable success of transformers in sequence modeling tasks, including many applications in natural language processing and computer vision.
no code implementations • 27 Feb 2024 • Raunak Manekar, Elisa Negrini, Minh Pham, Daniel Jacobs, Jaideep Srivastava, Stanley J. Osher, Jianwei Miao
Phase retrieval (PR) is fundamentally important in scientific imaging and is crucial for nanoscale techniques like coherent diffractive imaging (CDI).
no code implementations • 9 Feb 2024 • Benjamin J. Zhang, Siting Liu, Wuchen Li, Markos A. Katsoulakis, Stanley J. Osher
Via a Cole-Hopf transformation and taking advantage of the fact that the cross-entropy can be related to a linear functional of the density, we show that the HJB equation is an uncontrolled FP equation.
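For orientation, a generic instance of the Cole-Hopf transformation (standard background; the paper's HJB/FP system carries additional drift and potential terms): the change of variables $u = e^{-\phi/\epsilon}$ linearizes a viscous HJB equation,
$$
\partial_t \phi + \tfrac{1}{2}\,|\nabla \phi|^2 = \tfrac{\epsilon}{2}\,\Delta \phi
\quad\xrightarrow{\;u \,=\, e^{-\phi/\epsilon}\;}\quad
\partial_t u = \tfrac{\epsilon}{2}\,\Delta u .
$$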
1 code implementation • 14 Jan 2024 • Liu Yang, Stanley J. Osher
We show positive evidence for the second question, i.e., that ICON can generalize well to some PDEs with new forms without any fine-tuning.
1 code implementation • 9 Aug 2023 • Liu Yang, Siting Liu, Stanley J. Osher
In the growing domain of scientific machine learning, in-context operator learning has shown notable potential for building foundation models: in this framework, the model learns operators and solves differential equations from prompted data at inference time, without any weight update.
2 code implementations • 17 Apr 2023 • Liu Yang, Siting Liu, Tingwei Meng, Stanley J. Osher
This paper introduces a new neural-network-based approach, namely In-Context Operator Networks (ICON), to simultaneously learn operators from prompted data and apply them to new questions during the inference stage, without any weight update.
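A minimal sketch of the in-context workflow this describes; the model handle (`icon_model`) and the data shapes are hypothetical placeholders, not the authors' API:

```python
import numpy as np

def build_prompt(example_pairs, query_condition):
    """Bundle a few (condition, quantity-of-interest) demos with a query.

    All demos come from the same unseen operator; the trained model is expected
    to infer that operator from the demos alone, with no weight update.
    """
    return {"demos": example_pairs, "query": query_condition}

# e.g. five sampled (forcing term, solution) pairs of one differential operator
demos = [(np.random.rand(50), np.random.rand(50)) for _ in range(5)]
query_u = np.random.rand(50)

prompt = build_prompt(demos, query_u)
# prediction = icon_model(prompt)   # single forward pass; weights stay frozen
```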
no code implementations • 1 Aug 2022 • Tan Nguyen, Richard G. Baraniuk, Robert M. Kirby, Stanley J. Osher, Bao Wang
Transformers have achieved remarkable success in sequence modeling and beyond but suffer from quadratic computational and memory complexities with respect to the length of the input sequence.
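A minimal NumPy sketch of where the quadratic cost comes from: the score matrix $QK^\top$ has shape $(n, n)$ for a length-$n$ sequence (illustrative code, not the paper's implementation):

```python
import numpy as np

def vanilla_attention(Q, K, V):
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)                    # (n, n): O(n^2) time/memory
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # (n, d)

n, d = 1024, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
out = vanilla_attention(Q, K, V)                     # cost grows as n^2
```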
no code implementations • 1 Jun 2022 • Tan Nguyen, Minh Pham, Tam Nguyen, Khai Nguyen, Stanley J. Osher, Nhat Ho
Multi-head attention underpins the recent success of transformers, the state-of-the-art models for sequence modeling and beyond.
no code implementations • 19 Apr 2022 • Justin Baker, Hedi Xia, Yiwei Wang, Elena Cherkaev, Akil Narayan, Long Chen, Jack Xin, Andrea L. Bertozzi, Stanley J. Osher, Bao Wang
Learning neural ODEs often requires solving very stiff ODE systems, primarily using explicit adaptive step size ODE solvers.
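A small SciPy illustration of the stiffness issue (the test problem and tolerances are illustrative only): an explicit adaptive solver needs far more right-hand-side evaluations than an implicit one on a stiff linear ODE.

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y):
    return -1000.0 * (y - np.cos(t))   # fast relaxation onto a slow solution

for method in ("RK45", "BDF"):         # explicit adaptive vs. implicit solver
    sol = solve_ivp(rhs, (0.0, 1.0), [0.0], method=method, rtol=1e-6, atol=1e-9)
    print(method, "right-hand-side evaluations:", sol.nfev)
```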
no code implementations • 11 Mar 2022 • Alex Tong Lin, Adrian S. Wong, Robert Martin, Stanley J. Osher, Daniel Eckhardt
We provide a method to identify system parameters of dynamical systems, called ID-ODE -- Inference by Differentiation and Observing Delay Embeddings.
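A minimal sketch of the kind of delay embedding the method observes (the embedding dimension and lag are illustrative choices, not the paper's settings):

```python
import numpy as np

def delay_embed(x, m=3, tau=5):
    """Rows are [x(t), x(t - tau), ..., x(t - (m-1)*tau)] for a scalar series x."""
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[(m - 1 - k) * tau : (m - 1 - k) * tau + n]
                            for k in range(m)])

t = np.linspace(0.0, 20.0, 2000)
x = np.sin(t) + 0.5 * np.sin(3.0 * t)   # observed scalar signal
E = delay_embed(x)                      # shape (n, m): reconstructed state space
```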
1 code implementation • 16 Oct 2021 • Tam Nguyen, Tan M. Nguyen, Dung D. Le, Duy Khuong Nguyen, Viet-Anh Tran, Richard G. Baraniuk, Nhat Ho, Stanley J. Osher
Inspired by this observation, we propose Transformer with a Mixture of Gaussian Keys (Transformer-MGK), a novel transformer architecture that replaces redundant heads in transformers with a mixture of keys at each head.
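Schematically (our notation, not necessarily the paper's), each position $j$ carries a small mixture of keys $k_{jr}$ rather than a single key, and the attention weight from query $q_i$ takes a Gaussian-mixture form:
$$
a_{ij} \;\propto\; \sum_{r=1}^{M} \pi_{jr}\, \exp\!\Big(-\tfrac{\|q_i - k_{jr}\|^2}{2\sigma_{jr}^2}\Big),
\qquad \sum_{r} \pi_{jr} = 1 .
$$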
1 code implementation • NeurIPS 2021 • Hedi Xia, Vai Suliafu, Hangjie Ji, Tan M. Nguyen, Andrea L. Bertozzi, Stanley J. Osher, Bao Wang
We propose heavy ball neural ordinary differential equations (HBNODEs), which leverage the continuous limit of classical momentum-accelerated gradient descent to improve the training and inference of neural ODEs (NODEs).
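In equation form (a sketch; the paper's exact parameterization may differ), the discrete heavy ball update and its continuous limit are
$$
x_{k+1} = x_k - s\,\nabla F(x_k) + \beta\,(x_k - x_{k-1})
\quad\longrightarrow\quad
\ddot{x}(t) + \gamma\,\dot{x}(t) = -\nabla F\big(x(t)\big),
$$
and an HBNODE correspondingly replaces the first-order NODE $\dot{h} = f(h, t; \theta)$ with the damped second-order system $\ddot{h} + \gamma\,\dot{h} = f(h, t; \theta)$.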
no code implementations • NeurIPS 2021 • Tan M. Nguyen, Vai Suliafu, Stanley J. Osher, Long Chen, Bao Wang
For instance, FMMformers achieve an average classification accuracy of $60.74\%$ over the five Long Range Arena tasks, which is significantly better than the standard transformer's average accuracy of $58.70\%$.
2 code implementations • NeurIPS 2020 • Tan M. Nguyen, Richard G. Baraniuk, Andrea L. Bertozzi, Stanley J. Osher, Bao Wang
Designing deep neural networks is an art that often involves an expensive search over candidate architectures.
no code implementations • 2 Mar 2020 • Thu Dinh, Bao Wang, Andrea L. Bertozzi, Stanley J. Osher
In this paper, we focus on a co-design of efficient DNN compression algorithms and sparse neural architectures for robust and accurate deep learning.
1 code implementation • 24 Feb 2020 • Bao Wang, Tan M. Nguyen, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher
Nesterov accelerated gradient (NAG) improves the convergence rate of gradient descent (GD) for convex optimization using a specially designed momentum; however, it accumulates error when an inexact gradient is used (such as in SGD), slowing convergence at best and diverging at worst.
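For reference, a textbook form of the NAG iteration for convex $F$ with step size $s$ and one common momentum schedule is
$$
x_{k+1} = y_k - s\,\nabla F(y_k), \qquad
y_{k+1} = x_{k+1} + \tfrac{k}{k+3}\,\big(x_{k+1} - x_k\big).
$$
When $\nabla F(y_k)$ is replaced by a noisy stochastic estimate, the momentum term recycles the gradient error across iterations, which is the accumulation effect described above.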
1 code implementation • 24 Feb 2020 • Alex Tong Lin, Samy Wu Fung, Wuchen Li, Levon Nurbekyan, Stanley J. Osher
By phrasing the problem in this manner, solving the MFG can be interpreted as a special case of training a generative adversarial network (GAN).
1 code implementation • 16 Jul 2019 • Bao Wang, Stanley J. Osher
The proposed DNN with graph interpolating activation integrates the advantages of both deep learning and manifold learning.
1 code implementation • 28 Jun 2019 • Bao Wang, Quanquan Gu, March Boedihardjo, Farzin Barekat, Stanley J. Osher
At the core of DP-LSSGD is the Laplacian smoothing, which smooths out the Gaussian noise used in the Gaussian mechanism.
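A minimal NumPy sketch of that smoothing step, using the periodic 1-D discrete Laplacian and an FFT solve; the noise scale and $\sigma$ are illustrative constants, not the paper's settings:

```python
import numpy as np

def laplacian_smooth(g, sigma=1.0):
    """Return A_sigma^{-1} g with A_sigma = I - sigma * L (L: periodic 1-D Laplacian)."""
    d = len(g)
    v = np.zeros(d)
    v[0], v[1], v[-1] = -2.0, 1.0, 1.0           # first column of the circulant L
    denom = 1.0 - sigma * np.fft.fft(v)          # eigenvalues of A_sigma
    return np.real(np.fft.ifft(np.fft.fft(g) / denom))

g = np.random.randn(1000)                             # clipped, flattened gradient
noisy_g = g + np.sqrt(2.0) * np.random.randn(1000)    # Gaussian-mechanism noise
smoothed_g = laplacian_smooth(noisy_g)                # high-frequency noise is damped
```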
no code implementations • 21 Jan 2019 • Lisa Maria Kreusser, Stanley J. Osher, Bao Wang
First-order methods such as gradient descent are usually the methods of choice for training machine learning models.
5 code implementations • NeurIPS 2019 • Bao Wang, Binjie Yuan, Zuoqiang Shi, Stanley J. Osher
However, both the natural accuracy (on clean images) and the robust accuracy (on adversarial images) of the trained robust models are far from satisfactory.
no code implementations • 15 Nov 2018 • Zehao Dou, Stanley J. Osher, Bao Wang
In this paper, we analyze the efficacy of the fast gradient sign method (FGSM) and the Carlini-Wagner L2 (CW-L2) attack.
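A minimal PyTorch sketch of the FGSM step (the model, labels, and $\epsilon$ are placeholders):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8.0 / 255):
    """One signed-gradient step of size eps in input space."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()         # perturb along the gradient sign
    return x_adv.clamp(0.0, 1.0).detach()   # keep pixels in the valid range
```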
1 code implementation • 23 Sep 2018 • Bao Wang, Alex T. Lin, Wei Zhu, Penghang Yin, Andrea L. Bertozzi, Stanley J. Osher
We improve the robustness of deep neural networks (DNNs) to adversarial attacks by using an interpolating function as the output activation.
1 code implementation • NeurIPS 2018 • Bao Wang, Xiyang Luo, Zhen Li, Wei Zhu, Zuoqiang Shi, Stanley J. Osher
We replace the output layer of deep neural nets, typically the softmax function, by a novel interpolating function.
no code implementations • 23 Nov 2017 • Bao Wang, Penghang Yin, Andrea L. Bertozzi, P. Jeffrey Brantingham, Stanley J. Osher, Jack Xin
In this work, we first present a proper representation of crime data.
no code implementations • 26 Jul 2012 • Braxton Osting, Christoph Brune, Stanley J. Osher
Our approach, based on experimental design, is to view data collection as a bi-level optimization problem where the inner problem is the ranking problem and the outer problem is to identify data which maximizes the informativeness of the ranking.
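In a generic bi-level form (schematic notation, ours):
$$
\max_{E \in \mathcal{E}}\; \Phi\big(\pi^\star(E)\big)
\quad \text{subject to} \quad
\pi^\star(E) \in \arg\min_{\pi}\; J(\pi;\, E),
$$
where the inner problem computes the ranking $\pi^\star$ from the collected comparison data $E$ and the outer problem selects which data to collect so as to maximize the informativeness criterion $\Phi$.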