no code implementations • 15 Feb 2024 • Zhichao Wang, Denny Wu, Zhou Fan
Many recent works have studied the eigenvalue spectrum of the Conjugate Kernel (CK) defined by the nonlinear feature map of a feedforward neural network.
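A minimal sketch of the object under study (not the papers' exact setting): the empirical eigenvalue spectrum of the Conjugate Kernel of a one-hidden-layer feature map with Gaussian data. The dimensions `n, d, N`, the data model, and the `tanh` activation are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: empirical eigenvalue spectrum of the Conjugate Kernel
# of a one-hidden-layer feature map; all sizes here are illustrative.
rng = np.random.default_rng(0)
n, d, N = 1000, 800, 1200                       # samples, input dim, width
X = rng.standard_normal((n, d)) / np.sqrt(d)    # normalized Gaussian inputs
W = rng.standard_normal((d, N))                 # random first-layer weights
F = np.tanh(X @ W)                              # nonlinear feature map
CK = F @ F.T / N                                # empirical Conjugate Kernel
eigs = np.linalg.eigvalsh(CK)                   # CK eigenvalue spectrum
print(eigs.min(), eigs.max())
```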
no code implementations • 12 Jun 2023 • Taiji Suzuki, Denny Wu, Atsushi Nitanda
Despite the generality of our results, we achieve an improved convergence rate in both the SGD and SVRG settings when specialized to the standard Langevin dynamics.
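For reference, a minimal stochastic gradient Langevin dynamics (SGLD) loop of the kind the "SGD setting" specializes to; the Gaussian-mean model, batch size, and step size below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Minimal SGLD sketch on a toy Gaussian-mean model: a minibatch gradient
# step plus Gaussian noise scaled by sqrt(2 * eta).
rng = np.random.default_rng(0)
data = rng.standard_normal(1000) + 2.0          # toy observations, true mean 2

def stoch_grad(theta, batch):
    # minibatch estimate of the full-data gradient of the negative
    # log-likelihood of N(theta, 1)
    return len(data) / len(batch) * np.sum(theta - batch)

eta, theta = 1e-4, 0.0
for _ in range(5000):
    batch = rng.choice(data, size=32)
    theta += -eta * stoch_grad(theta, batch) + np.sqrt(2 * eta) * rng.standard_normal()
print(theta)                                    # hovers near the posterior mean ~2
```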
no code implementations • 6 Mar 2023 • Atsushi Nitanda, Kazusato Oko, Denny Wu, Nobuhito Takenouchi, Taiji Suzuki
The entropic fictitious play (EFP) is a recently proposed algorithm that minimizes the sum of a convex functional and entropy in the space of measures -- such an objective naturally arises in the optimization of a two-layer neural network in the mean-field regime.
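A minimal particle reading of EFP, under the assumption that the convex functional is a mean-field regression loss: each outer step freezes the current measure, approximately samples the entropic best response with an inner Langevin loop, and mixes a fraction of the particles toward it. The model, step sizes, and loop lengths are illustrative, not the paper's exact scheme.

```python
import numpy as np

# Minimal EFP sketch for F(mu) = E[(f_mu(x) - y)^2] / 2 with
# f_mu(x) = E_{(a,w)~mu}[a * tanh(w x)]; all choices are illustrative.
rng = np.random.default_rng(0)
X = rng.standard_normal(200); Y = np.sin(X)      # toy 1-d regression data
lam, alpha, inner_eta = 0.05, 0.2, 0.01          # entropy weight, mixing, step

def predict(theta):                              # per-particle features a*tanh(wx)
    return theta[:, 0, None] * np.tanh(theta[:, 1, None] * X[None, :])

def fv_grad(theta, resid):
    # gradient in (a, w) of the first variation dF/dmu = E[(f_mu - y) a tanh(wx)]
    a, w = theta[:, 0, None], theta[:, 1, None]
    t = np.tanh(w * X[None, :])
    ga = (resid[None, :] * t).mean(axis=1)
    gw = (resid[None, :] * a * (1 - t ** 2) * X[None, :]).mean(axis=1)
    return np.stack([ga, gw], axis=1)

theta = rng.standard_normal((500, 2))            # particles representing mu
for _ in range(50):
    resid = predict(theta).mean(axis=0) - Y      # residual frozen at current mu
    nu = theta.copy()
    for _ in range(20):                          # sample nu ~ exp(-dF/dmu / lam)
        nu += (-inner_eta * fv_grad(nu, resid) / lam
               + np.sqrt(2 * inner_eta) * rng.standard_normal(nu.shape))
    swap = rng.random(len(theta)) < alpha        # mu <- (1 - alpha) mu + alpha nu
    theta[swap] = nu[swap]
print(np.mean((predict(theta).mean(axis=0) - Y) ** 2))
```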
no code implementations • 3 May 2022 • Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}^\top\boldsymbol{x})$, where $\boldsymbol{W}\in\mathbb{R}^{d\times N}, \boldsymbol{a}\in\mathbb{R}^{N}$ are randomly initialized, and the training objective is the empirical MSE loss: $\frac{1}{n}\sum_{i=1}^n (f(\boldsymbol{x}_i)-y_i)^2$.
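The studied object is easy to reproduce numerically. The sketch below takes exactly one full-batch gradient step on $\boldsymbol{W}$ with $\boldsymbol{a}$ frozen; the single-index target, width, and learning rate are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: one full-batch gradient step on W for
# f(x) = a^T sigma(W^T x) / sqrt(N), second layer a held fixed,
# under the empirical MSE loss. All sizes are illustrative.
rng = np.random.default_rng(0)
n, d, N, eta = 500, 100, 200, 1.0
X = rng.standard_normal((n, d)) / np.sqrt(d)     # rows are the inputs x_i
y = np.tanh(X @ rng.standard_normal(d))          # toy single-index targets
W = rng.standard_normal((d, N)) / np.sqrt(d)     # random initialization
a = rng.standard_normal(N)

pre = X @ W                                      # (n, N) preactivations
resid = np.tanh(pre) @ a / np.sqrt(N) - y        # f(x_i) - y_i
# gradient of (1/n) sum_i (f(x_i) - y_i)^2 with respect to W
grad_W = (2 / n) * X.T @ (resid[:, None] * (1 - np.tanh(pre) ** 2)
                          * a[None, :] / np.sqrt(N))
W1 = W - eta * grad_W                            # weights after the first step
s = np.linalg.svd(W1 - W, compute_uv=False)
print(s[:4])                                     # update concentrates on one direction
```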
no code implementations • 25 Jan 2022 • Atsushi Nitanda, Denny Wu, Taiji Suzuki
In this work, we give a concise and self-contained convergence rate analysis of the mean field Langevin dynamics with respect to the (regularized) objective function in both continuous and discrete time settings.
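A minimal discrete-time sketch of the dynamics being analyzed, in its canonical application to a two-layer network in the mean field regime: each particle is a neuron, and the update is a gradient step on the first variation plus entropic noise at temperature `lam`. The toy data, particle count, and step sizes are illustrative assumptions.

```python
import numpy as np

# Minimal mean field Langevin sketch: noisy gradient descent on a particle
# system (a_j, w_j) whose empirical measure tracks the regularized objective.
rng = np.random.default_rng(0)
X = rng.standard_normal(200); Y = np.sin(X)     # toy 1-d regression data
M, eta, lam = 500, 0.05, 0.01                   # particles, step size, temperature
theta = rng.standard_normal((M, 2))

for _ in range(500):
    a, w = theta[:, 0, None], theta[:, 1, None]
    t = np.tanh(w * X[None, :])
    resid = (a * t).mean(axis=0) - Y            # f_mu(x) - y
    ga = (resid[None, :] * t).mean(axis=1)
    gw = (resid[None, :] * a * (1 - t ** 2) * X[None, :]).mean(axis=1)
    grad = np.stack([ga, gw], axis=1)           # gradient of dF/dmu per particle
    theta += -eta * grad + np.sqrt(2 * eta * lam) * rng.standard_normal(theta.shape)
print(np.mean(resid ** 2))                      # training MSE of the particle network
```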
no code implementations • ICLR 2022 • Jimmy Ba, Murat A. Erdogdu, Marzyeh Ghassemi, Shengyang Sun, Taiji Suzuki, Denny Wu, Tianzong Zhang
Stein variational gradient descent (SVGD) is a deterministic inference algorithm that evolves a set of particles to fit a target distribution.
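For reference, a minimal SVGD implementation with an RBF kernel targeting a standard Gaussian; the bandwidth, step size, and target are illustrative choices rather than the paper's setup.

```python
import numpy as np

# Minimal SVGD sketch: particles follow the kernelized Stein update
# phi*(x_i) = mean_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ].
rng = np.random.default_rng(0)

def grad_log_p(x):                               # score of the target N(0, I)
    return -x

def svgd_step(x, eta=0.1, h=1.0):
    diff = x[:, None, :] - x[None, :, :]         # pairwise differences x_i - x_j
    K = np.exp(-(diff ** 2).sum(-1) / (2 * h))   # RBF kernel matrix
    gK = -diff / h * K[:, :, None]               # grad_{x_i} k(x_i, x_j)
    phi = (K @ grad_log_p(x) + gK.sum(axis=0)) / len(x)
    return x + eta * phi

x = rng.standard_normal((100, 2)) + 5.0          # particles start far away
for _ in range(500):
    x = svgd_step(x)
print(x.mean(axis=0), x.std(axis=0))             # approach N(0, I) statistics
```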
no code implementations • ICLR 2022 • Kazusato Oko, Taiji Suzuki, Atsushi Nitanda, Denny Wu
We introduce Particle-SDCA, a gradient-based optimization algorithm for two-layer neural networks in the mean field regime that achieves an exponential convergence rate in regularized empirical risk minimization.
no code implementations • NeurIPS 2021 • Atsushi Nitanda, Denny Wu, Taiji Suzuki
An important application of the proposed method is the optimization of neural networks in the mean field regime, which is theoretically attractive due to the presence of nonlinear feature learning, but for which quantitative convergence rates can be challenging to obtain.
no code implementations • ICLR 2021 • Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu
While second order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization has been called into question.
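As a baseline illustration of the update in question, the sketch below runs NGD on least squares, where the (damped) Fisher information equals $X^\top X / n$ and preconditions the gradient; the damping and step size are illustrative choices.

```python
import numpy as np

# Minimal NGD sketch on least squares: theta <- theta - eta * F^{-1} grad,
# with F the damped Fisher information of the Gaussian linear model.
rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

theta = np.zeros(d)
fisher = X.T @ X / n + 1e-3 * np.eye(d)          # damped Fisher information
for _ in range(50):
    grad = X.T @ (X @ theta - y) / n             # gradient of the MSE / 2
    theta -= 1.0 * np.linalg.solve(fisher, grad) # natural gradient step
print(np.mean((X @ theta - y) ** 2))
```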
no code implementations • NeurIPS 2020 • Denny Wu, Ji Xu
Finally, we determine the optimal weighting matrix $\mathbf{\Sigma}_w$ for both the ridgeless ($\lambda\to 0$) and optimally regularized ($\lambda = \lambda_{\rm opt}$) case, and demonstrate the advantage of the weighted objective over standard ridge regression and PCR.
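The weighted objective admits a closed form, $\hat{\boldsymbol{\beta}} = (\boldsymbol{X}^\top\boldsymbol{X} + \lambda\mathbf{\Sigma}_w)^{-1}\boldsymbol{X}^\top\boldsymbol{y}$. The sketch below compares it against standard ridge on toy data, with a diagonal $\mathbf{\Sigma}_w$ chosen purely for illustration rather than the paper's optimal weighting.

```python
import numpy as np

# Minimal sketch of weighted ridge regression with weighting matrix Sigma_w:
# beta_hat = (X^T X + lam * Sigma_w)^{-1} X^T y. All choices are illustrative.
rng = np.random.default_rng(0)
n, d, lam = 100, 50, 1.0
X = rng.standard_normal((n, d))
beta_star = rng.standard_normal(d)
y = X @ beta_star + rng.standard_normal(n)

Sigma_w = np.diag(rng.uniform(0.5, 2.0, d))       # example weighting matrix
beta_hat = np.linalg.solve(X.T @ X + lam * Sigma_w, X.T @ y)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)  # standard ridge
print(np.linalg.norm(beta_hat - beta_star),       # estimation errors of the two fits
      np.linalg.norm(beta_ridge - beta_star))
```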
no code implementations • ICLR 2020 • Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Denny Wu, Tianzong Zhang
This paper investigates the generalization properties of two-layer neural networks in high dimensions, i.e., when the number of samples $n$, features $d$, and neurons $h$ tend to infinity at the same rate.
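The proportional regime is easy to simulate: fix the ratios $n/d$ and $h/d$ and scale everything up. The sketch below uses random first-layer weights with a ridge-fitted second layer as an illustrative proxy for the models analyzed; the ratios, data model, and regularization are assumptions.

```python
import numpy as np

# Minimal proportional-regime simulation: n, d, h grow at fixed ratios
# psi1 = n/d and psi2 = h/d while we track the test error of a two-layer
# model with random first-layer weights and a ridge-fitted second layer.
rng = np.random.default_rng(0)

def test_error(scale, psi1=2.0, psi2=1.5, lam=1e-2):
    d = 50 * scale; n = int(psi1 * d); h = int(psi2 * d)
    beta = rng.standard_normal(d) / np.sqrt(d)           # toy linear teacher
    X, Xt = rng.standard_normal((n, d)), rng.standard_normal((2000, d))
    y, yt = X @ beta, Xt @ beta
    W = rng.standard_normal((d, h)) / np.sqrt(d)         # random first layer
    F, Ft = np.tanh(X @ W), np.tanh(Xt @ W)
    a = np.linalg.solve(F.T @ F + lam * np.eye(h), F.T @ y)
    return np.mean((Ft @ a - yt) ** 2)

for s in (1, 2, 4):                                      # errors stabilize with size
    print(test_error(s))
```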
no code implementations • AABI Symposium 2019 • Jimmy Ba, Murat A. Erdogdu, Marzyeh Ghassemi, Taiji Suzuki, Shengyang Sun, Denny Wu, Tianzong Zhang
Particle-based inference algorithms are a promising method for efficiently generating samples from an intractable target distribution by iteratively updating a set of particles.
no code implementations • NeurIPS 2019 • Xuechen Li, Denny Wu, Lester Mackey, Murat A. Erdogdu
In this paper, we establish the convergence rate of sampling algorithms obtained by discretizing smooth Itô diffusions exhibiting fast Wasserstein-$2$ contraction, based on local deviation properties of the integration scheme.
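A minimal Euler-Maruyama discretization of such a diffusion, $dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t$, using the Ornstein-Uhlenbeck drift (which exhibits Wasserstein-$2$ contraction) as an illustrative example:

```python
import numpy as np

# Minimal Euler-Maruyama sketch for an Ito diffusion with OU drift b(x) = -x
# and constant diffusion sqrt(2); stationary law is N(0, 1).
rng = np.random.default_rng(0)

def b(x):                                   # contractive OU drift
    return -x

def sigma(x):                               # constant diffusion coefficient
    return np.sqrt(2.0)

eta, steps = 0.01, 2000
x = rng.standard_normal(1000) + 10.0        # many chains, far initialization
for _ in range(steps):
    x = x + eta * b(x) + sigma(x) * np.sqrt(eta) * rng.standard_normal(x.size)
print(x.mean(), x.var())                    # approach the N(0, 1) stationary law
```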
no code implementations • 20 Jan 2019 • Denny Wu, Hirofumi Kobayashi, Charles Ding, Lei Cheng, Keisuke Goda, Marzyeh Ghassemi
A crucial challenge in image-based modeling of biomedical data is to identify trends and features that separate normality and pathology.
no code implementations • ICLR 2019 • Makoto Yamada, Denny Wu, Yao-Hung Hubert Tsai, Ichiro Takeuchi, Ruslan Salakhutdinov, Kenji Fukumizu
In this paper, we propose a post-selection inference (PSI) framework for divergence measures, which can select a set of statistically significant features that discriminate between two distributions.
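The screening half of the pipeline is straightforward; the sketch below scores each feature by an unbiased Gaussian-kernel MMD$^2$ between the two samples and keeps the top-$k$. The selective-inference correction for the chosen features, which is the paper's actual contribution, is omitted here, and all sizes are illustrative.

```python
import numpy as np

# Minimal feature screening sketch: rank features by per-feature MMD^2
# between two samples, then select the top k (PSI correction omitted).
rng = np.random.default_rng(0)
n, d, k = 100, 10, 3
X = rng.standard_normal((n, d))
Y = rng.standard_normal((n, d)); Y[:, :k] += 1.0   # first k features differ

def mmd2(x, y, h=1.0):
    def kern(a, b):
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * h))
    kxx, kyy, kxy = kern(x, x), kern(y, y), kern(x, y)
    m = len(x)
    # unbiased estimate: drop the diagonals of the within-sample blocks
    return (((kxx.sum() - np.trace(kxx)) + (kyy.sum() - np.trace(kyy)))
            / (m * (m - 1)) - 2 * kxy.mean())

scores = np.array([mmd2(X[:, j], Y[:, j]) for j in range(d)])
print(np.argsort(scores)[::-1][:k])                # recovers the shifted features
```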
no code implementations • 15 Feb 2018 • Yao-Hung Hubert Tsai, Makoto Yamada, Denny Wu, Ruslan Salakhutdinov, Ichiro Takeuchi, Kenji Fukumizu
"Which Generative Adversarial Networks (GANs) generates the most plausible images?"
no code implementations • 15 Feb 2018 • Denny Wu, Yixiu Zhao, Yao-Hung Hubert Tsai, Makoto Yamada, Ruslan Salakhutdinov
To address this issue, we propose to measure the dependency, rather than the mutual information (MI), between layers in DNNs.
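One standard kernel dependency measure for this purpose is HSIC; the sketch below computes a (biased) Gaussian-kernel HSIC estimate between two activation matrices. Treating HSIC as the dependency measure is an assumption made for illustration, and the toy activations are not from a trained network.

```python
import numpy as np

# Minimal HSIC sketch: a (biased) Gaussian-kernel estimate of the
# dependency between two batches of layer activations.
rng = np.random.default_rng(0)

def hsic(X, Y, h=1.0):
    def gram(A):
        sq = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * h))
    n = len(X)
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    return np.trace(gram(X) @ H @ gram(Y) @ H) / (n - 1) ** 2

Z = rng.standard_normal((200, 5))                  # stand-in layer activations
print(hsic(Z, np.tanh(Z)))                         # dependent layers: large value
print(hsic(Z, rng.standard_normal((200, 5))))      # independent: near zero
```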