Search Results for author: Denny Wu

Found 18 papers, 0 papers with code

Nonlinear spiked covariance matrices and signal propagation in deep neural networks

no code implementations 15 Feb 2024 Zhichao Wang, Denny Wu, Zhou Fan

Many recent works have studied the eigenvalue spectrum of the Conjugate Kernel (CK) defined by the nonlinear feature map of a feedforward neural network.
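
As a loose illustration of the object being studied (not code from the paper), the sketch below draws a random one-layer feature map and computes the empirical eigenvalues of its Conjugate Kernel; the Gaussian data, the ReLU nonlinearity, and all dimensions are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, N = 500, 300, 400                        # samples, input dim, width (arbitrary)

X = rng.standard_normal((n, d)) / np.sqrt(d)   # isotropic Gaussian inputs
W = rng.standard_normal((d, N))                # random first-layer weights

Phi = np.maximum(X @ W, 0.0)                   # feature map sigma(XW), sigma = ReLU
K = Phi @ Phi.T / N                            # Conjugate Kernel Gram matrix (n x n)

eigs = np.linalg.eigvalsh(K)                   # empirical CK spectrum
print(eigs[-5:])                               # a few of the largest eigenvalues
```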

Representation Learning

Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction

no code implementations 12 Jun 2023 Taiji Suzuki, Denny Wu, Atsushi Nitanda

Despite the generality of our results, we achieve an improved convergence rate in both the SGD and SVRG settings when specialized to the standard Langevin dynamics.

Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum Problems

no code implementations 6 Mar 2023 Atsushi Nitanda, Kazusato Oko, Denny Wu, Nobuhito Takenouchi, Taiji Suzuki

The entropic fictitious play (EFP) is a recently proposed algorithm that minimizes the sum of a convex functional and entropy in the space of measures -- such an objective naturally arises in the optimization of a two-layer neural network in the mean-field regime.

Image Generation

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation

no code implementations 3 May 2022 Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang

We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}^\top\boldsymbol{x})$, where $\boldsymbol{W}\in\mathbb{R}^{d\times N}, \boldsymbol{a}\in\mathbb{R}^{N}$ are randomly initialized, and the training objective is the empirical MSE loss: $\frac{1}{n}\sum_{i=1}^n (f(\boldsymbol{x}_i)-y_i)^2$.
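
As a hypothetical sketch of this setup (the dimensions, tanh nonlinearity, targets, and learning rate below are placeholders, not the paper's choices), one gradient descent step on $\boldsymbol{W}$ under the stated MSE objective looks like:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, N = 200, 100, 150          # samples, input dim, width (placeholders)
lr = 1.0                         # step size (placeholder)

X = rng.standard_normal((n, d))
y = rng.standard_normal(n)       # stand-in targets
W = rng.standard_normal((d, N)) / np.sqrt(d)   # random initialization
a = rng.standard_normal(N)

# f(x) = (1/sqrt(N)) a^T sigma(W^T x), here with sigma = tanh
def f(X):
    return np.tanh(X @ W) @ a / np.sqrt(N)

# Gradient of the empirical MSE (1/n) sum_i (f(x_i) - y_i)^2 w.r.t. W
resid = f(X) - y                               # (n,)
act_grad = 1.0 - np.tanh(X @ W) ** 2           # sigma'(W^T x_i), shape (n, N)
grad_W = (2.0 / n) * X.T @ (resid[:, None] * act_grad * a / np.sqrt(N))

W = W - lr * grad_W                            # the single gradient step on W
```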

Convex Analysis of the Mean Field Langevin Dynamics

no code implementations 25 Jan 2022 Atsushi Nitanda, Denny Wu, Taiji Suzuki

In this work, we give a concise and self-contained convergence rate analysis of the mean field Langevin dynamics with respect to the (regularized) objective function in both continuous and discrete time settings.
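
Below is a minimal particle discretization of mean-field Langevin dynamics on a toy regression objective with an ℓ2 term; the entropy weight, step size, and particle count are arbitrary, and this sketch is an illustration rather than the paper's analysis. Each particle follows the gradient of the first variation of the loss, and the entropy regularizer shows up as injected Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, M = 100, 5, 1000            # data size, input dim, number of particles
lam, eta, steps = 0.01, 0.05, 500 # entropy weight, step size, iterations

X = rng.standard_normal((n, d))
y = np.tanh(X @ rng.standard_normal(d))        # toy regression targets

Theta = rng.standard_normal((M, d))            # particles = neurons

for _ in range(steps):
    Phi = np.tanh(X @ Theta.T)                 # (n, M) per-neuron outputs
    pred = Phi.mean(axis=1)                    # mean-field predictor
    resid = pred - y
    # First-variation gradient at each particle, plus an l2 term;
    # the entropy term becomes the Gaussian noise in the update.
    grad = ((resid[:, None] * (1.0 - Phi**2)).T @ X) / n
    noise = rng.standard_normal((M, d))
    Theta += -eta * (grad + lam * Theta) + np.sqrt(2.0 * eta * lam) * noise
```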

Particle Stochastic Dual Coordinate Ascent: Exponential convergent algorithm for mean field neural network optimization

no code implementations ICLR 2022 Kazusato Oko, Taiji Suzuki, Atsushi Nitanda, Denny Wu

We introduce Particle-SDCA, a gradient-based optimization algorithm for two-layer neural networks in the mean-field regime that achieves an exponential convergence rate in regularized empirical risk minimization.

Particle Dual Averaging: Optimization of Mean Field Neural Networks with Global Convergence Rate Analysis

no code implementations NeurIPS 2021 Atsushi Nitanda, Denny Wu, Taiji Suzuki

An important application of the proposed method is the optimization of neural networks in the mean-field regime, which is theoretically attractive due to the presence of nonlinear feature learning, but for which a quantitative convergence rate can be challenging to obtain.

When Does Preconditioning Help or Hurt Generalization?

no code implementations ICLR 2021 Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu

While second order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization has been called into question.
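
As a toy contrast (not the paper's setting), the snippet below compares plain and preconditioned gradient descent on an ill-conditioned quadratic; the matrices and step sizes are made up for illustration, with the inverse Hessian standing in for an idealized preconditioner.

```python
import numpy as np

# Quadratic objective 0.5 * w^T A w - b^T w with an ill-conditioned A.
A = np.diag([100.0, 1.0])
b = np.array([1.0, 1.0])
P = np.linalg.inv(A)                    # idealized preconditioner (Newton direction)

def run(precond, lr, steps=100):
    w = np.zeros(2)
    for _ in range(steps):
        g = A @ w - b                   # exact gradient
        w -= lr * (precond @ g)
    return w

print(np.linalg.solve(A, b))            # optimum: [0.01, 1.0]
print(run(np.eye(2), lr=0.009))         # plain GD: slow along the flat direction
print(run(P, lr=1.0, steps=1))          # preconditioned: optimum in one step
```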

Regression, Second-order Methods

On the Optimal Weighted $\ell_2$ Regularization in Overparameterized Linear Regression

no code implementations NeurIPS 2020 Denny Wu, Ji Xu

Finally, we determine the optimal weighting matrix $\mathbf{\Sigma}_w$ for both the ridgeless ($\lambda\to 0$) and optimally regularized ($\lambda = \lambda_{\rm opt}$) case, and demonstrate the advantage of the weighted objective over standard ridge regression and PCR.
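
For reference, here is a hypothetical sketch of solving the weighted-ℓ2-regularized least squares problem in closed form; the data and the diagonal $\mathbf{\Sigma}_w$ below are placeholders rather than the paper's optimal weighting.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 200, 0.1                     # overparameterized: d > n

X = rng.standard_normal((n, d))
y = rng.standard_normal(n)                   # stand-in responses
Sigma_w = np.diag(rng.uniform(0.5, 2.0, d))  # placeholder weighting matrix

# Weighted ridge: argmin_w ||X w - y||^2 + lam * w^T Sigma_w w
w_hat = np.linalg.solve(X.T @ X + lam * Sigma_w, X.T @ y)
```

In the ridgeless limit $\lambda \to 0$, this solution tends to the minimum-$\mathbf{\Sigma}_w$-norm interpolator of the training data.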

Regression

Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint

no code implementations ICLR 2020 Jimmy Ba, Murat Erdogdu, Taiji Suzuki, Denny Wu, Tianzong Zhang

This paper investigates the generalization properties of two-layer neural networks in high dimensions, i.e., when the number of samples $n$, features $d$, and neurons $h$ tend to infinity at the same rate.

Inductive Bias, Vocal Bursts Valence Prediction

Towards Characterizing the High-dimensional Bias of Kernel-based Particle Inference Algorithms

no code implementations AABI Symposium 2019 Jimmy Ba, Murat A. Erdogdu, Marzyeh Ghassemi, Taiji Suzuki, Shengyang Sun, Denny Wu, Tianzong Zhang

Particle-based inference algorithms are a promising approach to efficiently generating samples from an intractable target distribution by iteratively updating a set of particles.
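
As one concrete member of this family (our choice for illustration; the paper treats kernel-based particle methods more generally), here is a minimal Stein variational gradient descent (SVGD) loop with an RBF kernel and a standard Gaussian target; the bandwidth, step size, and particle count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
M, d, h, eta, steps = 100, 2, 1.0, 0.1, 300   # particles, dim, bandwidth, step size

X = rng.standard_normal((M, d)) + 5.0         # particles initialized off-target

for _ in range(steps):
    S = -X                                    # score of the standard Gaussian target
    diff = X[:, None, :] - X[None, :, :]      # diff[i, j] = x_i - x_j
    K = np.exp(-(diff**2).sum(-1) / (2 * h**2))          # RBF kernel matrix
    # SVGD direction: kernel-weighted drift toward high density plus repulsion
    phi = (K @ S + (diff * K[:, :, None]).sum(axis=1) / h**2) / M
    X += eta * phi

print(X.mean(axis=0), X.std(axis=0))          # should drift toward (0, 1)
```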

LEMMA

Stochastic Runge-Kutta Accelerates Langevin Monte Carlo and Beyond

no code implementations NeurIPS 2019 Xuechen Li, Denny Wu, Lester Mackey, Murat A. Erdogdu

In this paper, we establish the convergence rate of sampling algorithms obtained by discretizing smooth Itô diffusions exhibiting fast Wasserstein-$2$ contraction, based on local deviation properties of the integration scheme.
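
For context, the simplest such discretization is the Euler–Maruyama scheme for overdamped Langevin dynamics; the sketch below uses a Gaussian target and an arbitrary step size, and is the baseline scheme rather than the paper's stochastic Runge–Kutta method.

```python
import numpy as np

rng = np.random.default_rng(0)
eta, steps = 0.01, 5000                        # step size and chain length

def grad_U(x):
    return x                                   # U(x) = ||x||^2 / 2, i.e. Gaussian target

x = np.full(2, 5.0)                            # start far from the target
samples = []
for _ in range(steps):
    x = x - eta * grad_U(x) + np.sqrt(2.0 * eta) * rng.standard_normal(2)
    samples.append(x.copy())

samples = np.array(samples[1000:])             # discard burn-in
print(samples.mean(axis=0), samples.std(axis=0))  # near (0, 1) up to discretization bias
```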

Numerical Integration

Modeling the Biological Pathology Continuum with HSIC-regularized Wasserstein Auto-encoders

no code implementations 20 Jan 2019 Denny Wu, Hirofumi Kobayashi, Charles Ding, Lei Cheng, Keisuke Goda, Marzyeh Ghassemi

A crucial challenge in image-based modeling of biomedical data is to identify trends and features that separate normality and pathology.

Post Selection Inference with Incomplete Maximum Mean Discrepancy Estimator

no code implementations ICLR 2019 Makoto Yamada, Denny Wu, Yao-Hung Hubert Tsai, Ichiro Takeuchi, Ruslan Salakhutdinov, Kenji Fukumizu

In this paper, we propose a post selection inference (PSI) framework for divergence measures, which can select a set of statistically significant features that discriminate two distributions.
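
A rough sketch of an incomplete (pair-subsampled) estimate of MMD$^2$ with an RBF kernel is shown below; the sample sizes, bandwidth, and pair budget are placeholders, and the selection-inference machinery of the paper is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, B, h = 500, 10, 2000, 1.0               # sample size, dim, pair budget, bandwidth

X = rng.standard_normal((n, d))               # samples from P
Y = rng.standard_normal((n, d)) + 0.5         # samples from Q (mean-shifted)

def k(a, b):
    return np.exp(-((a - b)**2).sum(-1) / (2 * h**2))    # RBF kernel

# Incomplete estimate of MMD^2 = E k(x,x') + E k(y,y') - 2 E k(x,y):
# average the usual h-statistic over B random pairs instead of all n(n-1).
i = rng.integers(0, n, B)
j = rng.integers(0, n, B)
mask = i != j                                 # drop degenerate diagonal pairs
i, j = i[mask], j[mask]
mmd2 = (k(X[i], X[j]) + k(Y[i], Y[j]) - k(X[i], Y[j]) - k(X[j], Y[i])).mean()
print(mmd2)
```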

Binary Classification, Change Point Detection, +1
