no code implementations • 15 Feb 2024 • Zhichao Wang, Denny Wu, Zhou Fan
Many recent works have studied the eigenvalue spectrum of the Conjugate Kernel (CK) defined by the nonlinear feature map of a feedforward neural network.
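A minimal sketch of the object under study (not the papers' exact setting): the empirical eigenvalue spectrum of the Conjugate Kernel of a one-hidden-layer feature map with Gaussian data. The dimensions `n, d, N`, the data model, and the `tanh` activation are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: empirical eigenvalue spectrum of the Conjugate Kernel
# of a one-hidden-layer feature map; all sizes here are illustrative.
rng = np.random.default_rng(0)
n, d, N = 1000, 800, 1200                       # samples, input dim, width
X = rng.standard_normal((n, d)) / np.sqrt(d)    # normalized Gaussian inputs
W = rng.standard_normal((d, N))                 # random first-layer weights
F = np.tanh(X @ W)                              # nonlinear feature map
CK = F @ F.T / N                                # empirical Conjugate Kernel
eigs = np.linalg.eigvalsh(CK)                   # CK eigenvalue spectrum
print(eigs.min(), eigs.max())
```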
no code implementations • 12 Jun 2023 • Taiji Suzuki, Denny Wu, Atsushi Nitanda
Despite the generality of our results, we achieve an improved convergence rate in both the SGD and SVRG settings when specialized to the standard Langevin dynamics.
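For reference, a minimal stochastic gradient Langevin dynamics (SGLD) loop of the kind the "SGD setting" specializes to; the Gaussian-mean model, batch size, and step size below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Minimal SGLD sketch on a toy Gaussian-mean model: a minibatch gradient
# step plus Gaussian noise scaled by sqrt(2 * eta).
rng = np.random.default_rng(0)
data = rng.standard_normal(1000) + 2.0          # toy observations, true mean 2

def stoch_grad(theta, batch):
    # minibatch estimate of the full-data gradient of the negative
    # log-likelihood of N(theta, 1)
    return len(data) / len(batch) * np.sum(theta - batch)

eta, theta = 1e-4, 0.0
for _ in range(5000):
    batch = rng.choice(data, size=32)
    theta += -eta * stoch_grad(theta, batch) + np.sqrt(2 * eta) * rng.standard_normal()
print(theta)                                    # hovers near the posterior mean ~2
```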
no code implementations • 6 Mar 2023 • Atsushi Nitanda, Kazusato Oko, Denny Wu, Nobuhito Takenouchi, Taiji Suzuki
The entropic fictitious play (EFP) is a recently proposed algorithm that minimizes the sum of a convex functional and entropy in the space of measures -- such an objective naturally arises in the optimization of a two-layer neural network in the mean-field regime.
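A minimal particle reading of EFP, under the assumption that the convex functional is a mean-field regression loss: each outer step freezes the current measure, approximately samples the entropic best response with an inner Langevin loop, and mixes a fraction of the particles toward it. The model, step sizes, and loop lengths are illustrative, not the paper's exact scheme.

```python
import numpy as np

# Minimal EFP sketch for F(mu) = E[(f_mu(x) - y)^2] / 2 with
# f_mu(x) = E_{(a,w)~mu}[a * tanh(w x)]; all choices are illustrative.
rng = np.random.default_rng(0)
X = rng.standard_normal(200); Y = np.sin(X)      # toy 1-d regression data
lam, alpha, inner_eta = 0.05, 0.2, 0.01          # entropy weight, mixing, step

def predict(theta):                              # per-particle features a*tanh(wx)
    return theta[:, 0, None] * np.tanh(theta[:, 1, None] * X[None, :])

def fv_grad(theta, resid):
    # gradient in (a, w) of the first variation dF/dmu = E[(f_mu - y) a tanh(wx)]
    a, w = theta[:, 0, None], theta[:, 1, None]
    t = np.tanh(w * X[None, :])
    ga = (resid[None, :] * t).mean(axis=1)
    gw = (resid[None, :] * a * (1 - t ** 2) * X[None, :]).mean(axis=1)
    return np.stack([ga, gw], axis=1)

theta = rng.standard_normal((500, 2))            # particles representing mu
for _ in range(50):
    resid = predict(theta).mean(axis=0) - Y      # residual frozen at current mu
    nu = theta.copy()
    for _ in range(20):                          # sample nu ~ exp(-dF/dmu / lam)
        nu += (-inner_eta * fv_grad(nu, resid) / lam
               + np.sqrt(2 * inner_eta) * rng.standard_normal(nu.shape))
    swap = rng.random(len(theta)) < alpha        # mu <- (1 - alpha) mu + alpha nu
    theta[swap] = nu[swap]
print(np.mean((predict(theta).mean(axis=0) - Y) ** 2))
```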
no code implementations • 3 May 2022 • Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}^\top\boldsymbol{x})$, where $\boldsymbol{W}\in\mathbb{R}^{d\times N}, \boldsymbol{a}\in\mathbb{R}^{N}$ are randomly initialized, and the training objective is the empirical MSE loss: $\frac{1}{n}\sum_{i=1}^n (f(\boldsymbol{x}_i)-y_i)^2$.
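The studied object is easy to reproduce numerically. The sketch below takes exactly one full-batch gradient step on $\boldsymbol{W}$ with $\boldsymbol{a}$ frozen; the single-index target, width, and learning rate are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: one full-batch gradient step on W for
# f(x) = a^T sigma(W^T x) / sqrt(N), second layer a held fixed,
# under the empirical MSE loss. All sizes are illustrative.
rng = np.random.default_rng(0)
n, d, N, eta = 500, 100, 200, 1.0
X = rng.standard_normal((n, d)) / np.sqrt(d)     # rows are the inputs x_i
y = np.tanh(X @ rng.standard_normal(d))          # toy single-index targets
W = rng.standard_normal((d, N)) / np.sqrt(d)     # random initialization
a = rng.standard_normal(N)

pre = X @ W                                      # (n, N) preactivations
resid = np.tanh(pre) @ a / np.sqrt(N) - y        # f(x_i) - y_i
# gradient of (1/n) sum_i (f(x_i) - y_i)^2 with respect to W
grad_W = (2 / n) * X.T @ (resid[:, None] * (1 - np.tanh(pre) ** 2)
                          * a[None, :] / np.sqrt(N))
W1 = W - eta * grad_W                            # weights after the first step
s = np.linalg.svd(W1 - W, compute_uv=False)
print(s[:4])                                     # update concentrates on one direction
```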
no code implementations • 25 Jan 2022 • Atsushi Nitanda, Denny Wu, Taiji Suzuki
In this work, we give a concise and self-contained convergence rate analysis of the mean field Langevin dynamics with respect to the (regularized) objective function in both continuous and discrete time settings.
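A minimal discrete-time sketch of the dynamics being analyzed, in its canonical application to a two-layer network in the mean field regime: each particle is a neuron, and the update is a gradient step on the first variation plus entropic noise at temperature `lam`. The toy data, particle count, and step sizes are illustrative assumptions.

```python
import numpy as np

# Minimal mean field Langevin sketch: noisy gradient descent on a particle
# system (a_j, w_j) whose empirical measure tracks the regularized objective.
rng = np.random.default_rng(0)
X = rng.standard_normal(200); Y = np.sin(X)     # toy 1-d regression data
M, eta, lam = 500, 0.05, 0.01                   # particles, step size, temperature
theta = rng.standard_normal((M, 2))

for _ in range(500):
    a, w = theta[:, 0, None], theta[:, 1, None]
    t = np.tanh(w * X[None, :])
    resid = (a * t).mean(axis=0) - Y            # f_mu(x) - y
    ga = (resid[None, :] * t).mean(axis=1)
    gw = (resid[None, :] * a * (1 - t ** 2) * X[None, :]).mean(axis=1)
    grad = np.stack([ga, gw], axis=1)           # gradient of dF/dmu per particle
    theta += -eta * grad + np.sqrt(2 * eta * lam) * rng.standard_normal(theta.shape)
print(np.mean(resid ** 2))                      # training MSE of the particle network
```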
no code implementations • ICLR 2022 • Jimmy Ba, Murat A. Erdogdu, Marzyeh Ghassemi, Shengyang Sun, Taiji Suzuki, Denny Wu, Tianzong Zhang
Stein variational gradient descent (SVGD) is a deterministic inference algorithm that evolves a set of particles to fit a target distribution.
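For reference, a minimal SVGD implementation with an RBF kernel targeting a standard Gaussian; the bandwidth, step size, and target are illustrative choices rather than the paper's setup.

```python
import numpy as np

# Minimal SVGD sketch: particles follow the kernelized Stein update
# phi*(x_i) = mean_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ].
rng = np.random.default_rng(0)

def grad_log_p(x):                               # score of the target N(0, I)
    return -x

def svgd_step(x, eta=0.1, h=1.0):
    diff = x[:, None, :] - x[None, :, :]         # pairwise differences x_i - x_j
    K = np.exp(-(diff ** 2).sum(-1) / (2 * h))   # RBF kernel matrix
    gK = -diff / h * K[:, :, None]               # grad_{x_i} k(x_i, x_j)
    phi = (K @ grad_log_p(x) + gK.sum(axis=0)) / len(x)
    return x + eta * phi

x = rng.standard_normal((100, 2)) + 5.0          # particles start far away
for _ in range(500):
    x = svgd_step(x)
print(x.mean(axis=0), x.std(axis=0))             # approach N(0, I) statistics
```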
no code implementations • ICLR 2022 • Kazusato Oko, Taiji Suzuki, Atsushi Nitanda, Denny Wu
We introduce Particle-SDCA, a gradient-based optimization algorithm for two-layer neural networks in the mean field regime that achieves an exponential convergence rate in regularized empirical risk minimization.
no code implementations • NeurIPS 2021 • Atsushi Nitanda, Denny Wu, Taiji Suzuki
An important application of the proposed method is the optimization of neural networks in the mean field regime, which is theoretically attractive due to the presence of nonlinear feature learning, but for which quantitative convergence rates can be challenging to obtain.
no code implementations • ICLR 2021 • Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu
While second order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization has been called into question.
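As a baseline illustration of the update in question, the sketch below runs NGD on least squares, where the (damped) Fisher information equals $X^\top X / n$ and preconditions the gradient; the damping and step size are illustrative choices.

```python
import numpy as np

# Minimal NGD sketch on least squares: theta <- theta - eta * F^{-1} grad,
# with F the damped Fisher information of the Gaussian linear model.
rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

theta = np.zeros(d)
fisher = X.T @ X / n + 1e-3 * np.eye(d)          # damped Fisher information
for _ in range(50):
    grad = X.T @ (X @ theta - y) / n             # gradient of the MSE / 2
    theta -= 1.0 * np.linalg.solve(fisher, grad) # natural gradient step
print(np.mean((X @ theta - y) ** 2))
```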
no code implementations • NeurIPS 2020 • Denny Wu, Ji Xu
Finally, we determine the optimal weighting matrix $\mathbf{\Sigma}_w$ for both the ridgeless ($\lambda\to 0$) and optimally regularized ($\lambda = \lambda_{\rm opt}$) case, and demonstrate the advantage of the weighted objective over standard ridge regression and PCR.
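The weighted objective admits a closed form, $\hat{\boldsymbol{\beta}} = (\boldsymbol{X}^\top\boldsymbol{X} + \lambda\mathbf{\Sigma}_w)^{-1}\boldsymbol{X}^\top\boldsymbol{y}$. The sketch below compares it against standard ridge on toy data, with a diagonal $\mathbf{\Sigma}_w$ chosen purely for illustration rather than the paper's optimal weighting.

```python
import numpy as np

# Minimal sketch of weighted ridge regression with weighting matrix Sigma_w:
# beta_hat = (X^T X + lam * Sigma_w)^{-1} X^T y. All choices are illustrative.
rng = np.random.default_rng(0)
n, d, lam = 100, 50, 1.0
X = rng.standard_normal((n, d))
beta_star = rng.standard_normal(d)
y = X @ beta_star + rng.standard_normal(n)

Sigma_w = np.diag(rng.uniform(0.5, 2.0, d))       # example weighting matrix
beta_hat = np.linalg.solve(X.T @ X + lam * Sigma_w, X.T @ y)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)  # standard ridge
print(np.linalg.norm(beta_hat - beta_star),       # estimation errors of the two fits
      np.linalg.norm(beta_ridge - beta_star))
```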
no code implementations • ICLR 2020 • Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Denny Wu, Tianzong Zhang
This paper investigates the generalization properties of two-layer neural networks in high dimensions, i.e., when the number of samples $n$, features $d$, and neurons $h$ tend to infinity at the same rate.
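The proportional regime is easy to simulate: fix the ratios $n/d$ and $h/d$ and scale everything up. The sketch below uses random first-layer weights with a ridge-fitted second layer as an illustrative proxy for the models analyzed; the ratios, data model, and regularization are assumptions.

```python
import numpy as np

# Minimal proportional-regime simulation: n, d, h grow at fixed ratios
# psi1 = n/d and psi2 = h/d while we track the test error of a two-layer
# model with random first-layer weights and a ridge-fitted second layer.
rng = np.random.default_rng(0)

def test_error(scale, psi1=2.0, psi2=1.5, lam=1e-2):
    d = 50 * scale; n = int(psi1 * d); h = int(psi2 * d)
    beta = rng.standard_normal(d) / np.sqrt(d)           # toy linear teacher
    X, Xt = rng.standard_normal((n, d)), rng.standard_normal((2000, d))
    y, yt = X @ beta, Xt @ beta
    W = rng.standard_normal((d, h)) / np.sqrt(d)         # random first layer
    F, Ft = np.tanh(X @ W), np.tanh(Xt @ W)
    a = np.linalg.solve(F.T @ F + lam * np.eye(h), F.T @ y)
    return np.mean((Ft @ a - yt) ** 2)

for s in (1, 2, 4):                                      # errors stabilize with size
    print(test_error(s))
```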
no code implementations • AABI Symposium 2019 • Jimmy Ba, Murat A. Erdogdu, Marzyeh Ghassemi, Taiji Suzuki, Shengyang Sun, Denny Wu, Tianzong Zhang
Particle-based inference algorithms are a promising method for efficiently generating samples from an intractable target distribution by iteratively updating a set of particles.
no code implementations • NeurIPS 2019 • Xuechen Li, Denny Wu, Lester Mackey, Murat A. Erdogdu
In this paper, we establish the convergence rate of sampling algorithms obtained by discretizing smooth Itô diffusions exhibiting fast Wasserstein-$2$ contraction, based on local deviation properties of the integration scheme.
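A minimal Euler-Maruyama discretization of such a diffusion, $dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t$, using the Ornstein-Uhlenbeck drift (which exhibits Wasserstein-$2$ contraction) as an illustrative example:

```python
import numpy as np

# Minimal Euler-Maruyama sketch for an Ito diffusion with OU drift b(x) = -x
# and constant diffusion sqrt(2); stationary law is N(0, 1).
rng = np.random.default_rng(0)

def b(x):                                   # contractive OU drift
    return -x

def sigma(x):                               # constant diffusion coefficient
    return np.sqrt(2.0)

eta, steps = 0.01, 2000
x = rng.standard_normal(1000) + 10.0        # many chains, far initialization
for _ in range(steps):
    x = x + eta * b(x) + sigma(x) * np.sqrt(eta) * rng.standard_normal(x.size)
print(x.mean(), x.var())                    # approach the N(0, 1) stationary law
```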
no code implementations • 20 Jan 2019 • Denny Wu, Hirofumi Kobayashi, Charles Ding, Lei Cheng, Keisuke Goda, Marzyeh Ghassemi
A crucial challenge in image-based modeling of biomedical data is to identify trends and features that separate normality and pathology.
no code implementations • ICLR 2019 • Makoto Yamada, Denny Wu, Yao-Hung Hubert Tsai, Ichiro Takeuchi, Ruslan Salakhutdinov, Kenji Fukumizu
In this paper, we propose a post-selection inference (PSI) framework for divergence measures, which can select a set of statistically significant features that discriminate between two distributions.
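The screening half of the pipeline is straightforward; the sketch below scores each feature by an unbiased Gaussian-kernel MMD$^2$ between the two samples and keeps the top-$k$. The selective-inference correction for the chosen features, which is the paper's actual contribution, is omitted here, and all sizes are illustrative.

```python
import numpy as np

# Minimal feature screening sketch: rank features by per-feature MMD^2
# between two samples, then select the top k (PSI correction omitted).
rng = np.random.default_rng(0)
n, d, k = 100, 10, 3
X = rng.standard_normal((n, d))
Y = rng.standard_normal((n, d)); Y[:, :k] += 1.0   # first k features differ

def mmd2(x, y, h=1.0):
    def kern(a, b):
        return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * h))
    kxx, kyy, kxy = kern(x, x), kern(y, y), kern(x, y)
    m = len(x)
    # unbiased estimate: drop the diagonals of the within-sample blocks
    return (((kxx.sum() - np.trace(kxx)) + (kyy.sum() - np.trace(kyy)))
            / (m * (m - 1)) - 2 * kxy.mean())

scores = np.array([mmd2(X[:, j], Y[:, j]) for j in range(d)])
print(np.argsort(scores)[::-1][:k])                # recovers the shifted features
```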
no code implementations • 15 Feb 2018 • Yao-Hung Hubert Tsai, Makoto Yamada, Denny Wu, Ruslan Salakhutdinov, Ichiro Takeuchi, Kenji Fukumizu
"Which Generative Adversarial Networks (GANs) generates the most plausible images?"
no code implementations • 15 Feb 2018 • Denny Wu, Yixiu Zhao, Yao-Hung Hubert Tsai, Makoto Yamada, Ruslan Salakhutdinov
To address this issue, we propose to measure the dependency, rather than the mutual information (MI), between layers in DNNs.
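One standard kernel dependency measure for this purpose is HSIC; the sketch below computes a (biased) Gaussian-kernel HSIC estimate between two activation matrices. Treating HSIC as the dependency measure is an assumption made for illustration, and the toy activations are not from a trained network.

```python
import numpy as np

# Minimal HSIC sketch: a (biased) Gaussian-kernel estimate of the
# dependency between two batches of layer activations.
rng = np.random.default_rng(0)

def hsic(X, Y, h=1.0):
    def gram(A):
        sq = ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * h))
    n = len(X)
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    return np.trace(gram(X) @ H @ gram(Y) @ H) / (n - 1) ** 2

Z = rng.standard_normal((200, 5))                  # stand-in layer activations
print(hsic(Z, np.tanh(Z)))                         # dependent layers: large value
print(hsic(Z, rng.standard_normal((200, 5))))      # independent: near zero
```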