Search Results for author: Liu Ziyin

Found 29 papers, 8 papers with code

Loss Symmetry and Noise Equilibrium of Stochastic Gradient Descent

no code implementations11 Feb 2024 Liu Ziyin, Mingze Wang, Hongchao Li, Lei Wu

Symmetries exist abundantly in the loss function of neural networks.

Three Mechanisms of Feature Learning in the Exact Solution of a Latent Variable Model

no code implementations13 Jan 2024 Yizhou Xu, Liu Ziyin

We identify and exactly solve the learning dynamics of a one-hidden-layer linear model at any finite width whose limits exhibit both the kernel phase and the feature learning phase.

Symmetry Induces Structure and Constraint of Learning

no code implementations29 Sep 2023 Liu Ziyin

Due to common architecture designs, symmetries exist extensively in contemporary neural networks.

Law of Balance and Stationary Distribution of Stochastic Gradient Descent

no code implementations13 Aug 2023 Liu Ziyin, Hongchao Li, Masahito Ueda

The stochastic gradient descent (SGD) algorithm is the workhorse of neural network training.

On the Stepwise Nature of Self-Supervised Learning

1 code implementation27 Mar 2023 James B. Simon, Maksis Knutins, Liu Ziyin, Daniel Geisz, Abraham J. Fetterman, Joshua Albrecht

We present a simple picture of the training process of joint embedding self-supervised learning methods.

Self-Supervised Learning

The Probabilistic Stability of Stochastic Gradient Descent

no code implementations23 Mar 2023 Liu Ziyin, Botao Li, Tomer Galanti, Masahito Ueda

Characterizing and understanding the stability of Stochastic Gradient Descent (SGD) remains an open problem in deep learning.

Learning Theory

spred: Solving $L_1$ Penalty with SGD

2 code implementations3 Oct 2022 Liu Ziyin, ZiHao Wang

We propose to minimize a generic differentiable objective with $L_1$ constraint using a simple reparametrization and straightforward stochastic gradient descent.

Inductive Bias Neural Network Compression
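The reparametrization behind spred can be illustrated in the scalar case: writing $w = u \cdot v$, an ordinary $L_2$ (weight-decay) penalty on $u$ and $v$ induces an $L_1$ penalty on $w$, since $\min_{uv=w} (u^2 + v^2)/2 = |w|$. A small numerical check of this identity (function name is mine, not from the paper):

```python
import numpy as np

def min_weight_decay(w, ts=np.linspace(0.01, 10, 200001)):
    """Minimum of the weight-decay penalty (u^2 + v^2)/2 over factorizations
    w = u * v of a scalar w, parametrized as u = t, v = w / t.
    The exact minimum is |w|, attained at t = sqrt(|w|)."""
    return np.min((ts ** 2 + (w / ts) ** 2) / 2)

for w in [0.5, -2.0, 3.0]:
    print(f"w = {w:+.1f}: min penalty = {min_weight_decay(w):.4f}, |w| = {abs(w)}")
```

This is why plain SGD with weight decay on the redundant parameters can solve an $L_1$-constrained problem without any proximal machinery.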

What shapes the loss landscape of self-supervised learning?

no code implementations2 Oct 2022 Liu Ziyin, Ekdeep Singh Lubana, Masahito Ueda, Hidenori Tanaka

Prevention of complete and dimensional collapse of representations has recently become a design principle for self-supervised learning (SSL).

Self-Supervised Learning

Exact Phase Transitions in Deep Learning

no code implementations25 May 2022 Liu Ziyin, Masahito Ueda

This work reports deep-learning-unique first-order and second-order phase transitions, whose phenomenology closely follows that in statistical physics.

Posterior Collapse of a Linear Latent Variable Model

no code implementations9 May 2022 ZiHao Wang, Liu Ziyin

This work identifies the existence and cause of a type of posterior collapse that frequently occurs in the Bayesian deep learning practice.

Exact Solutions of a Deep Linear Network

no code implementations10 Feb 2022 Liu Ziyin, Botao Li, Xiangming Meng

This work finds the analytical expression of the global minima of a deep linear network with weight decay and stochastic neurons, a fundamental model for understanding the landscape of neural networks.

Stochastic Neural Networks with Infinite Width are Deterministic

no code implementations30 Jan 2022 Liu Ziyin, Hanlin Zhang, Xiangming Meng, Yuting Lu, Eric Xing, Masahito Ueda

This work theoretically studies stochastic neural networks, a major class of neural networks in practical use.

Logarithmic landscape and power-law escape rate of SGD

no code implementations29 Sep 2021 Takashi Mori, Liu Ziyin, Kangqiao Liu, Masahito Ueda

Stochastic gradient descent (SGD) undergoes complicated multiplicative noise for the mean-square loss.

SGD Can Converge to Local Maxima

no code implementations ICLR 2022 Liu Ziyin, Botao Li, James B Simon, Masahito Ueda

Stochastic gradient descent (SGD) is widely used for the nonlinear, nonconvex problem of training deep neural networks, but its behavior remains poorly understood.

SGD with a Constant Large Learning Rate Can Converge to Local Maxima

no code implementations25 Jul 2021 Liu Ziyin, Botao Li, James B. Simon, Masahito Ueda

Previous works on stochastic gradient descent (SGD) often focus on its success.

Theoretically Motivated Data Augmentation and Regularization for Portfolio Construction

1 code implementation8 Jun 2021 Liu Ziyin, Kentaro Minami, Kentaro Imajo

The task we consider is portfolio construction in a speculative market, a fundamental problem in modern finance.

Data Augmentation

Power-law escape rate of SGD

no code implementations20 May 2021 Takashi Mori, Liu Ziyin, Kangqiao Liu, Masahito Ueda

Stochastic gradient descent (SGD) undergoes complicated multiplicative noise for the mean-square loss.

On the Distributional Properties of Adaptive Gradients

no code implementations15 May 2021 Zhang Zhiyi, Liu Ziyin

Adaptive gradient methods have achieved remarkable success in training deep neural networks on a wide variety of tasks.

Strength of Minibatch Noise in SGD

no code implementations ICLR 2022 Liu Ziyin, Kangqiao Liu, Takashi Mori, Masahito Ueda

The noise in stochastic gradient descent (SGD), caused by minibatch sampling, is poorly understood despite its practical importance in deep learning.

Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent

no code implementations7 Dec 2020 Kangqiao Liu, Liu Ziyin, Masahito Ueda

In the vanishing learning rate regime, stochastic gradient descent (SGD) is now relatively well understood.

Bayesian Inference Second-order methods

Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment

1 code implementation4 Dec 2020 Paul Pu Liang, Peter Wu, Liu Ziyin, Louis-Philippe Morency, Ruslan Salakhutdinov

In this work, we propose algorithms for cross-modal generalization: a learning paradigm to train a model that can (1) quickly perform new tasks in a target modality (i.e. meta-learning) and (2) do so while being trained on a different source modality.

Meta-Learning

An Investigation of how Label Smoothing Affects Generalization

no code implementations23 Oct 2020 Blair Chen, Liu Ziyin, ZiHao Wang, Paul Pu Liang

In this paper, as a step towards understanding why label smoothing is effective, we propose a theoretical framework that shows the role label smoothing plays in controlling the generalization loss.

Neural Networks Fail to Learn Periodic Functions and How to Fix It

3 code implementations NeurIPS 2020 Liu Ziyin, Tilman Hartwig, Masahito Ueda

Previous literature offers limited clues on how to learn a periodic function using modern neural networks.

Inductive Bias
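The fix proposed in this paper is, to my understanding, the Snake activation, $x + \frac{1}{a}\sin^2(ax)$: the linear term preserves trend extrapolation while the periodic term lets the network represent periodicity. A minimal sketch:

```python
import numpy as np

def snake(x, a=1.0):
    # Snake activation: x + (1/a) * sin^2(a * x).
    # Note the useful shift property: snake(x + pi/a) = snake(x) + pi/a,
    # i.e. the nonlinearity is periodic around a linear trend.
    return x + np.sin(a * x) ** 2 / a
```

Replacing ReLU or tanh with `snake` in a standard feed-forward network is the drop-in change the paper studies for learning and extrapolating periodic signals.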

Volumization as a Natural Generalization of Weight Decay

no code implementations25 Mar 2020 Liu Ziyin, ZiHao Wang, Makoto Yamada, Masahito Ueda

We propose a novel regularization method, called \textit{volumization}, for neural networks.

Memorization

Learning Not to Learn in the Presence of Noisy Labels

no code implementations16 Feb 2020 Liu Ziyin, Blair Chen, Ru Wang, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency, Masahito Ueda

Learning in the presence of label noise is a challenging yet important task: it is crucial to design models that are robust in the presence of mislabeled datasets.

Memorization text-classification +1

LaProp: Separating Momentum and Adaptivity in Adam

1 code implementation12 Feb 2020 Liu Ziyin, Zhikang T. Wang, Masahito Ueda

We also bound the regret of LaProp on a convex problem and show that our bound differs from that of Adam by a key factor, which demonstrates its advantage.

Style Transfer
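As I read the title, LaProp's change relative to Adam is the order of operations: the adaptive (RMS) normalization is applied to the gradient *before* momentum is accumulated, rather than after. A simplified single-parameter sketch, with bias corrections omitted and names of my choosing:

```python
import numpy as np

def laprop_step(theta, grad, m, n, lr=1e-3, b1=0.9, b2=0.999, eps=1e-15):
    # Second-moment accumulator, exactly as in Adam / RMSProp.
    n = b2 * n + (1 - b2) * grad ** 2
    # Momentum of the NORMALIZED gradient -- this ordering separates
    # adaptivity from momentum; Adam instead accumulates momentum of the
    # raw gradient and divides by sqrt(n) afterwards.
    m = b1 * m + (1 - b1) * grad / (np.sqrt(n) + eps)
    return theta - lr * m, m, n

# Toy usage: minimize f(x) = x^2 starting from x = 1.
theta, m, n = 1.0, 0.0, 0.0
for _ in range(20):
    theta, m, n = laprop_step(theta, 2 * theta, m, n)
```

One consequence of this ordering is that each momentum contribution is already bounded by $1/\sqrt{1 - \beta_2}$, which decouples the effective step size from the raw gradient scale.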

Think Locally, Act Globally: Federated Learning with Local and Global Representations

4 code implementations6 Jan 2020 Paul Pu Liang, Terrance Liu, Liu Ziyin, Nicholas B. Allen, Randy P. Auerbach, David Brent, Ruslan Salakhutdinov, Louis-Philippe Morency

To this end, we propose a new federated learning algorithm that jointly learns compact local representations on each device and a global model across all devices.

Federated Learning Representation Learning +2
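The split described in the abstract can be sketched as follows: each device keeps a private local encoder and a copy of the shared global head, and only the heads are averaged server-side, FedAvg-style. This is an illustrative stand-in (the "gradient" below is purely synthetic, not the paper's objective):

```python
import numpy as np

def local_update(encoder, head, lr=0.1):
    # Stand-in for local SGD on this device's private data; here a single
    # synthetic gradient step on the shared head. The encoder never leaves
    # the device.
    grad = head - encoder.mean()          # hypothetical local gradient
    return encoder, head - lr * grad

def server_round(head, device_encoders):
    # Each device refines its copy of the shared head against its private
    # encoder; the server averages only the heads (the global part).
    new_heads = [local_update(enc, head.copy())[1] for enc in device_encoders]
    return np.mean(new_heads, axis=0)
```

The design point is communication and privacy: only the compact global head crosses the network, while the representation model stays local.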

Deep Gamblers: Learning to Abstain with Portfolio Theory

3 code implementations NeurIPS 2019 Liu Ziyin, Zhikang Wang, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency, Masahito Ueda

We deal with the \textit{selective classification} problem (supervised-learning problem with a rejection option), where we want to achieve the best performance at a certain level of coverage of the data.

Classification General Classification
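The portfolio-theoretic abstention mechanism can be sketched as an extra $(m{+}1)$-th output trained with a gambler's (doubling-rate) loss; this is my reading of the setup, with payoff `o > 1` as a hyperparameter:

```python
import numpy as np

def gambler_loss(logits, y, o=2.5):
    """Selective-classification loss with an abstention output.
    logits: (batch, m+1); column m is the abstention ("reservation") score.
    o > 1 is the payoff: larger o makes abstaining less attractive."""
    z = logits - logits.max(axis=1, keepdims=True)   # numerically stable softmax
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    p_true = p[np.arange(len(y)), y]
    p_abstain = p[:, -1]
    return -np.log(p_true + p_abstain / o).mean()
```

Either a confident correct prediction or a large abstention mass lowers the loss, so the network learns end-to-end when to reject an input rather than relying on a post-hoc confidence threshold.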
