no code implementations • 24 Jan 2024 • Lennart Dabelow, Masahito Ueda
Machine-learning methods are gradually being adopted in a great variety of social, economic, and scientific contexts, yet they are notorious for struggling with exact mathematics.
no code implementations • 13 Aug 2023 • Liu Ziyin, Hongchao Li, Masahito Ueda
The stochastic gradient descent (SGD) algorithm is the standard algorithm for training neural networks.
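As a point of reference, here is a minimal sketch of the plain SGD update on minibatches (an illustrative toy with a least-squares linear model; the names, data, and hyperparameters are assumptions, not taken from the paper):

```python
import numpy as np

def sgd_step(w, X_batch, y_batch, lr=0.1):
    """One SGD step on the loss 0.5 * mean((X w - y)^2)."""
    grad = X_batch.T @ (X_batch @ w - y_batch) / len(y_batch)
    return w - lr * grad

# toy data: noisy linear regression
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=1000)

w = np.zeros(5)
for epoch in range(20):
    idx = rng.permutation(len(y))
    for start in range(0, len(y), 32):          # minibatches of size 32
        batch = idx[start:start + 32]
        w = sgd_step(w, X[batch], y[batch])
print("parameter error:", np.linalg.norm(w - w_true))
```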
no code implementations • 23 Mar 2023 • Liu Ziyin, Botao Li, Tomer Galanti, Masahito Ueda
Characterizing and understanding the stability of Stochastic Gradient Descent (SGD) remains an open problem in deep learning.
no code implementations • 2 Oct 2022 • Liu Ziyin, Ekdeep Singh Lubana, Masahito Ueda, Hidenori Tanaka
Prevention of complete and dimensional collapse of representations has recently become a design principle for self-supervised learning (SSL).
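For concreteness, one common way to diagnose dimensional collapse (an assumed illustration, not the paper's method) is to inspect the eigenvalue spectrum of the covariance of the learned representations, e.g. via an entropy-based effective rank:

```python
import numpy as np

def effective_rank(Z, eps=1e-12):
    """Entropy-based effective rank of the covariance of embeddings Z (n_samples, dim).

    Dimensional collapse shows up as a spectrum dominated by a few directions,
    i.e. a small effective rank; complete collapse drives it toward zero dimensions.
    """
    Zc = Z - Z.mean(axis=0, keepdims=True)
    cov = Zc.T @ Zc / len(Zc)
    eig = np.linalg.eigvalsh(cov)
    p = np.clip(eig, eps, None)
    p = p / p.sum()
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
healthy = rng.normal(size=(2048, 128))                               # full-dimensional embeddings
collapsed = rng.normal(size=(2048, 3)) @ rng.normal(size=(3, 128))   # rank-3 embeddings
print(effective_rank(healthy), effective_rank(collapsed))            # roughly 128 vs roughly 3
```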
1 code implementation • 2 Sep 2022 • Lennart Dabelow, Masahito Ueda
Restricted Boltzmann Machines (RBMs) offer a versatile architecture for unsupervised machine learning that can in principle approximate any target probability distribution with arbitrary accuracy.
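A minimal Bernoulli-Bernoulli RBM trained with one step of contrastive divergence (CD-1), included only to make the architecture concrete; the hyperparameters and placeholder data are assumptions, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

class BinaryRBM:
    """Bernoulli-Bernoulli RBM with energy E(v, h) = -a.v - b.h - v^T W h."""
    def __init__(self, n_visible, n_hidden):
        self.W = 0.01 * rng.normal(size=(n_visible, n_hidden))
        self.a = np.zeros(n_visible)   # visible bias
        self.b = np.zeros(n_hidden)    # hidden bias

    def cd1_step(self, v0, lr=0.05):
        """One contrastive-divergence (CD-1) update on a batch of visible vectors."""
        ph0 = sigmoid(v0 @ self.W + self.b)           # P(h = 1 | v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        pv1 = sigmoid(h0 @ self.W.T + self.a)         # reconstruction P(v = 1 | h0)
        v1 = (rng.random(pv1.shape) < pv1).astype(float)
        ph1 = sigmoid(v1 @ self.W + self.b)
        n = len(v0)
        self.W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
        self.a += lr * (v0 - v1).mean(axis=0)
        self.b += lr * (ph0 - ph1).mean(axis=0)

rbm = BinaryRBM(n_visible=16, n_hidden=8)
data = (rng.random((256, 16)) < 0.5).astype(float)    # placeholder binary data
for _ in range(100):
    rbm.cd1_step(data)
```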
no code implementations • 25 May 2022 • Liu Ziyin, Masahito Ueda
This work reports deep-learning-unique first-order and second-order phase transitions, whose phenomenology closely follows that in statistical physics.
no code implementations • 30 Jan 2022 • Liu Ziyin, Hanlin Zhang, Xiangming Meng, Yuting Lu, Eric Xing, Masahito Ueda
This work presents a theoretical study of stochastic neural networks, a widely used class of neural networks.
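For context, a typical instance of a stochastic neural network is one whose forward pass injects noise (dropout being another common example); the toy layer below is an assumed illustration, not the specific model analyzed in the paper:

```python
import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    """A layer whose pre-activation receives Gaussian noise, one common way
    to make a network 'stochastic' during training."""
    def __init__(self, d_in, d_out, noise_std=0.1):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.noise_std = noise_std

    def forward(self, x):
        pre = self.linear(x)
        if self.training:                          # stochastic only in train mode
            pre = pre + self.noise_std * torch.randn_like(pre)
        return torch.relu(pre)

net = nn.Sequential(NoisyLinear(10, 64), nn.Linear(64, 1))
x = torch.randn(4, 10)
print(net(x).shape)   # torch.Size([4, 1]); outputs vary across calls in train mode
```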
no code implementations • 28 Jan 2022 • Takashi Mori, Masahito Ueda
It has been recognized that heavily overparameterized deep neural networks (DNNs) exhibit surprisingly good generalization performance in various machine-learning tasks.
no code implementations • ICLR 2022 • Liu Ziyin, Botao Li, James B Simon, Masahito Ueda
Stochastic gradient descent (SGD) is widely used for the nonlinear, nonconvex problem of training deep neural networks, but its behavior remains poorly understood.
no code implementations • 29 Sep 2021 • Takashi Mori, Liu Ziyin, Kangqiao Liu, Masahito Ueda
Stochastic gradient descent (SGD) is subject to complicated multiplicative noise for the mean-square loss.
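A standard back-of-the-envelope calculation (not quoted from the paper) shows where the multiplicative character comes from for the mean-square loss of a linear model:

```latex
\ell(w; x, y) = \tfrac{1}{2}\,(w^\top x - y)^2,
\qquad
\nabla_w \ell = (w^\top x - y)\, x,
\qquad
\hat g_B(w) = \frac{1}{|B|} \sum_{i \in B} (w^\top x_i - y_i)\, x_i .
```

The minibatch-noise term $\hat g_B(w) - \nabla L(w)$ inherits the residual factor $(w^\top x_i - y_i)$ and therefore depends on the parameters themselves; its covariance scales with $w$ rather than being an additive constant, which is what makes the noise multiplicative.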
no code implementations • ICLR 2022 • Zhikang T. Wang, Masahito Ueda
Despite the empirical success of the deep Q network (DQN) reinforcement learning algorithm and its variants, DQN remains poorly understood and lacks convergence guarantees.
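For reference, here is the bootstrapped DQN regression target whose convergence properties are at issue (a textbook sketch, not the fix proposed in the paper):

```python
import torch

def dqn_targets(q_target_net, rewards, next_states, dones, gamma=0.99):
    """Standard DQN targets: y = r + gamma * max_a' Q_target(s', a') for
    non-terminal transitions; the max over a bootstrapped target network is
    the usual source of instability."""
    with torch.no_grad():
        next_q = q_target_net(next_states).max(dim=1).values
    return rewards + gamma * (1.0 - dones) * next_q

# toy usage with a linear Q-network: 4-dimensional states, 2 actions
q_target_net = torch.nn.Linear(4, 2)
y = dqn_targets(q_target_net,
                rewards=torch.zeros(8),
                next_states=torch.randn(8, 4),
                dones=torch.zeros(8))
print(y.shape)  # torch.Size([8])
```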
no code implementations • 25 Jul 2021 • Liu Ziyin, Botao Li, James B. Simon, Masahito Ueda
Previous works on stochastic gradient descent (SGD) often focus on its success.
1 code implementation • 29 Jun 2021 • Zhikang T. Wang, Masahito Ueda
Despite the empirical success of the deep Q network (DQN) reinforcement learning algorithm and its variants, DQN remains poorly understood and lacks convergence guarantees.
no code implementations • 20 May 2021 • Takashi Mori, Liu Ziyin, Kangqiao Liu, Masahito Ueda
Stochastic gradient descent (SGD) is subject to complicated multiplicative noise for the mean-square loss.
no code implementations • ICLR 2022 • Liu Ziyin, Kangqiao Liu, Takashi Mori, Masahito Ueda
The noise in stochastic gradient descent (SGD), caused by minibatch sampling, is poorly understood despite its practical importance in deep learning.
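A simple way to make that noise tangible (an assumed toy experiment, not the paper's analysis) is to measure how minibatch gradients scatter around the full-batch gradient at a fixed point in parameter space:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=2000)

def grad(w, Xb, yb):
    return Xb.T @ (Xb @ w - yb) / len(yb)        # gradient of 0.5 * mean squared error

w = rng.normal(size=10)                          # a fixed point in parameter space
full = grad(w, X, y)
B = 32
samples = []
for _ in range(500):                             # draw many minibatches of size B
    idx = rng.choice(len(y), size=B, replace=False)
    samples.append(grad(w, X[idx], y[idx]) - full)
noise = np.stack(samples)
print("trace of minibatch-noise covariance:", np.trace(np.cov(noise.T)))
```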
no code implementations • 24 Dec 2020 • Norifumi Matsumoto, Masaya Nakagawa, Masahito Ueda
The Yang-Lee edge singularity is a quintessential nonunitary critical phenomenon accompanied by anomalous scaling laws.
Statistical Mechanics • Quantum Physics
no code implementations • 7 Dec 2020 • Kangqiao Liu, Liu Ziyin, Masahito Ueda
In the vanishing learning rate regime, stochastic gradient descent (SGD) is now relatively well understood.
no code implementations • 28 Sep 2020 • Takashi Mori, Masahito Ueda
Recent studies have demonstrated that noise in stochastic gradient descent (SGD) is closely related to generalization: larger SGD noise, as long as it is not too large, leads to better generalization.
no code implementations • 7 Sep 2020 • Takumi Yoshino, Shunsuke Furukawa, Masahito Ueda
We study the entanglement entropy and spectrum between components in binary Bose-Einstein condensates in $d$ spatial dimensions.
Quantum Gases • Statistical Mechanics • Quantum Physics
3 code implementations • NeurIPS 2020 • Liu Ziyin, Tilman Hartwig, Masahito Ueda
Previous literature offers limited clues on how to learn a periodic function using modern neural networks.
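The failure mode in question is easy to reproduce (the toy setup below is chosen for illustration and is not the paper's proposed remedy): a standard ReLU network fits a periodic target inside the training interval but does not extrapolate its periodicity.

```python
import numpy as np
import torch
import torch.nn as nn

# Fit sin(x) on [-pi, pi] with a small ReLU MLP, then evaluate outside that range.
x_train = torch.linspace(-np.pi, np.pi, 512).unsqueeze(1)
y_train = torch.sin(x_train)

net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x_train), y_train)
    loss.backward()
    opt.step()

x_test = torch.linspace(2 * np.pi, 3 * np.pi, 512).unsqueeze(1)  # outside the training range
with torch.no_grad():
    extrapolation_error = nn.functional.mse_loss(net(x_test), torch.sin(x_test))
print(float(loss), float(extrapolation_error))   # small fit error, large extrapolation error
```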
no code implementations • 26 May 2020 • Takashi Mori, Masahito Ueda
It is shown that the neural tangent kernel (NTK) does not correctly capture the depth dependence of the generalization performance, which indicates the importance of feature learning rather than lazy learning.
no code implementations • 25 Mar 2020 • Liu Ziyin, ZiHao Wang, Makoto Yamada, Masahito Ueda
We propose a novel regularization method, called volumization, for neural networks.
no code implementations • 16 Feb 2020 • Liu Ziyin, Blair Chen, Ru Wang, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency, Masahito Ueda
Learning in the presence of label noise is a challenging yet important task: it is crucial to design models that are robust to mislabeled data.
1 code implementation • 12 Feb 2020 • Liu Ziyin, Zhikang T. Wang, Masahito Ueda
We also bound the regret of LaProp on a convex problem and show that our bound differs from that of Adam by a key factor, which demonstrates its advantage.
1 code implementation • 21 Oct 2019 • Zhikang T. Wang, Yuto Ashida, Masahito Ueda
We generalize a standard benchmark of reinforcement learning, the classical cartpole balancing problem, to the quantum regime by stabilizing a particle in an unstable potential through measurement and feedback.
no code implementations • 25 Sep 2019 • Liu Ziyin, Ru Wang, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency, Masahito Ueda
Learning in the presence of label noise is a challenging yet important task.
3 code implementations • NeurIPS 2019 • Liu Ziyin, Zhikang Wang, Paul Pu Liang, Ruslan Salakhutdinov, Louis-Philippe Morency, Masahito Ueda
We address the selective classification problem (supervised learning with a reject option), where the goal is to achieve the best performance at a given level of coverage of the data.
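The simplest baseline for this setting (included for illustration only; it is not the method of the paper) is to reject inputs whose top predicted probability falls below a threshold and report accuracy at the resulting coverage:

```python
import numpy as np

def selective_accuracy(probs, labels, threshold):
    """probs: (n, n_classes) predicted probabilities; reject where confidence < threshold."""
    confidence = probs.max(axis=1)
    accept = confidence >= threshold
    coverage = accept.mean()
    if coverage == 0:
        return coverage, float("nan")
    acc = (probs[accept].argmax(axis=1) == labels[accept]).mean()
    return coverage, acc

# placeholder predictions and labels, just to show the interface
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = rng.integers(0, 10, size=1000)
print(selective_accuracy(probs, labels, threshold=0.3))
```

Sweeping the threshold traces out the coverage/accuracy trade-off that selective classifiers are evaluated on.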