Search Results for author: Taiji Suzuki

Found 96 papers, 8 papers with code

Mechanistic Design and Scaling of Hybrid Architectures

no code implementations26 Mar 2024 Michael Poli, Armin W Thomas, Eric Nguyen, Pragaash Ponnusamy, Björn Deiseroth, Kristian Kersting, Taiji Suzuki, Brian Hie, Stefano Ermon, Christopher Ré, Ce Zhang, Stefano Massaroli

The development of deep learning architectures is a resource-demanding process, due to a vast design space, long prototyping times, and high compute costs associated with at-scale model training and evaluation.

Mean-field Analysis on Two-layer Neural Networks from a Kernel Perspective

no code implementations22 Mar 2024 Shokichi Takakura, Taiji Suzuki

In this paper, we study the feature learning ability of two-layer neural networks in the mean-field regime through the lens of kernel methods.

How do Transformers perform In-Context Autoregressive Learning?

no code implementations8 Feb 2024 Michael E. Sander, Raja Giryes, Taiji Suzuki, Mathieu Blondel, Gabriel Peyré

More precisely, focusing on commuting orthogonal matrices $W$, we first show that a trained one-layer linear Transformer implements one step of gradient descent for the minimization of an inner objective function, when considering augmented tokens.

Language Modelling
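
As a rough illustration of the claim above (not the paper's construction with commuting orthogonal matrices and augmented tokens), the following hedged NumPy sketch shows the generic mechanism: a single linear-attention readout can reproduce the prediction of one gradient-descent step on an inner least-squares objective over the context.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 5, 32, 0.1

# In-context linear regression data: y_i = <x_i, w_star> + noise.
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_star + 0.01 * rng.normal(size=n)
x_query = rng.normal(size=d)

# One explicit gradient-descent step from w = 0 on the inner objective
# L(w) = 0.5 * sum_i (<x_i, w> - y_i)^2 gives w_1 = eta * X.T @ y.
w_gd = eta * X.T @ y
pred_gd = x_query @ w_gd

# A single linear self-attention head (no softmax) over augmented tokens
# (x_i, y_i): with identity query/key maps and a value map reading y_i,
# its output at the query token is eta * sum_i y_i * <x_i, x_query>.
attn_scores = X @ x_query
pred_attn = eta * attn_scores @ y

assert np.allclose(pred_gd, pred_attn)  # the two predictions coincide
```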

Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape

no code implementations2 Feb 2024 Juno Kim, Taiji Suzuki

However, existing theoretical studies on how this phenomenon arises are limited to the dynamics of a single layer of attention trained on linear regression tasks.

In-Context Learning

Symmetric Mean-field Langevin Dynamics for Distributional Minimax Problems

no code implementations2 Dec 2023 Juno Kim, Kakei Yamamoto, Kazusato Oko, Zhuoran Yang, Taiji Suzuki

In this paper, we extend mean-field Langevin dynamics to minimax optimization over probability distributions for the first time with symmetric and provably convergent updates.

Scalable Federated Learning for Clients with Different Input Image Sizes and Numbers of Output Categories

no code implementations15 Nov 2023 Shuhei Nitta, Taiji Suzuki, Albert Rodríguez Mulet, Atsushi Yaguchi, Ryusuke Hirai

In this paper, we propose an effective federated learning method named ScalableFL, where the depths and widths of the local models for each client are adjusted according to each client's input image size and number of output categories.

Federated Learning Image Classification +3

Learning Green's Function Efficiently Using Low-Rank Approximations

1 code implementation1 Aug 2023 Kishan Wimalawarne, Taiji Suzuki, Sophie Langer

Learning the Green's function using deep learning models makes it possible to solve different classes of partial differential equations.
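
The paper's estimators are deep models; as a small, hedged aside (not code from the paper or the linked repository), the NumPy snippet below illustrates the structural premise in the title: the Green's function of a simple 1D operator, $-u''$ on $[0,1]$ with zero boundary values, is captured accurately by a low-rank factorization.

```python
import numpy as np

# Discretize the Green's function of -u'' on [0, 1] with zero boundary values:
# G(x, y) = min(x, y) * (1 - max(x, y)).
x = np.linspace(0.0, 1.0, 200)
G = np.minimum.outer(x, x) * (1.0 - np.maximum.outer(x, x))

# Truncated SVD as the low-rank surrogate.
U, s, Vt = np.linalg.svd(G)
r = 5
G_lowrank = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

# Relative error drops quickly with the rank r.
print(np.linalg.norm(G - G_lowrank) / np.linalg.norm(G))
```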

Graph Neural Networks Provably Benefit from Structural Information: A Feature Learning Perspective

no code implementations24 Jun 2023 Wei Huang, Yuan Cao, Haonan Wang, Xin Cao, Taiji Suzuki

Graph neural networks (GNNs) have pioneered advancements in graph representation learning, exhibiting superior feature learning and performance over multilayer perceptrons (MLPs) when handling graph inputs.

Graph Representation Learning Learning Theory +1

Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction

no code implementations12 Jun 2023 Taiji Suzuki, Denny Wu, Atsushi Nitanda

Despite the generality of our results, we achieve an improved convergence rate in both the SGD and SVRG settings when specialized to the standard Langevin dynamics.

Approximation and Estimation Ability of Transformers for Sequence-to-Sequence Functions with Infinite Dimensional Input

no code implementations30 May 2023 Shokichi Takakura, Taiji Suzuki

Despite the great success of Transformer networks in various applications such as natural language processing and computer vision, their theoretical aspects are not well understood.

Tight and fast generalization error bound of graph embedding in metric space

no code implementations13 May 2023 Atsushi Suzuki, Atsushi Nitanda, Taiji Suzuki, Jing Wang, Feng Tian, Kenji Yamanishi

However, recent theoretical analyses have shown a much higher upper bound on the generalization error of non-Euclidean graph embeddings than that of Euclidean ones, where a high generalization error indicates that incompleteness and noise in the data can significantly damage learning performance.

Graph Embedding

Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum Problems

no code implementations6 Mar 2023 Atsushi Nitanda, Kazusato Oko, Denny Wu, Nobuhito Takenouchi, Taiji Suzuki

The entropic fictitious play (EFP) is a recently proposed algorithm that minimizes the sum of a convex functional and entropy in the space of measures -- such an objective naturally arises in the optimization of a two-layer neural network in the mean-field regime.

Image Generation

Diffusion Models are Minimax Optimal Distribution Estimators

no code implementations3 Mar 2023 Kazusato Oko, Shunta Akiyama, Taiji Suzuki

While efficient distribution learning is no doubt behind the groundbreaking success of diffusion modeling, its theoretical guarantees are quite limited.

Koopman-based generalization bound: New aspect for full-rank weights

no code implementations12 Feb 2023 Yuka Hashimoto, Sho Sonoda, Isao Ishikawa, Atsushi Nitanda, Taiji Suzuki

Our bound is tighter than existing norm-based bounds when the condition numbers of weight matrices are small.

DIFF2: Differential Private Optimization via Gradient Differences for Nonconvex Distributed Learning

no code implementations8 Feb 2023 Tomoya Murata, Taiji Suzuki

In previous work, the best known utility bound is $\widetilde O(\sqrt{d}/(n\varepsilon_\mathrm{DP}))$ in terms of the squared full gradient norm, which is achieved by Differentially Private Gradient Descent (DP-GD) as an instance, where $n$ is the sample size, $d$ is the problem dimensionality, and $\varepsilon_\mathrm{DP}$ is the differential privacy parameter.
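
For orientation only, here is a hedged sketch of the generic clip-and-perturb step underlying DP-GD (the baseline mentioned above), not the proposed DIFF2 algorithm; the noise scale `noise_std` is left abstract because its calibration to $(\varepsilon_\mathrm{DP}, \delta)$ is precisely what the utility bound quantifies.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_gd_step(w, per_sample_grads, lr, clip_norm, noise_std):
    """Schematic differentially private gradient step: clip each per-sample
    gradient to `clip_norm`, average, and add Gaussian noise whose standard
    deviation `noise_std` is calibrated offline to the privacy budget."""
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    noisy_mean = clipped.mean(axis=0) + noise_std * rng.normal(size=w.shape)
    return w - lr * noisy_mean
```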

Graph Polynomial Convolution Models for Node Classification of Non-Homophilous Graphs

no code implementations12 Sep 2022 Kishan Wimalawarne, Taiji Suzuki

Additionally, we propose adaptive learning between graph polynomial convolution models and learning directly from the adjacency matrix.

Generalization Bounds Node Classification

Versatile Single-Loop Method for Gradient Estimator: First and Second Order Optimality, and its Application to Federated Learning

no code implementations1 Sep 2022 Kazusato Oko, Shunta Akiyama, Tomoya Murata, Taiji Suzuki

While variance reduction methods have shown great success in solving large-scale optimization problems, many of them suffer from accumulated errors and therefore periodically require full gradient computation.

Federated Learning

Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods

no code implementations30 May 2022 Shunta Akiyama, Taiji Suzuki

While deep learning has outperformed other methods for various tasks, theoretical frameworks that explain its reason have not been fully established.

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation

no code implementations3 May 2022 Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang

We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}^\top\boldsymbol{x})$, where $\boldsymbol{W}\in\mathbb{R}^{d\times N}, \boldsymbol{a}\in\mathbb{R}^{N}$ are randomly initialized, and the training objective is the empirical MSE loss: $\frac{1}{n}\sum_{i=1}^n (f(\boldsymbol{x}_i)-y_i)^2$.
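
The setup above translates almost line-by-line into NumPy; the sketch below (with a placeholder tanh activation, synthetic targets, and an arbitrary step size $\eta$) performs exactly one gradient step on the first-layer weights $\boldsymbol{W}$ under the empirical MSE loss.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, N, eta = 200, 20, 100, 1.0

# f(x) = (1 / sqrt(N)) * a^T sigma(W^T x) with randomly initialized W, a.
W = rng.normal(size=(d, N)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=N)
X = rng.normal(size=(n, d))
y = rng.normal(size=n)                      # synthetic placeholder targets

def f(X, W):
    return (np.tanh(X @ W) @ a) / np.sqrt(N)

def grad_W(X, y, W):
    """Gradient of the empirical MSE (1/n) * sum_i (f(x_i) - y_i)^2 w.r.t. W."""
    Z = X @ W                                # pre-activations, shape (n, N)
    r = f(X, W) - y                          # residuals, shape (n,)
    # tanh'(z) = 1 - tanh(z)^2
    return 2.0 / n * X.T @ (r[:, None] * (1.0 - np.tanh(Z) ** 2) * a / np.sqrt(N))

# The single (possibly large) first-layer gradient step described above.
W1 = W - eta * grad_W(X, y, W)
```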

Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization

1 code implementation30 Mar 2022 Yuri Kinoshita, Taiji Suzuki

Stochastic gradient Langevin dynamics is one of the most fundamental algorithms for solving sampling problems and the non-convex optimization problems that appear in several machine learning applications.
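
For context, the plain (non-variance-reduced) SGLD update that the paper builds on can be written in a few lines; this is a generic sketch, with `stoch_grad` standing for any minibatch gradient estimate and `beta` the inverse temperature.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgld_step(theta, stoch_grad, step_size, beta=1.0):
    """One plain SGLD update: a gradient step on the minibatch estimate plus
    injected Gaussian noise of scale sqrt(2 * step_size / beta)."""
    noise = rng.normal(size=theta.shape)
    return theta - step_size * stoch_grad + np.sqrt(2.0 * step_size / beta) * noise
```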

Convergence Error Analysis of Reflected Gradient Langevin Dynamics for Globally Optimizing Non-Convex Constrained Problems

no code implementations19 Mar 2022 Kanji Sato, Akiko Takeda, Reiichiro Kawai, Taiji Suzuki

Gradient Langevin dynamics and a variety of its variants have attracted increasing attention owing to their convergence towards the global optimal solution, initially in the unconstrained convex setting and, more recently, even for non-convex problems with convex constraints.

Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning

no code implementations12 Feb 2022 Tomoya Murata, Taiji Suzuki

In recent centralized nonconvex distributed learning and federated learning, local methods are one of the promising approaches to reduce communication time.

Distributed Optimization Federated Learning

Convex Analysis of the Mean Field Langevin Dynamics

no code implementations25 Jan 2022 Atsushi Nitanda, Denny Wu, Taiji Suzuki

In this work, we give a concise and self-contained convergence rate analysis of the mean field Langevin dynamics with respect to the (regularized) objective function in both continuous and discrete time settings.

A Scaling Law for Syn-to-Real Transfer: How Much Is Your Pre-training Effective?

no code implementations29 Sep 2021 Hiroaki Mikami, Kenji Fukumizu, Shogo Murai, Shuji Suzuki, Yuta Kikuchi, Taiji Suzuki, Shin-ichi Maeda, Kohei Hayashi

Synthetic-to-real transfer learning is a framework in which a synthetically generated dataset is used to pre-train a model to improve its performance on real vision tasks.

Image Generation Transfer Learning

Learnability of convolutional neural networks for infinite dimensional input via mixed and anisotropic smoothness

no code implementations ICLR 2022 Sho Okumoto, Taiji Suzuki

Although the approximation and estimation errors of neural networks are affected by the curse of dimensionality in the existing analyses for typical function spaces such as the Hölder and Besov spaces, we show that, by considering anisotropic smoothness, these errors can avoid an exponential dependency on the dimensionality and instead depend only on the smoothness of the target functions.

speech-recognition Speech Recognition

Particle Stochastic Dual Coordinate Ascent: Exponential convergent algorithm for mean field neural network optimization

no code implementations ICLR 2022 Kazusato Oko, Taiji Suzuki, Atsushi Nitanda, Denny Wu

We introduce Particle-SDCA, a gradient-based optimization algorithm for two-layer neural networks in the mean field regime that achieves exponential convergence rate in regularized empirical risk minimization.

Takeuchi's Information Criteria as Generalization Measures for DNNs Close to NTK Regime

no code implementations29 Sep 2021 Hiroki Naganuma, Taiji Suzuki, Rio Yokota, Masahiro Nomura, Kohta Ishikawa, Ikuro Sato

Generalization measures are intensively studied in the machine learning community for better modeling of generalization gaps.

Hyperparameter Optimization

A Scaling Law for Synthetic-to-Real Transfer: How Much Is Your Pre-training Effective?

1 code implementation25 Aug 2021 Hiroaki Mikami, Kenji Fukumizu, Shogo Murai, Shuji Suzuki, Yuta Kikuchi, Taiji Suzuki, Shin-ichi Maeda, Kohei Hayashi

Synthetic-to-real transfer learning is a framework in which a synthetically generated dataset is used to pre-train a model to improve its performance on real vision tasks.

Image Generation Transfer Learning

AutoLL: Automatic Linear Layout of Graphs based on Deep Neural Network

no code implementations5 Aug 2021 Chihiro Watanabe, Taiji Suzuki

However, it is limited to two-mode reordering (i.e., the rows and columns are reordered separately) and cannot be applied in the one-mode setting (i.e., the same node order is used for reordering both rows and columns), owing to the characteristics of its model architecture.

On Learnability via Gradient Method for Two-Layer ReLU Neural Networks in Teacher-Student Setting

no code implementations11 Jun 2021 Shunta Akiyama, Taiji Suzuki

Deep learning empirically achieves high performance in many applications, but its training dynamics has not been fully understood theoretically.

Particle Dual Averaging: Optimization of Mean Field Neural Network with Global Convergence Rate Analysis

no code implementations NeurIPS 2021 Atsushi Nitanda, Denny Wu, Taiji Suzuki

An important application of the proposed method is the optimization of neural networks in the mean field regime, which is theoretically attractive due to the presence of nonlinear feature learning, but a quantitative convergence rate can be challenging to obtain.

Deep Two-Way Matrix Reordering for Relational Data Analysis

no code implementations26 Mar 2021 Chihiro Watanabe, Taiji Suzuki

This denoised mean matrix can be used to visualize the global structure of the reordered observed matrix.

Vocal Bursts Valence Prediction

A Goodness-of-fit Test on the Number of Biclusters in a Relational Data Matrix

no code implementations23 Feb 2021 Chihiro Watanabe, Taiji Suzuki

Biclustering is a method for detecting homogeneous submatrices in a given observed matrix, and it is an effective tool for relational data analysis.

Clustering

Bias-Variance Reduced Local SGD for Less Heterogeneous Federated Learning

no code implementations5 Feb 2021 Tomoya Murata, Taiji Suzuki

Recently, local SGD has received much attention and has been extensively studied in the distributed learning community to overcome the communication bottleneck problem.

Distributed Optimization Federated Learning

Particle Dual Averaging: Optimization of Mean Field Neural Networks with Global Convergence Rate Analysis

no code implementations NeurIPS 2021 Atsushi Nitanda, Denny Wu, Taiji Suzuki

An important application of the proposed method is the optimization of neural networks in the mean field regime, which is theoretically attractive due to the presence of nonlinear feature learning, but a quantitative convergence rate can be challenging to obtain.

Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods

no code implementations ICLR 2021 Taiji Suzuki, Shunta Akiyama

Establishing a theoretical analysis that explains why deep learning can outperform shallow learning such as kernel methods is one of the biggest issues in the deep learning literature.

Estimation error analysis of deep learning on the regression problem on the variable exponent Besov space

no code implementations23 Sep 2020 Kazuma Tsuji, Taiji Suzuki

In this study, we focus on the adaptivity of deep learning; consequently, we treat the variable exponent Besov space, which has a different smoothness depending on the input location $x$.

speech-recognition Speech Recognition

MSR-DARTS: Minimum Stable Rank of Differentiable Architecture Search

no code implementations19 Sep 2020 Kengo Machida, Kuniaki Uto, Koichi Shinoda, Taiji Suzuki

To overcome this problem, we propose a method called minimum stable rank DARTS (MSR-DARTS) for finding a model with the best generalization error by replacing architecture optimization with a selection process based on the minimum stable rank criterion.

Neural Architecture Search

Quantitative Understanding of VAE as a Non-linearly Scaled Isometric Embedding

no code implementations30 Jul 2020 Akira Nakagawa, Keizo Kato, Taiji Suzuki

According to rate-distortion theory, optimal transform coding is achieved by using an orthonormal transform with a PCA basis, where the transform space is isometric to the input.

Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics

no code implementations NeurIPS 2020 Taiji Suzuki

Existing frameworks such as mean field theory and neural tangent kernel theory for neural network optimization analysis typically require taking the limit of infinite network width to show global convergence.

Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime

no code implementations ICLR 2021 Atsushi Nitanda, Taiji Suzuki

In this study, we show that the averaged stochastic gradient descent can achieve the minimax optimal convergence rate, with the global convergence guarantee, by exploiting the complexities of the target function and the RKHS associated with the NTK.

Gradient Descent in RKHS with Importance Labeling

no code implementations19 Jun 2020 Tomoya Murata, Taiji Suzuki

In this paper, we study the importance labeling problem, in which we are given many unlabeled data points and select a limited number of them to be labeled; a learning algorithm is then executed on the selected data.

When Does Preconditioning Help or Hurt Generalization?

no code implementations ICLR 2021 Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu

While second order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization has been called into question.

regression Second-order methods

Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks

1 code implementation NeurIPS 2020 Kenta Oono, Taiji Suzuki

By combining it with generalization gap bounds in terms of transductive Rademacher complexity, we derive a test error bound for a specific type of multi-scale GNN that decreases with the number of node aggregations under some conditions.

Learning Theory Transductive Learning

Selective Inference for Latent Block Models

no code implementations27 May 2020 Chihiro Watanabe, Taiji Suzuki

In this case, it becomes crucial to consider the selective bias in the block structure, that is, the block structure is selected from all the possible cluster memberships based on some criterion by the clustering algorithm.

Clustering Model Selection

Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint

no code implementations ICLR 2020 Jimmy Ba, Murat Erdogdu, Taiji Suzuki, Denny Wu, Tianzong Zhang

This paper investigates the generalization properties of two-layer neural networks in high dimensions, i.e., when the number of samples $n$, features $d$, and neurons $h$ tend to infinity at the same rate.

Inductive Bias Vocal Bursts Valence Prediction

Meta Cyclical Annealing Schedule: A Simple Approach to Avoiding Meta-Amortization Error

no code implementations4 Mar 2020 Yusuke Hayashi, Taiji Suzuki

To address this challenge, we design a novel meta-regularization objective using {\it cyclical annealing schedule} and {\it maximum mean discrepancy} (MMD) criterion.

Few-Shot Learning
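
As a hedged sketch of the first ingredient only (a generic cyclical annealing schedule; the MMD criterion is not reproduced here), a schedule that ramps a regularization weight up within each cycle might look as follows; `cycle_length` and `ramp_fraction` are illustrative parameters, not the paper's settings.

```python
def cyclical_weight(step, cycle_length, ramp_fraction=0.5, w_max=1.0):
    """Cyclical annealing: within each cycle, ramp the weight linearly from 0
    to w_max over the first `ramp_fraction` of the cycle, then hold it."""
    phase = (step % cycle_length) / cycle_length
    return w_max * min(1.0, phase / ramp_fraction)

# Weight over two cycles of length 10: 0.0, 0.2, ..., 1.0, 1.0, then it repeats.
print([round(cyclical_weight(t, 10), 2) for t in range(20)])
```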

Dimension-free convergence rates for gradient Langevin dynamics in RKHS

no code implementations29 Feb 2020 Boris Muzellec, Kanji Sato, Mathurin Massias, Taiji Suzuki

In this work, we provide a convergence analysis of GLD and SGLD when the optimization space is an infinite dimensional Hilbert space.

Understanding Generalization in Deep Learning via Tensor Methods

no code implementations14 Jan 2020 Jingling Li, Yanchao Sun, Jiahao Su, Taiji Suzuki, Furong Huang

Recently proposed complexity measures have provided insights to understanding the generalizability in neural networks from perspectives of PAC-Bayes, robustness, overparametrization, compression and so on.

Domain Adaptation Regularization for Spectral Pruning

no code implementations26 Dec 2019 Laurent Dillard, Yosuke Shinya, Taiji Suzuki

We also show that our method outperforms an existing compression method studied in the DA setting by a large margin for high compression rates.

Domain Adaptation Model Compression

Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features

no code implementations13 Nov 2019 Shingo Yashima, Atsushi Nitanda, Taiji Suzuki

To address this problem, sketching and stochastic gradient methods are the most commonly used techniques to derive efficient large-scale learning algorithms.

Binary Classification Classification +1

Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space

no code implementations NeurIPS 2021 Taiji Suzuki, Atsushi Nitanda

The results show that deep learning has better dependence on the input dimensionality if the target function possesses anisotropic smoothness, and it achieves an adaptive rate for functions with spatially inhomogeneous smoothness.

Towards Characterizing the High-dimensional Bias of Kernel-based Particle Inference Algorithms

no code implementations Approximate Inference AABI Symposium 2019 Jimmy Ba, Murat A. Erdogdu, Marzyeh Ghassemi, Taiji Suzuki, Shengyang Sun, Denny Wu, Tianzong Zhang

Particle-based inference algorithms are a promising method to efficiently generate samples from an intractable target distribution by iteratively updating a set of particles.

LEMMA

Scalable Deep Neural Networks via Low-Rank Matrix Factorization

no code implementations25 Sep 2019 Atsushi Yaguchi, Taiji Suzuki, Shuhei Nitta, Yukinobu Sakata, Akiyuki Tanizawa

Compressing deep neural networks (DNNs) is important for real-world applications operating on resource-constrained devices.

Image Classification

Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network

no code implementations ICLR 2020 Taiji Suzuki, Hiroshi Abe, Tomoaki Nishimura

However, the compression based bound can be applied only to a compressed network, and it is not applicable to the non-compressed original network.

Learning Theory

Understanding the Effects of Pre-Training for Object Detectors via Eigenspectrum

no code implementations9 Sep 2019 Yosuke Shinya, Edgar Simo-Serra, Taiji Suzuki

Furthermore, we propose a method for automatically determining the widths (the numbers of channels) of object detectors based on the eigenspectrum.

Image Classification Object +2

Gradient Noise Convolution (GNC): Smoothing Loss Function for Distributed Large-Batch SGD

no code implementations26 Jun 2019 Kosuke Haruki, Taiji Suzuki, Yohei Hamakawa, Takeshi Toda, Ryuji Sakai, Masahiro Ozawa, Mitsuhiro Kimura

Large-batch stochastic gradient descent (SGD) is widely used for training in distributed deep learning because of its training-time efficiency; however, extremely large-batch SGD leads to poor generalization and easily converges to sharp minima, which prevents naive large-scale data-parallel SGD (DP-SGD) from converging to good minima.

Goodness-of-fit Test for Latent Block Models

no code implementations10 Jun 2019 Chihiro Watanabe, Taiji Suzuki

Latent block models are used for probabilistic biclustering, which is shown to be an effective method for analyzing various relational data sets.

Accelerated Sparsified SGD with Error Feedback

no code implementations29 May 2019 Tomoya Murata, Taiji Suzuki

Several works have shown that the sparsified stochastic gradient descent (SGD) method with error feedback asymptotically achieves the same rate as (non-sparsified) parallel SGD.

Distributed Optimization
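
The snippet above refers to the standard error-feedback mechanism; a generic (non-accelerated) top-$k$ sparsified step with a residual memory, given as an assumption-level sketch rather than the paper's proposed method, looks roughly like this.

```python
import numpy as np

def topk(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def error_feedback_step(w, grad, memory, lr, k):
    """Compress (learning-rate-scaled gradient + carried-over error), apply the
    compressed update, and store what was dropped for the next round."""
    corrected = lr * grad + memory
    update = topk(corrected, k)
    memory = corrected - update
    return w - update, memory
```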

Graph Neural Networks Exponentially Lose Expressive Power for Node Classification

1 code implementation ICLR 2020 Kenta Oono, Taiji Suzuki

We show that when the Erdős–Rényi graph is sufficiently dense and large, a broad range of GCNs on it suffers from the "information loss" in the limit of infinite layers with high probability.

Classification General Classification +1

Gradient Descent can Learn Less Over-parameterized Two-layer Neural Networks on Classification Problems

no code implementations23 May 2019 Atsushi Nitanda, Geoffrey Chinot, Taiji Suzuki

Most studies, with a few exceptions, have focused on regression problems with the squared loss function, and the importance of the positivity of the neural tangent kernel has been pointed out.

General Classification Generalization Bounds

On the minimax optimality and superiority of deep neural network learning over sparse parameter spaces

no code implementations22 May 2019 Satoshi Hayakawa, Taiji Suzuki

Whereas existing theoretical studies of deep learning have been based mainly on mathematical theories of well-known function classes such as the Hölder and Besov classes, we focus on function classes with discontinuity and sparsity, which are those naturally assumed in practice.

Approximation and non-parametric estimation of ResNet-type convolutional neural networks via block-sparse fully-connected neural networks

no code implementations ICLR 2019 Kenta Oono, Taiji Suzuki

We develop new approximation and statistical learning theories of convolutional neural networks (CNNs) via the ResNet-type structure where the channel size, filter size, and width are fixed.

Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks

no code implementations24 Mar 2019 Kenta Oono, Taiji Suzuki

The key idea is that we can replicate the learning ability of fully-connected neural networks (FNNs) by tailored CNNs, as long as the FNNs have block-sparse structures.

Vocal Bursts Type Prediction

Adam Induces Implicit Weight Sparsity in Rectifier Neural Networks

no code implementations19 Dec 2018 Atsushi Yaguchi, Taiji Suzuki, Wataru Asano, Shuhei Nitta, Yukinobu Sakata, Akiyuki Tanizawa

In recent years, deep neural networks (DNNs) have been applied to various machine learning tasks, including image recognition, speech recognition, and machine translation.

Machine Translation speech-recognition +2

Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality

no code implementations ICLR 2019 Taiji Suzuki

In addition to this, it is shown that deep learning can avoid the curse of dimensionality if the target function is in a mixed smooth Besov space.

Sample Efficient Stochastic Gradient Iterative Hard Thresholding Method for Stochastic Sparse Linear Regression with Limited Attribute Observation

no code implementations NeurIPS 2018 Tomoya Murata, Taiji Suzuki

We develop new stochastic gradient methods for efficiently solving sparse linear regression in a partial attribute observation setting, where learners are only allowed to observe a fixed number of actively chosen attributes per example at training and prediction times.

Attribute

Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors

no code implementations14 Jun 2018 Atsushi Nitanda, Taiji Suzuki

In this paper, we show an exponential convergence of the expected classification error in the final phase of the stochastic gradient descent for a wide class of differentiable convex loss functions under similar assumptions.

Binary Classification Classification +1

Cross-domain Recommendation via Deep Domain Adaptation

no code implementations8 Mar 2018 Heishiro Kanagawa, Hayato Kobayashi, Nobuyuki Shimizu, Yukihiro Tagami, Taiji Suzuki

The behavior of users in certain services could be a clue that can be used to infer their preferences and may be used to make recommendations for other services they have never used.

Collaborative Filtering Denoising +2

Functional Gradient Boosting based on Residual Network Perception

no code implementations ICML 2018 Atsushi Nitanda, Taiji Suzuki

Residual Networks (ResNets) have become state-of-the-art models in deep learning and several theoretical studies have been devoted to understanding why ResNet works so well.

Gradient Layer: Enhancing the Convergence of Adversarial Training for Generative Models

no code implementations7 Jan 2018 Atsushi Nitanda, Taiji Suzuki

In this paper, this phenomenon is explained from the functional gradient method perspective of the gradient layer.

Independently Interpretable Lasso: A New Regularizer for Sparse Regression with Uncorrelated Variables

no code implementations6 Nov 2017 Masaaki Takada, Taiji Suzuki, Hironori Fujisawa

However, one of the biggest issues in sparse regularization is that its performance is quite sensitive to correlations between features.

regression

Fast learning rate of deep learning via a kernel perspective

no code implementations29 May 2017 Taiji Suzuki

Our point of view is to deal with the ordinary finite dimensional deep neural network as a finite approximation of the infinite dimensional one.

Doubly Accelerated Stochastic Variance Reduced Dual Averaging Method for Regularized Empirical Risk Minimization

no code implementations NeurIPS 2017 Tomoya Murata, Taiji Suzuki

In this paper, we develop a new accelerated stochastic gradient method for efficiently solving the convex regularized empirical risk minimization problem in mini-batch settings.

Learning Sparse Structural Changes in High-dimensional Markov Networks: A Review on Methodologies and Theories

no code implementations6 Jan 2017 Song Liu, Kenji Fukumizu, Taiji Suzuki

Recent years have seen increasing popularity of learning the sparse changes in Markov networks.

Minimax Optimal Alternating Minimization for Kernel Nonparametric Tensor Learning

no code implementations NeurIPS 2016 Taiji Suzuki, Heishiro Kanagawa, Hayato Kobayashi, Nobuyuki Shimizu, Yukihiro Tagami

We investigate the statistical performance and computational efficiency of the alternating minimization procedure for nonparametric tensor learning.

Computational Efficiency

Stochastic dual averaging methods using variance reduction techniques for regularized empirical risk minimization problems

no code implementations8 Mar 2016 Tomoya Murata, Taiji Suzuki

We consider a composite convex minimization problem associated with regularized empirical risk minimization, which often arises in machine learning.

BIG-bench Machine Learning

Structure Learning of Partitioned Markov Networks

no code implementations2 Apr 2015 Song Liu, Taiji Suzuki, Masashi Sugiyama, Kenji Fukumizu

We learn the structure of a Markov Network between two groups of random variables from joint observations.

Time Series Time Series Analysis

Spectral norm of random tensors

no code implementations7 Jul 2014 Ryota Tomioka, Taiji Suzuki

We show that the spectral norm of a random $n_1\times n_2\times \cdots \times n_K$ tensor (or higher-order array) scales as $O\left(\sqrt{(\sum_{k=1}^{K}n_k)\log(K)}\right)$ under some sub-Gaussian assumption on the entries.
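
A quick numerical sanity check of the matrix case $K = 2$ (not from the paper): the spectral norm of an $n_1\times n_2$ standard Gaussian matrix concentrates around $\sqrt{n_1}+\sqrt{n_2}$, consistent with the $\sqrt{n_1+n_2}$ scaling of the bound up to constant factors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Spectral norm of Gaussian random matrices versus sqrt(n1) + sqrt(n2).
for n1, n2 in [(100, 100), (400, 100), (1600, 400)]:
    A = rng.normal(size=(n1, n2))
    print(n1, n2, round(np.linalg.norm(A, 2), 1), round(np.sqrt(n1) + np.sqrt(n2), 1))
```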

Stochastic Dual Coordinate Ascent with Alternating Direction Multiplier Method

no code implementations4 Nov 2013 Taiji Suzuki

We propose a new stochastic dual coordinate ascent technique that can be applied to a wide range of regularized learning problems.

Convex Tensor Decomposition via Structured Schatten Norm Regularization

no code implementations NeurIPS 2013 Ryota Tomioka, Taiji Suzuki

We discuss structured Schatten norms for tensor decomposition that includes two recently proposed norms ("overlapped" and "latent") for convex-optimization-based tensor decomposition, and connect tensor decomposition with wider literature on structured sparsity.

Tensor Decomposition

Density-Difference Estimation

no code implementations NeurIPS 2012 Masashi Sugiyama, Takafumi Kanamori, Taiji Suzuki, Marthinus D. Plessis, Song Liu, Ichiro Takeuchi

A naive approach is a two-step procedure of first estimating two densities separately and then computing their difference.

Change Point Detection

Fast learning rate of multiple kernel learning: Trade-off between sparsity and smoothness

no code implementations2 Mar 2012 Taiji Suzuki, Masashi Sugiyama

If the ground truth is smooth, we show a faster convergence rate for the elastic-net regularization with less conditions than $\ell_1$-regularization; otherwise, a faster convergence rate for the $\ell_1$-regularization is shown.

Unifying Framework for Fast Learning Rate of Non-Sparse Multiple Kernel Learning

no code implementations NeurIPS 2011 Taiji Suzuki

Finally, we show that, when the complexities of candidate reproducing kernel Hilbert spaces are inhomogeneous, dense-type regularization shows a better learning rate than sparse ℓ1 regularization.

Vocal Bursts Type Prediction

Relative Density-Ratio Estimation for Robust Distribution Comparison

no code implementations NeurIPS 2011 Makoto Yamada, Taiji Suzuki, Takafumi Kanamori, Hirotaka Hachiya, Masashi Sugiyama

Divergence estimators based on direct approximation of density-ratios without going through separate approximation of numerator and denominator densities have been successfully applied to machine learning tasks that involve distribution comparison such as outlier detection, transfer learning, and two-sample homogeneity test.

Density Ratio Estimation Outlier Detection +1

Condition Number Analysis of Kernel-based Density Ratio Estimation

1 code implementation15 Dec 2009 Takafumi Kanamori, Taiji Suzuki, Masashi Sugiyama

We show that the kernel least-squares method has a smaller condition number than a version of kernel mean matching and other M-estimators, implying that the kernel least-squares method has preferable numerical properties.

Density Ratio Estimation feature selection +1
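
A minimal sketch of the kernel least-squares density-ratio estimator discussed above (a uLSIF-style fit with Gaussian kernels); the bandwidth `sigma`, regularization `lam`, and choice of `centers` are placeholders, and this is not the linked repository's implementation.

```python
import numpy as np

def kernel_ls_density_ratio(X_num, X_den, centers, sigma, lam):
    """Fit r(x) = sum_l alpha_l * k(x, c_l) by regularized least squares so
    that r approximates the ratio p_num / p_den; k is a Gaussian kernel."""
    def K(A, B):
        sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq_dists / (2.0 * sigma ** 2))
    H = K(X_den, centers).T @ K(X_den, centers) / len(X_den)  # E_den[phi phi^T]
    h = K(X_num, centers).mean(axis=0)                        # E_num[phi]
    alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return lambda X: K(X, centers) @ alpha
```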
