Search Results for author: Taiji Suzuki

Found 79 papers, 6 papers with code

Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods

no code implementations 30 May 2022 Shunta Akiyama, Taiji Suzuki

While deep learning has outperformed other methods on various tasks, theoretical frameworks that explain why it works so well have not been fully established.

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation

no code implementations 3 May 2022 Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang

We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}^\top\boldsymbol{x})$, where $\boldsymbol{W}\in\mathbb{R}^{d\times N}, \boldsymbol{a}\in\mathbb{R}^{N}$ are randomly initialized, and the training objective is the empirical MSE loss: $\frac{1}{n}\sum_{i=1}^n (f(\boldsymbol{x}_i)-y_i)^2$.
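
The quantities in this snippet are easy to reproduce numerically. Below is a minimal NumPy sketch (sizes, targets, and the step size are illustrative assumptions, not taken from the paper) that performs one gradient descent step on the first-layer weights $\boldsymbol{W}$ under the empirical MSE loss:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, N = 200, 10, 50                          # samples, input dim, width (illustrative)
relu = lambda z: np.maximum(z, 0.0)

W = rng.standard_normal((d, N)) / np.sqrt(d)   # random init, W in R^{d x N}
a = rng.standard_normal(N)                     # random init, a in R^N
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)                     # placeholder targets (assumption)

def f(X, W):
    # f(x) = a^T sigma(W^T x) / sqrt(N), applied row-wise to X
    return relu(X @ W) @ a / np.sqrt(N)

def mse(W):
    return np.mean((f(X, W) - y) ** 2)

def grad_W(W):
    pre = X @ W                                # (n, N) pre-activations
    r = (f(X, W) - y) / np.sqrt(N)             # scaled residuals
    # Chain rule through the ReLU: d(mse)/dW
    return (2.0 / n) * X.T @ ((pre > 0) * np.outer(r, a))

eta = 0.1                                      # step size (assumption)
loss_before = mse(W)
W = W - eta * grad_W(W)                        # the "first gradient step" on W
loss_after = mse(W)
```

A single small step already decreases the empirical loss, which is the regime the paper analyzes.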

Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization

no code implementations 30 Mar 2022 Yuri Kinoshita, Taiji Suzuki

Stochastic gradient Langevin dynamics (SGLD) is one of the most fundamental algorithms for solving sampling problems and the non-convex optimization problems that appear in several machine learning applications.
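
As background for this line of work, the basic (non-variance-reduced) gradient Langevin dynamics update is $x \leftarrow x - \eta \nabla f(x) + \sqrt{2\eta/\beta}\,\xi$. A minimal sketch on a toy double-well objective (the objective, step size, and temperature are illustrative assumptions, not the paper's setting):

```python
import numpy as np

rng = np.random.default_rng(1)

def grad_f(x):
    # f(x) = (x^2 - 1)^2 / 4: a simple non-convex double-well potential
    return x * (x**2 - 1.0)

eta, beta = 1e-2, 10.0             # step size and inverse temperature (assumptions)
x = 3.0                            # start far from both minimizers
for _ in range(5000):
    noise = np.sqrt(2.0 * eta / beta) * rng.standard_normal()
    x = x - eta * grad_f(x) + noise   # Langevin update: drift plus injected noise

# The iterate concentrates near one of the global minimizers x = +/- 1.
```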

Convergence Error Analysis of Reflected Gradient Langevin Dynamics for Globally Optimizing Non-Convex Constrained Problems

no code implementations 19 Mar 2022 Kanji Sato, Akiko Takeda, Reiichiro Kawai, Taiji Suzuki

This work analyzes reflected gradient Langevin dynamics (RGLD), a global optimization algorithm for smoothly constrained problems, including non-convex constrained ones, and derives a convergence rate to a solution with $\epsilon$-sampling error.
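
The reflection mechanism can be illustrated in one dimension: take a Langevin step, then bounce any iterate that leaves the feasible set back across the boundary. A minimal sketch (the objective, constraint interval, and constants are illustrative assumptions, not the paper's setting):

```python
import numpy as np

rng = np.random.default_rng(2)

def grad_f(x):
    return 2.0 * (x - 2.0)         # f(x) = (x - 2)^2; unconstrained min lies outside

lo, hi = -1.0, 1.0                 # feasible interval (illustrative constraint set)
eta, beta = 1e-2, 50.0             # step size and inverse temperature (assumptions)
x = 0.0
for _ in range(3000):
    x = x - eta * grad_f(x) + np.sqrt(2.0 * eta / beta) * rng.standard_normal()
    # Reflection: bounce off the boundary instead of projecting onto it.
    # (One reflection suffices here because steps are much smaller than the interval.)
    if x > hi:
        x = 2.0 * hi - x
    if x < lo:
        x = 2.0 * lo - x

# The constrained minimizer of (x - 2)^2 over [-1, 1] is x = 1.
```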

Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning

no code implementations 12 Feb 2022 Tomoya Murata, Taiji Suzuki

In recent centralized nonconvex distributed learning and federated learning, local methods are one of the promising approaches to reduce communication time.

Distributed Optimization Federated Learning

Convex Analysis of the Mean Field Langevin Dynamics

no code implementations 25 Jan 2022 Atsushi Nitanda, Denny Wu, Taiji Suzuki

In this work, we give a concise and self-contained convergence rate analysis of the mean field Langevin dynamics with respect to the (regularized) objective function in both continuous and discrete time settings.

Learnability of convolutional neural networks for infinite dimensional input via mixed and anisotropic smoothness

no code implementations ICLR 2022 Sho Okumoto, Taiji Suzuki

Although the approximation and estimation errors of neural networks suffer from the curse of dimensionality in existing analyses for typical function spaces such as the Hölder and Besov spaces, we show that, by considering anisotropic smoothness, the errors can avoid exponential dependence on the dimensionality and instead depend only on the smoothness of the target functions.

Natural Language Processing speech-recognition +1

A Scaling Law for Syn-to-Real Transfer: How Much Is Your Pre-training Effective?

no code implementations 29 Sep 2021 Hiroaki Mikami, Kenji Fukumizu, Shogo Murai, Shuji Suzuki, Yuta Kikuchi, Taiji Suzuki, Shin-ichi Maeda, Kohei Hayashi

Synthetic-to-real transfer learning is a framework in which a synthetically generated dataset is used to pre-train a model to improve its performance on real vision tasks.

Image Generation Transfer Learning

Understanding the Variance Collapse of SVGD in High Dimensions

no code implementations ICLR 2022 Jimmy Ba, Murat A Erdogdu, Marzyeh Ghassemi, Shengyang Sun, Taiji Suzuki, Denny Wu, Tianzong Zhang

Stein variational gradient descent (SVGD) is a deterministic inference algorithm that evolves a set of particles to fit a target distribution.
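
The standard SVGD update combines a kernel-weighted drift toward high target density with a repulsive term that keeps particles spread out. A minimal sketch with an RBF kernel (the bandwidth, step size, and Gaussian target are illustrative assumptions, not the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(3)

def svgd_step(X, grad_logp, h=1.0, eps=0.2):
    # One SVGD update for particles X of shape (m, d), RBF kernel bandwidth h.
    diff = X[:, None, :] - X[None, :, :]                  # (m, m, d) pairwise x_i - x_j
    K = np.exp(-np.sum(diff**2, axis=-1) / (2.0 * h**2))  # kernel matrix
    # Repulsive term: sum_j grad_{x_j} k(x_j, x_i) = sum_j K_ij (x_i - x_j) / h^2
    repulse = (K[..., None] * diff).sum(axis=1) / h**2
    phi = (K @ grad_logp(X) + repulse) / X.shape[0]
    return X + eps * phi

grad_logp = lambda X: -X                         # target: standard Gaussian
X = rng.standard_normal((100, 1)) * 3.0 + 5.0    # particles start far from the target
for _ in range(1000):
    X = svgd_step(X, grad_logp)
```

After many steps the particle cloud moves toward the target and keeps a nonzero spread due to the repulsive term.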

Takeuchi's Information Criteria as Generalization Measures for DNNs Close to NTK Regime

no code implementations 29 Sep 2021 Hiroki Naganuma, Taiji Suzuki, Rio Yokota, Masahiro Nomura, Kohta Ishikawa, Ikuro Sato

Generalization measures are intensively studied in the machine learning community for better modeling generalization gaps.

Hyperparameter Optimization

Particle Stochastic Dual Coordinate Ascent: Exponential convergent algorithm for mean field neural network optimization

no code implementations ICLR 2022 Kazusato Oko, Taiji Suzuki, Atsushi Nitanda, Denny Wu

We introduce Particle-SDCA, a gradient-based optimization algorithm for two-layer neural networks in the mean field regime that achieves exponential convergence rate in regularized empirical risk minimization.

A Scaling Law for Synthetic-to-Real Transfer: How Much Is Your Pre-training Effective?

1 code implementation 25 Aug 2021 Hiroaki Mikami, Kenji Fukumizu, Shogo Murai, Shuji Suzuki, Yuta Kikuchi, Taiji Suzuki, Shin-ichi Maeda, Kohei Hayashi

Synthetic-to-real transfer learning is a framework in which a synthetically generated dataset is used to pre-train a model to improve its performance on real vision tasks.

Image Generation Transfer Learning

AutoLL: Automatic Linear Layout of Graphs based on Deep Neural Network

no code implementations 5 Aug 2021 Chihiro Watanabe, Taiji Suzuki

However, it is limited to two-mode reordering (i.e., the rows and columns are reordered separately) and cannot be applied in the one-mode setting (i.e., where the same node order is used for reordering both rows and columns), owing to the characteristics of its model architecture.

On Learnability via Gradient Method for Two-Layer ReLU Neural Networks in Teacher-Student Setting

no code implementations 11 Jun 2021 Shunta Akiyama, Taiji Suzuki

Deep learning empirically achieves high performance in many applications, but its training dynamics have not yet been fully understood theoretically.

Particle Dual Averaging: Optimization of Mean Field Neural Networks with Global Convergence Rate Analysis

no code implementations NeurIPS 2021 Atsushi Nitanda, Denny Wu, Taiji Suzuki

An important application of the proposed method is the optimization of neural networks in the mean field regime, which is theoretically attractive due to the presence of nonlinear feature learning, but for which a quantitative convergence rate can be challenging to obtain.

Deep Two-Way Matrix Reordering for Relational Data Analysis

no code implementations 26 Mar 2021 Chihiro Watanabe, Taiji Suzuki

This denoised mean matrix can be used to visualize the global structure of the reordered observed matrix.

A Goodness-of-fit Test on the Number of Biclusters in a Relational Data Matrix

no code implementations 23 Feb 2021 Chihiro Watanabe, Taiji Suzuki

Biclustering is a method for detecting homogeneous submatrices in a given observed matrix, and it is an effective tool for relational data analysis.

Bias-Variance Reduced Local SGD for Less Heterogeneous Federated Learning

no code implementations 5 Feb 2021 Tomoya Murata, Taiji Suzuki

Recently, local SGD has attracted much attention and been extensively studied in the distributed learning community as a way to overcome the communication bottleneck problem.

Distributed Optimization Federated Learning

Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods

no code implementations ICLR 2021 Taiji Suzuki, Shunta Akiyama

Establishing a theoretical analysis that explains why deep learning can outperform shallow learning such as kernel methods is one of the biggest issues in the deep learning literature.

Estimation error analysis of deep learning on the regression problem on the variable exponent Besov space

no code implementations 23 Sep 2020 Kazuma Tsuji, Taiji Suzuki

In this study, we focus on the adaptivity of deep learning; consequently, we treat the variable exponent Besov space, which has a different smoothness depending on the input location $x$.

speech-recognition Speech Recognition

MSR-DARTS: Minimum Stable Rank of Differentiable Architecture Search

no code implementations 19 Sep 2020 Kengo Machida, Kuniaki Uto, Koichi Shinoda, Taiji Suzuki

To overcome this problem, we propose a method called minimum stable rank DARTS (MSR-DARTS), which finds a model with the best generalization error by replacing architecture optimization with a selection process based on the minimum stable rank criterion.
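
The stable rank referred to here is the standard quantity $\mathrm{sr}(A) = \|A\|_F^2 / \|A\|_2^2$; how MSR-DARTS applies it to architecture selection is described in the paper. A minimal sketch of the quantity itself (the example matrices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)

def stable_rank(A):
    # Stable rank: squared Frobenius norm over squared spectral norm.
    # Always between 1 and rank(A); low for near-low-rank matrices.
    return (np.linalg.norm(A, 'fro') / np.linalg.norm(A, 2)) ** 2

A_full = rng.standard_normal((64, 64))      # well-conditioned random matrix
u = rng.standard_normal((64, 1))
A_low = u @ rng.standard_normal((1, 64))    # exactly rank-one matrix
```

For the rank-one matrix the stable rank is exactly 1, while the random matrix has a much larger stable rank.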

Neural Architecture Search

Quantitative Understanding of VAE as a Non-linearly Scaled Isometric Embedding

no code implementations 30 Jul 2020 Akira Nakagawa, Keizo Kato, Taiji Suzuki

According to rate-distortion theory, optimal transform coding is achieved by an orthonormal transform with a PCA basis, where the transform space is isometric to the input.

Generalization bound of globally optimal non-convex neural network training: Transportation map estimation by infinite dimensional Langevin dynamics

no code implementations NeurIPS 2020 Taiji Suzuki

Existing frameworks for neural network optimization analysis, such as mean field theory and neural tangent kernel theory, typically require taking the limit of infinite network width to show global convergence.

Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime

no code implementations ICLR 2021 Atsushi Nitanda, Taiji Suzuki

In this study, we show that the averaged stochastic gradient descent can achieve the minimax optimal convergence rate, with the global convergence guarantee, by exploiting the complexities of the target function and the RKHS associated with the NTK.

Gradient Descent in RKHS with Importance Labeling

no code implementations 19 Jun 2020 Tomoya Murata, Taiji Suzuki

In this paper, we study the importance labeling problem, in which we are given many unlabeled data points, select a limited number of them to be labeled, and then execute a learning algorithm on the selected subset.

When Does Preconditioning Help or Hurt Generalization?

no code implementations ICLR 2021 Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu

While second order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization has been called into question.

Second-order methods

Optimization and Generalization Analysis of Transduction through Gradient Boosting and Application to Multi-scale Graph Neural Networks

1 code implementation NeurIPS 2020 Kenta Oono, Taiji Suzuki

By combining it with generalization gap bounds in terms of transductive Rademacher complexity, we derive a test error bound for a specific type of multi-scale GNN that decreases with the number of node aggregations under some conditions.

Learning Theory

Selective Inference for Latent Block Models

no code implementations 27 May 2020 Chihiro Watanabe, Taiji Suzuki

In this case, it becomes crucial to account for the selective bias in the block structure, that is, the fact that the block structure is selected by the clustering algorithm from all possible cluster memberships based on some criterion.

Model Selection

Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint

no code implementations ICLR 2020 Jimmy Ba, Murat Erdogdu, Taiji Suzuki, Denny Wu, Tianzong Zhang

This paper investigates the generalization properties of two-layer neural networks in high dimensions, i.e., when the number of samples $n$, features $d$, and neurons $h$ tend to infinity at the same rate.

Inductive Bias

Meta Cyclical Annealing Schedule: A Simple Approach to Avoiding Meta-Amortization Error

no code implementations 4 Mar 2020 Yusuke Hayashi, Taiji Suzuki

To address this challenge, we design a novel meta-regularization objective using {\it cyclical annealing schedule} and {\it maximum mean discrepancy} (MMD) criterion.
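
The MMD criterion mentioned here can be estimated directly from samples. A minimal sketch of the biased (V-statistic) squared-MMD estimate with an RBF kernel (the bandwidth, sample sizes, and toy distributions are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)

def mmd2(X, Y, h=1.0):
    # Biased (V-statistic) estimate of squared MMD with an RBF kernel:
    # mean k(x, x') + mean k(y, y') - 2 mean k(x, y).
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * h**2))
    return gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean()

X = rng.standard_normal((200, 2))
Y_same = rng.standard_normal((200, 2))           # same distribution as X
Y_shift = rng.standard_normal((200, 2)) + 2.0    # mean-shifted distribution
```

Samples from the shifted distribution yield a much larger MMD estimate than fresh samples from the same distribution, which is what makes MMD usable as a regularization criterion.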

Few-Shot Learning

Dimension-free convergence rates for gradient Langevin dynamics in RKHS

no code implementations 29 Feb 2020 Boris Muzellec, Kanji Sato, Mathurin Massias, Taiji Suzuki

In this work, we provide a convergence analysis of GLD and SGLD when the optimization space is an infinite dimensional Hilbert space.

Understanding Generalization in Deep Learning via Tensor Methods

no code implementations 14 Jan 2020 Jingling Li, Yanchao Sun, Jiahao Su, Taiji Suzuki, Furong Huang

Recently proposed complexity measures have provided insights to understanding the generalizability in neural networks from perspectives of PAC-Bayes, robustness, overparametrization, compression and so on.

Domain Adaptation Regularization for Spectral Pruning

no code implementations 26 Dec 2019 Laurent Dillard, Yosuke Shinya, Taiji Suzuki

We also show that our method outperforms an existing compression method studied in the DA setting by a large margin for high compression rates.

Computer Vision Domain Adaptation +1

Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features

no code implementations 13 Nov 2019 Shingo Yashima, Atsushi Nitanda, Taiji Suzuki

To address this problem, sketching and stochastic gradient methods are the most commonly used techniques to derive efficient large-scale learning algorithms.

Classification General Classification

Deep learning is adaptive to intrinsic dimensionality of model smoothness in anisotropic Besov space

no code implementations NeurIPS 2021 Taiji Suzuki, Atsushi Nitanda

The results show that deep learning has better dependence on the input dimensionality if the target function possesses anisotropic smoothness, and it achieves an adaptive rate for functions with spatially inhomogeneous smoothness.

Towards Characterizing the High-dimensional Bias of Kernel-based Particle Inference Algorithms

no code implementations AABI Symposium 2019 Jimmy Ba, Murat A. Erdogdu, Marzyeh Ghassemi, Taiji Suzuki, Shengyang Sun, Denny Wu, Tianzong Zhang

Particle-based inference algorithms are a promising approach to efficiently generating samples from an intractable target distribution by iteratively updating a set of particles.

Scalable Deep Neural Networks via Low-Rank Matrix Factorization

no code implementations 25 Sep 2019 Atsushi Yaguchi, Taiji Suzuki, Shuhei Nitta, Yukinobu Sakata, Akiyuki Tanizawa

Compressing deep neural networks (DNNs) is important for real-world applications operating on resource-constrained devices.

Image Classification

Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network

no code implementations ICLR 2020 Taiji Suzuki, Hiroshi Abe, Tomoaki Nishimura

However, the compression based bound can be applied only to a compressed network, and it is not applicable to the non-compressed original network.

Learning Theory

Understanding the Effects of Pre-Training for Object Detectors via Eigenspectrum

no code implementations 9 Sep 2019 Yosuke Shinya, Edgar Simo-Serra, Taiji Suzuki

Furthermore, we propose a method for automatically determining the widths (the numbers of channels) of object detectors based on the eigenspectrum.

Image Classification object-detection +1

Gradient Noise Convolution (GNC): Smoothing Loss Function for Distributed Large-Batch SGD

no code implementations 26 Jun 2019 Kosuke Haruki, Taiji Suzuki, Yohei Hamakawa, Takeshi Toda, Ryuji Sakai, Masahiro Ozawa, Mitsuhiro Kimura

Large-batch stochastic gradient descent (SGD) is widely used for training in distributed deep learning because of its training-time efficiency; however, extremely large-batch SGD leads to poor generalization and easily converges to sharp minima, which prevents naive large-scale data-parallel SGD (DP-SGD) from converging to good minima.

Goodness-of-fit Test for Latent Block Models

no code implementations 10 Jun 2019 Chihiro Watanabe, Taiji Suzuki

Latent block models are used for probabilistic biclustering, which is shown to be an effective method for analyzing various relational data sets.

Accelerated Sparsified SGD with Error Feedback

no code implementations 29 May 2019 Tomoya Murata, Taiji Suzuki

Several works have shown that the {\it sparsified} stochastic gradient descent method (SGD) with {\it error feedback} asymptotically achieves the same rate as (non-sparsified) parallel SGD.
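
The generic sparsified-SGD-with-error-feedback scheme (not the accelerated variant proposed in the paper) keeps the discarded coordinates in a residual and adds them back before the next compression. A minimal sketch on a toy quadratic (the dimensions, sparsity level, and step size are illustrative assumptions):

```python
import numpy as np

def topk(v, k):
    # Keep only the k largest-magnitude coordinates; zero out the rest.
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

# Error-feedback sparsified gradient descent on f(x) = ||x - 1||^2 / 2.
rng = np.random.default_rng(7)
d, k, eta = 50, 5, 0.05
x = rng.standard_normal(d)
e = np.zeros(d)                      # accumulated compression error (the "feedback")
for _ in range(2000):
    g = x - 1.0                      # full gradient
    p = topk(e + eta * g, k)         # transmit only k coordinates per round
    e = e + eta * g - p              # keep the discarded part for later rounds
    x = x - p
```

Despite transmitting only 10% of the coordinates per round, the iterate still converges to the optimum x = 1, which is the point of the error-feedback mechanism.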

Distributed Optimization

Graph Neural Networks Exponentially Lose Expressive Power for Node Classification

1 code implementation ICLR 2020 Kenta Oono, Taiji Suzuki

We show that when the Erdős–Rényi graph is sufficiently dense and large, a broad range of GCNs on it suffers from the "information loss" in the limit of infinite layers with high probability.
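
The "information loss" phenomenon can be observed with plain feature propagation on a dense Erdős–Rényi graph: repeated multiplication by the normalized adjacency matrix drives node features toward a consensus. A minimal sketch (graph size, density, and the consensus measure are illustrative assumptions; layer weights and nonlinearities are omitted):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, d = 200, 0.2, 8
A = (rng.random((n, n)) < p).astype(float)
A = np.triu(A, 1); A = A + A.T                 # symmetric Erdos-Renyi adjacency
A_hat = A + np.eye(n)                          # add self-loops (GCN convention)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(1)))
P = D_inv_sqrt @ A_hat @ D_inv_sqrt            # normalized propagation matrix

X = rng.standard_normal((n, d))
def dist_to_consensus(X):
    # How far node features are from being identical across nodes.
    return np.linalg.norm(X - X.mean(axis=0))

d0 = dist_to_consensus(X)
for _ in range(20):                            # 20 rounds of propagation only
    X = P @ X
d20 = dist_to_consensus(X)
```

After 20 rounds the features are nearly constant across nodes, so node-distinguishing information is essentially gone.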

General Classification Node Classification

Gradient Descent can Learn Less Over-parameterized Two-layer Neural Networks on Classification Problems

no code implementations 23 May 2019 Atsushi Nitanda, Geoffrey Chinot, Taiji Suzuki

Most studies, with a few exceptions, have focused on regression problems with the squared loss, and the importance of the positivity of the neural tangent kernel has been pointed out.

General Classification Generalization Bounds

On the minimax optimality and superiority of deep neural network learning over sparse parameter spaces

no code implementations 22 May 2019 Satoshi Hayakawa, Taiji Suzuki

Whereas existing theoretical studies of deep learning have been based mainly on mathematical theories of well-known function classes such as Hölder and Besov classes, we focus on function classes with discontinuity and sparsity, which are those naturally assumed in practice.

Approximation and non-parametric estimation of ResNet-type convolutional neural networks via block-sparse fully-connected neural networks

no code implementations ICLR 2019 Kenta Oono, Taiji Suzuki

We develop new approximation and statistical learning theories of convolutional neural networks (CNNs) via the ResNet-type structure where the channel size, filter size, and width are fixed.

Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks

no code implementations 24 Mar 2019 Kenta Oono, Taiji Suzuki

The key idea is that we can replicate the learning ability of fully-connected neural networks (FNNs) with tailored CNNs, as long as the FNNs have \textit{block-sparse} structures.

Adam Induces Implicit Weight Sparsity in Rectifier Neural Networks

no code implementations 19 Dec 2018 Atsushi Yaguchi, Taiji Suzuki, Wataru Asano, Shuhei Nitta, Yukinobu Sakata, Akiyuki Tanizawa

In recent years, deep neural networks (DNNs) have been applied to various machine learning tasks, including image recognition, speech recognition, and machine translation.

Machine Translation speech-recognition +2

Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: optimal rate and curse of dimensionality

no code implementations ICLR 2019 Taiji Suzuki

In addition to this, it is shown that deep learning can avoid the curse of dimensionality if the target function is in a mixed smooth Besov space.

Natural Language Processing

Sample Efficient Stochastic Gradient Iterative Hard Thresholding Method for Stochastic Sparse Linear Regression with Limited Attribute Observation

no code implementations NeurIPS 2018 Tomoya Murata, Taiji Suzuki

We develop new stochastic gradient methods for efficiently solving sparse linear regression in a partial attribute observation setting, where learners are only allowed to observe a fixed number of actively chosen attributes per example at training and prediction times.

Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors

no code implementations 14 Jun 2018 Atsushi Nitanda, Taiji Suzuki

In this paper, we show an exponential convergence of the expected classification error in the final phase of the stochastic gradient descent for a wide class of differentiable convex loss functions under similar assumptions.

Classification General Classification

Cross-domain Recommendation via Deep Domain Adaptation

no code implementations 8 Mar 2018 Heishiro Kanagawa, Hayato Kobayashi, Nobuyuki Shimizu, Yukihiro Tagami, Taiji Suzuki

The behavior of users of certain services can provide clues for inferring their preferences and may be used to make recommendations for other services they have never used.

Collaborative Filtering Denoising +2

Functional Gradient Boosting based on Residual Network Perception

no code implementations ICML 2018 Atsushi Nitanda, Taiji Suzuki

Residual Networks (ResNets) have become state-of-the-art models in deep learning and several theoretical studies have been devoted to understanding why ResNet works so well.

Gradient Layer: Enhancing the Convergence of Adversarial Training for Generative Models

no code implementations 7 Jan 2018 Atsushi Nitanda, Taiji Suzuki

In this paper, this phenomenon is explained from the functional gradient method perspective of the gradient layer.

Independently Interpretable Lasso: A New Regularizer for Sparse Regression with Uncorrelated Variables

no code implementations 6 Nov 2017 Masaaki Takada, Taiji Suzuki, Hironori Fujisawa

However, one of the biggest issues in sparse regularization is that its performance is quite sensitive to correlations between features.

Fast learning rate of deep learning via a kernel perspective

no code implementations 29 May 2017 Taiji Suzuki

Our point of view is to deal with the ordinary finite dimensional deep neural network as a finite approximation of the infinite dimensional one.

Trimmed Density Ratio Estimation

1 code implementation NeurIPS 2017 Song Liu, Akiko Takeda, Taiji Suzuki, Kenji Fukumizu

Density ratio estimation is a vital tool in both the machine learning and statistics communities.

Density Ratio Estimation

Doubly Accelerated Stochastic Variance Reduced Dual Averaging Method for Regularized Empirical Risk Minimization

no code implementations NeurIPS 2017 Tomoya Murata, Taiji Suzuki

In this paper, we develop a new accelerated stochastic gradient method for efficiently solving the convex regularized empirical risk minimization problem in mini-batch settings.

Learning Sparse Structural Changes in High-dimensional Markov Networks: A Review on Methodologies and Theories

no code implementations 6 Jan 2017 Song Liu, Kenji Fukumizu, Taiji Suzuki

Recent years have seen increasing interest in learning the sparse \emph{changes} in Markov Networks.

Minimax Optimal Alternating Minimization for Kernel Nonparametric Tensor Learning

no code implementations NeurIPS 2016 Taiji Suzuki, Heishiro Kanagawa, Hayato Kobayashi, Nobuyuki Shimizu, Yukihiro Tagami

We investigate the statistical performance and computational efficiency of the alternating minimization procedure for nonparametric tensor learning.

Stochastic dual averaging methods using variance reduction techniques for regularized empirical risk minimization problems

no code implementations 8 Mar 2016 Tomoya Murata, Taiji Suzuki

We consider a composite convex minimization problem associated with regularized empirical risk minimization, which often arises in machine learning.

Structure Learning of Partitioned Markov Networks

no code implementations 2 Apr 2015 Song Liu, Taiji Suzuki, Masashi Sugiyama, Kenji Fukumizu

We learn the structure of a Markov Network between two groups of random variables from joint observations.

Time Series

Spectral norm of random tensors

no code implementations 7 Jul 2014 Ryota Tomioka, Taiji Suzuki

We show that the spectral norm of a random $n_1\times n_2\times \cdots \times n_K$ tensor (or higher-order array) scales as $O\left(\sqrt{(\sum_{k=1}^{K}n_k)\log(K)}\right)$ under some sub-Gaussian assumption on the entries.
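
For $K = 2$ the tensor spectral norm reduces to the matrix operator norm, where the classical result for Gaussian entries is $\|M\| \approx \sqrt{n_1} + \sqrt{n_2} \le \sqrt{2(n_1 + n_2)}$, consistent with the stated scaling. A quick numerical sanity check (the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

# For K = 2 the tensor spectral norm is the usual matrix operator norm, so the
# O(sqrt((n1 + n2) log K)) scaling can be checked against the classical
# behavior s_max ~ sqrt(n1) + sqrt(n2) for Gaussian (sub-Gaussian) entries.
ratios = []
for n1, n2 in [(50, 80), (200, 300), (500, 700)]:
    M = rng.standard_normal((n1, n2))
    s = np.linalg.norm(M, 2)               # largest singular value
    ratios.append(s / np.sqrt(n1 + n2))    # stays bounded as sizes grow
```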

Stochastic Dual Coordinate Ascent with Alternating Direction Multiplier Method

no code implementations 4 Nov 2013 Taiji Suzuki

We propose a new stochastic dual coordinate ascent technique that can be applied to a wide range of regularized learning problems.

Convex Tensor Decomposition via Structured Schatten Norm Regularization

no code implementations NeurIPS 2013 Ryota Tomioka, Taiji Suzuki

We discuss structured Schatten norms for tensor decomposition that includes two recently proposed norms ("overlapped" and "latent") for convex-optimization-based tensor decomposition, and connect tensor decomposition with wider literature on structured sparsity.

Tensor Decomposition

Density-Difference Estimation

no code implementations NeurIPS 2012 Masashi Sugiyama, Takafumi Kanamori, Taiji Suzuki, Marthinus D. Plessis, Song Liu, Ichiro Takeuchi

A naive approach is a two-step procedure of first estimating two densities separately and then computing their difference.

Change Point Detection

Fast learning rate of multiple kernel learning: Trade-off between sparsity and smoothness

no code implementations 2 Mar 2012 Taiji Suzuki, Masashi Sugiyama

If the ground truth is smooth, we show a faster convergence rate for elastic-net regularization under fewer conditions than $\ell_1$-regularization; otherwise, a faster convergence rate for $\ell_1$-regularization is shown.

Relative Density-Ratio Estimation for Robust Distribution Comparison

no code implementations NeurIPS 2011 Makoto Yamada, Taiji Suzuki, Takafumi Kanamori, Hirotaka Hachiya, Masashi Sugiyama

Divergence estimators based on direct approximation of density-ratios without going through separate approximation of numerator and denominator densities have been successfully applied to machine learning tasks that involve distribution comparison such as outlier detection, transfer learning, and two-sample homogeneity test.

Density Ratio Estimation Outlier Detection +1

Unifying Framework for Fast Learning Rate of Non-Sparse Multiple Kernel Learning

no code implementations NeurIPS 2011 Taiji Suzuki

Finally, we show that, when the complexities of the candidate reproducing kernel Hilbert spaces are inhomogeneous, dense-type regularization achieves a better learning rate than sparse $\ell_1$-regularization.

Condition Number Analysis of Kernel-based Density Ratio Estimation

1 code implementation 15 Dec 2009 Takafumi Kanamori, Taiji Suzuki, Masashi Sugiyama

We show that the kernel least-squares method has a smaller condition number than a version of kernel mean matching and other M-estimators, implying that the kernel least-squares method has preferable numerical properties.

Density Ratio Estimation feature selection +1
