no code implementations • NeurIPS 2007 • Tsuyoshi Kato, Hisashi Kashima, Masashi Sugiyama, Kiyoshi Asai
In this paper, we propose a novel MTL algorithm that can overcome these problems.
no code implementations • NeurIPS 2007 • Masashi Sugiyama, Shinichi Nakajima, Hisashi Kashima, Paul V. Buenau, Motoaki Kawanabe
In this paper, we propose a direct importance estimation method that does not require the input density estimates.
no code implementations • NeurIPS 2008 • Takafumi Kanamori, Shohei Hido, Masashi Sugiyama
We address the problem of estimating the ratio of two probability density functions (a.k.a. the importance).
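This entry concerns direct density-ratio estimation. A minimal numpy sketch of a least-squares kernel fit in the spirit of this line of work (the kernel centers, bandwidth, and regularization strength are illustrative choices, not the paper's):

```python
import numpy as np

def fit_ratio(x_nu, x_de, centers, sigma=1.0, lam=0.1):
    """Least-squares fit of r(x) ~ p_nu(x) / p_de(x) as a kernel model
    r(x) = sum_l alpha_l k(x, c_l), solved in closed form as
    alpha = (H + lam * I)^{-1} h."""
    def design(x):  # Gaussian kernel design matrix, shape (n, n_centers)
        d2 = (x[:, None] - centers[None, :]) ** 2
        return np.exp(-d2 / (2.0 * sigma**2))
    Phi_de, Phi_nu = design(x_de), design(x_nu)
    H = Phi_de.T @ Phi_de / len(x_de)   # second moment under the denominator
    h = Phi_nu.mean(axis=0)             # first moment under the numerator
    alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)
    return lambda x: np.clip(design(x) @ alpha, 0.0, None)

rng = np.random.default_rng(0)
x_nu = rng.normal(0.0, 1.0, 500)   # numerator samples
x_de = rng.normal(0.0, 1.0, 500)   # denominator samples (same density here)
ratio = fit_ratio(x_nu, x_de, centers=x_nu[:100])
r_de = ratio(x_de)                 # should hover around 1 when p_nu = p_de
```

The closed-form solve is what makes least-squares fitting attractive: no numerator or denominator density is ever estimated separately.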
1 code implementation • 15 Dec 2009 • Takafumi Kanamori, Taiji Suzuki, Masashi Sugiyama
We show that the kernel least-squares method has a smaller condition number than a version of kernel mean matching and other M-estimators, implying that the kernel least-squares method has preferable numerical properties.
no code implementations • NeurIPS 2010 • Shinichi Nakajima, Masashi Sugiyama, Ryota Tomioka
Bayesian methods of matrix factorization (MF) have been actively explored recently as promising alternatives to classical singular value decomposition.
no code implementations • NeurIPS 2011 • Shinichi Nakajima, Masashi Sugiyama, S. D. Babacan
A recent study on fully-observed VBMF showed that, under a stronger assumption that the two factorized matrices are column-wise independent, the global optimal solution can be analytically computed.
no code implementations • NeurIPS 2011 • Makoto Yamada, Taiji Suzuki, Takafumi Kanamori, Hirotaka Hachiya, Masashi Sugiyama
Divergence estimators based on direct approximation of density-ratios without going through separate approximation of numerator and denominator densities have been successfully applied to machine learning tasks that involve distribution comparison such as outlier detection, transfer learning, and two-sample homogeneity test.
no code implementations • NeurIPS 2011 • Ichiro Takeuchi, Masashi Sugiyama
We consider feature selection and weighting for nearest neighbor classifiers.
no code implementations • NeurIPS 2011 • Tingting Zhao, Hirotaka Hachiya, Gang Niu, Masashi Sugiyama
We also theoretically show that PGPE with the optimal baseline is preferable to REINFORCE with the optimal baseline in terms of the variance of gradient estimates.
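The variance claim can be illustrated numerically with a likelihood-ratio gradient estimator and the optimal constant baseline; the toy return function and Gaussian parameter distribution below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma = 0.0, 1.0
w = rng.normal(theta, sigma, 100000)     # parameters sampled around theta
ret = 5.0 + w                            # toy return with a large constant offset
score = (w - theta) / sigma**2           # d/dtheta log N(w; theta, sigma^2)

g_plain = ret * score                    # per-sample gradient estimates, no baseline
b_opt = np.mean(ret * score**2) / np.mean(score**2)  # optimal constant baseline
g_base = (ret - b_opt) * score           # baseline-subtracted estimates
# Both estimate d/dtheta E[ret] = 1; the baseline only shrinks the variance.
```

Subtracting any constant baseline leaves the estimator unbiased (the score has zero mean), so the optimal choice purely reduces variance.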
no code implementations • 2 Feb 2012 • Makoto Yamada, Wittawat Jitkrittum, Leonid Sigal, Eric P. Xing, Masashi Sugiyama
We first show that, with particular choices of kernel functions, non-redundant features with strong statistical dependence on output values can be found in terms of kernel-based independence measures.
no code implementations • 2 Mar 2012 • Taiji Suzuki, Masashi Sugiyama
If the ground truth is smooth, we show a faster convergence rate for the elastic-net regularization under weaker conditions than $\ell_1$-regularization; otherwise, a faster convergence rate for the $\ell_1$-regularization is shown.
1 code implementation • 2 Mar 2012 • Song Liu, Makoto Yamada, Nigel Collier, Masashi Sugiyama
The objective of change-point detection is to discover abrupt property changes lying behind time-series data.
1 code implementation • 18 Jun 2012 • Ning Xie, Hirotaka Hachiya, Masashi Sugiyama
Oriental ink painting, called Sumi-e, is one of the most appealing painting styles and has attracted artists around the world.
no code implementations • NeurIPS 2012 • Shinichi Nakajima, Ryota Tomioka, Masashi Sugiyama, S. D. Babacan
The variational Bayesian (VB) approach is one of the best tractable approximations to Bayesian estimation, and it has been demonstrated to perform well in many applications.
no code implementations • NeurIPS 2012 • Masashi Sugiyama, Takafumi Kanamori, Taiji Suzuki, Marthinus D. Plessis, Song Liu, Ichiro Takeuchi
A naive approach is a two-step procedure of first estimating two densities separately and then computing their difference.
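The naive two-step procedure mentioned here can be sketched with hand-rolled kernel density estimates (the bandwidth and toy distributions are illustrative assumptions):

```python
import numpy as np

def kde(samples, bandwidth=0.3):
    """Hand-rolled Gaussian kernel density estimate."""
    def pdf(x):
        d2 = (x[:, None] - samples[None, :]) ** 2
        k = np.exp(-d2 / (2.0 * bandwidth**2))
        return k.mean(axis=1) / (bandwidth * np.sqrt(2.0 * np.pi))
    return pdf

rng = np.random.default_rng(0)
p_samples = rng.normal(-1.0, 1.0, 2000)
q_samples = rng.normal(+1.0, 1.0, 2000)
grid = np.linspace(-4.0, 4.0, 161)
diff = kde(p_samples)(grid) - kde(q_samples)(grid)  # two-step estimate of p - q
```

Each density is estimated separately and then subtracted; the paper's point is that estimating the difference directly avoids the compounded errors of this two-step route.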
no code implementations • 25 Apr 2013 • Song Liu, John A. Quinn, Michael U. Gutmann, Taiji Suzuki, Masashi Sugiyama
We propose a new method for detecting changes in Markov network structure between two sets of samples.
no code implementations • 30 Apr 2013 • Daniele Calandriello, Gang Niu, Masashi Sugiyama
Semi-supervised clustering aims to introduce prior knowledge in the decision process of a clustering algorithm.
no code implementations • 1 May 2013 • Marthinus Christoffel du Plessis, Masashi Sugiyama
We consider the unsupervised learning problem of assigning labels to unlabeled data.
no code implementations • 19 Jul 2013 • Syogo Mori, Voot Tangkaratt, Tingting Zhao, Jun Morimoto, Masashi Sugiyama
The model-free RL approach directly learns the policy based on data samples.
no code implementations • NeurIPS 2013 • Ichiro Takeuchi, Tatsuya Hongo, Masashi Sugiyama, Shinichi Nakajima
We introduce a novel formulation of multi-task learning (MTL) called parametric task learning (PTL) that can systematically handle infinitely many tasks parameterized by a continuous parameter.
no code implementations • NeurIPS 2013 • Shinichi Nakajima, Akiko Takeda, S. Derin Babacan, Masashi Sugiyama, Ichiro Takeuchi
However, Bayesian learning is often obstructed by computational difficulty: rigorous Bayesian learning is intractable in many models, and its variational Bayesian (VB) approximation is prone to local minima.
3 code implementations • 30 Jan 2014 • David Venuto, Toby Dylan Hocking, Lakjaree Sphanurattana, Masashi Sugiyama
In ranking problems, the goal is to learn a ranking function from labeled pairs of input points.
no code implementations • 3 Feb 2014 • Gang Niu, Bo Dai, Marthinus Christoffel du Plessis, Masashi Sugiyama
Given a hypothesis space, the large volume principle by Vladimir Vapnik prioritizes equivalence classes according to their volume in the hypothesis space.
no code implementations • 20 Apr 2014 • Hiroaki Sasaki, Aapo Hyvärinen, Masashi Sugiyama
We then develop a mean-shift-like fixed-point algorithm to find the modes of the density for clustering.
no code implementations • 28 Apr 2014 • Voot Tangkaratt, Ning Xie, Masashi Sugiyama
In such a case, estimating the conditional density itself is preferable, but conditional density estimation (CDE) is challenging in high-dimensional space.
no code implementations • 30 Jun 2014 • Hiroaki Sasaki, Yung-Kyun Noh, Masashi Sugiyama
Estimation of density derivatives is a versatile tool in statistical data analysis.
no code implementations • 2 Jul 2014 • Song Liu, Taiji Suzuki, Raissa Relator, Jun Sese, Masashi Sugiyama, Kenji Fukumizu
We study the problem of learning sparse structure changes between two Markov networks $P$ and $Q$.
no code implementations • NeurIPS 2014 • Kishan Wimalawarne, Masashi Sugiyama, Ryota Tomioka
We study a multitask learning problem in which each task is parametrized by a weight vector and indexed by a pair of indices, which can be, e.g., (consumer, time).
no code implementations • NeurIPS 2014 • Marthinus C. Du Plessis, Gang Niu, Masashi Sugiyama
We next analyze the excess risk when the class prior is estimated from data, and show that the classification accuracy is not sensitive to class prior estimation if the unlabeled data is dominated by the positive data (this is naturally satisfied in inlier-based outlier detection because inliers are dominant in the unlabeled dataset).
no code implementations • NeurIPS 2014 • Shinichi Nakajima, Issei Sato, Masashi Sugiyama, Kazuho Watanabe, Hiroko Kobayashi
Latent Dirichlet allocation (LDA) is a popular generative model of various objects such as texts and images, where an object is expressed as a mixture of latent topics.
no code implementations • 12 Feb 2015 • Florian Yger, Masashi Sugiyama
Metric learning has been shown to be highly effective to improve the performance of nearest neighbor classification.
no code implementations • 2 Apr 2015 • Song Liu, Taiji Suzuki, Masashi Sugiyama, Kenji Fukumizu
We learn the structure of a Markov Network between two groups of random variables from joint observations.
no code implementations • 12 Jul 2015 • Shinya Suzumura, Kohei Ogawa, Masashi Sugiyama, Masayuki Karasuyama, Ichiro Takeuchi
An advantage of our homotopy approach is that it can be interpreted as simulated annealing, a common approach for finding a good local optimal solution in non-convex optimization problems.
no code implementations • 21 Jul 2015 • Hao Zhang, Yao Ma, Masashi Sugiyama
We consider a task assignment problem in crowdsourcing, which is aimed at collecting as many reliable labels as possible within a limited budget.
no code implementations • 26 Jul 2015 • Hao Zhang, Masashi Sugiyama
Task selection (picking an appropriate labeling task) and worker selection (assigning the labeling task to a suitable worker) are two major challenges in task assignment for crowdsourcing.
no code implementations • 1 Aug 2015 • Ikko Yamane, Hiroaki Sasaki, Masashi Sugiyama
Log-density gradient estimation is a fundamental statistical problem and possesses various practical applications such as clustering and measuring non-Gaussianity.
no code implementations • 5 Aug 2015 • Voot Tangkaratt, Hiroaki Sasaki, Masashi Sugiyama
On the other hand, quadratic MI (QMI) is a variant of MI based on the $L_2$ distance which is more robust against outliers than the KL divergence, and a computationally efficient method to estimate QMI from data, called least-squares QMI (LSQMI), has been proposed recently.
no code implementations • 6 Sep 2015 • Kishan Wimalawarne, Ryota Tomioka, Masashi Sugiyama
We theoretically and experimentally investigate tensor-based regression and classification.
no code implementations • 15 Oct 2015 • Yao Ma, Hao Zhang, Masashi Sugiyama
The online Markov decision process (MDP) is a generalization of the classical Markov decision process that incorporates changing reward functions.
no code implementations • 31 Oct 2015 • Mohammad Emtiyaz Khan, Reza Babanezhad, Wu Lin, Mark Schmidt, Masashi Sugiyama
We also give a convergence-rate analysis of our method and many other previous methods which exploit the geometry of the space.
1 code implementation • 15 Dec 2015 • Shinichi Nakajima, Ryota Tomioka, Masashi Sugiyama, S. Derin Babacan
In this paper, we clarify the behavior of VB learning in probabilistic PCA (or fully-observed matrix factorization).
no code implementations • 28 Jan 2016 • Hiroaki Sasaki, Gang Niu, Masashi Sugiyama
Non-Gaussian component analysis (NGCA) is aimed at identifying a linear subspace such that the projected data follows a non-Gaussian distribution.
1 code implementation • 3 Mar 2016 • Hiroaki Shiino, Hiroaki Sasaki, Gang Niu, Masashi Sugiyama
Non-Gaussian component analysis (NGCA) is an unsupervised linear dimension reduction method that extracts low-dimensional non-Gaussian "signals" from high-dimensional data contaminated with Gaussian noise.
no code implementations • NeurIPS 2016 • Gang Niu, Marthinus Christoffel du Plessis, Tomoya Sakai, Yao Ma, Masashi Sugiyama
In PU learning, a binary classifier is trained from positive (P) and unlabeled (U) data without negative (N) data.
no code implementations • 7 Apr 2016 • Jie Luo, Karteek Popuri, Dana Cobzas, Hongyi Ding, Masashi Sugiyama
Meanwhile, summary statistics of the posterior are employed to evaluate the registration uncertainty, that is, the trustworthiness of the registered image.
no code implementations • ICML 2017 • Tomoya Sakai, Marthinus Christoffel du Plessis, Gang Niu, Masashi Sugiyama
Most of the semi-supervised classification methods developed so far use unlabeled data for regularization purposes under particular distributional assumptions such as the cluster assumption.
no code implementations • 25 May 2016 • Inbal Horev, Florian Yger, Masashi Sugiyama
The classic SSA method finds a matrix that projects the data onto a stationary subspace by optimizing a cost function based on a matrix divergence.
no code implementations • 5 Nov 2016 • Marthinus C. du Plessis, Gang Niu, Masashi Sugiyama
Under the assumption that an additional labeled dataset is available, the class prior can be estimated by fitting a mixture of class-wise data distributions to the unlabeled data distribution.
no code implementations • ICML 2018 • Weihua Hu, Gang Niu, Issei Sato, Masashi Sugiyama
Since the DRSL is explicitly formulated for a distribution shift scenario, we naturally expect it to give a robust classifier that can aggressively handle shifted distributions.
no code implementations • 10 Nov 2016 • Voot Tangkaratt, Herke van Hoof, Simone Parisi, Gerhard Neumann, Jan Peters, Masashi Sugiyama
A naive application of unsupervised dimensionality reduction methods to the context variables, such as principal component analysis, is insufficient as task-relevant input may be ignored.
2 code implementations • ICML 2017 • Weihua Hu, Takeru Miyato, Seiya Tokui, Eiichi Matsumoto, Masashi Sugiyama
Learning discrete representations of data is a central machine learning task because of the compactness of the representations and ease of interpretation.
1 code implementation • NeurIPS 2017 • Ryuichi Kiryo, Gang Niu, Marthinus C. Du Plessis, Masashi Sugiyama
From only positive (P) and unlabeled (U) data, a binary classifier could be trained with PU learning, in which the state of the art is unbiased PU learning.
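The unbiased PU risk and its non-negative correction can be sketched for a fixed scoring function under the zero-one loss (the synthetic data and threshold-at-zero classifier are illustrative assumptions):

```python
import numpy as np

def pu_risks(scores_p, scores_u, pi):
    """Unbiased PU risk and its non-negative (nnPU-style) correction
    for the zero-one loss of the classifier sign(score); pi = p(y=+1)."""
    r_p_pos = (scores_p <= 0).mean()     # positives predicted negative
    r_p_neg = (scores_p > 0).mean()      # positives predicted positive
    r_u_neg = (scores_u > 0).mean()      # unlabeled predicted positive
    neg_part = r_u_neg - pi * r_p_neg    # estimates (1 - pi) * risk on N
    unbiased = pi * r_p_pos + neg_part   # can go negative with flexible models
    corrected = pi * r_p_pos + max(0.0, neg_part)
    return unbiased, corrected

rng = np.random.default_rng(0)
pi = 0.5
scores_p = rng.normal(2.0, 1.0, 1000)           # positives score high
scores_u = np.where(rng.random(2000) < pi,      # unlabeled = pi P + (1 - pi) N
                    rng.normal(2.0, 1.0, 2000),
                    rng.normal(-2.0, 1.0, 2000))
unbiased, corrected = pu_risks(scores_p, scores_u, pi)
```

The clamp in `corrected` is the key idea: the negative-class risk estimate is never allowed below zero, which prevents overfitting when the model is expressive.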
1 code implementation • 22 Apr 2017 • Han Bao, Tomoya Sakai, Issei Sato, Masashi Sugiyama
Multiple instance learning (MIL) is a variation of traditional supervised learning problems where data (referred to as bags) are composed of sub-elements (referred to as instances) and only bag labels are available.
no code implementations • 26 Apr 2017 • Jie Luo, Karteek Popuri, Dana Cobzas, Hongyi Ding, William M. Wells III, Masashi Sugiyama
Since the transformation is such an essential component of registration, most existing research conventionally quantifies the registration uncertainty, which is the confidence in the estimated spatial correspondences, by the transformation uncertainty.
no code implementations • 1 May 2017 • Zhenghang Cui, Issei Sato, Masashi Sugiyama
With the emergence and rapid development of social networks, a huge number of short texts have accumulated and need to be processed.
no code implementations • 4 May 2017 • Tomoya Sakai, Gang Niu, Masashi Sugiyama
Maximizing the area under the receiver operating characteristic curve (AUC) is a standard approach to imbalanced classification.
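AUC admits a pairwise interpretation: the probability that a randomly drawn positive outscores a randomly drawn negative. A minimal reference implementation:

```python
def auc(pos_scores, neg_scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))
```

Because every positive-negative pair counts equally, AUC is insensitive to class imbalance, which is why it is the standard metric in this setting.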
1 code implementation • 19 May 2017 • Hongyi Ding, Mohammad Emtiyaz Khan, Issei Sato, Masashi Sugiyama
We model the intensity of each sequence as an infinite mixture of latent functions, each of which is obtained using a function drawn from a Gaussian process.
1 code implementation • ICLR 2018 • Voot Tangkaratt, Abbas Abdolmaleki, Masashi Sugiyama
First, we show that GAC updates the guide actor by performing second-order optimization in the action space where the curvature matrix is based on the Hessians of the critic.
1 code implementation • NeurIPS 2017 • Takashi Ishida, Gang Niu, Weihua Hu, Masashi Sugiyama
Collecting complementary labels would be less laborious than collecting ordinary labels, since users do not have to carefully choose the correct class from a long list of candidate classes.
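As a toy illustration of learning from complementary labels, the sketch below trains a linear softmax model by pushing down the predicted probability of the complementary class; this simple surrogate and the synthetic data are illustrative assumptions, not the paper's unbiased risk estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
K, n = 3, 600
y = rng.integers(0, K, n)                         # true labels (never shown)
means = np.array([[0.0, 4.0], [4.0, 0.0], [-4.0, -4.0]])
x = means[y] + rng.normal(0.0, 1.0, (n, 2))
ybar = (y + rng.integers(1, K, n)) % K            # a class the example is NOT in

W = np.zeros((2, K))
for _ in range(1000):                             # minimize -log(1 - p_ybar)
    z = x @ W
    z -= z.max(axis=1, keepdims=True)             # numerically stable softmax
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    q = p[np.arange(n), ybar]                     # probability of the forbidden class
    grad_z = (q / (1.0 - q))[:, None] * (np.eye(K)[ybar] - p)
    W -= 0.1 * x.T @ grad_z / n

accuracy = (np.argmax(x @ W, axis=1) == y).mean()
```

With complementary labels drawn uniformly over the wrong classes, suppressing the forbidden class on average raises the true class, so accuracy rises well above chance without any ordinary labels.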
no code implementations • NeurIPS 2017 • Futoshi Futami, Issei Sato, Masashi Sugiyama
Exponential family distributions are highly useful in machine learning since their calculation can be performed efficiently through natural parameters.
no code implementations • 6 Jul 2017 • Hiroaki Sasaki, Takafumi Kanamori, Aapo Hyvärinen, Gang Niu, Masashi Sugiyama
Based on the proposed estimator, novel methods both for mode-seeking clustering and density ridge estimation are developed, and the respective convergence rates to the mode and ridge of the underlying density are also established.
no code implementations • 15 Oct 2017 • Tomoya Sakai, Gang Niu, Masashi Sugiyama
Recent advances in weakly supervised classification allow us to train a classifier only from positive and unlabeled (PU) data.
no code implementations • 16 Oct 2017 • Liyuan Xu, Junya Honda, Masashi Sugiyama
We propose the first fully-adaptive algorithm for pure exploration in linear bandits---the task of finding the arm with the largest expected reward, which depends linearly on an unknown parameter.
no code implementations • 17 Oct 2017 • Hideaki Kano, Junya Honda, Kentaro Sakamaki, Kentaro Matsuura, Atsuyoshi Nakamura, Masashi Sugiyama
We consider a novel stochastic multi-armed bandit problem called {\em good arm identification} (GAI), where a good arm is defined as an arm with expected reward greater than or equal to a given threshold.
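A simplified sketch of the GAI setting with Hoeffding confidence bounds (round-robin sampling; the stopping rule and constants are illustrative, not the paper's algorithm):

```python
import math
import random

def find_good_arm(pull, n_arms, threshold, delta=0.05, max_pulls=50000):
    """Return an arm whose lower confidence bound clears the threshold,
    dropping arms whose upper bound falls below it; None if budget runs out."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    active = list(range(n_arms))
    for t in range(max_pulls):
        arm = active[t % len(active)]
        sums[arm] += pull(arm)
        counts[arm] += 1
        mean = sums[arm] / counts[arm]
        radius = math.sqrt(math.log(2.0 * max_pulls / delta) / (2.0 * counts[arm]))
        if mean - radius >= threshold:
            return arm                      # confidently a good arm
        if mean + radius < threshold and len(active) > 1:
            active.remove(arm)              # confidently a bad arm
    return None

random.seed(0)
true_means = [0.2, 0.8, 0.4]
good = find_good_arm(lambda a: float(random.random() < true_means[a]), 3, 0.5)
```

Unlike best-arm identification, the procedure may stop as soon as any arm clears the threshold, without ranking the remaining arms.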
1 code implementation • 18 Oct 2017 • Futoshi Futami, Issei Sato, Masashi Sugiyama
In this paper, based on Zellner's optimization and variational formulation of Bayesian inference, we propose an outlier-robust pseudo-Bayesian variational method by replacing the Kullback-Leibler divergence used for data fitting with a robust divergence such as the beta- and gamma-divergences.
1 code implementation • NeurIPS 2018 • Takashi Ishida, Gang Niu, Masashi Sugiyama
Can we learn a binary classifier from only positive data, without any negative data or unlabeled data?
no code implementations • 28 Nov 2017 • Takayuki Osa, Masashi Sugiyama
Learning an optimal policy from a multi-modal reward function is a challenging problem in reinforcement learning (RL).
no code implementations • NeurIPS 2017 • Yung-Kyun Noh, Masashi Sugiyama, Kee-Eung Kim, Frank Park, Daniel D. Lee
This paper shows how metric learning can be used with Nadaraya-Watson (NW) kernel regression.
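Plain NW kernel regression, which the paper builds on, is a kernel-weighted average of the training targets; a minimal sketch (the Gaussian kernel and bandwidth are illustrative choices):

```python
import numpy as np

def nw_regress(x_train, y_train, x_query, bandwidth=0.5):
    """Nadaraya-Watson estimate: a kernel-weighted average of the targets."""
    d2 = (x_query[:, None] - x_train[None, :]) ** 2
    w = np.exp(-d2 / (2.0 * bandwidth**2))
    return (w @ y_train) / w.sum(axis=1)

x_train = np.linspace(0.0, 1.0, 50)
y_train = np.full(50, 3.0)                 # constant target is recovered exactly
pred = nw_regress(x_train, y_train, np.array([0.2, 0.7]))
```

Metric learning enters by reshaping the distance `d2`, which changes which neighbors dominate the weighted average.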
no code implementations • 12 Feb 2018 • Ryosuke Kamesawa, Issei Sato, Masashi Sugiyama
A state-of-the-art method of Gaussian process classification (GPC) with privileged information is GPC+, which incorporates privileged information into a noise term of the likelihood.
2 code implementations • ICML 2018 • Han Bao, Gang Niu, Masashi Sugiyama
Supervised learning needs a huge amount of labeled data, which can be a big bottleneck when there are privacy concerns or labeling costs are high.
2 code implementations • NeurIPS 2018 • Yusuke Tsuzuku, Issei Sato, Masashi Sugiyama
High sensitivity of neural networks against malicious perturbations on inputs causes security concerns.
1 code implementation • ICML 2018 • Hideaki Imamura, Issei Sato, Masashi Sugiyama
In this paper, we derive a minimax error rate under a more practical setting for a broader class of crowdsourcing models including the DS model as a special case.
no code implementations • 15 Feb 2018 • Sheng-Jun Huang, Miao Xu, Ming-Kun Xie, Masashi Sugiyama, Gang Niu, Songcan Chen
Feature missing is a serious problem in many applications, which may lead to low quality of training data and further significantly degrade the learning performance.
no code implementations • 12 Mar 2018 • Hongyi Ding, Young Lee, Issei Sato, Masashi Sugiyama
We present the first framework for Gaussian-process-modulated Poisson processes when the temporal data appear in the form of panel counts.
no code implementations • 13 Mar 2018 • Masayoshi Hayashi, Tomoya Sakai, Masashi Sugiyama
In this paper, motivated by a semi-supervised classification method recently proposed by Sakai et al. (2017), we develop a method for the BMC problem which can use all of positive, negative, and unobserved entries, by combining the risks of Davenport et al. (2014) and Hsieh et al. (2015).
no code implementations • 14 Mar 2018 • Jie Luo, Alireza Sedghi, Karteek Popuri, Dana Cobzas, Miaomiao Zhang, Frank Preiswerk, Matthew Toews, Alexandra Golby, Masashi Sugiyama, William M. Wells III, Sarah Frisken
For probabilistic image registration (PIR), the predominant way to quantify the registration uncertainty is using summary statistics of the distribution of transformation parameters.
1 code implementation • NeurIPS 2018 • Ikko Yamane, Florian Yger, Jamal Atif, Masashi Sugiyama
Uplift modeling is aimed at estimating the incremental impact of an action on an individual's behavior, which is useful in various application domains such as targeted marketing (advertisement campaigns) and personalized medicine (medical treatments).
no code implementations • 20 Mar 2018 • Jie Luo, Matt Toews, Ines Machado, Sarah Frisken, Miaomiao Zhang, Frank Preiswerk, Alireza Sedghi, Hongyi Ding, Steve Pieper, Polina Golland, Alexandra Golby, Masashi Sugiyama, William M. Wells III
Kernels of the GP are estimated by using variograms and a discrete grid search method.
5 code implementations • NeurIPS 2018 • Bo Han, Quanming Yao, Xingrui Yu, Gang Niu, Miao Xu, Weihua Hu, Ivor Tsang, Masashi Sugiyama
Deep learning with noisy labels is practically challenging, as the capacity of deep models is so high that they can totally memorize these noisy labels sooner or later during training.
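The core small-loss exchange step of this approach (Co-teaching) can be sketched independently of any particular network; the function and variable names here are illustrative:

```python
import numpy as np

def coteaching_select(loss_a, loss_b, forget_rate):
    """Each network keeps its small-loss samples and feeds them to its peer."""
    n_keep = int((1.0 - forget_rate) * len(loss_a))
    idx_from_a = np.argsort(loss_a)[:n_keep]  # a's clean-looking samples -> update b
    idx_from_b = np.argsort(loss_b)[:n_keep]  # b's clean-looking samples -> update a
    return idx_from_b, idx_from_a             # (train a on these, train b on these)

loss_a = np.array([0.1, 2.5, 0.2, 3.0, 0.3])
loss_b = np.array([0.2, 2.0, 3.5, 0.1, 0.4])
for_a, for_b = coteaching_select(loss_a, loss_b, forget_rate=0.4)
```

Cross-updating matters because each network filters its peer's data, so the two networks do not simply reinforce their own selection errors.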
no code implementations • 21 May 2018 • Futoshi Futami, Zhenghang Cui, Issei Sato, Masashi Sugiyama
Another example is the Stein points (SP) method, which minimizes kernelized Stein discrepancy directly.
2 code implementations • NeurIPS 2018 • Bo Han, Jiangchao Yao, Gang Niu, Mingyuan Zhou, Ivor Tsang, Ya zhang, Masashi Sugiyama
It is important to learn various types of classifiers given training data with noisy labels.
no code implementations • 23 May 2018 • Miao Xu, Gang Niu, Bo Han, Ivor W. Tsang, Zhi-Hua Zhou, Masashi Sugiyama
We consider a challenging multi-label classification problem where both the feature matrix $X$ and the label matrix $Y$ have missing entries.
no code implementations • NeurIPS 2018 • Motoya Ohnishi, Masahiro Yukawa, Mikael Johansson, Masashi Sugiyama
Motivated by the success of reinforcement learning (RL) for discrete-time tasks such as AlphaGo and Atari games, there has been a recent surge of interest in using RL for continuous-time control of physical systems (cf.
1 code implementation • ICLR 2019 • Nan Lu, Gang Niu, Aditya Krishna Menon, Masashi Sugiyama
In this paper, we study training arbitrary (from linear to deep) binary classifiers from only unlabeled (U) data by ERM.
no code implementations • 11 Sep 2018 • Seiichi Kuroki, Nontawat Charoenphakdee, Han Bao, Junya Honda, Issei Sato, Masashi Sugiyama
A previously proposed discrepancy that does not use the source domain labels requires high computational cost to estimate and may lead to a loose generalization error bound in the target domain.
no code implementations • 13 Sep 2018 • Takeshi Teshima, Miao Xu, Issei Sato, Masashi Sugiyama
On the other hand, matrix completion (MC) methods can recover a low-rank matrix from various information deficits by using the principle of low-rank completion.
no code implementations • 14 Sep 2018 • Liyuan Xu, Junya Honda, Masashi Sugiyama
We formulate and study a novel multi-armed bandit problem called the qualitative dueling bandit (QDB) problem, where an agent observes not numeric but qualitative feedback by pulling each arm.
no code implementations • 15 Sep 2018 • Masahiro Kato, Liyuan Xu, Gang Niu, Masashi Sugiyama
In this paper, we propose a novel unified approach to estimating the class-prior and training a classifier alternately.
no code implementations • 19 Sep 2018 • Nontawat Charoenphakdee, Masashi Sugiyama
Based on the analysis of the Bayes optimal classifier, we show that given a test class prior, PU classification under class prior shift is equivalent to PU classification with asymmetric error.
no code implementations • 27 Sep 2018 • Bo Han, Gang Niu, Jiangchao Yao, Xingrui Yu, Miao Xu, Ivor Tsang, Masashi Sugiyama
To handle these issues, by using the memorization effects of deep neural networks, we may train deep neural networks on the whole dataset only in the first few iterations.
no code implementations • 27 Sep 2018 • Voot Tangkaratt, Masashi Sugiyama
Imitation learning aims to learn an optimal policy from expert demonstrations and its recent combination with deep learning has shown impressive performance.
1 code implementation • ICML 2020 • Bo Han, Gang Niu, Xingrui Yu, Quanming Yao, Miao Xu, Ivor Tsang, Masashi Sugiyama
Given data with noisy labels, over-parameterized deep networks can gradually memorize the data, and fit everything in the end.
1 code implementation • ICLR 2019 • Yu-Guan Hsieh, Gang Niu, Masashi Sugiyama
In binary classification, there are situations where negative (N) data are too diverse to be fully labeled and we often resort to positive-unlabeled (PU) learning in these scenarios.
1 code implementation • Proceedings of the 36th International Conference on Machine Learning, 2019 • Takashi Ishida, Gang Niu, Aditya Krishna Menon, Masashi Sugiyama
In contrast to the standard classification paradigm where the true class is given to each training pattern, complementary-label learning only uses training patterns each equipped with a complementary label, which only specifies one of the classes that the pattern does not belong to.
no code implementations • 6 Dec 2018 • Si-An Chen, Voot Tangkaratt, Hsuan-Tien Lin, Masashi Sugiyama
In this work, we propose Active Reinforcement Learning with Demonstration (ARLD), a new framework to streamline RL in terms of demonstration efforts by allowing the RL agent to query for demonstration actively during training.
1 code implementation • ICLR 2019 • Takayuki Osa, Voot Tangkaratt, Masashi Sugiyama
However, identifying the hierarchical policy structure that enhances the performance of RL is not a trivial task.
3 code implementations • 14 Jan 2019 • Xingrui Yu, Bo Han, Jiangchao Yao, Gang Niu, Ivor W. Tsang, Masashi Sugiyama
Learning with noisy labels is one of the hottest problems in weakly-supervised learning.
no code implementations • ICML 2020 • Yusuke Tsuzuku, Issei Sato, Masashi Sugiyama
However, existing definitions of the flatness are known to be sensitive to the rescaling of parameters.
1 code implementation • 27 Jan 2019 • Nontawat Charoenphakdee, Jongyeong Lee, Masashi Sugiyama
This paper aims to provide a better understanding of a symmetric loss.
no code implementations • 27 Jan 2019 • Yueh-Hua Wu, Nontawat Charoenphakdee, Han Bao, Voot Tangkaratt, Masashi Sugiyama
Imitation learning (IL) aims to learn an optimal policy from demonstrations.
1 code implementation • 28 Jan 2019 • Yongchan Kwon, Wonyoung Kim, Masashi Sugiyama, Myunghee Cho Paik
We consider the problem of learning a binary classifier from only positive and unlabeled observations (called PU learning).
no code implementations • 29 Jan 2019 • Miao Xu, Bingcong Li, Gang Niu, Bo Han, Masashi Sugiyama
May there be a new sample selection method that can outperform the latest importance reweighting method in the deep learning age?
1 code implementation • NeurIPS 2019 • Chenri Ni, Nontawat Charoenphakdee, Junya Honda, Masashi Sugiyama
First, we consider an approach based on simultaneous training of a classifier and a rejector, which achieves the state-of-the-art performance in the binary case.
no code implementations • 30 Jan 2019 • Jongyeong Lee, Nontawat Charoenphakdee, Seiichi Kuroki, Masashi Sugiyama
Appropriately evaluating the discrepancy between domains is essential for the success of unsupervised domain adaptation.
no code implementations • 31 Jan 2019 • Taira Tsuchiya, Nontawat Charoenphakdee, Issei Sato, Masashi Sugiyama
We further provide an estimation error bound to show that our risk estimator is consistent.
no code implementations • 31 Jan 2019 • Christian J. Walder, Paul Roussel, Richard Nock, Cheng Soon Ong, Masashi Sugiyama
We introduce a family of pairwise stochastic gradient estimators for gradients of expectations, which are related to the log-derivative trick, but involve pairwise interactions between samples.
no code implementations • 4 Feb 2019 • Takuo Kaneko, Issei Sato, Masashi Sugiyama
We consider the problem of online multiclass classification with partial feedback, where an algorithm predicts a class for a new instance in each round and only receives its correctness.
no code implementations • 27 Feb 2019 • Yuko Kuroki, Liyuan Xu, Atsushi Miyauchi, Junya Honda, Masashi Sugiyama
Based on our approximation algorithm, we propose novel bandit algorithms for the top-k selection problem, and prove that our algorithms run in polynomial time.
no code implementations • 13 Mar 2019 • Masato Ishii, Takashi Takenouchi, Masashi Sugiyama
In this paper, we propose a novel domain adaptation method that can be applied without target data.
no code implementations • ICLR Workshop LLD 2019 • Cheng-Yu Hsieh, Miao Xu, Gang Niu, Hsuan-Tien Lin, Masashi Sugiyama
To address the need, we propose a special weakly supervised MLL problem that not only focuses on the situation of limited fine-grained supervision but also leverages the hierarchical relationship between the coarse concepts and the fine-grained ones.
no code implementations • 26 Apr 2019 • Takuya Shimada, Han Bao, Issei Sato, Masashi Sugiyama
In this paper, we derive an unbiased risk estimator which can handle all of similarities/dissimilarities and unlabeled data.
1 code implementation • 19 May 2019 • Feng Liu, Jie Lu, Bo Han, Gang Niu, Guangquan Zhang, Masashi Sugiyama
Hence, we consider a new, more realistic and more challenging problem setting, where classifiers have to be trained with noisy labeled data from SD and unlabeled data from TD -- we name it wildly UDA (WUDA).
2 code implementations • 28 May 2019 • Kenshin Abe, Zijian Xu, Issei Sato, Masashi Sugiyama
There have been increasing attempts to solve combinatorial optimization problems by machine learning.
1 code implementation • 29 May 2019 • Yuangang Pan, WeiJie Chen, Gang Niu, Ivor W. Tsang, Masashi Sugiyama
Specifically, the properties of our CoarsenRank are summarized as follows: (1) CoarsenRank is designed for mild model misspecification, which assumes that the ideal preferences (consistent with the model assumption) lie in a neighborhood of the actual preferences.
no code implementations • 29 May 2019 • Han Bao, Masashi Sugiyama
A clue to tackle their direct optimization is a calibrated surrogate utility, which is a tractable lower bound of the true utility function representing a given metric.
1 code implementation • NeurIPS 2019 • Liyuan Xu, Junya Honda, Gang Niu, Masashi Sugiyama
We propose two practical methods for uncoupled regression from pairwise comparison data and show that the learned regression model converges to the optimal model with the optimal parametric convergence rate when the target variable is uniformly distributed.
1 code implementation • NeurIPS 2019 • Xiaobo Xia, Tongliang Liu, Nannan Wang, Bo Han, Chen Gong, Gang Niu, Masashi Sugiyama
Existing theories have shown that the transition matrix can be learned by exploiting \textit{anchor points} (i.e., data points that belong to a specific class almost surely).
no code implementations • 22 Jul 2019 • Wenkai Xu, Gang Niu, Aapo Hyvärinen, Masashi Sugiyama
On the other hand, compressing the vertices while preserving the directed edge information provides a way to learn the small-scale representation of a directed graph.
1 code implementation • 24 Jul 2019 • Zhenghang Cui, Nontawat Charoenphakdee, Issei Sato, Masashi Sugiyama
Although learning from triplet comparison data has been considered in many applications, an important fundamental question of whether we can learn a classifier only from triplet comparison data has remained unanswered.
no code implementations • 21 Aug 2019 • Jie Luo, Sarah Frisken, Duo Wang, Alexandra Golby, Masashi Sugiyama, William M. Wells III
Probabilistic image registration (PIR) methods provide measures of registration uncertainty, which could be a surrogate for assessing the registration error.
no code implementations • 26 Aug 2019 • Motoya Ohnishi, Gennaro Notomista, Masashi Sugiyama, Magnus Egerstedt
When deploying autonomous agents in unstructured environments over sustained periods of time, adaptability and robustness oftentimes outweigh optimality as a primary consideration.
no code implementations • 15 Sep 2019 • Voot Tangkaratt, Bo Han, Mohammad Emtiyaz Khan, Masashi Sugiyama
However, the quality of demonstrations in reality can be diverse, since it is easier and cheaper to collect demonstrations from a mix of experts and amateurs.
no code implementations • 25 Sep 2019 • Feng Liu, Jie Lu, Bo Han, Gang Niu, Guangquan Zhang, Masashi Sugiyama
Hence, we consider a new, more realistic and more challenging problem setting, where classifiers have to be trained with noisy labeled data from the source domain (SD) and unlabeled data from the target domain (TD); we name it wildly UDA (WUDA).
2 code implementations • 3 Oct 2019 • Johannes Ackermann, Volker Gabler, Takayuki Osa, Masashi Sugiyama
Finally, we investigate the application of multi-agent methods to high-dimensional robotic tasks and show that our approach can be used to learn decentralized policies in this domain.
no code implementations • IJCNLP 2019 • Nontawat Charoenphakdee, Jongyeong Lee, Yiping Jin, Dittaya Wanvarie, Masashi Sugiyama
We consider a document classification problem where document labels are absent but only relevant keywords of a target class and unlabeled documents are given.
1 code implementation • 10 Oct 2019 • Yivan Zhang, Nontawat Charoenphakdee, Masashi Sugiyama
Weakly-supervised learning is a paradigm for alleviating the scarcity of labeled data by leveraging lower-quality but larger-scale supervision signals.
no code implementations • 14 Oct 2019 • Paavo Parmas, Masashi Sugiyama
Reparameterization (RP) and likelihood ratio (LR) gradient estimators are used throughout machine and reinforcement learning; however, they are usually explained as simple mathematical tricks without providing any insight into their nature.
no code implementations • 20 Oct 2019 • Nan Lu, Tianyi Zhang, Gang Niu, Masashi Sugiyama
The recently proposed unlabeled-unlabeled (UU) classification method allows us to train a binary classifier only from two unlabeled datasets with different class priors.
1 code implementation • EACL 2021 • Alon Jacovi, Gang Niu, Yoav Goldberg, Masashi Sugiyama
We consider the situation in which a user has collected a small set of documents on a cohesive topic, and they want to retrieve additional documents on this topic from a large collection.
no code implementations • 20 Nov 2019 • Jingfeng Zhang, Bo Han, Gang Niu, Tongliang Liu, Masashi Sugiyama
Deep neural networks (DNNs) are incredibly brittle due to adversarial examples.
no code implementations • ICML 2020 • Lei Feng, Takuo Kaneko, Bo Han, Gang Niu, Bo An, Masashi Sugiyama
In this paper, we propose a novel problem setting to allow MCLs for each example and two ways for learning with MCLs.
no code implementations • 11 Jan 2020 • Antonin Berthon, Bo Han, Gang Niu, Tongliang Liu, Masashi Sugiyama
We find that, with the help of confidence scores, the transition distribution of each instance can be approximately estimated.
no code implementations • 29 Jan 2020 • Kazuhiko Shinoda, Hirotaka Kaji, Masashi Sugiyama
Positive-confidence (Pconf) classification [Ishida et al., 2018] is a promising weakly-supervised learning method which trains a binary classifier only from positive data equipped with confidence.
no code implementations • 3 Feb 2020 • Soham Dan, Han Bao, Masashi Sugiyama
We perform a detailed investigation of this problem under two realistic noise models and propose two algorithms to learn from noisy S-D data.
no code implementations • ICLR 2022 • Yu Yao, Tongliang Liu, Bo Han, Mingming Gong, Gang Niu, Masashi Sugiyama, DaCheng Tao
Hitherto, the distributional-assumption-free CPE methods rely on a critical assumption that the support of the positive data distribution cannot be contained in the support of the negative data distribution.
no code implementations • ICLR 2021 • Zeke Xie, Issei Sato, Masashi Sugiyama
Stochastic Gradient Descent (SGD) and its variants are mainstream methods for training deep networks in practice.
1 code implementation • ICML 2020 • Takeshi Teshima, Issei Sato, Masashi Sugiyama
We take the structural equations in causal modeling as an example and propose a novel DA method, which is shown to be useful both theoretically and experimentally.
1 code implementation • ICML 2020 • Jiaqi Lv, Miao Xu, Lei Feng, Gang Niu, Xin Geng, Masashi Sugiyama
Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is equipped with a set of candidate labels among which only one is the true label.
1 code implementation • ICML 2020 • Takashi Ishida, Ikko Yamane, Tomoya Sakai, Gang Niu, Masashi Sugiyama
We experimentally show that flooding improves performance and, as a byproduct, induces a double descent curve of the test loss.
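The flooding method referenced in this entry is a one-line modification of the training loss; a minimal sketch, where b is the flood level hyperparameter:

```python
def flooded_loss(loss, b):
    """Flooding: keep the training loss hovering around the flood level b.

    When loss > b this equals the original loss (ordinary gradient descent);
    when loss < b the sign flips, so gradients push the loss back up toward b,
    preventing it from reaching zero.
    """
    return abs(loss - b) + b
```

Applied to the mini-batch loss during training, this leaves optimization unchanged above the flood level and performs gradient ascent below it.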
1 code implementation • ICML 2020 • Jingfeng Zhang, Xilie Xu, Bo Han, Gang Niu, Lizhen Cui, Masashi Sugiyama, Mohan Kankanhalli
Adversarial training based on the minimax formulation is necessary for obtaining adversarial robustness of trained models.
no code implementations • 10 Mar 2020 • Hideaki Imamura, Nontawat Charoenphakdee, Futoshi Futami, Issei Sato, Junya Honda, Masashi Sugiyama
If the black-box function varies with time, then time-varying Bayesian optimization is a promising framework.
no code implementations • 20 Mar 2020 • Jie Luo, Guangshen Ma, Sarah Frisken, Parikshit Juvekar, Nazim Haouchine, Zhe Xu, Yiming Xiao, Alexandra Golby, Patrick Codd, Masashi Sugiyama, William Wells III
In this study, we use the variogram to screen the manually annotated landmarks in two datasets used to benchmark registration in image-guided neurosurgeries.
1 code implementation • NeurIPS 2020 • Yivan Zhang, Nontawat Charoenphakdee, Zhenguo Wu, Masashi Sugiyama
We study the problem of learning from aggregate observations where supervision signals are given to sets of instances instead of individual instances, while the goal is still to predict labels of unseen individuals.
no code implementations • 28 May 2020 • Han Bao, Clayton Scott, Masashi Sugiyama
Adversarially robust classification seeks a classifier that is insensitive to adversarial perturbations of test patterns.
1 code implementation • NeurIPS 2020 • Tongtong Fang, Nan Lu, Gang Niu, Masashi Sugiyama
Under distribution shift (DS) where the training data distribution differs from the test one, a powerful technique is importance weighting (IW) which handles DS in two separate steps: weight estimation (WE) estimates the test-over-training density ratio and weighted classification (WC) trains the classifier from weighted training data.
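The two-step pipeline described above (weight estimation, then weighted classification) can be sketched with a naive kernel-density ratio; this only illustrates the baseline decomposition that the paper sets out to improve, and the bandwidth and clipping constant are illustrative choices.

```python
import math

def kde(data, x, bw=0.5):
    """Naive Gaussian kernel density estimate at x."""
    norm = len(data) * bw * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - d) / bw) ** 2) for d in data) / norm

def importance_weights(train_x, test_x, bw=0.5):
    """Step 1 (WE): test-over-training density ratio at each training point."""
    return [kde(test_x, x, bw) / max(kde(train_x, x, bw), 1e-12)
            for x in train_x]

def weighted_risk(losses, weights):
    """Step 2 (WC): importance-weighted empirical risk."""
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)
```

Training points that lie where the test density is high receive large weights, so the weighted risk emphasizes the region the classifier will actually be evaluated on.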
no code implementations • 11 Jun 2020 • Han Bao, Takuya Shimada, Liyuan Xu, Issei Sato, Masashi Sugiyama
A classifier built upon the representations is expected to perform well in downstream classification; however, little theory has been given in the literature so far, and thereby the relationship between similarity and classification has remained elusive.
no code implementations • 13 Jun 2020 • Masahiro Fujisawa, Takeshi Teshima, Issei Sato, Masashi Sugiyama
Approximate Bayesian computation (ABC) is a likelihood-free inference method that has been employed in various applications.
1 code implementation • NeurIPS 2020 • Yu Yao, Tongliang Liu, Bo Han, Mingming Gong, Jiankang Deng, Gang Niu, Masashi Sugiyama
By this intermediate class, the original transition matrix can then be factorized into the product of two easy-to-estimate transition matrices.
1 code implementation • NeurIPS 2020 • Xiaobo Xia, Tongliang Liu, Bo Han, Nannan Wang, Mingming Gong, Haifeng Liu, Gang Niu, DaCheng Tao, Masashi Sugiyama
Learning with instance-dependent label noise is challenging, because it is hard to model such real-world noise.
no code implementations • 15 Jun 2020 • Kei Mukaiyama, Issei Sato, Masashi Sugiyama
The prototypical network (ProtoNet) is a few-shot learning framework that performs metric learning and classification using the distance to prototype representations of each class.
no code implementations • NeurIPS 2020 • Taira Tsuchiya, Junya Honda, Masashi Sugiyama
We investigate finite stochastic partial monitoring, which is a general model for sequential learning with limited feedback.
no code implementations • NeurIPS 2020 • Takeshi Teshima, Isao Ishikawa, Koichi Tojo, Kenta Oono, Masahiro Ikeda, Masashi Sugiyama
We answer this question by showing a convenient criterion: a CF-INN is universal if its layers contain affine coupling and invertible linear functions as special cases.
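An affine coupling layer, one of the building blocks named in the criterion above, is easy to sketch for a two-dimensional input; invertibility holds no matter how complex the scale and translation networks s and t are, because they are only ever evaluated on the pass-through coordinate.

```python
import math

def coupling_forward(x1, x2, s, t):
    """Affine coupling: pass x1 through unchanged and transform x2
    conditioned on x1 via a scale s(.) and a translation t(.)."""
    return x1, x2 * math.exp(s(x1)) + t(x1)

def coupling_inverse(y1, y2, s, t):
    """Closed-form inverse: since y1 = x1, we can re-evaluate s and t
    at y1 and undo the affine map exactly."""
    return y1, (y2 - t(y1)) * math.exp(-s(y1))
```

In a full flow these layers alternate which coordinates pass through, interleaved with the invertible linear maps the entry mentions.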
1 code implementation • 21 Jun 2020 • Mehdi Abbana Bennani, Thang Doan, Masashi Sugiyama
In this framework, we prove that OGD is robust to Catastrophic Forgetting, and then derive the first generalisation bound for SGD and OGD for Continual Learning.
no code implementations • ICML 2020 • Yuko Kuroki, Atsushi Miyauchi, Junya Honda, Masashi Sugiyama
Dense subgraph discovery aims to find a dense component in edge-weighted graphs.
1 code implementation • 29 Jun 2020 • Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama
Specifically, we disentangle the effects of Adaptive Learning Rate and Momentum of the Adam dynamics on saddle-point escaping and minima selection.
no code implementations • ICML 2020 • Yu-Ting Chou, Gang Niu, Hsuan-Tien Lin, Masashi Sugiyama
In weakly supervised learning, the unbiased risk estimator (URE) is a powerful tool for training classifiers when training and test data are drawn from different distributions.
no code implementations • 8 Jul 2020 • Tianyi Zhang, Ikko Yamane, Nan Lu, Masashi Sugiyama
A default assumption in many machine learning scenarios is that the training and test samples are drawn from the same probability distribution.
no code implementations • NeurIPS 2020 • Lei Feng, Jiaqi Lv, Bo Han, Miao Xu, Gang Niu, Xin Geng, Bo An, Masashi Sugiyama
Partial-label learning (PLL) is a multi-class classification problem, where each training example is associated with a set of candidate labels.
no code implementations • 28 Sep 2020 • Zeke Xie, Issei Sato, Masashi Sugiyama
Loshchilov and Hutter (2018) demonstrated that $L_{2}$ regularization is not identical to weight decay for adaptive gradient methods, such as Adaptive Momentum Estimation (Adam), and proposed Adam with Decoupled Weight Decay (AdamW).
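The distinction between the two decay placements can be sketched for a single scalar parameter; this is an illustrative one-step Adam update, not the cited paper's contribution, and the hyperparameter defaults are the usual conventions.

```python
import math

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8,
              wd=0.0, decoupled=True):
    """One Adam update for a scalar parameter, with weight decay applied
    either as L2 regularization folded into the gradient (decoupled=False)
    or as a separate decoupled step as in AdamW (decoupled=True)."""
    if not decoupled:
        grad = grad + wd * w               # L2: decay flows through the moments
    m = b1 * m + (1 - b1) * grad           # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - b1 ** t)              # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    if decoupled:
        w = w - lr * wd * w                # AdamW: decay bypasses adaptive scaling
    return w, m, v
```

With L2-in-gradient, the decay term is rescaled by the adaptive denominator; decoupled decay shrinks every weight at the same rate regardless of its gradient history, which is the point of AdamW.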
no code implementations • 5 Oct 2020 • Lei Feng, Senlin Shu, Nan Lu, Bo Han, Miao Xu, Gang Niu, Bo An, Masashi Sugiyama
To alleviate the data requirement for training effective binary classifiers in binary classification, many weakly supervised learning settings have been proposed.
2 code implementations • ICLR 2021 • Jingfeng Zhang, Jianing Zhu, Gang Niu, Bo Han, Masashi Sugiyama, Mohan Kankanhalli
The belief was challenged by recent studies where we can maintain the robustness and improve the accuracy.
1 code implementation • 20 Oct 2020 • Voot Tangkaratt, Nontawat Charoenphakdee, Masashi Sugiyama
Robust learning from noisy demonstrations is a practical but highly challenging problem in imitation learning.
2 code implementations • 22 Oct 2020 • Ruize Gao, Feng Liu, Jingfeng Zhang, Bo Han, Tongliang Liu, Gang Niu, Masashi Sugiyama
However, it has been shown that the MMD test is unaware of adversarial attacks: the MMD test fails to detect the discrepancy between natural and adversarial data.
no code implementations • 22 Oct 2020 • Nontawat Charoenphakdee, Zhenghang Cui, Yivan Zhang, Masashi Sugiyama
The goal of classification with rejection is to avoid risky misclassification in error-critical applications such as medical diagnosis and product inspection.
no code implementations • 5 Nov 2020 • Naoya Otani, Yosuke Otsubo, Tetsuya Koike, Masashi Sugiyama
This problem is substantially different from semi-supervised learning since unlabeled samples are not necessarily difficult samples.
1 code implementation • 9 Nov 2020 • Bo Han, Quanming Yao, Tongliang Liu, Gang Niu, Ivor W. Tsang, James T. Kwok, Masashi Sugiyama
Classical machine learning implicitly assumes that labels of the training data are sampled from a clean distribution, which can be too restrictive for real-world scenarios.
1 code implementation • 12 Nov 2020 • Zeke Xie, Fengxiang He, Shaopeng Fu, Issei Sato, DaCheng Tao, Masashi Sugiyama
Thus it motivates us to design a similar mechanism named artificial neural variability (ANV), which helps artificial neural networks learn some advantages from "natural" neural networks.
no code implementations • CVPR 2021 • Nontawat Charoenphakdee, Jayakorn Vongkulbhisal, Nuttapong Chairatanakul, Masashi Sugiyama
In this paper, we first prove that the focal loss is classification-calibrated, i.e., its minimizer surely yields the Bayes-optimal classifier and thus the use of the focal loss in classification can be theoretically justified.
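For reference, the focal loss analyzed in this entry is the standard one from object detection, evaluated on the probability assigned to the true class; a minimal sketch:

```python
import math

def focal_loss(p, gamma=2.0):
    """Focal loss on the probability p assigned to the true class:
    -(1 - p)**gamma * log(p). The (1 - p)**gamma factor down-weights
    well-classified examples; gamma = 0 recovers cross-entropy."""
    return -((1 - p) ** gamma) * math.log(p)
```

The classification-calibration result says that minimizing this surrogate still recovers the Bayes-optimal decision, despite the down-weighting.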
1 code implementation • NeurIPS 2023 • Zeke Xie, Zhiqiang Xu, Jingzhao Zhang, Issei Sato, Masashi Sugiyama
Weight decay is a simple yet powerful regularization technique that has been widely used in the training of deep neural networks (DNNs).
no code implementations • 31 Dec 2020 • Yuko Kuroki, Junya Honda, Masashi Sugiyama
Combinatorial optimization is one of the fundamental research fields that has been extensively studied in theoretical computer science and operations research.
no code implementations • 1 Jan 2021 • Chia-You Chen, Hsuan-Tien Lin, Gang Niu, Masashi Sugiyama
One is to (pre-)train a classifier with examples from known classes, and then transfer the pre-trained classifier to unknown classes using the new examples.
no code implementations • 5 Jan 2021 • Nontawat Charoenphakdee, Jongyeong Lee, Masashi Sugiyama
When minimizing the empirical risk in binary classification, it is a common practice to replace the zero-one loss with a surrogate loss to make the learning objective feasible to optimize.
no code implementations • 19 Jan 2021 • Masato Ishii, Masashi Sugiyama
In this setting, we cannot access source data during adaptation, while unlabeled target data and a model pretrained with source data are given.
1 code implementation • 1 Feb 2021 • Nan Lu, Shida Lei, Gang Niu, Issei Sato, Masashi Sugiyama
SSC can be solved by a standard (multi-class) classification method, and we use the SSC solution to obtain the final binary classifier through a certain linear-fractional transformation.
1 code implementation • 3 Feb 2021 • Xuefeng Du, Jingfeng Zhang, Bo Han, Tongliang Liu, Yu Rong, Gang Niu, Junzhou Huang, Masashi Sugiyama
In adversarial training (AT), the main focus has been the objective and optimizer while the model has been less studied, so that the models being used are still those classic ones in standard training (ST).
1 code implementation • 4 Feb 2021 • Xuefeng Li, Tongliang Liu, Bo Han, Gang Niu, Masashi Sugiyama
In label-noise learning, the transition matrix plays a key role in building statistically consistent classifiers.
Ranked #14 on Learning with noisy labels on CIFAR-100N
1 code implementation • 4 Feb 2021 • Yivan Zhang, Gang Niu, Masashi Sugiyama
To estimate the transition matrix from noisy data, existing methods often need to estimate the noisy class-posterior, which could be unreliable due to the overconfidence of neural networks.
no code implementations • 6 Feb 2021 • Jianing Zhu, Jingfeng Zhang, Bo Han, Tongliang Liu, Gang Niu, Hongxia Yang, Mohan Kankanhalli, Masashi Sugiyama
A recent adversarial training (AT) study showed that the number of projected gradient descent (PGD) steps to successfully attack a point (i.e., find an adversarial example in its proximity) is an effective measure of the robustness of this point.
1 code implementation • ICLR 2022 • Haoang Chi, Feng Liu, Bo Han, Wenjing Yang, Long Lan, Tongliang Liu, Gang Niu, Mingyuan Zhou, Masashi Sugiyama
In this paper, we demystify assumptions behind NCD and find that high-level semantic features should be shared among the seen and unseen classes.
2 code implementations • 10 Feb 2021 • Hanshu Yan, Jingfeng Zhang, Gang Niu, Jiashi Feng, Vincent Y. F. Tan, Masashi Sugiyama
By comparing non-robust (normally trained) and robustified (adversarially trained) models, we observe that adversarial training (AT) robustifies CNNs by aligning the channel-wise activations of adversarial data with those of their natural counterparts.
no code implementations • 13 Feb 2021 • Yuzhou Cao, Lei Feng, Yitian Xu, Bo An, Gang Niu, Masashi Sugiyama
Weakly supervised learning has drawn considerable attention recently to reduce the expensive time and labor consumption of labeling massive data.
no code implementations • 15 Feb 2021 • Chen Chen, Jingfeng Zhang, Xilie Xu, Tianlei Hu, Gang Niu, Gang Chen, Masashi Sugiyama
To enhance adversarial robustness, adversarial training learns deep neural networks on the adversarial variants generated by their natural data.
1 code implementation • 27 Feb 2021 • Takeshi Teshima, Masashi Sugiyama
Causal graphs (CGs) are compact representations of the knowledge of the data generating processes behind the data distributions.
no code implementations • 1 Mar 2021 • Ziqing Lu, Chang Xu, Bo Du, Takashi Ishida, Lefei Zhang, Masashi Sugiyama
In neural networks, developing regularization algorithms to address overfitting is one of the major study areas.
1 code implementation • 4 Mar 2021 • Shuhei M. Yoshida, Takashi Takenouchi, Masashi Sugiyama
To this end, we derive a representation theorem for proper losses in supervised learning, which dualizes the Savage representation.
1 code implementation • 12 Mar 2021 • Takayuki Osa, Voot Tangkaratt, Masashi Sugiyama
In our method, a policy conditioned on a continuous or discrete latent variable is trained by directly maximizing the variational lower bound of the mutual information, instead of using the mutual information as unsupervised rewards as in previous studies.
1 code implementation • 25 Mar 2021 • Yivan Zhang, Masashi Sugiyama
Label noise in multiclass classification is a major obstacle to the deployment of learning systems.
1 code implementation • 31 Mar 2021 • Zeke Xie, Li Yuan, Zhanxing Zhu, Masashi Sugiyama
It is well-known that stochastic gradient noise (SGN) acts as implicit regularization for deep learning and is essential for both the optimization and generalization of deep networks.
no code implementations • 31 May 2021 • Paavo Parmas, Masashi Sugiyama
Reparameterization (RP) and likelihood ratio (LR) gradient estimators are used to estimate gradients of expectations throughout machine learning and reinforcement learning; however, they are usually explained as simple mathematical tricks, with no insight into their nature.
1 code implementation • 31 May 2021 • Jingfeng Zhang, Xilie Xu, Bo Han, Tongliang Liu, Gang Niu, Lizhen Cui, Masashi Sugiyama
First, we thoroughly investigate the injection of noisy labels (NLs) into AT's inner maximization and outer minimization, respectively, and obtain observations on when NL injection benefits AT.
no code implementations • 1 Jun 2021 • Xiaobo Xia, Tongliang Liu, Bo Han, Mingming Gong, Jun Yu, Gang Niu, Masashi Sugiyama
Many approaches, e.g., loss correction and label correction, cannot handle such open-set noisy labels well, since they require the training data and test data to share the same label space, which does not hold for learning with open-set noisy labels.
no code implementations • NeurIPS 2021 • Xiaobo Xia, Tongliang Liu, Bo Han, Mingming Gong, Jun Yu, Gang Niu, Masashi Sugiyama
In this way, we also give large-loss but less selected data a try; then, we can better distinguish between the cases (a) and (b) by seeing if the losses effectively decrease with the uncertainty after the try.
Ranked #26 on Image Classification on mini WebVision 1.0
1 code implementation • 8 Jun 2021 • Jiaheng Wei, Hangyu Liu, Tongliang Liu, Gang Niu, Masashi Sugiyama, Yang Liu
We provide understandings for the properties of LS and NLS when learning with noisy labels.
Ranked #9 on Learning with noisy labels on CIFAR-10N-Random3
no code implementations • NeurIPS 2021 • Futoshi Futami, Tomoharu Iwata, Naonori Ueda, Issei Sato, Masashi Sugiyama
First, we provide a new second-order Jensen inequality, which has a repulsion term based on the loss function.
no code implementations • 11 Jun 2021 • Jiaqi Lv, Biao Liu, Lei Feng, Ning Xu, Miao Xu, Bo An, Gang Niu, Xin Geng, Masashi Sugiyama
Partial-label learning (PLL) utilizes instances with PLs, where a PL includes several candidate labels but only one is the true label (TL).
1 code implementation • NeurIPS 2021 • Qizhou Wang, Feng Liu, Bo Han, Tongliang Liu, Chen Gong, Gang Niu, Mingyuan Zhou, Masashi Sugiyama
Reweighting adversarial data during training has been recently shown to improve adversarial robustness, where data closer to the current decision boundaries are regarded as more critical and given larger weights.
no code implementations • 16 Jun 2021 • Yuzhou Cao, Lei Feng, Senlin Shu, Yitian Xu, Bo An, Gang Niu, Masashi Sugiyama
We show that without any assumptions on the loss functions, models, and optimizers, we can successfully learn a multi-class classifier from only data of a single class with a rigorous consistency guarantee when confidences (i.e., the class-posterior probabilities for all the classes) are available.
no code implementations • 17 Jun 2021 • Xin-Qiang Cai, Yao-Xiang Ding, Zi-Xuan Chen, Yuan Jiang, Masashi Sugiyama, Zhi-Hua Zhou
In many real-world imitation learning tasks, the demonstrator and the learner have to act under different observation spaces.
1 code implementation • 11 Jul 2021 • Shota Nakajima, Masashi Sugiyama
Learning from positive and unlabeled (PU) data is an important problem in various applications.
1 code implementation • 16 Jul 2021 • Ikko Yamane, Junya Honda, Florian Yger, Masashi Sugiyama
In this paper, we consider the task of predicting $Y$ from $X$ when we have no paired data of them, but we have two separate, independent datasets of $X$ and $Y$ each observed with some mediating variable $U$, that is, we have two datasets $S_X = \{(X_i, U_i)\}$ and $S_Y = \{(U'_j, Y'_j)\}$.
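The obvious baseline for this setting is a naive two-step plug-in: fit a regressor X -> U on $S_X$, fit U -> Y on $S_Y$, and compose them. A minimal 1-D least-squares sketch follows; note this is only the baseline the setting suggests, not the estimator developed in the paper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on 1-D data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def predict_via_mediator(S_X, S_Y, x):
    """Naive two-step prediction of Y from X with no paired (X, Y) data:
    fit X -> U on S_X = [(x, u)], fit U -> Y on S_Y = [(u, y)],
    then compose the two regressors."""
    a1, b1 = fit_line([p[0] for p in S_X], [p[1] for p in S_X])
    a2, b2 = fit_line([p[0] for p in S_Y], [p[1] for p in S_Y])
    return a2 * (a1 * x + b1) + b2
```

With linear relations and noiseless data the composition is exact; in general, composing conditional-mean estimators through U need not recover E[Y|X], which is part of what motivates the paper's analysis.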