no code implementations • ICML 2020 • Futoshi Futami, Issei Sato, Masashi Sugiyama
Compared with the naive parallel-chain SGLD that updates multiple particles independently, ensemble methods update particles with their interactions.
1 code implementation • ICML 2020 • Voot Tangkaratt, Bo Han, Mohammad Emtiyaz Khan, Masashi Sugiyama
Learning from demonstrations can be challenging when the quality of demonstrations is diverse, and even more so when the quality is unknown and there is no additional information to estimate the quality.
no code implementations • 10 Dec 2024 • Huanjian Zhou, Masashi Sugiyama
Our work highlights the potential advantages of simulation methods in scientific computation for dynamics-based sampling and diffusion models.
no code implementations • 26 Oct 2024 • Yuting Tang, Xin-Qiang Cai, Jing-Cheng Pang, Qiyu Wu, Yao-Xiang Ding, Masashi Sugiyama
In this paper, we introduce the problem of RL from Composite Delayed Reward (RLCoDe), which generalizes traditional RL from delayed rewards by eliminating the strong assumption.
no code implementations • 16 Oct 2024 • Feiyang Ye, Yueming Lyu, Xuehao Wang, Masashi Sugiyama, Yu Zhang, Ivor Tsang
To address those problems in black-box optimization, we propose a novel Sharpness-Aware Black-box Optimization (SABO) algorithm, which applies a sharpness-aware minimization strategy to improve the model generalization.
no code implementations • 4 Oct 2024 • Zhen-Yu Zhang, Jiandong Zhang, Huaxiu Yao, Gang Niu, Masashi Sugiyama
In this paper, we propose unsupervised prompt learning for classification with black-box LLMs, where the learning parameters are the prompt itself and the pseudo labels of unlabeled data.
1 code implementation • 25 Sep 2024 • Ming Li, Jike Zhong, Chenxin Li, Liuzhuozheng Li, Nie Lin, Masashi Sugiyama
Recent advances in fine-tuning Vision-Language Models (VLMs) have witnessed the success of prompt tuning and adapter tuning, while the classic model fine-tuning on inherent parameters seems to be overlooked.
1 code implementation • 26 Jul 2024 • Jia-Hao Xiao, Ming-Kun Xie, Heng-Bo Fan, Gang Niu, Masashi Sugiyama, Sheng-Jun Huang
In this paper, we propose a dual-perspective method to generate high-quality pseudo-labels.
no code implementations • 13 Jun 2024 • Qizhou Wang, Bo Han, Puning Yang, Jianing Zhu, Tongliang Liu, Masashi Sugiyama
The compelling goal of eradicating undesirable data behaviors, while preserving usual model functioning, underscores the significance of machine unlearning within the domain of large language models (LLMs).
no code implementations • 12 Jun 2024 • Jianing Zhu, Bo Han, Jiangchao Yao, Jianliang Xu, Gang Niu, Masashi Sugiyama
Previous studies showed that class-wise unlearning is successful in forgetting the knowledge of a target class, through gradient ascent on the forgetting data or fine-tuning with the remaining data.
no code implementations • 30 May 2024 • Hao Chen, Yujin Han, Diganta Misra, Xiang Li, Kai Hu, Difan Zou, Masashi Sugiyama, Jindong Wang, Bhiksha Raj
They benefit significantly from extensive pre-training on large-scale datasets, including web-crawled data with paired data and conditions, such as image-text and image-class pairs.
1 code implementation • 29 May 2024 • Ziqing Fan, Shengchao Hu, Jiangchao Yao, Gang Niu, Ya Zhang, Masashi Sugiyama, Yanfeng Wang
However, the local loss landscapes may not accurately reflect the flatness of global loss landscape in heterogeneous environments; as a result, minimizing local sharpness and calculating perturbations on client data might not align the efficacy of SAM in FL with centralized training.
no code implementations • 25 May 2024 • Or Raveh, Junya Honda, Masashi Sugiyama
Various approaches have emerged for multi-armed bandits in distributed systems.
1 code implementation • 23 May 2024 • Johannes Ackermann, Takayuki Osa, Masashi Sugiyama
Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy.
no code implementations • 16 May 2024 • Kunda Yan, Sen Cui, Abudukelimu Wuerkaixi, Jingfeng Zhang, Bo Han, Gang Niu, Masashi Sugiyama, ChangShui Zhang
Our framework aims to approximate an optimal cooperation network for each client by optimizing a weighted sum of model similarity and feature complementarity.
no code implementations • 11 Apr 2024 • Soichiro Nishimori, Xin-Qiang Cai, Johannes Ackermann, Masashi Sugiyama
In this paper, we investigate an offline reinforcement learning (RL) problem where datasets are collected from two domains.
1 code implementation • 9 Apr 2024 • Ming-Kun Xie, Jia-Hao Xiao, Pei Peng, Gang Niu, Masashi Sugiyama, Sheng-Jun Huang
In this paper, we provide a causal inference framework to show that the correlative features caused by the target object and its co-occurring objects can be regarded as a mediator, which has both positive and negative impacts on model predictions.
no code implementations • 16 Mar 2024 • Ayoub Ghriss, Masashi Sugiyama, Alessandro Lazaric
The current thesis aims to explore the reinforcement learning field and build on existing methods to produce improved ones to tackle the problem of learning in high-dimensional and complex environments.
no code implementations • 11 Mar 2024 • Hao Chen, Jindong Wang, Zihan Wang, Ran Tao, Hongxin Wei, Xing Xie, Masashi Sugiyama, Bhiksha Raj
Foundation models are usually pre-trained on large-scale datasets and then adapted to downstream tasks through tuning.
1 code implementation • 29 Feb 2024 • Guillaume Braun, Masashi Sugiyama
Social networks are often associated with rich side information, such as texts and images.
no code implementations • 10 Feb 2024 • Zhen-Yu Zhang, Siwei Han, Huaxiu Yao, Gang Niu, Masashi Sugiyama
To improve the ability of large language models (LLMs) to tackle complex reasoning problems, chain-of-thought (CoT) methods were proposed to guide LLMs to reason step by step, enabling problem solving from simple to complex.
no code implementations • 6 Feb 2024 • Yuting Tang, Xin-Qiang Cai, Yao-Xiang Ding, Qiyu Wu, Guoqing Liu, Masashi Sugiyama
Theoretically, we demonstrate that RLBR can be addressed by solving a standard MDP with properly redistributed bagged rewards allocated to each instance within a bag.
1 code implementation • 2 Feb 2024 • Hao Chen, Jindong Wang, Lei Feng, Xiang Li, Yidong Wang, Xing Xie, Masashi Sugiyama, Rita Singh, Bhiksha Raj
Weakly supervised learning generally faces challenges in applicability to various scenarios with diverse weak supervision and in scalability due to the complexity of existing algorithms, thereby hindering the practical deployment.
no code implementations • 12 Jan 2024 • Jialiang Tang, Shuo Chen, Gang Niu, Hongyuan Zhu, Joey Tianyi Zhou, Chen Gong, Masashi Sugiyama
Then, we build a fusion-activation mechanism to transfer the valuable domain-invariant knowledge to the student network, while simultaneously encouraging the adapter within the teacher network to learn the domain-specific knowledge of the target data.
1 code implementation • 27 Nov 2023 • Wei Wang, Takashi Ishida, Yu-Jie Zhang, Gang Niu, Masashi Sugiyama
Existing consistent approaches have relied on the uniform distribution assumption to model the generation of complementary labels, or on an ordinary-label training set to estimate the transition matrix in non-uniform cases.
no code implementations • 24 Oct 2023 • Shintaro Nakamura, Masashi Sugiyama
We show that the upper bound of the probability of error of the CSA algorithm matches a lower bound up to a logarithmic factor in the exponent.
no code implementations • 11 Oct 2023 • Wentao Yu, Shuo Chen, Chen Gong, Gang Niu, Masashi Sugiyama
As motifs in a molecule are significant patterns that are of great importance for determining molecular properties (e.g., toxicity and solubility), overlooking motif interactions inevitably hinders the effectiveness of MPP.
no code implementations • 1 Oct 2023 • Jongyeong Lee, Junya Honda, Masashi Sugiyama
This paper studies the fixed-confidence best arm identification (BAI) problem in the bandit framework in the canonical single-parameter exponential models.
1 code implementation • 29 Sep 2023 • Hao Chen, Jindong Wang, Ankit Shah, Ran Tao, Hongxin Wei, Xing Xie, Masashi Sugiyama, Bhiksha Raj
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
no code implementations • 15 Sep 2023 • Chao-Kai Chiang, Masashi Sugiyama
The analysis component of the framework, viewed as a decontamination process, provides a systematic method of conducting risk rewrite.
no code implementations • 20 Aug 2023 • Shintaro Nakamura, Masashi Sugiyama
We introduce an algorithm named the Generalized Thompson Sampling Explore (GenTS-Explore) algorithm, which is the first algorithm that can work even when the size of the action set is exponentially large in $d$.
1 code implementation • ICCV 2023 • Penghui Yang, Ming-Kun Xie, Chen-Chen Zong, Lei Feng, Gang Niu, Masashi Sugiyama, Sheng-Jun Huang
Existing knowledge distillation methods typically work by imparting the knowledge of output logits or intermediate feature maps from the teacher network to the student network, which is very successful in multi-class single-label learning.
no code implementations • ICCV 2023 • Jialiang Tang, Shuo Chen, Gang Niu, Masashi Sugiyama, Chen Gong
Knowledge distillation aims to learn a lightweight student network from a pre-trained teacher network.
no code implementations • 12 Jul 2023 • Ruijiang Dong, Feng Liu, Haoang Chi, Tongliang Liu, Mingming Gong, Gang Niu, Masashi Sugiyama, Bo Han
In this paper, we propose a diversity-enhancing generative network (DEG-Net) for the FHA problem, which can generate diverse unlabeled data with the help of a kernel independence measure: the Hilbert-Schmidt independence criterion (HSIC).
no code implementations • 15 Jun 2023 • Shintaro Nakamura, Masashi Sugiyama
In such a case, the R-CPE-MAB can be seen as a special case of the so-called transductive linear bandits.
no code implementations • 12 Jun 2023 • Yuhao Wu, Xiaobo Xia, Jun Yu, Bo Han, Gang Niu, Masashi Sugiyama, Tongliang Liu
Training a classifier that exploits a huge amount of supervised data is expensive or even prohibitive in situations where the labeling cost is high.
1 code implementation • 28 May 2023 • Jingfeng Zhang, Bo Song, Haohan Wang, Bo Han, Tongliang Liu, Lei Liu, Masashi Sugiyama
To address the challenge posed by BadLabel, we further propose a robust LNL method that perturbs the labels in an adversarial manner at each epoch to make the loss values of clean and noisy labels again distinguishable.
1 code implementation • 22 May 2023 • Hao Chen, Ankit Shah, Jindong Wang, Ran Tao, Yidong Wang, Xing Xie, Masashi Sugiyama, Rita Singh, Bhiksha Raj
In this paper, we introduce imprecise label learning (ILL), a framework for the unification of learning with various imprecise label configurations.
Ranked #1 on Learning with noisy labels on mini WebVision 1.0
no code implementations • 19 May 2023 • Yivan Zhang, Masashi Sugiyama
Disentangling the explanatory factors in complex data is a promising approach for generalizable and data-efficient representation learning.
no code implementations • 15 May 2023 • Wei-I Lin, Gang Niu, Hsuan-Tien Lin, Masashi Sugiyama
Our analysis reveals that the efficiency of implicit label sharing is closely related to the performance of existing CLL models.
no code implementations • 11 May 2023 • Yivan Zhang, Masashi Sugiyama
Disentangling the factors of variation in data is a fundamental concept in machine learning and has been studied in various ways by different researchers, leading to a multitude of definitions.
1 code implementation • 4 May 2023 • Ming-Kun Xie, Jia-Hao Xiao, Hao-Zhe Liu, Gang Niu, Masashi Sugiyama, Sheng-Jun Huang
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
1 code implementation • NeurIPS 2023 • Xilie Xu, Jingfeng Zhang, Feng Liu, Masashi Sugiyama, Mohan Kankanhalli
To improve transferability, existing work introduced the standard invariant regularization (SIR) to impose a style-independence property on SCL, which can remove the impact of nuisance style factors on the standard representation.
no code implementations • 22 Mar 2023 • Jiaheng Wei, Zhaowei Zhu, Gang Niu, Tongliang Liu, Sijia Liu, Masashi Sugiyama, Yang Liu
Both long-tailed and noisily labeled data frequently appear in real-world applications and impose significant challenges for learning.
no code implementations • 28 Feb 2023 • Jongyeong Lee, Chao-Kai Chiang, Masashi Sugiyama
Although the uniform prior is shown to be optimal, we highlight the inherent limitation of its optimality, which holds only for specific parameterizations, and emphasize the significance of the invariance property of priors.
1 code implementation • NeurIPS 2023 • Xilie Xu, Jingfeng Zhang, Feng Liu, Masashi Sugiyama, Mohan Kankanhalli
Adversarial contrastive learning (ACL) does not require expensive data annotations but outputs a robust representation that withstands adversarial attacks and also generalizes to a wide range of downstream tasks.
no code implementations • NeurIPS 2023 • Yu-Jie Zhang, Zhen-Yu Zhang, Peng Zhao, Masashi Sugiyama
Our density ratio estimation method is proven to perform well by enjoying a dynamic regret bound, which finally leads to an excess risk guarantee for the predictor.
1 code implementation • 6 Feb 2023 • Salah Ghamizi, Jingfeng Zhang, Maxime Cordy, Mike Papadakis, Masashi Sugiyama, Yves Le Traon
While leveraging additional training data is well established to improve adversarial robustness, it incurs the unavoidable cost of data collection and the heavy computation to train models.
no code implementations • 3 Feb 2023 • Jongyeong Lee, Junya Honda, Chao-Kai Chiang, Masashi Sugiyama
In addition to the empirical performance, TS has been shown to achieve asymptotic problem-dependent lower bounds in several models.
no code implementations • 26 Dec 2022 • Shintaro Nakamura, Han Bao, Masashi Sugiyama
Optimal transport (OT) has become a widely used tool in the machine learning field to measure the discrepancy between probability distributions.
no code implementations • 23 Nov 2022 • Tingting Zhao, Ying Wang, Wei Sun, Yarui Chen, Gang Niu, Masashi Sugiyama
Meanwhile, we divide the whole learning task into learning with the large-scale representation models in an unsupervised manner and learning with the small-scale policy model in the RL manner. The small policy model facilitates policy learning, while not sacrificing generalization and expressiveness via the large representation model.
2 code implementations • Conference 2022 • Yuzhou Cao, Tianchi Cai, Lei Feng, Lihong Gu, Jinjie Gu, Bo An, Gang Niu, Masashi Sugiyama
Classification with rejection (CwR) refrains from making a prediction to avoid critical misclassification when encountering test samples that are difficult to classify.
1 code implementation • 1 Nov 2022 • Jianan Zhou, Jianing Zhu, Jingfeng Zhang, Tongliang Liu, Gang Niu, Bo Han, Masashi Sugiyama
Adversarial training (AT) with imperfect supervision is significant but receives limited attention.
no code implementations • 3 Aug 2022 • Yivan Zhang, Jindong Wang, Xing Xie, Masashi Sugiyama
To formally analyze this issue, we provide a unique algebraic formulation of the combination shift problem based on the concepts of homomorphism, equivariance, and a refined definition of disentanglement.
no code implementations • 5 Jul 2022 • Yong Bai, Yu-Jie Zhang, Peng Zhao, Masashi Sugiyama, Zhi-Hua Zhou
In this paper, we formulate and investigate the problem of online label shift (OLaS): the learner trains an initial model from the labeled offline data and then deploys it to an unlabeled online environment where the underlying label distribution changes over time but the label-conditional density does not.
1 code implementation • 4 Jul 2022 • Yuting Tang, Nan Lu, Tianyi Zhang, Masashi Sugiyama
Recent years have witnessed a great success of supervised deep learning, where predictive models were trained from a large amount of fully labeled data.
no code implementations • 7 Jun 2022 • Charles Riou, Junya Honda, Masashi Sugiyama
For that purpose, we identify two key components in the survival regret: the regret given no ruin (which corresponds to the regret in the MAB), and the probability that the procedure is interrupted, called the probability of ruin.
no code implementations • CVPR 2022 • De Cheng, Tongliang Liu, Yixiong Ning, Nannan Wang, Bo Han, Gang Niu, Xinbo Gao, Masashi Sugiyama
In label-noise learning, estimating the transition matrix has attracted more and more attention as the matrix plays an important role in building statistically consistent classifiers.
no code implementations • 2 Jun 2022 • Futoshi Futami, Tomoharu Iwata, Naonori Ueda, Issei Sato, Masashi Sugiyama
Bayesian deep learning plays an important role, especially for its ability to evaluate epistemic uncertainty (EU).
no code implementations • 15 Apr 2022 • Isao Ishikawa, Takeshi Teshima, Koichi Tojo, Kenta Oono, Masahiro Ikeda, Masashi Sugiyama
Invertible neural networks (INNs) are neural network architectures with invertibility by design.
1 code implementation • 7 Apr 2022 • Nan Lu, Zhao Wang, Xiaoxiao Li, Gang Niu, Qi Dou, Masashi Sugiyama
We propose federation of unsupervised learning (FedUL), where the unlabeled data are transformed into surrogate labeled data for each of the clients, a modified model is trained by supervised FL, and the wanted model is recovered from the modified model.
no code implementations • 22 Feb 2022 • Yinghua Gao, Dongxian Wu, Jingfeng Zhang, Guanhao Gan, Shu-Tao Xia, Gang Niu, Masashi Sugiyama
To explore whether adversarial training can defend against backdoor attacks, we conduct extensive experiments across different threat models and perturbation budgets, and find that the threat model in adversarial training matters.
1 code implementation • 7 Feb 2022 • Xilie Xu, Jingfeng Zhang, Feng Liu, Masashi Sugiyama, Mohan Kankanhalli
Furthermore, we theoretically find that the adversary can also degrade the lower bound of a TST's test power, which enables us to iteratively minimize the test criterion in order to search for adversarial pairs.
1 code implementation • 1 Feb 2022 • Takashi Ishida, Ikko Yamane, Nontawat Charoenphakdee, Gang Niu, Masashi Sugiyama
In contrast to others, our method is model-free and even instance-free.
no code implementations • 12 Jan 2022 • Hanshu Yan, Jingfeng Zhang, Jiashi Feng, Masashi Sugiyama, Vincent Y. F. Tan
Secondly, to robustify DIDs, we propose an adversarial training strategy, hybrid adversarial training (HAT), that jointly trains DIDs with adversarial and non-adversarial noisy data to ensure that the reconstruction quality is high and the denoisers around non-adversarial data are locally smooth.
no code implementations • 23 Dec 2021 • Zhenguo Wu, Jiaqi Lv, Masashi Sugiyama
Recently, various approaches on partial-label learning have been proposed under different generation models of candidate label sets.
no code implementations • 19 Dec 2021 • Nan Lu, Tianyi Zhang, Tongtong Fang, Takeshi Teshima, Masashi Sugiyama
A key assumption in supervised learning is that training and test data follow the same probability distribution.
3 code implementations • ICLR 2022 • Fei Zhang, Lei Feng, Bo Han, Tongliang Liu, Gang Niu, Tao Qin, Masashi Sugiyama
As the first contribution, we empirically show that the class activation map (CAM), a simple technique for discriminating the learning patterns of each class in images, is surprisingly better at making accurate predictions than the model itself on selecting the true label from candidate labels.
no code implementations • ICLR 2022 • Nan Lu, Zhao Wang, Xiaoxiao Li, Gang Niu, Qi Dou, Masashi Sugiyama
We propose federation of unsupervised learning (FedUL), where the unlabeled data are transformed into surrogate labeled data for each of the clients, a modified model is trained by supervised FL, and the wanted model is recovered from the modified model.
no code implementations • 29 Sep 2021 • Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama
Specifically, we disentangle the effects of Adaptive Learning Rate and Momentum of the Adam dynamics on saddle-point escaping and flat minima selection.
no code implementations • 29 Sep 2021 • Cheng-Yu Hsieh, Wei-I Lin, Miao Xu, Gang Niu, Hsuan-Tien Lin, Masashi Sugiyama
The goal of multi-label learning (MLL) is to associate a given instance with its relevant labels from a set of concepts.
no code implementations • 29 Sep 2021 • Sen Cui, Jingfeng Zhang, Jian Liang, Masashi Sugiyama, ChangShui Zhang
However, an ensemble still wastes the limited capacity of multiple models.
no code implementations • 29 Sep 2021 • Yinghua Gao, Dongxian Wu, Jingfeng Zhang, Shu-Tao Xia, Gang Niu, Masashi Sugiyama
Based on thorough experiments, we find that such trade-off ignores the interactions between the perturbation budget of adversarial training and the magnitude of the backdoor trigger.
no code implementations • 29 Sep 2021 • Yu Yao, Xuefeng Li, Tongliang Liu, Alan Blair, Mingming Gong, Bo Han, Gang Niu, Masashi Sugiyama
Existing methods for learning with noisy labels can be generally divided into two categories: (1) sample selection and label correction based on the memorization effect of neural networks; (2) loss correction with the transition matrix.
1 code implementation • 16 Jul 2021 • Ikko Yamane, Junya Honda, Florian Yger, Masashi Sugiyama
In this paper, we consider the task of predicting $Y$ from $X$ when we have no paired data of them, but we have two separate, independent datasets of $X$ and $Y$ each observed with some mediating variable $U$, that is, we have two datasets $S_X = \{(X_i, U_i)\}$ and $S_Y = \{(U'_j, Y'_j)\}$.
1 code implementation • 11 Jul 2021 • Shota Nakajima, Masashi Sugiyama
Learning from positive and unlabeled (PU) data is an important problem in various applications.
no code implementations • 17 Jun 2021 • Xin-Qiang Cai, Yao-Xiang Ding, Zi-Xuan Chen, Yuan Jiang, Masashi Sugiyama, Zhi-Hua Zhou
In many real-world imitation learning tasks, the demonstrator and the learner have to act under different observation spaces.
no code implementations • 16 Jun 2021 • Yuzhou Cao, Lei Feng, Senlin Shu, Yitian Xu, Bo An, Gang Niu, Masashi Sugiyama
We show that without any assumptions on the loss functions, models, and optimizers, we can successfully learn a multi-class classifier from only data of a single class with a rigorous consistency guarantee when confidences (i.e., the class-posterior probabilities for all the classes) are available.
1 code implementation • NeurIPS 2021 • Qizhou Wang, Feng Liu, Bo Han, Tongliang Liu, Chen Gong, Gang Niu, Mingyuan Zhou, Masashi Sugiyama
Reweighting adversarial data during training has been recently shown to improve adversarial robustness, where data closer to the current decision boundaries are regarded as more critical and given larger weights.
no code implementations • 11 Jun 2021 • Jiaqi Lv, Biao Liu, Lei Feng, Ning Xu, Miao Xu, Bo An, Gang Niu, Xin Geng, Masashi Sugiyama
Partial-label learning (PLL) utilizes instances with PLs, where a PL includes several candidate labels but only one is the true label (TL).
no code implementations • NeurIPS 2021 • Futoshi Futami, Tomoharu Iwata, Naonori Ueda, Issei Sato, Masashi Sugiyama
First, we provide a new second-order Jensen inequality, which has the repulsion term based on the loss function.
1 code implementation • 8 Jun 2021 • Jiaheng Wei, Hangyu Liu, Tongliang Liu, Gang Niu, Masashi Sugiyama, Yang Liu
We provide an understanding of the properties of LS and NLS when learning with noisy labels.
Ranked #9 on Learning with noisy labels on CIFAR-10N-Random3
no code implementations • NeurIPS 2021 • Xiaobo Xia, Tongliang Liu, Bo Han, Mingming Gong, Jun Yu, Gang Niu, Masashi Sugiyama
In this way, we also give large-loss but less selected data a try; then, we can better distinguish between the cases (a) and (b) by seeing if the losses effectively decrease with the uncertainty after the try.
Ranked #26 on Image Classification on mini WebVision 1.0
no code implementations • 1 Jun 2021 • Xiaobo Xia, Tongliang Liu, Bo Han, Mingming Gong, Jun Yu, Gang Niu, Masashi Sugiyama
Lots of approaches, e.g., loss correction and label correction, cannot handle such open-set noisy labels well, since they need training data and test data to share the same label space, which does not hold for learning with open-set noisy labels.
1 code implementation • 31 May 2021 • Jingfeng Zhang, Xilie Xu, Bo Han, Tongliang Liu, Gang Niu, Lizhen Cui, Masashi Sugiyama
First, we thoroughly investigate the injection of noisy labels (NLs) into AT's inner maximization and outer minimization, respectively, and observe when NL injection benefits AT.
no code implementations • 31 May 2021 • Paavo Parmas, Masashi Sugiyama
Reparameterization (RP) and likelihood ratio (LR) gradient estimators are used to estimate gradients of expectations throughout machine learning and reinforcement learning; however, they are usually explained as simple mathematical tricks, with no insight into their nature.
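The two estimators named in this abstract can be illustrated with a minimal sketch. This is a toy example, not the paper's analysis: it estimates $\frac{d}{d\mu}\,\mathbb{E}_{x \sim \mathcal{N}(\mu, 1)}[x^2] = 2\mu$ both ways. The function names, the test function $f(x) = x^2$, the seed, and the sample size are all illustrative assumptions.

```python
import random


def f(x):
    """Toy integrand (an assumption for illustration)."""
    return x * x


def df(x):
    """Derivative of the toy integrand."""
    return 2.0 * x


def rp_gradient(mu, n, rng):
    """Reparameterization estimator: write x = mu + eps with eps ~ N(0, 1)
    and differentiate through the sample, averaging f'(mu + eps)."""
    total = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)
        total += df(mu + eps)
    return total / n


def lr_gradient(mu, n, rng):
    """Likelihood-ratio (score-function) estimator: average
    f(x) * d/dmu log p(x; mu), where the score of N(mu, 1) is (x - mu)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu, 1.0)
        total += f(x) * (x - mu)
    return total / n


rng = random.Random(0)  # fixed seed so the run is reproducible
mu = 1.0
rp_est = rp_gradient(mu, 100_000, rng)
lr_est = lr_gradient(mu, 100_000, rng)
print(f"true gradient: {2.0 * mu}, RP: {rp_est:.3f}, LR: {lr_est:.3f}")
```

Both estimators target the same value. In this particular toy setting the LR estimator has the larger per-sample variance (30 versus 4 at $\mu = 1$), so it typically needs more samples for the same accuracy; that ordering is specific to this example and does not hold in general.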
1 code implementation • 31 Mar 2021 • Zeke Xie, Li Yuan, Zhanxing Zhu, Masashi Sugiyama
It is well-known that stochastic gradient noise (SGN) acts as implicit regularization for deep learning and is essentially important for both optimization and generalization of deep networks.
1 code implementation • 25 Mar 2021 • Yivan Zhang, Masashi Sugiyama
Label noise in multiclass classification is a major obstacle to the deployment of learning systems.
2 code implementations • 12 Mar 2021 • Takayuki Osa, Voot Tangkaratt, Masashi Sugiyama
In our method, a policy conditioned on a continuous or discrete latent variable is trained by directly maximizing the variational lower bound of the mutual information, instead of using the mutual information as unsupervised rewards as in previous studies.
1 code implementation • 4 Mar 2021 • Shuhei M. Yoshida, Takashi Takenouchi, Masashi Sugiyama
To this end, we derive a representation theorem for proper losses in supervised learning, which dualizes the Savage representation.
no code implementations • 1 Mar 2021 • Ziqing Lu, Chang Xu, Bo Du, Takashi Ishida, Lefei Zhang, Masashi Sugiyama
In neural networks, developing regularization algorithms to settle overfitting is one of the major study areas.
1 code implementation • 27 Feb 2021 • Takeshi Teshima, Masashi Sugiyama
Causal graphs (CGs) are compact representations of the knowledge of the data generating processes behind the data distributions.
no code implementations • 15 Feb 2021 • Chen Chen, Jingfeng Zhang, Xilie Xu, Tianlei Hu, Gang Niu, Gang Chen, Masashi Sugiyama
To enhance adversarial robustness, adversarial training learns deep neural networks on the adversarial variants generated by their natural data.
no code implementations • 13 Feb 2021 • Yuzhou Cao, Lei Feng, Yitian Xu, Bo An, Gang Niu, Masashi Sugiyama
Weakly supervised learning has drawn considerable attention recently to reduce the expensive time and labor consumption of labeling massive data.
2 code implementations • 10 Feb 2021 • Hanshu Yan, Jingfeng Zhang, Gang Niu, Jiashi Feng, Vincent Y. F. Tan, Masashi Sugiyama
By comparing non-robust (normally trained) and robustified (adversarially trained) models, we observe that adversarial training (AT) robustifies CNNs by aligning the channel-wise activations of adversarial data with those of their natural counterparts.
1 code implementation • ICLR 2022 • Haoang Chi, Feng Liu, Bo Han, Wenjing Yang, Long Lan, Tongliang Liu, Gang Niu, Mingyuan Zhou, Masashi Sugiyama
In this paper, we demystify assumptions behind NCD and find that high-level semantic features should be shared among the seen and unseen classes.
no code implementations • 6 Feb 2021 • Jianing Zhu, Jingfeng Zhang, Bo Han, Tongliang Liu, Gang Niu, Hongxia Yang, Mohan Kankanhalli, Masashi Sugiyama
A recent adversarial training (AT) study showed that the number of projected gradient descent (PGD) steps to successfully attack a point (i.e., find an adversarial example in its proximity) is an effective measure of the robustness of this point.
1 code implementation • 4 Feb 2021 • Xuefeng Li, Tongliang Liu, Bo Han, Gang Niu, Masashi Sugiyama
In label-noise learning, the transition matrix plays a key role in building statistically consistent classifiers.
Ranked #14 on Learning with noisy labels on CIFAR-100N
1 code implementation • 4 Feb 2021 • Yivan Zhang, Gang Niu, Masashi Sugiyama
To estimate the transition matrix from noisy data, existing methods often need to estimate the noisy class-posterior, which could be unreliable due to the overconfidence of neural networks.
1 code implementation • 3 Feb 2021 • Xuefeng Du, Jingfeng Zhang, Bo Han, Tongliang Liu, Yu Rong, Gang Niu, Junzhou Huang, Masashi Sugiyama
In adversarial training (AT), the main focus has been the objective and optimizer while the model has been less studied, so that the models being used are still those classic ones in standard training (ST).
1 code implementation • 1 Feb 2021 • Nan Lu, Shida Lei, Gang Niu, Issei Sato, Masashi Sugiyama
SSC can be solved by a standard (multi-class) classification method, and we use the SSC solution to obtain the final binary classifier through a certain linear-fractional transformation.
no code implementations • 19 Jan 2021 • Masato Ishii, Masashi Sugiyama
In this setting, we cannot access source data during adaptation, while unlabeled target data and a model pretrained with source data are given.
no code implementations • 5 Jan 2021 • Nontawat Charoenphakdee, Jongyeong Lee, Masashi Sugiyama
When minimizing the empirical risk in binary classification, it is a common practice to replace the zero-one loss with a surrogate loss to make the learning objective feasible to optimize.
no code implementations • 1 Jan 2021 • Chia-You Chen, Hsuan-Tien Lin, Gang Niu, Masashi Sugiyama
One is to (pre-)train a classifier with examples from known classes, and then transfer the pre-trained classifier to unknown classes using the new examples.
no code implementations • 31 Dec 2020 • Yuko Kuroki, Junya Honda, Masashi Sugiyama
Combinatorial optimization is one of the fundamental research fields that has been extensively studied in theoretical computer science and operations research.
1 code implementation • NeurIPS 2023 • Zeke Xie, Zhiqiang Xu, Jingzhao Zhang, Issei Sato, Masashi Sugiyama
Weight decay is a simple yet powerful regularization technique that has been very widely used in training of deep neural networks (DNNs).
no code implementations • CVPR 2021 • Nontawat Charoenphakdee, Jayakorn Vongkulbhisal, Nuttapong Chairatanakul, Masashi Sugiyama
In this paper, we first prove that the focal loss is classification-calibrated, i.e., its minimizer surely yields the Bayes-optimal classifier and thus the use of the focal loss in classification can be theoretically justified.
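The calibration property can be checked numerically in the binary case. The sketch below is not the paper's proof: it assumes the common form of the focal loss, $-(1-p)^{\gamma}\log p$, and uses a simple grid search (`risk_minimizer` is a hypothetical helper) to find the predicted probability $q$ that minimizes the conditional risk when the true class-posterior is $\eta$.

```python
import math


def focal_loss(p, gamma=2.0):
    """Focal loss for the probability p assigned to the true class.
    With gamma = 0 this reduces to the cross-entropy loss -log(p)."""
    return -((1.0 - p) ** gamma) * math.log(p)


def risk_minimizer(eta, gamma=2.0, grid=999):
    """Grid search for the q in (0, 1) that minimizes the conditional risk
    eta * loss(q) + (1 - eta) * loss(1 - q), where eta = P(y = +1 | x)."""
    best_q, best_risk = None, float("inf")
    for k in range(1, grid + 1):
        q = k / (grid + 1)
        risk = eta * focal_loss(q, gamma) + (1.0 - eta) * focal_loss(1.0 - q, gamma)
        if risk < best_risk:
            best_q, best_risk = q, risk
    return best_q


q_pos = risk_minimizer(0.8)            # positive class is more likely
q_neg = risk_minimizer(0.2)            # negative class is more likely
q_ce = risk_minimizer(0.8, gamma=0.0)  # cross-entropy special case
print(q_pos, q_neg, q_ce)
```

In this sketch the minimizer falls on the same side of 0.5 as $\eta$, so thresholding it at 0.5 gives the Bayes-optimal decision. With $\gamma = 0$ the minimizer coincides with $\eta$ itself, as expected for cross-entropy.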
1 code implementation • 12 Nov 2020 • Zeke Xie, Fengxiang He, Shaopeng Fu, Issei Sato, DaCheng Tao, Masashi Sugiyama
Thus it motivates us to design a similar mechanism named artificial neural variability (ANV), which helps artificial neural networks learn some advantages from "natural" neural networks.
1 code implementation • 9 Nov 2020 • Bo Han, Quanming Yao, Tongliang Liu, Gang Niu, Ivor W. Tsang, James T. Kwok, Masashi Sugiyama
Classical machine learning implicitly assumes that labels of the training data are sampled from a clean distribution, which can be too restrictive for real-world scenarios.
no code implementations • 5 Nov 2020 • Naoya Otani, Yosuke Otsubo, Tetsuya Koike, Masashi Sugiyama
This problem is substantially different from semi-supervised learning since unlabeled samples are not necessarily difficult samples.
no code implementations • 22 Oct 2020 • Nontawat Charoenphakdee, Zhenghang Cui, Yivan Zhang, Masashi Sugiyama
The goal of classification with rejection is to avoid risky misclassification in error-critical applications such as medical diagnosis and product inspection.
2 code implementations • 22 Oct 2020 • Ruize Gao, Feng Liu, Jingfeng Zhang, Bo Han, Tongliang Liu, Gang Niu, Masashi Sugiyama
However, it has been shown that the MMD test is unaware of adversarial attacks: the MMD test failed to detect the discrepancy between natural and adversarial data.
1 code implementation • 20 Oct 2020 • Voot Tangkaratt, Nontawat Charoenphakdee, Masashi Sugiyama
Robust learning from noisy demonstrations is a practical but highly challenging problem in imitation learning.
no code implementations • 5 Oct 2020 • Lei Feng, Senlin Shu, Nan Lu, Bo Han, Miao Xu, Gang Niu, Bo An, Masashi Sugiyama
To alleviate the data requirement for training effective binary classifiers, many weakly supervised learning settings have been proposed.
2 code implementations • ICLR 2021 • Jingfeng Zhang, Jianing Zhu, Gang Niu, Bo Han, Masashi Sugiyama, Mohan Kankanhalli
This belief was challenged by recent studies showing that robustness can be maintained while accuracy is improved.
no code implementations • 28 Sep 2020 • Zeke Xie, Issei Sato, Masashi Sugiyama
Loshchilov and Hutter (2018) demonstrated that L2 regularization is not identical to weight decay for adaptive gradient methods, such as Adaptive Moment Estimation (Adam), and proposed Adam with Decoupled Weight Decay (AdamW).
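The distinction drawn in this excerpt can be illustrated with a minimal single-parameter sketch. The function `adam_step` and its arguments `l2` and `decoupled_wd` are hypothetical names chosen for illustration; this is a textbook-style Adam update written under those assumptions, not code from the paper.

```python
import math

def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999,
              eps=1e-8, l2=0.0, decoupled_wd=0.0):
    """One Adam update on a scalar weight (illustrative sketch only)."""
    # L2 regularization: the penalty gradient is added to the loss gradient,
    # so it is rescaled by the adaptive denominator like everything else.
    g = grad + l2 * w
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: the shrinkage term bypasses the adaptive scaling.
    w_new = w - lr * m_hat / (math.sqrt(v_hat) + eps) - lr * decoupled_wd * w
    return w_new, m, v

# First step from w = 1.0 with a zero loss gradient, so only the decay acts.
w_l2, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, l2=0.1)
w_decoupled, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, decoupled_wd=0.1)
```

In this toy setting the L2 variant moves the weight by roughly the full learning rate, because Adam normalizes the penalty gradient, while the decoupled variant shrinks the weight in proportion to its size.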
no code implementations • NeurIPS 2020 • Lei Feng, Jiaqi Lv, Bo Han, Miao Xu, Gang Niu, Xin Geng, Bo An, Masashi Sugiyama
Partial-label learning (PLL) is a multi-class classification problem, where each training example is associated with a set of candidate labels.
no code implementations • 8 Jul 2020 • Tianyi Zhang, Ikko Yamane, Nan Lu, Masashi Sugiyama
A default assumption in many machine learning scenarios is that the training and test samples are drawn from the same probability distribution.
no code implementations • ICML 2020 • Yu-Ting Chou, Gang Niu, Hsuan-Tien Lin, Masashi Sugiyama
In weakly supervised learning, an unbiased risk estimator (URE) is a powerful tool for training classifiers when training and test data are drawn from different distributions.
1 code implementation • 29 Jun 2020 • Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama
Specifically, we disentangle the effects of Adaptive Learning Rate and Momentum of the Adam dynamics on saddle-point escaping and minima selection.
no code implementations • ICML 2020 • Yuko Kuroki, Atsushi Miyauchi, Junya Honda, Masashi Sugiyama
Dense subgraph discovery aims to find a dense component in edge-weighted graphs.
1 code implementation • 21 Jun 2020 • Mehdi Abbana Bennani, Thang Doan, Masashi Sugiyama
In this framework, we prove that OGD is robust to catastrophic forgetting, and then derive the first generalisation bound for SGD and OGD for continual learning.
no code implementations • NeurIPS 2020 • Takeshi Teshima, Isao Ishikawa, Koichi Tojo, Kenta Oono, Masahiro Ikeda, Masashi Sugiyama
We answer this question by showing a convenient criterion: a CF-INN is universal if its layers contain affine coupling and invertible linear functions as special cases.
no code implementations • NeurIPS 2020 • Taira Tsuchiya, Junya Honda, Masashi Sugiyama
We investigate finite stochastic partial monitoring, which is a general model for sequential learning with limited feedback.
no code implementations • 15 Jun 2020 • Kei Mukaiyama, Issei Sato, Masashi Sugiyama
The prototypical network (ProtoNet) is a few-shot learning framework that performs metric learning and classification using the distance to prototype representations of each class.
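The nearest-prototype rule described in this excerpt can be sketched as follows. This is a simplified illustration that assumes the embeddings are already computed; a real ProtoNet learns the embedding function, and the helper names `class_prototypes` and `classify` are hypothetical.

```python
import math
from collections import defaultdict

def class_prototypes(embeddings, labels):
    """Average the support embeddings of each class into one prototype."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def classify(query, prototypes):
    """Return the label whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Toy support set: two classes with two 2-D embeddings each.
support = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]]
support_labels = ["a", "a", "b", "b"]
prototypes = class_prototypes(support, support_labels)
prediction = classify([0.9, 0.9], prototypes)
```

Here the query lies nearest to the mean of the class "b" embeddings, so it is assigned to that class.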
1 code implementation • NeurIPS 2020 • Yu Yao, Tongliang Liu, Bo Han, Mingming Gong, Jiankang Deng, Gang Niu, Masashi Sugiyama
By this intermediate class, the original transition matrix can then be factorized into the product of two easy-to-estimate transition matrices.
1 code implementation • NeurIPS 2020 • Xiaobo Xia, Tongliang Liu, Bo Han, Nannan Wang, Mingming Gong, Haifeng Liu, Gang Niu, DaCheng Tao, Masashi Sugiyama
Learning with instance-dependent label noise is challenging, because it is hard to model such real-world noise.
no code implementations • 13 Jun 2020 • Masahiro Fujisawa, Takeshi Teshima, Issei Sato, Masashi Sugiyama
Approximate Bayesian computation (ABC) is a likelihood-free inference method that has been employed in various applications.
no code implementations • 11 Jun 2020 • Han Bao, Takuya Shimada, Liyuan Xu, Issei Sato, Masashi Sugiyama
A classifier built upon the representations is expected to perform well in downstream classification; however, little theory has been given in the literature so far, and thus the relationship between similarity and classification has remained elusive.
1 code implementation • NeurIPS 2020 • Tongtong Fang, Nan Lu, Gang Niu, Masashi Sugiyama
Under distribution shift (DS) where the training data distribution differs from the test one, a powerful technique is importance weighting (IW) which handles DS in two separate steps: weight estimation (WE) estimates the test-over-training density ratio and weighted classification (WC) trains the classifier from weighted training data.
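The two-step idea in this excerpt can be illustrated with a small sketch of importance-weighted risk estimation. It assumes the training data follow N(0, 1) and the test data follow N(1, 1), so that the density ratio is available in closed form; in practice the ratio is unknown and has to be estimated in the WE step. The names `density_ratio` and `weighted_risk` are hypothetical.

```python
import math
import random

def density_ratio(x):
    """Test-over-training density ratio for N(1, 1) over N(0, 1).

    Known in closed form only because both densities are assumed here.
    """
    return math.exp(x - 0.5)

def weighted_risk(samples, loss):
    """Importance-weighted average of a loss over training samples."""
    return sum(density_ratio(x) * loss(x) for x in samples) / len(samples)

def squared(x):
    return x * x

rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(200_000)]

# The plain average estimates the training expectation E_train[x^2] = 1.
unweighted = sum(squared(x) for x in train) / len(train)
# The weighted average estimates the test expectation E_test[x^2] = 2.
weighted = weighted_risk(train, squared)
```

The weighted average recovers a test-distribution quantity using training samples only, which is the property that weighted classification relies on.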
no code implementations • 28 May 2020 • Han Bao, Clayton Scott, Masashi Sugiyama
Adversarially robust classification seeks a classifier that is insensitive to adversarial perturbations of test patterns.
1 code implementation • NeurIPS 2020 • Yivan Zhang, Nontawat Charoenphakdee, Zhenguo Wu, Masashi Sugiyama
We study the problem of learning from aggregate observations where supervision signals are given to sets of instances instead of individual instances, while the goal is still to predict labels of unseen individuals.
no code implementations • 20 Mar 2020 • Jie Luo, Guangshen Ma, Sarah Frisken, Parikshit Juvekar, Nazim Haouchine, Zhe Xu, Yiming Xiao, Alexandra Golby, Patrick Codd, Masashi Sugiyama, William Wells III
In this study, we use the variogram to screen the manually annotated landmarks in two datasets used to benchmark registration in image-guided neurosurgeries.
no code implementations • 10 Mar 2020 • Hideaki Imamura, Nontawat Charoenphakdee, Futoshi Futami, Issei Sato, Junya Honda, Masashi Sugiyama
If the black-box function varies with time, then time-varying Bayesian optimization is a promising framework.
1 code implementation • ICML 2020 • Jingfeng Zhang, Xilie Xu, Bo Han, Gang Niu, Lizhen Cui, Masashi Sugiyama, Mohan Kankanhalli
Adversarial training based on the minimax formulation is necessary for obtaining adversarial robustness of trained models.
1 code implementation • ICML 2020 • Takashi Ishida, Ikko Yamane, Tomoya Sakai, Gang Niu, Masashi Sugiyama
We experimentally show that flooding improves performance and, as a byproduct, induces a double descent curve of the test loss.
1 code implementation • ICML 2020 • Jiaqi Lv, Miao Xu, Lei Feng, Gang Niu, Xin Geng, Masashi Sugiyama
Partial-label learning (PLL) is a typical weakly supervised learning problem, where each training instance is equipped with a set of candidate labels among which only one is the true label.
1 code implementation • ICML 2020 • Takeshi Teshima, Issei Sato, Masashi Sugiyama
We take the structural equations in causal modeling as an example and propose a novel DA method, which is shown to be useful both theoretically and experimentally.
no code implementations • ICLR 2021 • Zeke Xie, Issei Sato, Masashi Sugiyama
Stochastic Gradient Descent (SGD) and its variants are mainstream methods for training deep networks in practice.
no code implementations • ICLR 2022 • Yu Yao, Tongliang Liu, Bo Han, Mingming Gong, Gang Niu, Masashi Sugiyama, DaCheng Tao
Hitherto, the distributional-assumption-free CPE methods rely on a critical assumption that the support of the positive data distribution cannot be contained in the support of the negative data distribution.
no code implementations • 3 Feb 2020 • Soham Dan, Han Bao, Masashi Sugiyama
We perform a detailed investigation of this problem under two realistic noise models and propose two algorithms to learn from noisy S-D data.
no code implementations • 29 Jan 2020 • Kazuhiko Shinoda, Hirotaka Kaji, Masashi Sugiyama
Positive-confidence (Pconf) classification [Ishida et al., 2018] is a promising weakly-supervised learning method which trains a binary classifier only from positive data equipped with confidence.
no code implementations • 11 Jan 2020 • Antonin Berthon, Bo Han, Gang Niu, Tongliang Liu, Masashi Sugiyama
We find that, with the help of confidence scores, the transition distribution of each instance can be approximately estimated.
no code implementations • ICML 2020 • Lei Feng, Takuo Kaneko, Bo Han, Gang Niu, Bo An, Masashi Sugiyama
In this paper, we propose a novel problem setting to allow MCLs for each example and two ways for learning with MCLs.
no code implementations • 20 Nov 2019 • Jingfeng Zhang, Bo Han, Gang Niu, Tongliang Liu, Masashi Sugiyama
Deep neural networks (DNNs) are incredibly brittle due to adversarial examples.
1 code implementation • EACL 2021 • Alon Jacovi, Gang Niu, Yoav Goldberg, Masashi Sugiyama
We consider the situation in which a user has collected a small set of documents on a cohesive topic, and they want to retrieve additional documents on this topic from a large collection.
no code implementations • 20 Oct 2019 • Nan Lu, Tianyi Zhang, Gang Niu, Masashi Sugiyama
The recently proposed unlabeled-unlabeled (UU) classification method allows us to train a binary classifier only from two unlabeled datasets with different class priors.
no code implementations • 14 Oct 2019 • Paavo Parmas, Masashi Sugiyama
Reparameterization (RP) and likelihood ratio (LR) gradient estimators are used throughout machine and reinforcement learning; however, they are usually explained as simple mathematical tricks without providing any insight into their nature.
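As a concrete illustration of the two estimators named in this excerpt, the sketch below estimates the gradient of E[x^2] with respect to the mean of a Gaussian x ~ N(mu, sigma^2), whose exact value is 2 * mu. The test function and the helper name `estimate_gradients` are assumptions chosen for illustration, not taken from the paper.

```python
import random

def estimate_gradients(mu, sigma, n_samples, seed=0):
    """Monte Carlo estimates of d/dmu E[x^2] for x ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    rp_sum = 0.0
    lr_sum = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        x = mu + sigma * eps
        # RP: differentiate through the sample, d/dmu (mu + sigma * eps)^2 = 2x.
        rp_sum += 2.0 * x
        # LR: weight f(x) by the score, d/dmu log p(x) = (x - mu) / sigma^2.
        lr_sum += x * x * (x - mu) / sigma ** 2
    return rp_sum / n_samples, lr_sum / n_samples

rp_estimate, lr_estimate = estimate_gradients(mu=1.0, sigma=0.5, n_samples=200_000)
```

Both estimates target the same gradient, here 2.0; in this example the LR estimate has the larger variance, so it fluctuates more around that value.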
1 code implementation • 10 Oct 2019 • Yivan Zhang, Nontawat Charoenphakdee, Masashi Sugiyama
Weakly-supervised learning is a paradigm for alleviating the scarcity of labeled data by leveraging lower-quality but larger-scale supervision signals.
no code implementations • IJCNLP 2019 • Nontawat Charoenphakdee, Jongyeong Lee, Yiping Jin, Dittaya Wanvarie, Masashi Sugiyama
We consider a document classification problem where document labels are absent but only relevant keywords of a target class and unlabeled documents are given.
3 code implementations • 3 Oct 2019 • Johannes Ackermann, Volker Gabler, Takayuki Osa, Masashi Sugiyama
Finally, we investigate the application of multi-agent methods to high-dimensional robotic tasks and show that our approach can be used to learn decentralized policies in this domain.
no code implementations • 25 Sep 2019 • Feng Liu, Jie Lu, Bo Han, Gang Niu, Guangquan Zhang, Masashi Sugiyama
Hence, we consider a new, more realistic and more challenging problem setting, where classifiers have to be trained with noisy labeled data from SD and unlabeled data from TD; we name it wildly UDA (WUDA).
no code implementations • 15 Sep 2019 • Voot Tangkaratt, Bo Han, Mohammad Emtiyaz Khan, Masashi Sugiyama
However, the quality of demonstrations in reality can be diverse, since it is easier and cheaper to collect demonstrations from a mix of experts and amateurs.
no code implementations • 26 Aug 2019 • Motoya Ohnishi, Gennaro Notomista, Masashi Sugiyama, Magnus Egerstedt
When deploying autonomous agents in unstructured environments over sustained periods of time, adaptability and robustness oftentimes outweigh optimality as a primary consideration.
no code implementations • 21 Aug 2019 • Jie Luo, Sarah Frisken, Duo Wang, Alexandra Golby, Masashi Sugiyama, William M. Wells III
Probabilistic image registration (PIR) methods provide measures of registration uncertainty, which could be a surrogate for assessing the registration error.
1 code implementation • 24 Jul 2019 • Zhenghang Cui, Nontawat Charoenphakdee, Issei Sato, Masashi Sugiyama
Although learning from triplet comparison data has been considered in many applications, an important fundamental question of whether we can learn a classifier only from triplet comparison data has remained unanswered.
no code implementations • 22 Jul 2019 • Wenkai Xu, Gang Niu, Aapo Hyvärinen, Masashi Sugiyama
On the other hand, compressing the vertices while preserving the directed edge information provides a way to learn the small-scale representation of a directed graph.
1 code implementation • NeurIPS 2019 • Xiaobo Xia, Tongliang Liu, Nannan Wang, Bo Han, Chen Gong, Gang Niu, Masashi Sugiyama
Existing theories have shown that the transition matrix can be learned by exploiting anchor points (i.e., data points that belong to a specific class almost surely).
1 code implementation • NeurIPS 2019 • Liyuan Xu, Junya Honda, Gang Niu, Masashi Sugiyama
We propose two practical methods for uncoupled regression from pairwise comparison data and show that the learned regression model converges to the optimal model with the optimal parametric convergence rate when the target variable distributes uniformly.
no code implementations • 29 May 2019 • Han Bao, Masashi Sugiyama
A clue to tackle their direct optimization is a calibrated surrogate utility, which is a tractable lower bound of the true utility function representing a given metric.
1 code implementation • 29 May 2019 • Yuangang Pan, WeiJie Chen, Gang Niu, Ivor W. Tsang, Masashi Sugiyama
Specifically, the properties of our CoarsenRank are summarized as follows: (1) CoarsenRank is designed for mild model misspecification, which assumes that there exist ideal preferences (consistent with the model assumption) located in a neighborhood of the actual preferences.
2 code implementations • 28 May 2019 • Kenshin Abe, Zijian Xu, Issei Sato, Masashi Sugiyama
There have been increasing attempts to solve combinatorial optimization problems by machine learning.
1 code implementation • 19 May 2019 • Feng Liu, Jie Lu, Bo Han, Gang Niu, Guangquan Zhang, Masashi Sugiyama
Hence, we consider a new, more realistic and more challenging problem setting, where classifiers have to be trained with noisy labeled data from SD and unlabeled data from TD; we name it wildly UDA (WUDA).
no code implementations • 26 Apr 2019 • Takuya Shimada, Han Bao, Issei Sato, Masashi Sugiyama
In this paper, we derive an unbiased risk estimator which can handle similarities, dissimilarities, and unlabeled data all together.
no code implementations • ICLR Workshop LLD 2019 • Cheng-Yu Hsieh, Miao Xu, Gang Niu, Hsuan-Tien Lin, Masashi Sugiyama
To address the need, we propose a special weakly supervised MLL problem that not only focuses on the situation of limited fine-grained supervision but also leverages the hierarchical relationship between the coarse concepts and the fine-grained ones.
no code implementations • 13 Mar 2019 • Masato Ishii, Takashi Takenouchi, Masashi Sugiyama
In this paper, we propose a novel domain adaptation method that can be applied without target data.
no code implementations • 27 Feb 2019 • Yuko Kuroki, Liyuan Xu, Atsushi Miyauchi, Junya Honda, Masashi Sugiyama
Based on our approximation algorithm, we propose novel bandit algorithms for the top-k selection problem, and prove that our algorithms run in polynomial time.
no code implementations • 4 Feb 2019 • Takuo Kaneko, Issei Sato, Masashi Sugiyama
We consider the problem of online multiclass classification with partial feedback, where an algorithm predicts a class for a new instance in each round and only receives its correctness.