no code implementations • 15 Apr 2025 • Alireza Salehi, Mohammadreza Salehi, Reshad Hosseini, Cees G. M. Snoek, Makoto Yamada, Mohammad Sabokrou
Anomaly Detection (AD) involves identifying deviations from normal data distributions and is critical in fields such as medical diagnostics and industrial defect detection.
no code implementations • 31 Mar 2025 • Weijie Liu, Han Bao, Makoto Yamada, Zenan Huang, Nenggan Zheng, Hui Qian
The first component is the matching budget constraints on each row and column of a transport plan, which specify the maximum number of points that can be matched to each point.
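For concreteness, a hedged sketch of such a constraint set (the notation is ours, not necessarily the paper's): for a transport plan $P$ between $n$ source and $m$ target points with per-point budgets $b$ and $c$,

```latex
% Feasible set with matching budgets: point i on one side may absorb at
% most b_i units of mass, and point j on the other side at most c_j units.
\Pi(b, c) = \left\{ P \in \mathbb{R}_{\ge 0}^{n \times m} \;\middle|\;
  \textstyle\sum_{j} P_{ij} \le b_i \ \forall i, \quad
  \textstyle\sum_{i} P_{ij} \le c_j \ \forall j \right\}
```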
1 code implementation • 3 Feb 2025 • Siqi Zeng, Yifei He, Weiqiu You, Yifan Hao, Yao-Hung Hubert Tsai, Makoto Yamada, Han Zhao
Task vectors, which are derived from the difference between pre-trained and fine-tuned model weights, enable flexible task adaptation and model merging through arithmetic operations such as addition and negation.
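A minimal sketch of task-vector arithmetic (function names are illustrative, not the paper's API): a task vector is the elementwise difference between fine-tuned and pre-trained weights, and merging is a weighted sum of such vectors applied to the base model.

```python
import torch

def task_vector(pretrained, finetuned):
    # Difference of state dicts: one tensor per parameter name
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_task_vectors(pretrained, vectors, coeffs):
    # Addition merges tasks; a negative coefficient "negates" a task
    merged = {k: v.clone() for k, v in pretrained.items()}
    for tv, c in zip(vectors, coeffs):
        for k in merged:
            merged[k] += c * tv[k]
    return merged

# e.g. merged = apply_task_vectors(base, [tv_math, tv_toxic], [1.0, -1.0])
```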
1 code implementation • 19 Dec 2024 • Daniel Yang, Yao-Hung Hubert Tsai, Makoto Yamada
The rise of large language models (LLMs) and their tight integration into our daily life make it essential to dedicate efforts towards their trustworthiness.
1 code implementation • 2 Dec 2024 • Aditya Sinha, Siqi Zeng, Makoto Yamada, Han Zhao
Most real-world datasets consist of a natural hierarchy between classes or an inherent label structure that is either already available or can be constructed cheaply.
no code implementations • 11 Nov 2024 • Kira M. Düsterwald, Samo Hromadka, Makoto Yamada
The performance of unsupervised methods such as clustering depends on the choice of distance metric between features, or ground metric.
no code implementations • 12 Oct 2024 • Pengfei He, Yingqian Cui, Han Xu, Hui Liu, Makoto Yamada, Jiliang Tang, Yue Xing
To better understand how ICL integrates the examples with the knowledge learned by the LLM during pre-training (i.e., pre-training knowledge) and how the examples impact ICL, this paper conducts a theoretical study in binary classification tasks.
1 code implementation • 4 Oct 2024 • Siqi Zeng, Sixian Du, Makoto Yamada, Han Zhao
To address this limitation, under the CPCC framework, we propose to use the Earth Mover's Distance (EMD) to measure the pairwise distances among classes in the feature space.
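To make the measure concrete, a hedged sketch (assumes the POT library; illustrative, not the paper's code) of the EMD between two classes' feature sets under uniform weights:

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

def class_emd(feats_a, feats_b):
    # Uniform weights over each class's feature vectors
    a = np.full(len(feats_a), 1.0 / len(feats_a))
    b = np.full(len(feats_b), 1.0 / len(feats_b))
    M = ot.dist(feats_a, feats_b, metric="euclidean")  # pairwise ground costs
    return ot.emd2(a, b, M)  # exact EMD value
```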
1 code implementation • 16 Jun 2024 • Yuping Lin, Pengfei He, Han Xu, Yue Xing, Makoto Yamada, Hui Liu, Jiliang Tang
Large language models (LLMs) are susceptible to a type of attack known as jailbreaking, which misleads LLMs into outputting harmful content.
no code implementations • 23 May 2024 • Satoki Ishikawa, Makoto Yamada, Han Bao, Yuki Takezawa
In this work, we aim to explore the temporal prediction hypothesis from the perspective of self-supervised learning.
no code implementations • 23 May 2024 • Yuki Takezawa, Han Bao, Ryoma Sato, Kenta Niwa, Makoto Yamada
We numerically validated our convergence results using a synthetic function and demonstrated the effectiveness of our proposed methods using LSTM, Nano-GPT, and T5.
no code implementations • 16 Oct 2023 • Makoto Yamada, Yuki Takezawa, Guillaume Houry, Kira Michaela Dusterwald, Deborah Sulem, Han Zhao, Yao-Hung Hubert Tsai
We find that the model performance depends on the combination of TWD and probability model, and that the Jeffrey divergence regularization helps in model training.
1 code implementation • 13 Oct 2023 • Ryoma Sato, Yuki Takezawa, Han Bao, Kenta Niwa, Makoto Yamada
LLMs can generate text that cannot be distinguished from human-written text.
no code implementations • 2 Oct 2023 • Yuki Takezawa, Ryoma Sato, Han Bao, Kenta Niwa, Makoto Yamada
Although existing watermarking methods have successfully detected texts generated by LLMs, they significantly degrade the quality of the generated texts.
1 code implementation • 28 Jul 2023 • Peter Naylor, Diego Di Carlo, Arianna Traviglia, Makoto Yamada, Marco Fiorucci
We outperform the previous methods by a margin of 10% in the intersection over union metric.
2 code implementations • 20 Feb 2023 • Ryuichiro Hataya, Makoto Yamada
The essential difficulty of gradient-based bilevel optimization using implicit differentiation is to estimate the inverse Hessian vector product with respect to neural network parameters.
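As context, a generic Neumann-series sketch of the inverse-Hessian-vector product that such hypergradient methods must approximate (PyTorch; illustrative, not the paper's estimator):

```python
import torch

def neumann_ihvp(loss, params, v, alpha=0.01, steps=50):
    """Approximate H^{-1} v as alpha * sum_k (I - alpha*H)^k v,
    where H is the Hessian of `loss` w.r.t. `params`."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    p = [vi.clone() for vi in v]    # current term (I - alpha*H)^k v
    acc = [vi.clone() for vi in v]  # running sum, starts at the k=0 term
    for _ in range(steps):
        hvp = torch.autograd.grad(grads, params, grad_outputs=p,
                                  retain_graph=True)
        p = [pi - alpha * hi for pi, hi in zip(p, hvp)]
        acc = [ai + pi for ai, pi in zip(acc, p)]
    return [alpha * ai for ai in acc]
```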
1 code implementation • 14 Feb 2023 • Marco Fiorucci, Peter Naylor, Makoto Yamada
The method is based on unbalanced optimal transport and can be generalised to any change detection problem with LiDAR data.
no code implementations • 4 Feb 2023 • Ayato Toyokuni, Makoto Yamada
More specifically, we extend the GraphLIME node explanation method with group lasso and fused lasso penalties.
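For context, the two penalties in question in their standard forms (the notation is ours, not necessarily the paper's exact objective):

```latex
% Group lasso: selects or discards whole groups of coefficients at once
\Omega_{\text{group}}(\boldsymbol{\beta}) =
    \sum_{g \in \mathcal{G}} \|\boldsymbol{\beta}_g\|_2
% Fused lasso: encourages neighboring coefficients to take equal values
\Omega_{\text{fused}}(\boldsymbol{\beta}) =
    \sum_{i=2}^{d} |\beta_i - \beta_{i-1}|
```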
no code implementations • 30 Sep 2022 • Yuki Takezawa, Han Bao, Kenta Niwa, Ryoma Sato, Makoto Yamada
In this study, we propose Momentum Tracking, which is a method with momentum whose convergence rate is proven to be independent of data heterogeneity.
1 code implementation • 21 Aug 2022 • Ryoma Sato, Makoto Yamada, Hisashi Kashima
The main difficulty in investigating the effects is that we need to know counterfactual results, which are not available in reality.
no code implementations • 8 Aug 2022 • Yanbin Liu, Girish Dwivedi, Farid Boussaid, Frank Sanfilippo, Makoto Yamada, Mohammed Bennamoun
Novel 3D network architectures are proposed for both the generator and discriminator of the GAN model to significantly reduce the number of parameters while maintaining the quality of image generation.
1 code implementation • 22 Jul 2022 • Peter Naylor, Yao-Hung Hubert Tsai, Marick Laé, Makoto Yamada
Recent developments in self-supervised learning give us the possibility to further reduce human intervention in multi-step pipelines where the focus revolves around particular objects of interest.
no code implementations • 24 Jun 2022 • Makoto Yamada, Yuki Takezawa, Ryoma Sato, Han Bao, Zornitsa Kozareva, Sujith Ravi
In this paper, we aim to approximate the 1-Wasserstein distance by the tree-Wasserstein distance (TWD), where TWD is a 1-Wasserstein distance with tree-based embedding and can be computed in linear time with respect to the number of nodes on a tree.
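For intuition, a minimal sketch of the linear-time computation (illustrative; assumes nodes are indexed with children after their parents): the TWD sums, over tree edges, the edge weight times the absolute difference of the two distributions' masses in the subtree below that edge.

```python
import numpy as np

def tree_wasserstein(parent, edge_w, mu, nu):
    """parent[i]: parent index of node i (root is node 0);
    edge_w[i]: weight of edge (i, parent[i]);
    mu, nu: probability masses placed on the nodes."""
    n = len(parent)
    diff = np.asarray(mu, dtype=float) - np.asarray(nu, dtype=float)
    total = 0.0
    for i in range(n - 1, 0, -1):   # leaves upward, excluding the root
        total += edge_w[i] * abs(diff[i])
        diff[parent[i]] += diff[i]  # push subtree mass difference up
    return total
```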
no code implementations • 1 Jun 2022 • Yoichi Chikahara, Makoto Yamada, Hisashi Kashima
Finding the features relevant to the difference in treatment effects is essential to unveil the underlying causal mechanisms.
no code implementations • 23 May 2022 • Yuki Takezawa, Kenta Niwa, Makoto Yamada
However, the convergence rate of the ECL is provided only when the objective function is convex, and has not been shown in a standard machine learning setting where the objective function is non-convex.
no code implementations • 8 May 2022 • Yuki Takezawa, Kenta Niwa, Makoto Yamada
Moreover, we demonstrate that the C-ECL is more robust to heterogeneous data than the Gossip-based algorithms.
no code implementations • 16 Oct 2021 • Dinesh Singh, Hardik Tankaria, Makoto Yamada
However, the secant equation becomes inadequate for approximating the Newton step owing to its use of first-order derivatives.
no code implementations • NeurIPS 2021 • Tam Le, Truyen Nguyen, Makoto Yamada, Jose Blanchet, Viet Anh Nguyen
In this paper, we propose a novel and coherent scheme for kernel-reweighted regression by reparametrizing the sample weights using a doubly non-negative matrix.
1 code implementation • 8 Sep 2021 • Yuki Takezawa, Ryoma Sato, Zornitsa Kozareva, Sujith Ravi, Makoto Yamada
By contrast, the Wasserstein distance on a tree, called the tree-Wasserstein distance, can be computed in linear time and allows for the fast comparison of a large number of distributions.
1 code implementation • 30 May 2021 • Ryoma Sato, Makoto Yamada, Hisashi Kashima
The original study on WMD reported that WMD outperforms classical baselines such as bag-of-words (BOW) and TF-IDF by significant margins in various datasets.
no code implementations • EACL 2021 • Ayato Toyokuni, Sho Yokoi, Hisashi Kashima, Makoto Yamada
The problem of estimating the probability distribution of labels has been widely studied as a label distribution learning (LDL) problem, whose applications include age estimation, emotion analysis, and semantic segmentation.
no code implementations • NeurIPS 2021 • Hiroaki Yamada, Makoto Yamada
A recently introduced technique for a sparse optimization problem called "safe screening" allows us to identify irrelevant variables in the early stage of optimization.
no code implementations • 27 Jan 2021 • Yuki Takezawa, Ryoma Sato, Makoto Yamada
Specifically, we rewrite the Wasserstein distance on the tree metric by the parent-child relationships of a tree and formulate it as a continuous optimization problem using a contrastive loss.
no code implementations • 1 Jan 2021 • ZiHao Wang, Xu Zhao, Tam Le, Hao Wu, Yong Zhang, Makoto Yamada
In this work, we consider OT over tree metrics, which is more general than the sliced Wasserstein and includes the sliced Wasserstein as a special case, and we propose a fast minimization algorithm in $O(n)$ for the optimal Wasserstein-1 transport plan between two distributions in the tree structure.
2 code implementations • 29 Oct 2020 • Tobias Freidling, Benjamin Poignard, Héctor Climente-González, Makoto Yamada
Detecting influential features in non-linear and/or high-dimensional data is a challenging and increasingly important task in machine learning.
1 code implementation • 19 Oct 2020 • Ryoma Sato, Makoto Yamada, Hisashi Kashima
We use a bias correction method to estimate the potential impact of choosing a publication venue effectively and to recommend venues based on the potential impact of papers in each venue.
1 code implementation • 13 Jun 2020 • Vu Nguyen, Tam Le, Makoto Yamada, Michael A. Osborne
Building upon tree-Wasserstein (TW), which is a negative definite variant of OT, we develop a novel discrepancy for neural architectures, and demonstrate it within a Gaussian process surrogate model for the sequential NAS settings.
1 code implementation • NeurIPS 2020 • Yao-Hung Hubert Tsai, Han Zhao, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov
Since its inception, the neural estimation of mutual information (MI) has demonstrated the empirical success of modeling expected dependency between high-dimensional random variables.
1 code implementation • NeurIPS 2020 • Ryoma Sato, Makoto Yamada, Hisashi Kashima
This study examines the time complexities of the unbalanced optimal transport problems from an algorithmic perspective for the first time.
1 code implementation • 25 May 2020 • Mathis Petrovich, Chao Liang, Ryoma Sato, Yanbin Liu, Yao-Hung Hubert Tsai, Linchao Zhu, Yi Yang, Ruslan Salakhutdinov, Makoto Yamada
To show the effectiveness of FROT, we propose using the FROT algorithm for the layer selection problem in deep neural networks for semantic correspondence.
no code implementations • 25 Mar 2020 • Liu Ziyin, ZiHao Wang, Makoto Yamada, Masahito Ueda
We propose a novel regularization method, called volumization, for neural networks.
1 code implementation • 21 Feb 2020 • Mathis Petrovich, Makoto Yamada
Regression is an important task in machine learning and data mining.
1 code implementation • 8 Feb 2020 • Ryoma Sato, Makoto Yamada, Hisashi Kashima
Through experiments, we show that the addition of random features enables GNNs to solve various problems that normal GNNs, including the graph convolutional networks (GCNs) and graph isomorphism networks (GINs), cannot solve.
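A minimal sketch of the idea (illustrative, not the paper's code): each node gets a freshly drawn random value concatenated to its input features at every forward pass, which lets message passing distinguish otherwise symmetric nodes.

```python
import torch

def add_random_features(x, r_dim=1):
    # x: (num_nodes, d) node features; draw i.i.d. random values per node,
    # e.g. uniform on [0, 1), re-sampled at every forward pass
    r = torch.rand(x.size(0), r_dim, device=x.device)
    return torch.cat([x, r], dim=-1)  # feed into any GNN, e.g. a GIN
```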
1 code implementation • 5 Feb 2020 • Ryoma Sato, Marco Cuturi, Makoto Yamada, Hisashi Kashima
Building on Mémoli (2011), who proposed to represent each point in each distribution as the 1D distribution of its distances to all other points, we introduce in this paper the Anchor Energy (AE) and Anchor Wasserstein (AW) distances, which are respectively the energy and Wasserstein distances instantiated on such representations.
2 code implementations • 23 Jan 2020 • Dinesh Singh, Héctor Climente-González, Mathis Petrovich, Eiryo Kawakami, Makoto Yamada
Because a large number of parameters in the selection and reconstruction layers can easily result in overfitting under a limited number of samples, we use two tiny networks to predict the large, virtual weight matrices of the selection and reconstruction layers.
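A hedged sketch of the idea (hypernetwork-style; names and sizes are illustrative): a tiny network maps a small per-feature embedding to each row of the large virtual weight matrix, so the trainable parameter count no longer scales with d x k.

```python
import torch
import torch.nn as nn

class TinyWeightPredictor(nn.Module):
    """Predicts a virtual (d x k) weight matrix from per-feature embeddings."""
    def __init__(self, d, k, emb_dim=16, hidden=32):
        super().__init__()
        self.feature_emb = nn.Parameter(torch.randn(d, emb_dim))
        self.net = nn.Sequential(
            nn.Linear(emb_dim, hidden), nn.ReLU(), nn.Linear(hidden, k))

    def forward(self):
        # Trainable params: d*emb_dim plus a small MLP, instead of d*k
        return self.net(self.feature_emb)
```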
2 code implementations • 17 Jan 2020 • Qiang Huang, Makoto Yamada, Yuan Tian, Dinesh Singh, Dawei Yin, Yi Chang
In this paper, we propose GraphLIME, a local interpretable model explanation for graphs using the Hilbert-Schmidt Independence Criterion (HSIC) Lasso, which is a nonlinear feature selection method.
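A compact sketch of HSIC Lasso itself (illustrative; uses scikit-learn's non-negative Lasso rather than the authors' solver, and a Gaussian kernel throughout): each feature's centered Gram matrix is regressed, with an L1 penalty and non-negativity, onto the output's centered Gram matrix.

```python
import numpy as np
from sklearn.linear_model import Lasso

def _centered_gram(v, sigma):
    d2 = (v[:, None] - v[None, :]) ** 2
    K = np.exp(-d2 / (2 * sigma ** 2))
    H = np.eye(len(v)) - 1.0 / len(v)  # centering matrix I - (1/n) 11^T
    return H @ K @ H

def hsic_lasso(X, y, lam=1e-3, sigma=1.0):
    n, d = X.shape
    L = _centered_gram(y, sigma).ravel()
    Ks = np.stack([_centered_gram(X[:, j], sigma).ravel()
                   for j in range(d)], axis=1)
    model = Lasso(alpha=lam, positive=True, fit_intercept=False)
    model.fit(Ks, L)
    return model.coef_  # nonzero entries mark selected features
```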
3 code implementations • NeurIPS 2019 • Jen Ning Lim, Makoto Yamada, Bernhard Schölkopf, Wittawat Jitkrittum
The first test, building on the post selection inference framework, provably controls the number of best models that are wrongly declared worse (false positive rate).
1 code implementation • 14 Oct 2019 • Jen Ning Lim, Makoto Yamada, Wittawat Jitkrittum, Yoshikazu Terada, Shigeyuki Matsui, Hidetoshi Shimodaira
An approach for addressing this is to condition on the selection procedure to account for how we have used the data to generate our hypotheses, and to prevent information from being used again after selection.
1 code implementation • 10 Oct 2019 • Tam Le, Nhat Ho, Makoto Yamada
By leveraging a tree structure, we propose to align flows from a root to each support instead of pairwise tree metrics of supports, i.e., flows from one support to another, in GW.
no code implementations • 10 Oct 2019 • Tam Le, Viet Huynh, Nhat Ho, Dinh Phung, Makoto Yamada
We study in this paper a variant of the Wasserstein barycenter problem, which we refer to as the tree-Wasserstein barycenter, by leveraging a specific class of ground metrics, namely tree metrics, for the Wasserstein distance.
1 code implementation • 5 Sep 2019 • Yanbin Liu, Makoto Yamada, Yao-Hung Hubert Tsai, Tam Le, Ruslan Salakhutdinov, Yi Yang
To estimate the mutual information from data, a common practice is preparing a set of paired samples $\{(\mathbf{x}_i,\mathbf{y}_i)\}_{i=1}^n$ drawn i.i.d. from the joint distribution $p(\mathbf{x},\mathbf{y})$.
1 code implementation • EMNLP 2019 • Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov
This new formulation gives us a better way to understand individual components of the Transformer's attention, such as how to better integrate the positional embedding.
no code implementations • NeurIPS 2019 • Ryoma Sato, Makoto Yamada, Hisashi Kashima
We theoretically demonstrate that the most powerful GNN can learn approximation algorithms for the minimum dominating set problem and the minimum vertex cover problem with some approximation ratios with the aid of the theory of distributed local algorithms.
no code implementations • 26 Feb 2019 • Tatsuya Shiraishi, Tam Le, Hisashi Kashima, Makoto Yamada
In this paper, we propose the topological Bayesian optimization, which can efficiently find an optimal solution from structured data using topological information.
1 code implementation • 26 Feb 2019 • Ryoma Sato, Makoto Yamada, Hisashi Kashima
We propose HiSampler, the hard instance sampler, to model the hard instance distribution of graph algorithms.
2 code implementations • NeurIPS 2019 • Tam Le, Makoto Yamada, Kenji Fukumizu, Marco Cuturi
Optimal transport (OT) theory defines a powerful set of tools to compare probability distributions.
no code implementations • 23 Jan 2019 • Ryoma Sato, Makoto Yamada, Hisashi Kashima
The recent advancements in graph neural networks (GNNs) have led to state-of-the-art performances in various applications, including chemo-informatics, question-answering systems, and recommender systems.
no code implementations • EMNLP 2018 • Tanmoy Mukherjee, Makoto Yamada, Timothy Hospedales
Word translation, or bilingual dictionary induction, is an important capability that impacts many multilingual language processing tasks.
no code implementations • ICLR 2019 • Makoto Yamada, Denny Wu, Yao-Hung Hubert Tsai, Ichiro Takeuchi, Ruslan Salakhutdinov, Kenji Fukumizu
In this paper, we propose a post selection inference (PSI) framework for divergence measures, which can select a set of statistically significant features that discriminate two distributions.
no code implementations • 15 Feb 2018 • Yao-Hung Hubert Tsai, Makoto Yamada, Denny Wu, Ruslan Salakhutdinov, Ichiro Takeuchi, Kenji Fukumizu
"Which Generative Adversarial Networks (GANs) generates the most plausible images?"
no code implementations • 15 Feb 2018 • Denny Wu, Yixiu Zhao, Yao-Hung Hubert Tsai, Makoto Yamada, Ruslan Salakhutdinov
To address this issue, we propose to measure the dependency instead of MI between layers in DNNs.
1 code implementation • NeurIPS 2018 • Tam Le, Makoto Yamada
To deal with this, an emerging approach is to use kernel methods, and an appropriate geometry for PDs is an important factor in measuring their similarity.
no code implementations • 16 Nov 2017 • Tanmoy Mukherjee, Makoto Yamada, Timothy M. Hospedales
In this paper we introduce Deep Matching Autoencoders (DMAE), which learn a common latent space and pairing from unpaired multi-modal data.
no code implementations • 15 May 2017 • Kishan Wimalawarne, Makoto Yamada, Hiroshi Mamitsuka
We propose a set of convex low-rank-inducing norms for coupled matrices and tensors (hereafter coupled tensors), which share information between matrices and tensors through common modes.
no code implementations • 21 Feb 2017 • Makoto Yamada, Song Liu, Samuel Kaski
We propose an inlier-based outlier detection method capable of both identifying the outliers and explaining why they are outliers, by identifying the outlier-specific features.
no code implementations • NeurIPS 2016 • Tomoharu Iwata, Makoto Yamada
With the proposed model, all views of a non-anomalous instance are assumed to be generated from a single latent vector.
no code implementations • 12 Oct 2016 • Makoto Yamada, Yuta Umezu, Kenji Fukumizu, Ichiro Takeuchi
We propose a novel kernel based post selection inference (PSI) algorithm, which can not only handle non-linearity in data but also structured output such as multi-dimensional and multi-label outputs.
no code implementations • 14 Aug 2016 • Makoto Yamada, Jiliang Tang, Jose Lugo-Martinez, Ermin Hodzic, Raunak Shrestha, Avishek Saha, Hua Ouyang, Dawei Yin, Hiroshi Mamitsuka, Cenk Sahinalp, Predrag Radivojac, Filippo Menczer, Yi Chang
However, sophisticated learning models are computationally infeasible for data with millions of features.
no code implementations • 22 Mar 2016 • Makoto Yamada, Koh Takeuchi, Tomoharu Iwata, John Shawe-Taylor, Samuel Kaski
We introduce the localized Lasso, which is suited for learning models that are both interpretable and have a high predictive power in problems with high dimensionality $d$ and small sample size $n$.
1 code implementation • 4 Jul 2015 • Makoto Yamada, Wenzhao Lian, Amit Goyal, Jianhui Chen, Kishan Wimalawarne, Suleiman A. Khan, Samuel Kaski, Hiroshi Mamitsuka, Yi Chang
We propose the convex factorization machine (CFM), which is a convex variant of the widely used Factorization Machines (FMs).
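For reference, a sketch of the model in standard FM notation (the convex reformulation is paraphrased, not the paper's exact statement): FMs capture pairwise feature interactions through a low-rank matrix, and the convex variant optimizes the interaction matrix directly under a nuclear-norm penalty instead of factorizing it.

```latex
% Factorization machine prediction with interaction matrix Z
\hat{y}(\mathbf{x}) = w_0 + \mathbf{w}^\top \mathbf{x}
    + \sum_{i < j} Z_{ij}\, x_i x_j
% Classical FMs parametrize Z = V V^\top, which is non-convex in V;
% a convex alternative keeps Z itself and encourages low rank via the
% nuclear norm \|Z\|_{*}.
```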
no code implementations • 5 Dec 2014 • Suriya Gunasekar, Makoto Yamada, Dawei Yin, Yi Chang
We address the collective matrix completion problem of jointly recovering a collection of matrices with shared structure from partial (and potentially noisy) observations.
no code implementations • 13 Nov 2014 • Tomoharu Iwata, Makoto Yamada
We propose a nonparametric Bayesian probabilistic latent variable model for multi-view anomaly detection, which is the task of finding instances that have inconsistent views.
no code implementations • 10 Nov 2014 • Makoto Yamada, Avishek Saha, Hua Ouyang, Dawei Yin, Yi Chang
We propose a feature selection method that finds non-redundant features from large, high-dimensional data in a nonlinear way.
1 code implementation • 2 Mar 2012 • Song Liu, Makoto Yamada, Nigel Collier, Masashi Sugiyama
The objective of change-point detection is to discover abrupt property changes lying behind time-series data.
no code implementations • 2 Feb 2012 • Makoto Yamada, Wittawat Jitkrittum, Leonid Sigal, Eric P. Xing, Masashi Sugiyama
We first show that, with particular choices of kernel functions, non-redundant features with strong statistical dependence on output values can be found in terms of kernel-based independence measures.
no code implementations • NeurIPS 2011 • Makoto Yamada, Taiji Suzuki, Takafumi Kanamori, Hirotaka Hachiya, Masashi Sugiyama
Divergence estimators based on direct approximation of density-ratios without going through separate approximation of numerator and denominator densities have been successfully applied to machine learning tasks that involve distribution comparison such as outlier detection, transfer learning, and two-sample homogeneity test.
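A minimal sketch of one such direct estimator (uLSIF-style least-squares density-ratio fitting; the kernel and choice of centers are illustrative): the ratio r(x) = p_nu(x)/p_de(x) is modeled as a kernel expansion whose coefficients have a closed form.

```python
import numpy as np

def ulsif(x_nu, x_de, sigma=1.0, lam=1e-3):
    """Fit r(x) ~ p_nu(x)/p_de(x) by regularized least squares.
    x_nu, x_de: (n, d) samples from the numerator/denominator densities."""
    C = x_nu  # kernel centers at the numerator samples (a common choice)
    def K(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    Phi_de, Phi_nu = K(x_de, C), K(x_nu, C)
    H = Phi_de.T @ Phi_de / len(x_de)   # empirical E_de[phi(x) phi(x)^T]
    h = Phi_nu.mean(axis=0)             # empirical E_nu[phi(x)]
    alpha = np.linalg.solve(H + lam * np.eye(len(C)), h)
    return lambda x: K(np.atleast_2d(x), C) @ alpha  # estimated ratio
```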