Search Results for author: Makoto Yamada

Found 68 papers, 29 papers with code

An Empirical Study of Self-supervised Learning with Wasserstein Distance

no code implementations16 Oct 2023 Makoto Yamada, Yuki Takezawa, Guillaume Houry, Kira Michaela Dusterwald, Deborah Sulem, Han Zhao, Yao-Hung Hubert Tsai

We find that the model performance depends on the combination of TWD and probability model, and that the Jeffrey divergence regularization helps in model training.

Representation Learning Self-Supervised Learning

Embarrassingly Simple Text Watermarks

1 code implementation13 Oct 2023 Ryoma Sato, Yuki Takezawa, Han Bao, Kenta Niwa, Makoto Yamada

LLMs can generate texts that cannot be distinguished from human-written texts.
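
One instantiation in this spirit replaces characters with visually identical Unicode counterparts. Below is a minimal, illustrative sketch of such a whitespace-substitution watermark (not the authors' implementation; the function names and 0.5 threshold are assumptions):

```python
# Minimal whitespace-substitution watermark (illustrative sketch).
WM_SPACE = "\u2004"  # THREE-PER-EM SPACE, renders much like U+0020

def embed(text: str) -> str:
    """Replace every ASCII space with the watermark space."""
    return text.replace(" ", WM_SPACE)

def detect(text: str, threshold: float = 0.5) -> bool:
    """Flag text as watermarked if most space-like characters are WM_SPACE."""
    spaces = [c for c in text if c in (" ", WM_SPACE)]
    if not spaces:
        return False
    return sum(c == WM_SPACE for c in spaces) / len(spaces) >= threshold

marked = embed("LLMs can generate texts that read naturally.")
assert detect(marked) and not detect("plain, unmarked text")
```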

Necessary and Sufficient Watermark for Large Language Models

no code implementations2 Oct 2023 Yuki Takezawa, Ryoma Sato, Han Bao, Kenta Niwa, Makoto Yamada

Although existing watermarking methods have successfully detected texts generated by LLMs, they significantly degrade the quality of the generated texts.

Machine Translation

Implicit neural representation for change detection

1 code implementation28 Jul 2023 Peter Naylor, Diego Di Carlo, Arianna Traviglia, Makoto Yamada, Marco Fiorucci

We outperform the previous methods by a margin of 10% in the intersection over union metric.

Change Detection

Nyström Method for Accurate and Scalable Implicit Differentiation

2 code implementations20 Feb 2023 Ryuichiro Hataya, Makoto Yamada

The essential difficulty of gradient-based bilevel optimization using implicit differentiation is to estimate the inverse Hessian vector product with respect to neural network parameters.

Bilevel Optimization Hyperparameter Optimization +1
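
As a rough sketch of the idea, one can build a rank-m Nyström approximation of the Hessian from sampled columns and apply the Woodbury identity to solve the regularized inverse Hessian-vector product; the sampling scheme, ridge term, and names below are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def nystrom_ihvp(hess, v, m=20, rho=0.1, seed=0):
    """Approximate (H + rho*I)^{-1} v via a rank-m Nystrom approximation of H
    and the Woodbury identity. H is dense here for clarity; in practice only
    sampled columns / Hessian-vector products are needed."""
    rng = np.random.default_rng(seed)
    n = hess.shape[0]
    idx = rng.choice(n, size=m, replace=False)
    C = hess[:, idx]                 # n x m sampled columns
    W = hess[np.ix_(idx, idx)]       # m x m core block
    # Woodbury: (rho*I + C W^{-1} C^T)^{-1} v
    inner = W + C.T @ C / rho
    return v / rho - C @ np.linalg.solve(inner, C.T @ v) / rho**2

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 100))
H = A @ A.T / 100                    # random PSD stand-in for a Hessian
v = rng.standard_normal(100)
approx = nystrom_ihvp(H, v, m=50, rho=1.0)
exact = np.linalg.solve(H + np.eye(100), v)   # reference solution
```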

Optimal Transport for Change Detection on LiDAR Point Clouds

1 code implementation14 Feb 2023 Marco Fiorucci, Peter Naylor, Makoto Yamada

The method is based on unbalanced optimal transport and can be generalised to any change detection problem with LiDAR data.

Change Detection Multi-class Classification +1
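
As a sketch of the kind of solver such a method builds on, entropy-regularized unbalanced OT relaxes the marginal constraints with a KL penalty, so mass can be created or destroyed when two scans do not contain the same amount of material; the scaling iteration and parameters below are illustrative, not the paper's implementation:

```python
import numpy as np

def sinkhorn_unbalanced(a, b, M, eps=0.05, rho=1.0, iters=500):
    """Scaling iterations for entropic unbalanced OT with KL-relaxed
    marginals; the exponent rho/(rho+eps) damps the usual Sinkhorn update."""
    K = np.exp(-M / eps)
    f = rho / (rho + eps)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(iters):
        u = (a / (K @ v)) ** f
        v = (b / (K.T @ u)) ** f
    return u[:, None] * K * v[None, :]   # transport plan

a, b = np.ones(4) / 4, np.ones(6) / 6    # unequal numbers of points
M = np.abs(np.linspace(0, 1, 4)[:, None] - np.linspace(0, 1, 6)[None, :])
P = sinkhorn_unbalanced(a, b, M)         # row/column sums need not match a, b
```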

Structural Explanations for Graph Neural Networks using HSIC

no code implementations4 Feb 2023 Ayato Toyokuni, Makoto Yamada

More specifically, we extend the GraphLIME node-explanation method with group-lasso- and fused-lasso-based node explanation methods.

Graph Classification Link Prediction

Momentum Tracking: Momentum Acceleration for Decentralized Deep Learning on Heterogeneous Data

no code implementations30 Sep 2022 Yuki Takezawa, Han Bao, Kenta Niwa, Ryoma Sato, Makoto Yamada

In this study, we propose Momentum Tracking, which is a method with momentum whose convergence rate is proven to be independent of data heterogeneity.

Image Classification

Twin Papers: A Simple Framework of Causal Inference for Citations via Coupling

1 code implementation21 Aug 2022 Ryoma Sato, Makoto Yamada, Hisashi Kashima

The main difficulty in investigating the effects is that we need to know counterfactual results, which are not available in reality.

Causal Inference counterfactual

Inflating 2D Convolution Weights for Efficient Generation of 3D Medical Images

no code implementations8 Aug 2022 Yanbin Liu, Girish Dwivedi, Farid Boussaid, Frank Sanfilippo, Makoto Yamada, Mohammed Bennamoun

Novel 3D network architectures are proposed for both the generator and discriminator of the GAN model to significantly reduce the number of parameters while maintaining the quality of image generation.

Image Generation Medical Image Generation

Scale dependant layer for self-supervised nuclei encoding

1 code implementation22 Jul 2022 Peter Naylor, Yao-Hung Hubert Tsai, Marick Laé, Makoto Yamada

Recent developments in self-supervised learning give us the possibility to further reduce human intervention in multi-step pipelines where the focus revolves around particular objects of interest.

Self-Supervised Learning

Approximating 1-Wasserstein Distance with Trees

no code implementations24 Jun 2022 Makoto Yamada, Yuki Takezawa, Ryoma Sato, Han Bao, Zornitsa Kozareva, Sujith Ravi

In this paper, we aim to approximate the 1-Wasserstein distance by the tree-Wasserstein distance (TWD), where TWD is a 1-Wasserstein distance with tree-based embedding and can be computed in linear time with respect to the number of nodes on a tree.
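
Concretely, the TWD between two distributions supported on the nodes of a weighted tree is a sum over edges of the edge weight times the absolute difference in subtree mass, which one bottom-up pass computes. A minimal sketch, assuming nodes are numbered so that each parent precedes its children:

```python
def tree_wasserstein(parent, w, mu, nu):
    """TWD on a rooted tree: parent[v] is v's parent (root 0, parent[0]=0),
    w[v] is the weight of edge (parent[v], v), mu and nu are node masses.
    Runs in O(n) by pushing subtree mass differences up toward the root."""
    n = len(parent)
    diff = [mu[v] - nu[v] for v in range(n)]
    total = 0.0
    for v in range(n - 1, 0, -1):   # children before parents
        total += w[v] * abs(diff[v])
        diff[parent[v]] += diff[v]
    return total

# Path 0-1-2 with unit edges: reduces to 1D Wasserstein-1, distance 2
assert tree_wasserstein([0, 0, 1], [0.0, 1.0, 1.0],
                        [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]) == 2.0
```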

Feature Selection for Discovering Distributional Treatment Effect Modifiers

no code implementations1 Jun 2022 Yoichi Chikahara, Makoto Yamada, Hisashi Kashima

Finding the features relevant to the difference in treatment effects is essential to unveil the underlying causal mechanisms.

Feature Importance feature selection

Theoretical Analysis of Primal-Dual Algorithm for Non-Convex Stochastic Decentralized Optimization

no code implementations23 May 2022 Yuki Takezawa, Kenta Niwa, Makoto Yamada

However, the convergence rate of the ECL is provided only when the objective function is convex, and has not been shown in a standard machine learning setting where the objective function is non-convex.

Communication Compression for Decentralized Learning with Operator Splitting Methods

no code implementations8 May 2022 Yuki Takezawa, Kenta Niwa, Makoto Yamada

Moreover, we demonstrate that the C-ECL is more robust to heterogeneous data than the Gossip-based algorithms.

Nys-Newton: Nyström-Approximated Curvature for Stochastic Optimization

no code implementations16 Oct 2021 Dinesh Singh, Hardik Tankaria, Makoto Yamada

However, the secant equation becomes a poor approximation of the Newton step because it uses only first-order derivatives.

Stochastic Optimization

Adversarial Regression with Doubly Non-negative Weighting Matrices

no code implementations NeurIPS 2021 Tam Le, Truyen Nguyen, Makoto Yamada, Jose Blanchet, Viet Anh Nguyen

In this paper, we propose a novel and coherent scheme for kernel-reweighted regression by reparametrizing the sample weights using a doubly non-negative matrix.

regression

Fixed Support Tree-Sliced Wasserstein Barycenter

1 code implementation8 Sep 2021 Yuki Takezawa, Ryoma Sato, Zornitsa Kozareva, Sujith Ravi, Makoto Yamada

By contrast, the Wasserstein distance on a tree, called the tree-Wasserstein distance, can be computed in linear time and allows for the fast comparison of a large number of distributions.

Re-evaluating Word Mover's Distance

1 code implementation30 May 2021 Ryoma Sato, Makoto Yamada, Hisashi Kashima

The original study on WMD reported that WMD outperforms classical baselines such as bag-of-words (BOW) and TF-IDF by significant margins in various datasets.

Computationally Efficient Wasserstein Loss for Structured Labels

no code implementations EACL 2021 Ayato Toyokuni, Sho Yokoi, Hisashi Kashima, Makoto Yamada

The problem of estimating the probability distribution of labels has been widely studied as a label distribution learning (LDL) problem, whose applications include age estimation, emotion analysis, and semantic segmentation.

Age Estimation Emotion Recognition +3

Dynamic Sasvi: Strong Safe Screening for Norm-Regularized Least Squares

no code implementations NeurIPS 2021 Hiroaki Yamada, Makoto Yamada

A recently introduced technique for a sparse optimization problem called "safe screening" allows us to identify irrelevant variables in the early stage of optimization.

Supervised Tree-Wasserstein Distance

no code implementations27 Jan 2021 Yuki Takezawa, Ryoma Sato, Makoto Yamada

Specifically, we rewrite the Wasserstein distance on a tree metric in terms of the parent-child relationships of the tree and formulate it as a continuous optimization problem using a contrastive loss.

Document Classification Metric Learning

Adaptive Tree Wasserstein Minimization for Hierarchical Generative Modeling

no code implementations1 Jan 2021 ZiHao Wang, Xu Zhao, Tam Le, Hao Wu, Yong Zhang, Makoto Yamada

In this work, we consider OT over tree metrics, which is more general than the sliced Wasserstein distance and includes it as a special case. We propose a fast $O(n)$ algorithm for computing the optimal Wasserstein-1 transport plan between two distributions supported on a tree.

Unsupervised Domain Adaptation

Post-selection inference with HSIC-Lasso

2 code implementations29 Oct 2020 Tobias Freidling, Benjamin Poignard, Héctor Climente-González, Makoto Yamada

Detecting influential features in non-linear and/or high-dimensional data is a challenging and increasingly important task in machine learning.

LEMMA Variable Selection

Poincare: Recommending Publication Venues via Treatment Effect Estimation

1 code implementation19 Oct 2020 Ryoma Sato, Makoto Yamada, Hisashi Kashima

We use a bias correction method to effectively estimate the potential impact of choosing a publication venue, and to recommend venues based on the potential impact of papers in each venue.

Causal Inference Recommendation Systems

Optimal Transport Kernels for Sequential and Parallel Neural Architecture Search

1 code implementation13 Jun 2020 Vu Nguyen, Tam Le, Makoto Yamada, Michael A. Osborne

Building upon tree-Wasserstein (TW), which is a negative definite variant of OT, we develop a novel discrepancy for neural architectures, and demonstrate it within a Gaussian process surrogate model for the sequential NAS settings.

Neural Architecture Search

Neural Methods for Point-wise Dependency Estimation

1 code implementation NeurIPS 2020 Yao-Hung Hubert Tsai, Han Zhao, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov

Since its inception, the neural estimation of mutual information (MI) has demonstrated the empirical success of modeling expected dependency between high-dimensional random variables.

Cross-Modal Retrieval Representation Learning +1

Fast Unbalanced Optimal Transport on a Tree

1 code implementation NeurIPS 2020 Ryoma Sato, Makoto Yamada, Hisashi Kashima

This study examines the time complexities of the unbalanced optimal transport problems from an algorithmic perspective for the first time.

Feature Robust Optimal Transport for High-dimensional Data

1 code implementation25 May 2020 Mathis Petrovich, Chao Liang, Ryoma Sato, Yanbin Liu, Yao-Hung Hubert Tsai, Linchao Zhu, Yi Yang, Ruslan Salakhutdinov, Makoto Yamada

To show the effectiveness of FROT, we propose using the FROT algorithm for the layer selection problem in deep neural networks for semantic correspondence.

feature selection Semantic correspondence +1

Volumization as a Natural Generalization of Weight Decay

no code implementations25 Mar 2020 Liu Ziyin, ZiHao Wang, Makoto Yamada, Masahito Ueda

We propose a novel regularization method, called volumization, for neural networks.

Memorization

Fast local linear regression with anchor regularization

1 code implementation21 Feb 2020 Mathis Petrovich, Makoto Yamada

Regression is an important task in machine learning and data mining.

regression

Random Features Strengthen Graph Neural Networks

1 code implementation8 Feb 2020 Ryoma Sato, Makoto Yamada, Hisashi Kashima

Through experiments, we show that the addition of random features enables GNNs to solve various problems that normal GNNs, including the graph convolutional networks (GCNs) and graph isomorphism networks (GINs), cannot solve.

Graph Learning
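
A minimal sketch of the trick, assuming uniform random features appended at the input (the paper analyzes particular random-feature schemes; this is illustrative):

```python
import torch

def add_random_features(x: torch.Tensor, r: int = 8) -> torch.Tensor:
    """Concatenate r random features to every node feature vector. Fresh
    randomness makes otherwise indistinguishable nodes distinct, which is
    what lets message passing break the symmetry plain GCNs/GINs cannot."""
    rand = torch.rand(x.size(0), r, device=x.device)
    return torch.cat([x, rand], dim=1)

x = torch.ones(5, 3)            # 5 nodes with identical features
x_aug = add_random_features(x)  # shape (5, 11), rows now almost surely differ
```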

Fast and Robust Comparison of Probability Measures in Heterogeneous Spaces

1 code implementation5 Feb 2020 Ryoma Sato, Marco Cuturi, Makoto Yamada, Hisashi Kashima

Building on Mémoli (2011), who proposed to represent each point in each distribution as the 1D distribution of its distances to all other points, we introduce in this paper the Anchor Energy (AE) and Anchor Wasserstein (AW) distances, which are respectively the energy and Wasserstein distances instantiated on such representations.

Graph Matching Word Embeddings
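
A minimal numpy sketch of the construction, assuming equal-size point sets so the 1D comparison reduces to sorted differences (the paper handles the general case via quantiles):

```python
import numpy as np

def w1_sorted(a, b):
    """1D Wasserstein-1 between two equal-size empirical distributions."""
    return np.abs(np.sort(a) - np.sort(b)).mean()

def anchor_cost_matrix(X, Y):
    """Represent each point by the 1D distribution of its distances to the
    other points of its own set, then compare representations pairwise."""
    DX = np.linalg.norm(X[:, None] - X[None], axis=-1)  # n x n distances
    DY = np.linalg.norm(Y[:, None] - Y[None], axis=-1)  # n x n distances
    return np.array([[w1_sorted(dx, dy) for dy in DY] for dx in DX])
```

The AW distance then solves an optimal transport problem over this cost matrix, while AE aggregates it in an energy-distance style.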

FsNet: Feature Selection Network on High-dimensional Biological Data

2 code implementations23 Jan 2020 Dinesh Singh, Héctor Climente-González, Mathis Petrovich, Eiryo Kawakami, Makoto Yamada

Because a large number of parameters in the selection and reconstruction layers can easily result in overfitting under a limited number of samples, we use two tiny networks to predict the large, virtual weight matrices of the selection and reconstruction layers.

BIG-bench Machine Learning feature selection +1
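
A minimal sketch of the weight-prediction idea, with a frozen per-feature embedding and one tiny predictor (the actual FsNet architecture and its training objective differ; names are illustrative):

```python
import torch
import torch.nn as nn

class VirtualWeightLayer(nn.Module):
    """A large d x k weight matrix is not learned directly; a tiny network
    maps a fixed low-dimensional per-feature embedding to the 'virtual'
    weights, so trainable parameters scale with h*k rather than d*k."""
    def __init__(self, d: int, k: int, h: int = 16):
        super().__init__()
        self.register_buffer("feat_emb", torch.randn(d, h))  # frozen codes
        self.predictor = nn.Linear(h, k)                     # tiny network

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        W = self.predictor(self.feat_emb)   # d x k virtual weight matrix
        return x @ W                        # acts like a dense d -> k layer

layer = VirtualWeightLayer(d=20000, k=64)   # ~1k trainable params, not 1.28M
out = layer(torch.randn(8, 20000))
```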

GraphLIME: Local Interpretable Model Explanations for Graph Neural Networks

2 code implementations17 Jan 2020 Qiang Huang, Makoto Yamada, Yuan Tian, Dinesh Singh, Dawei Yin, Yi Chang

In this paper, we propose GraphLIME, a local interpretable model explanation for graphs using the Hilbert-Schmidt Independence Criterion (HSIC) Lasso, which is a nonlinear feature selection method.

Descriptive feature selection

Kernel Stein Tests for Multiple Model Comparison

3 code implementations NeurIPS 2019 Jen Ning Lim, Makoto Yamada, Bernhard Schölkopf, Wittawat Jitkrittum

The first test, building on the post selection inference framework, provably controls the number of best models that are wrongly declared worse (false positive rate).

More Powerful Selective Kernel Tests for Feature Selection

1 code implementation14 Oct 2019 Jen Ning Lim, Makoto Yamada, Wittawat Jitkrittum, Yoshikazu Terada, Shigeyuki Matsui, Hidetoshi Shimodaira

One approach is to condition on the selection procedure, accounting for how we have used the data to generate our hypotheses and preventing information from being used again after selection.

feature selection Selection bias

Tree-Wasserstein Barycenter for Large-Scale Multilevel Clustering and Scalable Bayes

no code implementations10 Oct 2019 Tam Le, Viet Huynh, Nhat Ho, Dinh Phung, Makoto Yamada

In this paper, we study a variant of the Wasserstein barycenter problem, which we refer to as the tree-Wasserstein barycenter, by leveraging a specific class of ground metrics, namely tree metrics, for the Wasserstein distance.

Clustering

Flow-based Alignment Approaches for Probability Measures in Different Spaces

1 code implementation10 Oct 2019 Tam Le, Nhat Ho, Makoto Yamada

By leveraging a tree structure, we propose to align flows from the root to each support, instead of the pairwise tree metrics between supports (i.e., flows from one support to another) used in GW.

LSMI-Sinkhorn: Semi-supervised Mutual Information Estimation with Optimal Transport

1 code implementation5 Sep 2019 Yanbin Liu, Makoto Yamada, Yao-Hung Hubert Tsai, Tam Le, Ruslan Salakhutdinov, Yi Yang

To estimate the mutual information from data, a common practice is to prepare a set of paired samples $\{(\mathbf{x}_i, \mathbf{y}_i)\}_{i=1}^n \stackrel{\mathrm{i.i.d.}}{\sim} p(\mathbf{x}, \mathbf{y})$.

BIG-bench Machine Learning Mutual Information Estimation

Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel

1 code implementation EMNLP 2019 Yao-Hung Hubert Tsai, Shaojie Bai, Makoto Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov

This new formulation gives us a better way to understand individual components of the Transformer's attention, such as the better way to integrate the positional embedding.

Machine Translation Translation

Approximation Ratios of Graph Neural Networks for Combinatorial Problems

no code implementations NeurIPS 2019 Ryoma Sato, Makoto Yamada, Hisashi Kashima

We theoretically demonstrate that the most powerful GNN can learn approximation algorithms for the minimum dominating set problem and the minimum vertex cover problem with some approximation ratios with the aid of the theory of distributed local algorithms.

Feature Engineering

Learning to Sample Hard Instances for Graph Algorithms

1 code implementation26 Feb 2019 Ryoma Sato, Makoto Yamada, Hisashi Kashima

We propose HiSampler, the hard instance sampler, to model the hard instance distribution of graph algorithms.

Evolutionary Algorithms

Topological Bayesian Optimization with Persistence Diagrams

no code implementations26 Feb 2019 Tatsuya Shiraishi, Tam Le, Hisashi Kashima, Makoto Yamada

In this paper, we propose topological Bayesian optimization, which can efficiently find an optimal solution from structured data using topological information.

Bayesian Optimization Topological Data Analysis

Tree-Sliced Variants of Wasserstein Distances

2 code implementations NeurIPS 2019 Tam Le, Makoto Yamada, Kenji Fukumizu, Marco Cuturi

Optimal transport (OT) theory defines a powerful set of tools to compare probability distributions.

Constant Time Graph Neural Networks

no code implementations23 Jan 2019 Ryoma Sato, Makoto Yamada, Hisashi Kashima

The recent advancements in graph neural networks (GNNs) have led to state-of-the-art performances in various applications, including chemo-informatics, question-answering systems, and recommender systems.

Graph Attention Question Answering +1

Learning Unsupervised Word Translations Without Adversaries

no code implementations EMNLP 2018 Tanmoy Mukherjee, Makoto Yamada, Timothy Hospedales

Word translation, or bilingual dictionary induction, is an important capability that impacts many multilingual language processing tasks.

Machine Translation Multilingual Word Embeddings +3

Post Selection Inference with Incomplete Maximum Mean Discrepancy Estimator

no code implementations ICLR 2019 Makoto Yamada, Denny Wu, Yao-Hung Hubert Tsai, Ichiro Takeuchi, Ruslan Salakhutdinov, Kenji Fukumizu

In this paper, we propose a post selection inference (PSI) framework for divergence measures, which can select a set of statistically significant features that discriminate between two distributions.

Binary Classification Change Point Detection +1

Persistence Fisher Kernel: A Riemannian Manifold Kernel for Persistence Diagrams

1 code implementation NeurIPS 2018 Tam Le, Makoto Yamada

To deal with this, an emerging approach is to use kernel methods, for which an appropriate geometry on PDs is an important factor in measuring the similarity of PDs.

Deep Matching Autoencoders

no code implementations16 Nov 2017 Tanmoy Mukherjee, Makoto Yamada, Timothy M. Hospedales

In this paper we introduce Deep Matching Autoencoders (DMAE), which learn a common latent space and pairing from unpaired multi-modal data.

Image Captioning Representation Learning

Convex Coupled Matrix and Tensor Completion

no code implementations15 May 2017 Kishan Wimalawarne, Makoto Yamada, Hiroshi Mamitsuka

We propose a set of convex low-rank-inducing norms for coupled matrices and tensors (hereafter, coupled tensors), which share information between matrices and tensors through common modes.

Interpreting Outliers: Localized Logistic Regression for Density Ratio Estimation

no code implementations21 Feb 2017 Makoto Yamada, Song Liu, Samuel Kaski

We propose an inlier-based outlier detection method capable of both identifying the outliers and explaining why they are outliers, by identifying the outlier-specific features.

Density Ratio Estimation Outlier Detection +2

Post Selection Inference with Kernels

no code implementations12 Oct 2016 Makoto Yamada, Yuta Umezu, Kenji Fukumizu, Ichiro Takeuchi

We propose a novel kernel-based post selection inference (PSI) algorithm, which can handle not only non-linearity in data but also structured outputs such as multi-dimensional and multi-label outputs.

General Classification Multi-class Classification

Localized Lasso for High-Dimensional Regression

no code implementations22 Mar 2016 Makoto Yamada, Koh Takeuchi, Tomoharu Iwata, John Shawe-Taylor, Samuel Kaski

We introduce the localized Lasso, which is suited for learning models that are both interpretable and have a high predictive power in problems with high dimensionality $d$ and small sample size $n$.

regression Vocal Bursts Intensity Prediction

Convex Factorization Machine for Regression

1 code implementation4 Jul 2015 Makoto Yamada, Wenzhao Lian, Amit Goyal, Jianhui Chen, Kishan Wimalawarne, Suleiman A. Khan, Samuel Kaski, Hiroshi Mamitsuka, Yi Chang

We propose the convex factorization machine (CFM), which is a convex variant of the widely used Factorization Machines (FMs).

regression

Consistent Collective Matrix Completion under Joint Low Rank Structure

no code implementations5 Dec 2014 Suriya Gunasekar, Makoto Yamada, Dawei Yin, Yi Chang

We address the collective matrix completion problem of jointly recovering a collection of matrices with shared structure from partial (and potentially noisy) observations.

Matrix Completion

Multi-view Anomaly Detection via Probabilistic Latent Variable Models

no code implementations13 Nov 2014 Tomoharu Iwata, Makoto Yamada

We propose a nonparametric Bayesian probabilistic latent variable model for multi-view anomaly detection, which is the task of finding instances that have inconsistent views.

Anomaly Detection Bayesian Inference

High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso

no code implementations2 Feb 2012 Makoto Yamada, Wittawat Jitkrittum, Leonid Sigal, Eric P. Xing, Masashi Sugiyama

We first show that, with particular choices of kernel functions, non-redundant features with strong statistical dependence on output values can be found in terms of kernel-based independence measures.

feature selection Vocal Bursts Intensity Prediction
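
For reference, the (biased) empirical HSIC behind such kernel-based independence measures is $\mathrm{tr}(KHLH)/(n-1)^2$, where $K$ and $L$ are kernel matrices on inputs and outputs and $H$ is the centering matrix. A minimal sketch:

```python
import numpy as np

def hsic(K, L):
    """Biased empirical HSIC: zero (asymptotically) iff independent under
    characteristic kernels; larger values mean stronger dependence."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2

def gauss_kernel(x, sigma=1.0):
    d2 = (x[:, None] - x[None]) ** 2
    return np.exp(-d2 / (2 * sigma**2))

x = np.random.default_rng(0).standard_normal(200)
y = x**2                                       # purely nonlinear dependence
print(hsic(gauss_kernel(x), gauss_kernel(y)))  # clearly above zero
```

HSIC Lasso then solves a lasso problem over feature-wise kernels, scoring each feature by its dependence on the output while discounting mutually redundant features.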

Relative Density-Ratio Estimation for Robust Distribution Comparison

no code implementations NeurIPS 2011 Makoto Yamada, Taiji Suzuki, Takafumi Kanamori, Hirotaka Hachiya, Masashi Sugiyama

Divergence estimators based on direct approximation of density ratios, without separately approximating the numerator and denominator densities, have been successfully applied to machine learning tasks that involve distribution comparison, such as outlier detection, transfer learning, and two-sample homogeneity testing.

Density Ratio Estimation Outlier Detection +1
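
For reference, the $\alpha$-relative density ratio that this line of work estimates directly is bounded above by $1/\alpha$ for $\alpha > 0$, which is the source of the added robustness:

```latex
% alpha-relative density ratio of p(x) to q(x)
r_\alpha(\mathbf{x}) = \frac{p(\mathbf{x})}{\alpha\, p(\mathbf{x}) + (1 - \alpha)\, q(\mathbf{x})}
```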
