This is because coordinate descent iteratively updates all the parameters in the objective until convergence.
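As a minimal sketch of the kind of iteration described above, the following applies coordinate descent to a small quadratic objective. The objective, the function name `coordinate_descent`, and the stopping tolerance are illustrative assumptions, not the objective from the source.

```python
def coordinate_descent(A, b, num_sweeps=1000, tol=1e-10):
    """Minimize 0.5 * x^T A x - b^T x for a symmetric positive-definite A.

    Each sweep updates every coordinate in turn; sweeps repeat until the
    largest single-coordinate change falls below `tol`.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(num_sweeps):
        max_change = 0.0
        for i in range(n):
            # Exact minimization over x[i] with all other coordinates fixed.
            residual = b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)
            new_xi = residual / A[i][i]
            max_change = max(max_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if max_change < tol:
            break
    return x


# Hypothetical toy problem: the minimizer solves A x = b.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
solution = coordinate_descent(A, b)
```

Because every sweep touches all parameters, the cost of one sweep grows with the number of parameters, which is the property the sentence above refers to.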
Since the supports may have various granularities depending on attributes (e.g., poverty rate and crime rate), modeling such data is not straightforward.
With the proposed method, the OoD detection is performed by density estimation in a latent space.
Existing analysis is limited to the Bayesian setting, which assumes a correct model and exact Bayesian posterior distribution.
Speaker diarization has been investigated extensively as a central task for meeting analysis.
In the proposed model, both the prediction and explanation for each sample are performed using an easy-to-interpret locally linear model.
In this study, we formulate the "Evacuation Shelter Scheduling Problem," which allocates evacuees to shelters so as to minimize the movement costs of the evacuees and the operation costs of the shelters.
In experiments using three text document datasets, we demonstrate that the proposed method achieves better BO performance than the existing methods.
We propose a few-shot learning method for unsupervised feature selection, which is a task to select a subset of relevant features in unlabeled data.
The closed-form solution enables fast and effective adaptation to a few instances, and its differentiability enables us to train our model such that the expected test error for relative DRE can be explicitly minimized after adapting to a few instances.
The neural network is meta-learned such that the expected imputation error is minimized when the factorized matrices are adapted to each matrix by a maximum a posteriori (MAP) estimation.
First, we provide a new second-order Jensen inequality, which has the repulsion term based on the loss function.
Hawkes processes offer a central tool for modeling diffusion processes, in which the influence of past events is described by the triggering kernel.
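To make the role of the triggering kernel concrete, here is a minimal sketch of a Hawkes conditional intensity. The exponential kernel and the parameter names `mu`, `alpha`, and `beta` are assumptions chosen for illustration; the source does not specify a kernel form.

```python
import math


def hawkes_intensity(t, past_events, mu=0.5, alpha=0.8, beta=1.0):
    """Conditional intensity lambda(t) = mu + sum_i g(t - t_i) over t_i < t.

    The triggering kernel g(dt) = alpha * exp(-beta * dt) is an assumed
    exponential form: each past event raises the intensity by alpha, and
    that influence decays at rate beta.
    """
    return mu + sum(
        alpha * math.exp(-beta * (t - t_i)) for t_i in past_events if t_i < t
    )


# Hypothetical event times.
events = [1.0, 2.0]
intensity_before = hawkes_intensity(0.5, events)  # no past events yet
intensity_after = hawkes_intensity(2.5, events)   # both events contribute
```

Before any event has occurred the intensity equals the base rate `mu`; after events it is raised by the summed kernel contributions.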
The proposed method trains the neural networks such that the expected test likelihood is improved when topic model parameters are estimated by maximizing the posterior probability using the priors based on the EM algorithm.
It is crucial to provide an inter-sentence context in Neural Machine Translation (NMT) models for higher-quality translation.
We propose a meta-learning method that trains neural networks for obtaining representations such that clustering performance improves when the representations are clustered by variational Bayesian (VB) inference with an infinite Gaussian mixture model.
With a meta-learning framework, quick adaptation to each task and its effective backpropagation are important since the model is trained by the adaptation for each epoch.
With the proposed method, a representation of a given short time-series is obtained by a bidirectional LSTM for extracting its properties.
We theoretically and experimentally confirm that the weight loss landscape becomes sharper as the magnitude of the noise of adversarial training increases in the linear logistic regression model.
With our proposed method, the forecast error is backpropagated through the neural networks and the spectral decomposition, enabling end-to-end learning of Koopman spectral analysis.
We propose a heterogeneous meta-learning method that trains a model on tasks with various attribute spaces, such that it can solve unseen tasks whose attribute spaces are different from the training tasks given a few labeled instances.
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus (e.g., a few hundred sentence pairs).
In this study, we propose a new framework in which OT is considered as a maximum a posteriori (MAP) solution of a probabilistic generative model.
Additionally, the prediction is interpretable because it is obtained by the inner product between the simplified representations and the sparse weights, where only a small number of weights are selected by our gate module in the NGSLL.
To learn node embeddings specialized for anomaly detection, in which there is a class imbalance due to the rarity of anomalies, the parameters of a GCN are trained to minimize the volume of a hypersphere that encloses the node embeddings of normal instances while embedding anomalous ones outside the hypersphere.
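The hypersphere idea can be sketched as a distance-based check on embeddings. The snippet below is only an illustration under assumed values: the center, radius, and embeddings are hand-picked, hypothetical numbers, and no GCN or training procedure is shown.

```python
def squared_distance(embedding, center):
    """Squared Euclidean distance from an embedding to the hypersphere center."""
    return sum((e - c) ** 2 for e, c in zip(embedding, center))


def is_outside(embedding, center, radius):
    """True when the embedding lies strictly outside the hypersphere."""
    return squared_distance(embedding, center) > radius ** 2


# Hypothetical 2-D embeddings; in practice these would come from the trained GCN.
center = [0.0, 0.0]
radius = 1.0
normal_embedding = [0.3, -0.4]
anomalous_embedding = [2.0, 1.0]

normal_flag = is_outside(normal_embedding, center, radius)        # False
anomalous_flag = is_outside(anomalous_embedding, center, radius)  # True
```

Under this view, the squared distance to the center can serve as an anomaly score: a smaller enclosing hypersphere for normal nodes leaves more room to separate anomalous ones.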
The proposed method can infer the anomaly detectors for target domains without re-training by introducing the concept of latent domain vectors, which are latent representations of the domains and are used for inferring the anomaly detectors.
We model the anomaly score function by a neural network-based unsupervised anomaly detection method, e.g., autoencoders.
By deriving the posterior GP, we can predict the data value at any location point by simultaneously considering the spatial correlations and the dependences between areal data sets.
Recently, a variety of unsupervised methods have been proposed that map pre-trained word embeddings of different languages into the same space without any parallel data.
Though many point processes have been proposed to model events in a continuous spatio-temporal space, none of them allow for the consideration of the rich contextual factors that affect event occurrence, such as weather, social activities, geographical characteristics, and traffic.
We propose a supervised anomaly detection method based on neural density estimators, where the negative log likelihood is used for the anomaly score.
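As a rough illustration of using the negative log likelihood as an anomaly score, the sketch below substitutes a one-dimensional Gaussian for the neural density estimator. That substitution, the toy data, and the function name `gaussian_nll` are assumptions, and the supervised part of the method is not shown.

```python
import math


def gaussian_nll(x, mean, std):
    """Negative log likelihood of x under a 1-D Gaussian (stand-in density model)."""
    return 0.5 * math.log(2 * math.pi * std ** 2) + (x - mean) ** 2 / (2 * std ** 2)


# Hypothetical normal (non-anomalous) observations used to fit the density.
normal_data = [9.8, 10.1, 10.0, 9.9, 10.2]
mean = sum(normal_data) / len(normal_data)
std = math.sqrt(sum((x - mean) ** 2 for x in normal_data) / len(normal_data))

# Higher score = less likely under the fitted density = more anomalous.
score_typical = gaussian_nll(10.0, mean, std)
score_unusual = gaussian_nll(13.0, mean, std)
```

A point far from the fitted density receives a much larger score than a typical point, so thresholding the score yields a detector.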
Because our approach learns to reconstruct normal data points accurately while failing to reconstruct both known and unknown anomalies, it can accurately discriminate both kinds of anomalies from normal data points.
In this paper, we propose a method to learn a function that outputs regulation effects given the current traffic situation as inputs.
With the proposed model, a distribution for each auxiliary data set on the continuous space is modeled using a Gaussian process, where the representation of uncertainty considers the levels of granularity.
However, KL divergence with the aggregated posterior cannot be calculated in a closed form, which prevents us from using this optimal prior.
The proposed model contains bidirectional LSTMs that act as forward and backward language models, and these networks are shared among all the languages.
The proposed method can infer appropriate domain-specific models without any semantic descriptors by introducing the concept of latent domain vectors, which are latent representations for the domains and are used for inferring the models.
In this paper, we propose a simple but effective method for training neural networks with a limited amount of training data.
We propose a simple method that combines neural networks and Gaussian processes.
With the proposed model, all views of a non-anomalous instance are assumed to be generated from a single latent vector.
We introduce the localized Lasso, which is suited for learning models that are both interpretable and have a high predictive power in problems with high dimensionality $d$ and small sample size $n$.
We propose a kernel-based method for finding matching between instances across different domains, such as multilingual documents and images with annotations.
With the latent SMM, a latent vector is associated with each vocabulary term, and each document is represented as a distribution of the latent vectors for words appearing in the document.
We propose a nonparametric Bayesian probabilistic latent variable model for multi-view anomaly detection, which is the task of finding instances that have inconsistent views.
We propose a new probabilistic model for analyzing the dynamic evolution of relational data, such as additions, deletions, and splits and merges of relation clusters like communities in social networks.
We propose a probabilistic topic model for analyzing and extracting content-related annotations from noisy annotated discrete data such as web pages stored in social bookmarking services.