We propose a human-in-the-loop predictive maintenance (PdM) approach in which a machine learning system predicts future problems in sets of workstations (computers, laptops, and servers).
In this paper, we interpret these latent noise variables as implicit representations of simple and domain-agnostic data perturbations during training, producing BNNs that perform well under covariate shift due to input corruptions.
For example, confidence intervals become too narrow, which we demonstrate with a simple experiment.
Centaurs are half-human, half-AI decision-makers where the AI's goal is to complement the human.
We consider the problem of creating assistants that can help agents (often humans) solve novel sequential decision problems, assuming the agent is not able to specify the reward function explicitly to the assistant.
Similarity metrics such as representational similarity analysis (RSA) and centered kernel alignment (CKA) have been used to compare layer-wise representations between neural networks.
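As a concrete illustration of one such metric, the short sketch below computes linear CKA between two activation matrices; the variable names and the orthogonal-transform example are illustrative and not taken from the paper.

    # Linear CKA between two layer representations (rows = inputs, columns = units),
    # following the standard formulation of Kornblith et al. (2019).
    import numpy as np

    def linear_cka(X, Y):
        X = X - X.mean(axis=0)                       # center each feature
        Y = Y - Y.mean(axis=0)
        hsic = np.linalg.norm(Y.T @ X, 'fro') ** 2   # cross-similarity term
        return hsic / (np.linalg.norm(X.T @ X, 'fro') * np.linalg.norm(Y.T @ Y, 'fro'))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 64))                   # activations of one layer
    Q, _ = np.linalg.qr(rng.normal(size=(64, 64)))   # random orthogonal map
    print(linear_cka(X, X @ Q))                      # close to 1.0: CKA ignores rotations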
Approximate Bayesian computation (ABC) is a popular likelihood-free inference method for models with intractable likelihood functions.
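As a reminder of the basic mechanism, here is a toy rejection-ABC sketch for inferring the mean of a Gaussian; the prior, summary statistic, and tolerance are illustrative choices only.

    # Rejection ABC: simulate from the prior, keep parameters whose simulated
    # summary statistic falls close to the observed one.
    import numpy as np

    rng = np.random.default_rng(1)
    y_obs = rng.normal(loc=2.0, scale=1.0, size=50)
    s_obs = y_obs.mean()                          # observed summary statistic

    accepted = []
    for _ in range(20000):
        theta = rng.uniform(-5, 5)                # draw from the prior
        y_sim = rng.normal(loc=theta, scale=1.0, size=50)
        if abs(y_sim.mean() - s_obs) < 0.1:       # tolerance on the summaries
            accepted.append(theta)

    print(np.mean(accepted), np.std(accepted))    # approximate posterior moments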
We propose a method for likelihood-free inference (LFI) of states and for state prediction with a limited number of simulations.
In recent years, local differential privacy (LDP) has emerged as a technique of choice for privacy-preserving data collection in several scenarios where the aggregator is not trustworthy.
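The classic LDP primitive is randomized response for a single binary attribute; the sketch below shows the mechanism and the aggregator's de-biasing step (the epsilon value and data are illustrative, and this is not necessarily the mechanism studied in the paper).

    # Randomized response: each user reports the truth with probability p,
    # and the flipped bit otherwise; the aggregator de-biases the mean.
    import numpy as np

    eps = 1.0
    p = np.exp(eps) / (1.0 + np.exp(eps))         # probability of truthful report

    rng = np.random.default_rng(2)
    true_bits = rng.binomial(1, 0.3, size=10000)
    truthful = rng.random(size=true_bits.size) < p
    reports = np.where(truthful, true_bits, 1 - true_bits)

    estimate = (reports.mean() - (1 - p)) / (2 * p - 1)   # unbiased estimator
    print(estimate)                               # close to the true rate 0.3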
In parallel to Langevin dynamics (LD), Stein variational gradient descent (SVGD) similarly minimizes the KL divergence, albeit endowed with a novel Stein-Wasserstein distance, by deterministically transporting a set of particle samples, thus de-randomizing the stochastic diffusion process.
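For reference, the SVGD update moves a set of particles along a kernelized gradient of the log target; the sketch below applies it to a one-dimensional standard-normal target with an RBF kernel and median-heuristic bandwidth (all settings are illustrative).

    # One SVGD step: attraction along the score plus a kernel repulsion term.
    import numpy as np

    def grad_log_p(x):                             # score of N(0, 1)
        return -x

    def svgd_step(x, step=0.1):
        diff = x[:, None] - x[None, :]             # pairwise differences
        sq = diff ** 2
        h = np.median(sq) / np.log(len(x) + 1.0)   # median-heuristic bandwidth
        k = np.exp(-sq / h)                        # RBF kernel matrix
        grad_k = -2.0 * diff / h * k               # grad_k[i, j] = d k(x_i, x_j) / d x_i
        phi = (k @ grad_log_p(x) + grad_k.sum(axis=0)) / len(x)
        return x + step * phi

    rng = np.random.default_rng(3)
    particles = rng.uniform(-6, 6, size=50)
    for _ in range(1000):
        particles = svgd_step(particles)
    print(particles.mean(), particles.std())       # roughly 0 and 1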
We suggest a method in which we synthetically produce populations of agents with different behavioural patterns, together with ground-truth data of their behaviour, and use these data to train a meta-learner.
On the other hand, offline data about the behavior of the assisted agent might be available, but it is non-trivial to exploit with methods such as offline reinforcement learning.
Recent machine learning advances have proposed black-box estimation of unknown continuous-time system dynamics directly from data.
Active learning is usually applied to acquire labels of informative data points in supervised learning, to maximize accuracy in a sample-efficient way.
In this paper, we present affine transport -- a variant of optimal transport that models the mapping between the state-transition distributions of the source and target domains with an affine transformation.
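As a point of reference for what an affine map between distributions looks like, the optimal transport map between two Gaussian distributions with means $\mu_s, \mu_t$ and covariances $\Sigma_s, \Sigma_t$ has the closed form

$T(x) = \mu_t + A\,(x - \mu_s), \qquad A = \Sigma_s^{-1/2}\big(\Sigma_s^{1/2}\,\Sigma_t\,\Sigma_s^{1/2}\big)^{1/2}\Sigma_s^{-1/2},$

which is affine; the estimator actually used for the state-transition distributions in the paper may differ in its details.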
More specifically, we sequentially integrate five different data sets, which have not all been combined in earlier bioinformatic methods for predicting drug responses.
We present d3p, a software package designed to help field runtime-efficient, widely applicable Bayesian inference under differential privacy guarantees.
Human-in-the-loop machine learning is widely used in artificial intelligence (AI) to elicit labels for data points from experts or to provide feedback on how close the predicted results are to the target.
In model-based reinforcement learning, efficiency is improved by learning to simulate the world dynamics.
Generalized linear models (GLMs) such as logistic regression are among the most widely used tools in the data analyst's repertoire and are often applied to sensitive datasets.
We introduce implicit Bayesian neural networks, a simple and scalable approach for uncertainty representation in deep learning.
In this paper, we build upon representative GNNs and introduce variants that challenge the need for locality-preserving representations, using either randomization or clustering on the complement graph.
In this work, we present a method for differentially private data sharing by training a mixture model on vertically partitioned data, where each party holds different features for the same set of individuals.
In machine learning and computer vision, optimal transport has had significant success in learning generative models and defining metric distances between structured and stochastic data objects that can be cast as probability measures.
In sequential machine teaching, a teacher's objective is to provide the optimal sequence of inputs to sequential learners in order to guide them towards the best model.
In this paper we combine additively homomorphic secure summation protocols with differential privacy in the so-called cross-silo federated learning setting.
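To make the setting concrete, the toy sketch below shows secure summation via additive secret sharing followed by noise on the aggregate; the paper's protocol instead uses additively homomorphic encryption, and the noise scale here is not calibrated to any privacy budget.

    # Each silo splits its value into random shares that cancel in the sum,
    # so the aggregator only ever learns the (noisy) total.
    import numpy as np

    MOD = 2 ** 32
    rng = np.random.default_rng(4)

    def share(value, n_parties):
        shares = rng.integers(0, MOD, size=n_parties - 1)
        return list(shares) + [(value - shares.sum()) % MOD]

    silo_values = [12, 7, 30]                                  # one value per silo
    all_shares = [share(v, len(silo_values)) for v in silo_values]
    partial = [sum(s[i] for s in all_shares) % MOD for i in range(len(silo_values))]
    exact_sum = sum(partial) % MOD                             # equals 12 + 7 + 30
    noisy_sum = exact_sum + rng.normal(scale=5.0)              # illustrative DP noise
    print(exact_sum, noisy_sum)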
In recent years, surrogate models have been successfully used in likelihood-free inference to decrease the number of simulator evaluations.
To verify the hypothesis that humans steer, and are able to improve performance by steering, we designed a function-optimization task in which a human and an optimization algorithm collaborate to find the maximum of a one-dimensional function.
We apply conducive gradients to distributed stochastic gradient Langevin dynamics (DSGLD) and call the resulting method federated stochastic gradient Langevin dynamics (FSGLD).
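For reference, the plain SGLD update that DSGLD distributes across workers is

$\theta_{t+1} = \theta_t + \frac{\epsilon_t}{2}\Big(\nabla \log p(\theta_t) + \frac{N}{n}\sum_{i \in \mathcal{B}_t} \nabla \log p(x_i \mid \theta_t)\Big) + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, \epsilon_t I),$

where $\mathcal{B}_t$ is a minibatch of $n$ out of $N$ data points; FSGLD adds the conducive-gradient term on top of this update, and its exact form is given in the paper.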
We show that the combination of the two methods gives substantial improvements in the scalability of BMF on web-scale datasets, when the goal is to reduce the wall-clock time.
In many high dimensional classification or regression problems set in a biological context, the complete identification of the set of informative features is often as important as predictive accuracy, since this can provide mechanistic insight and conceptual understanding.
Encoding domain knowledge into the prior over the high-dimensional weight space of a neural network is challenging but essential in applications with limited data and weak signals.
To advance likelihood-free inference in high-dimensional parameter spaces, we introduce an extension of the popular Bayesian-optimisation-based approach that approximates discrepancy functions in a probabilistic manner, lending itself to efficient exploration of the parameter space.
This is demonstrated in a user experiment in which the user feedback comes in the form of optimal position and orientation of a molecule adsorbing to a surface.
Differential privacy allows quantifying privacy loss resulting from accessing sensitive personal data.
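For reference, the standard $(\epsilon, \delta)$ formulation of this guarantee states that a randomized mechanism $M$ is differentially private if, for all neighbouring datasets $D, D'$ and all measurable sets of outputs $S$,

$\Pr[M(D) \in S] \le e^{\epsilon}\,\Pr[M(D') \in S] + \delta,$

so that $\epsilon$ (and $\delta$) quantify the worst-case privacy loss incurred by releasing the mechanism's output.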
The strengths of the probabilistic formulation, in addition to providing a bounded rational account of the learning of the heuristic, include natural extensibility with additional cognitively plausible constraints and prior information, and the possibility to embed the heuristic as a subpart of a larger probabilistic model.
Through experiments on real-world data sets, using decision trees as interpretable models and Bayesian additive regression models as reference models, we show that for the same level of interpretability, our approach generates more accurate models than the alternative of restricting the prior.
The identification and removal of contested edges adds no computational complexity to state-of-the-art graph-regularized matrix factorization, remaining linear with respect to the number of non-zeros.
Matrix completion aims to predict missing elements in a partially observed data matrix which in typical applications, such as collaborative filtering, is large and extremely sparsely observed.
We introduce the convolutional spectral kernel (CSK), a novel family of non-stationary, nonparametric covariance kernels for Gaussian process (GP) models, derived from the convolution between two imaginary radial basis functions.
Machine learning can help personalized decision support by learning models to predict individual treatment effects (ITE).
While MCMC methods have become a main workhorse for Bayesian inference, scaling them to large distributed datasets is still a challenge.
Learning predictive models from small high-dimensional data sets is a key problem in high-dimensional statistics.
Differentially private learning with genomic data is challenging because it is more difficult to guarantee privacy in high dimensions.
Estimating global pairwise interaction effects, i.e., the difference between the joint effect and the sum of marginal effects of two input features, with uncertainty properly quantified, is centrally important in science applications.
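Written out, with $f_{\{i,j\}}$ denoting the joint effect of features $i$ and $j$ and $f_{\{i\}}, f_{\{j\}}$ their marginal effects, the quantity of interest is of the form

$I_{ij}(x_i, x_j) = f_{\{i,j\}}(x_i, x_j) - \big(f_{\{i\}}(x_i) + f_{\{j\}}(x_j)\big),$

up to centering and baseline conventions, which the paper makes precise.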
Approximate Bayesian computation (ABC) methods can be used to sample from posterior distributions when the likelihood function is unavailable or intractable, as is often the case in biological systems.
We propose a novel deep learning paradigm of differential flows that learn stochastic differential equation transformations of inputs prior to a standard classification or regression function.
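The sketch below illustrates the overall idea, i.e. pushing inputs through an SDE before a standard predictor; the fixed drift and diffusion functions and the least-squares classifier are placeholders, whereas in the paper these components are learned jointly.

    # Transform inputs with an Euler-Maruyama SDE flow, then fit a simple
    # linear classifier on the transformed features.
    import numpy as np

    def sde_flow(x, drift, diffusion, rng, n_steps=20, dt=0.05):
        for _ in range(n_steps):
            noise = rng.normal(size=x.shape) * np.sqrt(dt)
            x = x + drift(x) * dt + diffusion(x) * noise   # Euler-Maruyama step
        return x

    rng = np.random.default_rng(5)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)

    Z = sde_flow(X, drift=lambda x: -0.5 * x, diffusion=lambda x: 0.1, rng=rng)

    A = np.c_[Z, np.ones(len(Z))]                          # features plus bias
    w = np.linalg.lstsq(A, y, rcond=None)[0]               # least-squares classifier
    print(((A @ w > 0.5) == y).mean())                     # training accuracy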
We formulate this sequential teaching problem, which current techniques in machine teaching do not address, as a Markov decision process, with the dynamics nesting a model of the learner and the actions being the teacher's responses.
Flux analysis methods commonly place unrealistic assumptions on fluxes due to the convenience of formulating the problem as a linear programming model, and most methods ignore the notable uncertainty in flux estimates.
We propose a novel model family of zero-inflated Gaussian processes (ZiGP) for such zero-inflated datasets, produced by sparse kernels through learning a latent probit Gaussian process that can zero out kernel rows and columns whenever the signal is absent.
In human-in-the-loop machine learning, the user provides information beyond that in the training data.
The stand-alone ELFI graph can be used with any of the available inference methods without modifications.
Predicting the efficacy of a drug for a given individual, using high-dimensional genomic measurements, is at the core of precision medicine.
Inverse reinforcement learning (IRL) aims to explain observed strategic behavior by fitting reinforcement learning models to behavioral data.
Many applications of machine learning, for example in health care, would benefit from methods that can guarantee privacy of data subjects.
Bayesian matrix factorization (BMF) is a powerful tool for producing low-rank representations of matrices and for predicting missing values and providing confidence intervals.
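In its simplest form, the model behind BMF is a low-rank factorization with Gaussian likelihood and Gaussian priors, e.g.

$y_{nd} \mid u_n, v_d \sim \mathcal{N}(u_n^\top v_d, \tau^{-1}), \qquad u_n \sim \mathcal{N}(0, \Lambda_u^{-1}), \qquad v_d \sim \mathcal{N}(0, \Lambda_v^{-1}),$

with the posterior over the factors $U$ and $V$ providing both predictions for missing entries and the associated confidence intervals; the exact priors and hyperpriors vary between formulations.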
We introduce a novel kernel that models input-dependent couplings across multiple latent processes.
We propose an inlier-based outlier detection method capable of both identifying the outliers and explaining why they are outliers, by identifying the outlier-specific features.
Prediction in a small-sized sample with a large number of covariates, the "small n, large p" problem, is challenging.
The key idea is to use an interactive multidimensional-scaling (MDS) type scatterplot display of the features to elicit the similarity relationships, and then use the elicited relationships in the prior distribution of prediction parameters.
The main component of our approach is a user model that models the domain expert's knowledge of the relevance of different features for a prediction task.
An important problem for HCI researchers is to estimate the parameter values of a cognitive model from behavioral data.
The popular synthetic likelihood approach infers the parameters by modelling summary statistics of the data by a Gaussian probability distribution.
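Concretely, with $\mu_\theta$ and $\Sigma_\theta$ denoting the sample mean and covariance of summary statistics computed from repeated simulator runs at parameter value $\theta$, the synthetic likelihood of the observed summaries $s_{\mathrm{obs}}$ is

$L_{\mathrm{SL}}(\theta) = \mathcal{N}\big(s_{\mathrm{obs}} \mid \mu_\theta, \Sigma_\theta\big),$

which can then be combined with a prior over $\theta$ for approximate Bayesian inference.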
We introduce a brain-information interface used for recommending information by relevance inferred directly from brain signals.
We demonstrate that pathway-response associations can be learned by the proposed model for the well known EGFR and MEK inhibitors.
Good personalised predictions are vitally important in precision medicine, but genomic information on which the predictions are based is also particularly sensitive, as it directly identifies the patients and hence cannot easily be anonymised.
We consider regression under the "extremely small $n$ large $p$" condition, where the number of samples $n$ is so small compared to the dimensionality $p$ that predictors cannot be estimated without prior knowledge.
Hierarchical models are versatile tools for joint modeling of data sets arising from different, but related, sources.
We introduce the localized Lasso, which is suited for learning models that are both interpretable and have a high predictive power in problems with high dimensionality $d$ and small sample size $n$.
In this paper, we introduce the first method that (1) can complete kernel matrices with completely missing rows and columns as opposed to individual missing kernel values, (2) does not require any of the kernels to be complete a priori, and (3) can tackle non-linear kernels.
Motivation: Modelling methods that find structure in data are necessary with the current large volumes of genomic data, and there have been various efforts to find subsets of genes exhibiting consistent patterns over subsets of treatments.
The factorization is a generative model in which the display is parameterized as a part of the factorization and the other factors explain away the aspects not expressible in a two-dimensional display.
We propose a novel classification model for weak signal data, building upon a recent model for Bayesian multi-view learning, Group Factor Analysis (GFA).
We show that for dependency structures between groups to be expressible exactly, the data have to satisfy the so-called groupwise faithfulness assumption.
We present a novel approach for fully non-stationary Gaussian process regression (GPR), where all three key parameters -- noise variance, signal variance and lengthscale -- can be simultaneously input-dependent.
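One standard building block for such models (not necessarily the exact parameterization used in this work) is the Gibbs-type kernel with input-dependent signal variance $\sigma(x)$ and lengthscale $\ell(x)$,

$k(x, x') = \sigma(x)\,\sigma(x')\,\sqrt{\frac{2\,\ell(x)\,\ell(x')}{\ell(x)^2 + \ell(x')^2}}\;\exp\!\left(-\frac{(x - x')^2}{\ell(x)^2 + \ell(x')^2}\right),$

with the noise variance entering as a third input-dependent function in the likelihood.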
We propose the convex factorization machine (CFM), which is a convex variant of the widely used Factorization Machines (FMs).
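For reference, a second-order factorization machine predicts

$\hat{y}(x) = w_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d}\sum_{j=i+1}^{d} \langle v_i, v_j \rangle\, x_i x_j,$

where the pairwise interaction weights are factorized through low-rank vectors $v_i$; this factorization is what makes the standard FM objective non-convex, and the convex variant replaces it with a suitably regularized full interaction matrix (see the paper for the exact formulation).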
For retrieval of gene expression experiments, we use a probabilistic model called the product partition model, which induces a clustering of genes that show similar expression patterns across a number of samples.
This approach faces at least two major difficulties: the first is the choice of the discrepancy measure used to judge whether the simulated data resemble the observed data.
We introduce Bayesian multi-tensor factorization, a model that is the first Bayesian formulation for joint factorization of multiple matrices and tensors.
In high-dimensional data, structured noise, caused by observed and unobserved factors affecting multiple target variables simultaneously, imposes a serious challenge for modeling by masking the often weak signal.
It then retrieves images with a specialized online learning algorithm that balances the tradeoff between exploring new images and exploiting the already inferred interests of the user.
Increasingly complex generative models are being used across disciplines as they allow for realistic characterization of data, but a common difficulty with them is the prohibitively large computational cost of evaluating the likelihood function and thus of performing likelihood-based statistical inference.
A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative.
To incorporate this information in the retrieval task, we suggest employing a retrieval metric that utilizes probabilistic models learned from the measurements.
Results: In this paper, we present the first comprehensive multi-set analysis of how the chemical structure of drugs impacts genome-wide gene expression across several cancer cell lines (CMap database).
To facilitate the prediction of the weak effects, we constrain our model structure by introducing a novel Bayesian approach of sharing information between the regression model and the noise model.
We address the problem of retrieving relevant experiments given a query experiment, motivated by the public databases of datasets in molecular biology and other experimental sciences, and the need of scientists to relate to earlier work on the level of actual measurement data.
Our algorithm obtains the lowest Hamming loss values on 10 out of 14 multilabel classification data sets compared to five state-of-the-art multilabel learning algorithms.
Probe-level models have led to improved performance in microarray studies but the various sources of probe-level contamination are still poorly understood.