We then discuss the mutual information (MI) and pointwise mutual information (PMI), which depend on the ratio $P(A, B)/(P(A)P(B))$, as measures of association.
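As an illustration of how both quantities are built from the ratio of the joint probability to the product of the marginals, the following is a minimal sketch for discrete distributions. The function names `pmi` and `mutual_information` and the nested-list representation of the joint distribution are assumptions made for this example, not taken from the source; natural logarithms are used, so values are in nats.

```python
import math


def pmi(p_ab, p_a, p_b):
    """Pointwise mutual information log[P(A,B) / (P(A) P(B))], in nats."""
    return math.log(p_ab / (p_a * p_b))


def mutual_information(joint):
    """Mutual information of a discrete joint distribution.

    `joint[i][j]` is assumed to hold P(A = i, B = j); entries must sum to 1.
    MI is the expectation of the PMI under the joint distribution.
    """
    p_rows = [sum(row) for row in joint]          # marginal P(A = i)
    p_cols = [sum(col) for col in zip(*joint)]    # marginal P(B = j)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * pmi(p, p_rows[i], p_cols[j])
    return mi
```

For an independent joint distribution the ratio is 1 everywhere, so both the PMI and the MI are zero; for two perfectly coupled binary variables the MI equals $\log 2$.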
Our objective is to locate and provide a unique identifier for each mouse in a cluttered home-cage environment through time, as a precursor to automated behaviour recognition for biological research.
We consider the problem of identifying the units of measurement in a data column that contains both numeric values and unit symbols in each row, e.g., "5.2 l", "7 pints".
Existing methods for SFDA leverage entropy-minimization techniques which: (i) apply only to classification; (ii) destroy model calibration; and (iii) rely on the source model achieving a good level of feature-space class-separation in the target domain.
Given the complexity of typical data science projects and the associated demand for human expertise, automation has the potential to transform the data science process.
In this note I study how the precision of a classifier depends on the ratio $r$ of positive to negative cases in the test set, as well as the classifier's true and false positive rates.
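The dependence can be made concrete with the standard identity: with $P$ positives and $N$ negatives, the expected true positives are $\mathrm{TPR} \cdot P$ and the expected false positives are $\mathrm{FPR} \cdot N$, so precision equals $\mathrm{TPR} \cdot r / (\mathrm{TPR} \cdot r + \mathrm{FPR})$ with $r = P/N$. The sketch below evaluates this expression; the function name `precision` and its argument names are illustrative assumptions, not notation from the note.

```python
def precision(tpr, fpr, r):
    """Expected precision of a classifier on a test set.

    tpr: true positive rate, fpr: false positive rate,
    r: ratio of positive to negative cases in the test set.
    """
    denom = tpr * r + fpr
    if denom == 0:
        raise ValueError("precision is undefined when nothing is predicted positive")
    return tpr * r / denom


# The same classifier looks very different as positives become rarer.
balanced = precision(tpr=0.9, fpr=0.1, r=1.0)
rare = precision(tpr=0.9, fpr=0.1, r=0.01)
```

With balanced classes this classifier attains a precision of 0.9, whereas at one positive per hundred negatives most positive predictions are false alarms.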
Real-world datasets often contain entries with missing elements; e.g., in a medical dataset, a patient is unlikely to have taken all possible diagnostic tests.
Next, we present a benchmark study where 14 algorithms are evaluated on each of the time series in the data set.
Dynamical system models (including RNNs) often lack the ability to adapt the sequence generation or prediction to a given context, limiting their real-world application.
We show experimentally that RVAE not only performs better than several state-of-the-art methods in cell outlier detection and repair for tabular data, but is also robust to the initial hyper-parameter selection.
While label fusion from multiple noisy annotations is a well understood concept in data wrangling (tackled for example by the Dawid-Skene (DS) model), we consider the extended problem of carrying out learning when the labels themselves are not consistently annotated with the same schema.
Time series models such as dynamical systems are frequently fitted to a cohort of data, ignoring variation between individual entities such as patients.
We develop a Learning Direct Optimization (LiDO) method for the refinement of a latent variable model that describes an input image $x$.
In addition, we can use these inversion models to estimate the mutual information between a model's inputs and its intermediate representations, thus quantifying the amount of information preserved by the network at different stages.
We show how to calculate exactly the latent posterior distribution for the factor analysis (FA) model in the presence of missing data, and note that this solution implies that a different encoder network is required for each pattern of missingness.
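A minimal sketch of the exact posterior follows, assuming the standard FA parameterisation $x = Wz + \mu + \epsilon$ with $z \sim N(0, I)$ and $\epsilon \sim N(0, \mathrm{diag}(\psi))$. The function name `fa_posterior`, the argument names, and the use of NaN to mark missing entries are assumptions made for this example. Because the noise is diagonal, missing dimensions are simply dropped from the likelihood, and the linear map from the observed values to the posterior mean changes with the set of observed dimensions.

```python
import numpy as np


def fa_posterior(x, W, mu, psi):
    """Exact Gaussian posterior N(m, S) over the latent z given a partly observed x.

    x:   (D,) data vector, with NaN marking missing entries (a convention assumed here)
    W:   (D, K) factor loading matrix
    mu:  (D,) mean vector
    psi: (D,) diagonal noise variances
    """
    obs = ~np.isnan(x)            # pattern of observed dimensions
    W_o = W[obs]                  # loadings restricted to observed rows
    psi_o = psi[obs]
    # Posterior precision: I + W_o^T Psi_o^{-1} W_o
    precision = np.eye(W.shape[1]) + W_o.T @ (W_o / psi_o[:, None])
    S = np.linalg.inv(precision)
    # Posterior mean: S W_o^T Psi_o^{-1} (x_o - mu_o)
    m = S @ (W_o.T @ ((x[obs] - mu[obs]) / psi_o))
    return m, S


# Toy one-factor model with two observed dimensions.
W = np.array([[1.0], [2.0]])
mu = np.zeros(2)
psi = np.ones(2)
m_full, S_full = fa_posterior(np.array([1.0, 2.0]), W, mu, psi)
m_part, S_part = fa_posterior(np.array([1.0, np.nan]), W, mu, psi)
```

In the toy model, dropping the second dimension changes both the posterior mean and the posterior variance, which illustrates why a single fixed encoder cannot be exact for every missingness pattern.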
Model criticism is usually carried out by assessing whether replicated data generated under the fitted model look similar to the observed data; see, e.g., Gelman, Carlin, Stern, and Rubin [2004, p. 165].
Bedside monitors in Intensive Care Units (ICUs) frequently sound incorrectly, slowing response times and desensitising nurses to alarms (Chambrin, 2001), causing true alarms to be missed (Hug et al., 2011).
We present a non-linear dynamical system for modelling the effect of drug infusions on the vital signs of patients admitted to Intensive Care Units (ICUs).
This paper presents a new probabilistic generative model for image segmentation, i.e., the task of partitioning an image into homogeneous regions.
We present a Discriminative Switching Linear Dynamical System (DSLDS) applied to patient monitoring in Intensive Care Units (ICUs).
Large astronomical databases obtained from sky surveys such as the SuperCOSMOS Sky Surveys (SSS) invariably suffer from a small number of spurious records arising from artefactual effects of the telescope, from satellites and junk objects in orbit around Earth, and from physical defects on the photographic plate or CCD.