Stochastic neural networks with discrete random variables are an important class of models for their expressiveness and interpretability.
We present WeakSTIL, an interpretable two-stage weak label deep learning pipeline for scoring the percentage of stromal tumor infiltrating lymphocytes (sTIL%) in H&E-stained whole-slide images (WSIs) of breast cancer tissue.
Learning the structure of a causal graphical model using both observational and interventional data is a fundamental problem in many scientific fields.
We propose a deep learning-based weak-label method for analyzing whole-slide images (WSIs) of hematoxylin and eosin (H&E)-stained tumor cells that requires no pixel-level or tile-level annotations, using Self-supervised pre-training and heterogeneity-aware deep Multiple Instance LEarning (DeepSMILE).
Federated learning (FL) has emerged as the predominant approach for collaborative training of neural network models across multiple users, without the need to gather the data at a central location.
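The aggregation step behind FL can be sketched on a toy linear model. Everything here (the `fedavg_round` helper, local SGD on squared error, size-weighted averaging) is an illustrative assumption in the spirit of FedAvg, not the protocol of any specific paper:

```python
import numpy as np

def fedavg_round(global_w, client_data, lr=0.1, local_steps=5):
    """One round of federated averaging on a toy linear model.

    Hypothetical setup: each client holds (X, y) pairs and runs a few
    local SGD steps on squared error; the server then averages the
    resulting weights, weighted by client dataset size.
    """
    new_weights, sizes = [], []
    for X, y in client_data:
        w = global_w.copy()
        for _ in range(local_steps):
            grad = 2 * X.T @ (X @ w - y) / len(y)  # MSE gradient
            w -= lr * grad
        new_weights.append(w)
        sizes.append(len(y))
    # The server aggregates model weights without ever seeing raw client data.
    return np.average(new_weights, axis=0, weights=np.asarray(sizes, float))
```

Note the server only ever receives weight vectors, which is the sense in which the data never leaves the clients; the sketch omits refinements such as secure aggregation or client sampling.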
In this work we propose a batch Bayesian optimization method for combinatorial problems on permutations, which is well suited for expensive cost functions on permutations.
In experiments, we demonstrate the improved sample efficiency of GP BO using FM kernels (BO-FM). On synthetic problems and hyperparameter optimization problems, BO-FM outperforms competitors consistently.
Variational autoencoders with deep hierarchies of stochastic layers have been known to suffer from the problem of posterior collapse, where the top layers fall back to the prior and become independent of input.
We further show that this change in orientation can be used to impose an additional motion constraint in Siamese tracking, by restricting the change in orientation between two consecutive frames.
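One way such a constraint could look is as a scoring term that down-weights candidate positions whose motion direction deviates from the previous inter-frame direction. The `orientation_penalty` helper and its Gaussian weighting are hypothetical assumptions, not the paper's exact formulation:

```python
import numpy as np

def orientation_penalty(prev_pos, curr_pos, cand_pos, sigma=0.5):
    """Weight in (0, 1]: 1 when the candidate continues the previous
    motion direction, near 0 when the implied orientation reverses.
    (Hypothetical constraint form; details may differ from the paper.)"""
    v_prev = np.asarray(curr_pos, float) - np.asarray(prev_pos, float)
    v_cand = np.asarray(cand_pos, float) - np.asarray(curr_pos, float)
    ang_prev = np.arctan2(v_prev[1], v_prev[0])
    ang_cand = np.arctan2(v_cand[..., 1], v_cand[..., 0])
    # Wrap the angle difference into [-pi, pi] via the complex exponential.
    dtheta = np.angle(np.exp(1j * (ang_cand - ang_prev)))
    return np.exp(-dtheta**2 / (2 * sigma**2))
```

Multiplying the Siamese similarity score by such a weight would suppress candidates that imply an implausibly abrupt change of direction.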
Classically, visual object tracking involves following a target object throughout a given video, and it provides us with the motion trajectory of the object.
Our experiments show that SSC leads to a substantial increase in interaction recognition performance while using far fewer parameters.
In this paper, we define data augmentation between point clouds as a shortest path linear interpolation.
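Under an optimal one-to-one matching between the two clouds, the straight line between matched points is a shortest path. A minimal sketch of this interpolation (the `interpolate_clouds` helper and the use of Hungarian assignment for the matching are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def interpolate_clouds(P, Q, t):
    """Shortest-path linear interpolation between point clouds P and Q.

    P, Q: (N, 3) arrays; t in [0, 1]. Points are matched one-to-one by
    minimizing total squared distance (an EMD-style assignment), then
    matched pairs are interpolated linearly.
    """
    cost = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)  # (N, N) pairwise
    rows, cols = linear_sum_assignment(cost)               # optimal matching
    return (1 - t) * P[rows] + t * Q[cols]
```

Sampling `t` uniformly in (0, 1) then yields augmented clouds that lie between the two training examples.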
Specifically, we present structured dropout to mimic the change in latent codes under occlusion.
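A minimal sketch of what such structured dropout could look like on a 1-D latent code: instead of masking independent coordinates, zero out one contiguous span, mimicking how an occluder removes a spatially coherent region. The `structured_dropout` helper and its span-sampling scheme are assumptions; the paper's exact formulation may differ.

```python
import numpy as np

def structured_dropout(z, max_len=None, rng=None):
    """Zero out one random contiguous span of the latent code z.

    Unlike standard dropout (independent coordinates), the mask is a
    single block, analogous to a coherent occluded region."""
    rng = rng if rng is not None else np.random.default_rng()
    z = z.copy()
    d = z.shape[-1]
    max_len = max_len or d // 2
    length = int(rng.integers(1, max_len + 1))      # span length
    start = int(rng.integers(0, d - length + 1))    # span start
    z[..., start:start + length] = 0.0
    return z
```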
We study the three properties of PIC and demonstrate its effectiveness in recognizing the long-range activities of Charades, Breakfast, and MultiThumos.
The Critic network is environment-aware, pruning trajectories that collide with or otherwise violate the environment's constraints.
Learning suitable latent representations for observed, high-dimensional data is an important research topic underlying many recent advances in machine learning.
We propose to model the effective receptive field of 2D convolution based on the scale and locality from the 3D neighborhood.
Adversarial training has been recently employed for realizing structured semantic segmentation, in which the aim is to preserve higher-level scene structural consistencies in dense predictions.
In response to this, Scellier & Bengio (2017) proposed Equilibrium Propagation, a method for gradient-based training of neural networks which uses only local learning rules and, crucially, does not rely on neurons having a mechanism for back-propagating an error gradient.
We observe that many continuous output problems in computer vision naturally lie on closed geometrical manifolds, such as the Euler angles in viewpoint estimation or the normals in surface normal estimation.
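A common way to exploit such manifold structure, shown here as a minimal sketch for outputs on the unit sphere (e.g. surface normals): project the raw network output onto the manifold and penalize the angular deviation from the target. The `cosine_loss` name and formulation are illustrative, not the paper's exact loss.

```python
import numpy as np

def cosine_loss(pred, target):
    """Regression on the unit sphere: normalize the raw prediction onto
    the manifold and use 1 - cos(angle) to the target direction.
    Returns 0 when aligned, 2 when diametrically opposed."""
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    target = target / np.linalg.norm(target, axis=-1, keepdims=True)
    return 1.0 - (pred * target).sum(-1)
```

Normalizing before the loss keeps every prediction on the sphere by construction, so the network never wastes capacity on the radial direction.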
On this combinatorial graph, we propose an ARD diffusion kernel with which the GP is able to model high-order interactions between variables, leading to better performance.
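For intuition, consider the special case of the Boolean hypercube: each binary variable is a two-vertex complete graph, the combinatorial graph is their Cartesian product, and the diffusion kernel factorizes into per-variable terms with closed form (1 ± e^(-2β))/2 for agreement/disagreement. Giving each variable its own β is the ARD part. A sketch under these assumptions (the helper name is illustrative):

```python
import numpy as np

def ard_diffusion_kernel(X1, X2, betas):
    """ARD diffusion kernel on the Boolean hypercube.

    X1: (n, d) and X2: (m, d) binary arrays; betas: one diffusion
    lengthscale per variable (the ARD part). The kernel is the product
    over variables of (1 + e^{-2b})/2 if the values agree, else
    (1 - e^{-2b})/2, which is exp(-b L) on a two-vertex complete graph.
    """
    agree = X1[:, None, :] == X2[None, :, :]       # (n, m, d)
    decay = np.exp(-2.0 * np.asarray(betas))       # (d,)
    factors = np.where(agree, (1 + decay) / 2, (1 - decay) / 2)
    return factors.prod(axis=-1)                   # (n, m)
```

A small β makes disagreement on that variable nearly annihilate the kernel (highly relevant variable), while a large β makes the factor approach 1/2 regardless (irrelevant variable), which is how ARD lets the GP weight variables.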
This paper focuses on the temporal aspect of recognizing human activities in videos, an important visual cue that has long been undervalued.
Neural network quantization has become an important research area due to its great impact on the deployment of large models on resource-constrained devices.
To demonstrate the effectiveness of our proposed framework, we modify associative domain adaptation to work well on source and target data batches with unequal class distributions.
A major challenge in Bayesian Optimization is the boundary issue (Swersky, 2017) where an algorithm spends too many evaluations near the boundary of its search space.
We introduce the OxUvA dataset and benchmark for evaluating single-object tracking algorithms.
In this work we propose a blackbox intervention method for visual dialog models, with the aim of assessing the contribution of individual linguistic or visual components.
This paper strives to track a target object in a video.
We present a variant on backpropagation for neural networks in which computation scales with the rate of change of the data - not the rate at which we process the data.
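The forward-pass side of this idea can be sketched with a layer that updates its output from input *changes*, y_t = y_{t-1} + W(x_t - x_{t-1}), so that slowly varying inputs cost little work. The `DeltaLayer` class is an illustrative assumption covering only the forward pass; extending the same change-driven scheme to the backward pass is the paper's contribution.

```python
import numpy as np

class DeltaLayer:
    """Linear layer driven by input deltas rather than raw inputs.

    For a static input stream the delta is zero and the matrix-vector
    product is skipped entirely, so computation scales with the rate of
    change of the data, not the rate at which frames arrive."""
    def __init__(self, W):
        self.W = W
        self.x_prev = None
        self.y = None

    def forward(self, x):
        if self.x_prev is None:
            self.y = self.W @ x                  # first frame: full compute
        else:
            delta = x - self.x_prev
            if np.any(delta):                    # skip work when unchanged
                self.y = self.y + self.W @ delta
        self.x_prev = x.copy()
        return self.y
```

By construction the output stays numerically equal to `W @ x` at every step; only the amount of arithmetic changes.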
This is a powerful idea because it allows us to convert any video into an image, so that existing CNN models pre-trained for the analysis of still images can be immediately extended to videos.
On action classification, our method obtains 60.3% on the UCF101 dataset using only UCF101 data for training, which is approximately 10% better than current state-of-the-art self-supervised learning methods.
We present a new architecture for end-to-end sequence learning of actions in video, which we call VideoLSTM.
We introduce the concept of dynamic image, a novel compact representation of videos useful for video analysis especially when convolutional neural networks (CNNs) are used.
In this paper we present a tracker which is radically different from state-of-the-art trackers: we apply no model updating, no occlusion detection, no combination of trackers, and no geometric matching, yet still deliver state-of-the-art tracking performance, as demonstrated on the popular online tracking benchmark (OTB) and six very challenging YouTube videos.
Third, the start of the action is unknown, so it is unclear over what time window the information should be integrated.
We show how the parameters of a function that has been fit to the video data can serve as a robust new video representation.
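The idea can be sketched in the spirit of rank pooling: fit a function whose scores increase with the frame index and keep its parameters as the video descriptor. The `rank_pool` helper below uses plain least squares against the frame index as an illustrative stand-in; the original formulation uses a ranking objective such as RankSVM.

```python
import numpy as np

def rank_pool(frames):
    """Fit a linear scoring function w.f + b whose scores track the
    temporal order of the frames, and return w as the video descriptor.

    frames: (T, d) array of per-frame features. The learned parameters
    summarize how appearance evolves over time."""
    T = frames.shape[0]
    t = np.arange(T, dtype=float)                    # target: frame index
    X = np.hstack([frames, np.ones((T, 1))])         # append bias column
    w, *_ = np.linalg.lstsq(X, t, rcond=None)        # least-squares fit
    return w[:-1]                                    # drop bias, keep w
```

The descriptor points in the feature-space direction along which the video drifts, so two videos with similar appearance evolution get similar descriptors regardless of their absolute appearance.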
We present a supervised learning to rank algorithm that effectively orders images by exploiting the structure in image sequences.
Undoing the image formation process, and therefore decomposing appearance into its intrinsic properties, is a challenging task due to the under-constrained nature of this inverse problem.
How can we reuse existing knowledge, in the form of available datasets, when solving a new and apparently unrelated target task from a set of unlabeled data?
We postulate that a function capable of ordering the frames of a video temporally (based on the appearance) captures well the evolution of the appearance within the video.
In this paper we aim for zero-shot classification, that is, visual recognition of an unseen class by using knowledge transfer from known classes.