Markerless estimation of 3D kinematics has great potential for clinically diagnosing and monitoring movement disorders without referrals to expensive motion capture labs; however, current approaches are limited because they estimate a person's kinematics from videos in multiple decoupled steps.
For unsupervised semantic segmentation of urban scenes, our method surpasses the previous state-of-the-art baseline by +7.14% in mIoU on Cityscapes and +6.65% on KITTI.
Binary Neural Networks (BNNs) are receiving an upsurge of attention for bringing power-hungry deep learning towards edge devices.
However, the black-box nature of deep learning models prevents urban planners from understanding which landscape objects contribute to a particularly high-quality or low-quality perception of urban space.
Existing work can robustly measure heart rate under some degree of motion by face tracking.
Our work is the first to evaluate IoU with humans and makes it clear that relying on IoU scores alone to evaluate localization errors might not be sufficient.
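For reference, the IoU score discussed above can be computed for two axis-aligned boxes as in the minimal sketch below. The `(x1, y1, x2, y2)` box format and the function name `box_iou` are assumptions made for illustration, not taken from the paper. The example boxes show one way a single score can hide differences: two differently shifted predictions receive the same IoU.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical ground truth and two predictions with different errors.
ground_truth = (0, 0, 10, 10)
shifted_right = (2, 0, 12, 10)
shifted_down = (0, 2, 10, 12)

iou_right = box_iou(ground_truth, shifted_right)
iou_down = box_iou(ground_truth, shifted_down)
print(iou_right, iou_down)  # both predictions get the same score
```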
Mean squared error (MSE) is one of the most widely used metrics to express differences between multi-dimensional entities, including images.
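As a small illustration of the metric, MSE between two equally sized grayscale images can be sketched as follows. Representing an image as a list of pixel rows and the name `mse` are assumptions made for this example only.

```python
def mse(a, b):
    """Mean squared error between two equally sized 2D images (lists of rows)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    n = sum(len(row) for row in a)
    if n == 0:
        raise ValueError("images must not be empty")
    # Sum of squared per-pixel differences, averaged over all pixels.
    total = sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / n


# Hypothetical 2x2 images: a uniform change and a single-pixel change.
reference = [[0, 0], [0, 0]]
uniform_change = [[1, 1], [1, 1]]
single_pixel_change = [[2, 0], [0, 0]]

print(mse(reference, uniform_change))       # 1.0
print(mse(reference, single_pixel_change))  # 1.0
```

In this toy case, two visibly different changes yield the same MSE value.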
Our results credit the best accuracy to the ResNet-101 model pre-trained on the Landmarks dataset for both verification and retrieval tasks, at 84% and 24%, respectively.
Ranked #2 on Image Classification on AmsterTime (using extra training data)
The second edition of the "VIPriors: Visual Inductive Priors for Data-Efficient Deep Learning" challenges featured five data-impaired challenges, where models are trained from scratch on a reduced number of training samples for various key computer vision tasks.
Experiments on both synthetic and real-world datasets show the benefit of our proposed changes for improved data efficiency and inference speed.
While domain adaptation is generally applied on completely synthetic source domains and real target domains, we explore how domain adaptation can be applied when only a single rare class is augmented with simulated samples.
Additionally, due to crowdedness and occlusion in the videos, aligning the identity of runners across multiple disjoint cameras is a challenge.
We present the first edition of "VIPriors: Visual Inductive Priors for Data-Efficient Deep Learning" challenges.
We make the observation that pruning weights adds the value 0 as an additional symbol and thus increases the information capacity of the network.
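The observation above can be illustrated with a short sketch that compares the per-weight Shannon entropy of a binary weight vector with that of a pruned one. Using empirical entropy as the measure of information capacity is an assumption of this sketch, and the weight values are hypothetical.

```python
import math
from collections import Counter


def symbol_entropy(weights):
    """Shannon entropy, in bits per weight, of the empirical symbol distribution."""
    counts = Counter(weights)
    n = len(weights)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical weights: binary {-1, +1} versus pruned {-1, 0, +1}.
binary_weights = [-1, 1, -1, 1, 1, -1]
pruned_weights = [-1, 1, 0, 0, 1, -1]

print(symbol_entropy(binary_weights))  # at most 1 bit with two symbols
print(symbol_entropy(pruned_weights))  # up to log2(3) bits with three symbols
```

With two equally likely symbols the entropy is 1 bit per weight; adding 0 as a third equally likely symbol raises the upper bound to log2(3), roughly 1.585 bits.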
We introduce a refined and efficient real-time rPPG pipeline with novel filtering and motion suppression that not only estimates heart rates, but also extracts the pulse waveform to time heart beats and measure heart rate variability.
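Once heart beats have been timed from the pulse waveform, heart rate and a heart rate variability measure follow from the inter-beat intervals. The sketch below assumes beat timestamps in seconds are already available and uses RMSSD, a common HRV statistic, as the example measure; the source does not state which HRV measure the pipeline reports, and the function names are hypothetical.

```python
import math


def inter_beat_intervals(beat_times):
    """Differences between consecutive beat timestamps, in seconds."""
    return [t2 - t1 for t1, t2 in zip(beat_times, beat_times[1:])]


def heart_rate_bpm(beat_times):
    """Mean heart rate in beats per minute from beat timestamps."""
    ibis = inter_beat_intervals(beat_times)
    if not ibis:
        raise ValueError("need at least two beats")
    return 60.0 / (sum(ibis) / len(ibis))


def rmssd_ms(beat_times):
    """Root mean square of successive inter-beat interval differences, in ms."""
    ibis_ms = [1000.0 * ibi for ibi in inter_beat_intervals(beat_times)]
    diffs = [b - a for a, b in zip(ibis_ms, ibis_ms[1:])]
    if not diffs:
        raise ValueError("need at least three beats")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical beat timestamps in seconds.
beats = [0.0, 0.8, 1.8, 2.6]
print(heart_rate_bpm(beats))
print(rmssd_ms(beats))
```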
This layer is shown to minimize a penalized term of the Wasserstein distance between the learned continuous image features and the optimal half-half bit distribution.
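To make the half-half bit distribution concrete, the sketch below binarizes one feature dimension across a batch so that exactly half of the samples receive -1 and half receive +1. In one dimension, sorting and splitting at the median is the transport-optimal assignment to such a balanced two-point distribution. This is an illustrative sketch under those assumptions, not the layer from the paper; the function names and the absolute-difference cost are choices made for this example.

```python
def half_half_binarize(values):
    """Assign -1 to the lower half and +1 to the upper half of `values`."""
    n = len(values)
    if n % 2 != 0:
        raise ValueError("batch size must be even for an exact half-half split")
    # Rank the samples, then give the lowest n/2 the bit -1 and the rest +1.
    order = sorted(range(n), key=lambda i: values[i])
    bits = [0] * n
    for rank, i in enumerate(order):
        bits[i] = -1 if rank < n // 2 else 1
    return bits


def transport_cost(values, bits):
    """Mean absolute distance moved when mapping each value to its bit."""
    return sum(abs(v - b) for v, b in zip(values, bits)) / len(values)


# Hypothetical continuous features for one bit across a batch of four images.
features = [0.2, 0.9, 0.4, 0.7]
bits = half_half_binarize(features)
print(bits, transport_cost(features, bits))
```

Note that plain sign binarization would map all four example features to +1, whereas the half-half split keeps the bit balanced.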
1 code implementation • 1 Dec 2020 • Burak Yildiz, Hayley Hung, Jesse H. Krijthe, Cynthia C. S. Liem, Marco Loog, Gosia Migut, Frans Oliehoek, Annibale Panichella, Przemyslaw Pawelczak, Stjepan Picek, Mathijs de Weerdt, Jan van Gemert
We present ReproducedPapers.org: an open online repository for teaching and structuring machine learning reproducibility.
Current weakly supervised object localization and segmentation rely on class-discriminative visualization techniques to generate pseudo-labels for pixel-level training.
In this work, we define DIRs employed by existing works in probabilistic terms and show that by learning DIRs, overly strict requirements are imposed concerning the invariance.
Such methods are less stable than BN as they critically depend on the statistics of a single input sample.
Our study investigates the subjective human factor in comparisons of state-of-the-art results and scientific reproducibility in deep learning.
In this paper, we run two methods of explanation, namely LIME and Grad-CAM, on a convolutional neural network trained to label images with the LEGO bricks that are visible in them.
Our results confirm the problems of the previous evaluation protocols, and suggest that an IA-based protocol is better suited to the online scenario.
The problem of Online Human Behaviour Recognition in untrimmed videos, aka Online Action Detection (OAD), needs to be revisited.
Different from conventional VPR settings where the query images and gallery images come from the same domain, we propose a more common but challenging setup where the query images are collected under a new unseen condition.
To this end, we identify two methods for runner identification at different points of the event, for determining their trajectory.
Remote photo-plethysmography (rPPG) uses a remotely placed camera to estimate a person's heart rate (HR).
(ii) A pretrained semantic segmentation model is used to label objects at the pixel level, and we then introduce statistical measures to quantitatively evaluate the interpretability of discriminative objects.
We propose ViDeNN: a CNN for Video Denoising without prior knowledge on the noise distribution (blind denoising).
Ranked #2 on Color Image Denoising on CBSD68 sigma5
To facilitate this, we propose a novel global pooling technique called Spatial Pyramid Averaged Max (SPAM) pooling for training this CAM-based network for object extent localisation with only weak image-level supervision.
There is an inherent need for autonomous cars, drones, and other robots to have a notion of how their environment behaves and to anticipate changes in the near future.
First, inspired by selective search for object proposals, we introduce an approach to generate action proposals from spatiotemporal super-voxels in an unsupervised manner; we call them Tubelets.
Our approach significantly outperforms the state-of-the-art on both datasets, while restricting the search of actions to a fraction of possible bounding box sequences.