In this paper, we propose a self-supervised approach for tumor segmentation.
The objective of this work is to segment arbitrary structures of interest (SOI) in 3D volumes by annotating only a single slice (i.e., semi-automatic 3D segmentation).
We additionally evaluate on a challenging camouflage dataset (MoCA), significantly outperforming other self-supervised approaches and comparing favourably to the top supervised approach; this highlights the importance of motion cues and a potential bias towards visual appearance in existing video segmentation models.
A central challenge for the task of semantic segmentation is the prohibitive cost of obtaining dense pixel-level annotations to supervise model training.
We show that our algorithm achieves state-of-the-art performance on the popular Flickr SoundNet dataset.
The popularisation of neural networks has seen incredible advances in pattern recognition, driven by supervised learning from human annotations.
This paper tackles the problem of novel view synthesis (NVS) from 2D images without known camera poses and intrinsics.
We held the second installment of the VoxCeleb Speaker Recognition Challenge in conjunction with Interspeech 2020.
We make the following three contributions: (i) We propose a novel architecture that consists of two essential components for breaking camouflage, namely, a differentiable registration module that aligns consecutive frames based on the background, which effectively emphasises the object boundary in the difference image, and a motion segmentation module with memory that discovers the moving objects while maintaining object permanence even when motion is absent at some point.
The objective of this paper is visual-only self-supervised video representation learning.
We present a method for retiming people in an ordinary, natural video: manipulating and editing the time in which different motions of individuals in the video occur.
We describe three use cases on the public IJB-C face verification benchmark: (i) to improve 1:1 image-based verification error rates by rejecting low-quality face images; (ii) to improve quality-score-based fusion performance on the 1:1 set-based verification benchmark; and (iii) its use as a quality measure for selecting high-quality (unblurred, well-lit, more frontal) faces from a collection, e.g., for automatic enrolment or display.
The objective of this paper is self-supervised learning from video, in particular for representations for action recognition.
Optimising a ranking-based metric, such as Average Precision (AP), is notoriously challenging due to the fact that it is non-differentiable, and hence cannot be optimised directly using gradient-descent methods.
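The source of the difficulty can be seen in how AP is computed: it depends on the scores only through the induced ranking, which is piecewise constant in the scores, so gradients are zero almost everywhere. A minimal NumPy sketch (illustrative only, not the paper's method):

```python
import numpy as np

def average_precision(scores, labels):
    """AP = mean of precision@k taken at the rank of each positive.

    The argsort below is the non-differentiable step: infinitesimal
    changes to `scores` leave the ranking (and hence AP) unchanged,
    so the gradient of AP w.r.t. the scores is zero almost everywhere.
    """
    order = np.argsort(-scores)            # rank items by descending score
    ranked = labels[order]
    cum_pos = np.cumsum(ranked)            # positives retrieved up to rank k
    ranks = np.arange(1, len(ranked) + 1)
    precision_at_k = cum_pos / ranks
    return precision_at_k[ranked == 1].mean()

scores = np.array([0.9, 0.8, 0.4, 0.3])
labels = np.array([1, 0, 1, 0])
print(average_precision(scores, labels))   # positives at ranks 1 and 3
```

Surrogate losses therefore replace the hard ranking with a smooth approximation (e.g., relaxing the rank indicator with a sigmoid), so that useful gradients flow back to the scores.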
The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking).
Our goal is to collect a large-scale audio-visual dataset with low label noise from videos in the wild using computer vision techniques.
Recent interest in self-supervised dense tracking has yielded rapid progress, but performance still remains far from supervised methods.
The VoxCeleb Speaker Recognition Challenge 2019 aimed to assess how well current speaker recognition technology is able to identify speakers in unconstrained or "in the wild" data.
The objective of this paper is self-supervised learning of spatio-temporal embeddings from video, suitable for human action recognition.
We propose AutoCorrect, a method to automatically learn object-annotation alignments from a dataset with annotations affected by geometric noise.
Fourth, in order to shed light on the potential of self-supervised learning on the task of video correspondence flow, we probe the upper bound by training on additional data, i.e., more diverse videos, further demonstrating significant improvements on video segmentation.
The objective of this paper is speaker recognition "in the wild", where utterances may be of variable length and may also contain irrelevant signals.
The model achieves competitive performance on cell and crowd counting datasets, and surpasses the state-of-the-art on the car dataset using only three training images.
Our contributions are: (i) We propose a Deep Comparator Network (DCN) that can ingest a pair of sets (each of which may contain a variable number of images) as inputs and compute a similarity between the pair; this involves attending to multiple discriminative local regions (landmarks) and comparing local descriptors between pairs of faces; (ii) To encourage high-quality representations for each set, internal competition is introduced for recalibration based on the landmark score; (iii) Inspired by image retrieval, a novel hard-sample mining regime is proposed to control the sampling process, such that the DCN is complementary to the standard image classification models.
In this paper, we design a neural network architecture that learns to aggregate based on both "visual" quality (resolution, illumination), and "content" quality (relative importance for discriminative classification).
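One plausible reading of such quality-based aggregation, sketched with NumPy (the function names and the hand-set quality logits are illustrative assumptions; in the actual network the quality weights are learned end-to-end):

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    e = np.exp(x - x.max())
    return e / e.sum()

def aggregate_set(features, quality_logits):
    """Pool a variable-size set of per-image descriptors into a single
    set descriptor, weighting each image by its predicted quality."""
    w = softmax(quality_logits)                # weights sum to 1
    return (w[:, None] * features).sum(axis=0)

# Three images of one identity; the second gets a low quality score
# (hypothetical numbers for illustration).
feats = np.random.randn(3, 128)
logits = np.array([2.0, -1.0, 1.5])
descriptor = aggregate_set(feats, logits)      # shape (128,)
```

The softmax weighting lets high-quality, discriminative images dominate the set descriptor while low-quality ones are down-weighted rather than discarded.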
Pixelwise segmentation of the left ventricular (LV) myocardium and the four cardiac chambers in 2-D steady state free precession (SSFP) cine sequences is an essential preprocessing step for a wide range of analyses.
The dataset was collected with three goals in mind: (i) to have both a large number of identities and also a large number of images for each identity; (ii) to cover a large range of pose, age and ethnicity; and (iii) to minimize the label noise.
Sonography synthesis has a wide range of applications, including medical procedure simulation, clinical training and multimodality image registration.
Feature tracking Cardiac Magnetic Resonance (CMR) has recently emerged as an area of interest for quantification of regional cardiac function from balanced, steady state free precession (SSFP) cine sequences.