We aim to improve the performance of regressing hand keypoints and segmenting pixel-level hand masks under new imaging conditions (e.g., outdoors) when we only have labeled images taken under very different conditions (e.g., indoors).
To mitigate this problem, we propose the Self-Training Multimodal Vehicle Detection Network (ST-MVDNet) which leverages a Teacher-Student mutual learning framework and a simulated sensor noise model used in strong data augmentation for Lidar and Radar.
To mitigate this problem, we propose a teacher-student framework named Adaptive Teacher (AT) which leverages domain adversarial learning and weak-strong data augmentation to address the domain gap.
This enables the student model to capture domain-invariant features.
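Both methods above rely on Teacher-Student mutual learning, where a teacher network is updated as an exponential moving average (EMA) of the student while the student learns from teacher pseudo-labels on strongly augmented inputs. As a minimal sketch (not the papers' actual code; the function name and parameter values are illustrative assumptions), the EMA teacher update can be written as:

```python
def ema_update(teacher_params, student_params, alpha=0.999):
    """EMA step used in Teacher-Student mutual learning:
    teacher <- alpha * teacher + (1 - alpha) * student.
    A large alpha keeps the teacher a slowly-moving, stable
    target for generating pseudo-labels for the student."""
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher_params, student_params)]

# Toy usage with scalar "parameters":
teacher = [0.0, 1.0]
student = [1.0, 0.0]
teacher = ema_update(teacher, student, alpha=0.9)
```

Because the teacher averages the student over many steps, its pseudo-labels drift slowly, which is what lets the student train on strong augmentations without collapsing.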
We address the problem of estimating the 3D pose of a network of cameras for large-environment wide-baseline scenarios, e.g., cameras for construction sites, sports stadiums, and public spaces.
We propose an inter-tracklet (person-to-person) attention mechanism that learns a representation for a target tracklet while taking into account other tracklets across multiple views.
Learning interpretable and interpolatable latent representations has been an emerging research direction, allowing researchers to understand and utilize the derived latent space for further applications such as visual synthesis or recognition.
As a result, our approach is able to augment the labeled training data in the semi-supervised setting.
Video summarization is a challenging task in computer vision that aims to identify highlight frames or shots in a lengthy input video.
To tackle the re-ID problem in the context of clothing changes, we propose a novel representation learning model which is able to generate a body shape feature representation without being affected by clothing color or patterns.
Person re-identification (re-ID) aims at matching images of the same person across camera views.
Person re-identification (re-ID) aims at recognizing the same person from images taken across different cameras.
Person re-identification (re-ID) aims at matching images of the same identity across camera views.
Moreover, the extension of our model for semi-supervised re-ID further confirms the scalability of our proposed method for real-world scenarios and applications.
Audio-visual event localization requires one to identify the event which is both visible and audible in a video (either at a frame or video level).
Person re-identification (Re-ID) aims at recognizing the same person from images taken across different cameras.
We also conduct a partial-flow experiment, which shows the feasibility of real-time detection, and a zero-shot learning experiment, which demonstrates the generalization capability of deep learning in cyber security.