Camera pose estimation is a long-standing computer vision problem that to date often relies on classical methods, such as handcrafted keypoint matching, RANSAC and bundle adjustment.
Based on this observation, we propose a new learning strategy named False Negative Aware Contrastive (FNAC) to mitigate the problem of misleading the training with such false negative samples.
To deal with these problems, we propose a new baseline method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.
Generative models make huge progress to the photorealistic image synthesis in recent years.
Therefore, we propose a feature fixation module to reweight the feature importance of the query and key before computing linear attention.
To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.
Based on this observation, we present a Vicinity Attention that introduces a locality bias to vision transformers with linear complexity.
We further propose to align the spatial awareness of G with the attention map induced from D. Through this way we effectively lessen the information gap between D and G. Extensive results show that our method pushes the two-player game in GANs closer to the equilibrium, leading to a better synthesis performance.
The task of semi-supervised video object segmentation (VOS) has been greatly advanced and state-of-the-art performance has been made by dense matching-based methods.
Two-view structure-from-motion (SfM) is the cornerstone of 3D reconstruction and visual SLAM.
Ranked #16 on Monocular Depth Estimation on KITTI Eigen split
Learning matching costs has been shown to be critical to the success of the state-of-the-art deep stereo matching methods, in which 3D convolutions are applied on a 4D feature volume to learn a 3D cost volume.
In this paper, we propose Deep Epipolar Flow, an unsupervised optical flow method which incorporates global geometric constraints into network learning.