We propose 3DETR, an end-to-end Transformer-based object detection model for 3D point clouds.
Though many attempts have been made in blind super-resolution to restore low-resolution images with unknown and complex degradations, they are still far from addressing general real-world degraded images.
We introduce a robust, real-time, high-resolution human video matting method that achieves new state-of-the-art performance.
Ranked #1 on Video Matting on VideoMatte240K
In this paper, we present DeepSIM, a generative model for conditional image manipulation based on a single image.
We introduce RAFT-Stereo, a new deep architecture for rectified stereo based on the optical flow network RAFT.
Annotating a high-quality large-scale facial expression dataset is extremely difficult due to the uncertainties caused by ambiguous facial expressions, low-quality facial images, and the subjectivity of annotators.
Learning non-rigid registration in an end-to-end manner is challenging due to the inherent high degrees of freedom and the lack of labeled training data.
Recently, DETR pioneered the solution of vision tasks with transformers by directly translating the image feature map into the object detection result.