We propose 3DETR, an end-to-end Transformer-based object detection model for 3D point clouds.
Ranked #5 on 3D Object Detection on ScanNetV2
We introduce a robust, real-time, high-resolution human video matting method that achieves new state-of-the-art performance.
Ranked #1 on Video Matting on VideoMatte240K
Though many attempts have been made in blind super-resolution to restore low-resolution images with unknown and complex degradations, they are still far from addressing general real-world degraded images.
Annotating a high-quality large-scale facial expression dataset is extremely difficult due to the uncertainties caused by ambiguous facial expressions, low-quality facial images, and the subjectivity of annotators.
We introduce RAFT-Stereo, a new deep architecture for rectified stereo based on the optical flow network RAFT.
In this paper, we present DeepSIM, a generative model for conditional image manipulation based on a single image.
Learning non-rigid registration in an end-to-end manner is challenging due to the inherent high degrees of freedom and the lack of labeled training data.
Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object detection result.