In recent years, several high-performance conversational systems have been proposed based on the Transformer encoder-decoder model.
We introduce a robust, real-time, high-resolution human video matting method that achieves new state-of-the-art performance.
Ranked #1 on Video Matting on VideoMatte240K
The proposed model can generate photo-realistic portrait images with accurate movements according to intuitive modifications.
We propose 3DETR, an end-to-end Transformer-based object detection model for 3D point clouds.
Ranked #5 on 3D Object Detection on ScanNetV2
Furthermore, based on LibFewShot, we provide comprehensive evaluations on multiple benchmark datasets with multiple backbone architectures to assess common pitfalls and the effects of different training tricks.
Though many attempts have been made in blind super-resolution to restore low-resolution images with unknown and complex degradations, they are still far from addressing general real-world degraded images.