We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera.
Ranked #1 on Image Generation on ARKitScenes
This paper deals with the problem of audio source separation.
By only training a query-based image instance segmentation model, MinVIS outperforms the previous best result on the challenging Occluded VIS dataset by over 10% AP.
Ranked #2 on Video Instance Segmentation on OVIS validation
Extensive experiments demonstrate that our approach is effective and can be generalized to different video recognition scenarios.
Ranked #1 on Zero-Shot Action Recognition on Kinetics
Neural Radiance Fields (NeRFs) have demonstrated amazing ability to synthesize images of 3D scenes from novel views.
Ranked #1 on Novel View Synthesis on Mip-NeRF 360
It is a challenging problem because a target moment may take place in the context of other temporal moments in the untrimmed video.
YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56. 8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100.
Ranked #1 on Real-Time Object Detection on COCO
This paper inherits a strong and simple image restoration model, NAFNet, for single-view feature extraction and extends it by adding cross attention modules to fuse features between views to adapt to binocular scenarios.
We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices.