We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera.
Ranked #1 on Image Generation on ARKitScenes
By training only a query-based image instance segmentation model, MinVIS outperforms the previous best result on the challenging Occluded VIS dataset by over 10% AP.
Ranked #2 on Video Instance Segmentation on OVIS validation
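The tracking step this implies, matching instance queries between adjacent frames without any video-level training, can be sketched as Hungarian matching on query-embedding similarity. A minimal illustration (not the paper's code; shapes, the toy embeddings, and the availability of scipy are assumptions):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_queries(q_prev, q_next):
    """Match instance query embeddings between adjacent frames.

    q_prev, q_next: (num_queries, dim) arrays of per-frame query embeddings.
    Returns `assign`, where query i in the previous frame is tracked as
    query assign[i] in the next frame.
    """
    # Cosine similarity between every pair of queries.
    a = q_prev / np.linalg.norm(q_prev, axis=1, keepdims=True)
    b = q_next / np.linalg.norm(q_next, axis=1, keepdims=True)
    sim = a @ b.T
    # Hungarian matching maximizes total similarity (minimize the negative).
    row, col = linear_sum_assignment(-sim)
    assign = np.empty(len(q_prev), dtype=int)
    assign[row] = col
    return assign

# Toy example: three queries whose order is permuted in the next frame.
rng = np.random.default_rng(0)
q_t = rng.normal(size=(3, 8))
perm = np.array([2, 0, 1])
q_t1 = q_t[perm] + 0.01 * rng.normal(size=(3, 8))  # slight feature drift
assign = match_queries(q_t, q_t1)
```

Here `assign` recovers the inverse of `perm`: each previous-frame query is re-identified at its new position, which is exactly the per-frame-to-track linking a query-matching tracker needs.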
Source separation models operate either in the spectrogram domain or in the waveform domain.
Ranked #1 on Music Source Separation on MUSDB18
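The spectrogram-domain family can be illustrated with the simplest possible instance: an ideal binary mask applied to STFT bins of a two-tone mixture. This is a toy sketch of the general approach, not the paper's model; the frame size, frequencies, and non-overlapping rectangular frames are all illustrative choices:

```python
import numpy as np

# Toy spectrogram-domain separation via an ideal binary mask.
sr, n_fft = 8000, 256
t = np.arange(sr) / sr                       # 1 second of audio
low = np.sin(2 * np.pi * 440 * t)            # source 1: 440 Hz tone
high = np.sin(2 * np.pi * 2000 * t)          # source 2: 2000 Hz tone
mix = low + high

# STFT with non-overlapping rectangular frames (simplest possible analysis).
frames = mix[: sr - sr % n_fft].reshape(-1, n_fft)
spec = np.fft.rfft(frames, axis=1)           # complex spectrogram

# Binary mask: bins below ~1 kHz go to source 1, the rest to source 2.
cutoff = int(1000 * n_fft / sr)              # FFT bin index for 1 kHz
mask_low = np.zeros(n_fft // 2 + 1)
mask_low[: cutoff + 1] = 1.0

# Mask the spectrogram, then invert back to the waveform domain.
est_low = np.fft.irfft(spec * mask_low, axis=1).ravel()
est_high = np.fft.irfft(spec * (1 - mask_low), axis=1).ravel()
```

A spectrogram-domain model learns to predict such a mask (or the masked spectrogram) from the mixture; a waveform-domain model instead maps input samples directly to output samples, avoiding the STFT and its phase-reconstruction issues.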
Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes.
Extensive experiments demonstrate that our approach is effective and can be generalized to different video recognition scenarios.
Ranked #1 on Zero-Shot Action Recognition on Kinetics
We train a family of large language models, called CodeGen, on natural language and programming language data.
Ranked #1 on Program Synthesis on HumanEval
Neural Radiance Fields (NeRFs) have demonstrated a remarkable ability to synthesize images of 3D scenes from novel viewpoints.
Ranked #1 on Novel View Synthesis on Mip-NeRF 360
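The view synthesis itself rests on a simple compositing rule: a pixel's color is the transmittance-weighted sum of sampled colors along its camera ray. A minimal numpy sketch of that standard NeRF quadrature (illustrative sample values, not any particular codebase):

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Alpha-composite density/color samples along one camera ray.

    sigmas: (N,) non-negative volume densities at the samples.
    colors: (N, 3) RGB values at the samples.
    deltas: (N,) distances between adjacent samples.
    Returns the rendered RGB and the per-sample weights.
    """
    # Opacity of each ray segment: alpha_i = 1 - exp(-sigma_i * delta_i).
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance T_i: probability the ray reaches sample i unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]
    weights = trans * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights

# Toy ray: empty space, then a dense red sample occluding a blue one.
sigmas = np.array([0.0, 50.0, 50.0])
colors = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
deltas = np.full(3, 0.1)
rgb, weights = render_ray(sigmas, colors, deltas)
```

The dense red sample absorbs nearly all transmittance, so the blue sample behind it contributes almost nothing to `rgb`, which is exactly the occlusion behavior volume rendering provides for free.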
This paper builds on NAFNet, a strong and simple image restoration model, for single-view feature extraction, and extends it with cross-attention modules that fuse features between the two views, adapting it to binocular (stereo) scenarios.
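The cross-view fusion described above can be sketched as standard scaled dot-product attention in which queries come from one view and keys/values from the other. This is a generic sketch of the mechanism; the token counts, dimensions, and function name are illustrative, not the paper's exact module:

```python
import numpy as np

def cross_attention(feat_q, feat_kv):
    """Fuse features of one view with those of the other view.

    feat_q:  (Nq, d) features of the view being updated (e.g. left image).
    feat_kv: (Nk, d) features of the other view (e.g. right image).
    Returns (Nq, d) attended features drawn from the other view.
    """
    d = feat_q.shape[1]
    scores = feat_q @ feat_kv.T / np.sqrt(d)      # (Nq, Nk) affinities
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # softmax over the other view
    return attn @ feat_kv

rng = np.random.default_rng(1)
left = rng.normal(size=(16, 32))    # 16 left-view tokens, dim 32
right = rng.normal(size=(20, 32))   # 20 right-view tokens
fused = cross_attention(left, right)
```

In a stereo restoration network such a module would typically sit between single-view blocks, letting each view borrow complementary detail from the other; its output is usually added back to `feat_q` as a residual.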
YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS, and has the highest accuracy, 56.8% AP, among all known real-time object detectors running at 30 FPS or higher on a V100 GPU.
Ranked #1 on Real-Time Object Detection on COCO