This paper deals with the problem of audio source separation.
Neural Radiance Fields (NeRFs) have demonstrated an amazing ability to synthesize images of 3D scenes from novel views.
Ranked #1 on Novel View Synthesis on Mip-NeRF 360
We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices.
Extensive experiments demonstrate that our approach is effective and can be generalized to different video recognition scenarios.
Ranked #1 on Zero-Shot Action Recognition on Kinetics
It is a challenging problem because a target moment may take place in the context of other temporal moments in the untrimmed video.
We introduce Ivy, a templated Deep Learning (DL) framework that abstracts existing DL frameworks.
In this paper, we introduce the new task of reconstructing 3D human pose from a single image that shows both the person and the person's reflection in a mirror.