The success of the transformer architecture in natural language processing has recently attracted attention in the computer vision field.
Source separation models work in either the spectrogram or the waveform domain.
Ranked #1 on Music Source Separation on MUSDB18
This paper deals with the problem of audio source separation.
Neural Radiance Fields (NeRFs) have demonstrated an amazing ability to synthesize images of 3D scenes from novel views.
Ranked #1 on Novel View Synthesis on Mip-NeRF 360
We propose to use pretraining to boost general image-to-image translation.
Ranked #1 on Sketch-to-Image Translation on COCO-Stuff
We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices.
Extensive experiments demonstrate that our approach is effective and can be generalized to different video recognition scenarios.
Ranked #1 on Zero-Shot Action Recognition on Kinetics
It is a challenging problem because a target moment may take place in the context of other temporal moments in the untrimmed video.
We introduce Ivy, a templated Deep Learning (DL) framework which abstracts existing DL frameworks.
In this paper, we introduce the new task of reconstructing 3D human pose from a single image in which we can see the person and the person's image through a mirror.