The success of the transformer architecture in natural language processing has recently triggered attention in the computer vision field.
We propose Deep Patch Visual Odometry (DPVO), a new deep learning system for monocular Visual Odometry (VO).
In this paper, we introduce the new task of reconstructing 3D human pose from a single image in which we can see the person and the person's image through a mirror.
This paper deals with the problem of audio source separation.
We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices.
YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy, 56.8% AP, among all known real-time object detectors with 30 FPS or higher on GPU V100.
Ranked #1 on Real-Time Object Detection on COCO
Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model parallelism configurations.
Neural Radiance Fields (NeRFs) have demonstrated an amazing ability to synthesize images of 3D scenes from novel views.
Ranked #1 on Novel View Synthesis on Mip-NeRF 360
We introduce Ivy, a templated Deep Learning (DL) framework which abstracts existing DL frameworks.
We propose to use pretraining to boost general image-to-image translation.
Ranked #1 on Sketch-to-Image Translation on COCO-Stuff