Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model parallelism configurations.
Drawing images of characters at desired poses is an essential but laborious task in anime production.
Then, Next Hybrid Strategy (NHS) is designed to stack NCB and NTB in an efficient hybrid paradigm, which boosts performance in various downstream tasks.
We propose Deep Patch Visual Odometry (DPVO), a new deep learning system for monocular Visual Odometry (VO).
In this work, we investigate common issues with existing spatial encodings and propose a simple yet highly effective approach to modeling high-fidelity volumetric humans from sparse views.
Ranked #1 on Generalizable Novel View Synthesis on ZJU-MoCap.
The success of the transformer architecture in natural language processing has recently attracted attention in the computer vision field.
Nowadays, with the rapid development of IoT (Internet of Things) and CPS (Cyber-Physical Systems) technologies, big spatiotemporal data are being generated from mobile phones, car navigation systems, and traffic sensors.
This paper deals with the problem of audio source separation.
YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS, and achieves the highest accuracy, 56.8% AP, among all known real-time object detectors running at 30 FPS or higher on a V100 GPU.
Ranked #1 on Real-Time Object Detection on COCO.
In this paper, we introduce the new task of reconstructing 3D human pose from a single image in which both the person and their reflection in a mirror are visible.