In this paper, we present ExtremeBERT, a toolkit for accelerating and customizing BERT pretraining.
Approximating radiance fields with volumetric grids is one of the promising directions for improving NeRF, represented by methods such as Plenoxels and DVGO, which achieve very fast training convergence and real-time rendering.
We propose to apply the chain rule to the learned gradients and back-propagate the score of a diffusion model through the Jacobian of a differentiable renderer, which we instantiate as a voxel radiance field.
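The chain-rule step in the entry above can be illustrated with a toy sketch. It is not the authors' implementation: the linear "renderer" `x = A @ theta` and the Gaussian score are stand-ins chosen so the Jacobian is simply `A`, and all function names are hypothetical. A real system would use a learned diffusion score and a voxel radiance field renderer.

```python
def gaussian_score(x, mean, sigma):
    """Score (gradient of the log-density w.r.t. x) of an isotropic Gaussian.

    Assumption: stands in for the learned score of a diffusion model.
    """
    return [-(xi - mi) / sigma**2 for xi, mi in zip(x, mean)]


def render(theta, A):
    """Toy linear 'renderer' x = A @ theta (assumption, not a radiance field)."""
    return [sum(a * t for a, t in zip(row, theta)) for row in A]


def chained_gradient(theta, A, mean, sigma):
    """Chain rule: grad_theta = J^T @ score(x), where J = dx/dtheta.

    For the linear renderer above, the Jacobian J is the matrix A itself.
    """
    x = render(theta, A)
    s = gaussian_score(x, mean, sigma)
    n_out, n_par = len(A), len(theta)
    return [sum(A[i][j] * s[i] for i in range(n_out)) for j in range(n_par)]


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 2.0]]
    print(chained_gradient([1.0, 1.0], A, mean=[0.0, 0.0], sigma=1.0))
```

The score is only ever evaluated on the rendered output `x`; the parameters `theta` receive their gradient solely through the renderer's Jacobian.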
Instead of trying to process more frames at once like most existing methods, we propose to process videos in an online fashion and cache "memory" at each iteration.
Ranked #2 on Action Anticipation on EPIC-KITCHENS-100 (using extra training data)
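The online, memory-caching scheme described in the video entry above can be sketched as a loop that handles one frame per iteration and keeps a bounded cache of earlier features. This is a minimal illustration under stated assumptions: `process_online`, `encode`, and `memory_size` are hypothetical names, and averaging the cached features is a placeholder for whatever mechanism the actual method uses to read from its memory.

```python
from collections import deque


def process_online(frames, encode, memory_size=4):
    """Process frames one at a time, caching past features as 'memory'.

    Assumption: each output is the mean of the current feature and the
    cached ones; the real method's memory read is not specified here.
    """
    memory = deque(maxlen=memory_size)  # oldest entries are evicted first
    outputs = []
    for frame in frames:
        feat = encode(frame)
        context = list(memory) + [feat]  # cached memory plus current step
        outputs.append(sum(context) / len(context))
        memory.append(feat)  # cache this step for later iterations
    return outputs


if __name__ == "__main__":
    print(process_online([1, 2, 3, 4], encode=float, memory_size=2))
```

Because the cache is bounded, the cost per iteration stays constant no matter how long the video is, in contrast to processing more frames at once.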
Through our experiments, we demonstrate that VD and its underlying framework have the following merits: a) VD handles all subtasks with competitive quality; b) VD initiates novel extensions and applications such as disentanglement of style and semantics, image-text dual-guided generation, etc.
To this end, we propose a novel network named SuperFusion, exploiting the fusion of LiDAR and camera data at multiple levels.
In this report, we present a fast and accurate object detection method dubbed DAMO-YOLO, which achieves higher performance than the state-of-the-art YOLO series.
Ranked #29 on Real-Time Object Detection on COCO