Zero-1-to-3: Zero-shot One Image to 3D Object

cvlab-columbia/zero123 20 Mar 2023

We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image.

3D Reconstruction Novel View Synthesis +1

510
4.75 stars / hour

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking

dvlab-research/VoxelNeXt 20 Mar 2023

Our core insight is to predict objects directly based on sparse voxel features, without relying on hand-crafted proxies.

3D Object Detection object-detection

169
4.09 stars / hour

A Dynamic Multi-Scale Voxel Flow Network for Video Prediction

megvii-research/CVPR2023-DMVFN 17 Mar 2023

The performance of video prediction has been greatly boosted by advanced deep neural networks.

Video Prediction

105
3.15 stars / hour

Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models

lukashoel/text2room 21 Mar 2023

We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input.

Monocular Depth Estimation

72
2.71 stars / hour

NeRF-LOAM: Neural Implicit Representation for Large-Scale Incremental LiDAR Odometry and Mapping

junyuandeng/nerf-loam 19 Mar 2023

To bridge this gap, in this paper, we propose a novel NeRF-based LiDAR odometry and mapping approach, NeRF-LOAM, consisting of three modules: neural odometry, neural mapping, and mesh reconstruction.

89
2.52 stars / hour
12,640
2.49 stars / hour

VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation

modelscope/modelscope 15 Mar 2023

A diffusion probabilistic model (DPM), which constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples, has been shown to handle complex data distribution.

Denoising Image Generation +1

1,274
1.71 stars / hour

CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition

deeptibhegde/clip-goes-3d 20 Mar 2023

Attempting to train the visual and text encoder to account for this shift results in catastrophic forgetting and a notable decrease in performance.

Retrieval Scene Understanding

62
1.68 stars / hour

Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

microsoft/visual-chatgpt 8 Mar 2023

To this end, we build a system called Visual ChatGPT, incorporating different Visual Foundation Models, to enable the user to interact with ChatGPT by 1) sending and receiving not only languages but also images; 2) providing complex visual questions or visual editing instructions that require the collaboration of multiple AI models with multi-steps.

25,270
1.62 stars / hour

FateZero: Fusing Attentions for Zero-shot Text-based Video Editing

chenyangqiqi/fatezero 16 Mar 2023

We also have a better zero-shot shape-aware editing ability based on the text-to-video model.

Video Editing

311
1.45 stars / hour