Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models

lukashoel/text2room 21 Mar 2023

We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input.

Monocular Depth Estimation

277
5.16 stars / hour

Zero-1-to-3: Zero-shot One Image to 3D Object

cvlab-columbia/zero123 20 Mar 2023

We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image.

3D Reconstruction, Novel View Synthesis

681
3.83 stars / hour
13,837
2.84 stars / hour

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking

dvlab-research/VoxelNeXt 20 Mar 2023

Our core insight is to predict objects directly based on sparse voxel features, without relying on hand-crafted proxies.

3D Object Detection, Object Detection

206
2.76 stars / hour

A Dynamic Multi-Scale Voxel Flow Network for Video Prediction

megvii-research/CVPR2023-DMVFN 17 Mar 2023

The performance of video prediction has been greatly boosted by advanced deep neural networks.

Video Prediction

141
2.29 stars / hour

VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation

modelscope/modelscope 15 Mar 2023

A diffusion probabilistic model (DPM), which constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples, has been shown to handle complex data distributions.

Denoising, Image Generation

1,447
2.26 stars / hour
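
The forward diffusion process mentioned in the VideoFusion excerpt above can be illustrated with a minimal sketch. This is not the VideoFusion method or its decomposed noise formulation; it assumes a generic DDPM-style setup with a linear noise schedule and uses the closed-form expression for sampling a noised point directly from the clean data. The function names (`linear_beta_schedule`, `cumulative_alphas`, `forward_diffuse`) and the schedule values are hypothetical choices made for illustration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances rising linearly (assumed schedule, not from the paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t from N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I).

    Returns the noised point and the Gaussian noise that was mixed in;
    a denoising model would typically be trained to predict that noise.
    """
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [signal_scale * x + noise_scale * e for x, e in zip(x0, noise)]
    return x_t, noise


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    x0 = [1.0, -0.5, 0.25]  # toy "data point"
    for t in (0, 499, 999):
        x_t, _ = forward_diffuse(x0, t, alpha_bars, rng)
        print(t, alpha_bars[t], x_t)
```

As the step index grows, the share of the original signal shrinks toward zero and the sample approaches pure Gaussian noise, which is the starting point the learned reverse process denoises from.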

Learning Context-aware Classifier for Semantic Segmentation

Pointcept/Pointcept 21 Mar 2023

Semantic segmentation remains a challenging task because scenes present diverse contexts, so a fixed classifier may not adequately handle the varying feature distributions encountered during testing.

Semantic Segmentation

115
1.93 stars / hour

NeRF-LOAM: Neural Implicit Representation for Large-Scale Incremental LiDAR Odometry and Mapping

junyuandeng/nerf-loam 19 Mar 2023

To bridge this gap, we propose a novel NeRF-based LiDAR odometry and mapping approach, NeRF-LOAM, consisting of three modules: neural odometry, neural mapping, and mesh reconstruction.

123
1.91 stars / hour

Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

microsoft/visual-chatgpt 8 Mar 2023

To this end, we build a system called Visual ChatGPT, incorporating different Visual Foundation Models, to enable the user to interact with ChatGPT by 1) sending and receiving not only language but also images, and 2) providing complex visual questions or visual editing instructions that require multiple AI models to collaborate over multiple steps.

25,924
1.65 stars / hour

CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition

deeptibhegde/clip-goes-3d 20 Mar 2023

Attempting to train the visual and text encoder to account for this shift results in catastrophic forgetting and a notable decrease in performance.

Retrieval, Scene Understanding

81
1.27 stars / hour