Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models

lukashoel/text2room 21 Mar 2023

We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input.

Monocular Depth Estimation

Zero-1-to-3: Zero-shot One Image to 3D Object

cvlab-columbia/zero123 20 Mar 2023

We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image.

3D Reconstruction, Novel View Synthesis

Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators

picsart-ai-research/text2video-zero 23 Mar 2023

Recent text-to-video generation approaches rely on computationally heavy training and require large-scale video datasets.

VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation

modelscope/modelscope 15 Mar 2023

A diffusion probabilistic model (DPM), which constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples, has been shown to handle complex data distributions.

Denoising, Image Generation

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking

dvlab-research/VoxelNeXt 20 Mar 2023

Our core insight is to predict objects directly based on sparse voxel features, without relying on hand-crafted proxies.

3D Object Detection, Object Detection

SHERF: Generalizable Human NeRF from a Single Image

skhu101/sherf 22 Mar 2023

To this end, we propose a bank of 3D-aware hierarchical features, including global, point-level, and pixel-aligned features, to facilitate informative encoding.

3D Human Reconstruction

A Dynamic Multi-Scale Voxel Flow Network for Video Prediction

megvii-research/CVPR2023-DMVFN 17 Mar 2023

The performance of video prediction has been greatly boosted by advanced deep neural networks.

Video Prediction

Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

microsoft/visual-chatgpt 8 Mar 2023

To this end, we build a system called Visual ChatGPT, incorporating different Visual Foundation Models, to enable the user to interact with ChatGPT by 1) sending and receiving not only language but also images; 2) providing complex visual questions or visual editing instructions that require the collaboration of multiple AI models over multiple steps.

NeRF-LOAM: Neural Implicit Representation for Large-Scale Incremental LiDAR Odometry and Mapping

junyuandeng/nerf-loam 19 Mar 2023

To bridge this gap, in this paper, we propose a novel NeRF-based LiDAR odometry and mapping approach, NeRF-LOAM, consisting of three modules: neural odometry, neural mapping, and mesh reconstruction.

