Search Results for author: Shenlong Wang

Found 72 papers, 22 papers with code

DRAWER: Digital Reconstruction and Articulation With Environment Realism

no code implementations 21 Apr 2025 Hongchi Xia, Entong Su, Marius Memmel, Arhan Jain, Raymond Yu, Numfor Mbiziwo-Tiapo, Ali Farhadi, Abhishek Gupta, Shenlong Wang, Wei-Chiu Ma

Creating virtual digital replicas from real-world data unlocks significant potential across domains like gaming and robotics.

PhysGen3D: Crafting a Miniature Interactive World from a Single Image

no code implementations 26 Mar 2025 Boyuan Chen, Hanxiao Jiang, Shaowei Liu, Saurabh Gupta, Yunzhu Li, Hao Zhao, Shenlong Wang

Envisioning physically plausible outcomes from a single image requires a deep understanding of the world's dynamics.

GigaSLAM: Large-Scale Monocular SLAM with Hierarchical Gaussian Splats

1 code implementation 11 Mar 2025 Kai Deng, Jian Yang, Shenlong Wang, Jin Xie

Consequently, GigaSLAM delivers high-precision tracking and visually faithful rendering on urban outdoor benchmarks, establishing a robust SLAM solution for large-scale, long-term scenarios, and significantly extending the applicability of Gaussian Splatting SLAM systems to unbounded outdoor environments.

3DGS NeRF

Map Space Belief Prediction for Manipulation-Enhanced Mapping

no code implementations 28 Feb 2025 Joao Marcos Correia Marques, Nils Dengler, Tobias Zaenker, Jesper Mucke, Shenlong Wang, Maren Bennewitz, Kris Hauser

To tackle this, we define a POMDP whose belief is summarized by a metric-semantic grid map and propose a novel framework that uses neural networks to perform map-space belief updates to reason efficiently and simultaneously about object geometries, locations, categories, occlusions, and manipulation physics.

Decision Making Under Uncertainty Prediction

LIFe-GoM: Generalizable Human Rendering with Learned Iterative Feedback Over Multi-Resolution Gaussians-on-Mesh

no code implementations 13 Feb 2025 Jing Wen, Alexander G. Schwing, Shenlong Wang

Generalizable rendering of an animatable human avatar from sparse inputs relies on data priors and inductive biases extracted from training on large data to avoid scene-specific optimization and to enable fast reconstruction.

CropCraft: Inverse Procedural Modeling for 3D Reconstruction of Crop Plants

no code implementations 14 Nov 2024 Albert J. Zhai, Xinlei Wang, Kaiyuan Li, Zhao Jiang, Junxiong Zhou, Sheng Wang, Zhenong Jin, Kaiyu Guan, Shenlong Wang

The ability to automatically build 3D digital twins of plants from images has countless applications in agriculture, environmental science, robotics, and other fields.

3D Reconstruction Bayesian Optimization

AutoVFX: Physically Realistic Video Editing from Natural Language Instructions

1 code implementation 4 Nov 2024 Hao-Yu Hsu, Zhi-Hao Lin, Albert Zhai, Hongchi Xia, Shenlong Wang

Modern visual effects (VFX) software has made it possible for skilled artists to create imagery of virtually anything.

Code Generation Video Editing

PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation

1 code implementation 27 Sep 2024 Shaowei Liu, Zhongzheng Ren, Saurabh Gupta, Shenlong Wang

We present PhysGen, a novel image-to-video generation method that converts a single image and an input condition (e.g., force and torque applied to an object in the image) into a realistic, physically plausible, and temporally consistent video.

Image to Video Generation

Plenoptic PNG: Real-Time Neural Radiance Fields in 150 KB

no code implementations 24 Sep 2024 Jae Yong Lee, Yuqun Wu, Chuhang Zou, Derek Hoiem, Shenlong Wang

The goal of this paper is to encode a 3D scene into an extremely compact representation from 2D images and to enable its transmittance, decoding and rendering in real-time across various platforms.

Towards Energy-Efficiency by Navigating the Trilemma of Energy, Latency, and Accuracy

no code implementations 6 Sep 2024 Boyuan Tian, Yihan Pang, Muhammad Huzaifa, Shenlong Wang, Sarita Adve

We identify a Pareto-optimal curve and show that the designs on the curve are achievable only through synergistic co-optimization of all three optimization classes and by considering the latency and accuracy needs of downstream scene reconstruction consumers.

SuperGaussian: Repurposing Video Models for 3D Super Resolution

no code implementations 2 Jun 2024 Yuan Shen, Duygu Ceylan, Paul Guerrero, Zexiang Xu, Niloy J. Mitra, Shenlong Wang, Anna Frühstück

We demonstrate that it is possible to directly repurpose existing (pretrained) video models for 3D super-resolution and thus sidestep the problem of the shortage of large repositories of high-quality 3D training models.

Super-Resolution

Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video

no code implementations 15 Apr 2024 Hongchi Xia, Zhi-Hao Lin, Wei-Chiu Ma, Shenlong Wang

Creating high-quality and interactive virtual environments, such as games and simulators, often involves complex and costly manual modeling processes.

NeRF

MonoPatchNeRF: Improving Neural Radiance Fields with Patch-based Monocular Guidance

no code implementations 12 Apr 2024 Yuqun Wu, Jae Yong Lee, Chuhang Zou, Shenlong Wang, Derek Hoiem

The latest regularized Neural Radiance Field (NeRF) approaches produce poor geometry and view extrapolation for large scale sparse view scenes, such as ETH3D.

NeRF Novel View Synthesis +1

LidarDM: Generative LiDAR Simulation in a Generated World

1 code implementation 3 Apr 2024 Vlas Zyrianov, Henry Che, Zhijian Liu, Shenlong Wang

We present LidarDM, a novel LiDAR generative model capable of producing realistic, layout-aware, physically plausible, and temporally coherent LiDAR videos.

Autonomous Driving Point Cloud Generation

RoboEXP: Action-Conditioned Scene Graph via Interactive Exploration for Robotic Manipulation

1 code implementation 23 Feb 2024 Hanxiao Jiang, Binghao Huang, Ruihai Wu, Zhuoran Li, Shubham Garg, Hooshang Nayyeri, Shenlong Wang, Yunzhu Li

We introduce the novel task of interactive scene exploration, wherein robots autonomously explore environments and produce an action-conditioned scene graph (ACSG) that captures the structure of the underlying environment.

IRIS: Inverse Rendering of Indoor Scenes from Low Dynamic Range Images

no code implementations 23 Jan 2024 Chih-Hao Lin, Jia-Bin Huang, Zhengqin Li, Zhao Dong, Christian Richardt, Tuotuo Li, Michael Zollhöfer, Johannes Kopf, Shenlong Wang, Changil Kim

Our results show that IRIS effectively recovers HDR lighting, accurate material, and plausible camera response functions, supporting photorealistic relighting and object insertion.

3D geometry 3D Reconstruction +2

Objects With Lighting: A Real-World Dataset for Evaluating Reconstruction and Rendering for Object Relighting

1 code implementation 17 Jan 2024 Benjamin Ummenhofer, Sanskar Agrawal, Rene Sepulveda, Yixing Lao, Kai Zhang, Tianhang Cheng, Stephan Richter, Shenlong Wang, German Ros

Reconstructing an object from photos and placing it virtually in a new environment goes beyond the standard novel view synthesis task, as the appearance of the object has to adapt not only to the novel viewpoint but also to the new lighting conditions. Yet evaluations of inverse rendering methods rely on novel view synthesis data or simplistic synthetic datasets for quantitative analysis.

Inverse Rendering Novel View Synthesis

Video2Game: Real-time Interactive Realistic and Browser-Compatible Environment from a Single Video

no code implementations CVPR 2024 Hongchi Xia, Zhi-Hao Lin, Wei-Chiu Ma, Shenlong Wang

Creating high-quality and interactive virtual environments such as games and simulators often involves complex and costly manual modeling processes.

NeRF

On the Overconfidence Problem in Semantic 3D Mapping

1 code implementation 16 Nov 2023 Joao Marcos Correia Marques, Albert Zhai, Shenlong Wang, Kris Hauser

Semantic 3D mapping, the process of fusing depth and image segmentation information between multiple views to build 3D maps annotated with object classes in real-time, is a recent topic of interest.

Image Segmentation Semantic Segmentation

MapPrior: Bird's-Eye View Map Layout Estimation with Generative Models

no code implementations ICCV 2023 Xiyue Zhu, Vlas Zyrianov, Zhijian Liu, Shenlong Wang

Despite tremendous advancements in bird's-eye view (BEV) perception, existing models fall short in generating realistic and coherent semantic map layouts, and they fail to account for uncertainties arising from partial sensor information (such as occlusion or limited coverage).

UrbanIR: Large-Scale Urban Scene Inverse Rendering from a Single Video

no code implementations 15 Jun 2023 Chih-Hao Lin, Bohan Liu, Yi-Ting Chen, Kuan-Sheng Chen, David Forsyth, Jia-Bin Huang, Anand Bhattad, Shenlong Wang

We present UrbanIR (Urban Scene Inverse Rendering), a new inverse graphics model that enables realistic, free-viewpoint renderings of scenes under various lighting conditions with a single video.

Inverse Rendering

Building Rearticulable Models for Arbitrary 3D Objects from 4D Point Clouds

1 code implementation CVPR 2023 Shaowei Liu, Saurabh Gupta, Shenlong Wang

We build rearticulable models for arbitrary everyday man-made objects containing an arbitrary number of parts that are connected together in arbitrary ways via 1 degree-of-freedom joints.

PEANUT: Predicting and Navigating to Unseen Targets

no code implementations ICCV 2023 Albert J. Zhai, Shenlong Wang

In this work, we present a straightforward method for learning these regularities by predicting the locations of unobserved objects from incomplete semantic maps.

ObjectGoal Navigation Prediction

QFF: Quantized Fourier Features for Neural Field Representations

no code implementations 2 Dec 2022 Jae Yong Lee, Yuqun Wu, Chuhang Zou, Shenlong Wang, Derek Hoiem

Instead, we propose to encode features in bins of Fourier features that are commonly used for positional encoding.

NeRF

CASA: Category-agnostic Skeletal Animal Reconstruction

no code implementations 4 Nov 2022 Yuefan Wu, Zeyuan Chen, Shaowei Liu, Zhongzheng Ren, Shenlong Wang

Recovering the skeletal shape of an animal from a monocular video is a longstanding challenge.

Retrieval

Learning to Generate Realistic LiDAR Point Clouds

1 code implementation 8 Sep 2022 Vlas Zyrianov, Xiyue Zhu, Shenlong Wang

We present LiDARGen, a novel, effective, and controllable generative model that produces realistic LiDAR point cloud sensory readings.

Denoising Point Cloud Generation

Virtual Correspondence: Humans as a Cue for Extreme-View Geometry

no code implementations CVPR 2022 Wei-Chiu Ma, Anqi Joyce Yang, Shenlong Wang, Raquel Urtasun, Antonio Torralba

Similar to classic correspondences, VCs conform with epipolar geometry; unlike classic correspondences, VCs do not need to be co-visible across views.

3D Reconstruction Camera Pose Estimation +2

NeurMiPs: Neural Mixture of Planar Experts for View Synthesis

1 code implementation CVPR 2022 Zhi-Hao Lin, Wei-Chiu Ma, Hao-Yu Hsu, Yu-Chiang Frank Wang, Shenlong Wang

We present Neural Mixtures of Planar Experts (NeurMiPs), a novel planar-based scene representation for modeling geometry and appearance.

Novel View Synthesis

Deep Feedback Inverse Problem Solver

no code implementations ECCV 2020 Wei-Chiu Ma, Shenlong Wang, Jiayuan Gu, Sivabalan Manivasagam, Antonio Torralba, Raquel Urtasun

Specifically, at each iteration, the neural network takes the feedback as input and outputs an update on the current estimation.

Pose Estimation

Non-parametric Memory for Spatio-Temporal Segmentation of Construction Zones for Self-Driving

no code implementations 18 Jan 2021 Min Bai, Shenlong Wang, Kelvin Wong, Ersin Yumer, Raquel Urtasun

In this paper, we introduce a non-parametric memory representation for spatio-temporal segmentation that captures the local space and time around an autonomous vehicle (AV).

Mending Neural Implicit Modeling for 3D Vehicle Reconstruction in the Wild

no code implementations 18 Jan 2021 Shivam Duggal, ZiHao Wang, Wei-Chiu Ma, Sivabalan Manivasagam, Justin Liang, Shenlong Wang, Raquel Urtasun

Reconstructing high-quality 3D objects from sparse, partial observations from a single view is of crucial importance for various applications in computer vision, robotics, and graphics.

3D Object Reconstruction

Deep Parametric Continuous Convolutional Neural Networks

no code implementations CVPR 2018 Shenlong Wang, Simon Suo, Wei-Chiu Ma, Andrei Pokrovsky, Raquel Urtasun

Standard convolutional neural networks assume a grid structured input is available and exploit discrete convolutions as their fundamental building blocks.

Ranked #2 on Semantic Segmentation on S3DIS Area5 (Number of params metric)

Motion Estimation Point Cloud Segmentation +1

S3: Neural Shape, Skeleton, and Skinning Fields for 3D Human Modeling

no code implementations CVPR 2021 Ze Yang, Shenlong Wang, Sivabalan Manivasagam, Zeng Huang, Wei-Chiu Ma, Xinchen Yan, Ersin Yumer, Raquel Urtasun

Constructing and animating humans is an important component for building virtual worlds in a wide variety of applications such as virtual reality or robotics testing in simulation.

Asynchronous Multi-View SLAM

no code implementations 17 Jan 2021 Anqi Joyce Yang, Can Cui, Ioan Andrei Bârsan, Raquel Urtasun, Shenlong Wang

Existing multi-camera SLAM systems assume synchronized shutters for all cameras, which is often not the case in practice.

Sensor Modeling

SceneGen: Learning to Generate Realistic Traffic Scenes

no code implementations CVPR 2021 Shuhan Tan, Kelvin Wong, Shenlong Wang, Sivabalan Manivasagam, Mengye Ren, Raquel Urtasun

Existing methods typically insert actors into the scene according to a set of hand-crafted heuristics and are limited in their ability to model the true complexity and diversity of real traffic scenes, thus inducing a content gap between synthesized traffic scenes versus real ones.

Diversity

Pit30M: A Benchmark for Global Localization in the Age of Self-Driving Cars

1 code implementation 23 Dec 2020 Julieta Martinez, Sasha Doubov, Jack Fan, Ioan Andrei Bârsan, Shenlong Wang, Gellért Máttyus, Raquel Urtasun

We are interested in understanding whether retrieval-based localization approaches are good enough in the context of self-driving vehicles.

LIDAR Semantic Segmentation Retrieval +2

Convolutional Recurrent Network for Road Boundary Extraction

no code implementations CVPR 2019 Justin Liang, Namdar Homayounfar, Wei-Chiu Ma, Shenlong Wang, Raquel Urtasun

Creating high-definition maps that contain precise information about static elements of the scene is of utmost importance for enabling self-driving cars to drive safely.

Self-Driving Cars

Learning to Localize Using a LiDAR Intensity Map

no code implementations 20 Dec 2020 Ioan Andrei Bârsan, Shenlong Wang, Andrei Pokrovsky, Raquel Urtasun

In this paper we propose a real-time, calibration-agnostic and effective localization system for self-driving cars.

Self-Driving Cars

Learning to Localize Through Compressed Binary Maps

no code implementations CVPR 2019 Xinkai Wei, Ioan Andrei Bârsan, Shenlong Wang, Julieta Martinez, Raquel Urtasun

One of the main difficulties of scaling current localization systems to large environments is the on-board storage required for the maps.

Deep Continuous Fusion for Multi-Sensor 3D Object Detection

no code implementations ECCV 2018 Ming Liang, Bin Yang, Shenlong Wang, Raquel Urtasun

In this paper, we propose a novel 3D object detector that can exploit both LIDAR as well as cameras to perform very accurate localization.

3D Object Detection Object +1

MuSCLE: Multi Sweep Compression of LiDAR using Deep Entropy Models

no code implementations NeurIPS 2020 Sourav Biswas, Jerry Liu, Kelvin Wong, Shenlong Wang, Raquel Urtasun

Our model exploits spatio-temporal relationships across multiple LiDAR sweeps to reduce the bitrate of both geometry and intensity values.

Conditional Entropy Coding for Efficient Video Compression

no code implementations ECCV 2020 Jerry Liu, Shenlong Wang, Wei-Chiu Ma, Meet Shah, Rui Hu, Pranaab Dhawan, Raquel Urtasun

We propose a very simple and efficient video compression framework that only focuses on modeling the conditional entropy between frames.

MS-SSIM SSIM +1

DSDNet: Deep Structured self-Driving Network

no code implementations ECCV 2020 Wenyuan Zeng, Shenlong Wang, Renjie Liao, Yun Chen, Bin Yang, Raquel Urtasun

In this paper, we propose the Deep Structured self-Driving Network (DSDNet), which performs object detection, motion prediction, and motion planning with a single neural network.

Motion Planning motion prediction +2

LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World

no code implementations CVPR 2020 Sivabalan Manivasagam, Shenlong Wang, Kelvin Wong, Wenyuan Zeng, Mikita Sazanovich, Shuhan Tan, Bin Yang, Wei-Chiu Ma, Raquel Urtasun

We first utilize ray casting over the 3D scene and then use a deep neural network to produce deviations from the physics-based simulation, producing realistic LiDAR point clouds.

Identifying Unknown Instances for Autonomous Driving

no code implementations 24 Oct 2019 Kelvin Wong, Shenlong Wang, Mengye Ren, Ming Liang, Raquel Urtasun

In the past few years, we have seen great progress in perception algorithms, particularly through the use of deep learning.

Autonomous Driving Instance Segmentation +1

DSIC: Deep Stereo Image Compression

1 code implementation ICCV 2019 Jerry Liu, Shenlong Wang, Raquel Urtasun

In this paper we tackle the problem of stereo image compression, and leverage the fact that the two images have overlapping fields of view to further compress the representations.

Decoder Image Compression

Proximal Deep Structured Models

no code implementations NeurIPS 2016 Shenlong Wang, Sanja Fidler, Raquel Urtasun

Many problems in real-world applications involve predicting continuous-valued random variables that are statistically related.

Image Denoising Optical Flow Estimation

TorontoCity: Seeing the World with a Million Eyes

no code implementations ICCV 2017 Shenlong Wang, Min Bai, Gellert Mattyus, Hang Chu, Wenjie Luo, Bin Yang, Justin Liang, Joel Cheverie, Sanja Fidler, Raquel Urtasun

In this paper we introduce the TorontoCity benchmark, which covers the full greater Toronto area (GTA) with 712.5 $km^2$ of land, 8439 $km$ of road and around 400,000 buildings.

Instance Segmentation Semantic Segmentation

AutoScaler: Scale-Attention Networks for Visual Correspondence

no code implementations 17 Nov 2016 Shenlong Wang, Linjie Luo, Ning Zhang, Jia Li

We propose AutoScaler, a scale-attention network to explicitly optimize this trade-off in visual correspondence tasks.

Optical Flow Estimation

Find your Way by Observing the Sun and Other Semantic Cues

no code implementations 23 Jun 2016 Wei-Chiu Ma, Shenlong Wang, Marcus A. Brubaker, Sanja Fidler, Raquel Urtasun

In this paper we present a robust, efficient and affordable approach to self-localization which requires neither GPS nor knowledge about the appearance of the world.

HD Maps: Fine-Grained Road Segmentation by Parsing Ground and Aerial Images

no code implementations CVPR 2016 Gellert Mattyus, Shenlong Wang, Sanja Fidler, Raquel Urtasun

In this paper we present an approach to enhance existing maps with fine-grained segmentation categories such as parking spots and sidewalks, as well as the number and location of road lanes.

Road Segmentation

The Global Patch Collider

no code implementations CVPR 2016 Shenlong Wang, Sean Ryan Fanello, Christoph Rhemann, Shahram Izadi, Pushmeet Kohli

In contrast to conventional approaches that rely on pairwise distance computation, our algorithm isolates distinctive pixel pairs that hit the same leaf during traversal through multiple learned tree structures.

Optical Flow Estimation Stereo Matching +1

Enhancing Road Maps by Parsing Aerial Images Around the World

no code implementations ICCV 2015 Gellert Mattyus, Shenlong Wang, Sanja Fidler, Raquel Urtasun

In recent years, contextual models that exploit maps have been shown to be very effective for many recognition and localization tasks.

Semantic Segmentation

Lost Shopping! Monocular Localization in Large Indoor Spaces

no code implementations ICCV 2015 Shenlong Wang, Sanja Fidler, Raquel Urtasun

In this paper we propose a novel approach to localization in very large indoor spaces (i.e., 200+ store shopping malls) that takes a single image and a floor plan of the environment as input.

Text Detection Translation

Efficient Inference of Continuous Markov Random Fields with Polynomial Potentials

no code implementations NeurIPS 2014 Shenlong Wang, Alex Schwing, Raquel Urtasun

In this paper, we prove that every multivariate polynomial with even degree can be decomposed into a sum of convex and concave polynomials.

3D Reconstruction Image Denoising
