Scene Synthesis from Human Motion

4 Jan 2023 · Sifan Ye, Yixing Wang, Jiaman Li, Dennis Park, C. Karen Liu, Huazhe Xu, Jiajun Wu

Large-scale capture of human motion in diverse, complex scenes, while immensely useful, is often considered prohibitively costly. Meanwhile, human motion alone contains rich information about the scenes humans reside in and interact with. For example, a sitting human suggests the existence of a chair, and their leg position further implies the chair's pose. In this paper, we propose to synthesize diverse, semantically reasonable, and physically plausible scenes based on human motion. Our framework, Scene Synthesis from HUMan MotiON (SUMMON), includes two steps. It first uses ContactFormer, our newly introduced contact predictor, to obtain temporally consistent contact labels from human motion. Based on these predictions, SUMMON then chooses interacting objects and optimizes physical plausibility losses; it further populates the scene with objects that do not interact with humans. Experimental results demonstrate that SUMMON synthesizes feasible, plausible, and diverse scenes and has the potential to generate extensive human-scene interaction data for the community.
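To make the two-stage pipeline concrete, here is a minimal, self-contained sketch of the flow the abstract describes: predict temporally consistent per-vertex contact labels from motion, then fit an interacting object by minimizing a physical-plausibility loss. Every name and heuristic in it (predict_contact_labels, plausibility_loss, the height-threshold contact rule, the toy data) is a hypothetical stand-in, not the authors' ContactFormer or released code.

```python
# Illustrative sketch of the two-stage SUMMON pipeline, under the
# assumptions stated above. All names are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(0)

# Toy motion: T frames x V body vertices x 3 (x, y, z in metres).
base = rng.uniform(-1.0, 1.0, size=(64, 3))
base[:, 2] = rng.uniform(0.0, 1.5, size=64)   # heights above floor z = 0
base[:8, 2] = 0.05                            # a few vertices rest on a surface
motion = base[None] + rng.normal(scale=0.02, size=(30, 64, 3))

def predict_contact_labels(motion, eps=0.15):
    """Stage 1 stand-in for ContactFormer: flag vertices that stay near a
    support surface across neighbouring frames, so labels are temporally
    consistent. The real model is a learned sequence predictor."""
    near = motion[..., 2] < eps                         # (T, V) booleans
    smooth = near.copy()
    smooth[1:-1] = near[:-2] & near[1:-1] & near[2:]    # 3-frame agreement
    return smooth

def plausibility_loss(surface_height, pts):
    """Stage 2 stand-in: contact vertices should rest on the candidate
    object's surface (small gap) without sinking into it (penetration)."""
    gap = pts[:, 2] - surface_height
    return np.abs(gap).mean() + 10.0 * np.clip(-gap, 0.0, None).mean()

contacts = predict_contact_labels(motion)
pts = motion[contacts]                           # (N, 3) contact points
# Pick the candidate support height that best explains the contacts.
heights = np.linspace(0.0, 1.0, 21)              # e.g. stool/chair/table tops
best = min(heights, key=lambda h: plausibility_loss(h, pts))
print(f"{pts.shape[0]} contact points -> fitted support height {best:.2f} m")
```

In the paper's actual method, the candidate set is a library of 3D object meshes rather than support heights, and the optimization covers full object pose; the sketch only mirrors the contact-then-fit structure.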

Task | Dataset | Model | Metric | Value | Global Rank
Indoor Scene Synthesis | PRO-teXt | SUMMON | CD | 2.1437 | #4
Indoor Scene Synthesis | PRO-teXt | SUMMON | EMD | 1.3994 | #3
Indoor Scene Synthesis | PRO-teXt | SUMMON | F1 | 0.0673 | #3
3D Semantic Scene Completion | PRO-teXt | SUMMON | F1 | 0.0673 | #3
3D Semantic Scene Completion | PRO-teXt | SUMMON | CD | 2.1437 | #4
3D Semantic Scene Completion | PRO-teXt | SUMMON | EMD | 1.3994 | #3
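For reference, CD is Chamfer Distance and EMD is Earth Mover's Distance between synthesized and ground-truth scene geometry, with F1 typically a threshold-based precision/recall score over points. Below is a minimal NumPy sketch of the common bidirectional Chamfer Distance; the exact scaling and normalization used by the PRO-teXt benchmark may differ.

```python
# Minimal sketch of Chamfer Distance (CD) under the common bidirectional
# definition; benchmark-specific normalisation may differ.
import numpy as np

def chamfer_distance(a, b):
    """CD between point sets a (N, 3) and b (M, 3): mean nearest-neighbour
    squared distance, accumulated in both directions."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)  # (N, M) pairwise
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

rng = np.random.default_rng(0)
pred, gt = rng.normal(size=(256, 3)), rng.normal(size=(256, 3))
print(f"CD = {chamfer_distance(pred, gt):.4f}")
```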
