1 code implementation • CVPR 2019 • Ji Hou, Angela Dai, Matthias Nießner
We introduce 3D-SIS, a novel neural network architecture for 3D semantic instance segmentation in commodity RGB-D scans.
Ranked #3 on 3D Semantic Instance Segmentation on ScanNetV2
no code implementations • CVPR 2020 • Ji Hou, Angela Dai, Matthias Nießner
Thus, we introduce the task of semantic instance completion: from an incomplete RGB-D scan of a scene, we aim to detect the individual object instances and infer their complete object geometry.
1 code implementation • 6 Apr 2020 • Nika Dogonadze, Jana Obernosterer, Ji Hou
Rapid progress in deep learning is continuously making it easier and cheaper to generate video forgeries.
1 code implementation • CVPR 2021 • Yinyu Nie, Ji Hou, Xiaoguang Han, Matthias Nießner
In this work, we introduce RfD-Net that jointly detects and reconstructs dense object surfaces directly from raw point clouds.
2 code implementations • CVPR 2021 • Ji Hou, Benjamin Graham, Matthias Nießner, Saining Xie
The rapid progress in 3D scene understanding has come with growing demand for data; however, collecting and annotating 3D scenes (e.g., point clouds) are notoriously hard.
Ranked #9 on 3D Semantic Segmentation on ScanNet200
1 code implementation • ICCV 2021 • Ji Hou, Saining Xie, Benjamin Graham, Angela Dai, Matthias Nießner
Inspired by these advances in geometric understanding, we aim to imbue image-based perception with representations learned under geometric constraints.
1 code implementation • NeurIPS 2021 • Manuel Dahnert, Ji Hou, Matthias Nießner, Angela Dai
Inspired by 2D panoptic segmentation, we propose to unify the tasks of geometric reconstruction, 3D semantic segmentation, and 3D instance segmentation into the task of panoptic 3D scene reconstruction: from a single RGB image, predicting the complete geometric reconstruction of the scene in the camera frustum of the image, along with semantic and instance segmentations.
1 code implementation • 27 Sep 2022 • Hao Yu, Ji Hou, Zheng Qin, Mahdi Saleh, Ivan Shugurov, Kai Wang, Benjamin Busam, Slobodan Ilic
More specifically, 3D structures of the whole frame are first represented by our global PPF signatures, from which structural descriptors are learned to help geometric descriptors sense the 3D world beyond local regions.
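The global PPF signatures above build on the classic point pair feature (PPF) used in rigid 3D matching. As a rough illustration, here is a minimal NumPy sketch of the standard 4D point pair feature (distance plus three angles); the function name is hypothetical, and the paper's actual global signature construction aggregates such features over the whole frame rather than computing a single pair:

```python
import numpy as np

def point_pair_feature(p1, n1, p2, n2):
    """Classic 4D point pair feature: (distance, angle(n1, d), angle(n2, d), angle(n1, n2)).

    p1, p2: 3D point positions; n1, n2: their unit surface normals.
    """
    d = p2 - p1
    dist = np.linalg.norm(d)
    d_unit = d / dist
    # Clip dot products to [-1, 1] to guard against floating-point drift before arccos.
    a1 = np.arccos(np.clip(np.dot(n1, d_unit), -1.0, 1.0))
    a2 = np.arccos(np.clip(np.dot(n2, d_unit), -1.0, 1.0))
    a3 = np.arccos(np.clip(np.dot(n1, n2), -1.0, 1.0))
    return np.array([dist, a1, a2, a3])
```

Because the feature depends only on relative distance and angles, it is invariant to rigid transformations, which is what lets such signatures describe 3D structure beyond local regions.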
no code implementations • CVPR 2023 • Ji Hou, Xiaoliang Dai, Zijian He, Angela Dai, Matthias Nießner
Current popular backbones in computer vision, such as Vision Transformers (ViT) and ResNets, are trained to perceive the world from 2D images.
1 code implementation • 28 Feb 2023 • Yu Zhang, Junle Yu, Xiaolin Huang, Wenhui Zhou, Ji Hou
Unlike previous methods that use only geometric representations, our module is specifically designed to effectively correlate color with geometry for the point cloud registration task.
1 code implementation • CVPR 2023 • Hao Yu, Zheng Qin, Ji Hou, Mahdi Saleh, Dongsheng Li, Benjamin Busam, Slobodan Ilic
To this end, we introduce RoITr, a Rotation-Invariant Transformer to cope with the pose variations in the point cloud matching task.
no code implementations • 21 Mar 2023 • Yu-Jhe Li, Tao Xu, Ji Hou, Bichen Wu, Xiaoliang Dai, Albert Pumarola, Peizhao Zhang, Peter Vajda, Kris Kitani
We note that the novelty of our model lies in introducing contrastive learning while training the diffusion prior, which enables the generation of a valid view-invariant latent code.
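Contrastive objectives of this kind are commonly instantiated as an InfoNCE-style loss, where matching pairs of embeddings are pulled together and mismatched pairs pushed apart. A minimal NumPy sketch under that assumption (the function name and temperature are illustrative, not taken from the paper):

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.07):
    """InfoNCE loss: row i of z_a and row i of z_b are a positive pair;
    all other rows in the batch act as negatives."""
    # L2-normalize embeddings so the dot product is cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # (N, N) similarity matrix
    # Cross-entropy with the diagonal as targets (numerically stable log-softmax).
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Applied to a diffusion prior, the positive pairs would be latent codes of the same object under different views, which is what encourages a view-invariant latent code.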
2 code implementations • ICCV 2023 • Chenfeng Xu, Bichen Wu, Ji Hou, Sam Tsai, RuiLong Li, Jialiang Wang, Wei Zhan, Zijian He, Peter Vajda, Kurt Keutzer, Masayoshi Tomizuka
We present NeRF-Det, a novel method for indoor 3D detection with posed RGB images as input.
no code implementations • 27 Sep 2023 • Xiaoliang Dai, Ji Hou, Chih-Yao Ma, Sam Tsai, Jialiang Wang, Rui Wang, Peizhao Zhang, Simon Vandenhende, Xiaofang Wang, Abhimanyu Dubey, Matthew Yu, Abhishek Kadian, Filip Radenovic, Dhruv Mahajan, Kunpeng Li, Yue Zhao, Vladan Petrovic, Mitesh Kumar Singh, Simran Motwani, Yi Wen, Yiwen Song, Roshan Sumbaly, Vignesh Ramanathan, Zijian He, Peter Vajda, Devi Parikh
Training text-to-image models with web scale image-text pairs enables the generation of a wide range of visual concepts from text.
no code implementations • 6 Dec 2023 • Felix Wimbauer, Bichen Wu, Edgar Schoenfeld, Xiaoliang Dai, Ji Hou, Zijian He, Artsiom Sanakoyeu, Peizhao Zhang, Sam Tsai, Jonas Kohler, Christian Rupprecht, Daniel Cremers, Peter Vajda, Jialiang Wang
However, one of the major drawbacks of diffusion models is that the image generation process is costly.
no code implementations • 8 Dec 2023 • Jonas Schult, Sam Tsai, Lukas Höllein, Bichen Wu, Jialiang Wang, Chih-Yao Ma, Kunpeng Li, Xiaofang Wang, Felix Wimbauer, Zijian He, Peizhao Zhang, Bastian Leibe, Peter Vajda, Ji Hou
Central to our approach is a user-defined 3D semantic proxy room that outlines a rough room layout based on semantic bounding boxes and a textual description of the overall room style.