no code implementations • 17 Feb 2025 • Jingnan Gao, Weizhe Liu, Weixuan Sun, Senbo Wang, Xibin Song, Taizhang Shang, Shenzhou Chen, Hongdong Li, Xiaokang Yang, Yichao Yan, Pan Ji
In this paper, we introduce MARS, a novel approach for 3D shape detailization.
no code implementations • 27 Jan 2025 • Zhongjin Luo, Yang Li, Mingrui Zhang, Senbo Wang, Han Yan, Xibin Song, Taizhang Shang, Wei Mao, Hongdong Li, Xiaoguang Han, Pan Ji
Finally, by recovering the similarity transformation using multiview silhouette supervision and addressing asset-body penetration with physics simulators, the 3D asset can be accurately fitted onto the target human body.
1 code implementation • 6 Jan 2025 • Xuyang Wang, Ziang Cheng, Zhenyu Li, Jiayu Yang, Haorui Ji, Pan Ji, Mehrtash Harandi, Richard Hartley, Hongdong Li
This paper proposes DoubleDiffusion, a novel framework that combines heat dissipation diffusion and denoising diffusion for direct generative learning on 3D mesh surfaces.
1 code implementation • 18 Dec 2024 • Zhenhong Sun, Yifu Wang, Yonhon Ng, Yunfei Duan, Daoyi Dong, Hongdong Li, Pan Ji
This scheme revitalizes the existing ControlNet model, enabling effective handling of multi-instance generations, involving prompt balance, characteristics prominence, and dense tuning.
no code implementations • 27 Nov 2024 • Han Yan, Mingrui Zhang, Yang Li, Chao Ma, Pan Ji
We present PhyCAGE, the first approach for physically plausible compositional 3D asset generation from a single image.
3 code implementations • 21 Nov 2024 • Akari Asai, Jacqueline He, Rulin Shao, Weijia Shi, Amanpreet Singh, Joseph Chee Chang, Kyle Lo, Luca Soldaini, Sergey Feldman, Mike D'Arcy, David Wadden, Matt Latzke, Minyang Tian, Pan Ji, Shengyan Liu, Hao Tong, Bohao Wu, Yanyu Xiong, Luke Zettlemoyer, Graham Neubig, Dan Weld, Doug Downey, Wen-tau Yih, Pang Wei Koh, Hannaneh Hajishirzi
Scientific progress depends on researchers' ability to synthesize the growing body of literature.
no code implementations • 9 Sep 2024 • Chengzeng Feng, Jiacheng Wei, Cheng Chen, Yang Li, Pan Ji, Fayao Liu, Hongdong Li, Guosheng Lin
We propose Prim2Room, a novel framework for controllable room mesh generation leveraging 2D layout conditions and 3D primitive retrieval to facilitate precise 3D layout specification.
no code implementations • 8 Aug 2024 • Yongzhi Xu, Yonhon Ng, Yifu Wang, Inkyu Sa, Yunfei Duan, Yang Li, Pan Ji, Hongdong Li
From the generated isometric image, we use a pre-trained image understanding method to segment the image into meaningful parts, such as off-ground objects, trees, and buildings, and extract the 2D scene layout.
no code implementations • 18 Jul 2024 • Minyang Tian, Luyu Gao, Shizhuo Dylan Zhang, Xinan Chen, Cunwei Fan, Xuefei Guo, Roland Haas, Pan Ji, Kittithat Krongchon, Yao Li, Shengyan Liu, Di Luo, Yutao Ma, Hao Tong, Kha Trinh, Chenyu Tian, Zihan Wang, Bohao Wu, Yanyu Xiong, Shengzhu Yin, Minhui Zhu, Kilian Lieret, Yanxin Lu, Genglin Liu, Yufeng Du, Tianhua Tao, Ofir Press, Jamie Callan, Eliu Huerta, Hao Peng
Since language models (LMs) now outperform average humans on many challenging tasks, it has become increasingly difficult to develop challenging, high-quality, and realistic evaluations.
no code implementations • 24 May 2024 • Ruikai Cui, Xibin Song, Weixuan Sun, Senbo Wang, Weizhe Liu, Shenzhou Chen, Taizhang Shang, Yang Li, Nick Barnes, Hongdong Li, Pan Ji
Large Reconstruction Models have made significant strides in the realm of automated 3D content generation from single or multiple input images.
no code implementations • 27 Mar 2024 • Ruikai Cui, Weizhe Liu, Weixuan Sun, Senbo Wang, Taizhang Shang, Yang Li, Xibin Song, Han Yan, Zhennan Wu, Shenzhou Chen, Hongdong Li, Pan Ji
3D shape generation aims to produce innovative 3D content adhering to specific conditions and constraints.
no code implementations • 24 Mar 2024 • Han Yan, Yang Li, Zhennan Wu, Shenzhou Chen, Weixuan Sun, Taizhang Shang, Weizhe Liu, Tian Chen, Xiaqiang Dai, Chao Ma, Hongdong Li, Pan Ji
We present Frankenstein, a diffusion-based framework that can generate semantic-compositional 3D scenes in a single pass.
1 code implementation • 30 Jan 2024 • Zhennan Wu, Yang Li, Han Yan, Taizhang Shang, Weixuan Sun, Senbo Wang, Ruikai Cui, Weizhe Liu, Hiroyuki Sato, Hongdong Li, Pan Ji
A variational auto-encoder is employed to compress the tri-planes into the latent tri-plane space, on which the denoising diffusion process is performed.
1 code implementation • CVPR 2024 • Jiayu Yang, Ziang Cheng, Yunfei Duan, Pan Ji, Hongdong Li
Given a single image of a 3D object, this paper proposes a novel method (named ConsistNet) that is able to generate multiple images of the same object, as if they were captured from different viewpoints, while the 3D (multi-view) consistencies among those multiple generated images are effectively exploited.
1 code implementation • 19 Sep 2023 • Jiaxin Wei, Xibin Song, Weizhe Liu, Laurent Kneip, Hongdong Li, Pan Ji
While showing promising results, recent RGB-D camera-based category-level object pose estimation methods have restricted applications due to the heavy reliance on depth sensors.
no code implementations • 12 Apr 2023 • Xiangyu Xu, Lichang Chen, Changjiang Cai, Huangying Zhan, Qingan Yan, Pan Ji, Junsong Yuan, Heng Huang, Yi Xu
Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules.
no code implementations • 25 Oct 2022 • Zhiqi Zhang, Nitin Bansal, Changjiang Cai, Pan Ji, Qingan Yan, Xiangyu Xu, Yi Xu
To this end, we propose CLIP-FLow, a semi-supervised iterative pseudo-labeling framework to transfer the pretraining knowledge to the target real domain.
no code implementations • 18 Jul 2022 • Runze Li, Pan Ji, Yi Xu, Bir Bhanu
As compared to outdoor environments, estimating depth of monocular videos for indoor environments, using self-supervised methods, results in two additional challenges: (i) the depth range of indoor video sequences varies a lot across different frames, making it difficult for the depth network to induce consistent depth cues for training; (ii) the indoor sequences recorded with handheld devices often contain much more rotational motions, which cause difficulties for the pose network to predict accurate relative camera poses.
no code implementations • 21 Jun 2022 • Nitin Bansal, Pan Ji, Junsong Yuan, Yi Xu
The multi-task learning (MTL) paradigm focuses on jointly learning two or more tasks, aiming for significant improvement w.r.t. the model's generalizability, performance, and training/inference memory footprint.
1 code implementation • CVPR 2023 • Changjiang Cai, Pan Ji, Qingan Yan, Yi Xu
At the pixel level, we propose to break the symmetry of the Siamese network (which is typically used in MVS to extract image features) by introducing a transformer block to the reference image (but not to the source images).
no code implementations • 28 May 2022 • Zhenyue Qin, Pan Ji, Dongwoo Kim, Yang Liu, Saeed Anwar, Tom Gedeon
Skeleton sequences are compact and lightweight.
no code implementations • 5 May 2022 • Pan Ji, Yuan Tian, Qingan Yan, Yuxin Ma, Yi Xu
The CNN depth effectively bootstraps the back-end optimization of SLAM and meanwhile the CNN uncertainty adaptively weighs the contribution of each feature point to the back-end optimization.
no code implementations • 5 May 2022 • Qingan Yan, Pan Ji, Nitin Bansal, Yuxin Ma, Yuan Tian, Yi Xu
In this paper, we deal with the problem of monocular depth estimation for fisheye cameras in a self-supervised manner.
no code implementations • 4 May 2022 • Zhenyue Qin, Yang Liu, Madhawa Perera, Tom Gedeon, Pan Ji, Dongwoo Kim, Saeed Anwar
To this end, we present a review in the form of a taxonomy on existing works of skeleton-based action recognition.
no code implementations • 3 May 2022 • Pan Ji, Qingan Yan, Yuxin Ma, Yi Xu
We present a robust and accurate depth refinement system, named GeoRefine, for geometrically-consistent dense mapping from monocular sequences.
1 code implementation • CVPR 2022 • Jiachen Liu, Pan Ji, Nitin Bansal, Changjiang Cai, Qingan Yan, Xiaolei Huang, Yi Xu
The semantic plane detection branch is based on a single-view plane detection framework but with differences.
1 code implementation • 12 Mar 2022 • Sudhir Yarram, Jialian Wu, Pan Ji, Yi Xu, Junsong Yuan
To improve the training efficiency, we propose Deformable VisTR, leveraging spatio-temporal deformable attention module that only attends to a small fixed set of key spatio-temporal sampling points around a reference point.
no code implementations • ICCV 2021 • Pan Ji, Runze Li, Bir Bhanu, Yi Xu
The effectiveness of each module is shown through a carefully conducted ablation study and the demonstration of the state-of-the-art performance on three indoor datasets, i.e., EuRoC, NYUv2, and 7-scenes.
1 code implementation • 24 May 2021 • Zhenyue Qin, Saeed Anwar, Dongwoo Kim, Yang Liu, Pan Ji, Tom Gedeon
Such GNNs are incapable of learning relative positions between graph nodes within a graph.
1 code implementation • 11 May 2021 • Yang Liu, Saeed Anwar, Zhenyue Qin, Pan Ji, Sabrina Caldwell, Tom Gedeon
The prevalent convolutional neural network (CNN) based image denoising methods extract features of images to restore the clean ground truth, achieving high denoising accuracy.
1 code implementation • 4 May 2021 • Zhenyue Qin, Yang Liu, Pan Ji, Dongwoo Kim, Lei Wang, Bob McKay, Saeed Anwar, Tom Gedeon
Recent skeleton-based action recognition methods extract features from 3D joint coordinates as spatial-temporal cues, using these representations in a graph neural network for feature fusion to boost recognition performance.
Ranked #27 on Skeleton Based Action Recognition on NTU RGB+D 120
1 code implementation • CVPR 2021 • Yang Liu, Zhenyue Qin, Saeed Anwar, Pan Ji, Dongwoo Kim, Sabrina Caldwell, Tom Gedeon
InvDN transforms the noisy input into a low-resolution clean image and a latent representation containing noise.
no code implementations • 2 Apr 2021 • Ze Ma, Yifan Yao, Pan Ji, Chao Ma
Estimating 3D human pose and shape from a single image is highly under-constrained.
no code implementations • 2 Nov 2020 • Pengfei Fang, Pan Ji, Lars Petersson, Mehrtash Harandi
Modern video person re-identification (re-ID) machines are often trained using a metric learning approach, supervised by a triplet loss.
3 code implementations • NeurIPS 2020 • Jianyuan Wang, Yiran Zhong, Yuchao Dai, Kaihao Zhang, Pan Ji, Hongdong Li
Learning matching costs has been shown to be critical to the success of the state-of-the-art deep stereo matching methods, in which 3D convolutions are applied on a 4D feature volume to learn a 3D cost volume.
no code implementations • 7 Oct 2020 • Pengfei Fang, Pan Ji, Jieming Zhou, Lars Petersson, Mehrtash Harandi
Full attention, which generates an attention value per element of the input feature maps, has been successfully demonstrated to be beneficial in visual tasks.
no code implementations • 16 Aug 2020 • Ming Zhu, Chao Ma, Pan Ji, Xiaokang Yang
In this paper, we focus on exploring the fusion of images and point clouds for 3D object detection in view of the complementary nature of the two modalities, i.e., images possess more semantic information while point clouds specialize in distance sensing.
no code implementations • ECCV 2020 • Yuliang Zou, Pan Ji, Quoc-Huy Tran, Jia-Bin Huang, Manmohan Chandraker
Monocular visual odometry (VO) suffers severely from error accumulation during frame-to-frame pose estimation.
no code implementations • CVPR 2020 • Buyu Liu, Bingbing Zhuang, Samuel Schulter, Pan Ji, Manmohan Chandraker
(2) Introducing the LSTM and FTM modules improves the prediction consistency in videos.
1 code implementation • ECCV 2020 • Lokender Tiwari, Pan Ji, Quoc-Huy Tran, Bingbing Zhuang, Saket Anand, Manmohan Chandraker
Classical monocular Simultaneous Localization And Mapping (SLAM) and the recently emerging convolutional neural networks (CNNs) for monocular depth prediction represent two largely disjoint approaches towards building a 3D map of the surrounding environment.
no code implementations • 30 Jul 2019 • Bingbing Zhuang, Quoc-Huy Tran, Pan Ji, Gim Hee Lee, Loong Fah Cheong, Manmohan Chandraker
Self-calibration of camera intrinsics and radial distortion has a long history of research in the computer vision community.
no code implementations • 24 Apr 2019 • Tong Zhang, Pan Ji, Mehrtash Harandi, Wenbing Huang, Hongdong Li
We introduce the Neural Collaborative Subspace Clustering, a neural model that discovers clusters of data points drawn from a union of low-dimensional subspaces.
no code implementations • CVPR 2019 • Yiran Zhong, Pan Ji, Jianyuan Wang, Yuchao Dai, Hongdong Li
In this paper, we propose Deep Epipolar Flow, an unsupervised optical flow method which incorporates global geometric constraints into network learning.
3 code implementations • CVPR 2019 • Xuelian Cheng, Yiran Zhong, Yuchao Dai, Pan Ji, Hongdong Li
In this paper, we present LidarStereoNet, the first unsupervised Lidar-stereo fusion network, which can be trained in an end-to-end manner without the need for ground-truth depth maps.
no code implementations • 2 Nov 2018 • Tong Zhang, Pan Ji, Mehrtash Harandi, Richard Hartley, Ian Reid
In this paper, we introduce a method that simultaneously learns an embedding space along subspaces within it to minimize a notion of reconstruction error, thus addressing the problem of subspace clustering in an end-to-end learning paradigm.
3 code implementations • NeurIPS 2017 • Pan Ji, Tong Zhang, Hongdong Li, Mathieu Salzmann, Ian Reid
We present a novel deep neural network architecture for unsupervised subspace clustering.
Ranked #3 on Image Clustering on Extended Yale-B
1 code implementation • 17 Jul 2017 • Pan Ji, Ian Reid, Ravi Garg, Hongdong Li, Mathieu Salzmann
In this paper, we present a kernel subspace clustering method that can handle non-linear models.
no code implementations • ICCV 2017 • Pan Ji, Hongdong Li, Yuchao Dai, Ian Reid
Rigid structure-from-motion (RSfM) and non-rigid structure-from-motion (NRSfM) have long been treated in the literature as separate (different) problems.
no code implementations • CVPR 2016 • Pan Ji, Hongdong Li, Mathieu Salzmann, Yiran Zhong
Feature tracking is a fundamental problem in computer vision, with applications in many computer vision tasks, such as visual SLAM and action recognition.
1 code implementation • ICCV 2015 • Pan Ji, Mathieu Salzmann, Hongdong Li
The Shape Interaction Matrix (SIM) is one of the earliest approaches to performing subspace clustering (i.e., separating points drawn from a union of subspaces).
Ranked #2 on Motion Segmentation on Hopkins155