Search Results for author: Zhongang Cai

Found 42 papers, 25 papers with code

3D Convolution on RGB-D Point Clouds for Accurate Model-free Object Pose Estimation

no code implementations 29 Dec 2018 Zhongang Cai, Cunjun Yu, Quang-Cuong Pham

The conventional pose estimation of a 3D object usually requires the knowledge of the 3D model of the object.

Pose Estimation Robotic Grasping +1

Leveraging Temporal Information for 3D Detection and Domain Adaptation

1 code implementation 30 Jun 2020 Cunjun Yu, Zhongang Cai, Daxuan Ren, Haiyu Zhao

Ever since LiDARs came into prevalent use in autonomous driving, tremendous improvements have been made in learning on point clouds.

Autonomous Driving Domain Adaptation

MessyTable: Instance Association in Multiple Camera Views

no code implementations ECCV 2020 Zhongang Cai, Junzhe Zhang, Daxuan Ren, Cunjun Yu, Haiyu Zhao, Shuai Yi, Chai Kiat Yeo, Chen Change Loy

We present an interesting and challenging dataset that features a large number of scenes with messy tables captured from multiple camera views.

Leveraging Localization for Multi-camera Association

no code implementations 7 Aug 2020 Zhongang Cai, Cunjun Yu, Junzhe Zhang, Jiawei Ren, Haiyu Zhao

We present McAssoc, a deep learning approach to the association of detection bounding boxes in different views of a multi-camera system.

BiPointNet: Binary Neural Network for Point Clouds

1 code implementation ICLR 2021 Haotong Qin, Zhongang Cai, Mingyuan Zhang, Yifu Ding, Haiyu Zhao, Shuai Yi, Xianglong Liu, Hao Su

To alleviate the resource constraint for real-time point cloud applications that run on edge devices, in this paper we present BiPointNet, the first model binarization approach for efficient deep learning on point clouds.

Binarization

REFINE: Prediction Fusion Network for Panoptic Segmentation

no code implementations 15 Dec 2020 Jiawei Ren, Cunjun Yu, Zhongang Cai, Mingyuan Zhang, Chongsong Chen, Haiyu Zhao, Shuai Yi, Hongsheng Li

Panoptic segmentation aims at generating pixel-wise class and instance predictions for each pixel in the input image, which is a challenging task and far more complicated than naively fusing the semantic and instance segmentation results.

Instance Segmentation Panoptic Segmentation +1

Variational Relational Point Completion Network

1 code implementation CVPR 2021 Liang Pan, Xinyi Chen, Zhongang Cai, Junzhe Zhang, Haiyu Zhao, Shuai Yi, Ziwei Liu

In particular, we propose a dual-path architecture to enable principled probabilistic modeling across partial and complete clouds.

Point Cloud Completion

Unsupervised 3D Shape Completion through GAN Inversion

no code implementations CVPR 2021 Junzhe Zhang, Xinyi Chen, Zhongang Cai, Liang Pan, Haiyu Zhao, Shuai Yi, Chai Kiat Yeo, Bo Dai, Chen Change Loy

In contrast to previous fully supervised approaches, in this paper we present ShapeInversion, which introduces Generative Adversarial Network (GAN) inversion to shape completion for the first time.

Generative Adversarial Network valid

Delving Deep into the Generalization of Vision Transformers under Distribution Shifts

1 code implementation CVPR 2022 Chongzhi Zhang, Mingyuan Zhang, Shanghang Zhang, Daisheng Jin, Qiang Zhou, Zhongang Cai, Haiyu Zhao, Xianglong Liu, Ziwei Liu

By comprehensively investigating these GE-ViTs and comparing with their corresponding CNN models, we observe: 1) For the enhanced model, larger ViTs still benefit more for the OOD generalization.

Out-of-Distribution Generalization Self-Supervised Learning

Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency

1 code implementation ICCV 2021 Zhipeng Luo, Zhongang Cai, Changqing Zhou, Gongjie Zhang, Haiyu Zhao, Shuai Yi, Shijian Lu, Hongsheng Li, Shanghang Zhang, Ziwei Liu

In addition, existing 3D domain adaptive detection methods often assume prior access to the target domain annotations, which is rarely feasible in the real world.

3D Object Detection Autonomous Driving +1

MeshInversion: 3D textured mesh reconstruction with generative prior

no code implementations 29 Sep 2021 Junzhe Zhang, Daxuan Ren, Zhongang Cai, Chai Kiat Yeo, Bo Dai, Chen Change Loy

Reconstruction is achieved by searching for a latent space in the 3D GAN that best resembles the target mesh in accordance with the single view observation.

Playing for 3D Human Recovery

no code implementations 14 Oct 2021 Zhongang Cai, Mingyuan Zhang, Jiawei Ren, Chen Wei, Daxuan Ren, Zhengyu Lin, Haiyu Zhao, Lei Yang, Chen Change Loy, Ziwei Liu

Specifically, we contribute GTA-Human, a large-scale 3D human dataset generated with the GTA-V game engine, featuring a highly diverse set of subjects, actions, and scenarios.

Robust Partial-to-Partial Point Cloud Registration in a Full Range

1 code implementation 30 Nov 2021 Liang Pan, Zhongang Cai, Ziwei Liu

3) Based on a synergy of hierarchical graph networks and graphical modeling, we propose the Hierarchical Graphical Modeling (HGM) architecture to encode robust descriptors consisting of i) a unary term learned from RI features; and ii) multiple smoothness terms encoded from neighboring point relations at different scales through our TPT modules.

Graph Matching Point Cloud Registration

PTTR: Relational 3D Point Cloud Object Tracking with Transformer

1 code implementation CVPR 2022 Changqing Zhou, Zhipeng Luo, Yueru Luo, Tianrui Liu, Liang Pan, Zhongang Cai, Haiyu Zhao, Shijian Lu

In a point cloud sequence, 3D object tracking aims to predict the location and orientation of an object in the current search point cloud given a template point cloud.

3D Object Tracking Object +3

Garment4D: Garment Reconstruction from Point Cloud Sequences

1 code implementation NeurIPS 2021 Fangzhou Hong, Liang Pan, Zhongang Cai, Ziwei Liu

The main challenges are two-fold: 1) effective 3D feature learning for fine details, and 2) capture of garment dynamics caused by the interaction between garments and the human body, especially for loose garments like skirts.

Garment Reconstruction

Versatile Multi-Modal Pre-Training for Human-Centric Perception

1 code implementation CVPR 2022 Fangzhou Hong, Liang Pan, Zhongang Cai, Ziwei Liu

To tackle the challenges, we design the novel Dense Intra-sample Contrastive Learning and Sparse Structure-aware Contrastive Learning targets by hierarchically learning a modal-invariant latent space featured with continuous and ordinal feature distribution and structure-aware semantic consistency.

Contrastive Learning Human Parsing +1

AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

1 code implementation 17 May 2022 Fangzhou Hong, Mingyuan Zhang, Liang Pan, Zhongang Cai, Lei Yang, Ziwei Liu

Our key insight is to take advantage of the powerful vision-language model CLIP for supervising neural human generation, in terms of 3D geometry, texture and animation.

Language Modelling Motion Synthesis +1

Monocular 3D Object Reconstruction with GAN Inversion

1 code implementation 20 Jul 2022 Junzhe Zhang, Daxuan Ren, Zhongang Cai, Chai Kiat Yeo, Bo Dai, Chen Change Loy

Reconstruction is achieved by searching for a latent space in the 3D GAN that best resembles the target mesh in accordance with the single view observation.

3D Object Reconstruction Object

MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model

2 code implementations 31 Aug 2022 Mingyuan Zhang, Zhongang Cai, Liang Pan, Fangzhou Hong, Xinying Guo, Lei Yang, Ziwei Liu

Instead of a deterministic language-motion mapping, MotionDiffuse generates motions through a series of denoising steps in which variations are injected.

Denoising Motion Synthesis

Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond Algorithms

1 code implementation 21 Sep 2022 Hui En Pang, Zhongang Cai, Lei Yang, Tianwei Zhang, Ziwei Liu

Experiments with 10 backbones, ranging from CNNs to transformers, show the knowledge learnt from a proximity task is readily transferable to human mesh recovery.

3D human pose and shape estimation Benchmarking +1

BiBench: Benchmarking and Analyzing Network Binarization

1 code implementation 26 Jan 2023 Haotong Qin, Mingyuan Zhang, Yifu Ding, Aoyu Li, Zhongang Cai, Ziwei Liu, Fisher Yu, Xianglong Liu

Network binarization emerges as one of the most promising compression approaches offering extraordinary computation and memory savings by minimizing the bit-width.

Benchmarking Binarization

Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction

1 code implementation ICCV 2023 Wenjia Wang, Yongtao Ge, Haiyi Mei, Zhongang Cai, Qingping Sun, Yanjun Wang, Chunhua Shen, Lei Yang, Taku Komura

As it is hard to calibrate single-view RGB images in the wild, existing 3D human mesh reconstruction (3DHMR) methods either use a constant large focal length or estimate one based on the background environment context, which cannot tackle the problem of the torso, limb, hand or face distortion caused by perspective camera projection when the camera is close to the human body.

3D Human Pose Estimation 3D Reconstruction

Learning Dense UV Completion for Human Mesh Recovery

no code implementations 20 Jul 2023 Yanjun Wang, Qingping Sun, Wenjia Wang, Jun Ling, Zhongang Cai, Rong Xie, Li Song

Our method utilizes a dense correspondence map to separate visible human features and completes human features on a structured, dense human UV map with an attention-based feature completion module.

Human Mesh Recovery

IT3D: Improved Text-to-3D Generation with Explicit View Synthesis

1 code implementation 22 Aug 2023 YiWen Chen, Chi Zhang, Xiaofeng Yang, Zhongang Cai, Gang Yu, Lei Yang, Guosheng Lin

Recent strides in Text-to-3D techniques have been propelled by distilling knowledge from powerful large text-to-image diffusion models (LDMs).

3D Generation Text to 3D

PointHPS: Cascaded 3D Human Pose and Shape Estimation from Point Clouds

no code implementations 28 Aug 2023 Zhongang Cai, Liang Pan, Chen Wei, Wanqi Yin, Fangzhou Hong, Mingyuan Zhang, Chen Change Loy, Lei Yang, Ziwei Liu

To tackle these challenges, we propose a principled framework, PointHPS, for accurate 3D HPS from point clouds captured in real-world settings, which iteratively refines point features through a cascaded architecture.

3D human pose and shape estimation

Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text

no code implementations 13 Nov 2023 Zhongfei Qing, Zhongang Cai, Zhitao Yang, Lei Yang

Generating natural human motion from a story has the potential to transform the landscape of animation, gaming, and film industries.

Motion Synthesis Position

AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing

no code implementations 3 Dec 2023 Fan Yang, Tianyi Chen, Xiaosheng He, Zhongang Cai, Lei Yang, Si Wu, Guosheng Lin

We propose AttriHuman-3D, an editable 3D human generation model, which addresses the aforementioned problems with attribute decomposition and indexing.

Attribute Disentanglement

Digital Life Project: Autonomous 3D Characters with Social Intelligence

no code implementations 7 Dec 2023 Zhongang Cai, Jianping Jiang, Zhongfei Qing, Xinying Guo, Mingyuan Zhang, Zhengyu Lin, Haiyi Mei, Chen Wei, Ruisi Wang, Wanqi Yin, Xiangyu Fan, Han Du, Liang Pan, Peng Gao, Zhitao Yang, Yang Gao, Jiaqi Li, Tianxiang Ren, Yukun Wei, Xiaogang Wang, Chen Change Loy, Lei Yang, Ziwei Liu

In this work, we present Digital Life Project, a framework utilizing language as the universal medium to build autonomous 3D characters, who are capable of engaging in social interactions and expressing with articulated body motions, thereby simulating life in a digital environment.

Motion Captioning Motion Synthesis

Towards Robust and Expressive Whole-body Human Pose and Shape Estimation

1 code implementation NeurIPS 2023 Hui En Pang, Zhongang Cai, Lei Yang, Qingyi Tao, Zhonghua Wu, Tianwei Zhang, Ziwei Liu

Whole-body pose and shape estimation aims to jointly predict different behaviors (e.g., pose, hand gesture, facial expression) of the entire human body from a monocular image.

FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing

1 code implementation NeurIPS 2023 Mingyuan Zhang, Huirong Li, Zhongang Cai, Jiawei Ren, Lei Yang, Ziwei Liu

Notably, FineMoGen further enables zero-shot motion editing capabilities with the aid of modern large language models (LLM), which faithfully manipulates motion sequences with fine-grained instructions.

Motion Synthesis

WHAC: World-grounded Humans and Cameras

1 code implementation 19 Mar 2024 Wanqi Yin, Zhongang Cai, Ruisi Wang, Fanzhou Wang, Chen Wei, Haiyi Mei, Weiye Xiao, Zhitao Yang, Qingping Sun, Atsushi Yamashita, Ziwei Liu, Lei Yang

In this study, we aim to recover expressive parametric human models (i.e., SMPL-X) and corresponding camera poses jointly, by leveraging the synergy between three critical players: the world, the human, and the camera.

Pose Estimation

Large Motion Model for Unified Multi-Modal Motion Generation

no code implementations 1 Apr 2024 Mingyuan Zhang, Daisheng Jin, Chenyang Gu, Fangzhou Hong, Zhongang Cai, Jingfang Huang, Chongzhi Zhang, Xinying Guo, Lei Yang, Ying He, Ziwei Liu

In this work, we present Large Motion Model (LMM), a motion-centric, multi-modal framework that unifies mainstream motion generation tasks into a generalist model.
