Learning Progressive Joint Propagation for Human Motion Prediction

no code implementations ECCV 2020 Yujun Cai, Lin Huang, Yiwei Wang, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Xu Yang, Yiheng Zhu, Xiaohui Shen, Ding Liu, Jing Liu, Nadia Magnenat Thalmann

Last, in order to incorporate a general motion space for high-quality prediction, we build a memory-based dictionary, which aims to preserve the global motion patterns in training data to guide the predictions.

Human motion prediction motion prediction

Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation

no code implementations 1 Jun 2023 Minghui Hu, Jianbin Zheng, Daqing Liu, Chuanxia Zheng, Chaoyue Wang, DaCheng Tao, Tat-Jen Cham

In this work, we propose Cocktail, a pipeline to mix various modalities into one embedding, amalgamated with a generalized ControlNet (gControlNet), a controllable normalisation (ControlNorm), and a spatial guidance sampling method, to actualize multi-modal and spatially-refined control for text-conditional diffusion models.

Conditional Image Generation

Explicit Correspondence Matching for Generalizable Neural Radiance Fields

1 code implementation 24 Apr 2023 Yuedong Chen, Haofei Xu, Qianyi Wu, Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

The key to our approach lies in the explicitly modeled correspondence matching information, so as to provide the geometry prior to the prediction of NeRF color and density for volume rendering.

Novel View Synthesis

ABLE-NeRF: Attention-Based Rendering with Learnable Embeddings for Neural Radiance Field

1 code implementation CVPR 2023 Zhe Jun Tang, Tat-Jen Cham, Haiyu Zhao

Our method, which we call ABLE-NeRF, significantly reduces 'blurry' glossy surfaces in rendering and produces realistic translucent surfaces, which prior art lacks.


Unified Discrete Diffusion for Simultaneous Vision-Language Generation

1 code implementation 27 Nov 2022 Minghui Hu, Chuanxia Zheng, Heliang Zheng, Tat-Jen Cham, Chaoyue Wang, Zuopeng Yang, DaCheng Tao, Ponnuthurai N. Suganthan

Recently developed discrete diffusion models perform extraordinarily well in the text-to-image task, showing significant promise for handling multi-modality signals.

multimodal generation Text Generation +1

Entry-Flipped Transformer for Inference and Prediction of Participant Behavior

no code implementations 13 Jul 2022 Bo Hu, Tat-Jen Cham

Our key idea is to model the spatio-temporal relations among participants in a manner that is robust to error accumulation during frame-wise inference and prediction.

High-Quality Pluralistic Image Completion via Code Shared VQGAN

no code implementations 5 Apr 2022 Chuanxia Zheng, Guoxian Song, Tat-Jen Cham, Jianfei Cai, Dinh Phung, Linjie Luo

In this work, we present a novel framework for pluralistic image completion that can achieve both high quality and diversity at much faster inference speed.

Image Reconstruction Vocal Bursts Intensity Prediction

Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields

1 code implementation 21 Mar 2022 Yuedong Chen, Qianyi Wu, Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

In light of recent advances in NeRF-based 3D-aware generative models, we introduce a new task, Semantic-to-NeRF translation, that aims to reconstruct a 3D scene modelled by NeRF, conditioned on one single-view semantic mask as input.

3D-Aware Image Synthesis Translation

Global Context with Discrete Diffusion in Vector Quantised Modelling for Image Generation

no code implementations CVPR 2022 Minghui Hu, Yujie Wang, Tat-Jen Cham, Jianfei Yang, P. N. Suganthan

We show that with the help of a content-rich discrete visual codebook from VQ-VAE, the discrete diffusion model can also generate high fidelity images with global context, which compensates for the deficiency of the classical autoregressive model along pixel space.

Denoising Image Inpainting +1

Towards Unbiased Visual Emotion Recognition via Causal Intervention

1 code implementation 26 Jul 2021 Yuedong Chen, Xu Yang, Tat-Jen Cham, Jianfei Cai

In this work, we scrutinize this problem from the perspective of causal inference, where such a dataset characteristic is termed a confounder, which misleads the system into learning spurious correlations.

Causal Inference Emotion Recognition

AgileGAN: stylizing portraits by inversion-consistent transfer learning

1 code implementation ACM Transactions on Graphics 2021 Guoxian Song, Linjie Luo, Jing Liu, Wan-Chun Ma, Chun-Pong Lai, Chuanxia Zheng, Tat-Jen Cham

While substantial progress has been made in automated stylization, generating high quality stylistic portraits is still a challenge, and even the recent popular Toonify suffers from several artifacts when used on real input images.

motion retargeting Transfer Learning

Visiting the Invisible: Layer-by-Layer Completed Scene Decomposition

1 code implementation 12 Apr 2021 Chuanxia Zheng, Duy-Son Dao, Guoxian Song, Tat-Jen Cham, Jianfei Cai

In this work, we propose a higher-level scene understanding system to tackle both visible and invisible parts of objects and backgrounds in a given scene.

Instance Segmentation Scene Understanding +1

The Spatially-Correlative Loss for Various Image Translation Tasks

2 code implementations CVPR 2021 Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

We propose a novel spatially-correlative loss that is simple, efficient and yet effective for preserving scene structure consistency while supporting large appearance changes during unpaired image-to-image (I2I) translation.

Self-Supervised Learning Translation

A Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder

no code implementations ICCV 2021 Yujun Cai, Yiwei Wang, Yiheng Zhu, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Chuanxia Zheng, Sijie Yan, Henghui Ding, Xiaohui Shen, Ding Liu, Nadia Magnenat Thalmann

Notably, by considering this problem as a conditional generation process, we estimate a parametric distribution of the missing regions based on the input conditions, from which to sample and synthesize the full motion series.

motion prediction Motion Synthesis

GeoConv: Geodesic Guided Convolution for Facial Action Unit Recognition

no code implementations 6 Mar 2020 Yuedong Chen, Guoxian Song, Zhiwen Shao, Jianfei Cai, Tat-Jen Cham, Jianming Zheng

Automatic facial action unit (AU) recognition has attracted great attention but still remains a challenging task, as subtle changes of local facial muscles are difficult to thoroughly capture.

Face Model Facial Action Unit Detection

Recovering Facial Reflectance and Geometry from Multi-view Images

no code implementations 27 Nov 2019 Guoxian Song, Jianmin Zheng, Jianfei Cai, Tat-Jen Cham

While the problem of estimating shapes and diffuse reflectances of human faces from images has been extensively studied, relatively little work has been done on recovering the specular albedo.

Face Model

Visibility Constrained Generative Model for Depth-based 3D Facial Pose Tracking

no code implementations 6 May 2019 Lu Sheng, Jianfei Cai, Tat-Jen Cham, Vladimir Pavlovic, King Ngi Ngan

In this paper, we propose a generative framework that unifies depth-based 3D facial pose tracking and on-the-fly face model adaptation in unconstrained scenarios with heavy occlusions and arbitrary facial expression variations.

Face Model Pose Estimation +1

Unconstrained Facial Action Unit Detection via Latent Feature Domain

1 code implementation 25 Mar 2019 Zhiwen Shao, Jianfei Cai, Tat-Jen Cham, Xuequan Lu, Lizhuang Ma

Due to the combination of source AU-related information and target AU-free information, the latent feature domain with transferred source label can be learned by maximizing the target-domain AU detection performance.

Action Unit Detection Domain Adaptation +2

Pluralistic Image Completion

1 code implementation CVPR 2019 Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

In this paper, we present an approach for pluralistic image completion: the task of generating multiple and diverse plausible solutions for image completion.

Image Inpainting

Progress Regression RNN for Online Spatial-Temporal Action Localization in Unconstrained Videos

no code implementations 1 Mar 2019 Bo Hu, Jianfei Cai, Tat-Jen Cham, Junsong Yuan

Previous spatial-temporal action localization methods commonly follow the pipeline of object detection to estimate bounding boxes and labels of actions.

Object Detection +2

Real-time 3D Face-Eye Performance Capture of a Person Wearing VR Headset

no code implementations 21 Jan 2019 Guoxian Song, Jianfei Cai, Tat-Jen Cham, Jianmin Zheng, Juyong Zhang, Henry Fuchs

Teleconferencing or telepresence based on virtual reality (VR) head-mounted display (HMD) devices is a very interesting and promising application, since an HMD can provide an immersive experience for users.

T2Net: Synthetic-to-Realistic Translation for Solving Single-Image Depth Estimation Tasks

1 code implementation ECCV 2018 Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

Current methods for single-image depth estimation use training datasets with real image-depth pairs or stereo pairs, which are not easy to acquire.

Depth Estimation Depth Prediction +1

Conditional Adversarial Synthesis of 3D Facial Action Units

no code implementations 21 Feb 2018 Zhilei Liu, Guoxian Song, Jianfei Cai, Tat-Jen Cham, Juyong Zhang

Employing deep learning-based approaches for fine-grained facial expression analysis, such as those involving the estimation of Action Unit (AU) intensities, is difficult due to the lack of a large-scale dataset of real faces with sufficiently diverse AU labels for training.

Data Augmentation Image Generation

A Generative Model for Depth-Based Robust 3D Facial Pose Tracking

no code implementations CVPR 2017 Lu Sheng, Jianfei Cai, Tat-Jen Cham, Vladimir Pavlovic, King Ngi Ngan

We consider the problem of depth-based robust 3D facial pose tracking under unconstrained scenarios with heavy occlusions and arbitrary facial expression variations.

Face Model Pose Estimation +1

Modality and Component Aware Feature Fusion For RGB-D Scene Classification

no code implementations CVPR 2016 Anran Wang, Jianfei Cai, Jiwen Lu, Tat-Jen Cham

While convolutional neural networks (CNNs) have been excellent for object recognition, the greater spatial variability in scene images typically means that the standard full-image CNN features are suboptimal for scene classification.

General Classification Object Recognition +1

MMSS: Multi-Modal Sharable and Specific Feature Learning for RGB-D Object Recognition

no code implementations ICCV 2015 Anran Wang, Jianfei Cai, Jiwen Lu, Tat-Jen Cham

We first construct deep CNN layers for color and depth separately, and then connect them with our carefully designed multi-modal layers, which fuse color and depth information by enforcing a common part to be shared by features of different modalities.

Object Recognition

Robust Performance-driven 3D Face Tracking in Long Range Depth Scenes

no code implementations 10 Jul 2015 Hai X. Pham, Chongyu Chen, Luc N. Dao, Vladimir Pavlovic, Jianfei Cai, Tat-Jen Cham

We introduce a novel robust hybrid 3D face tracking framework from RGBD video streams, which is capable of tracking head pose and facial actions without pre-calibration or intervention from a user.

3D Reconstruction Face Model

Recovering Surface Details under General Unknown Illumination Using Shading and Coarse Multi-view Stereo

no code implementations CVPR 2014 Di Xu, Qi Duan, Jianming Zheng, Juyong Zhang, Jianfei Cai, Tat-Jen Cham

As a result, our approach is robust and stable, and is able to efficiently recover high-quality surface details even when starting with a coarse MVS.

