Learning Progressive Joint Propagation for Human Motion Prediction

no code implementations ECCV 2020 Yujun Cai, Lin Huang, Yiwei Wang, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Xu Yang, Yiheng Zhu, Xiaohui Shen, Ding Liu, Jing Liu, Nadia Magnenat Thalmann

Last, in order to incorporate a general motion space for high-quality prediction, we build a memory-based dictionary, which aims to preserve the global motion patterns in training data to guide the predictions.

Human motion prediction motion prediction

Splitting vs. Merging: Mining Object Regions with Discrepancy and Intersection Loss for Weakly Supervised Semantic Segmentation

no code implementations ECCV 2020 Tianyi Zhang, Guosheng Lin, Weide Liu, Jianfei Cai, Alex Kot

Finally, by training the segmentation model with the masks generated by our Splitting vs Merging strategy, we achieve the state-of-the-art weakly-supervised segmentation results on the Pascal VOC 2012 benchmark.

Weakly supervised segmentation Weakly-Supervised Semantic Segmentation

Unpaired Referring Expression Grounding via Bidirectional Cross-Modal Matching

no code implementations 18 Jan 2022 Hengcan Shi, Munawar Hayat, Jianfei Cai

To avoid the laborious annotation in conventional referring grounding, unpaired referring grounding is introduced, where the training data only contains a number of images and queries without correspondences.

ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues

no code implementations 18 Jan 2022 Hengcan Shi, Munawar Hayat, Yicheng Wu, Jianfei Cai

Firstly, we analyze CLIP for unsupervised open-category proposal generation and design an objectness score based on our empirical analysis on proposal selection.

GMFlow: Learning Optical Flow via Global Matching

1 code implementation 26 Nov 2021 Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, DaCheng Tao

Learning-based optical flow estimation has been dominated by the pipeline of cost volume with convolutions for flow regression, which is inherently limited to local correlations and thus struggles with the long-standing challenge of large displacements.

Optical Flow Estimation

Sharpness-aware Quantization for Deep Neural Networks

1 code implementation 24 Nov 2021 Jing Liu, Jianfei Cai, Bohan Zhuang

Network quantization is an effective compression method to reduce the model size and computational cost.


Pruning Self-attentions into Convolutional Layers in Single Path

1 code implementation 23 Nov 2021 Haoyu He, Jing Liu, Zizheng Pan, Jianfei Cai, Jing Zhang, DaCheng Tao, Bohan Zhuang

Vision Transformers (ViTs) have achieved impressive performance over various computer vision tasks.

Neural Architecture Search

Mesa: A Memory-saving Training Framework for Transformers

1 code implementation 22 Nov 2021 Zizheng Pan, Peng Chen, Haoyu He, Jing Liu, Jianfei Cai, Bohan Zhuang

Specifically, Mesa uses exact activations during forward pass while storing a low-precision version of activations to reduce memory consumption during training.


Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation

no code implementations 10 Nov 2021 Chuang Lin, Yi Jiang, Jianfei Cai, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan

Vision-and-Language Navigation (VLN) is a task in which an agent must follow a language instruction to navigate to a goal position, relying on ongoing interactions with the environment while moving.

Vision and Language Navigation

Enforcing Mutual Consistency of Hard Regions for Semi-supervised Medical Image Segmentation

1 code implementation 21 Sep 2021 Yicheng Wu, ZongYuan Ge, Donghao Zhang, Minfeng Xu, Lei Zhang, Yong Xia, Jianfei Cai

In this paper, we propose a novel mutual consistency network (MC-Net+) to effectively exploit the unlabeled hard regions for semi-supervised medical image segmentation.

Medical Image Segmentation

Auto-Parsing Network for Image Captioning and Visual Question Answering

no code implementations ICCV 2021 Xu Yang, Chongyang Gao, Hanwang Zhang, Jianfei Cai

We propose an Auto-Parsing Network (APN) to discover and exploit the input data's hidden tree structures for improving the effectiveness of the Transformer-based vision-language systems.

Image Captioning Question Answering +1

Learning Meta-class Memory for Few-Shot Semantic Segmentation

1 code implementation ICCV 2021 Zhonghua Wu, Xiangxi Shi, Guosheng Lin, Jianfei Cai

To explicitly learn meta-class representations in few-shot segmentation task, we propose a novel Meta-class Memory based few-shot segmentation method (MM-Net), where we introduce a set of learnable memory embeddings to memorize the meta-class information during the base class training and transfer to novel classes during the inference stage.

Few-Shot Semantic Segmentation Semantic Segmentation

Rapid Elastic Architecture Search under Specialized Classes and Resource Constraints

no code implementations 3 Aug 2021 Jing Liu, Bohan Zhuang, Mingkui Tan, Xu Liu, Dinh Phung, Yuanqing Li, Jianfei Cai

In many real-world applications, we often need to handle various deployment scenarios, where the resource constraint and the superclass of interest corresponding to a group of classes are dynamically specified.

Image Classification

Remember What You have drawn: Semantic Image Manipulation with Memory

no code implementations 27 Jul 2021 Xiangxi Shi, Zhonghua Wu, Guosheng Lin, Jianfei Cai, Shafiq Joty

Therefore, in this paper, we propose a memory-based Image Manipulation Network (MIM-Net), where a set of memories learned from images is introduced to synthesize the texture information with the guidance of the textual description.

Image Manipulation

Towards Unbiased Visual Emotion Recognition via Causal Intervention

no code implementations 26 Jul 2021 Yuedong Chen, Xu Yang, Tat-Jen Cham, Jianfei Cai

Although much progress has been made in visual emotion recognition, researchers have realized that modern deep networks tend to exploit dataset characteristics to learn spurious statistical associations between the input and the target.

Causal Inference Emotion Recognition

Multi-Label Image Classification with Contrastive Learning

no code implementations 24 Jul 2021 Son D. Dao, Ethan Zhao, Dinh Phung, Jianfei Cai

Recently, as an effective way of learning latent representations, contrastive learning has been increasingly popular and successful in various domains.

Contrastive Learning Multi-Label Classification +1

RSG: A Simple but Effective Module for Learning Imbalanced Datasets

1 code implementation CVPR 2021 JianFeng Wang, Thomas Lukasiewicz, Xiaolin Hu, Jianfei Cai, Zhenghua Xu

Imbalanced datasets widely exist in practice and are a great challenge for training deep neural models with a good generalization on infrequent classes.

Long-tail Learning

Less is More: Pay Less Attention in Vision Transformers

2 code implementations 29 May 2021 Zizheng Pan, Bohan Zhuang, Haoyu He, Jing Liu, Jianfei Cai

Transformers have become one of the dominant architectures in deep learning, particularly as a powerful alternative to convolutional neural networks (CNNs) in computer vision.

Image Classification Instance Segmentation +2

End-to-end One-shot Human Parsing

1 code implementation 4 May 2021 Haoyu He, Jing Zhang, Bohan Zhuang, Jianfei Cai, DaCheng Tao

Previous human parsing models are limited to parsing humans into pre-defined classes, which is inflexible for applications that need to handle new classes.

Human Parsing Metric Learning +1

High-Resolution Optical Flow from 1D Attention and Correlation

1 code implementation ICCV 2021 Haofei Xu, Jiaolong Yang, Jianfei Cai, Juyong Zhang, Xin Tong

Optical flow is inherently a 2D search problem, and thus the computational complexity grows quadratically with respect to the search window, making large displacements matching infeasible for high-resolution images.

Optical Flow Estimation

Visiting the Invisible: Layer-by-Layer Completed Scene Decomposition

1 code implementation 12 Apr 2021 Chuanxia Zheng, Duy-Son Dao, Guoxian Song, Tat-Jen Cham, Jianfei Cai

In this work, we propose a higher-level scene understanding system to tackle both visible and invisible parts of objects and backgrounds in a given scene.

Instance Segmentation Scene Understanding +1

The Spatially-Correlative Loss for Various Image Translation Tasks

1 code implementation CVPR 2021 Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

We propose a novel spatially-correlative loss that is simple, efficient and yet effective for preserving scene structure consistency while supporting large appearance changes during unpaired image-to-image (I2I) translation.

Self-Supervised Learning Translation

Bridging Global Context Interactions for High-Fidelity Image Completion

1 code implementation 2 Apr 2021 Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai, Dinh Phung

Bridging global context interactions correctly is important for high-fidelity image completion with large masks.

Scalable Vision Transformers with Hierarchical Pooling

2 code implementations ICCV 2021 Zizheng Pan, Bohan Zhuang, Jing Liu, Haoyu He, Jianfei Cai

However, the routine of the current ViT model is to maintain a full-length patch sequence during inference, which is redundant and lacks hierarchical representation.

Image Classification

Causal Attention for Vision-Language Tasks

no code implementations CVPR 2021 Xu Yang, Hanwang Zhang, GuoJun Qi, Jianfei Cai

Specifically, CATT is implemented as a combination of 1) In-Sample Attention (IS-ATT) and 2) Cross-Sample Attention (CS-ATT), where the latter forcibly brings other samples into every IS-ATT, mimicking the causal intervention.

Semi-supervised Left Atrium Segmentation with Mutual Consistency Training

2 code implementations 4 Mar 2021 Yicheng Wu, Minfeng Xu, ZongYuan Ge, Jianfei Cai, Lei Zhang

Such mutual consistency encourages the two decoders to have consistent and low-entropy predictions and enables the model to gradually capture generalized features from these unlabeled challenging regions.

Left Atrium Segmentation Medical Image Segmentation

LBS: Loss-aware Bit Sharing for Automatic Model Compression

no code implementations 13 Jan 2021 Jing Liu, Bohan Zhuang, Peng Chen, Yong Guo, Chunhua Shen, Jianfei Cai, Mingkui Tan

Low-bitwidth model compression is an effective method to reduce the model size and computational overhead.

Model Compression Quantization

Domain-Invariant Disentangled Network for Generalizable Object Detection

no code implementations ICCV 2021 Chuang Lin, Zehuan Yuan, Sicheng Zhao, Peize Sun, Changhu Wang, Jianfei Cai

By disentangling representations on both image and instance levels, DIDN is able to learn domain-invariant representations that are suitable for generalized object detection.

Domain Generalization Image Classification +1

A Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder

no code implementations ICCV 2021 Yujun Cai, Yiwei Wang, Yiheng Zhu, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Chuanxia Zheng, Sijie Yan, Henghui Ding, Xiaohui Shen, Ding Liu, Nadia Magnenat Thalmann

Notably, by considering this problem as a conditional generation process, we estimate a parametric distribution of the missing regions based on the input conditions, from which to sample and synthesize the full motion series.

motion prediction motion synthesis

Self-Supervised Relationship Probing

no code implementations NeurIPS 2020 Jiuxiang Gu, Jason Kuen, Shafiq Joty, Jianfei Cai, Vlad Morariu, Handong Zhao, Tong Sun

Structured representations of images that model visual relationships are beneficial for many vision and vision-language applications.

Contrastive Learning Language Modelling

Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning

no code implementations ECCV 2020 Xiangxi Shi, Xu Yang, Jiuxiang Gu, Shafiq Joty, Jianfei Cai

In this paper, we propose a novel visual encoder to explicitly distinguish viewpoint changes from semantic changes in the change captioning task.

Modeling Caricature Expressions by 3D Blendshape and Dynamic Texture

no code implementations 13 Aug 2020 Keyu Chen, Jianmin Zheng, Jianfei Cai, Juyong Zhang

The problem of deforming an artist-drawn caricature according to a given normal face expression is of interest in applications such as social media, animation and entertainment.

Caricature Texture Synthesis

MED-TEX: Transferring and Explaining Knowledge with Less Data from Pretrained Medical Imaging Models

no code implementations 6 Aug 2020 Thanh Nguyen-Duc, He Zhao, Jianfei Cai, Dinh Phung

To interpret the teacher model and assist the learning of the student, an explainer module is introduced to highlight the regions of an input that are important for the predictions of the teacher model.

Knowledge Distillation

Expert Training: Task Hardness Aware Meta-Learning for Few-Shot Classification

no code implementations 13 Jul 2020 Yucan Zhou, Yu Wang, Jianfei Cai, Yu Zhou, QinGhua Hu, Weiping Wang

Some works in the optimization of deep neural networks have shown that a better arrangement of training data can make the classifier converge faster and perform better.

General Classification Meta-Learning

Learning from the Scene and Borrowing from the Rich: Tackling the Long Tail in Scene Graph Generation

no code implementations 13 Jun 2020 Tao He, Lianli Gao, Jingkuan Song, Jianfei Cai, Yuan-Fang Li

Despite the huge progress in scene graph generation in recent years, its long-tail distribution in object relationships remains a challenging and pestering issue.

Graph Generation Scene Graph Generation +1

Image Co-skeletonization via Co-segmentation

no code implementations 12 Apr 2020 Koteswar Rao Jerripothula, Jianfei Cai, Jiangbo Lu, Junsong Yuan

Object skeletonization in a single natural image is a challenging problem because there is hardly any prior knowledge about the object.

Exploring Bottom-up and Top-down Cues with Attentive Learning for Webly Supervised Object Detection

no code implementations CVPR 2020 Zhonghua Wu, Qingyi Tao, Guosheng Lin, Jianfei Cai

To reduce the human labeling effort, we propose a novel webly supervised object detection (WebSOD) method for novel classes which only requires the web images without further annotations.

Object Detection Transfer Learning

J$\hat{\text{A}}$A-Net: Joint Facial Action Unit Detection and Face Alignment via Adaptive Attention

1 code implementation 18 Mar 2020 Zhiwen Shao, Zhilei Liu, Jianfei Cai, Lizhuang Ma

Moreover, to extract precise local features, we propose an adaptive attention learning module to refine the attention map of each AU adaptively.

Action Unit Detection Face Alignment +1

Deconfounded Image Captioning: A Causal Retrospect

no code implementations 9 Mar 2020 Xu Yang, Hanwang Zhang, Jianfei Cai

The dataset bias in vision-language tasks is becoming one of the main problems that hinder the progress of our community.

Causal Inference Image Captioning

GeoConv: Geodesic Guided Convolution for Facial Action Unit Recognition

no code implementations 6 Mar 2020 Yuedong Chen, Guoxian Song, Zhiwen Shao, Jianfei Cai, Tat-Jen Cham, Jianming Zheng

Automatic facial action unit (AU) recognition has attracted great attention but still remains a challenging task, as subtle changes of local facial muscles are difficult to thoroughly capture.

Face Model Facial Action Unit Detection

Spatio-Temporal Relation and Attention Learning for Facial Action Unit Detection

no code implementations 5 Jan 2020 Zhiwen Shao, Lixin Zou, Jianfei Cai, Yunsheng Wu, Lizhuang Ma

Specifically, we introduce a spatio-temporal graph convolutional network to capture both spatial and temporal relations from dynamic AUs, in which the AU relations are formulated as a spatio-temporal graph with adaptively learned instead of predefined edge weights.

Action Unit Detection Facial Action Unit Detection +1

Recovering Facial Reflectance and Geometry from Multi-view Images

no code implementations 27 Nov 2019 Guoxian Song, Jianmin Zheng, Jianfei Cai, Tat-Jen Cham

While the problem of estimating shapes and diffuse reflectances of human faces from images has been extensively studied, there is relatively less work done on recovering the specular albedo.

Face Model

Scalable Gaussian Process Classification with Additive Noise for Various Likelihoods

1 code implementation 14 Sep 2019 Haitao Liu, Yew-Soon Ong, Ziwei Yu, Jianfei Cai, Xiaobo Shen

Gaussian process classification (GPC) provides a flexible and powerful statistical framework describing joint distributions over function space.

General Classification GPR +2

Watch It Twice: Video Captioning with a Refocused Video Encoder

no code implementations 21 Jul 2019 Xiangxi Shi, Jianfei Cai, Shafiq Joty, Jiuxiang Gu

With the rapid growth of video data and the increasing demands of various applications such as intelligent video search and assistance for visually impaired people, the video captioning task has recently received a lot of attention in the computer vision and natural language processing fields.

Video Captioning

Improving Deep Lesion Detection Using 3D Contextual and Spatial Attention

1 code implementation 9 Jul 2019 Qingyi Tao, ZongYuan Ge, Jianfei Cai, Jianxiong Yin, Simon See

Secondly, in CT scans, the lesions are often indistinguishable from the background since the lesion and non-lesion areas may have very similar appearances.

Computed Tomography (CT) Object Detection

Disentangled Human Body Embedding Based on Deep Hierarchical Neural Network

1 code implementation 14 May 2019 Boyi Jiang, Juyong Zhang, Jianfei Cai, Jianmin Zheng

Human bodies exhibit various shapes for different identities or poses, but the body shape has certain similarities in structure and thus can be embedded in a low-dimensional space.

Representation Learning

Visibility Constrained Generative Model for Depth-based 3D Facial Pose Tracking

no code implementations 6 May 2019 Lu Sheng, Jianfei Cai, Tat-Jen Cham, Vladimir Pavlovic, King Ngi Ngan

In this paper, we propose a generative framework that unifies depth-based 3D facial pose tracking and face model adaptation on-the-fly, in the unconstrained scenarios with heavy occlusions and arbitrary facial expression variations.

Face Model Pose Estimation +1

Learning to Collocate Neural Modules for Image Captioning

no code implementations ICCV 2019 Xu Yang, Hanwang Zhang, Jianfei Cai

To this end, we make the following technical contributions for CNM training: 1) compact module design: one for function words and three for visual content words (e.g., noun, adjective, and verb); 2) soft module fusion and multi-step module execution, robustifying the visual reasoning in partial observation; 3) a linguistic loss for the module controller to be faithful to part-of-speech collocations (e.g., an adjective comes before a noun).

Image Captioning Visual Question Answering +1

Exploring Uncertainty Measures for Image-Caption Embedding-and-Retrieval Task

no code implementations 9 Apr 2019 Kenta Hama, Takashi Matsubara, Kuniaki Uehara, Jianfei Cai

With the wide development of black-box machine learning algorithms, particularly deep neural networks (DNNs), the practical demand for reliability assessment is rapidly rising.

General Classification

Scene Graph Generation with External Knowledge and Image Reconstruction

no code implementations CVPR 2019 Jiuxiang Gu, Handong Zhao, Zhe Lin, Sheng Li, Jianfei Cai, Mingyang Ling

Scene graph generation has received growing attention with the advancements in image understanding tasks such as object detection, attribute and relationship prediction, etc.

Graph Generation Image Reconstruction +3

Unconstrained Facial Action Unit Detection via Latent Feature Domain

1 code implementation 25 Mar 2019 Zhiwen Shao, Jianfei Cai, Tat-Jen Cham, Xuequan Lu, Lizhuang Ma

Due to the combination of source AU-related information and target AU-free information, the latent feature domain with transferred source label can be learned by maximizing the target-domain AU detection performance.

Action Unit Detection Domain Adaptation +2

Pluralistic Image Completion

1 code implementation CVPR 2019 Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

In this paper, we present an approach for pluralistic image completion, the task of generating multiple and diverse plausible solutions for image completion.

Image Inpainting

3D Hand Shape and Pose Estimation from a Single RGB Image

2 code implementations CVPR 2019 Liuhao Ge, Zhou Ren, Yuncheng Li, Zehao Xue, Yingying Wang, Jianfei Cai, Junsong Yuan

This work addresses a novel and challenging problem of estimating the full 3D hand shape and pose from a single RGB image.

3D Hand Pose Estimation

Progress Regression RNN for Online Spatial-Temporal Action Localization in Unconstrained Videos

no code implementations 1 Mar 2019 Bo Hu, Jianfei Cai, Tat-Jen Cham, Junsong Yuan

Previous spatial-temporal action localization methods commonly follow the pipeline of object detection to estimate bounding boxes and labels of actions.

Object Detection Temporal Action Localization

Region Deformer Networks for Unsupervised Depth Estimation from Unconstrained Monocular Videos

1 code implementation 26 Feb 2019 Haofei Xu, Jianmin Zheng, Jianfei Cai, Juyong Zhang

In this paper, we propose a new learning based method consisting of DepthNet, PoseNet and Region Deformer Networks (RDN) to estimate depth from unconstrained monocular videos without ground truth supervision.

Depth Estimation

Facial Motion Prior Networks for Facial Expression Recognition

4 code implementations 23 Feb 2019 Yuedong Chen, Jian-Feng Wang, Shikai Chen, Zhongchao Shi, Jianfei Cai

Deep learning based facial expression recognition (FER) has received a lot of attention in the past few years.

Facial Expression Recognition

Real-time 3D Face-Eye Performance Capture of a Person Wearing VR Headset

no code implementations 21 Jan 2019 Guoxian Song, Jianfei Cai, Tat-Jen Cham, Jianmin Zheng, Juyong Zhang, Henry Fuchs

Teleconferencing or telepresence based on virtual reality (VR) head-mounted display (HMD) devices is a very interesting and promising application, since an HMD can provide an immersive experience for users.

Auto-Encoding Scene Graphs for Image Captioning

1 code implementation CVPR 2019 Xu Yang, Kaihua Tang, Hanwang Zhang, Jianfei Cai

We propose Scene Graph Auto-Encoder (SGAE) that incorporates the language inductive bias into the encoder-decoder image captioning framework for more human-like captions.

Image Captioning

M2E-Try On Net: Fashion from Model to Everyone

no code implementations 21 Nov 2018 Zhonghua Wu, Guosheng Lin, Qingyi Tao, Jianfei Cai

Instead, we present a novel virtual Try-On network, M2E-Try On Net, which transfers the clothes from a model image to a person image without the need of any clean product images.

Virtual Try-on

Understanding and Comparing Scalable Gaussian Process Regression for Big Data

no code implementations 3 Nov 2018 Haitao Liu, Jianfei Cai, Yew-Soon Ong, Yi Wang

This paper investigates the methodological characteristics and performance of representative global and local scalable GPs, including sparse approximations and local aggregations, from four main perspectives: scalability, capability, controllability and robustness.

Large-scale Heteroscedastic Regression via Gaussian Process

no code implementations 3 Nov 2018 Haitao Liu, Yew-Soon Ong, Jianfei Cai

To improve the scalability, we first develop a variational sparse inference algorithm, named VSHGP, to handle large-scale datasets.

Variational Inference

Keypoint Based Weakly Supervised Human Parsing

no code implementations 14 Sep 2018 Zhonghua Wu, Guosheng Lin, Jianfei Cai

We develop an iterative learning method to generate pseudo part segmentation masks from keypoint labels.

Human Parsing Semantic Segmentation

Weakly-supervised 3D Hand Pose Estimation from Monocular RGB Images

no code implementations ECCV 2018 Yujun Cai, Liuhao Ge, Jianfei Cai, Junsong Yuan

Compared with depth-based 3D hand pose estimation, it is more challenging to infer 3D hand pose from monocular RGB images, due to substantial depth ambiguity and the difficulty of obtaining fully-annotated training data.

3D Hand Pose Estimation

Quadtree Convolutional Neural Networks

no code implementations ECCV 2018 Pradeep Kumar Jayaraman, Jianhan Mei, Jianfei Cai, Jianmin Zheng

Specifically, the computational and memory costs in QCNN grow linearly in the number of non-zero pixels, as opposed to traditional CNNs where the costs are quadratic in the number of pixels.

Facial Action Unit Detection Using Attention and Relation Learning

no code implementations 10 Aug 2018 Zhiwen Shao, Zhilei Liu, Jianfei Cai, Yunsheng Wu, Lizhuang Ma

By finding the region of interest of each AU with the attention mechanism, AU-related local features can be captured.

Action Unit Detection Facial Action Unit Detection

T2Net: Synthetic-to-Realistic Translation for Solving Single-Image Depth Estimation Tasks

1 code implementation ECCV 2018 Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

Current methods for single-image depth estimation use training datasets with real image-depth pairs or stereo pairs, which are not easy to acquire.

Depth Estimation Translation

Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features

1 code implementation ECCV 2018 Xu Yang, Hanwang Zhang, Jianfei Cai

By "agnostic", we mean that the feature is less likely biased to the classes of paired objects.

Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction

no code implementations 8 Jul 2018 Xiangxi Shi, Jianfei Cai, Jiuxiang Gu, Shafiq Joty

In this paper, we propose a boundary-aware hierarchical language decoder for video captioning, which consists of a high-level GRU based language decoder, working as a global (caption-level) language model, and a low-level GRU based language decoder, working as a local (phrase-level) language model.

Language Modelling Text Generation +2

When Gaussian Process Meets Big Data: A Review of Scalable GPs

no code implementations 3 Jul 2018 Haitao Liu, Yew-Soon Ong, Xiaobo Shen, Jianfei Cai

The review of scalable GPs in the GP community is timely and important due to the explosion of data size.

Generalized Robust Bayesian Committee Machine for Large-scale Gaussian Process Regression

1 code implementation ICML 2018 Haitao Liu, Jianfei Cai, Yi Wang, Yew-Soon Ong

In order to scale standard Gaussian process (GP) regression to large-scale datasets, aggregation models employ factorized training process and then combine predictions from distributed experts.

Distributed Computing

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

no code implementations ECCV 2018 Qing Li, Qingyi Tao, Shafiq Joty, Jianfei Cai, Jiebo Luo

Most existing works in visual question answering (VQA) are dedicated to improving the accuracy of predicted answers, while disregarding the explanations.

Multi-Task Learning Question Answering +1

Alive Caricature from 2D to 3D

1 code implementation CVPR 2018 Qianyi Wu, Juyong Zhang, Yu-Kun Lai, Jianmin Zheng, Jianfei Cai

Caricature is an art form that expresses subjects in an abstract, simple and exaggerated view.


Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment

1 code implementation ECCV 2018 Zhiwen Shao, Zhilei Liu, Jianfei Cai, Lizhuang Ma

Facial action unit (AU) detection and face alignment are two highly correlated tasks since facial landmarks can provide precise AU locations to facilitate the extraction of meaningful local features for AU detection.

Action Unit Detection Face Alignment +1

Unpaired Image Captioning by Language Pivoting

no code implementations ECCV 2018 Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Gang Wang

Image captioning is a multimodal task involving computer vision and natural language processing, where the goal is to learn a mapping from the image to its natural language description.

Image Captioning

Conditional Adversarial Synthesis of 3D Facial Action Units

no code implementations 21 Feb 2018 Zhilei Liu, Guoxian Song, Jianfei Cai, Tat-Jen Cham, Juyong Zhang

Employing deep learning-based approaches for fine-grained facial expression analysis, such as those involving the estimation of Action Unit (AU) intensities, is difficult due to the lack of a large-scale dataset of real faces with sufficiently diverse AU labels for training.

Data Augmentation Image Generation

Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models

no code implementations CVPR 2018 Jiuxiang Gu, Jianfei Cai, Shafiq Joty, Li Niu, Gang Wang

Textual-visual cross-modal retrieval has been a hot research topic in both computer vision and natural language processing communities.

Cross-Modal Retrieval

Zero-Shot Learning via Category-Specific Visual-Semantic Mapping

no code implementations 16 Nov 2017 Li Niu, Jianfei Cai, Ashok Veeraraghavan

Zero-Shot Learning (ZSL) aims to classify a test instance from an unseen category based on the training instances from seen categories, in which the gap between seen categories and unseen categories is generally bridged via visual-semantic mapping between the low-level visual feature space and the intermediate semantic space.

General Classification Image Classification +1

Stack-Captioning: Coarse-to-Fine Learning for Image Captioning

1 code implementation 11 Sep 2017 Jiuxiang Gu, Jianfei Cai, Gang Wang, Tsuhan Chen

On the other hand, multi-stage image caption model is hard to train due to the vanishing gradient problem.

Image Captioning

CNN-based Real-time Dense Face Reconstruction with Inverse-rendered Photo-realistic Face Images

no code implementations 3 Aug 2017 Yudong Guo, Juyong Zhang, Jianfei Cai, Boyi Jiang, Jianmin Zheng

With the power of convolutional neural networks (CNNs), CNN-based face reconstruction has recently shown promising performance in reconstructing detailed face shape from 2D face images.

3D Face Reconstruction Face Model

Exploiting Web Images for Weakly Supervised Object Detection

no code implementations 27 Jul 2017 Qingyi Tao, Hao Yang, Jianfei Cai

Object detection methods without bounding box annotations, i.e., weakly supervised detection methods, are still lagging far behind.

Ranked #15 on Weakly Supervised Object Detection on PASCAL VOC 2012 test (using extra training data)

Curriculum Learning Transfer Learning +1

A Generative Model for Depth-Based Robust 3D Facial Pose Tracking

no code implementations CVPR 2017 Lu Sheng, Jianfei Cai, Tat-Jen Cham, Vladimir Pavlovic, King Ngi Ngan

We consider the problem of depth-based robust 3D facial pose tracking under unconstrained scenarios with heavy occlusions and arbitrary facial expression variations.

Face Model Pose Estimation +1

Object Co-Skeletonization With Co-Segmentation

no code implementations CVPR 2017 Koteswar Rao Jerripothula, Jianfei Cai, Jiangbo Lu, Junsong Yuan

Recent advances in the joint processing of images have certainly shown its advantages over the individual processing.

Enriched Deep Recurrent Visual Attention Model for Multiple Object Recognition

no code implementations 12 Jun 2017 Artsiom Ablavatski, Shijian Lu, Jianfei Cai

We design an Enriched Deep Recurrent Visual Attention Model (EDRAM) - an improved attention-based architecture for multiple object recognition.

Object Recognition

WordFence: Text Detection in Natural Images with Border Awareness

no code implementations 15 May 2017 Andrei Polzounov, Artsiom Ablavatski, Sergio Escalera, Shijian Lu, Jianfei Cai

In recent years, text recognition has achieved remarkable success in recognizing scanned document text.

Semantic Segmentation

MIML-FCN+: Multi-instance Multi-label Learning via Fully Convolutional Networks with Privileged Information

no code implementations CVPR 2017 Hao Yang, Joey Tianyi Zhou, Jianfei Cai, Yew Soon Ong

As the proposed PI loss is convex and SGD-compatible and the framework itself is a fully convolutional network, MIML-FCN+ can be easily integrated with state-of-the-art deep learning networks.

Image Captioning Multi-Label Learning +1

Improving Multi-label Learning with Missing Labels by Structured Semantic Correlations

no code implementations 4 Aug 2016 Hao Yang, Joey Tianyi Zhou, Jianfei Cai

Experimental results demonstrate the effectiveness of the proposed semantic descriptor and the usefulness of incorporating the structured semantic correlations.

Multi-Label Learning Object Recognition

Modality and Component Aware Feature Fusion For RGB-D Scene Classification

no code implementations CVPR 2016 Anran Wang, Jianfei Cai, Jiwen Lu, Tat-Jen Cham

While convolutional neural networks (CNN) have been excellent for object recognition, the greater spatial variability in scene images typically meant that the standard full-image CNN features are suboptimal for scene classification.

General Classification Object Recognition +1

Recent Advances in Convolutional Neural Networks

no code implementations 22 Dec 2015 Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu, Xingxing Wang, Li Wang, Gang Wang, Jianfei Cai, Tsuhan Chen

In the last few years, deep learning has led to very good performance on a variety of problems, such as visual recognition, speech recognition and natural language processing.

Speech Recognition

MMSS: Multi-Modal Sharable and Specific Feature Learning for RGB-D Object Recognition

no code implementations ICCV 2015 Anran Wang, Jianfei Cai, Jiwen Lu, Tat-Jen Cham

We first construct deep CNN layers for color and depth separately, and then connect them with our carefully designed multi-modal layers, which fuse color and depth information by enforcing a common part to be shared by features of different modalities.

Object Recognition

Diagnosing State-Of-The-Art Object Proposal Methods

no code implementations 16 Jul 2015 Hongyuan Zhu, Shijian Lu, Jianfei Cai, Quangqing Lee

Recently, Hosang et al. conducted the first unified study of existing methods in terms of various image-level degradations.

Object Detection

Robust Performance-driven 3D Face Tracking in Long Range Depth Scenes

no code implementations 10 Jul 2015 Hai X. Pham, Chongyu Chen, Luc N. Dao, Vladimir Pavlovic, Jianfei Cai, Tat-Jen Cham

We introduce a novel robust hybrid 3D face tracking framework from RGBD video streams, which is capable of tracking head pose and facial actions without pre-calibration or intervention from a user.

3D Reconstruction Face Model

Weakly Supervised Fine-Grained Image Categorization

no code implementations 20 Apr 2015 Yu Zhang, Xiu-Shen Wei, Jianxin Wu, Jianfei Cai, Jiangbo Lu, Viet-Anh Nguyen, Minh N. Do

Most existing works rely heavily on object/part detectors to build the correspondence between object parts by using object or object part annotations inside training images.

Fine-Grained Image Classification Image Categorization

Beyond Pixels: A Comprehensive Survey from Bottom-up to Semantic Image Segmentation and Cosegmentation

no code implementations 3 Feb 2015 Hongyuan Zhu, Fanman Meng, Jianfei Cai, Shijian Lu

Image segmentation refers to the process of dividing an image into non-overlapping meaningful regions according to human perception, and it has been a classic topic since the early days of computer vision.

Semantic Segmentation

Recovering Surface Details under General Unknown Illumination Using Shading and Coarse Multi-view Stereo

no code implementations CVPR 2014 Di Xu, Qi Duan, Jianming Zheng, Juyong Zhang, Jianfei Cai, Tat-Jen Cham

As a result, our approach is robust and stable, and is able to efficiently recover high-quality surface details even when starting with a coarse MVS.

Compact Representation for Image Classification: To Choose or to Compress?

no code implementations CVPR 2014 Yu Zhang, Jianxin Wu, Jianfei Cai

In spite of the popularity of various feature compression methods, this paper argues that feature selection is a better choice than feature compression.

Dimensionality Reduction Feature Selection +3

