Search Results for author: Jianfei Cai

Found 145 papers, 64 papers with code

MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation

2 code implementations 19 Sep 2022 Chuanxia Zheng, Long Tung Vuong, Jianfei Cai, Dinh Phung

Although two-stage Vector Quantized (VQ) generative models allow for synthesizing high-fidelity and high-resolution images, their quantization operator encodes similar patches within an image into the same index, resulting in a repeated artifact for similar adjacent regions using existing decoder architectures.

Image Generation Quantization +1

Mutual Consistency Learning for Semi-supervised Medical Image Segmentation

2 code implementations 21 Sep 2021 Yicheng Wu, ZongYuan Ge, Donghao Zhang, Minfeng Xu, Lei Zhang, Yong Xia, Jianfei Cai

In this paper, we propose a novel mutual consistency network (MC-Net+) to effectively exploit the unlabeled data for semi-supervised medical image segmentation.

Image Segmentation Segmentation +2

Scalable Vision Transformers with Hierarchical Pooling

2 code implementations ICCV 2021 Zizheng Pan, Bohan Zhuang, Jing Liu, Haoyu He, Jianfei Cai

However, the routine of the current ViT model is to maintain a full-length patch sequence during inference, which is redundant and lacks hierarchical representation.

Efficient ViTs

GMFlow: Learning Optical Flow via Global Matching

4 code implementations CVPR 2022 Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, DaCheng Tao

Learning-based optical flow estimation has been dominated with the pipeline of cost volume with convolutions for flow regression, which is inherently limited to local correlations and thus is hard to address the long-standing challenge of large displacements.

Optical Flow Estimation regression

Unifying Flow, Stereo and Depth Estimation

1 code implementation 10 Nov 2022 Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, Fisher Yu, DaCheng Tao, Andreas Geiger

We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images.

Optical Flow Estimation Stereo Depth Estimation +1

Pluralistic Image Completion

1 code implementation CVPR 2019 Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

In this paper, we present an approach for pluralistic image completion -- the task of generating multiple and diverse plausible solutions for image completion.

Image Inpainting

3D Hand Shape and Pose Estimation from a Single RGB Image

2 code implementations CVPR 2019 Liuhao Ge, Zhou Ren, Yuncheng Li, Zehao Xue, Yingying Wang, Jianfei Cai, Junsong Yuan

This work addresses a novel and challenging problem of estimating the full 3D hand shape and pose from a single RGB image.

3D Hand Pose Estimation

Fast Vision Transformers with HiLo Attention

5 code implementations 26 May 2022 Zizheng Pan, Jianfei Cai, Bohan Zhuang

Therefore, we propose to disentangle the high/low frequency patterns in an attention layer by separating the heads into two groups, where one group encodes high frequencies via self-attention within each local window, and another group encodes low frequencies by performing global attention between the average-pooled low-frequency keys and values from each window and each query position in the input feature map.

Benchmarking Efficient ViTs +2

Semi-supervised Left Atrium Segmentation with Mutual Consistency Training

3 code implementations 4 Mar 2021 Yicheng Wu, Minfeng Xu, ZongYuan Ge, Jianfei Cai, Lei Zhang

Such mutual consistency encourages the two decoders to have consistent and low-entropy predictions and enables the model to gradually capture generalized features from these unlabeled challenging regions.

Image Segmentation Left Atrium Segmentation +4

Stitchable Neural Networks

2 code implementations CVPR 2023 Zizheng Pan, Jianfei Cai, Bohan Zhuang

As each model family consists of pretrained models with diverse scales (e.g., DeiT-Ti/S/B), a fundamental question naturally arises: how to efficiently assemble these readily available models in a family for dynamic accuracy-efficiency trade-offs at runtime.

Image Classification

Auto-Encoding Scene Graphs for Image Captioning

2 code implementations CVPR 2019 Xu Yang, Kaihua Tang, Hanwang Zhang, Jianfei Cai

We propose Scene Graph Auto-Encoder (SGAE) that incorporates the language inductive bias into the encoder-decoder image captioning framework for more human-like captions.

Image Captioning Inductive Bias +1

MARLIN: Masked Autoencoder for facial video Representation LearnINg

1 code implementation CVPR 2023 Zhixi Cai, Shreya Ghosh, Kalin Stefanov, Abhinav Dhall, Jianfei Cai, Hamid Rezatofighi, Reza Haffari, Munawar Hayat

This paper proposes a self-supervised approach to learn universal facial representations from videos, that can transfer across a variety of facial analysis tasks such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS).

Action Classification Attribute +9

Object-Compositional Neural Implicit Surfaces

1 code implementation 20 Jul 2022 Qianyi Wu, Xian Liu, Yuedong Chen, Kejie Li, Chuanxia Zheng, Jianfei Cai, Jianmin Zheng

This paper proposes a novel framework, ObjectSDF, to build an object-compositional neural implicit representation with high fidelity in 3D reconstruction and object representation.

3D Reconstruction Novel View Synthesis +1

T2Net: Synthetic-to-Realistic Translation for Solving Single-Image Depth Estimation Tasks

1 code implementation ECCV 2018 Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

Current methods for single-image depth estimation use training datasets with real image-depth pairs or stereo pairs, which are not easy to acquire.

Depth Estimation Depth Prediction +1

Learning Object-Language Alignments for Open-Vocabulary Object Detection

1 code implementation 27 Nov 2022 Chuang Lin, Peize Sun, Yi Jiang, Ping Luo, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan, Jianfei Cai

In this paper, we propose a novel open-vocabulary object detection framework directly learning from image-text pair data.

Object object-detection +3

Explicit Correspondence Matching for Generalizable Neural Radiance Fields

1 code implementation 24 Apr 2023 Yuedong Chen, Haofei Xu, Qianyi Wu, Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

The key to our approach lies in the explicitly modeled correspondence matching information, so as to provide the geometry prior to the prediction of NeRF color and density for volume rendering.

Novel View Synthesis

ObjectSDF++: Improved Object-Compositional Neural Implicit Surfaces

1 code implementation ICCV 2023 Qianyi Wu, Kaisiyuan Wang, Kejie Li, Jianmin Zheng, Jianfei Cai

Unlike traditional multi-view stereo approaches, the neural implicit surface-based methods leverage neural networks to represent 3D scenes as signed distance functions (SDFs).

3D Reconstruction Multi-View 3D Reconstruction +3

Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields

1 code implementation 21 Mar 2022 Yuedong Chen, Qianyi Wu, Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

In light of recent advances in NeRF-based 3D-aware generative models, we introduce a new task, Semantic-to-NeRF translation, that aims to reconstruct a 3D scene modelled by NeRF, conditioned on one single-view semantic mask as input.

3D-Aware Image Synthesis Translation

RSG: A Simple but Effective Module for Learning Imbalanced Datasets

1 code implementation CVPR 2021 JianFeng Wang, Thomas Lukasiewicz, Xiaolin Hu, Jianfei Cai, Zhenghua Xu

Imbalanced datasets widely exist in practice and are a great challenge for training deep neural models with a good generalization on infrequent classes.

Long-tail Learning

Mesa: A Memory-saving Training Framework for Transformers

3 code implementations 22 Nov 2021 Zizheng Pan, Peng Chen, Haoyu He, Jing Liu, Jianfei Cai, Bohan Zhuang

While Transformers have delivered significant performance improvements, training such networks is extremely memory intensive owing to storing all intermediate activations that are needed for gradient computation during backpropagation, especially for long sequences.

Quantization

High-Resolution Optical Flow from 1D Attention and Correlation

1 code implementation ICCV 2021 Haofei Xu, Jiaolong Yang, Jianfei Cai, Juyong Zhang, Xin Tong

Optical flow is inherently a 2D search problem, and thus the computational complexity grows quadratically with respect to the search window, making large displacements matching infeasible for high-resolution images.

4k Optical Flow Estimation +1

The Spatially-Correlative Loss for Various Image Translation Tasks

2 code implementations CVPR 2021 Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

We propose a novel spatially-correlative loss that is simple, efficient and yet effective for preserving scene structure consistency while supporting large appearance changes during unpaired image-to-image (I2I) translation.

Self-Supervised Learning Translation

JÂA-Net: Joint Facial Action Unit Detection and Face Alignment via Adaptive Attention

1 code implementation 18 Mar 2020 Zhiwen Shao, Zhilei Liu, Jianfei Cai, Lizhuang Ma

Moreover, to extract precise local features, we propose an adaptive attention learning module to refine the attention map of each AU adaptively.

Action Unit Detection Face Alignment +1

Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features

1 code implementation ECCV 2018 Xu Yang, Hanwang Zhang, Jianfei Cai

By "agnostic", we mean that the feature is less likely biased to the classes of paired objects.

Object

Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment

1 code implementation ECCV 2018 Zhiwen Shao, Zhilei Liu, Jianfei Cai, Lizhuang Ma

Facial action unit (AU) detection and face alignment are two highly correlated tasks since facial landmarks can provide precise AU locations to facilitate the extraction of meaningful local features for AU detection.

Action Unit Detection Face Alignment +1

Less is More: Pay Less Attention in Vision Transformers

2 code implementations 29 May 2021 Zizheng Pan, Bohan Zhuang, Haoyu He, Jing Liu, Jianfei Cai

Transformers have become one of the dominant architectures in deep learning, particularly as a powerful alternative to convolutional neural networks (CNNs) in computer vision.

Image Classification Instance Segmentation +3

HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression

1 code implementation 21 Mar 2024 Yihang Chen, Qianyi Wu, Jianfei Cai, Mehrtash Harandi, Weiyao Lin

3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis, boasting rapid rendering speed with high fidelity.

Attribute Novel View Synthesis +1

T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching

1 code implementation 21 Feb 2024 Zizheng Pan, Bohan Zhuang, De-An Huang, Weili Nie, Zhiding Yu, Chaowei Xiao, Jianfei Cai, Anima Anandkumar

Sampling from diffusion probabilistic models (DPMs) is often expensive for high-quality image generation and typically requires many steps with a large model.

Image Generation

Generative Region-Language Pretraining for Open-Ended Object Detection

1 code implementation 15 Mar 2024 Chuang Lin, Yi Jiang, Lizhen Qu, Zehuan Yuan, Jianfei Cai

To address it, we formulate object detection as a generative problem and propose a simple framework named GenerateU, which can detect dense objects and generate their names in a free-form way.

Language Modelling Object +3

End-to-end One-shot Human Parsing

1 code implementation 4 May 2021 Haoyu He, Bohan Zhuang, Jing Zhang, Jianfei Cai, DaCheng Tao

To address three main challenges in OSHP, i.e., small sizes, testing bias, and similar parts, we devise an End-to-end One-shot human Parsing Network (EOP-Net).

Human Parsing Metric Learning +1

EcoFormer: Energy-Saving Attention with Linear Complexity

1 code implementation 19 Sep 2022 Jing Liu, Zizheng Pan, Haoyu He, Jianfei Cai, Bohan Zhuang

To this end, we propose a new binarization paradigm customized to high-dimensional softmax attention via kernelized hashing, called EcoFormer, to map the original queries and keys into low-dimensional binary codes in Hamming space.

Binarization

Alive Caricature from 2D to 3D

1 code implementation CVPR 2018 Qianyi Wu, Juyong Zhang, Yu-Kun Lai, Jianmin Zheng, Jianfei Cai

Caricature is an art form that expresses subjects in abstract, simple and exaggerated view.

Caricature

Dynamic Focus-aware Positional Queries for Semantic Segmentation

2 code implementations CVPR 2023 Haoyu He, Jianfei Cai, Zizheng Pan, Jing Liu, Jing Zhang, DaCheng Tao, Bohan Zhuang

In this paper, we propose a simple yet effective query design for semantic segmentation termed Dynamic Focus-aware Positional Queries (DFPQ), which dynamically generates positional queries conditioned on the cross-attention scores from the preceding decoder block and the positional encodings for the corresponding image features, simultaneously.

Semantic Segmentation

Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning

1 code implementation ICCV 2023 Haoyu He, Jianfei Cai, Jing Zhang, DaCheng Tao, Bohan Zhuang

Visual Parameter-Efficient Fine-Tuning (PEFT) has become a powerful alternative for full fine-tuning so as to adapt pre-trained vision models to downstream tasks, which only tunes a small number of parameters while freezing the vast majority ones to ease storage burden and optimization difficulty.

Disentangled Human Body Embedding Based on Deep Hierarchical Neural Network

1 code implementation 14 May 2019 Boyi Jiang, Juyong Zhang, Jianfei Cai, Jianmin Zheng

Human bodies exhibit various shapes for different identities or poses, but the body shape has certain similarities in structure and thus can be embedded in a low-dimensional space.

Representation Learning

Sharpness-aware Quantization for Deep Neural Networks

3 code implementations 24 Nov 2021 Jing Liu, Jianfei Cai, Bohan Zhuang

However, the abrupt changes in quantized weights during training often lead to severe loss fluctuations and result in a sharp loss landscape, making the gradients unstable and thus degrading the performance.

Image Classification Model Compression +1

Taming Stable Diffusion for Text to 360° Panorama Image Generation

1 code implementation 11 Apr 2024 Cheng Zhang, Qianyi Wu, Camilo Cruz Gambardella, Xiaoshui Huang, Dinh Phung, Wanli Ouyang, Jianfei Cai

Generative models, e.g., Stable Diffusion, have enabled the creation of photorealistic images from text prompts.

Denoising Image Generation

Stack-Captioning: Coarse-to-Fine Learning for Image Captioning

1 code implementation 11 Sep 2017 Jiuxiang Gu, Jianfei Cai, Gang Wang, Tsuhan Chen

On the other hand, multi-stage image caption model is hard to train due to the vanishing gradient problem.

Image Captioning Sentence

Region Deformer Networks for Unsupervised Depth Estimation from Unconstrained Monocular Videos

1 code implementation 26 Feb 2019 Haofei Xu, Jianmin Zheng, Jianfei Cai, Juyong Zhang

In this paper, we propose a new learning based method consisting of DepthNet, PoseNet and Region Deformer Networks (RDN) to estimate depth from unconstrained monocular videos without ground truth supervision.

Depth Estimation

Unconstrained Facial Action Unit Detection via Latent Feature Domain

1 code implementation 25 Mar 2019 Zhiwen Shao, Jianfei Cai, Tat-Jen Cham, Xuequan Lu, Lizhuang Ma

Due to the combination of source AU-related information and target AU-free information, the latent feature domain with transferred source label can be learned by maximizing the target-domain AU detection performance.

Action Unit Detection Domain Adaptation +2

Diversified and Personalized Multi-rater Medical Image Segmentation

1 code implementation 20 Mar 2024 Yicheng Wu, Xiangde Luo, Zhe Xu, Xiaoqing Guo, Lie Ju, ZongYuan Ge, Wenjun Liao, Jianfei Cai

To address it, the common practice is to gather multiple annotations from different experts, leading to the setting of multi-rater medical image segmentation.

Image Segmentation Medical Image Segmentation +2

CoactSeg: Learning from Heterogeneous Data for New Multiple Sclerosis Lesion Segmentation

1 code implementation 10 Jul 2023 Yicheng Wu, Zhonghua Wu, Hengcan Shi, Bjoern Picker, Winston Chong, Jianfei Cai

Moreover, a simple and effective relation regularization is proposed to ensure the longitudinal relations among the three outputs to improve the model learning.

Lesion Segmentation Segmentation

ExtrudeNet: Unsupervised Inverse Sketch-and-Extrude for Shape Parsing

1 code implementation 30 Sep 2022 Daxuan Ren, Jianmin Zheng, Jianfei Cai, Jiatong Li, Junzhe Zhang

This paper studies the problem of learning the shape given in the form of point clouds by inverse sketch-and-extrude.

Learning Meta-class Memory for Few-Shot Semantic Segmentation

1 code implementation ICCV 2021 Zhonghua Wu, Xiangxi Shi, Guosheng Lin, Jianfei Cai

To explicitly learn meta-class representations in few-shot segmentation task, we propose a novel Meta-class Memory based few-shot segmentation method (MM-Net), where we introduce a set of learnable memory embeddings to memorize the meta-class information during the base class training and transfer to novel classes during the inference stage.

Few-Shot Semantic Segmentation Segmentation +1

Visiting the Invisible: Layer-by-Layer Completed Scene Decomposition

1 code implementation 12 Apr 2021 Chuanxia Zheng, Duy-Son Dao, Guoxian Song, Tat-Jen Cham, Jianfei Cai

In this work, we propose a higher-level scene understanding system to tackle both visible and invisible parts of objects and backgrounds in a given scene.

Instance Segmentation Scene Understanding +1

Image Captioning In the Transformer Age

1 code implementation 15 Apr 2022 Yang Xu, Li Li, Haiyang Xu, Songfang Huang, Fei Huang, Jianfei Cai

This drawback inspires the researchers to develop a homogeneous architecture that facilitates end-to-end training, for which Transformer is the perfect one that has proven its huge potential in both vision and language domains and thus can be used as the basic component of the visual encoder and language decoder in an IC pipeline.

Image Captioning Self-Supervised Learning

Stitched ViTs are Flexible Vision Backbones

1 code implementation 30 Jun 2023 Zizheng Pan, Jing Liu, Haoyu He, Jianfei Cai, Bohan Zhuang

With extensive experiments on ImageNet-1K, ADE20K, COCO-Stuff-10K and NYUv2, SN-Netv2 demonstrates superior performance over SN-Netv1 on downstream dense predictions and shows strong ability as a flexible vision backbone, achieving great advantages in both training efficiency and deployment flexibility.

QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models

2 code implementations 12 Oct 2023 Jing Liu, Ruihao Gong, Xiuying Wei, Zhiwei Dong, Jianfei Cai, Bohan Zhuang

Additionally, an adaptive strategy is designed to autonomously determine the optimal number of sub-channels for channel disassembly.

Quantization

Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation

1 code implementation 10 Nov 2021 Chuang Lin, Yi Jiang, Jianfei Cai, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan

Vision-and-Language Navigation (VLN) is a task that an agent is required to follow a language instruction to navigate to the goal position, which relies on the ongoing interactions with the environment during moving.

Navigate Vision and Language Navigation

Towards Unbiased Visual Emotion Recognition via Causal Intervention

1 code implementation 26 Jul 2021 Yuedong Chen, Xu Yang, Tat-Jen Cham, Jianfei Cai

In this work, we scrutinize this problem from the perspective of causal inference, where such dataset characteristic is termed as a confounder which misleads the system to learn the spurious correlation.

Causal Inference Emotion Recognition

Generalized Robust Bayesian Committee Machine for Large-scale Gaussian Process Regression

1 code implementation ICML 2018 Haitao Liu, Jianfei Cai, Yi Wang, Yew-Soon Ong

In order to scale standard Gaussian process (GP) regression to large-scale datasets, aggregation models employ factorized training process and then combine predictions from distributed experts.

Distributed Computing regression

Reliability-Adaptive Consistency Regularization for Weakly-Supervised Point Cloud Segmentation

1 code implementation 9 Mar 2023 Zhonghua Wu, Yicheng Wu, Guosheng Lin, Jianfei Cai

Weakly-supervised point cloud segmentation with extremely limited labels is highly desirable to alleviate the expensive costs of collecting densely annotated 3D points.

Point Cloud Segmentation Segmentation +1

Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning

1 code implementation 4 Oct 2022 Xu Yang, Hanwang Zhang, Chongyang Gao, Jianfei Cai

This is because the language is only partially observable, for which we need to dynamically collocate the modules during the process of image captioning.

Image Captioning Sentence +2

Efficient Stitchable Task Adaptation

1 code implementation 29 Nov 2023 Haoyu He, Zizheng Pan, Jing Liu, Jianfei Cai, Bohan Zhuang

In this work, we present a novel framework, Efficient Stitchable Task Adaptation (ESTA), to efficiently produce a palette of fine-tuned models that adhere to diverse resource constraints.

Chatbot

Scalable Gaussian Process Classification with Additive Noise for Various Likelihoods

1 code implementation 14 Sep 2019 Haitao Liu, Yew-Soon Ong, Ziwei Yu, Jianfei Cai, Xiaobo Shen

Gaussian process classification (GPC) provides a flexible and powerful statistical framework describing joint distributions over function space.

Classification General Classification +3

Improving Deep Lesion Detection Using 3D Contextual and Spatial Attention

1 code implementation 9 Jul 2019 Qingyi Tao, ZongYuan Ge, Jianfei Cai, Jianxiong Yin, Simon See

Secondly, in CT scans, the lesions are often indistinguishable from the background since the lesion and non-lesion areas may have very similar appearances.

Computed Tomography (CT) Lesion Detection +2

CNN-based Real-time Dense Face Reconstruction with Inverse-rendered Photo-realistic Face Images

no code implementations 3 Aug 2017 Yudong Guo, Juyong Zhang, Jianfei Cai, Boyi Jiang, Jianmin Zheng

With the powerfulness of convolution neural networks (CNN), CNN based face reconstruction has recently shown promising performance in reconstructing detailed face shape from 2D face images.

3D Face Reconstruction Face Model +1

Conditional Adversarial Synthesis of 3D Facial Action Units

no code implementations 21 Feb 2018 Zhilei Liu, Guoxian Song, Jianfei Cai, Tat-Jen Cham, Juyong Zhang

Employing deep learning-based approaches for fine-grained facial expression analysis, such as those involving the estimation of Action Unit (AU) intensities, is difficult due to the lack of a large-scale dataset of real faces with sufficiently diverse AU labels for training.

Data Augmentation Image Generation

Unpaired Image Captioning by Language Pivoting

no code implementations ECCV 2018 Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Gang Wang

Image captioning is a multimodal task involving computer vision and natural language processing, where the goal is to learn a mapping from the image to its natural language description.

Image Captioning Sentence

Zero-Shot Learning via Category-Specific Visual-Semantic Mapping

no code implementations 16 Nov 2017 Li Niu, Jianfei Cai, Ashok Veeraraghavan

Zero-Shot Learning (ZSL) aims to classify a test instance from an unseen category based on the training instances from seen categories, in which the gap between seen categories and unseen categories is generally bridged via visual-semantic mapping between the low-level visual feature space and the intermediate semantic space.

General Classification Image Classification +1

Recent Advances in Convolutional Neural Networks

no code implementations 22 Dec 2015 Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu, Xingxing Wang, Li Wang, Gang Wang, Jianfei Cai, Tsuhan Chen

In the last few years, deep learning has led to very good performance on a variety of problems, such as visual recognition, speech recognition and natural language processing.

speech-recognition Speech Recognition

Exploiting Web Images for Weakly Supervised Object Detection

no code implementations 27 Jul 2017 Qingyi Tao, Hao Yang, Jianfei Cai

Object detection without bounding box annotations, i.e., weakly supervised detection methods, is still lagging far behind.

Ranked #17 on Weakly Supervised Object Detection on PASCAL VOC 2012 test (using extra training data)

Object object-detection +2

Enriched Deep Recurrent Visual Attention Model for Multiple Object Recognition

no code implementations 12 Jun 2017 Artsiom Ablavatski, Shijian Lu, Jianfei Cai

We design an Enriched Deep Recurrent Visual Attention Model (EDRAM) - an improved attention-based architecture for multiple object recognition.

Object Recognition

MIML-FCN+: Multi-instance Multi-label Learning via Fully Convolutional Networks with Privileged Information

no code implementations CVPR 2017 Hao Yang, Joey Tianyi Zhou, Jianfei Cai, Yew Soon Ong

As the proposed PI loss is convex and SGD compatible and the framework itself is a fully convolutional network, MIML-FCN+ can be easily integrated with state-of-the-art deep learning networks.

Image Captioning Multi-Label Learning +1

Improving Multi-label Learning with Missing Labels by Structured Semantic Correlations

no code implementations 4 Aug 2016 Hao Yang, Joey Tianyi Zhou, Jianfei Cai

Experimental results demonstrate the effectiveness of the proposed semantic descriptor and the usefulness of incorporating the structured semantic correlations.

Missing Labels Object Recognition

Diagnosing State-Of-The-Art Object Proposal Methods

no code implementations 16 Jul 2015 Hongyuan Zhu, Shijian Lu, Jianfei Cai, Quangqing Lee

Recently, Hosang et al. conducted the first unified study of existing methods in terms of various image-level degradations.

Object object-detection +1

Robust Performance-driven 3D Face Tracking in Long Range Depth Scenes

no code implementations 10 Jul 2015 Hai X. Pham, Chongyu Chen, Luc N. Dao, Vladimir Pavlovic, Jianfei Cai, Tat-Jen Cham

We introduce a novel robust hybrid 3D face tracking framework from RGBD video streams, which is capable of tracking head pose and facial actions without pre-calibration or intervention from a user.

3D Reconstruction Face Model

Weakly Supervised Fine-Grained Image Categorization

no code implementations 20 Apr 2015 Yu Zhang, Xiu-Shen Wei, Jianxin Wu, Jianfei Cai, Jiangbo Lu, Viet-Anh Nguyen, Minh N. Do

Most existing works heavily rely on object / part detectors to build the correspondence between object parts by using object or object part annotations inside training images.

Fine-Grained Image Classification Image Categorization +1

Beyond Pixels: A Comprehensive Survey from Bottom-up to Semantic Image Segmentation and Cosegmentation

no code implementations 3 Feb 2015 Hongyuan Zhu, Fanman Meng, Jianfei Cai, Shijian Lu

Image segmentation refers to the process to divide an image into nonoverlapping meaningful regions according to human perception, which has become a classic topic since the early ages of computer vision.

Image Segmentation Segmentation +1

When Gaussian Process Meets Big Data: A Review of Scalable GPs

no code implementations 3 Jul 2018 Haitao Liu, Yew-Soon Ong, Xiaobo Shen, Jianfei Cai

The review of scalable GPs in the GP community is timely and important due to the explosion of data size.

Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction

no code implementations 8 Jul 2018 Xiangxi Shi, Jianfei Cai, Jiuxiang Gu, Shafiq Joty

In this paper, we propose a boundary-aware hierarchical language decoder for video captioning, which consists of a high-level GRU based language decoder, working as a global (caption-level) language model, and a low-level GRU based language decoder, working as a local (phrase-level) language model.

Language Modelling Sentence +3

Facial Action Unit Detection Using Attention and Relation Learning

no code implementations 10 Aug 2018 Zhiwen Shao, Zhilei Liu, Jianfei Cai, Yunsheng Wu, Lizhuang Ma

By finding the region of interest of each AU with the attention mechanism, AU-related local features can be captured.

Action Unit Detection Facial Action Unit Detection +1

Keypoint Based Weakly Supervised Human Parsing

no code implementations 14 Sep 2018 Zhonghua Wu, Guosheng Lin, Jianfei Cai

We develop an iterative learning method to generate pseudo part segmentation masks from keypoint labels.

Human Parsing Segmentation +1

Large-scale Heteroscedastic Regression via Gaussian Process

no code implementations 3 Nov 2018 Haitao Liu, Yew-Soon Ong, Jianfei Cai

To improve the scalability, we first develop a variational sparse inference algorithm, named VSHGP, to handle large-scale datasets.

regression Variational Inference

Understanding and Comparing Scalable Gaussian Process Regression for Big Data

no code implementations 3 Nov 2018 Haitao Liu, Jianfei Cai, Yew-Soon Ong, Yi Wang

This paper devotes to investigating the methodological characteristics and performance of representative global and local scalable GPs including sparse approximations and local aggregations from four main perspectives: scalability, capability, controllability and robustness.

regression

M2E-Try On Net: Fashion from Model to Everyone

no code implementations 21 Nov 2018 Zhonghua Wu, Guosheng Lin, Qingyi Tao, Jianfei Cai

Instead, we present a novel virtual Try-On network, M2E-Try On Net, which transfers the clothes from a model image to a person image without the need of any clean product images.

Virtual Try-on

Weakly-supervised 3D Hand Pose Estimation from Monocular RGB Images

no code implementations ECCV 2018 Yujun Cai, Liuhao Ge, Jianfei Cai, Junsong Yuan

Compared with depth-based 3D hand pose estimation, it is more challenging to infer 3D hand pose from monocular RGB images, due to substantial depth ambiguity and the difficulty of obtaining fully-annotated training data.

3D Hand Pose Estimation

Quadtree Convolutional Neural Networks

no code implementations ECCV 2018 Pradeep Kumar Jayaraman, Jianhan Mei, Jianfei Cai, Jianmin Zheng

Specifically, the computational and memory costs in QCNN grow linearly in the number of non-zero pixels, as opposed to traditional CNNs where the costs are quadratic in the number of pixels.

Real-time 3D Face-Eye Performance Capture of a Person Wearing VR Headset

no code implementations 21 Jan 2019 Guoxian Song, Jianfei Cai, Tat-Jen Cham, Jianmin Zheng, Juyong Zhang, Henry Fuchs

Teleconference or telepresence based on virtual reality (VR) headmount display (HMD) device is a very interesting and promising application since HMD can provide immersive feelings for users.

Compact Representation for Image Classification: To Choose or to Compress?

no code implementations CVPR 2014 Yu Zhang, Jianxin Wu, Jianfei Cai

In spite of the popularity of various feature compression methods, this paper argues that feature selection is a better choice than feature compression.

Classification Dimensionality Reduction +5

Recovering Surface Details under General Unknown Illumination Using Shading and Coarse Multi-view Stereo

no code implementations CVPR 2014 Di Xu, Qi Duan, Jianming Zheng, Juyong Zhang, Jianfei Cai, Tat-Jen Cham

As a result, our approach is robust, stable and is able to efficiently recover high quality of surface details even starting with a coarse MVS.

Modality and Component Aware Feature Fusion For RGB-D Scene Classification

no code implementations CVPR 2016 Anran Wang, Jianfei Cai, Jiwen Lu, Tat-Jen Cham

While convolutional neural networks (CNN) have been excellent for object recognition, the greater spatial variability in scene images typically meant that the standard full-image CNN features are suboptimal for scene classification.

General Classification Object Recognition +1

A Generative Model for Depth-Based Robust 3D Facial Pose Tracking

no code implementations CVPR 2017 Lu Sheng, Jianfei Cai, Tat-Jen Cham, Vladimir Pavlovic, King Ngi Ngan

We consider the problem of depth-based robust 3D facial pose tracking under unconstrained scenarios with heavy occlusions and arbitrary facial expression variations.

Face Model Pose Estimation +1

Object Co-Skeletonization With Co-Segmentation

no code implementations CVPR 2017 Koteswar Rao Jerripothula, Jianfei Cai, Jiangbo Lu, Junsong Yuan

Recent advances in the joint processing of images have shown its advantages over individual processing.

Object Segmentation

MMSS: Multi-Modal Sharable and Specific Feature Learning for RGB-D Object Recognition

no code implementations ICCV 2015 Anran Wang, Jianfei Cai, Jiwen Lu, Tat-Jen Cham

We first construct deep CNN layers for color and depth separately, and then connect them with our carefully designed multi-modal layers, which fuse color and depth information by enforcing a common part to be shared by features of different modalities.

Object Object Recognition

Progress Regression RNN for Online Spatial-Temporal Action Localization in Unconstrained Videos

no code implementations 1 Mar 2019 Bo Hu, Jianfei Cai, Tat-Jen Cham, Junsong Yuan

Previous spatial-temporal action localization methods commonly follow the pipeline of object detection to estimate bounding boxes and labels of actions.

object-detection Object Detection +3

Scene Graph Generation with External Knowledge and Image Reconstruction

no code implementations CVPR 2019 Jiuxiang Gu, Handong Zhao, Zhe Lin, Sheng Li, Jianfei Cai, Mingyang Ling

Scene graph generation has received growing attention with the advancements in image understanding tasks such as object detection, attributes and relationship prediction, etc.

Graph Generation Image Reconstruction +6

Learning to Collocate Neural Modules for Image Captioning

no code implementations ICCV 2019 Xu Yang, Hanwang Zhang, Jianfei Cai

To this end, we make the following technical contributions for CNM training: 1) compact module design: one for function words and three for visual content words (e.g., noun, adjective, and verb); 2) soft module fusion and multi-step module execution, robustifying the visual reasoning in partial observation; 3) a linguistic loss for the module controller to be faithful to part-of-speech collocations (e.g., an adjective comes before a noun).

Image Captioning Sentence +2

Exploring Uncertainty Measures for Image-Caption Embedding-and-Retrieval Task

no code implementations 9 Apr 2019 Kenta Hama, Takashi Matsubara, Kuniaki Uehara, Jianfei Cai

With the wide development of black-box machine learning algorithms, particularly deep neural networks (DNNs), the practical demand for reliability assessment is rapidly rising.

General Classification regression +1

Visibility Constrained Generative Model for Depth-based 3D Facial Pose Tracking

no code implementations 6 May 2019 Lu Sheng, Jianfei Cai, Tat-Jen Cham, Vladimir Pavlovic, King Ngi Ngan

In this paper, we propose a generative framework that unifies depth-based 3D facial pose tracking and face model adaptation on-the-fly, in the unconstrained scenarios with heavy occlusions and arbitrary facial expression variations.

Face Model Pose Estimation +1

Watch It Twice: Video Captioning with a Refocused Video Encoder

no code implementations 21 Jul 2019 Xiangxi Shi, Jianfei Cai, Shafiq Joty, Jiuxiang Gu

With the rapid growth of video data and the increasing demands of various applications such as intelligent video search and assistance for visually impaired people, the video captioning task has recently received a lot of attention in the computer vision and natural language processing fields.

Video Captioning

Recovering Facial Reflectance and Geometry from Multi-view Images

no code implementations 27 Nov 2019 Guoxian Song, Jianmin Zheng, Jianfei Cai, Tat-Jen Cham

While the problem of estimating shapes and diffuse reflectances of human faces from images has been extensively studied, relatively little work has been done on recovering the specular albedo.

Face Model

Facial Action Unit Detection via Adaptive Attention and Relation

no code implementations 5 Jan 2020 Zhiwen Shao, Yong Zhou, Jianfei Cai, Hancheng Zhu, Rui Yao

Specifically, we propose an adaptive attention regression network to regress the global attention map of each AU under the constraint of attention predefinition and the guidance of AU detection, which is beneficial for capturing both specified dependencies by landmarks in strongly correlated regions and facial globally distributed dependencies in weakly correlated regions.

Action Unit Detection Facial Action Unit Detection +2

GeoConv: Geodesic Guided Convolution for Facial Action Unit Recognition

no code implementations 6 Mar 2020 Yuedong Chen, Guoxian Song, Zhiwen Shao, Jianfei Cai, Tat-Jen Cham, Jianming Zheng

Automatic facial action unit (AU) recognition has attracted great attention but still remains a challenging task, as subtle changes of local facial muscles are difficult to thoroughly capture.

Face Model Facial Action Unit Detection

Deconfounded Image Captioning: A Causal Retrospect

no code implementations 9 Mar 2020 Xu Yang, Hanwang Zhang, Jianfei Cai

Dataset bias in vision-language tasks is becoming one of the main problems that hinder the progress of our community.

Causal Inference Image Captioning

Exploring Bottom-up and Top-down Cues with Attentive Learning for Webly Supervised Object Detection

no code implementations CVPR 2020 Zhonghua Wu, Qingyi Tao, Guosheng Lin, Jianfei Cai

To reduce the human labeling effort, we propose a novel webly supervised object detection (WebSOD) method for novel classes which only requires the web images without further annotations.

Object object-detection +2

Image Co-skeletonization via Co-segmentation

no code implementations 12 Apr 2020 Koteswar Rao Jerripothula, Jianfei Cai, Jiangbo Lu, Junsong Yuan

Object skeletonization in a single natural image is a challenging problem because there is hardly any prior knowledge about the object.

Object Segmentation

Learning from the Scene and Borrowing from the Rich: Tackling the Long Tail in Scene Graph Generation

no code implementations 13 Jun 2020 Tao He, Lianli Gao, Jingkuan Song, Jianfei Cai, Yuan-Fang Li

Despite the huge progress in scene graph generation in recent years, its long-tail distribution in object relationships remains a challenging and pestering issue.

Graph Generation Object +2

Expert Training: Task Hardness Aware Meta-Learning for Few-Shot Classification

no code implementations 13 Jul 2020 Yucan Zhou, Yu Wang, Jianfei Cai, Yu Zhou, QinGhua Hu, Weiping Wang

Some works in the optimization of deep neural networks have shown that a better arrangement of training data can make the classifier converge faster and perform better.

General Classification Meta-Learning

MED-TEX: Transferring and Explaining Knowledge with Less Data from Pretrained Medical Imaging Models

no code implementations 6 Aug 2020 Thanh Nguyen-Duc, He Zhao, Jianfei Cai, Dinh Phung

To interpret the teacher model and assist the learning of the student, an explainer module is introduced to highlight the regions of an input that are important for the predictions of the teacher model.

Image Classification Knowledge Distillation +1

Modeling Caricature Expressions by 3D Blendshape and Dynamic Texture

no code implementations 13 Aug 2020 Keyu Chen, Jianmin Zheng, Jianfei Cai, Juyong Zhang

The problem of deforming an artist-drawn caricature according to a given normal face expression is of interest in applications such as social media, animation and entertainment.

Caricature Generative Adversarial Network +1

Learning Progressive Joint Propagation for Human Motion Prediction

no code implementations ECCV 2020 Yujun Cai, Lin Huang, Yiwei Wang, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Xu Yang, Yiheng Zhu, Xiaohui Shen, Ding Liu, Jing Liu, Nadia Magnenat Thalmann

Last, in order to incorporate a general motion space for high-quality prediction, we build a memory-based dictionary, which aims to preserve the global motion patterns in training data to guide the predictions.

Human motion prediction motion prediction

Splitting vs. Merging: Mining Object Regions with Discrepancy and Intersection Loss for Weakly Supervised Semantic Segmentation

no code implementations ECCV 2020 Tianyi Zhang, Guosheng Lin, Weide Liu, Jianfei Cai, Alex Kot

Finally, by training the segmentation model with the masks generated by our Splitting vs. Merging strategy, we achieve state-of-the-art weakly supervised segmentation results on the Pascal VOC 2012 benchmark.

Segmentation Weakly supervised segmentation +2

Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning

no code implementations ECCV 2020 Xiangxi Shi, Xu Yang, Jiuxiang Gu, Shafiq Joty, Jianfei Cai

In this paper, we propose a novel visual encoder to explicitly distinguish viewpoint changes from semantic changes in the change captioning task.

Reinforcement Learning (RL)

Self-Supervised Relationship Probing

no code implementations NeurIPS 2020 Jiuxiang Gu, Jason Kuen, Shafiq Joty, Jianfei Cai, Vlad Morariu, Handong Zhao, Tong Sun

Structured representations of images that model visual relationships are beneficial for many vision and vision-language applications.

Contrastive Learning Language Modelling +1

Single-path Bit Sharing for Automatic Loss-aware Model Compression

no code implementations 13 Jan 2021 Jing Liu, Bohan Zhuang, Peng Chen, Chunhua Shen, Jianfei Cai, Mingkui Tan

By jointly training the binary gates in conjunction with network parameters, the compression configurations of each layer can be automatically determined.

Model Compression Network Pruning +1

Causal Attention for Vision-Language Tasks

no code implementations CVPR 2021 Xu Yang, Hanwang Zhang, GuoJun Qi, Jianfei Cai

Specifically, CATT is implemented as a combination of 1) In-Sample Attention (IS-ATT) and 2) Cross-Sample Attention (CS-ATT), where the latter forcibly brings other samples into every IS-ATT, mimicking the causal intervention.

Multi-Label Image Classification with Contrastive Learning

no code implementations 24 Jul 2021 Son D. Dao, Ethan Zhao, Dinh Phung, Jianfei Cai

Recently, as an effective way of learning latent representations, contrastive learning has been increasingly popular and successful in various domains.

Classification Contrastive Learning +2

Remember What You have drawn: Semantic Image Manipulation with Memory

no code implementations 27 Jul 2021 Xiangxi Shi, Zhonghua Wu, Guosheng Lin, Jianfei Cai, Shafiq Joty

Therefore, in this paper, we propose a memory-based Image Manipulation Network (MIM-Net), where a set of memories learned from images is introduced to synthesize the texture information with the guidance of the textual description.

Image Manipulation

Auto-Parsing Network for Image Captioning and Visual Question Answering

no code implementations ICCV 2021 Xu Yang, Chongyang Gao, Hanwang Zhang, Jianfei Cai

We propose an Auto-Parsing Network (APN) to discover and exploit the input data's hidden tree structures for improving the effectiveness of the Transformer-based vision-language systems.

Image Captioning Question Answering +1

A Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder

no code implementations ICCV 2021 Yujun Cai, Yiwei Wang, Yiheng Zhu, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Chuanxia Zheng, Sijie Yan, Henghui Ding, Xiaohui Shen, Ding Liu, Nadia Magnenat Thalmann

Notably, by considering this problem as a conditional generation process, we estimate a parametric distribution of the missing regions based on the input conditions, from which to sample and synthesize the full motion series.

motion prediction Motion Synthesis

Domain-Invariant Disentangled Network for Generalizable Object Detection

no code implementations ICCV 2021 Chuang Lin, Zehuan Yuan, Sicheng Zhao, Peize Sun, Changhu Wang, Jianfei Cai

By disentangling representations on both image and instance levels, DIDN is able to learn domain-invariant representations that are suitable for generalized object detection.

Disentanglement Domain Generalization +4

Contrastively Enforcing Distinctiveness for Multi-Label Classification

no code implementations 29 Sep 2021 Son Duy Dao, He Zhao, Dinh Phung, Jianfei Cai

Recently, as an effective way of learning latent representations, contrastive learning has been increasingly popular and successful in various domains.

Classification Contrastive Learning +2

ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues

no code implementations CVPR 2022 Hengcan Shi, Munawar Hayat, Yicheng Wu, Jianfei Cai

Firstly, we analyze CLIP for unsupervised open-category proposal generation and design an objectness score based on our empirical analysis on proposal selection.

Object object-detection +2

Unpaired Referring Expression Grounding via Bidirectional Cross-Modal Matching

no code implementations 18 Jan 2022 Hengcan Shi, Munawar Hayat, Jianfei Cai

To avoid the laborious annotation in conventional referring grounding, unpaired referring grounding is introduced, where the training data only contains a number of images and queries without correspondences.

Image-text matching Referring Expression +1

High-Quality Pluralistic Image Completion via Code Shared VQGAN

no code implementations 5 Apr 2022 Chuanxia Zheng, Guoxian Song, Tat-Jen Cham, Jianfei Cai, Dinh Phung, Linjie Luo

In this work, we present a novel framework for pluralistic image completion that can achieve both high quality and diversity at much faster inference speed.

Image Reconstruction Vocal Bursts Intensity Prediction

Transformer Scale Gate for Semantic Segmentation

no code implementations CVPR 2023 Hengcan Shi, Munawar Hayat, Jianfei Cai

Effectively encoding multi-scale contextual information is crucial for accurate semantic segmentation.

feature selection Segmentation +1

Dual Adaptive Transformations for Weakly Supervised Point Cloud Segmentation

no code implementations 19 Jul 2022 Zhonghua Wu, Yicheng Wu, Guosheng Lin, Jianfei Cai, Chen Qian

Weakly supervised point cloud segmentation, i.e., semantically segmenting a point cloud with only a few labeled points in the whole 3D scene, is highly desirable due to the heavy burden of collecting abundant dense annotations for model training.

Point Cloud Segmentation Segmentation

FocusFormer: Focusing on What We Need via Architecture Sampler

no code implementations 23 Aug 2022 Jing Liu, Jianfei Cai, Bohan Zhuang

During architecture search, these methods focus on finding architectures on the Pareto frontier of performance and resource consumption, which forms a gap between training and deployment.

Neural Architecture Search

JRDB-Pose: A Large-scale Dataset for Multi-Person Pose Estimation and Tracking

no code implementations CVPR 2023 Edward Vendrow, Duy Tho Le, Jianfei Cai, Hamid Rezatofighi

In crowded human scenes with close-up human-robot interaction and robot navigation, a deep understanding requires reasoning about human motion and body dynamics over time with human body pose estimation and tracking.

Multi-Person Pose Estimation Multi-Person Pose Estimation and Tracking +1

Vector Quantized Wasserstein Auto-Encoder

no code implementations 12 Feb 2023 Tung-Long Vuong, Trung Le, He Zhao, Chuanxia Zheng, Mehrtash Harandi, Jianfei Cai, Dinh Phung

Learning deep discrete latent representations offers a promise of better symbolic and summarized abstractions that are more useful to subsequent downstream tasks.

Clustering Image Reconstruction

Open-Vocabulary Object Detection via Scene Graph Discovery

no code implementations 7 Jul 2023 Hengcan Shi, Munawar Hayat, Jianfei Cai

However, they only use pairs of nouns and individual objects in VL data, while these data usually contain much more information, such as scene graphs, which are also crucial for OV detection.

Graph Generation Object +5

Cross-adversarial local distribution regularization for semi-supervised medical image segmentation

no code implementations 2 Oct 2023 Thanh Nguyen-Duc, Trung Le, Roland Bammer, He Zhao, Jianfei Cai, Dinh Phung

Medical semi-supervised segmentation is a technique where a model is trained to segment objects of interest in medical images with limited annotated data.

Image Segmentation Segmentation +2

GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering

no code implementations 4 Feb 2024 Ziyu Ma, Shutao Li, Bin Sun, Jianfei Cai, Zuxiang Long, Fuyan Ma

Therefore, we propose GeReA, a generate-reason framework that prompts an MLLM such as InstructBLIP with question-relevant vision and language information to generate knowledge-relevant descriptions, and then reasons over those descriptions for knowledge-based VQA.

Language Modelling Large Language Model +3

JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments

no code implementations 2 Apr 2024 Duy-Tho Le, Chenhui Gou, Stavya Datta, Hengcan Shi, Ian Reid, Jianfei Cai, Hamid Rezatofighi

JRDB-PanoTrack includes (1) various data involving indoor and outdoor crowded scenes, as well as comprehensive 2D and 3D synchronized data modalities; (2) high-quality 2D spatial panoptic segmentation and temporal tracking annotations, with additional 3D label projections for further spatial understanding; (3) diverse object classes for closed- and open-world recognition benchmarks, with OSPA-based metrics for evaluation.

Decision Making Panoptic Segmentation +1

DifFUSER: Diffusion Model for Robust Multi-Sensor Fusion in 3D Object Detection and BEV Segmentation

no code implementations 6 Apr 2024 Duy-Tho Le, Hengcan Shi, Jianfei Cai, Hamid Rezatofighi

Diffusion models have recently gained prominence as powerful deep generative models, demonstrating unmatched performance across various domains.

3D Object Detection Denoising +2
