Search Results for author: Jianfei Cai

Found 145 papers, 64 papers with code

MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation

2 code implementations • 19 Sep 2022 • Chuanxia Zheng, Long Tung Vuong, Jianfei Cai, Dinh Phung

Although two-stage Vector Quantized (VQ) generative models allow for synthesizing high-fidelity and high-resolution images, their quantization operator encodes similar patches within an image into the same index, resulting in a repeated artifact for similar adjacent regions using existing decoder architectures.

Image Generation Quantization +1

2,686

Mutual Consistency Learning for Semi-supervised Medical Image Segmentation

2 code implementations • 21 Sep 2021 • Yicheng Wu, ZongYuan Ge, Donghao Zhang, Minfeng Xu, Lei Zhang, Yong Xia, Jianfei Cai

In this paper, we propose a novel mutual consistency network (MC-Net+) to effectively exploit the unlabeled data for semi-supervised medical image segmentation.

Image Segmentation Segmentation +2

1,980

Exploring Smoothness and Class-Separation for Semi-supervised Medical Image Segmentation

2 code implementations • 2 Mar 2022 • Yicheng Wu, Zhonghua Wu, Qianyi Wu, ZongYuan Ge, Jianfei Cai

The pixel-level smoothness forces the model to generate invariant results under adversarial perturbations.

Image Segmentation Semantic Segmentation +1

1,980

Scalable Vision Transformers with Hierarchical Pooling

2 code implementations • ICCV 2021 • Zizheng Pan, Bohan Zhuang, Jing Liu, Haoyu He, Jianfei Cai

However, the routine of the current ViT model is to maintain a full-length patch sequence during inference, which is redundant and lacks hierarchical representation.

Ranked #22 on Efficient ViTs on ImageNet-1K (with DeiT-T)

Efficient ViTs

1,183

GMFlow: Learning Optical Flow via Global Matching

4 code implementations • CVPR 2022 • Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, DaCheng Tao

Learning-based optical flow estimation has been dominated by the pipeline of cost volume with convolutions for flow regression, which is inherently limited to local correlations and thus struggles to address the long-standing challenge of large displacements.

Ranked #8 on Optical Flow Estimation on Spring

Optical Flow Estimation regression

880

Unifying Flow, Stereo and Depth Estimation

1 code implementation • 10 Nov 2022 • Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, Fisher Yu, DaCheng Tao, Andreas Geiger

We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images.

Ranked #1 on Optical Flow Estimation on Sintel-clean

Optical Flow Estimation Stereo Depth Estimation +1

880

Pluralistic Image Completion

1 code implementation • CVPR 2019 • Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

In this paper, we present an approach for pluralistic image completion, the task of generating multiple and diverse plausible solutions for image completion.

Image Inpainting

662

3D Hand Shape and Pose Estimation from a Single RGB Image

2 code implementations • CVPR 2019 • Liuhao Ge, Zhou Ren, Yuncheng Li, Zehao Xue, Yingying Wang, Jianfei Cai, Junsong Yuan

This work addresses a novel and challenging problem of estimating the full 3D hand shape and pose from a single RGB image.

3D Hand Pose Estimation

585

Fast Vision Transformers with HiLo Attention

5 code implementations • 26 May 2022 • Zizheng Pan, Jianfei Cai, Bohan Zhuang

Therefore, we propose to disentangle the high/low frequency patterns in an attention layer by separating the heads into two groups, where one group encodes high frequencies via self-attention within each local window, and another group encodes low frequencies by performing global attention between the average-pooled low-frequency keys and values from each window and each query position in the input feature map.

Ranked #281 on Image Classification on ImageNet

Benchmarking Efficient ViTs +2

565

MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images

1 code implementation • 21 Mar 2024 • Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, Jianfei Cai

We propose MVSplat, an efficient feed-forward 3D Gaussian Splatting model learned from sparse multi-view images.

Ranked #1 on Generalizable Novel View Synthesis on ACID

3D Reconstruction Generalizable Novel View Synthesis +2

330

Semi-supervised Left Atrium Segmentation with Mutual Consistency Training

3 code implementations • 4 Mar 2021 • Yicheng Wu, Minfeng Xu, ZongYuan Ge, Jianfei Cai, Lei Zhang

Such mutual consistency encourages the two decoders to have consistent and low-entropy predictions and enables the model to gradually capture generalized features from these unlabeled challenging regions.

Image Segmentation Left Atrium Segmentation +4

272

Stitchable Neural Networks

2 code implementations • CVPR 2023 • Zizheng Pan, Jianfei Cai, Bohan Zhuang

As each model family consists of pretrained models with diverse scales (e.g., DeiT-Ti/S/B), a fundamental question naturally arises: how to efficiently assemble these readily available models in a family for dynamic accuracy-efficiency trade-offs at runtime.

Image Classification

254

Auto-Encoding Scene Graphs for Image Captioning

2 code implementations • CVPR 2019 • Xu Yang, Kaihua Tang, Hanwang Zhang, Jianfei Cai

We propose Scene Graph Auto-Encoder (SGAE) that incorporates the language inductive bias into the encoder-decoder image captioning framework for more human-like captions.

Image Captioning Inductive Bias +1

218

MARLIN: Masked Autoencoder for facial video Representation LearnINg

1 code implementation • CVPR 2023 • Zhixi Cai, Shreya Ghosh, Kalin Stefanov, Abhinav Dhall, Jianfei Cai, Hamid Rezatofighi, Reza Haffari, Munawar Hayat

This paper proposes a self-supervised approach to learn universal facial representations from videos, that can transfer across a variety of facial analysis tasks such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS).

Ranked #1 on Emotion Classification on CMU-MOSEI

Action Classification Attribute +9

185

Object-Compositional Neural Implicit Surfaces

1 code implementation • 20 Jul 2022 • Qianyi Wu, Xian Liu, Yuedong Chen, Kejie Li, Chuanxia Zheng, Jianfei Cai, Jianmin Zheng

This paper proposes a novel framework, ObjectSDF, to build an object-compositional neural implicit representation with high fidelity in 3D reconstruction and object representation.

3D Reconstruction Novel View Synthesis +1

181

T2Net: Synthetic-to-Realistic Translation for Solving Single-Image Depth Estimation Tasks

1 code implementation • ECCV 2018 • Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

Current methods for single-image depth estimation use training datasets with real image-depth pairs or stereo pairs, which are not easy to acquire.

Ranked #3 on Depth Estimation on eBDtheque

Depth Estimation Depth Prediction +1

178

Learning Object-Language Alignments for Open-Vocabulary Object Detection

1 code implementation • 27 Nov 2022 • Chuang Lin, Peize Sun, Yi Jiang, Ping Luo, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan, Jianfei Cai

In this paper, we propose a novel open-vocabulary object detection framework directly learning from image-text pair data.

Object object-detection +3

168

Explicit Correspondence Matching for Generalizable Neural Radiance Fields

1 code implementation • 24 Apr 2023 • Yuedong Chen, Haofei Xu, Qianyi Wu, Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

The key to our approach lies in the explicitly modeled correspondence matching information, so as to provide the geometry prior to the prediction of NeRF color and density for volume rendering.

Novel View Synthesis

159

Bridging Global Context Interactions for High-Fidelity Image Completion

1 code implementation • CVPR 2022 • Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai, Dinh Phung

Bridging global context interactions correctly is important for high-fidelity image completion with large masks.

Ranked #2 on Image Inpainting on FFHQ 512 x 512

Image Inpainting Vocal Bursts Intensity Prediction

154

ObjectSDF++: Improved Object-Compositional Neural Implicit Surfaces

1 code implementation • ICCV 2023 • Qianyi Wu, Kaisiyuan Wang, Kejie Li, Jianmin Zheng, Jianfei Cai

Unlike traditional multi-view stereo approaches, the neural implicit surface-based methods leverage neural networks to represent 3D scenes as signed distance functions (SDFs).

3D Reconstruction Multi-View 3D Reconstruction +3

135

Sem2NeRF: Converting Single-View Semantic Masks to Neural Radiance Fields

1 code implementation • 21 Mar 2022 • Yuedong Chen, Qianyi Wu, Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

In light of recent advances in NeRF-based 3D-aware generative models, we introduce a new task, Semantic-to-NeRF translation, that aims to reconstruct a 3D scene modelled by NeRF, conditioned on one single-view semantic mask as input.

Ranked #1 on 3D-Aware Image Synthesis on CelebAMask-HQ

3D-Aware Image Synthesis Translation

122

RSG: A Simple but Effective Module for Learning Imbalanced Datasets

1 code implementation • CVPR 2021 • JianFeng Wang, Thomas Lukasiewicz, Xiaolin Hu, Jianfei Cai, Zhenghua Xu

Imbalanced datasets widely exist in practice and are a great challenge for training deep neural models with a good generalization on infrequent classes.

Ranked #17 on Long-tail Learning on Places-LT

Long-tail Learning

121

Mesa: A Memory-saving Training Framework for Transformers

3 code implementations • 22 Nov 2021 • Zizheng Pan, Peng Chen, Haoyu He, Jing Liu, Jianfei Cai, Bohan Zhuang

While Transformers have delivered significant performance improvements, training such networks is extremely memory intensive owing to storing all intermediate activations that are needed for gradient computation during backpropagation, especially for long sequences.

Quantization

119

High-Resolution Optical Flow from 1D Attention and Correlation

1 code implementation • ICCV 2021 • Haofei Xu, Jiaolong Yang, Jianfei Cai, Juyong Zhang, Xin Tong

Optical flow is inherently a 2D search problem, and thus the computational complexity grows quadratically with respect to the search window, making large displacements matching infeasible for high-resolution images.

4k Optical Flow Estimation +1

103

Pruning Self-attentions into Convolutional Layers in Single Path

3 code implementations • 23 Nov 2021 • Haoyu He, Jianfei Cai, Jing Liu, Zizheng Pan, Jing Zhang, DaCheng Tao, Bohan Zhuang

Relying on the single-path space, we introduce learnable binary gates to encode the operation choices in MSA layers.

Ranked #18 on Efficient ViTs on ImageNet-1K (with DeiT-T)

Efficient ViTs Inductive Bias +1

The Spatially-Correlative Loss for Various Image Translation Tasks

2 code implementations • CVPR 2021 • Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai

We propose a novel spatially-correlative loss that is simple, efficient and yet effective for preserving scene structure consistency while supporting large appearance changes during unpaired image-to-image (I2I) translation.

Self-Supervised Learning Translation

Facial Motion Prior Networks for Facial Expression Recognition

4 code implementations • 23 Feb 2019 • Yuedong Chen, Jian-Feng Wang, Shikai Chen, Zhongchao Shi, Jianfei Cai

Deep learning based facial expression recognition (FER) has received a lot of attention in the past few years.

Ranked #2 on Facial Expression Recognition (FER) on MMI

Facial Expression Recognition Facial Expression Recognition (FER)

JÂA-Net: Joint Facial Action Unit Detection and Face Alignment via Adaptive Attention

1 code implementation • 18 Mar 2020 • Zhiwen Shao, Zhilei Liu, Jianfei Cai, Lizhuang Ma

Moreover, to extract precise local features, we propose an adaptive attention learning module to refine the attention map of each AU adaptively.

Action Unit Detection Face Alignment +1

Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features

1 code implementation • ECCV 2018 • Xu Yang, Hanwang Zhang, Jianfei Cai

By "agnostic", we mean that the feature is less likely biased to the classes of paired objects.

Object

Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment

1 code implementation • ECCV 2018 • Zhiwen Shao, Zhilei Liu, Jianfei Cai, Lizhuang Ma

Facial action unit (AU) detection and face alignment are two highly correlated tasks since facial landmarks can provide precise AU locations to facilitate the extraction of meaningful local features for AU detection.

Ranked #5 on Facial Action Unit Detection on DISFA

Action Unit Detection Face Alignment +1

Less is More: Pay Less Attention in Vision Transformers

2 code implementations • 29 May 2021 • Zizheng Pan, Bohan Zhuang, Haoyu He, Jing Liu, Jianfei Cai

Transformers have become one of the dominant architectures in deep learning, particularly as a powerful alternative to convolutional neural networks (CNNs) in computer vision.

Image Classification Instance Segmentation +3

HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression

1 code implementation • 21 Mar 2024 • Yihang Chen, Qianyi Wu, Jianfei Cai, Mehrtash Harandi, Weiyao Lin

3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis, boasting rapid rendering speed with high fidelity.

Attribute Novel View Synthesis +1

T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching

1 code implementation • 21 Feb 2024 • Zizheng Pan, Bohan Zhuang, De-An Huang, Weili Nie, Zhiding Yu, Chaowei Xiao, Jianfei Cai, Anima Anandkumar

Sampling from diffusion probabilistic models (DPMs) is often expensive for high-quality image generation and typically requires many steps with a large model.

Image Generation

Generative Region-Language Pretraining for Open-Ended Object Detection

1 code implementation • 15 Mar 2024 • Chuang Lin, Yi Jiang, Lizhen Qu, Zehuan Yuan, Jianfei Cai

To address it, we formulate object detection as a generative problem and propose a simple framework named GenerateU, which can detect dense objects and generate their names in a free-form way.

Language Modelling Object +3

End-to-end One-shot Human Parsing

1 code implementation • 4 May 2021 • Haoyu He, Bohan Zhuang, Jing Zhang, Jianfei Cai, DaCheng Tao

To address three main challenges in OSHP, i.e., small sizes, testing bias, and similar parts, we devise an End-to-end One-shot human Parsing Network (EOP-Net).

Human Parsing Metric Learning +1

EcoFormer: Energy-Saving Attention with Linear Complexity

1 code implementation • 19 Sep 2022 • Jing Liu, Zizheng Pan, Haoyu He, Jianfei Cai, Bohan Zhuang

To this end, we propose a new binarization paradigm customized to high-dimensional softmax attention via kernelized hashing, called EcoFormer, to map the original queries and keys into low-dimensional binary codes in Hamming space.

Binarization

Alive Caricature from 2D to 3D

1 code implementation • CVPR 2018 • Qianyi Wu, Juyong Zhang, Yu-Kun Lai, Jianmin Zheng, Jianfei Cai

Caricature is an art form that expresses subjects in an abstract, simple and exaggerated view.

Caricature

Dynamic Focus-aware Positional Queries for Semantic Segmentation

2 code implementations • CVPR 2023 • Haoyu He, Jianfei Cai, Zizheng Pan, Jing Liu, Jing Zhang, DaCheng Tao, Bohan Zhuang

In this paper, we propose a simple yet effective query design for semantic segmentation termed Dynamic Focus-aware Positional Queries (DFPQ), which dynamically generates positional queries conditioned on the cross-attention scores from the preceding decoder block and the positional encodings for the corresponding image features, simultaneously.

Ranked #21 on Semantic Segmentation on ADE20K

Semantic Segmentation

Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning

1 code implementation • ICCV 2023 • Haoyu He, Jianfei Cai, Jing Zhang, DaCheng Tao, Bohan Zhuang

Visual Parameter-Efficient Fine-Tuning (PEFT) has become a powerful alternative to full fine-tuning for adapting pre-trained vision models to downstream tasks; it tunes only a small number of parameters while freezing the vast majority to ease storage burden and optimization difficulty.

CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing

1 code implementation • ICCV 2021 • Daxuan Ren, Jianmin Zheng, Jianfei Cai, Jiatong Li, Haiyong Jiang, Zhongang Cai, Junzhe Zhang, Liang Pan, Mingyuan Zhang, Haiyu Zhao, Shuai Yi

Generating an interpretable and compact representation of 3D shapes from point clouds is an important and challenging problem.

Disentangled Human Body Embedding Based on Deep Hierarchical Neural Network

1 code implementation • 14 May 2019 • Boyi Jiang, Juyong Zhang, Jianfei Cai, Jianmin Zheng

Human bodies exhibit various shapes for different identities or poses, but the body shape has certain similarities in structure and thus can be embedded in a low-dimensional space.

Representation Learning

Sharpness-aware Quantization for Deep Neural Networks

3 code implementations • 24 Nov 2021 • Jing Liu, Jianfei Cai, Bohan Zhuang

However, the abrupt changes in quantized weights during training often lead to severe loss fluctuations and result in a sharp loss landscape, making the gradients unstable and thus degrading the performance.

Ranked #196 on Image Classification on CIFAR-100

Image Classification Model Compression +1

Taming Stable Diffusion for Text to 360° Panorama Image Generation

1 code implementation • 11 Apr 2024 • Cheng Zhang, Qianyi Wu, Camilo Cruz Gambardella, Xiaoshui Huang, Dinh Phung, Wanli Ouyang, Jianfei Cai

Generative models, e.g., Stable Diffusion, have enabled the creation of photorealistic images from text prompts.

Denoising Image Generation

Stack-Captioning: Coarse-to-Fine Learning for Image Captioning

1 code implementation • 11 Sep 2017 • Jiuxiang Gu, Jianfei Cai, Gang Wang, Tsuhan Chen

On the other hand, multi-stage image captioning models are hard to train due to the vanishing gradient problem.

Image Captioning Sentence

An Empirical Study of Language CNN for Image Captioning

2 code implementations • ICCV 2017 • Jiuxiang Gu, Gang Wang, Jianfei Cai, Tsuhan Chen

Language Models based on recurrent neural networks have dominated recent image caption generation tasks.

Caption Generation Image Captioning +1

Region Deformer Networks for Unsupervised Depth Estimation from Unconstrained Monocular Videos

1 code implementation • 26 Feb 2019 • Haofei Xu, Jianmin Zheng, Jianfei Cai, Juyong Zhang

In this paper, we propose a new learning based method consisting of DepthNet, PoseNet and Region Deformer Networks (RDN) to estimate depth from unconstrained monocular videos without ground truth supervision.

Depth Estimation

Unconstrained Facial Action Unit Detection via Latent Feature Domain

1 code implementation • 25 Mar 2019 • Zhiwen Shao, Jianfei Cai, Tat-Jen Cham, Xuequan Lu, Lizhuang Ma

Due to the combination of source AU-related information and target AU-free information, the latent feature domain with transferred source label can be learned by maximizing the target-domain AU detection performance.

Action Unit Detection Domain Adaptation +2

Diversified and Personalized Multi-rater Medical Image Segmentation

1 code implementation • 20 Mar 2024 • Yicheng Wu, Xiangde Luo, Zhe Xu, Xiaoqing Guo, Lie Ju, ZongYuan Ge, Wenjun Liao, Jianfei Cai

To address it, the common practice is to gather multiple annotations from different experts, leading to the setting of multi-rater medical image segmentation.

Image Segmentation Medical Image Segmentation +2

CoactSeg: Learning from Heterogeneous Data for New Multiple Sclerosis Lesion Segmentation

1 code implementation • 10 Jul 2023 • Yicheng Wu, Zhonghua Wu, Hengcan Shi, Bjoern Picker, Winston Chong, Jianfei Cai

Moreover, a simple and effective relation regularization is proposed to ensure the longitudinal relations among the three outputs to improve the model learning.

Lesion Segmentation Segmentation

ExtrudeNet: Unsupervised Inverse Sketch-and-Extrude for Shape Parsing

1 code implementation • 30 Sep 2022 • Daxuan Ren, Jianmin Zheng, Jianfei Cai, Jiatong Li, Junzhe Zhang

This paper studies the problem of learning the shape given in the form of point clouds by inverse sketch-and-extrude.

Learning Meta-class Memory for Few-Shot Semantic Segmentation

1 code implementation • ICCV 2021 • Zhonghua Wu, Xiangxi Shi, Guosheng Lin, Jianfei Cai

To explicitly learn meta-class representations in few-shot segmentation task, we propose a novel Meta-class Memory based few-shot segmentation method (MM-Net), where we introduce a set of learnable memory embeddings to memorize the meta-class information during the base class training and transfer to novel classes during the inference stage.

Few-Shot Semantic Segmentation Segmentation +1

Accurate and Real-time 3D Pedestrian Detection Using an Efficient Attentive Pillar Network

1 code implementation • 31 Dec 2021 • Duy-Tho Le, Hengcan Shi, Hamid Rezatofighi, Jianfei Cai

Efficiently and accurately detecting people from 3D point cloud data is of great importance in many robotic and autonomous driving applications.

Ranked #1 on Birds Eye View Object Detection on KITTI Pedestrian Hard

3D Object Detection Autonomous Driving +3

Visiting the Invisible: Layer-by-Layer Completed Scene Decomposition

1 code implementation • 12 Apr 2021 • Chuanxia Zheng, Duy-Son Dao, Guoxian Song, Tat-Jen Cham, Jianfei Cai

In this work, we propose a higher-level scene understanding system to tackle both visible and invisible parts of objects and backgrounds in a given scene.

Instance Segmentation Scene Understanding +1

Image Captioning In the Transformer Age

1 code implementation • 15 Apr 2022 • Yang Xu, Li Li, Haiyang Xu, Songfang Huang, Fei Huang, Jianfei Cai

This drawback inspires the researchers to develop a homogeneous architecture that facilitates end-to-end training, for which Transformer is the perfect one that has proven its huge potential in both vision and language domains and thus can be used as the basic component of the visual encoder and language decoder in an IC pipeline.

Image Captioning Self-Supervised Learning

Stitched ViTs are Flexible Vision Backbones

1 code implementation • 30 Jun 2023 • Zizheng Pan, Jing Liu, Haoyu He, Jianfei Cai, Bohan Zhuang

With extensive experiments on ImageNet-1K, ADE20K, COCO-Stuff-10K and NYUv2, SN-Netv2 demonstrates superior performance over SN-Netv1 on downstream dense predictions and shows strong ability as a flexible vision backbone, achieving great advantages in both training efficiency and deployment flexibility.

QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models

2 code implementations • 12 Oct 2023 • Jing Liu, Ruihao Gong, Xiuying Wei, Zhiwei Dong, Jianfei Cai, Bohan Zhuang

Additionally, an adaptive strategy is designed to autonomously determine the optimal number of sub-channels for channel disassembly.

Quantization

Multimodal Transformer with Variable-length Memory for Vision-and-Language Navigation

1 code implementation • 10 Nov 2021 • Chuang Lin, Yi Jiang, Jianfei Cai, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan

Vision-and-Language Navigation (VLN) is a task in which an agent is required to follow a language instruction to navigate to the goal position, which relies on ongoing interactions with the environment while moving.

Navigate Vision and Language Navigation

Towards Unbiased Visual Emotion Recognition via Causal Intervention

1 code implementation • 26 Jul 2021 • Yuedong Chen, Xu Yang, Tat-Jen Cham, Jianfei Cai

In this work, we scrutinize this problem from the perspective of causal inference, where such dataset characteristic is termed as a confounder which misleads the system to learn the spurious correlation.

Causal Inference Emotion Recognition

Generalized Robust Bayesian Committee Machine for Large-scale Gaussian Process Regression

1 code implementation • ICML 2018 • Haitao Liu, Jianfei Cai, Yi Wang, Yew-Soon Ong

In order to scale standard Gaussian process (GP) regression to large-scale datasets, aggregation models employ factorized training process and then combine predictions from distributed experts.

Distributed Computing regression

Reliability-Adaptive Consistency Regularization for Weakly-Supervised Point Cloud Segmentation

1 code implementation • 9 Mar 2023 • Zhonghua Wu, Yicheng Wu, Guosheng Lin, Jianfei Cai

Weakly-supervised point cloud segmentation with extremely limited labels is highly desirable to alleviate the expensive costs of collecting densely annotated 3D points.

Point Cloud Segmentation Segmentation +1

Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning

1 code implementation • 4 Oct 2022 • Xu Yang, Hanwang Zhang, Chongyang Gao, Jianfei Cai

This is because the language is only partially observable, for which we need to dynamically collocate the modules during the process of image captioning.

Image Captioning Sentence +2

Efficient Stitchable Task Adaptation

1 code implementation • 29 Nov 2023 • Haoyu He, Zizheng Pan, Jing Liu, Jianfei Cai, Bohan Zhuang

In this work, we present a novel framework, Efficient Stitchable Task Adaptation (ESTA), to efficiently produce a palette of fine-tuned models that adhere to diverse resource constraints.

Chatbot

Scalable Gaussian Process Classification with Additive Noise for Various Likelihoods

1 code implementation • 14 Sep 2019 • Haitao Liu, Yew-Soon Ong, Ziwei Yu, Jianfei Cai, Xiaobo Shen

Gaussian process classification (GPC) provides a flexible and powerful statistical framework describing joint distributions over function space.

Classification General Classification +3

Improving Deep Lesion Detection Using 3D Contextual and Spatial Attention

1 code implementation • 9 Jul 2019 • Qingyi Tao, ZongYuan Ge, Jianfei Cai, Jianxiong Yin, Simon See

Secondly, in CT scans, the lesions are often indistinguishable from the background since the lesion and non-lesion areas may have very similar appearances.

Computed Tomography (CT) Lesion Detection +2

Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models

no code implementations • CVPR 2018 • Jiuxiang Gu, Jianfei Cai, Shafiq Joty, Li Niu, Gang Wang

Textual-visual cross-modal retrieval has been a hot research topic in both computer vision and natural language processing communities.

Cross-Modal Retrieval Retrieval +1

CNN-based Real-time Dense Face Reconstruction with Inverse-rendered Photo-realistic Face Images

no code implementations • 3 Aug 2017 • Yudong Guo, Juyong Zhang, Jianfei Cai, Boyi Jiang, Jianmin Zheng

Leveraging the power of convolutional neural networks (CNNs), CNN-based face reconstruction has recently shown promising performance in reconstructing detailed face shape from 2D face images.

3D Face Reconstruction Face Model +1

VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions

no code implementations • ECCV 2018 • Qing Li, Qingyi Tao, Shafiq Joty, Jianfei Cai, Jiebo Luo

Most existing works in visual question answering (VQA) are dedicated to improving the accuracy of predicted answers, while disregarding the explanations.

Ranked #4 on Explanatory Visual Question Answering on GQA-REX

Explanatory Visual Question Answering Multi-Task Learning +1

Conditional Adversarial Synthesis of 3D Facial Action Units

no code implementations • 21 Feb 2018 • Zhilei Liu, Guoxian Song, Jianfei Cai, Tat-Jen Cham, Juyong Zhang

Employing deep learning-based approaches for fine-grained facial expression analysis, such as those involving the estimation of Action Unit (AU) intensities, is difficult due to the lack of a large-scale dataset of real faces with sufficiently diverse AU labels for training.

Data Augmentation Image Generation

Unpaired Image Captioning by Language Pivoting

no code implementations • ECCV 2018 • Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Gang Wang

Image captioning is a multimodal task involving computer vision and natural language processing, where the goal is to learn a mapping from the image to its natural language description.

Image Captioning Sentence

Decoupled Spatial Neural Attention for Weakly Supervised Semantic Segmentation

no code implementations • 7 Mar 2018 • Tianyi Zhang, Guosheng Lin, Jianfei Cai, Tong Shen, Chunhua Shen, Alex C. Kot

In our work, we focus on the weakly supervised semantic segmentation with image label annotations.

Image Captioning Segmentation +2

Zero-Shot Learning via Category-Specific Visual-Semantic Mapping

no code implementations • 16 Nov 2017 • Li Niu, Jianfei Cai, Ashok Veeraraghavan

Zero-Shot Learning (ZSL) aims to classify a test instance from an unseen category based on the training instances from seen categories, in which the gap between seen categories and unseen categories is generally bridged via visual-semantic mapping between the low-level visual feature space and the intermediate semantic space.

General Classification Image Classification +1

Zero-Annotation Object Detection with Web Knowledge Transfer

no code implementations • ECCV 2018 • Qingyi Tao, Hao Yang, Jianfei Cai

Object detection is one of the major problems in computer vision, and has been extensively studied.

Domain Adaptation Object +3

Recent Advances in Convolutional Neural Networks

no code implementations • 22 Dec 2015 • Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu, Xingxing Wang, Li Wang, Gang Wang, Jianfei Cai, Tsuhan Chen

In the last few years, deep learning has led to very good performance on a variety of problems, such as visual recognition, speech recognition and natural language processing.

Speech Recognition

Exploiting Web Images for Weakly Supervised Object Detection

no code implementations • 27 Jul 2017 • Qingyi Tao, Hao Yang, Jianfei Cai

Object detection without bounding box annotations, i.e., weakly supervised detection methods, are still lagging far behind.

Ranked #17 on Weakly Supervised Object Detection on PASCAL VOC 2012 test (using extra training data)

Object object-detection +2

Enriched Deep Recurrent Visual Attention Model for Multiple Object Recognition

no code implementations • 12 Jun 2017 • Artsiom Ablavatski, Shijian Lu, Jianfei Cai

We design an Enriched Deep Recurrent Visual Attention Model (EDRAM) - an improved attention-based architecture for multiple object recognition.

Object Recognition

WordFence: Text Detection in Natural Images with Border Awareness

no code implementations • 15 May 2017 • Andrei Polzounov, Artsiom Ablavatski, Sergio Escalera, Shijian Lu, Jianfei Cai

In recent years, text recognition has achieved remarkable success in recognizing scanned document text.

Semantic Segmentation Text Detection

MIML-FCN+: Multi-instance Multi-label Learning via Fully Convolutional Networks with Privileged Information

no code implementations • CVPR 2017 • Hao Yang, Joey Tianyi Zhou, Jianfei Cai, Yew Soon Ong

As the proposed PI loss is convex and SGD-compatible and the framework itself is a fully convolutional network, MIML-FCN+ can be easily integrated with state-of-the-art deep learning networks.

Image Captioning Multi-Label Learning +1

Paper
Add Code

Improving Multi-label Learning with Missing Labels by Structured Semantic Correlations

no code implementations • 4 Aug 2016 • Hao Yang, Joey Tianyi Zhou, Jianfei Cai

Experimental results demonstrate the effectiveness of the proposed semantic descriptor and the usefulness of incorporating the structured semantic correlations.

Missing Labels Object Recognition

Paper
Add Code

Exploit Bounding Box Annotations for Multi-label Object Recognition

no code implementations • CVPR 2016 • Hao Yang, Joey Tianyi Zhou, Yu Zhang, Bin-Bin Gao, Jianxin Wu, Jianfei Cai

With strong labels, our framework is able to achieve state-of-the-art results in both datasets.

Ranked #16 on Multi-Label Classification on PASCAL VOC 2007

Multi-Label Classification Object +1

Paper
Add Code

Diagnosing State-Of-The-Art Object Proposal Methods

no code implementations • 16 Jul 2015 • Hongyuan Zhu, Shijian Lu, Jianfei Cai, Quangqing Lee

Recently, Hosang et al. conducted the first unified study of existing methods in terms of various image-level degradations.

Object object-detection +1

Paper
Add Code

Robust Performance-driven 3D Face Tracking in Long Range Depth Scenes

no code implementations • 10 Jul 2015 • Hai X. Pham, Chongyu Chen, Luc N. Dao, Vladimir Pavlovic, Jianfei Cai, Tat-Jen Cham

We introduce a novel robust hybrid 3D face tracking framework from RGBD video streams, which is capable of tracking head pose and facial actions without pre-calibration or intervention from a user.

3D Reconstruction Face Model

Paper
Add Code

Weakly Supervised Fine-Grained Image Categorization

no code implementations • 20 Apr 2015 • Yu Zhang, Xiu-Shen Wei, Jianxin Wu, Jianfei Cai, Jiangbo Lu, Viet-Anh Nguyen, Minh N. Do

Most existing works heavily rely on object / part detectors to build the correspondence between object parts by using object or object part annotations inside training images.

Fine-Grained Image Classification Image Categorization +1

Paper
Add Code

Beyond Pixels: A Comprehensive Survey from Bottom-up to Semantic Image Segmentation and Cosegmentation

no code implementations • 3 Feb 2015 • Hongyuan Zhu, Fanman Meng, Jianfei Cai, Shijian Lu

Image segmentation refers to the process of dividing an image into non-overlapping meaningful regions according to human perception, and it has been a classic topic since the early days of computer vision.

Image Segmentation Segmentation +1

Paper
Add Code

When Gaussian Process Meets Big Data: A Review of Scalable GPs

no code implementations • 3 Jul 2018 • Haitao Liu, Yew-Soon Ong, Xiaobo Shen, Jianfei Cai

The review of scalable GPs in the GP community is timely and important due to the explosion of data size.

Paper
Add Code

Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction

no code implementations • 8 Jul 2018 • Xiangxi Shi, Jianfei Cai, Jiuxiang Gu, Shafiq Joty

In this paper, we propose a boundary-aware hierarchical language decoder for video captioning, which consists of a high-level GRU based language decoder, working as a global (caption-level) language model, and a low-level GRU based language decoder, working as a local (phrase-level) language model.

Language Modelling Sentence +3

Paper
Add Code

Facial Action Unit Detection Using Attention and Relation Learning

no code implementations • 10 Aug 2018 • Zhiwen Shao, Zhilei Liu, Jianfei Cai, Yunsheng Wu, Lizhuang Ma

By finding the region of interest of each AU with the attention mechanism, AU-related local features can be captured.

Action Unit Detection Facial Action Unit Detection +1

Paper
Add Code

Keypoint Based Weakly Supervised Human Parsing

no code implementations • 14 Sep 2018 • Zhonghua Wu, Guosheng Lin, Jianfei Cai

We develop an iterative learning method to generate pseudo part segmentation masks from keypoint labels.

Human Parsing Segmentation +1

Paper
Add Code

Large-scale Heteroscedastic Regression via Gaussian Process

no code implementations • 3 Nov 2018 • Haitao Liu, Yew-Soon Ong, Jianfei Cai

To improve the scalability, we first develop a variational sparse inference algorithm, named VSHGP, to handle large-scale datasets.

regression Variational Inference

Paper
Add Code

Understanding and Comparing Scalable Gaussian Process Regression for Big Data

no code implementations • 3 Nov 2018 • Haitao Liu, Jianfei Cai, Yew-Soon Ong, Yi Wang

This paper is devoted to investigating the methodological characteristics and performance of representative global and local scalable GPs, including sparse approximations and local aggregations, from four main perspectives: scalability, capability, controllability and robustness.

regression

Paper
Add Code

M2E-Try On Net: Fashion from Model to Everyone

no code implementations • 21 Nov 2018 • Zhonghua Wu, Guosheng Lin, Qingyi Tao, Jianfei Cai

Instead, we present a novel virtual Try-On network, M2E-Try On Net, which transfers the clothes from a model image to a person image without the need of any clean product images.

Virtual Try-on

Paper
Add Code

Weakly-supervised 3D Hand Pose Estimation from Monocular RGB Images

no code implementations • ECCV 2018 • Yujun Cai, Liuhao Ge, Jianfei Cai, Junsong Yuan

Compared with depth-based 3D hand pose estimation, it is more challenging to infer 3D hand pose from monocular RGB images, due to substantial depth ambiguity and the difficulty of obtaining fully-annotated training data.

3D Hand Pose Estimation

Paper
Add Code

Quadtree Convolutional Neural Networks

no code implementations • ECCV 2018 • Pradeep Kumar Jayaraman, Jianhan Mei, Jianfei Cai, Jianmin Zheng

Specifically, the computational and memory costs in QCNN grow linearly in the number of non-zero pixels, as opposed to traditional CNNs where the costs are quadratic in the number of pixels.

Paper
Add Code

Real-time 3D Face-Eye Performance Capture of a Person Wearing VR Headset

no code implementations • 21 Jan 2019 • Guoxian Song, Jianfei Cai, Tat-Jen Cham, Jianmin Zheng, Juyong Zhang, Henry Fuchs

Teleconferencing or telepresence based on virtual reality (VR) head-mounted display (HMD) devices is a very interesting and promising application, since an HMD can provide an immersive experience for users.

Paper
Add Code

Compact Representation for Image Classification: To Choose or to Compress?

no code implementations • CVPR 2014 • Yu Zhang, Jianxin Wu, Jianfei Cai

In spite of the popularity of various feature compression methods, this paper argues that feature selection is a better choice than feature compression.

Classification Dimensionality Reduction +5

Paper
Add Code

Recovering Surface Details under General Unknown Illumination Using Shading and Coarse Multi-view Stereo

no code implementations • CVPR 2014 • Di Xu, Qi Duan, Jianming Zheng, Juyong Zhang, Jianfei Cai, Tat-Jen Cham

As a result, our approach is robust and stable, and is able to efficiently recover high-quality surface details even when starting with a coarse MVS.

Paper
Add Code

Modality and Component Aware Feature Fusion For RGB-D Scene Classification

no code implementations • CVPR 2016 • Anran Wang, Jianfei Cai, Jiwen Lu, Tat-Jen Cham

While convolutional neural networks (CNN) have been excellent for object recognition, the greater spatial variability in scene images typically meant that the standard full-image CNN features are suboptimal for scene classification.

General Classification Object Recognition +1

Paper
Add Code

A Generative Model for Depth-Based Robust 3D Facial Pose Tracking

no code implementations • CVPR 2017 • Lu Sheng, Jianfei Cai, Tat-Jen Cham, Vladimir Pavlovic, King Ngi Ngan

We consider the problem of depth-based robust 3D facial pose tracking under unconstrained scenarios with heavy occlusions and arbitrary facial expression variations.

Face Model Pose Estimation +1

Paper
Add Code

Object Co-Skeletonization With Co-Segmentation

no code implementations • CVPR 2017 • Koteswar Rao Jerripothula, Jianfei Cai, Jiangbo Lu, Junsong Yuan

Recent advances in the joint processing of images have certainly shown its advantages over the individual processing.

Object Segmentation

Paper
Add Code

MMSS: Multi-Modal Sharable and Specific Feature Learning for RGB-D Object Recognition

no code implementations • ICCV 2015 • Anran Wang, Jianfei Cai, Jiwen Lu, Tat-Jen Cham

We first construct deep CNN layers for color and depth separately, and then connect them with our carefully designed multi-modal layers, which fuse color and depth information by enforcing a common part to be shared by features of different modalities.

Object Object Recognition

Paper
Add Code

Progress Regression RNN for Online Spatial-Temporal Action Localization in Unconstrained Videos

no code implementations • 1 Mar 2019 • Bo Hu, Jianfei Cai, Tat-Jen Cham, Junsong Yuan

Previous spatial-temporal action localization methods commonly follow the pipeline of object detection to estimate bounding boxes and labels of actions.

object-detection Object Detection +3

Paper
Add Code

Unpaired Image Captioning via Scene Graph Alignments

no code implementations • ICCV 2019 • Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Handong Zhao, Xu Yang, Gang Wang

Most current image captioning models rely heavily on paired image-caption datasets.

Image Captioning Sentence

Paper
Add Code

Scene Graph Generation with External Knowledge and Image Reconstruction

no code implementations • CVPR 2019 • Jiuxiang Gu, Handong Zhao, Zhe Lin, Sheng Li, Jianfei Cai, Mingyang Ling

Scene graph generation has received growing attention with the advancements in image understanding tasks such as object detection, attributes and relationship prediction, etc.

Graph Generation Image Reconstruction +6

Paper
Add Code

Learning to Collocate Neural Modules for Image Captioning

no code implementations • ICCV 2019 • Xu Yang, Hanwang Zhang, Jianfei Cai

To this end, we make the following technical contributions for CNM training: 1) a compact module design (one module for function words and three for visual content words, e.g., noun, adjective, and verb), 2) soft module fusion and multi-step module execution, robustifying the visual reasoning in partial observation, and 3) a linguistic loss for the module controller to be faithful to part-of-speech collocations (e.g., an adjective comes before a noun).

Image Captioning Sentence +2

Paper
Add Code

Exploring Uncertainty Measures for Image-Caption Embedding-and-Retrieval Task

no code implementations • 9 Apr 2019 • Kenta Hama, Takashi Matsubara, Kuniaki Uehara, Jianfei Cai

With the wide development of black-box machine learning algorithms, particularly deep neural networks (DNNs), the practical demand for reliability assessment is rapidly rising.

General Classification regression +1

Paper
Add Code

Visibility Constrained Generative Model for Depth-based 3D Facial Pose Tracking

no code implementations • 6 May 2019 • Lu Sheng, Jianfei Cai, Tat-Jen Cham, Vladimir Pavlovic, King Ngi Ngan

In this paper, we propose a generative framework that unifies depth-based 3D facial pose tracking and face model adaptation on-the-fly, in the unconstrained scenarios with heavy occlusions and arbitrary facial expression variations.

Face Model Pose Estimation +1

Paper
Add Code

Watch It Twice: Video Captioning with a Refocused Video Encoder

no code implementations • 21 Jul 2019 • Xiangxi Shi, Jianfei Cai, Shafiq Joty, Jiuxiang Gu

With the rapid growth of video data and the increasing demands of various applications such as intelligent video search and assistance for visually impaired people, the video captioning task has received a lot of attention recently in the computer vision and natural language processing fields.

Video Captioning

Paper
Add Code

Recovering Facial Reflectance and Geometry from Multi-view Images

no code implementations • 27 Nov 2019 • Guoxian Song, Jianmin Zheng, Jianfei Cai, Tat-Jen Cham

While the problem of estimating shapes and diffuse reflectances of human faces from images has been extensively studied, relatively little work has been done on recovering the specular albedo.

Face Model

Paper
Add Code

Facial Action Unit Detection via Adaptive Attention and Relation

no code implementations • 5 Jan 2020 • Zhiwen Shao, Yong Zhou, Jianfei Cai, Hancheng Zhu, Rui Yao

Specifically, we propose an adaptive attention regression network to regress the global attention map of each AU under the constraint of attention predefinition and the guidance of AU detection, which is beneficial for capturing both specified dependencies by landmarks in strongly correlated regions and facial globally distributed dependencies in weakly correlated regions.

Action Unit Detection Facial Action Unit Detection +2

Paper
Add Code

GeoConv: Geodesic Guided Convolution for Facial Action Unit Recognition

no code implementations • 6 Mar 2020 • Yuedong Chen, Guoxian Song, Zhiwen Shao, Jianfei Cai, Tat-Jen Cham, Jianming Zheng

Automatic facial action unit (AU) recognition has attracted great attention but still remains a challenging task, as subtle changes of local facial muscles are difficult to thoroughly capture.

Face Model Facial Action Unit Detection

Paper
Add Code

Deconfounded Image Captioning: A Causal Retrospect

no code implementations • 9 Mar 2020 • Xu Yang, Hanwang Zhang, Jianfei Cai

Dataset bias in vision-language tasks is becoming one of the main problems that hinder the progress of our community.

Causal Inference Image Captioning

Paper
Add Code

Exploring Bottom-up and Top-down Cues with Attentive Learning for Webly Supervised Object Detection

no code implementations • CVPR 2020 • Zhonghua Wu, Qingyi Tao, Guosheng Lin, Jianfei Cai

To reduce the human labeling effort, we propose a novel webly supervised object detection (WebSOD) method for novel classes which only requires the web images without further annotations.

Object object-detection +2

Paper
Add Code

Image Co-skeletonization via Co-segmentation

no code implementations • 12 Apr 2020 • Koteswar Rao Jerripothula, Jianfei Cai, Jiangbo Lu, Junsong Yuan

Object skeletonization in a single natural image is a challenging problem because there is hardly any prior knowledge about the object.

Object Segmentation

Paper
Add Code

Learning from the Scene and Borrowing from the Rich: Tackling the Long Tail in Scene Graph Generation

no code implementations • 13 Jun 2020 • Tao He, Lianli Gao, Jingkuan Song, Jianfei Cai, Yuan-Fang Li

Despite the huge progress in scene graph generation in recent years, its long-tail distribution in object relationships remains a challenging and pestering issue.

Graph Generation Object +2

Paper
Add Code

Expert Training: Task Hardness Aware Meta-Learning for Few-Shot Classification

no code implementations • 13 Jul 2020 • Yucan Zhou, Yu Wang, Jianfei Cai, Yu Zhou, QinGhua Hu, Weiping Wang

Some works in the optimization of deep neural networks have shown that a better arrangement of training data can make the classifier converge faster and perform better.

General Classification Meta-Learning

Paper
Add Code

MED-TEX: Transferring and Explaining Knowledge with Less Data from Pretrained Medical Imaging Models

no code implementations • 6 Aug 2020 • Thanh Nguyen-Duc, He Zhao, Jianfei Cai, Dinh Phung

To interpret the teacher model and assist the learning of the student, an explainer module is introduced to highlight the regions of an input that are important for the predictions of the teacher model.

Image Classification Knowledge Distillation +1

Paper
Add Code

Modeling Caricature Expressions by 3D Blendshape and Dynamic Texture

no code implementations • 13 Aug 2020 • Keyu Chen, Jianmin Zheng, Jianfei Cai, Juyong Zhang

The problem of deforming an artist-drawn caricature according to a given normal face expression is of interest in applications such as social media, animation and entertainment.

Caricature Generative Adversarial Network +1

Paper
Add Code

Learning Progressive Joint Propagation for Human Motion Prediction

no code implementations • ECCV 2020 • Yujun Cai, Lin Huang, Yiwei Wang, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Xu Yang, Yiheng Zhu, Xiaohui Shen, Ding Liu, Jing Liu, Nadia Magnenat Thalmann

Last, in order to incorporate a general motion space for high-quality prediction, we build a memory-based dictionary, which aims to preserve the global motion patterns in training data to guide the predictions.

Human motion prediction motion prediction

Paper
Add Code

Splitting vs. Merging: Mining Object Regions with Discrepancy and Intersection Loss for Weakly Supervised Semantic Segmentation

no code implementations • ECCV 2020 • Tianyi Zhang, Guosheng Lin, Weide Liu, Jianfei Cai, Alex Kot

Finally, by training the segmentation model with the masks generated by our Splitting vs Merging strategy, we achieve the state-of-the-art weakly-supervised segmentation results on the Pascal VOC 2012 benchmark.

Segmentation Weakly supervised segmentation +2

Paper
Add Code

Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning

no code implementations • ECCV 2020 • Xiangxi Shi, Xu Yang, Jiuxiang Gu, Shafiq Joty, Jianfei Cai

In this paper, we propose a novel visual encoder to explicitly distinguish viewpoint changes from semantic changes in the change captioning task.

Reinforcement Learning (RL)

Paper
Add Code

Self-Supervised Relationship Probing

no code implementations • NeurIPS 2020 • Jiuxiang Gu, Jason Kuen, Shafiq Joty, Jianfei Cai, Vlad Morariu, Handong Zhao, Tong Sun

Structured representations of images that model visual relationships are beneficial for many vision and vision-language applications.

Contrastive Learning Language Modelling +1

Paper
Add Code

Single-path Bit Sharing for Automatic Loss-aware Model Compression

no code implementations • 13 Jan 2021 • Jing Liu, Bohan Zhuang, Peng Chen, Chunhua Shen, Jianfei Cai, Mingkui Tan

By jointly training the binary gates in conjunction with network parameters, the compression configurations of each layer can be automatically determined.

Model Compression Network Pruning +1

Paper
Add Code

Causal Attention for Vision-Language Tasks

no code implementations • CVPR 2021 • Xu Yang, Hanwang Zhang, GuoJun Qi, Jianfei Cai

Specifically, CATT is implemented as a combination of 1) In-Sample Attention (IS-ATT) and 2) Cross-Sample Attention (CS-ATT), where the latter forcibly brings other samples into every IS-ATT, mimicking the causal intervention.

Paper
Add Code

Multi-Label Image Classification with Contrastive Learning

no code implementations • 24 Jul 2021 • Son D. Dao, Ethan Zhao, Dinh Phung, Jianfei Cai

Recently, as an effective way of learning latent representations, contrastive learning has been increasingly popular and successful in various domains.

Classification Contrastive Learning +2

Paper
Add Code

Remember What You have drawn: Semantic Image Manipulation with Memory

no code implementations • 27 Jul 2021 • Xiangxi Shi, Zhonghua Wu, Guosheng Lin, Jianfei Cai, Shafiq Joty

Therefore, in this paper, we propose a memory-based Image Manipulation Network (MIM-Net), where a set of memories learned from images is introduced to synthesize the texture information with the guidance of the textual description.

Image Manipulation

Paper
Add Code

Rapid Elastic Architecture Search under Specialized Classes and Resource Constraints

no code implementations • 3 Aug 2021 • Jing Liu, Bohan Zhuang, Mingkui Tan, Xu Liu, Dinh Phung, Yuanqing Li, Jianfei Cai

More critically, EAS is able to find compact architectures within 0.1 second for 50 deployment scenarios.

Image Classification

Paper
Add Code

Semantic Compositional Learning for Low-shot Scene Graph Generation

no code implementations • 19 Aug 2021 • Tao He, Lianli Gao, Jingkuan Song, Jianfei Cai, Yuan-Fang Li

Scene graphs provide valuable information to many downstream tasks.

Graph Generation Relation +1

Paper
Add Code

Auto-Parsing Network for Image Captioning and Visual Question Answering

no code implementations • ICCV 2021 • Xu Yang, Chongyang Gao, Hanwang Zhang, Jianfei Cai

We propose an Auto-Parsing Network (APN) to discover and exploit the input data's hidden tree structures for improving the effectiveness of the Transformer-based vision-language systems.

Image Captioning Question Answering +1

Paper
Add Code

A Unified 3D Human Motion Synthesis Model via Conditional Variational Auto-Encoder

no code implementations • ICCV 2021 • Yujun Cai, Yiwei Wang, Yiheng Zhu, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Chuanxia Zheng, Sijie Yan, Henghui Ding, Xiaohui Shen, Ding Liu, Nadia Magnenat Thalmann

Notably, by considering this problem as a conditional generation process, we estimate a parametric distribution of the missing regions based on the input conditions, from which to sample and synthesize the full motion series.

motion prediction Motion Synthesis

Paper
Add Code

Domain-Invariant Disentangled Network for Generalizable Object Detection

no code implementations • ICCV 2021 • Chuang Lin, Zehuan Yuan, Sicheng Zhao, Peize Sun, Changhu Wang, Jianfei Cai

By disentangling representations on both image and instance levels, DIDN is able to learn domain-invariant representations that are suitable for generalized object detection.

Disentanglement Domain Generalization +4

Paper
Add Code

Contrastively Enforcing Distinctiveness for Multi-Label Classification

no code implementations • 29 Sep 2021 • Son Duy Dao, He Zhao, Dinh Phung, Jianfei Cai

Recently, as an effective way of learning latent representations, contrastive learning has been increasingly popular and successful in various domains.

Classification Contrastive Learning +2

Paper
Add Code

ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues

no code implementations • CVPR 2022 • Hengcan Shi, Munawar Hayat, Yicheng Wu, Jianfei Cai

Firstly, we analyze CLIP for unsupervised open-category proposal generation and design an objectness score based on our empirical analysis on proposal selection.

Object object-detection +2

Paper
Add Code

Unpaired Referring Expression Grounding via Bidirectional Cross-Modal Matching

no code implementations • 18 Jan 2022 • Hengcan Shi, Munawar Hayat, Jianfei Cai

To avoid the laborious annotation in conventional referring grounding, unpaired referring grounding is introduced, where the training data only contains a number of images and queries without correspondences.

Image-text matching Referring Expression +1

Paper
Add Code

High-Quality Pluralistic Image Completion via Code Shared VQGAN

no code implementations • 5 Apr 2022 • Chuanxia Zheng, Guoxian Song, Tat-Jen Cham, Jianfei Cai, Dinh Phung, Linjie Luo

In this work, we present a novel framework for pluralistic image completion that can achieve both high quality and diversity at much faster inference speed.

Image Reconstruction Vocal Bursts Intensity Prediction

Paper
Add Code

Transformer Scale Gate for Semantic Segmentation

no code implementations • CVPR 2023 • Hengcan Shi, Munawar Hayat, Jianfei Cai

Effectively encoding multi-scale contextual information is crucial for accurate semantic segmentation.

feature selection Segmentation +1

Paper
Add Code

Dual Adaptive Transformations for Weakly Supervised Point Cloud Segmentation

no code implementations • 19 Jul 2022 • Zhonghua Wu, Yicheng Wu, Guosheng Lin, Jianfei Cai, Chen Qian

Weakly supervised point cloud segmentation, i.e., semantically segmenting a point cloud with only a few labeled points in the whole 3D scene, is highly desirable due to the heavy burden of collecting abundant dense annotations for the model training.

Point Cloud Segmentation Segmentation

Paper
Add Code

FocusFormer: Focusing on What We Need via Architecture Sampler

no code implementations • 23 Aug 2022 • Jing Liu, Jianfei Cai, Bohan Zhuang

During architecture search, these methods focus on finding architectures on the Pareto frontier of performance and resource consumption, which forms a gap between training and deployment.

Neural Architecture Search

Paper
Add Code

JRDB-Pose: A Large-scale Dataset for Multi-Person Pose Estimation and Tracking

no code implementations • CVPR 2023 • Edward Vendrow, Duy Tho Le, Jianfei Cai, Hamid Rezatofighi

In crowded human scenes with close-up human-robot interaction and robot navigation, a deep understanding requires reasoning about human motion and body dynamics over time with human body pose estimation and tracking.

Multi-Person Pose Estimation Multi-Person Pose Estimation and Tracking +1

Paper
Add Code

Class Enhancement Losses with Pseudo Labels for Zero-shot Semantic Segmentation

no code implementations • 18 Jan 2023 • Son Duy Dao, Hengcan Shi, Dinh Phung, Jianfei Cai

Recent mask proposal models have significantly improved the performance of zero-shot semantic segmentation.

Language Modelling Open Vocabulary Semantic Segmentation +4

Paper
Add Code

Vector Quantized Wasserstein Auto-Encoder

no code implementations • 12 Feb 2023 • Tung-Long Vuong, Trung Le, He Zhao, Chuanxia Zheng, Mehrtash Harandi, Jianfei Cai, Dinh Phung

Learning deep discrete latent representations offers a promise of better symbolic and summarized abstractions that are more useful to subsequent downstream tasks.

Clustering Image Reconstruction

Paper
Add Code

Open-Vocabulary Object Detection via Scene Graph Discovery

no code implementations • 7 Jul 2023 • Hengcan Shi, Munawar Hayat, Jianfei Cai

However, they only use pairs of nouns and individual objects in VL data, while these data usually contain much more information, such as scene graphs, which are also crucial for OV detection.

Graph Generation Object +5

Paper
Add Code

Unified Open-Vocabulary Dense Visual Prediction

no code implementations • 17 Jul 2023 • Hengcan Shi, Munawar Hayat, Jianfei Cai

We present a UOVN training mechanism to reduce such gaps.

object-detection Object Detection

Paper
Add Code

Cross-adversarial local distribution regularization for semi-supervised medical image segmentation

no code implementations • 2 Oct 2023 • Thanh Nguyen-Duc, Trung Le, Roland Bammer, He Zhao, Jianfei Cai, Dinh Phung

Medical semi-supervised segmentation is a technique where a model is trained to segment objects of interest in medical images with limited annotated data.

Image Segmentation Segmentation +2

Paper
Add Code

GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering

no code implementations • 4 Feb 2024 • Ziyu Ma, Shutao Li, Bin Sun, Jianfei Cai, Zuxiang Long, Fuyan Ma

Therefore, we propose GeReA, a generate-reason framework that prompts an MLLM like InstructBLIP with question-relevant vision and language information to generate knowledge-relevant descriptions and reasons over those descriptions for knowledge-based VQA.

Language Modelling Large Language Model +3

Paper
Add Code

JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments

no code implementations • 2 Apr 2024 • Duy-Tho Le, Chenhui Gou, Stavya Datta, Hengcan Shi, Ian Reid, Jianfei Cai, Hamid Rezatofighi

JRDB-PanoTrack includes (1) various data involving indoor and outdoor crowded scenes, as well as comprehensive 2D and 3D synchronized data modalities; (2) high-quality 2D spatial panoptic segmentation and temporal tracking annotations, with additional 3D label projections for further spatial understanding; (3) diverse object classes for closed- and open-world recognition benchmarks, with OSPA-based metrics for evaluation.

Decision Making Panoptic Segmentation +1

Paper
Add Code

DifFUSER: Diffusion Model for Robust Multi-Sensor Fusion in 3D Object Detection and BEV Segmentation

no code implementations • 6 Apr 2024 • Duy-Tho Le, Hengcan Shi, Jianfei Cai, Hamid Rezatofighi

Diffusion models have recently gained prominence as powerful deep generative models, demonstrating unmatched performance across various domains.

3D Object Detection Denoising +2

Paper
Add Code
