no code implementations • ECCV 2020 • Tianyi Zhang, Guosheng Lin, Weide Liu, Jianfei Cai, Alex Kot
Finally, by training the segmentation model with the masks generated by our Splitting vs Merging strategy, we achieve state-of-the-art weakly supervised segmentation results on the Pascal VOC 2012 benchmark.
Weakly supervised segmentation
Weakly supervised Semantic Segmentation
no code implementations • ECCV 2020 • Yujun Cai, Lin Huang, Yiwei Wang, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Xu Yang, Yiheng Zhu, Xiaohui Shen, Ding Liu, Jing Liu, Nadia Magnenat Thalmann
Last, in order to incorporate a general motion space for high-quality prediction, we build a memory-based dictionary, which aims to preserve the global motion patterns in training data to guide the predictions.
1 code implementation • 24 Apr 2023 • Yuedong Chen, Haofei Xu, Qianyi Wu, Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai
The key to our approach lies in the explicitly modeled correspondence matching information, which provides a geometry prior for predicting NeRF color and density for volume rendering.
1 code implementation • 15 Mar 2023 • Haoyu He, Jianfei Cai, Jing Zhang, DaCheng Tao, Bohan Zhuang
Visual Parameter-Efficient Tuning (VPET) has become a powerful alternative to full fine-tuning for adapting pre-trained vision models to downstream tasks: it tunes only a small number of parameters while freezing the vast majority to ease the storage burden and optimization difficulty.
no code implementations • 9 Mar 2023 • Zhonghua Wu, Yicheng Wu, Guosheng Lin, Jianfei Cai
Weakly-supervised point cloud segmentation with extremely limited labels is highly desirable to alleviate the expensive costs of collecting densely annotated 3D points.
2 code implementations • CVPR 2023 • Zizheng Pan, Jianfei Cai, Bohan Zhuang
As each model family consists of pretrained models with diverse scales (e.g., DeiT-Ti/S/B), a fundamental question naturally arises: how to efficiently assemble these readily available models in a family for dynamic accuracy-efficiency trade-offs at runtime.
no code implementations • 12 Feb 2023 • Tung-Long Vuong, Trung Le, He Zhao, Chuanxia Zheng, Mehrtash Harandi, Jianfei Cai, Dinh Phung
Learning deep discrete latent representations offers the promise of better symbolic and summarized abstractions that are more useful for subsequent downstream tasks.
no code implementations • 18 Jan 2023 • Son Duy Dao, Hengcan Shi, Dinh Phung, Jianfei Cai
Recent mask proposal models have significantly improved the performance of zero-shot semantic segmentation.
1 code implementation • 27 Nov 2022 • Chuang Lin, Peize Sun, Yi Jiang, Ping Luo, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan, Jianfei Cai
In this paper, we propose a novel open-vocabulary object detection framework directly learning from image-text pair data.
1 code implementation • CVPR 2023 • Zhixi Cai, Shreya Ghosh, Kalin Stefanov, Abhinav Dhall, Jianfei Cai, Hamid Rezatofighi, Reza Haffari, Munawar Hayat
This paper proposes a self-supervised approach to learning universal facial representations from videos that can transfer across a variety of facial analysis tasks, such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS).
1 code implementation • 10 Nov 2022 • Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, Fisher Yu, DaCheng Tao, Andreas Geiger
We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images.
Ranked #1 on Optical Flow Estimation on Sintel-clean
no code implementations • CVPR 2023 • Edward Vendrow, Duy Tho Le, Jianfei Cai, Hamid Rezatofighi
In crowded human scenes with close-up human-robot interaction and robot navigation, a deep understanding requires reasoning about human motion and body dynamics over time with human body pose estimation and tracking.
Multi-Person Pose Estimation
Multi-Person Pose Estimation and Tracking
1 code implementation • 4 Oct 2022 • Xu Yang, Hanwang Zhang, Chongyang Gao, Jianfei Cai
This is because the language is only partially observable, for which we need to dynamically collocate the modules during the process of image captioning.
1 code implementation • 30 Sep 2022 • Daxuan Ren, Jianmin Zheng, Jianfei Cai, Jiatong Li, Junzhe Zhang
This paper studies the problem of learning the shape given in the form of point clouds by inverse sketch-and-extrude.
1 code implementation • 19 Sep 2022 • Chuanxia Zheng, Long Tung Vuong, Jianfei Cai, Dinh Phung
Although two-stage Vector Quantized (VQ) generative models allow for synthesizing high-fidelity and high-resolution images, their quantization operator encodes similar patches within an image into the same index, resulting in a repeated artifact for similar adjacent regions using existing decoder architectures.
1 code implementation • 19 Sep 2022 • Jing Liu, Zizheng Pan, Haoyu He, Jianfei Cai, Bohan Zhuang
To this end, we propose a new binarization paradigm customized to high-dimensional softmax attention via kernelized hashing, called EcoFormer, to map the original queries and keys into low-dimensional binary codes in Hamming space.
no code implementations • 23 Aug 2022 • Jing Liu, Jianfei Cai, Bohan Zhuang
During architecture search, these methods focus on finding architectures on the Pareto frontier of performance and resource consumption, which forms a gap between training and deployment.
1 code implementation • 20 Jul 2022 • Qianyi Wu, Xian Liu, Yuedong Chen, Kejie Li, Chuanxia Zheng, Jianfei Cai, Jianmin Zheng
This paper proposes a novel framework, ObjectSDF, to build an object-compositional neural implicit representation with high fidelity in 3D reconstruction and object representation.
no code implementations • 19 Jul 2022 • Zhonghua Wu, Yicheng Wu, Guosheng Lin, Jianfei Cai, Chen Qian
Weakly supervised point cloud segmentation, i.e., semantically segmenting a point cloud with only a few labeled points in the whole 3D scene, is highly desirable due to the heavy burden of collecting abundant dense annotations for model training.
5 code implementations • 26 May 2022 • Zizheng Pan, Jianfei Cai, Bohan Zhuang
Therefore, we propose to disentangle the high/low frequency patterns in an attention layer by separating the heads into two groups, where one group encodes high frequencies via self-attention within each local window, and another group encodes low frequencies by performing global attention between the average-pooled low-frequency keys and values from each window and each query position in the input feature map.
Ranked #254 on Image Classification on ImageNet
no code implementations • CVPR 2023 • Hengcan Shi, Munawar Hayat, Jianfei Cai
Effectively encoding multi-scale contextual information is crucial for accurate semantic segmentation.
1 code implementation • 15 Apr 2022 • Yang Xu, Li Li, Haiyang Xu, Songfang Huang, Fei Huang, Jianfei Cai
This drawback has inspired researchers to develop a homogeneous architecture that facilitates end-to-end training. The Transformer is well suited to this role: it has proven its huge potential in both vision and language domains and can thus be used as the basic component of the visual encoder and language decoder in an IC pipeline.
no code implementations • 5 Apr 2022 • Chuanxia Zheng, Guoxian Song, Tat-Jen Cham, Jianfei Cai, Dinh Phung, Linjie Luo
In this work, we present a novel framework for pluralistic image completion that can achieve both high quality and diversity at much faster inference speed.
2 code implementations • CVPR 2023 • Haoyu He, Jianfei Cai, Zizheng Pan, Jing Liu, Jing Zhang, DaCheng Tao, Bohan Zhuang
In this paper, we propose a simple yet effective query design for semantic segmentation termed Dynamic Focus-aware Positional Queries (DFPQ), which dynamically generates positional queries conditioned on the cross-attention scores from the preceding decoder block and the positional encodings for the corresponding image features, simultaneously.
Ranked #20 on Semantic Segmentation on ADE20K
1 code implementation • 21 Mar 2022 • Yuedong Chen, Qianyi Wu, Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai
In light of recent advances in NeRF-based 3D-aware generative models, we introduce a new task, Semantic-to-NeRF translation, that aims to reconstruct a 3D scene modelled by NeRF, conditioned on one single-view semantic mask as input.
Ranked #1 on 3D-Aware Image Synthesis on CelebAMask-HQ
2 code implementations • 2 Mar 2022 • Yicheng Wu, Zhonghua Wu, Qianyi Wu, ZongYuan Ge, Jianfei Cai
The pixel-level smoothness forces the model to generate invariant results under adversarial perturbations.
no code implementations • 18 Jan 2022 • Hengcan Shi, Munawar Hayat, Jianfei Cai
To avoid the laborious annotation in conventional referring grounding, unpaired referring grounding is introduced, where the training data only contains a number of images and queries without correspondences.
no code implementations • CVPR 2022 • Hengcan Shi, Munawar Hayat, Yicheng Wu, Jianfei Cai
Firstly, we analyze CLIP for unsupervised open-category proposal generation and design an objectness score based on our empirical analysis on proposal selection.
1 code implementation • 31 Dec 2021 • Duy-Tho Le, Hengcan Shi, Hamid Rezatofighi, Jianfei Cai
Efficiently and accurately detecting people from 3D point cloud data is of great importance in many robotic and autonomous driving applications.
Ranked #1 on 3D Object Detection on KITTI Pedestrian
2 code implementations • CVPR 2022 • Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, DaCheng Tao
Learning-based optical flow estimation has been dominated by the pipeline of a cost volume with convolutions for flow regression, which is inherently limited to local correlations and thus struggles to address the long-standing challenge of large displacements.
3 code implementations • 24 Nov 2021 • Jing Liu, Jianfei Cai, Bohan Zhuang
However, the abrupt changes in quantized weights during training often lead to severe loss fluctuations and result in a sharp loss landscape, making the gradients unstable and thus degrading the performance.
3 code implementations • 23 Nov 2021 • Haoyu He, Jing Liu, Zizheng Pan, Jianfei Cai, Jing Zhang, DaCheng Tao, Bohan Zhuang
Vision Transformers (ViTs) have achieved impressive performance over various computer vision tasks.
3 code implementations • 22 Nov 2021 • Zizheng Pan, Peng Chen, Haoyu He, Jing Liu, Jianfei Cai, Bohan Zhuang
While Transformers have delivered significant performance improvements, training such networks is extremely memory intensive owing to storing all intermediate activations that are needed for gradient computation during backpropagation, especially for long sequences.
1 code implementation • 10 Nov 2021 • Chuang Lin, Yi Jiang, Jianfei Cai, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan
Vision-and-Language Navigation (VLN) is a task in which an agent must follow a language instruction to navigate to a goal position, relying on ongoing interactions with the environment while moving.
no code implementations • 29 Sep 2021 • Son Duy Dao, He Zhao, Dinh Phung, Jianfei Cai
Recently, as an effective way of learning latent representations, contrastive learning has been increasingly popular and successful in various domains.
2 code implementations • 21 Sep 2021 • Yicheng Wu, ZongYuan Ge, Donghao Zhang, Minfeng Xu, Lei Zhang, Yong Xia, Jianfei Cai
In this paper, we propose a novel mutual consistency network (MC-Net+) to effectively exploit the unlabeled data for semi-supervised medical image segmentation.
1 code implementation • ICCV 2021 • Daxuan Ren, Jianmin Zheng, Jianfei Cai, Jiatong Li, Haiyong Jiang, Zhongang Cai, Junzhe Zhang, Liang Pan, Mingyuan Zhang, Haiyu Zhao, Shuai Yi
Generating an interpretable and compact representation of 3D shapes from point clouds is an important and challenging problem.
no code implementations • ICCV 2021 • Xu Yang, Chongyang Gao, Hanwang Zhang, Jianfei Cai
We propose an Auto-Parsing Network (APN) to discover and exploit the input data's hidden tree structures for improving the effectiveness of the Transformer-based vision-language systems.
no code implementations • 19 Aug 2021 • Tao He, Lianli Gao, Jingkuan Song, Jianfei Cai, Yuan-Fang Li
Scene graphs provide valuable information to many downstream tasks.
1 code implementation • ICCV 2021 • Zhonghua Wu, Xiangxi Shi, Guosheng Lin, Jianfei Cai
To explicitly learn meta-class representations in the few-shot segmentation task, we propose a novel Meta-class Memory based few-shot segmentation method (MM-Net), in which we introduce a set of learnable memory embeddings to memorize the meta-class information during base class training and transfer it to novel classes during the inference stage.
no code implementations • 3 Aug 2021 • Jing Liu, Bohan Zhuang, Mingkui Tan, Xu Liu, Dinh Phung, Yuanqing Li, Jianfei Cai
More critically, EAS is able to find compact architectures within 0.1 second for 50 deployment scenarios.
no code implementations • 27 Jul 2021 • Xiangxi Shi, Zhonghua Wu, Guosheng Lin, Jianfei Cai, Shafiq Joty
Therefore, in this paper, we propose a memory-based Image Manipulation Network (MIM-Net), where a set of memories learned from images is introduced to synthesize the texture information with the guidance of the textual description.
1 code implementation • 26 Jul 2021 • Yuedong Chen, Xu Yang, Tat-Jen Cham, Jianfei Cai
In this work, we scrutinize this problem from the perspective of causal inference, where such a dataset characteristic is termed a confounder, which misleads the system into learning spurious correlations.
no code implementations • 24 Jul 2021 • Son D. Dao, Ethan Zhao, Dinh Phung, Jianfei Cai
Recently, as an effective way of learning latent representations, contrastive learning has been increasingly popular and successful in various domains.
1 code implementation • CVPR 2021 • JianFeng Wang, Thomas Lukasiewicz, Xiaolin Hu, Jianfei Cai, Zhenghua Xu
Imbalanced datasets are common in practice and are a great challenge for training deep neural models that generalize well to infrequent classes.
Ranked #16 on Long-tail Learning on Places-LT
2 code implementations • 29 May 2021 • Zizheng Pan, Bohan Zhuang, Haoyu He, Jing Liu, Jianfei Cai
Transformers have become one of the dominant architectures in deep learning, particularly as a powerful alternative to convolutional neural networks (CNNs) in computer vision.
1 code implementation • 4 May 2021 • Haoyu He, Bohan Zhuang, Jing Zhang, Jianfei Cai, DaCheng Tao
To address three main challenges in OSHP, i.e., small sizes, testing bias, and similar parts, we devise an End-to-end One-shot human Parsing Network (EOP-Net).
1 code implementation • ICCV 2021 • Haofei Xu, Jiaolong Yang, Jianfei Cai, Juyong Zhang, Xin Tong
Optical flow is inherently a 2D search problem, and thus the computational complexity grows quadratically with respect to the search window, making large-displacement matching infeasible for high-resolution images.
1 code implementation • 12 Apr 2021 • Chuanxia Zheng, Duy-Son Dao, Guoxian Song, Tat-Jen Cham, Jianfei Cai
In this work, we propose a higher-level scene understanding system to tackle both visible and invisible parts of objects and backgrounds in a given scene.
2 code implementations • CVPR 2021 • Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai
We propose a novel spatially-correlative loss that is simple, efficient and yet effective for preserving scene structure consistency while supporting large appearance changes during unpaired image-to-image (I2I) translation.
1 code implementation • CVPR 2022 • Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai, Dinh Phung
Bridging global context interactions correctly is important for high-fidelity image completion with large masks.
Ranked #2 on Image Inpainting on FFHQ 512 x 512
2 code implementations • ICCV 2021 • Zizheng Pan, Bohan Zhuang, Jing Liu, Haoyu He, Jianfei Cai
However, the routine of the current ViT model is to maintain a full-length patch sequence during inference, which is redundant and lacks hierarchical representation.
Ranked #706 on Image Classification on ImageNet
no code implementations • CVPR 2021 • Xu Yang, Hanwang Zhang, GuoJun Qi, Jianfei Cai
Specifically, CATT is implemented as a combination of 1) In-Sample Attention (IS-ATT) and 2) Cross-Sample Attention (CS-ATT), where the latter forcibly brings other samples into every IS-ATT, mimicking the causal intervention.
3 code implementations • 4 Mar 2021 • Yicheng Wu, Minfeng Xu, ZongYuan Ge, Jianfei Cai, Lei Zhang
Such mutual consistency encourages the two decoders to have consistent and low-entropy predictions and enables the model to gradually capture generalized features from these unlabeled challenging regions.
no code implementations • 13 Jan 2021 • Jing Liu, Bohan Zhuang, Peng Chen, Chunhua Shen, Jianfei Cai, Mingkui Tan
By jointly training the binary gates in conjunction with network parameters, the compression configurations of each layer can be automatically determined.
no code implementations • ICCV 2021 • Chuang Lin, Zehuan Yuan, Sicheng Zhao, Peize Sun, Changhu Wang, Jianfei Cai
By disentangling representations on both image and instance levels, DIDN is able to learn domain-invariant representations that are suitable for generalized object detection.
no code implementations • ICCV 2021 • Yujun Cai, Yiwei Wang, Yiheng Zhu, Tat-Jen Cham, Jianfei Cai, Junsong Yuan, Jun Liu, Chuanxia Zheng, Sijie Yan, Henghui Ding, Xiaohui Shen, Ding Liu, Nadia Magnenat Thalmann
Notably, by considering this problem as a conditional generation process, we estimate a parametric distribution of the missing regions based on the input conditions, from which to sample and synthesize the full motion series.
no code implementations • NeurIPS 2020 • Jiuxiang Gu, Jason Kuen, Shafiq Joty, Jianfei Cai, Vlad Morariu, Handong Zhao, Tong Sun
Structured representations of images that model visual relationships are beneficial for many vision and vision-language applications.
no code implementations • ECCV 2020 • Xiangxi Shi, Xu Yang, Jiuxiang Gu, Shafiq Joty, Jianfei Cai
In this paper, we propose a novel visual encoder to explicitly distinguish viewpoint changes from semantic changes in the change captioning task.
no code implementations • 13 Aug 2020 • Keyu Chen, Jianmin Zheng, Jianfei Cai, Juyong Zhang
The problem of deforming an artist-drawn caricature according to a given normal face expression is of interest in applications such as social media, animation and entertainment.
no code implementations • 6 Aug 2020 • Thanh Nguyen-Duc, He Zhao, Jianfei Cai, Dinh Phung
To interpret the teacher model and assist the learning of the student, an explainer module is introduced to highlight the regions of an input that are important for the predictions of the teacher model.
no code implementations • 13 Jul 2020 • Yucan Zhou, Yu Wang, Jianfei Cai, Yu Zhou, QinGhua Hu, Weiping Wang
Some works in the optimization of deep neural networks have shown that a better arrangement of training data can make the classifier converge faster and perform better.
no code implementations • 13 Jun 2020 • Tao He, Lianli Gao, Jingkuan Song, Jianfei Cai, Yuan-Fang Li
Despite the huge progress in scene graph generation in recent years, its long-tail distribution in object relationships remains a challenging and pestering issue.
no code implementations • 12 Apr 2020 • Koteswar Rao Jerripothula, Jianfei Cai, Jiangbo Lu, Junsong Yuan
Object skeletonization in a single natural image is a challenging problem because there is hardly any prior knowledge about the object.
no code implementations • CVPR 2020 • Zhonghua Wu, Qingyi Tao, Guosheng Lin, Jianfei Cai
To reduce the human labeling effort, we propose a novel webly supervised object detection (WebSOD) method for novel classes which only requires the web images without further annotations.
1 code implementation • 18 Mar 2020 • Zhiwen Shao, Zhilei Liu, Jianfei Cai, Lizhuang Ma
Moreover, to extract precise local features, we propose an adaptive attention learning module to refine the attention map of each AU adaptively.
no code implementations • 9 Mar 2020 • Xu Yang, Hanwang Zhang, Jianfei Cai
The dataset bias in vision-language tasks is becoming one of the main problems that hinder the progress of our community.
no code implementations • 6 Mar 2020 • Yuedong Chen, Guoxian Song, Zhiwen Shao, Jianfei Cai, Tat-Jen Cham, Jianming Zheng
Automatic facial action unit (AU) recognition has attracted great attention but still remains a challenging task, as subtle changes of local facial muscles are difficult to thoroughly capture.
no code implementations • 5 Jan 2020 • Zhiwen Shao, Yong Zhou, Jianfei Cai, Hancheng Zhu, Rui Yao
Specifically, we propose an adaptive attention regression network to regress the global attention map of each AU under the constraint of attention predefinition and the guidance of AU detection, which is beneficial for capturing both specified dependencies by landmarks in strongly correlated regions and facial globally distributed dependencies in weakly correlated regions.
no code implementations • 27 Nov 2019 • Guoxian Song, Jianmin Zheng, Jianfei Cai, Tat-Jen Cham
While the problem of estimating shapes and diffuse reflectances of human faces from images has been extensively studied, relatively little work has been done on recovering the specular albedo.
1 code implementation • 14 Sep 2019 • Haitao Liu, Yew-Soon Ong, Ziwei Yu, Jianfei Cai, Xiaobo Shen
Gaussian process classification (GPC) provides a flexible and powerful statistical framework describing joint distributions over function space.
no code implementations • 21 Jul 2019 • Xiangxi Shi, Jianfei Cai, Shafiq Joty, Jiuxiang Gu
With the rapid growth of video data and the increasing demands of various applications, such as intelligent video search and assistance for visually impaired people, the video captioning task has recently received a lot of attention in the computer vision and natural language processing fields.
1 code implementation • 9 Jul 2019 • Qingyi Tao, ZongYuan Ge, Jianfei Cai, Jianxiong Yin, Simon See
Secondly, in CT scans, the lesions are often indistinguishable from the background since the lesion and non-lesion areas may have very similar appearances.
1 code implementation • 14 May 2019 • Boyi Jiang, Juyong Zhang, Jianfei Cai, Jianmin Zheng
Human bodies exhibit various shapes for different identities or poses, but the body shape has certain similarities in structure and thus can be embedded in a low-dimensional space.
no code implementations • 6 May 2019 • Lu Sheng, Jianfei Cai, Tat-Jen Cham, Vladimir Pavlovic, King Ngi Ngan
In this paper, we propose a generative framework that unifies depth-based 3D facial pose tracking and face model adaptation on the fly, in unconstrained scenarios with heavy occlusions and arbitrary facial expression variations.
no code implementations • ICCV 2019 • Xu Yang, Hanwang Zhang, Jianfei Cai
To this end, we make the following technical contributions for CNM training: 1) a compact module design, with one module for function words and three for visual content words (e.g., noun, adjective, and verb); 2) soft module fusion and multi-step module execution, which make visual reasoning robust under partial observation; 3) a linguistic loss that keeps the module controller faithful to part-of-speech collocations (e.g., an adjective comes before a noun).
no code implementations • 9 Apr 2019 • Kenta Hama, Takashi Matsubara, Kuniaki Uehara, Jianfei Cai
With the wide development of black-box machine learning algorithms, particularly deep neural networks (DNNs), the practical demand for reliability assessment is rapidly rising.
no code implementations • CVPR 2019 • Jiuxiang Gu, Handong Zhao, Zhe Lin, Sheng Li, Jianfei Cai, Mingyang Ling
Scene graph generation has received growing attention with the advancements in image understanding tasks such as object detection, attribute and relationship prediction, etc.
no code implementations • ICCV 2019 • Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Handong Zhao, Xu Yang, Gang Wang
Most current image captioning models rely heavily on paired image-caption datasets.
1 code implementation • 25 Mar 2019 • Zhiwen Shao, Jianfei Cai, Tat-Jen Cham, Xuequan Lu, Lizhuang Ma
Due to the combination of source AU-related information and target AU-free information, the latent feature domain with transferred source label can be learned by maximizing the target-domain AU detection performance.
1 code implementation • CVPR 2019 • Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai
In this paper, we present an approach for pluralistic image completion: the task of generating multiple and diverse plausible solutions for image completion.
2 code implementations • CVPR 2019 • Liuhao Ge, Zhou Ren, Yuncheng Li, Zehao Xue, Yingying Wang, Jianfei Cai, Junsong Yuan
This work addresses a novel and challenging problem of estimating the full 3D hand shape and pose from a single RGB image.
no code implementations • 1 Mar 2019 • Bo Hu, Jianfei Cai, Tat-Jen Cham, Junsong Yuan
Previous spatial-temporal action localization methods commonly follow the pipeline of object detection to estimate bounding boxes and labels of actions.
1 code implementation • 26 Feb 2019 • Haofei Xu, Jianmin Zheng, Jianfei Cai, Juyong Zhang
In this paper, we propose a new learning based method consisting of DepthNet, PoseNet and Region Deformer Networks (RDN) to estimate depth from unconstrained monocular videos without ground truth supervision.
4 code implementations • 23 Feb 2019 • Yuedong Chen, Jian-Feng Wang, Shikai Chen, Zhongchao Shi, Jianfei Cai
Deep learning based facial expression recognition (FER) has received a lot of attention in the past few years.
Ranked #2 on Facial Expression Recognition (FER) on MMI
no code implementations • 21 Jan 2019 • Guoxian Song, Jianfei Cai, Tat-Jen Cham, Jianmin Zheng, Juyong Zhang, Henry Fuchs
Teleconferencing or telepresence based on virtual reality (VR) head-mounted display (HMD) devices is a very interesting and promising application, since an HMD can provide an immersive experience for users.
1 code implementation • CVPR 2019 • Xu Yang, Kaihua Tang, Hanwang Zhang, Jianfei Cai
We propose Scene Graph Auto-Encoder (SGAE) that incorporates the language inductive bias into the encoder-decoder image captioning framework for more human-like captions.
no code implementations • 21 Nov 2018 • Zhonghua Wu, Guosheng Lin, Qingyi Tao, Jianfei Cai
Instead, we present a novel virtual Try-On network, M2E-Try On Net, which transfers the clothes from a model image to a person image without the need of any clean product images.
no code implementations • 3 Nov 2018 • Haitao Liu, Yew-Soon Ong, Jianfei Cai
To improve the scalability, we first develop a variational sparse inference algorithm, named VSHGP, to handle large-scale datasets.
no code implementations • 3 Nov 2018 • Haitao Liu, Jianfei Cai, Yew-Soon Ong, Yi Wang
This paper investigates the methodological characteristics and performance of representative global and local scalable GPs, including sparse approximations and local aggregations, from four main perspectives: scalability, capability, controllability and robustness.
no code implementations • 14 Sep 2018 • Zhonghua Wu, Guosheng Lin, Jianfei Cai
We develop an iterative learning method to generate pseudo part segmentation masks from keypoint labels.
no code implementations • ECCV 2018 • Yujun Cai, Liuhao Ge, Jianfei Cai, Junsong Yuan
Compared with depth-based 3D hand pose estimation, it is more challenging to infer 3D hand pose from monocular RGB images, due to substantial depth ambiguity and the difficulty of obtaining fully-annotated training data.
no code implementations • ECCV 2018 • Pradeep Kumar Jayaraman, Jianhan Mei, Jianfei Cai, Jianmin Zheng
Specifically, the computational and memory costs in QCNN grow linearly in the number of non-zero pixels, as opposed to traditional CNNs where the costs are quadratic in the number of pixels.
no code implementations • 10 Aug 2018 • Zhiwen Shao, Zhilei Liu, Jianfei Cai, Yunsheng Wu, Lizhuang Ma
By finding the region of interest of each AU with the attention mechanism, AU-related local features can be captured.
1 code implementation • ECCV 2018 • Chuanxia Zheng, Tat-Jen Cham, Jianfei Cai
Current methods for single-image depth estimation use training datasets with real image-depth pairs or stereo pairs, which are not easy to acquire.
Ranked #3 on Depth Estimation on eBDtheque
1 code implementation • ECCV 2018 • Xu Yang, Hanwang Zhang, Jianfei Cai
By "agnostic", we mean that the feature is less likely biased to the classes of paired objects.
no code implementations • 8 Jul 2018 • Xiangxi Shi, Jianfei Cai, Jiuxiang Gu, Shafiq Joty
In this paper, we propose a boundary-aware hierarchical language decoder for video captioning, which consists of a high-level GRU based language decoder, working as a global (caption-level) language model, and a low-level GRU based language decoder, working as a local (phrase-level) language model.
no code implementations • 3 Jul 2018 • Haitao Liu, Yew-Soon Ong, Xiaobo Shen, Jianfei Cai
The review of scalable GPs in the GP community is timely and important due to the explosion of data size.
1 code implementation • ICML 2018 • Haitao Liu, Jianfei Cai, Yi Wang, Yew-Soon Ong
In order to scale standard Gaussian process (GP) regression to large-scale datasets, aggregation models employ a factorized training process and then combine predictions from distributed experts.
no code implementations • ECCV 2018 • Qing Li, Qingyi Tao, Shafiq Joty, Jianfei Cai, Jiebo Luo
Most existing works in visual question answering (VQA) are dedicated to improving the accuracy of predicted answers, while disregarding the explanations.
1 code implementation • CVPR 2018 • Qianyi Wu, Juyong Zhang, Yu-Kun Lai, Jianmin Zheng, Jianfei Cai
Caricature is an art form that expresses subjects in an abstract, simple and exaggerated view.
1 code implementation • ECCV 2018 • Zhiwen Shao, Zhilei Liu, Jianfei Cai, Lizhuang Ma
Facial action unit (AU) detection and face alignment are two highly correlated tasks since facial landmarks can provide precise AU locations to facilitate the extraction of meaningful local features for AU detection.
Ranked #3 on Facial Action Unit Detection on DISFA
no code implementations • ECCV 2018 • Jiuxiang Gu, Shafiq Joty, Jianfei Cai, Gang Wang
Image captioning is a multimodal task involving computer vision and natural language processing, where the goal is to learn a mapping from the image to its natural language description.
no code implementations • 7 Mar 2018 • Tianyi Zhang, Guosheng Lin, Jianfei Cai, Tong Shen, Chunhua Shen, Alex C. Kot
In our work, we focus on weakly supervised semantic segmentation with image label annotations.
no code implementations • 21 Feb 2018 • Zhilei Liu, Guoxian Song, Jianfei Cai, Tat-Jen Cham, Juyong Zhang
Employing deep learning-based approaches for fine-grained facial expression analysis, such as those involving the estimation of Action Unit (AU) intensities, is difficult due to the lack of a large-scale dataset of real faces with sufficiently diverse AU labels for training.
no code implementations • CVPR 2018 • Jiuxiang Gu, Jianfei Cai, Shafiq Joty, Li Niu, Gang Wang
Textual-visual cross-modal retrieval has been a hot research topic in both the computer vision and natural language processing communities.
no code implementations • 16 Nov 2017 • Li Niu, Jianfei Cai, Ashok Veeraraghavan
Zero-Shot Learning (ZSL) aims to classify a test instance from an unseen category based on the training instances from seen categories, in which the gap between seen categories and unseen categories is generally bridged via visual-semantic mapping between the low-level visual feature space and the intermediate semantic space.
no code implementations • ECCV 2018 • Qingyi Tao, Hao Yang, Jianfei Cai
Object detection is one of the major problems in computer vision, and has been extensively studied.
1 code implementation • 11 Sep 2017 • Jiuxiang Gu, Jianfei Cai, Gang Wang, Tsuhan Chen
On the other hand, a multi-stage image caption model is hard to train due to the vanishing gradient problem.
no code implementations • 3 Aug 2017 • Yudong Guo, Juyong Zhang, Jianfei Cai, Boyi Jiang, Jianmin Zheng
With the power of convolutional neural networks (CNNs), CNN-based face reconstruction has recently shown promising performance in reconstructing detailed face shapes from 2D face images.
no code implementations • 27 Jul 2017 • Qingyi Tao, Hao Yang, Jianfei Cai
Object detection without bounding box annotations, i.e., weakly supervised detection, is still lagging far behind.
Ranked #17 on Weakly Supervised Object Detection on PASCAL VOC 2012 test (using extra training data)
no code implementations • CVPR 2017 • Lu Sheng, Jianfei Cai, Tat-Jen Cham, Vladimir Pavlovic, King Ngi Ngan
We consider the problem of depth-based robust 3D facial pose tracking under unconstrained scenarios with heavy occlusions and arbitrary facial expression variations.
no code implementations • CVPR 2017 • Koteswar Rao Jerripothula, Jianfei Cai, Jiangbo Lu, Junsong Yuan
Recent advances in the joint processing of images have clearly shown its advantages over individual processing.
no code implementations • 12 Jun 2017 • Artsiom Ablavatski, Shijian Lu, Jianfei Cai
We design an Enriched Deep Recurrent Visual Attention Model (EDRAM) - an improved attention-based architecture for multiple object recognition.
no code implementations • 15 May 2017 • Andrei Polzounov, Artsiom Ablavatski, Sergio Escalera, Shijian Lu, Jianfei Cai
In recent years, text recognition has achieved remarkable success in recognizing scanned document text.
no code implementations • CVPR 2017 • Hao Yang, Joey Tianyi Zhou, Jianfei Cai, Yew Soon Ong
As the proposed PI loss is convex and SGD-compatible and the framework itself is a fully convolutional network, MIML-FCN+ can be easily integrated with state-of-the-art deep learning networks.
2 code implementations • ICCV 2017 • Jiuxiang Gu, Gang Wang, Jianfei Cai, Tsuhan Chen
Language Models based on recurrent neural networks have dominated recent image caption generation tasks.
no code implementations • 4 Aug 2016 • Hao Yang, Joey Tianyi Zhou, Jianfei Cai
Experimental results demonstrate the effectiveness of the proposed semantic descriptor and the usefulness of incorporating the structured semantic correlations.
no code implementations • CVPR 2016 • Anran Wang, Jianfei Cai, Jiwen Lu, Tat-Jen Cham
While convolutional neural networks (CNNs) have been excellent for object recognition, the greater spatial variability in scene images typically means that standard full-image CNN features are suboptimal for scene classification.
no code implementations • 22 Dec 2015 • Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu, Xingxing Wang, Li Wang, Gang Wang, Jianfei Cai, Tsuhan Chen
In the last few years, deep learning has led to very good performance on a variety of problems, such as visual recognition, speech recognition and natural language processing.
no code implementations • ICCV 2015 • Anran Wang, Jianfei Cai, Jiwen Lu, Tat-Jen Cham
We first construct deep CNN layers for color and depth separately, and then connect them with our carefully designed multi-modal layers, which fuse color and depth information by enforcing a common part to be shared by features of different modalities.
no code implementations • 16 Jul 2015 • Hongyuan Zhu, Shijian Lu, Jianfei Cai, Quangqing Lee
Recently, Hosang et al. conducted the first unified study of existing methods in terms of various image-level degradations.
no code implementations • 10 Jul 2015 • Hai X. Pham, Chongyu Chen, Luc N. Dao, Vladimir Pavlovic, Jianfei Cai, Tat-Jen Cham
We introduce a novel robust hybrid 3D face tracking framework from RGBD video streams, which is capable of tracking head pose and facial actions without pre-calibration or intervention from a user.
no code implementations • CVPR 2016 • Hao Yang, Joey Tianyi Zhou, Yu Zhang, Bin-Bin Gao, Jianxin Wu, Jianfei Cai
With strong labels, our framework is able to achieve state-of-the-art results on both datasets.
Ranked #16 on Multi-Label Classification on PASCAL VOC 2007
no code implementations • 20 Apr 2015 • Yu Zhang, Xiu-Shen Wei, Jianxin Wu, Jianfei Cai, Jiangbo Lu, Viet-Anh Nguyen, Minh N. Do
Most existing works rely heavily on object/part detectors to build the correspondence between object parts by using object or object part annotations inside training images.
no code implementations • 3 Feb 2015 • Hongyuan Zhu, Fanman Meng, Jianfei Cai, Shijian Lu
Image segmentation refers to the process of dividing an image into non-overlapping meaningful regions according to human perception, and has been a classic topic since the early days of computer vision.
no code implementations • CVPR 2014 • Di Xu, Qi Duan, Jianming Zheng, Juyong Zhang, Jianfei Cai, Tat-Jen Cham
As a result, our approach is robust and stable, and is able to efficiently recover high-quality surface details even when starting with a coarse MVS.
no code implementations • CVPR 2014 • Yu Zhang, Jianxin Wu, Jianfei Cai
In spite of the popularity of various feature compression methods, this paper argues that feature selection is a better choice than feature compression.