1 code implementation • 18 Mar 2023 • Yuhang Yang, Wei Zhai, Hongchen Luo, Yang Cao, Jiebo Luo, Zheng-Jun Zha
Comprehensive experiments on PIAD demonstrate the reliability of the proposed task and the superiority of our method.
1 code implementation • 18 Mar 2023 • Pingyu Wu, Wei Zhai, Yang Cao, Jiebo Luo, Zheng-Jun Zha
Specifically, a spatial token is first introduced in the input space to aggregate representations for the localization task.
no code implementations • 21 Nov 2022 • Ruili Feng, Kecheng Zheng, Kai Zhu, Yujun Shen, Jian Zhao, Yukun Huang, Deli Zhao, Jingren Zhou, Michael Jordan, Zheng-Jun Zha
Through investigating the properties of the problem solution, we confirm that neural dependency is guaranteed by a redundant logit covariance matrix, a condition that is easily met given massive categories, and that neural dependency is highly sparse, implying that one category correlates to only a few others.
1 code implementation • 18 Jul 2022 • Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Zechao Li, Qi Tian, Qingming Huang
Second, most previous weakly supervised REG methods ignore the discriminative location and context of the referent, causing difficulties in distinguishing the target from other same-category objects.
no code implementations • 15 Jul 2022 • Naishan Zheng, Jie Huang, Qi Zhu, Man Zhou, Feng Zhao, Zheng-Jun Zha
Low-light image enhancement is an inherently subjective process whose targets vary with the user's aesthetic.
no code implementations • 13 Jun 2022 • Ruili Feng, Kecheng Zheng, Yukun Huang, Deli Zhao, Michael Jordan, Zheng-Jun Zha
By virtue of our numerical tools, we provide the first empirical analysis of the per-layer behavior of network rank in practical settings, i.e., ResNets, deep MLPs, and Transformers on ImageNet.
no code implementations • 10 Jun 2022 • Jingyi Xie, Jiawei Liu, Zheng-Jun Zha
LNMT leverages unlabeled news and feedback comments of users to enlarge the amount of training data and facilitates model training by generating refined labels as weak supervision.
1 code implementation • CVPR 2022 • Shaofei Cai, Liang Li, Xinzhe Han, Jiebo Luo, Zheng-Jun Zha, Qingming Huang
However, the currently used graph search space overemphasizes learning node features and neglects mining hierarchical relational information.
Ranked #1 on Link Prediction on TSP/HCP Benchmark set
no code implementations • 21 May 2022 • Ruili Feng, Jie Xiao, Kecheng Zheng, Deli Zhao, Jingren Zhou, Qibin Sun, Zheng-Jun Zha
Humans can extrapolate well, generalize daily knowledge to unseen scenarios, and raise and answer counterfactual questions.
no code implementations • CVPR 2022 • Xihao Chen, Zhiwei Xiong, Zhen Cheng, Jiayong Peng, Yueyi Zhang, Zheng-Jun Zha
Interestingly, we find that, although a stereo matching network trained with the photometric loss is not optimal, its feature extractor can produce degradation-agnostic and matching-specific features.
no code implementations • 2 Apr 2022 • Zhenhuan Liu, Liang Li, Huajie Jiang, Xin Jin, Dandan Tu, Shuhui Wang, Zheng-Jun Zha
Furthermore, we devise the spatio-temporal correlative map as a style-independent, global-aware regularization on the perceptual motion consistency.
no code implementations • 24 Mar 2022 • Kecheng Zheng, Yang Cao, Kai Zhu, Ruijing Zhao, Zheng-Jun Zha
However, its generalization performance to heterogeneous tasks is inferior to other architectures (e.g., CNNs and transformers) due to the extensive retention of domain information.
no code implementations • 22 Mar 2022 • Jinze Chen, Yang Wang, Yang Cao, Feng Wu, Zheng-Jun Zha
The Dynamic Vision Sensor (DVS) can asynchronously output events reflecting the apparent motion of objects with microsecond resolution, and shows great application potential in monitoring and other fields.
1 code implementation • 18 Mar 2022 • Yangyang Li, Wei Zhai, Yang Cao, Zheng-Jun Zha
However, these methods struggle in 1) efficiently generating camouflage images using foreground and background with arbitrary structure; and 2) camouflaging foreground objects to regions with multiple appearances (e.g., the junction of the vegetation and the mountains), which limits their practical application.
2 code implementations • CVPR 2022 • Kai Zhu, Wei Zhai, Yang Cao, Jiebo Luo, Zheng-Jun Zha
Non-exemplar class-incremental learning is to recognize both the old and new classes when old class samples cannot be saved.
1 code implementation • CVPR 2022 • Jiayu Xiao, Liang Li, Chaofei Wang, Zheng-Jun Zha, Qingming Huang
A feasible solution is to start with a GAN well-trained on a large scale source domain and adapt it to the target domain with a few samples, termed as few shot generative model adaption.
no code implementations • 3 Mar 2022 • Zhipeng Huang, Jiawei Liu, Liang Li, Kecheng Zheng, Zheng-Jun Zha
RGB-infrared person re-identification is an emerging cross-modality re-identification task, which is very challenging due to significant modality discrepancy between RGB and infrared images.
no code implementations • 3 Mar 2022 • Jiawei Liu, Zhipeng Huang, Liang Li, Kecheng Zheng, Zheng-Jun Zha
In this paper, we propose a novel Debiased Batch Normalization via Gaussian Process approach (GDNorm) for generalizable person re-identification, which models the feature statistic estimation from BN layers as a dynamically self-refining Gaussian process to alleviate the bias to unseen domain for improving the generalization.
Generalizable Person Re-identification
Representation Learning
1 code implementation • AAAI 2022 • Yurui Zhu, Zeyu Xiao, Yanchi Fang, Xueyang Fu, Zhiwei Xiong, Zheng-Jun Zha
To address these issues, we first propose a new shadow illumination model for the shadow removal task.
2 code implementations • CVPR 2022 • Yurui Zhu, Jie Huang, Xueyang Fu, Feng Zhao, Qibin Sun, Zheng-Jun Zha
Shadow removal, which aims to restore the background in the shadow regions, is challenging due to its highly ill-posed nature.
no code implementations • CVPR 2022 • Ganchao Tan, Yang Wang, Han Han, Yang Cao, Feng Wu, Zheng-Jun Zha
To recognize words from the event data, we propose a novel Multi-grained Spatio-Temporal Features Perceived Network (MSTP) to perceive fine-grained spatio-temporal features from microsecond time-resolved event data.
no code implementations • CVPR 2022 • Wei Wu, Jiawei Liu, Kecheng Zheng, Qibin Sun, Zheng-Jun Zha
Image-to-video person re-identification aims to retrieve the same pedestrian as the image-based query from a video-based gallery set.
Image-To-Video Person Re-Identification
reinforcement-learning
no code implementations • CVPR 2022 • Zhipeng Huang, Zhizheng Zhang, Cuiling Lan, Wenjun Zeng, Peng Chu, Quanzeng You, Jiang Wang, Zicheng Liu, Zheng-Jun Zha
In this paper, to address more practical scenarios, we propose a new task, Lifelong Unsupervised Domain Adaptive (LUDA) person ReID.
Domain Adaptive Person Re-Identification
Knowledge Distillation
1 code implementation • 27 Nov 2021 • Kecheng Zheng, Jiawei Liu, Wei Wu, Liang Li, Zheng-Jun Zha
The calibrated person representation is subtly decomposed into the identity-relevant feature, domain feature, and the remaining entangled one.
Domain Generalization
Generalizable Person Re-identification
1 code implementation • CVPR 2022 • Yaya Shi, Xu Yang, Haiyang Xu, Chunfeng Yuan, Bing Li, Weiming Hu, Zheng-Jun Zha
The datasets will be released to facilitate the development of video captioning metrics.
no code implementations • 3 Sep 2021 • Shaofei Cai, Liang Li, Xinzhe Han, Zheng-Jun Zha, Qingming Huang
Recently, researchers have studied neural architecture search (NAS) to reduce the dependence on human expertise and explore better GNN architectures, but they over-emphasize entity features and ignore latent relation information concealed in the edges.
1 code implementation • 30 Aug 2021 • Yucheng Zhao, Guangting Wang, Chuanxin Tang, Chong Luo, Wenjun Zeng, Zheng-Jun Zha
Convolutional neural networks (CNN) are the dominant deep neural network (DNN) architecture for computer vision.
no code implementations • 26 Aug 2021 • Hao Wang, Zheng-Jun Zha, Liang Li, Xuejin Chen, Jiebo Luo
We propose a novel MultiModulation Network (M2N) to learn the above correlation and leverage it as semantic guidance to modulate the related auditory, visual, and fused features.
no code implementations • ICCV 2021 • Yucheng Zhao, Guangting Wang, Chong Luo, Wenjun Zeng, Zheng-Jun Zha
In this paper, we propose a novel contrastive mask prediction (CMP) task for visual representation learning and design a mask contrast (MaskCo) framework to implement the idea.
1 code implementation • ICCV 2021 • Heliang Zheng, Huan Yang, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo
And the reference space is optimized to capture deep image priors that are useful for quality assessment.
no code implementations • 31 Jul 2021 • Kecheng Zheng, Cuiling Lan, Wenjun Zeng, Jiawei Liu, Zhizheng Zhang, Zheng-Jun Zha
Occluded person re-identification (ReID) aims to match person images with occlusion.
1 code implementation • 27 Jul 2021 • Wen Wang, Yang Cao, Jing Zhang, Fengxiang He, Zheng-Jun Zha, Yonggang Wen, DaCheng Tao
In DQFA, a novel domain query is used to aggregate and align global context from the token sequence of both domains.
1 code implementation • CVPR 2021 • Kai Zhu, Yang Cao, Wei Zhai, Jie Cheng, Zheng-Jun Zha
Few-shot class-incremental learning is to recognize the new classes given few samples and not forget the old classes.
2 code implementations • 7 Jul 2021 • Zehui Chen, Chenhongyi Yang, Qiaofei Li, Feng Zhao, Zheng-Jun Zha, Feng Wu
Extensive experiments on the MS COCO benchmark show that our approach can lead to 2.0 mAP, 2.4 mAP, and 2.2 mAP absolute improvements on RetinaNet, FCOS, and ATSS baselines with negligible extra overhead.
no code implementations • CVPR 2021 • Hao Wang, Zheng-Jun Zha, Liang Li, Dong Liu, Jiebo Luo
In particular, for cross-modal interaction, the sentence-level query interacts with the whole moment while the word-level query interacts with content and boundary, in a coarse-to-fine manner.
no code implementations • CVPR 2021 • Man Zhou, Jie Xiao, Yifan Chang, Xueyang Fu, Aiping Liu, Jinshan Pan, Zheng-Jun Zha
The proposed model is capable of achieving superior performance on both inhomogeneous and incremental datasets, and is promising for highly compact systems to gradually learn myriad regularities of the different types of rain streaks.
no code implementations • CVPR 2021 • Zhen Cheng, Zhiwei Xiong, Chang Chen, Dong Liu, Zheng-Jun Zha
To fill this gap, we propose a zero-shot learning framework for light field SR, which learns a mapping to super-resolve the reference view with examples extracted solely from the input low-resolution light field itself.
no code implementations • 7 May 2021 • Jiawei Liu, Zhipeng Huang, Kecheng Zheng, Dong Liu, Xiaoyan Sun, Zheng-Jun Zha
It describes unseen target domain as a combination of the known source ones, and explicitly learns domain-specific representation with target distribution to improve the model's generalization by a meta-learning pipeline.
no code implementations • CVPR 2021 • Jiawei Liu, Zheng-Jun Zha, Wei Wu, Kecheng Zheng, Qibin Sun
The key factor for video person re-identification is to effectively exploit both spatial and temporal clues from video sequences.
no code implementations • 29 Mar 2021 • Rui Zhao, Kecheng Zheng, Zheng-Jun Zha, Hongtao Xie, Jiebo Luo
The cross-modal memory module is employed to record the instance embeddings of all the datasets for global negative mining.
1 code implementation • CVPR 2021 • Shaofei Cai, Liang Li, Jincan Deng, Beichen Zhang, Zheng-Jun Zha, Li Su, Qingming Huang
Inspired by the strong searching capability of neural architecture search (NAS) in CNN, this paper proposes Graph Neural Architecture Search (GNAS) with a newly designed search space.
1 code implementation • CVPR 2021 • Kecheng Zheng, Wu Liu, Lingxiao He, Tao Mei, Jiebo Luo, Zheng-Jun Zha
In this paper, we propose a Group-aware Label Transfer (GLT) algorithm, which enables the online interaction and mutual promotion of pseudo-label prediction and representation learning.
Domain Adaptive Person Re-Identification
Online Clustering
no code implementations • 24 Feb 2021 • Shunxin Xu, Ke Sun, Dong Liu, Zhiwei Xiong, Zheng-Jun Zha
We observe that not only denoising helps combat the drop of segmentation accuracy due to noise, but also pixel-wise semantic information boosts the capability of denoising.
no code implementations • 3 Feb 2021 • Yucheng Zhao, Dacheng Yin, Chong Luo, Zhiyuan Zhao, Chuanxin Tang, Wenjun Zeng, Zheng-Jun Zha
This paper presents a self-supervised learning framework, named MGF, for general-purpose speech representation learning.
no code implementations • 28 Jan 2021 • Yizhou Zhou, Chong Luo, Xiaoyan Sun, Zheng-Jun Zha, Wenjun Zeng
We believe that VAE$^2$ is also applicable to other stochastic sequence prediction problems where the training data lack stochasticity.
no code implementations • ICCV 2021 • Yukun Huang, Xueyang Fu, Zheng-Jun Zha
In unconstrained real-world surveillance scenarios, person re-identification (Re-ID) models usually suffer from different low-level perceptual variations, e.g., cross-resolution and insufficient lighting.
no code implementations • ICCV 2021 • Jie Xiao, Man Zhou, Xueyang Fu, Aiping Liu, Zheng-Jun Zha
Equipped with our NR algorithm, the deep model can be trained on a list of synthetic rainy datasets by overcoming catastrophic forgetting, making it a general-version de-raining network.
no code implementations • ICCV 2021 • Xueyang Fu, Xi Wang, Aiping Liu, Junwei Han, Zheng-Jun Zha
Specifically, we design a variational model to formulate the image de-blocking problem and propose two prior terms for the image content and gradient, respectively.
no code implementations • ICCV 2021 • Yao Li, Xueyang Fu, Zheng-Jun Zha
However, in practice, real noisy images are mostly of high resolution rather than small cropped patches, and vanilla training strategies ignore the cross-patch contextual dependency in the whole image.
1 code implementation • 16 Dec 2020 • Kecheng Zheng, Cuiling Lan, Wenjun Zeng, Zhizheng Zhang, Zheng-Jun Zha
Based on this finding, we propose to exploit the uncertainty (measured by consistency levels) to evaluate the reliability of the pseudo-label of a sample and incorporate the uncertainty to re-weight its contribution within various ReID losses, including the identity (ID) classification loss per sample, the triplet loss, and the contrastive loss.
Domain Adaptive Person Re-Identification
Person Re-Identification
1 code implementation • NeurIPS 2020 • Heliang Zheng, Jianlong Fu, Yanhong Zeng, Jiebo Luo, Zheng-Jun Zha
Such a model disentangles latent factors according to the semantics of feature channels by channel-/group-wise fusion of latent codes and feature channels.
no code implementations • NeurIPS 2020 • Shaobo Min, Hongtao Xie, Hantao Yao, Xuran Deng, Zheng-Jun Zha, Yongdong Zhang
In this paper, we introduce a new task, named Hierarchical Granularity Transfer Learning (HGTL), to recognize sub-level categories with basic-level annotations and semantic descriptions for hierarchical categories.
no code implementations • 10 Oct 2020 • Kecheng Zheng, Wu Liu, Jiawei Liu, Zheng-Jun Zha, Tao Mei
This hard selection strategy is able to fuse the strong-relevant multi-modality features for alleviating the problem of matching redundancy.
Ranked #9 on Text based Person Retrieval on CUHK-PEDES
no code implementations • 9 Sep 2020 • Jiawei Liu, Xierong Zhu, Zheng-Jun Zha
TALNet simultaneously exploits human attributes and appearance to learn comprehensive and effective pedestrian representations from videos.
1 code implementation • 31 Aug 2020 • Yuhang Li, Xuejin Chen, Binxin Yang, Zihan Chen, Zhihua Cheng, Zheng-Jun Zha
In this paper, we explore the task of generating photo-realistic face images from hand-drawn sketches.
1 code implementation • 10 Aug 2020 • Jing Zhang, Yang Cao, Zheng-Jun Zha, DaCheng Tao
To address this issue, we propose a novel synthetic method called 3R to simulate nighttime hazy images from daytime clear images, which first reconstructs the scene geometry, then simulates the light rays and object reflectance, and finally renders the haze effects.
1 code implementation • 17 Jul 2020 • Ganchao Tan, Daqing Liu, Meng Wang, Zheng-Jun Zha
However, existing visual reasoning methods designed for visual question answering are not appropriate to video captioning, for it requires more complex visual reasoning on videos over both space and time, and dynamic module composition along the generation process.
no code implementations • 9 May 2020 • Jun He, Richang Hong, Xueliang Liu, Mingliang Xu, Zheng-Jun Zha, Meng Wang
Metric-based few-shot learning methods concentrate on learning a transferable feature embedding that generalizes well from seen categories to unseen categories under the supervision of a limited number of labelled instances.
no code implementations • 12 Apr 2020 • Kai Zhu, Wei Zhai, Zheng-Jun Zha, Yang Cao
Few-shot segmentation aims at assigning a category label to each image pixel with few annotated samples.
1 code implementation • CVPR 2020 • Beichen Zhang, Liang Li, Shijie Yang, Shuhui Wang, Zheng-Jun Zha, Qingming Huang
In this paper, we propose a state relabeling adversarial active learning model (SRAAL), that leverages both the annotation and the labeled/unlabeled state information for deriving the most informative unlabeled samples.
no code implementations • 10 Apr 2020 • Rui Zhao, Kecheng Zheng, Zheng-Jun Zha
The dominant existing approaches to the cross-modal video-text retrieval task learn a joint embedding space to measure the cross-modal similarity.
1 code implementation • CVPR 2020 • Dechao Meng, Liang Li, Xuejing Liu, Yadong Li, Shijie Yang, Zheng-Jun Zha, Xingyu Gao, Shuhui Wang, Qingming Huang
Vehicle Re-Identification is to find images of the same vehicle from various views in the cross-camera scenario.
1 code implementation • CVPR 2020 • Yuxin Wang, Hongtao Xie, Zheng-Jun Zha, Mengting Xing, Zilong Fu, Yongdong Zhang
Then a novel Local Orthogonal Texture-aware Module (LOTM) models the local texture information of proposal features in two orthogonal directions and represents text region with a set of contour points.
no code implementations • CVPR 2020 • Yizhou Zhou, Xiaoyan Sun, Chong Luo, Zheng-Jun Zha, Wen-Jun Zeng
Based on the probability space, we further generate new fusion strategies which achieve the state-of-the-art performance on four well-known action recognition datasets.
no code implementations • CVPR 2020 • Yukun Huang, Zheng-Jun Zha, Xueyang Fu, Richang Hong, Liang Li
Person re-identification (Re-ID) in real-world scenarios usually suffers from various degradation factors, e.g., low resolution, weak illumination, blurring, and adverse weather.
no code implementations • 10 Apr 2020 • Jiawei Liu, Zheng-Jun Zha, Xierong Zhu, Na Jiang
Person re-identification aims at identifying a certain pedestrian across non-overlapping camera networks.
1 code implementation • CVPR 2020 • Dan Guo, Hui Wang, Hanwang Zhang, Zheng-Jun Zha, Meng Wang
Visual dialog is a challenging task that requires the comprehension of the semantic dependencies among implicit visual and textual contexts.
Ranked #12 on Visual Dialog on VisDial v0.9 val
1 code implementation • 30 Mar 2020 • Shaobo Min, Hantao Yao, Hongtao Xie, Zheng-Jun Zha, Yongdong Zhang
In this paper, we propose an efficient Multi-Objective Matrix Normalization (MOMN) method that can simultaneously normalize a bilinear representation in terms of square-root, low-rank, and sparsity.
1 code implementation • CVPR 2020 • Shaobo Min, Hantao Yao, Hongtao Xie, Chaoqun Wang, Zheng-Jun Zha, Yongdong Zhang
Recent methods focus on learning a unified semantic-aligned visual representation to transfer knowledge between two domains, while ignoring the effect of semantic-free visual representation in alleviating the biased recognition problem.
no code implementations • CVPR 2020 • Ziqi Zhang, Yaya Shi, Chunfeng Yuan, Bing Li, Peijin Wang, Weiming Hu, Zheng-Jun Zha
In this paper, we propose a complete video captioning system including both a novel model and an effective training strategy.
Ranked #2 on Video Captioning on VATEX (using extra training data)
no code implementations • 17 Dec 2019 • Zhao Zhang, Yulin Sun, Yang Wang, Zheng-Jun Zha, Shuicheng Yan, Meng Wang
To address this issue, we propose a novel generalized end-to-end representation learning architecture, dubbed Convolutional Dictionary Pair Learning Network (CDPL-Net) in this paper, which integrates the learning schemes of the CNN and dictionary pair learning into a unified framework.
no code implementations • 13 Dec 2019 • Yan Zhang, Zhao Zhang, Zheng Zhang, Mingbo Zhao, Li Zhang, Zheng-Jun Zha, Meng Wang
In this paper, we investigate the unsupervised deep representation learning issue and technically propose a novel framework called Deep Self-representative Concept Factorization Network (DSCF-Net), for clustering deep features.
no code implementations • 13 Dec 2019 • Jialing Lyu, Weichao Qiu, Xinyue Wei, Yi Zhang, Alan Yuille, Zheng-Jun Zha
This can explain why an activity classification model usually fails to generalize to datasets it is not trained on.
1 code implementation • NeurIPS 2019 • Kecheng Zheng, Zheng-Jun Zha, Wei Wei
Abstraction reasoning is a long-standing challenge in artificial intelligence.
no code implementations • 26 Nov 2019 • Yang Wang, Yang Cao, Zheng-Jun Zha, Jing Zhang, Zhiwei Xiong, Wei Zhang, Feng Wu
Contrast enhancement and noise removal are coupled problems for low-light image enhancement.
1 code implementation • NeurIPS 2019 • Heliang Zheng, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo
However, the computational cost to learn pairwise interactions between deep feature channels is prohibitively expensive, which restricts the use of this powerful transformation in deep neural networks.
no code implementations • 20 Oct 2019 • Yuhang Li, Xuejin Chen, Feng Wu, Zheng-Jun Zha
The large-scale discriminator enforces the completeness of global structures and the small-scale discriminator encourages fine details, thereby enhancing the realism of generated face images.
1 code implementation • 5 Sep 2019 • Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Li Su, Qingming Huang
Weakly supervised referring expression grounding (REG) aims at localizing the referential entity in an image according to linguistic query, where the mapping between the image region (proposal) and the query is unknown in the training stage.
1 code implementation • ICCV 2019 • Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Dechao Meng, Qingming Huang
It builds the correspondence between image region proposal and query in an adaptive manner: adaptive grounding and collaborative reconstruction.
no code implementations • 21 Aug 2019 • Zhao Zhang, Lei Wang, Sheng Li, Yang Wang, Zheng Zhang, Zheng-Jun Zha, Meng Wang
Specifically, AS-LRC performs the latent decomposition of given data into a low-rank reconstruction by a block-diagonal codes matrix, a group sparse locality-adaptive salient feature part and a sparse error part.
1 code implementation • 12 Aug 2019 • Shaobo Min, Hantao Yao, Hongtao Xie, Zheng-Jun Zha, Yongdong Zhang
In contrast to previous methods, the DSEN decomposes the domain-shared projection function into one domain-invariant and two domain-specific sub-functions to explore the similarities and differences between two domains.
no code implementations • 4 Aug 2019 • Zhao Zhang, Jiahuan Ren, Sheng Li, Richang Hong, Zheng-Jun Zha, Meng Wang
Leveraging on the Frobenius-norm based latent low-rank representation model, rBDLR jointly learns the coding coefficients and salient features, and improves the results by enhancing the robustness to outliers and errors in given data, preserving local information of salient features adaptively and ensuring the block-diagonal structures of the coefficients.
1 code implementation • 13 Jul 2019 • Xiaotian Chen, Xuejin Chen, Zheng-Jun Zha
We propose a Residual Pyramid Decoder (RPD) which expresses global scene structure in upper levels to represent layouts, and local structure in lower levels to present shape details.
Ranked #34 on Monocular Depth Estimation on NYU-Depth V2
no code implementations • ACL 2019 • Jianxing Yu, Zheng-Jun Zha, Jian Yin
This paper focuses on the topic of inferential machine comprehension, which aims to fully understand the meaning of a given text to answer generic questions, especially those that require reasoning skills.
1 code implementation • 23 Jun 2019 • Yizhou Zhou, Xiaoyan Sun, Chong Luo, Zheng-Jun Zha, Wen-Jun Zeng
Accordingly, a hybrid network representation is presented which enables us to leverage the Variational Dropout so that the approximation of the posterior distribution becomes fully gradient-based and highly efficient.
no code implementations • 9 Jun 2019 • Daqing Liu, Hanwang Zhang, Zheng-Jun Zha, Meng Wang, Qianru Sun
In this paper, we alleviate the missing-annotation problem and enable the joint reasoning by leveraging the language scene graph which covers both labeled referent and unlabeled contexts (other objects, attributes, and relationships).
1 code implementation • 6 Jun 2019 • Zheng-Jun Zha, Daqing Liu, Hanwang Zhang, Yongdong Zhang, Feng Wu
With the maturity of visual detection techniques, we are more ambitious in describing visual content with open-vocabulary, fine-grained and free-form language, i.e., the task of image captioning.
no code implementations • 16 May 2019 • Kai Zhu, Wei Zhai, Zheng-Jun Zha, Yang Cao
In this paper, we tackle one-shot texture retrieval: given an example of a new reference texture, detect and segment all the pixels of the same texture category within an arbitrary image.
no code implementations • 8 May 2019 • Liang Sun, Bing Li, Chunfeng Yuan, Zheng-Jun Zha, Weiming Hu
Inspired by the fact that different modalities in videos carry complementary information, we propose a Multimodal Semantic Attention Network (MSAN), which is a new encoder-decoder framework incorporating multimodal semantic attributes for video captioning.
1 code implementation • CVPR 2019 • Chang Chen, Zhiwei Xiong, Xinmei Tian, Zheng-Jun Zha, Feng Wu
Existing methods for single image super-resolution (SR) are typically evaluated with synthetic degradation models such as bicubic or Gaussian downsampling.
1 code implementation • CVPR 2019 • Heliang Zheng, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo
Learning subtle yet discriminative features (e.g., beak and eyes for a bird) plays a significant role in fine-grained image recognition.
Ranked #1 on Fine-Grained Image Classification on iNaturalist
Fine-Grained Image Classification
Fine-Grained Image Recognition
no code implementations • ICCV 2019 • Tianhao Yang, Zheng-Jun Zha, Hanwang Zhang
We study the multi-round response generation in visual dialog, where a response is generated according to a visually grounded conversational history.
Ranked #10 on Visual Dialog on VisDial v0.9 val
no code implementations • ICCV 2019 • Daqing Liu, Hanwang Zhang, Feng Wu, Zheng-Jun Zha
In particular, we develop a novel modular network called Neural Module Tree network (NMTree) that regularizes the visual grounding along the dependency parsing tree of the sentence, where each node is a neural module that calculates visual attention according to its linguistic feature, and the grounding score is accumulated in a bottom-up direction where needed.
no code implementations • 19 Nov 2018 • Jiawei Liu, Zheng-Jun Zha, Hongtao Xie, Zhiwei Xiong, Yongdong Zhang
An appearance network is developed to learn appearance features from the full body, horizontal and vertical body parts of pedestrians with spatial dependencies among body parts.
no code implementations • ECCV 2018 • Jiafan Zhuang, Saihui Hou, Zilei Wang, Zheng-Jun Zha
License plate recognition (LPR) is a fundamental component of various intelligent transport systems, which is always expected to be accurate and efficient enough.
1 code implementation • 16 Aug 2018 • Daqing Liu, Zheng-Jun Zha, Hanwang Zhang, Yongdong Zhang, Feng Wu
To fill the gap, we propose a Context-Aware Visual Policy network (CAVP) for sequence-level image captioning.
no code implementations • 31 Jul 2018 • Shaobo Min, Xuejin Chen, Zheng-Jun Zha, Feng Wu, Yongdong Zhang
Learning-based methods suffer from a deficiency of clean annotations, especially in biomedical segmentation.
no code implementations • CVPR 2018 • Yizhou Zhou, Xiaoyan Sun, Zheng-Jun Zha, Wen-Jun Zeng
Recent attempts use 3D convolutional neural networks (CNNs) to explore spatio-temporal information for human action recognition.
1 code implementation • 28 Feb 2018 • Dong Liu, Ke Sun, Zhangyang Wang, Runsheng Liu, Zheng-Jun Zha
We propose an interpretable deep structure namely Frank-Wolfe Network (F-W Net), whose architecture is inspired by unrolling and truncating the Frank-Wolfe algorithm for solving an $L_p$-norm constrained problem with $p\geq 1$.
no code implementations • 21 Feb 2017 • Wei Zhang, Shengnan Hu, Kan Liu, Zheng-Jun Zha
This paper presents a novel approach for video-based person re-identification using multiple Convolutional Neural Networks (CNNs).
no code implementations • CVPR 2016 • Chenyi Lei, Dong Liu, Weiping Li, Zheng-Jun Zha, Houqiang Li
In many image-related tasks, learning expressive and discriminative representations of images is essential, and deep learning has been studied for automating the learning of such representations.