no code implementations • ECCV 2020 • Xinzhe Han, Shuhui Wang, Chi Su, Weigang Zhang, Qingming Huang, Qi Tian
In this paper, we rethink the implicit reasoning process in VQA and propose a new formulation that maximizes the log-likelihood of the joint distribution of the observed question and predicted answer.
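As a rough illustration of this formulation, the joint log-likelihood can be decomposed into an answer term and a question term; the sketch below is a minimal approximation under that assumed factorization, and all function and variable names are illustrative rather than taken from the paper's code.

```python
# A minimal sketch under the assumed factorization
#   log p(q, a | v) = log p(a | q, v) + log p(q | v)
import torch.nn.functional as F

def joint_nll(answer_logits, answer_target, question_logits, question_tokens, pad_id=0):
    # Answer term: -log p(a | q, v), a standard classification loss.
    answer_nll = F.cross_entropy(answer_logits, answer_target)
    # Question term: -log p(q | v), a token-level reconstruction loss
    # averaged over non-padding positions.
    question_nll = F.cross_entropy(
        question_logits.reshape(-1, question_logits.size(-1)),
        question_tokens.reshape(-1),
        ignore_index=pad_id,
    )
    return answer_nll + question_nll
```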
1 code implementation • 30 Mar 2023 • Xiaodan Li, Yuefeng Chen, Yao Zhu, Shuhui Wang, Rong Zhang, Hui Xue
We also evaluate robust models, including both adversarially trained models and other robustly trained models, and find that some of them show worse robustness against attribute changes than vanilla models.
no code implementations • 1 Feb 2023 • Guanqi Ding, Xinzhe Han, Shuhui Wang, Xin Jin, Dandan Tu, Qingming Huang
SAGE makes use of all the given few-shot images and estimates a class center embedding based on the category-relevant attribute dictionary.
no code implementations • 8 Dec 2022 • Chen Zhang, Guorong Li, Yuankai Qi, Shuhui Wang, Laiyun Qing, Qingming Huang, Ming-Hsuan Yang
Weakly supervised video anomaly detection aims to identify abnormal events in videos using only video-level labels.
no code implementations • 22 Nov 2022 • Lingxiao Li, Yi Zhang, Shuhui Wang
Existing methods suffer from the trade-off between the quality and diversity of generated images.
no code implementations • 5 Aug 2022 • Wei Dai, Ziyao Zhang, Lixia Tian, Shengyuan Yu, Shuhui Wang, Zhao Dong, Hairong Zheng
The low representation ability of FC leads to poor performance in clinical practice, especially when dealing with multimodal medical data involving multiple types of visual signals and textual records for brain diseases.
1 code implementation • 26 Jul 2022 • Weidong Chen, Dexiang Hong, Yuankai Qi, Zhenjun Han, Shuhui Wang, Laiyun Qing, Qingming Huang, Guorong Li
To address this problem, we propose a multi-attention network consisting of a dual-path dual-attention module and a query-based cross-modal Transformer module; a rough sketch of such a cross-modal attention block appears after the leaderboard note below.
Ranked #2 on Referring Expression Segmentation on A2D Sentences
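A query-based cross-modal Transformer module of the kind mentioned above can be approximated by a language-to-video cross-attention block; the sketch below is a generic stand-in, not the authors' architecture, and the module and argument names are hypothetical.

```python
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Generic cross-attention block: language tokens attend to video features."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feats, video_feats):
        # text_feats:  (B, L, dim)    token embeddings of the referring expression
        # video_feats: (B, T*H*W, dim) flattened spatio-temporal visual features
        fused, _ = self.attn(query=text_feats, key=video_feats, value=video_feats)
        return self.norm(text_feats + fused)  # residual connection + LayerNorm
```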
1 code implementation • 18 Jul 2022 • Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Zechao Li, Qi Tian, Qingming Huang
Second, most previous weakly supervised REG methods ignore the discriminative location and context of the referent, causing difficulties in distinguishing the target from other same-category objects.
no code implementations • 14 Jun 2022 • Xiu Qi Chang, Ann Feng Chew, Benjamin Chen Ming Choong, Shuhui Wang, Rui Han, Wang He, Li Xiaolin, Rajesh C. Panicker, Deepu John
Deep neural networks (DNNs) are a promising tool in medical applications.
no code implementations • 2 Apr 2022 • Zhenhuan Liu, Jincan Deng, Liang Li, Shaofei Cai, Qianqian Xu, Shuhui Wang, Qingming Huang
Conditional image generation is an active research topic that includes text-to-image generation and image translation.
no code implementations • 2 Apr 2022 • Zhenhuan Liu, Liang Li, Huajie Jiang, Xin Jin, Dandan Tu, Shuhui Wang, Zheng-Jun Zha
Furthermore, we devise the spatio-temporal correlative map as a style-independent, globally aware regularization of perceptual motion consistency.
1 code implementation • CVPR 2022 • Guanqi Ding, Xinzhe Han, Shuhui Wang, Shuzhe Wu, Xin Jin, Dandan Tu, Qingming Huang
Few-shot image generation is a challenging task even using the state-of-the-art Generative Adversarial Networks (GANs).
1 code implementation • CVPR 2022 • Zhengcong Fei, Xu Yan, Shuhui Wang, Qi Tian
On the one hand, the representation in shallow layers lacks high-level semantics and sufficient cross-modal fusion information for accurate prediction.
1 code implementation • 20 Dec 2021 • Xinzhe Han, Shuhui Wang, Chi Su, Qingming Huang, Qi Tian
Existing de-bias learning frameworks try to capture specific dataset bias by annotations but they fail to handle complicated OOD scenarios.
1 code implementation • CVPR 2022 • Hanhua Ye, Guorong Li, Yuankai Qi, Shuhui Wang, Qingming Huang, Ming-Hsuan Yang
(II) Predicate level, which learns the actions conditioned on highlighted objects and is supervised by the predicate in captions.
1 code implementation • 23 Nov 2021 • Zhaobo Qi, Shuhui Wang, Chi Su, Li Su, Weigang Zhang, Qingming Huang
Based on TDC, we propose the temporal dynamic concept modeling network (TDCMN) to learn an accurate and complete concept representation for efficient untrimmed video analysis.
1 code implementation • 23 Nov 2021 • Zhaobo Qi, Shuhui Wang, Chi Su, Li Su, Qingming Huang, Qi Tian
Future activity anticipation is a challenging problem in egocentric vision.
no code implementations • 19 Nov 2021 • Xu Yan, Zhengcong Fei, Shuhui Wang, Qingming Huang, Qi Tian
Dense video captioning (DVC) aims to generate multi-sentence descriptions to elucidate the multiple events in the video, which is challenging and demands visual consistency, discoursal coherence, and linguistic diversity.
1 code implementation • 11 Oct 2021 • Xu Yan, Zhengcong Fei, Zekang Li, Shuhui Wang, Qingming Huang, Qi Tian
Non-autoregressive image captioning with continuous iterative refinement, which eliminates the sequential dependence in sentence generation, can achieve performance comparable to its autoregressive counterparts with considerable acceleration.
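For intuition, non-autoregressive decoding with iterative refinement typically predicts all tokens in parallel and then re-predicts the least confident positions over several rounds; the loop below is a generic mask-predict-style sketch under that assumption, not the paper's exact procedure, and the decoder interface is assumed.

```python
import torch

@torch.no_grad()
def iterative_refine(decoder, image_feats, seq_len, num_iters=5, mask_id=0):
    # `decoder(tokens, image_feats)` is assumed to return (B, seq_len, vocab) logits
    # for all positions in parallel; this loop is illustrative only.
    batch = image_feats.size(0)
    tokens = torch.full((batch, seq_len), mask_id, dtype=torch.long,
                        device=image_feats.device)
    preds = tokens
    for step in range(num_iters):
        logits = decoder(tokens, image_feats)
        conf, preds = logits.softmax(-1).max(-1)   # per-position confidence / argmax
        if step == num_iters - 1:
            break                                  # keep the final parallel prediction
        # Re-mask the least confident positions for another refinement round.
        k = max(1, seq_len // num_iters)
        _, low_conf = conf.topk(k, dim=-1, largest=False)
        tokens = preds.scatter(1, low_conf, mask_id)
    return preds
```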
1 code implementation • ICCV 2021 • Xinzhe Han, Shuhui Wang, Chi Su, Qingming Huang, Qi Tian
Language bias is a critical issue in Visual Question Answering (VQA), where models often exploit dataset biases for the final decision without considering the image information.
Ranked #2 on Visual Question Answering (VQA) on VQA-CP
1 code implementation • 13 Jul 2021 • Shuhao Cui, Shuhui Wang, Junbao Zhuo, Liang Li, Qingming Huang, Qi Tian
Due to the domain discrepancy in visual domain adaptation, the performance of the source model degrades when it encounters high data density near the decision boundary in the target domain.
1 code implementation • 7 Jul 2021 • Xiaodong Wang, Junbao Zhuo, Shuhao Cui, Shuhui Wang
Semi-supervised domain adaptation (SSDA) aims to solve tasks in the target domain by utilizing transferable information learned from the available source domain and a few labeled target samples.
1 code implementation • 19 Apr 2021 • Jinghao Zhang, Yanqiao Zhu, Qiang Liu, Shu Wu, Shuhui Wang, Liang Wang
To be specific, in the proposed LATTICE model, we devise a novel modality-aware structure learning layer, which learns item-item structures for each modality and aggregates multiple modalities to obtain latent item graphs.
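One plausible reading of such a modality-aware structure learning layer is a per-modality kNN item graph whose adjacencies are then combined with modality weights; the sketch below follows that assumption, and the helper names are mine rather than from the LATTICE code.

```python
import torch
import torch.nn.functional as F

def knn_graph(features, k=10):
    """Row-normalized kNN adjacency built from one modality's item features."""
    normed = F.normalize(features, dim=-1)
    sim = normed @ normed.t()                     # cosine similarity between items
    vals, idx = sim.topk(k, dim=-1)               # keep only the k nearest neighbors
    adj = torch.zeros_like(sim).scatter(-1, idx, vals)
    return adj / adj.sum(-1, keepdim=True).clamp(min=1e-12)

def latent_item_graph(modality_feats, weights):
    """Combine per-modality graphs with (e.g. softmax-normalized) modality weights."""
    return sum(w * knn_graph(f) for w, f in zip(weights, modality_feats))
```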
no code implementations • CVPR 2021 • Xiaodan Li, Jinfeng Li, Yuefeng Chen, Shaokai Ye, Yuan He, Shuhui Wang, Hang Su, Hui Xue
Comprehensive experiments show that the proposed attack achieves a high attack success rate with few queries against the image retrieval systems under the black-box setting.
1 code implementation • 10 Dec 2020 • Xiaofeng Mao, Yuefeng Chen, Shuhui Wang, Hang Su, Yuan He, Hui Xue
Adversarial attacks are techniques for deceiving machine learning (ML) models and provide a way to evaluate adversarial robustness.
1 code implementation • NeurIPS 2020 • Shuhao Cui, Xuan Jin, Shuhui Wang, Yuan He, Qingming Huang
In visual domain adaptation (DA), separating the domain-specific characteristics from the domain-invariant representations is an ill-posed problem.
no code implementations • 16 Oct 2020 • Jianfeng He, Xuchao Zhang, Shuo Lei, Shuhui Wang, Qingming Huang, Chang-Tien Lu, Bei Xiao
Each MEx area contains the masked area of the generation as the majority and the boundary of the original context as the minority.
1 code implementation • CVPR 2020 • Jun Wei, Shuhui Wang, Zhe Wu, Chi Su, Qingming Huang, Qi Tian
Though remarkable progress has been achieved, we observe that the closer a pixel is to the edge, the more difficult it is to predict, because edge pixels have a highly imbalanced distribution.
Ranked #1 on Saliency Detection on HKU-IS
no code implementations • 11 Aug 2020 • Xiaodan Li, Yining Lang, Yuefeng Chen, Xiaofeng Mao, Yuan He, Shuhui Wang, Hui Xue, Quan Lu
A sharp MIL (S-MIL) is proposed, which builds a direct mapping from instance embeddings to the bag prediction, rather than going from instance embeddings to instance predictions and then to the bag prediction as in traditional MIL.
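The contrast between the two routes might be sketched as follows; both modules are simplified stand-ins (the pooling choices are arbitrary assumptions) and are not the S-MIL formulation itself.

```python
import torch.nn as nn

class InstanceThenBagMIL(nn.Module):
    """Traditional route: instance embeddings -> instance scores -> pooled bag score."""
    def __init__(self, dim=512):
        super().__init__()
        self.instance_clf = nn.Linear(dim, 1)

    def forward(self, inst_emb):                  # inst_emb: (B, N, dim)
        inst_scores = self.instance_clf(inst_emb).squeeze(-1)
        return inst_scores.max(dim=1).values      # max-pool instance predictions

class DirectBagMIL(nn.Module):
    """Direct route: pool instance embeddings first, then predict the bag label."""
    def __init__(self, dim=512):
        super().__init__()
        self.bag_clf = nn.Linear(dim, 1)

    def forward(self, inst_emb):
        bag_emb = inst_emb.mean(dim=1)            # aggregate embeddings before classifying
        return self.bag_clf(bag_emb).squeeze(-1)
```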
1 code implementation • CVPR 2020 • Dechao Meng, Liang Li, Xuejing Liu, Yadong Li, Shijie Yang, Zheng-Jun Zha, Xingyu Gao, Shuhui Wang, Qingming Huang
Vehicle re-identification aims to find images of the same vehicle from various views in the cross-camera scenario.
1 code implementation • CVPR 2020 • Beichen Zhang, Liang Li, Shijie Yang, Shuhui Wang, Zheng-Jun Zha, Qingming Huang
In this paper, we propose a state relabeling adversarial active learning model (SRAAL), which leverages both the annotation and the labeled/unlabeled state information to derive the most informative unlabeled samples.
2 code implementations • CVPR 2020 • Shuhao Cui, Shuhui Wang, Junbao Zhuo, Chi Su, Qingming Huang, Qi Tian
On the discriminator, GVB helps enhance the discriminating ability and balance the adversarial training process.
2 code implementations • CVPR 2020 • Shuhao Cui, Shuhui Wang, Junbao Zhuo, Liang Li, Qingming Huang, Qi Tian
Through theoretical analysis, we find that prediction discriminability and diversity can be measured separately by the Frobenius norm and the rank of the batch output matrix.
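Consistent with that analysis, both quantities can be computed directly on the batch prediction matrix, with the nuclear norm serving as the usual convex surrogate for the rank; the snippet below is a measurement sketch, not the paper's training code.

```python
import torch

def batch_output_measures(logits):
    """Frobenius norm (confidence/discriminability proxy) and nuclear norm
    (convex surrogate for rank, i.e. a diversity proxy) of the batch output matrix."""
    A = logits.softmax(dim=-1)                       # (batch, classes) prediction matrix
    frob = torch.linalg.matrix_norm(A, ord='fro')
    nuclear = torch.linalg.matrix_norm(A, ord='nuc')
    return frob, nuclear

# One training signal consistent with the analysis: subtract a scaled nuclear norm
# of the target-domain predictions from the loss to encourage diverse, confident output.
```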
4 code implementations • 26 Nov 2019 • Jun Wei, Shuhui Wang, Qingming Huang
Furthermore, unlike binary cross entropy, the proposed PPA loss does not treat all pixels equally; it synthesizes the local structure information around each pixel to guide the network to focus more on local details (a rough sketch of such a weighted loss appears after the leaderboard note below).
Ranked #3 on Salient Object Detection on DUT-OMRON
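A weighted loss in this spirit can be built by combining a pixel-weighted BCE with a pixel-weighted IoU term, where each pixel's weight reflects how much it differs from its local neighborhood; the sketch below is one plausible instantiation (window size and weighting constant are assumptions), not necessarily the exact PPA loss.

```python
import torch
import torch.nn.functional as F

def weighted_structure_loss(pred_logits, mask, win=31, alpha=5.0):
    """Pixel-weighted BCE + pixel-weighted IoU; pred_logits and mask are (B, 1, H, W).
    The weight grows where a pixel differs from its local neighborhood average,
    which happens mostly near object boundaries."""
    weight = 1 + alpha * torch.abs(
        F.avg_pool2d(mask, kernel_size=win, stride=1, padding=win // 2) - mask)
    wbce = F.binary_cross_entropy_with_logits(pred_logits, mask, reduction='none')
    wbce = (weight * wbce).sum(dim=(2, 3)) / weight.sum(dim=(2, 3))
    pred = torch.sigmoid(pred_logits)
    inter = (pred * mask * weight).sum(dim=(2, 3))
    union = ((pred + mask) * weight).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()
```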
1 code implementation • 5 Sep 2019 • Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Li Su, Qingming Huang
Weakly supervised referring expression grounding (REG) aims at localizing the referential entity in an image according to a linguistic query, where the mapping between the image region (proposal) and the query is unknown during training.
1 code implementation • ICCV 2019 • Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Dechao Meng, Qingming Huang
It builds the correspondence between image region proposals and the query in an adaptive manner, through adaptive grounding and collaborative reconstruction.
1 code implementation • 14 Aug 2019 • Guoli Song, Shuhui Wang, Qingming Huang, Qi Tian
Multimodal learning aims to discover the relationship between multiple modalities.
1 code implementation • CVPR 2019 • Junbao Zhuo, Shuhui Wang, Shuhao Cui, Qingming Huang
We address the unsupervised open domain recognition (UODR) problem, where the categories in the labeled source domain S are only a subset of those in the unlabeled target domain T. The task is to correctly classify all samples in T, including both known and unknown categories.
no code implementations • ECCV 2018 • Yangyu Chen, Shuhui Wang, Weigang Zhang, Qingming Huang
We propose a plug-and-play PickNet to perform informative frame picking in video captioning.
no code implementations • ICCV 2017 • Guoli Song, Shuhui Wang, Qingming Huang, Qi Tian
We incorporate the harmonization mechanism into the learning process of multimodal GPLVMs.
no code implementations • CVPR 2017 • Yiling Wu, Shuhui Wang, Qingming Huang
In this paper, we propose an online learning method that learns the similarity function between heterogeneous modalities by preserving the relative similarity in the training data, modeled as a set of bi-directional hinge loss constraints on cross-modal training triplets.
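A batch version of such bi-directional hinge constraints over cross-modal triplets might look like the following, where in-batch non-matching pairs serve as negatives; the function and margin value are illustrative, and the paper's online formulation may differ.

```python
import torch

def bidirectional_hinge_loss(img_emb, txt_emb, margin=0.2):
    """Bi-directional hinge (triplet) loss on an in-batch similarity matrix.
    Diagonal entries are matched pairs; other entries act as negatives."""
    scores = img_emb @ txt_emb.t()                      # (B, B) similarities
    pos = scores.diag().view(-1, 1)
    img2txt = (margin + scores - pos).clamp(min=0)      # rank texts for each image
    txt2img = (margin + scores - pos.t()).clamp(min=0)  # rank images for each text
    eye = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    return img2txt.masked_fill(eye, 0).sum() + txt2img.masked_fill(eye, 0).sum()
```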
1 code implementation • CVPR 2017 • Shijie Yang, Liang Li, Shuhui Wang, Weigang Zhang, Qingming Huang
Deep Auto-Encoders (DAEs) have shown promising power in high-level representation learning.
no code implementations • ICCV 2015 • Guoli Song, Shuhui Wang, Qingming Huang, Qi Tian
Data from real applications involve multiple modalities representing content with the same semantics and deliver rich information from complementary aspects.
no code implementations • CVPR 2013 • Li Shen, Shuhui Wang, Gang Sun, Shuqiang Jiang, Qingming Huang
For each internal node of the hierarchical category structure, a discriminative dictionary and a set of classification models are learnt for visual categorization, and the dictionaries in different layers are learnt to exploit the discriminative visual properties of different granularities.