Search Results for author: Shuhui Wang

Found 54 papers, 37 papers with code

Interpretable Visual Reasoning via Probabilistic Formulation under Natural Supervision

no code implementations ECCV 2020 Xinzhe Han, Shuhui Wang, Chi Su, Weigang Zhang, Qingming Huang, Qi Tian

In this paper, we rethink the implicit reasoning process in VQA, and propose a new formulation that maximizes the log-likelihood of the joint distribution for the observed question and predicted answer.

Question Answering Visual Question Answering +1

Uncertainty-boosted Robust Video Activity Anticipation

1 code implementation 29 Apr 2024 Zhaobo Qi, Shuhui Wang, Weigang Zhang, Qingming Huang

Video activity anticipation aims to predict what will happen in the future, with broad application prospects ranging from robot vision to autonomous driving.

Autonomous Driving

Confusing Pair Correction Based on Category Prototype for Domain Adaptation under Noisy Environments

1 code implementation 19 Mar 2024 Churan Zhi, Junbao Zhuo, Shuhui Wang

In this paper, we address unsupervised domain adaptation under noisy environments, which is more challenging and practical than traditional domain adaptation.

Unsupervised Domain Adaptation

A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes

no code implementations 12 Mar 2024 Ting Yu, Xiaojun Lin, Shuhui Wang, Weiguo Sheng, Qingming Huang, Jun Yu

Three-Dimensional (3D) dense captioning is an emerging vision-language bridging task that aims to generate multiple detailed and accurate descriptions for 3D scenes.

3D dense captioning Dense Captioning

Bias-Conflict Sample Synthesis and Adversarial Removal Debias Strategy for Temporal Sentence Grounding in Video

1 code implementation 15 Jan 2024 Zhaobo Qi, Yibo Yuan, Xiaowen Ruan, Shuhui Wang, Weigang Zhang, Qingming Huang

Temporal Sentence Grounding in Video (TSGV) is troubled by the dataset bias issue, which is caused by the uneven temporal distribution of the target moments for samples with similar semantic components in input videos or query texts.

Sentence Temporal Sentence Grounding

R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation

no code implementations 13 Oct 2023 Jiayu Xiao, Henglei Lv, Liang Li, Shuhui Wang, Qingming Huang

Recent text-to-image (T2I) diffusion models have achieved remarkable progress in generating high-quality images given text-prompts as input.

Text-to-Image Generation

Open-Set Knowledge-Based Visual Question Answering with Inference Paths

1 code implementation 12 Oct 2023 Jingru Gan, Xinzhe Han, Shuhui Wang, Qingming Huang

Given an image and an associated textual question, the purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases.

Knowledge Graphs Multi-class Classification +2

COMICS: End-to-end Bi-grained Contrastive Learning for Multi-face Forgery Detection

1 code implementation 3 Aug 2023 Cong Zhang, Honggang Qi, Shuhui Wang, Yuezun Li, Siwei Lyu

One straightforward way to address this issue is to process multiple faces simultaneously by integrating face extraction and forgery detection in an end-to-end fashion, adapting advanced object detection architectures.

Contrastive Learning Object +2

ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing

2 code implementations CVPR 2023 Xiaodan Li, Yuefeng Chen, Yao Zhu, Shuhui Wang, Rong Zhang, Hui Xue

We also evaluate some robust models, including both adversarially trained models and other robustly trained models, and find that some models show worse robustness against attribute changes than vanilla models.

Attribute Benchmarking +1

Stable Attribute Group Editing for Reliable Few-shot Image Generation

1 code implementation 1 Feb 2023 Guanqi Ding, Xinzhe Han, Shuhui Wang, Xin Jin, Dandan Tu, Qingming Huang

SAGE makes use of all given few-shot images and estimates a class-center embedding based on the category-relevant attribute dictionary.

Attribute Classification +1

Multimodal Brain Disease Classification with Functional Interaction Learning from Single fMRI Volume

no code implementations 5 Aug 2022 Wei Dai, Ziyao Zhang, Lixia Tian, Shengyuan Yu, Shuhui Wang, Zhao Dong, Hairong Zheng

The low representation ability of FC leads to poor performance in clinical practice, especially when dealing with multimodal medical data involving multiple types of visual signals and textual records for brain diseases.

Time Series Analysis

Multi-Attention Network for Compressed Video Referring Object Segmentation

1 code implementation 26 Jul 2022 Weidong Chen, Dexiang Hong, Yuankai Qi, Zhenjun Han, Shuhui Wang, Laiyun Qing, Qingming Huang, Guorong Li

To address this problem, we propose a multi-attention network which consists of a dual-path dual-attention module and a query-based cross-modal Transformer module.

Object Referring Expression Segmentation +4

Entity-enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding

1 code implementation 18 Jul 2022 Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Zechao Li, Qi Tian, Qingming Huang

Second, most previous weakly supervised REG methods ignore the discriminative location and context of the referent, causing difficulties in distinguishing the target from other same-category objects.

Attribute Referring Expression +2

Unsupervised Coherent Video Cartoonization with Perceptual Motion Consistency

1 code implementation 2 Apr 2022 Zhenhuan Liu, Liang Li, Huajie Jiang, Xin Jin, Dandan Tu, Shuhui Wang, Zheng-Jun Zha

Furthermore, we devise the spatio-temporal correlative map as a style-independent, global-aware regularization on the perceptual motion consistency.

Decoder Optical Flow Estimation +1

DeeCap: Dynamic Early Exiting for Efficient Image Captioning

1 code implementation CVPR 2022 Zhengcong Fei, Xu Yan, Shuhui Wang, Qi Tian

On one hand, the representation in shallow layers lacks high-level semantic and sufficient cross-modal fusion information for accurate prediction.

Image Captioning Imitation Learning
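The dynamic early-exiting idea behind DeeCap can be illustrated with a generic confidence-threshold sketch: run decoder layers in order and stop as soon as an intermediate classifier is confident enough. This is an illustrative NumPy sketch, not the authors' implementation; the threshold value and the use of top-probability confidence are assumptions, and DeeCap's imitation learning of deep-layer representations is omitted.

```python
import numpy as np

def early_exit_decode(layer_logits, threshold=0.9):
    """Confidence-based early exiting: walk the per-layer classifier logits in
    order and stop as soon as the top softmax probability clears `threshold`.
    Returns (exit depth, predicted token id)."""
    for depth, logits in enumerate(layer_logits, start=1):
        e = np.exp(logits - logits.max())
        probs = e / e.sum()                      # softmax over the vocabulary
        if probs.max() >= threshold:
            return depth, int(probs.argmax())    # exit early at this layer
    return len(layer_logits), int(probs.argmax())  # fall through to the last layer

# An ambiguous shallow layer forces the model deeper; a confident one exits early.
depth, token = early_exit_decode([np.array([0.1, 0.2, 0.0]),
                                  np.array([5.0, 0.0, 0.0])])
```

Easy tokens exit at shallow layers and save computation, while hard tokens still reach the deep layers that carry high-level semantics and cross-modal fusion information.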

General Greedy De-bias Learning

1 code implementation 20 Dec 2021 Xinzhe Han, Shuhui Wang, Chi Su, Qingming Huang, Qi Tian

Existing de-bias learning frameworks try to capture specific dataset bias with annotations, but they fail to handle complicated OOD scenarios.

Image Classification Question Answering +1

Hierarchical Modular Network for Video Captioning

1 code implementation CVPR 2022 Hanhua Ye, Guorong Li, Yuankai Qi, Shuhui Wang, Qingming Huang, Ming-Hsuan Yang

(II) Predicate level, which learns the actions conditioned on highlighted objects and is supervised by the predicate in captions.

Representation Learning Sentence +1

Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis

1 code implementation 23 Nov 2021 Zhaobo Qi, Shuhui Wang, Chi Su, Li Su, Weigang Zhang, Qingming Huang

Based on TDC, we propose the temporal dynamic concept modeling network (TDCMN) to learn an accurate and complete concept representation for efficient untrimmed video analysis.

Image Categorization

DVCFlow: Modeling Information Flow Towards Human-like Video Captioning

no code implementations 19 Nov 2021 Xu Yan, Zhengcong Fei, Shuhui Wang, Qingming Huang, Qi Tian

Dense video captioning (DVC) aims to generate multi-sentence descriptions to elucidate the multiple events in the video, which is challenging and demands visual consistency, discoursal coherence, and linguistic diversity.

Dense Video Captioning Diversity +1

Semi-Autoregressive Image Captioning

1 code implementation 11 Oct 2021 Xu Yan, Zhengcong Fei, Zekang Li, Shuhui Wang, Qingming Huang, Qi Tian

Non-autoregressive image captioning with continuous iterative refinement, which eliminates the sequential dependence in sentence generation, can achieve performance comparable to its autoregressive counterparts with considerable acceleration.

Decoder Image Captioning +1

Greedy Gradient Ensemble for Robust Visual Question Answering

1 code implementation ICCV 2021 Xinzhe Han, Shuhui Wang, Chi Su, Qingming Huang, Qi Tian

Language bias is a critical issue in Visual Question Answering (VQA), where models often exploit dataset biases for the final decision without considering the image information.

Question Answering Visual Question Answering

Fast Batch Nuclear-norm Maximization and Minimization for Robust Domain Adaptation

1 code implementation 13 Jul 2021 Shuhao Cui, Shuhui Wang, Junbao Zhuo, Liang Li, Qingming Huang, Qi Tian

Due to the domain discrepancy in visual domain adaptation, the performance of the source model degrades when it encounters high data density near the decision boundary in the target domain.

Diversity Domain Adaptation

Learning Invariant Representation with Consistency and Diversity for Semi-supervised Source Hypothesis Transfer

1 code implementation 7 Jul 2021 Xiaodong Wang, Junbao Zhuo, Shuhao Cui, Shuhui Wang

Semi-supervised domain adaptation (SSDA) aims to solve tasks in the target domain by utilizing transferable information learned from the available source domain and a few labeled target samples.

Diversity Domain Adaptation +1

Mining Latent Structures for Multimedia Recommendation

1 code implementation 19 Apr 2021 Jinghao Zhang, Yanqiao Zhu, Qiang Liu, Shu Wu, Shuhui Wang, Liang Wang

To be specific, in the proposed LATTICE model, we devise a novel modality-aware structure learning layer, which learns item-item structures for each modality and aggregates multiple modalities to obtain latent item graphs.

Collaborative Filtering Multimedia recommendation +1
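The modality-aware structure learning in LATTICE can be sketched as building a kNN item-item graph per modality and aggregating the graphs into one latent item graph. The sketch below is illustrative only: LATTICE learns the modality weights and graph structure end to end, whereas here the weights, the value of k, and cosine similarity are assumptions.

```python
import numpy as np

def knn_graph(features, k=2):
    """Build a row-normalized kNN item-item graph from one modality's features."""
    norms = np.linalg.norm(features, axis=1)
    sim = (features @ features.T) / np.outer(norms, norms)  # cosine similarity
    np.fill_diagonal(sim, -np.inf)                          # exclude self-loops
    n = len(features)
    adj = np.zeros((n, n))
    for i in range(n):
        adj[i, np.argsort(sim[i])[-k:]] = 1.0               # keep top-k neighbors
    return adj / adj.sum(axis=1, keepdims=True)

def latent_item_graph(modality_feats, weights):
    """Aggregate per-modality kNN graphs into one latent item graph
    (fixed weights here; LATTICE learns the aggregation)."""
    return sum(w * knn_graph(f) for w, f in zip(weights, modality_feats))
```

The resulting latent graph can then be used to propagate and smooth item representations for recommendation, as in graph-based collaborative filtering.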

QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval

no code implementations CVPR 2021 Xiaodan Li, Jinfeng Li, Yuefeng Chen, Shaokai Ye, Yuan He, Shuhui Wang, Hang Su, Hui Xue

Comprehensive experiments show that the proposed attack achieves a high attack success rate with few queries against the image retrieval systems under the black-box setting.

Image Classification Image Retrieval +1

Composite Adversarial Attacks

1 code implementation 10 Dec 2020 Xiaofeng Mao, Yuefeng Chen, Shuhui Wang, Hang Su, Yuan He, Hui Xue

Adversarial attack is a technique for deceiving Machine Learning (ML) models, which provides a way to evaluate the adversarial robustness.

Adversarial Attack Adversarial Robustness

Heuristic Domain Adaptation

1 code implementation NeurIPS 2020 Shuhao Cui, Xuan Jin, Shuhui Wang, Yuan He, Qingming Huang

In visual domain adaptation (DA), separating the domain-specific characteristics from the domain-invariant representations is an ill-posed problem.

Domain Adaptation

Semantic Editing On Segmentation Map Via Multi-Expansion Loss

no code implementations 16 Oct 2020 Jianfeng He, Xuchao Zhang, Shuo Lei, Shuhui Wang, Qingming Huang, Chang-Tien Lu, Bei Xiao

Each MEx area has the mask area of the generation as the majority and the boundary of the original context as the minority.

Image Inpainting Segmentation

Label Decoupling Framework for Salient Object Detection

1 code implementation CVPR 2020 Jun Wei, Shuhui Wang, Zhe Wu, Chi Su, Qingming Huang, Qi Tian

Though remarkable progress has been achieved, we observe that the closer a pixel is to the edge, the more difficult it is to predict, because edge pixels have a very imbalanced distribution.

Object object-detection +3

Sharp Multiple Instance Learning for DeepFake Video Detection

no code implementations 11 Aug 2020 Xiaodan Li, Yining Lang, Yuefeng Chen, Xiaofeng Mao, Yuan He, Shuhui Wang, Hui Xue, Quan Lu

A sharp MIL (S-MIL) is proposed which builds direct mapping from instance embeddings to bag prediction, rather than from instance embeddings to instance prediction and then to bag prediction in traditional MIL.

Face Swapping Multiple Instance Learning
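The contrast between traditional MIL and the sharp bag-level mapping can be sketched as follows. This is an illustrative NumPy toy, not the S-MIL code: the linear scorer, the softmax-attention pooling, and the sharpness value `tau` are all assumptions standing in for the paper's learned components.

```python
import numpy as np

def bag_prediction_traditional(instance_embs, w):
    """Traditional MIL: score each instance first, then pool (max) to a bag label."""
    instance_scores = 1 / (1 + np.exp(-(instance_embs @ w)))  # per-instance sigmoid
    return float(instance_scores.max())

def bag_prediction_sharp(instance_embs, w, tau=4.0):
    """S-MIL-style direct mapping: pool the instance embeddings into a bag
    embedding (sharp softmax attention), then score the bag once."""
    scores = instance_embs @ w
    weights = np.exp(tau * (scores - scores.max()))
    weights /= weights.sum()                     # sharp attention over instances
    bag_emb = weights @ instance_embs            # bag-level embedding
    return float(1 / (1 + np.exp(-(bag_emb @ w))))

# A bag with one manipulated face should score higher than an all-real bag.
w = np.array([1.0, 0.0])
bag_fake = np.array([[2.0, 0.0], [-2.0, 0.0], [-2.0, 0.0]])
bag_real = np.array([[-2.0, 0.0], [-2.0, 0.0], [-2.0, 0.0]])
```

Pooling embeddings before scoring lets the bag-level supervision shape the instance representations directly, instead of only through intermediate instance predictions.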

State-Relabeling Adversarial Active Learning

1 code implementation CVPR 2020 Beichen Zhang, Liang Li, Shijie Yang, Shuhui Wang, Zheng-Jun Zha, Qingming Huang

In this paper, we propose a state-relabeling adversarial active learning model (SRAAL), which leverages both the annotation and the labeled/unlabeled state information for deriving the most informative unlabeled samples.

Active Learning

Gradually Vanishing Bridge for Adversarial Domain Adaptation

2 code implementations CVPR 2020 Shuhao Cui, Shuhui Wang, Junbao Zhuo, Chi Su, Qingming Huang, Qi Tian

On the discriminator, GVB contributes to enhancing the discriminating ability and balancing the adversarial training process.

Unsupervised Domain Adaptation

Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations

2 code implementations CVPR 2020 Shuhao Cui, Shuhui Wang, Junbao Zhuo, Liang Li, Qingming Huang, Qi Tian

We find by theoretical analysis that the prediction discriminability and diversity could be separately measured by the Frobenius-norm and rank of the batch output matrix.

Diversity Domain Adaptation
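The quantity at the heart of this analysis, the batch nuclear norm, is easy to compute: the Frobenius norm of the batch output matrix tracks discriminability, its rank tracks diversity, and the nuclear norm (the sum of singular values) serves as a differentiable surrogate for both. A minimal NumPy sketch, not the authors' released code:

```python
import numpy as np

def softmax(logits):
    # Row-wise softmax over class logits
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def batch_nuclear_norm(logits):
    """Nuclear norm (sum of singular values) of the batch prediction matrix.
    Maximizing it encourages confident (large Frobenius norm) and diverse
    (high rank) batch predictions."""
    probs = softmax(logits)
    return float(np.linalg.svd(probs, compute_uv=False).sum())

# A confident, diverse batch has a larger nuclear norm than a uniform one.
confident = np.array([[9.0, 0.0, 0.0],
                      [0.0, 9.0, 0.0],
                      [0.0, 0.0, 9.0]])
uncertain = np.zeros((3, 3))   # all-uniform predictions collapse to rank one
```

In training, the negative nuclear norm of the target-domain batch outputs would be added to the task loss; the example logits above are purely illustrative.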

F3Net: Fusion, Feedback and Focus for Salient Object Detection

4 code implementations 26 Nov 2019 Jun Wei, Shuhui Wang, Qingming Huang

Furthermore, different from binary cross entropy, the proposed PPA loss doesn't treat pixels equally; it synthesizes the local structure information of a pixel to guide the network to focus more on local details.

Dichotomous Image Segmentation Object +2
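The unequal pixel weighting can be sketched as a BCE loss whose per-pixel weight grows with the difference between a ground-truth pixel and its local neighborhood average, so boundary pixels dominate the loss. This is an illustrative approximation in the spirit of the PPA loss, not F3Net's exact formulation; the window size `k` and scale `alpha` are assumed values.

```python
import numpy as np

def pixel_weighted_bce(pred, gt, k=3, alpha=5.0):
    """BCE weighted by local structure: pixels whose ground-truth value differs
    from the k-by-k neighborhood mean (edges, fine details) get larger weights."""
    h, w = gt.shape
    pad = k // 2
    padded = np.pad(gt, pad, mode='edge')
    local_mean = np.zeros_like(gt, dtype=float)
    for i in range(h):
        for j in range(w):
            local_mean[i, j] = padded[i:i + k, j:j + k].mean()
    weight = 1.0 + alpha * np.abs(gt - local_mean)   # emphasize boundary pixels
    eps = 1e-7
    bce = -(gt * np.log(pred + eps) + (1 - gt) * np.log(1 - pred + eps))
    return float((weight * bce).sum() / weight.sum())
```

Under this weighting, mispredicting a pixel on a salient-object boundary costs more than mispredicting one deep inside a uniform region.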

Learning fragment self-attention embeddings for image-text matching

1 code implementation ACMMM 2019 Yiling Wu, Shuhui Wang, Guoli Song, Qingming Huang

In this paper, we propose Self-Attention Embeddings (SAEM) to exploit fragment relations in images or texts by self-attention mechanism, and aggregate fragment information into visual and textual embeddings.

Image-text matching Sentence +1

Knowledge-guided Pairwise Reconstruction Network for Weakly Supervised Referring Expression Grounding

1 code implementation 5 Sep 2019 Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Li Su, Qingming Huang

Weakly supervised referring expression grounding (REG) aims at localizing the referential entity in an image according to a linguistic query, where the mapping between the image region (proposal) and the query is unknown in the training stage.

Object Referring Expression +2

Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding

1 code implementation ICCV 2019 Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Dechao Meng, Qingming Huang

It builds the correspondence between image region proposal and query in an adaptive manner: adaptive grounding and collaborative reconstruction.

Attribute Referring Expression +1

Unsupervised Open Domain Recognition by Semantic Discrepancy Minimization

1 code implementation CVPR 2019 Junbao Zhuo, Shuhui Wang, Shuhao Cui, Qingming Huang

We address the unsupervised open domain recognition (UODR) problem, where the categories in a labeled source domain S are only a subset of those in an unlabeled target domain T. The task is to correctly classify all samples in T, including known and unknown categories.

Classification General Classification

Online Asymmetric Similarity Learning for Cross-Modal Retrieval

no code implementations CVPR 2017 Yiling Wu, Shuhui Wang, Qingming Huang

In this paper, we propose an online learning method to learn the similarity function between heterogeneous modalities by preserving the relative similarity in the training data, which is modeled as a set of bi-directional hinge loss constraints on the cross-modal training triplets.

Cross-Modal Retrieval Retrieval +2
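A bi-directional hinge loss on a cross-modal triplet can be sketched as below: the matched pair must beat an unmatched pair by a margin in both retrieval directions (image-to-text and text-to-image). This is a toy illustration, not the paper's online learning algorithm; the cosine similarity and the margin value are assumptions, since the method learns the similarity function itself.

```python
import numpy as np

def bidirectional_hinge_loss(img, txt_pos, txt_neg, img_neg, margin=0.2):
    """Bi-directional hinge loss preserving relative similarity on a
    cross-modal training triplet, in both retrieval directions."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # image -> text: the matched text should beat an unmatched text by `margin`
    i2t = max(0.0, margin - cos(img, txt_pos) + cos(img, txt_neg))
    # text -> image: the matched image should beat an unmatched image by `margin`
    t2i = max(0.0, margin - cos(txt_pos, img) + cos(txt_pos, img_neg))
    return i2t + t2i
```

The loss is zero when both ranking constraints are satisfied and positive otherwise, so minimizing it pushes matched cross-modal pairs above unmatched ones in both directions.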

Similarity Gaussian Process Latent Variable Model for Multi-Modal Data Analysis

no code implementations ICCV 2015 Guoli Song, Shuhui Wang, Qingming Huang, Qi Tian

Data from real applications involve multiple modalities representing content with the same semantics and deliver rich information from complementary aspects.

Retrieval

Multi-level Discriminative Dictionary Learning towards Hierarchical Visual Categorization

no code implementations CVPR 2013 Li Shen, Shuhui Wang, Gang Sun, Shuqiang Jiang, Qingming Huang

For each internode of the hierarchical category structure, a discriminative dictionary and a set of classification models are learnt for visual categorization, and the dictionaries in different layers are learnt to exploit the discriminative visual properties of different granularity.

Dictionary Learning
