Search Results for author: Shuhui Wang

Found 52 papers, 35 papers with code

Towards Discriminability and Diversity: Batch Nuclear-norm Maximization under Label Insufficient Situations

2 code implementations • CVPR 2020 • Shuhao Cui, Shuhui Wang, Junbao Zhuo, Liang Li, Qingming Huang, Qi Tian

We find by theoretical analysis that the prediction discriminability and diversity could be separately measured by the Frobenius-norm and rank of the batch output matrix.

Domain Adaptation

321

Gradually Vanishing Bridge for Adversarial Domain Adaptation

2 code implementations • CVPR 2020 • Shuhao Cui, Shuhui Wang, Junbao Zhuo, Chi Su, Qingming Huang, Qi Tian

On the discriminator, GVB helps enhance the discriminating ability and balance the adversarial training process.

Unsupervised Domain Adaptation

321

ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing

2 code implementations • CVPR 2023 • Xiaodan Li, Yuefeng Chen, Yao Zhu, Shuhui Wang, Rong Zhang, Hui Xue

We also evaluate some robust models including both adversarially trained models and other robust trained models and find that some models show worse robustness against attribute changes than vanilla models.

Attribute Benchmarking +1

305

Fast Batch Nuclear-norm Maximization and Minimization for Robust Domain Adaptation

1 code implementation • 13 Jul 2021 • Shuhao Cui, Shuhui Wang, Junbao Zhuo, Liang Li, Qingming Huang, Qi Tian

Due to the domain discrepancy in visual domain adaptation, the performance of source model degrades when bumping into the high data density near decision boundary in target domain.

Domain Adaptation

262

F3Net: Fusion, Feedback and Focus for Salient Object Detection

4 code implementations • 26 Nov 2019 • Jun Wei, Shuhui Wang, Qingming Huang

Furthermore, different from binary cross entropy, the proposed PPA loss doesn't treat pixels equally, which can synthesize the local structure information of a pixel to guide the network to focus more on local details.

Ranked #5 on Salient Object Detection on DUT-OMRON

Dichotomous Image Segmentation Object +2

215

Label Decoupling Framework for Salient Object Detection

1 code implementation • CVPR 2020 • Jun Wei, Shuhui Wang, Zhe Wu, Chi Su, Qingming Huang, Qi Tian

Though remarkable progress has been achieved, we observe that the closer a pixel is to the edge, the more difficult it is to predict, because edge pixels have a very imbalanced distribution.

Ranked #1 on Saliency Detection on HKU-IS

Object object-detection +3

113

Parsing-based View-aware Embedding Network for Vehicle Re-Identification

1 code implementation • CVPR 2020 • Dechao Meng, Liang Li, Xuejing Liu, Yadong Li, Shijie Yang, Zheng-Jun Zha, Xingyu Gao, Shuhui Wang, Qingming Huang

Vehicle Re-Identification is to find images of the same vehicle from various views in the cross-camera scenario.

Vehicle Re-Identification

101

Attribute Group Editing for Reliable Few-shot Image Generation

1 code implementation • CVPR 2022 • Guanqi Ding, Xinzhe Han, Shuhui Wang, Shuzhe Wu, Xin Jin, Dandan Tu, Qingming Huang

Few-shot image generation is a challenging task even using the state-of-the-art Generative Adversarial Networks (GANs).

Attribute Dictionary Learning +1

Heuristic Domain Adaptation

1 code implementation • NeurIPS 2020 • Shuhao Cui, Xuan Jin, Shuhui Wang, Yuan He, Qingming Huang

In visual domain adaptation (DA), separating the domain-specific characteristics from the domain-invariant representations is an ill-posed problem.

Domain Adaptation

Hierarchical Modular Network for Video Captioning

1 code implementation • CVPR 2022 • Hanhua Ye, Guorong Li, Yuankai Qi, Shuhui Wang, Qingming Huang, Ming-Hsuan Yang

(II) Predicate level, which learns the actions conditioned on highlighted objects and is supervised by the predicate in captions.

Representation Learning Sentence +1

Unsupervised Open Domain Recognition by Semantic Discrepancy Minimization

1 code implementation • CVPR 2019 • Junbao Zhuo, Shuhui Wang, Shuhao Cui, Qingming Huang

We address the unsupervised open domain recognition (UODR) problem, where the categories in the labeled source domain S are only a subset of those in the unlabeled target domain T. The task is to correctly classify all samples in T, including known and unknown categories.

Classification General Classification

Learning fragment self-attention embeddings for image-text matching

1 code implementation • ACMMM 2019 • Yiling Wu, Shuhui Wang, Guoli Song, Qingming Huang

In this paper, we propose Self-Attention Embeddings (SAEM) to exploit fragment relations in images or texts by self-attention mechanism, and aggregate fragment information into visual and textual embeddings.

Image-text matching Sentence +1

Mining Latent Structures for Multimedia Recommendation

1 code implementation • 19 Apr 2021 • Jinghao Zhang, Yanqiao Zhu, Qiang Liu, Shu Wu, Shuhui Wang, Liang Wang

To be specific, in the proposed LATTICE model, we devise a novel modality-aware structure learning layer, which learns item-item structures for each modality and aggregates multiple modalities to obtain latent item graphs.

Collaborative Filtering Multimedia recommendation +1

Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding

1 code implementation • ICCV 2019 • Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Dechao Meng, Qingming Huang

It builds the correspondence between image region proposal and query in an adaptive manner: adaptive grounding and collaborative reconstruction.

Attribute Referring Expression +1

Composite Adversarial Attacks

1 code implementation • 10 Dec 2020 • Xiaofeng Mao, Yuefeng Chen, Shuhui Wang, Hang Su, Yuan He, Hui Xue

Adversarial attack is a technique for deceiving Machine Learning (ML) models, which provides a way to evaluate the adversarial robustness.

Adversarial Attack Adversarial Robustness

Greedy Gradient Ensemble for Robust Visual Question Answering

1 code implementation • ICCV 2021 • Xinzhe Han, Shuhui Wang, Chi Su, Qingming Huang, Qi Tian

Language bias is a critical issue in Visual Question Answering (VQA), where models often exploit dataset biases for the final decision without considering the image information.

Ranked #2 on Visual Question Answering (VQA) on VQA-CP

Question Answering Visual Question Answering

Self-Regulated Learning for Egocentric Video Activity Anticipation

1 code implementation • 23 Nov 2021 • Zhaobo Qi, Shuhui Wang, Chi Su, Li Su, Qingming Huang, Qi Tian

Future activity anticipation is a challenging problem in egocentric vision.

Multi-Task Learning

DeeCap: Dynamic Early Exiting for Efficient Image Captioning

1 code implementation • CVPR 2022 • Zhengcong Fei, Xu Yan, Shuhui Wang, Qi Tian

On one hand, the representation in shallow layers lacks high-level semantic and sufficient cross-modal fusion information for accurate prediction.

Image Captioning Imitation Learning

Learning Invariant Representation with Consistency and Diversity for Semi-supervised Source Hypothesis Transfer

1 code implementation • 7 Jul 2021 • Xiaodong Wang, Junbao Zhuo, Shuhao Cui, Shuhui Wang

Semi-supervised domain adaptation (SSDA) aims to solve tasks in target domain by utilizing transferable information learned from the available source domain and a few labeled target data.

Domain Adaptation Semi-supervised Domain Adaptation

A Graph Regularized Deep Neural Network for Unsupervised Image Representation Learning

1 code implementation • CVPR 2017 • Shijie Yang, Liang Li, Shuhui Wang, Weigang Zhang, Qingming Huang

Deep Auto-Encoder (DAE) has shown its promising power in high-level representation learning.

Representation Learning

The Euclidean Space is Evil: Hyperbolic Attribute Editing for Few-shot Image Generation

1 code implementation • ICCV 2023 • Lingxiao Li, Yi Zhang, Shuhui Wang

Existing methods suffer from the trade-off between the quality and diversity of generated images.

Attribute Image Generation

Harmonized Multimodal Learning with Gaussian Process Latent Variable Models

1 code implementation • 14 Aug 2019 • Guoli Song, Shuhui Wang, Qingming Huang, Qi Tian

Multimodal learning aims to discover the relationship between multiple modalities.

Cross-Modal Retrieval Retrieval

General Greedy De-bias Learning

1 code implementation • 20 Dec 2021 • Xinzhe Han, Shuhui Wang, Chi Su, Qingming Huang, Qi Tian

Existing de-bias learning frameworks try to capture specific dataset bias by annotations but they fail to handle complicated OOD scenarios.

Image Classification Question Answering +1

Orthogonal Temporal Interpolation for Zero-Shot Video Recognition

1 code implementation • 14 Aug 2023 • Yan Zhu, Junbao Zhuo, Bin Ma, Jiajia Geng, Xiaoming Wei, Xiaolin Wei, Shuhui Wang

We propose a model called OTI for ZSVR by employing orthogonal temporal interpolation and the matching loss based on VLMs.

Ranked #1 on Zero-Shot Action Recognition on UCF101

Video Recognition Zero-Shot Action Recognition +2

Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis

1 code implementation • 23 Nov 2021 • Zhaobo Qi, Shuhui Wang, Chi Su, Li Su, Weigang Zhang, Qingming Huang

Based on TDC, we propose the temporal dynamic concept modeling network (TDCMN) to learn an accurate and complete concept representation for efficient untrimmed video analysis.

Image Categorization

Multi-Attention Network for Compressed Video Referring Object Segmentation

1 code implementation • 26 Jul 2022 • Weidong Chen, Dexiang Hong, Yuankai Qi, Zhenjun Han, Shuhui Wang, Laiyun Qing, Qingming Huang, Guorong Li

To address this problem, we propose a multi-attention network which consists of a dual-path dual-attention module and a query-based cross-modal Transformer module.

Ranked #5 on Referring Expression Segmentation on A2D Sentences

Object Referring Expression Segmentation +4

Knowledge-guided Pairwise Reconstruction Network for Weakly Supervised Referring Expression Grounding

1 code implementation • 5 Sep 2019 • Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Li Su, Qingming Huang

Weakly supervised referring expression grounding (REG) aims at localizing the referential entity in an image according to linguistic query, where the mapping between the image region (proposal) and the query is unknown in the training stage.

Object Referring Expression +2

Stable Attribute Group Editing for Reliable Few-shot Image Generation

1 code implementation • 1 Feb 2023 • Guanqi Ding, Xinzhe Han, Shuhui Wang, Xin Jin, Dandan Tu, Qingming Huang

SAGE makes use of all given few-shot images and estimates a class center embedding based on the category-relevant attribute dictionary.

Attribute Classification +1

State-Relabeling Adversarial Active Learning

1 code implementation • CVPR 2020 • Beichen Zhang, Liang Li, Shijie Yang, Shuhui Wang, Zheng-Jun Zha, Qingming Huang

In this paper, we propose a state relabeling adversarial active learning model (SRAAL), that leverages both the annotation and the labeled/unlabeled state information for deriving the most informative unlabeled samples.

Active Learning

Bias-Conflict Sample Synthesis and Adversarial Removal Debias Strategy for Temporal Sentence Grounding in Video

1 code implementation • 15 Jan 2024 • Zhaobo Qi, Yibo Yuan, Xiaowen Ruan, Shuhui Wang, Weigang Zhang, Qingming Huang

Temporal Sentence Grounding in Video (TSGV) is troubled by the dataset bias issue, which is caused by the uneven temporal distribution of the target moments for samples with similar semantic components in input videos or query texts.

Sentence Temporal Sentence Grounding

Semi-Autoregressive Image Captioning

1 code implementation • 11 Oct 2021 • Xu Yan, Zhengcong Fei, Zekang Li, Shuhui Wang, Qingming Huang, Qi Tian

Non-autoregressive image captioning with continuous iterative refinement, which eliminates the sequential dependence in sentence generation, can achieve performance comparable to its autoregressive counterparts with considerable acceleration.

Image Captioning Sentence

Entity-enhanced Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding

1 code implementation • 18 Jul 2022 • Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Zechao Li, Qi Tian, Qingming Huang

Second, most previous weakly supervised REG methods ignore the discriminative location and context of the referent, causing difficulties in distinguishing the target from other same-category objects.

Attribute Referring Expression +2

Open-Set Knowledge-Based Visual Question Answering with Inference Paths

1 code implementation • 12 Oct 2023 • Jingru Gan, Xinzhe Han, Shuhui Wang, Qingming Huang

Given an image and an associated textual question, the purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases.

Knowledge Graphs Multi-class Classification +2

Less Is More: Picking Informative Frames for Video Captioning

no code implementations • ECCV 2018 • Yangyu Chen, Shuhui Wang, Weigang Zhang, Qingming Huang

We propose a plug-and-play PickNet to perform informative frame picking in video captioning.

Video Captioning

Multi-level Discriminative Dictionary Learning towards Hierarchical Visual Categorization

no code implementations • CVPR 2013 • Li Shen, Shuhui Wang, Gang Sun, Shuqiang Jiang, Qingming Huang

For each internode of the hierarchical category structure, a discriminative dictionary and a set of classification models are learnt for visual categorization, and the dictionaries in different layers are learnt to exploit the discriminative visual properties of different granularity.

Dictionary Learning

Online Asymmetric Similarity Learning for Cross-Modal Retrieval

no code implementations • CVPR 2017 • Yiling Wu, Shuhui Wang, Qingming Huang

In this paper, we propose an online learning method to learn the similarity function between heterogeneous modalities by preserving the relative similarity in the training data, which is modeled as a set of bi-directional hinge loss constraints on the cross-modal training triplets.

Cross-Modal Retrieval Retrieval +2

Similarity Gaussian Process Latent Variable Model for Multi-Modal Data Analysis

no code implementations • ICCV 2015 • Guoli Song, Shuhui Wang, Qingming Huang, Qi Tian

Data from real applications involve multiple modalities representing content with the same semantics and deliver rich information from complementary aspects.

Retrieval

Multimodal Gaussian Process Latent Variable Models With Harmonization

no code implementations • ICCV 2017 • Guoli Song, Shuhui Wang, Qingming Huang, Qi Tian

We incorporate the harmonization mechanism into the learning process of multimodal GPLVMs.

Cross-Modal Retrieval Retrieval

Sharp Multiple Instance Learning for DeepFake Video Detection

no code implementations • 11 Aug 2020 • Xiaodan Li, Yining Lang, Yuefeng Chen, Xiaofeng Mao, Yuan He, Shuhui Wang, Hui Xue, Quan Lu

A sharp MIL (S-MIL) is proposed which builds direct mapping from instance embeddings to bag prediction, rather than from instance embeddings to instance prediction and then to bag prediction in traditional MIL.

Face Swapping Multiple Instance Learning

Interpretable Visual Reasoning via Probabilistic Formulation under Natural Supervision

no code implementations • ECCV 2020 • Xinzhe Han, Shuhui Wang, Chi Su, Weigang Zhang, Qingming Huang, Qi Tian

In this paper, we rethink the implicit reasoning process in VQA and propose a new formulation which maximizes the log-likelihood of the joint distribution for the observed question and predicted answer.

Question Answering Visual Question Answering +1

Semantic Editing On Segmentation Map Via Multi-Expansion Loss

no code implementations • 16 Oct 2020 • Jianfeng He, Xuchao Zhang, Shuo Lei, Shuhui Wang, Qingming Huang, Chang-Tien Lu, Bei Xiao

Each MEx area has the mask area of the generation as the majority and the boundary of original context as the minority.

Image Inpainting Segmentation

QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval

no code implementations • CVPR 2021 • Xiaodan Li, Jinfeng Li, Yuefeng Chen, Shaokai Ye, Yuan He, Shuhui Wang, Hang Su, Hui Xue

Comprehensive experiments show that the proposed attack achieves a high attack success rate with few queries against the image retrieval systems under the black-box setting.

Image Classification Image Retrieval +1

DVCFlow: Modeling Information Flow Towards Human-like Video Captioning

no code implementations • 19 Nov 2021 • Xu Yan, Zhengcong Fei, Shuhui Wang, Qingming Huang, Qi Tian

Dense video captioning (DVC) aims to generate multi-sentence descriptions to elucidate the multiple events in the video, which is challenging and demands visual consistency, discoursal coherence, and linguistic diversity.

Dense Video Captioning Sentence

IR-GAN: Image Manipulation with Linguistic Instruction by Increment Reasoning

no code implementations • 2 Apr 2022 • Zhenhuan Liu, Jincan Deng, Liang Li, Shaofei Cai, Qianqian Xu, Shuhui Wang, Qingming Huang

Conditional image generation is an active research topic including text2image and image translation.

Conditional Image Generation Generative Adversarial Network +1

Unsupervised Coherent Video Cartoonization with Perceptual Motion Consistency

1 code implementation • 2 Apr 2022 • Zhenhuan Liu, Liang Li, Huajie Jiang, Xin Jin, Dandan Tu, Shuhui Wang, Zheng-Jun Zha

Furthermore, we devise the spatio-temporal correlative map as a style-independent, global-aware regularization on the perceptual motion consistency.

Optical Flow Estimation Style Transfer

Atrial Fibrillation Detection Using Weight-Pruned, Log-Quantised Convolutional Neural Networks

no code implementations • 14 Jun 2022 • Xiu Qi Chang, Ann Feng Chew, Benjamin Chen Ming Choong, Shuhui Wang, Rui Han, Wang He, Li Xiaolin, Rajesh C. Panicker, Deepu John

Deep neural networks (DNN) are a promising tool in medical applications.

Atrial Fibrillation Detection Model Compression

Multimodal Brain Disease Classification with Functional Interaction Learning from Single fMRI Volume

no code implementations • 5 Aug 2022 • Wei Dai, Ziyao Zhang, Lixia Tian, Shengyuan Yu, Shuhui Wang, Zhao Dong, Hairong Zheng

The low representation ability of FC leads to poor performance in clinical practice, especially when dealing with multimodal medical data involving multiple types of visual signals and textual records for brain diseases.

Time Series Analysis

Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection

no code implementations • CVPR 2023 • Chen Zhang, Guorong Li, Yuankai Qi, Shuhui Wang, Laiyun Qing, Qingming Huang, Ming-Hsuan Yang

Weakly supervised video anomaly detection aims to identify abnormal events in videos using only video-level labels.

Anomaly Detection Pseudo Label +1

Dual-view Curricular Optimal Transport for Cross-lingual Cross-modal Retrieval

no code implementations • 11 Sep 2023 • Yabing Wang, Shuhui Wang, Hao Luo, Jianfeng Dong, Fan Wang, Meng Han, Xun Wang, Meng Wang

Therefore, we propose Dual-view Curricular Optimal Transport (DCOT) to learn with noisy correspondence in CCR.

Cross-Lingual Transfer Cross-Modal Retrieval +2

R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation

no code implementations • 13 Oct 2023 • Jiayu Xiao, Henglei Lv, Liang Li, Shuhui Wang, Qingming Huang

Recent text-to-image (T2I) diffusion models have achieved remarkable progress in generating high-quality images given text-prompts as input.

Text-to-Image Generation

A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes

no code implementations • 12 Mar 2024 • Ting Yu, Xiaojun Lin, Shuhui Wang, Weiguo Sheng, Qingming Huang, Jun Yu

Three-Dimensional (3D) dense captioning is an emerging vision-language bridging task that aims to generate multiple detailed and accurate descriptions for 3D scenes.

3D dense captioning Dense Captioning

Confusing Pair Correction Based on Category Prototype for Domain Adaptation under Noisy Environments

1 code implementation • 19 Mar 2024 • Churan Zhi, Junbao Zhuo, Shuhui Wang

In this paper, we address unsupervised domain adaptation under noisy environments, which is more challenging and practical than traditional domain adaptation.

Unsupervised Domain Adaptation
