Search Results for author: Heng Tao Shen

Found 115 papers, 50 papers with code

SeMv-3D: Towards Semantic and Mutil-view Consistency simultaneously for General Text-to-3D Generation with Triplane Priors

no code implementations 10 Oct 2024 Xiao Cai, Pengpeng Zeng, Lianli Gao, Junchen Zhu, Jiaxin Zhang, Sitong Su, Heng Tao Shen, Jingkuan Song

Recent advancements in generic 3D content generation from text prompts have been remarkable by fine-tuning text-to-image diffusion (T2I) models or employing these T2I models as priors to learn a general text-to-3D model.

3D Generation Text to 3D

On Efficient Variants of Segment Anything Model: A Survey

no code implementations 7 Oct 2024 Xiaorui Sun, Jun Liu, Heng Tao Shen, Xiaofeng Zhu, Ping Hu

The Segment Anything Model (SAM) is a foundational model for image segmentation tasks, known for its strong generalization across diverse applications.

Image Segmentation Semantic Segmentation +1

MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct

no code implementations 9 Sep 2024 Run Luo, Haonan Zhang, Longze Chen, Ting-En Lin, Xiong Liu, Yuchuan Wu, Min Yang, Minzheng Wang, Pengpeng Zeng, Lianli Gao, Heng Tao Shen, Yunshui Li, Xiaobo Xia, Fei Huang, Jingkuan Song, Yongbin Li

This framework iteratively improves data quality through a refined combination of fine-grained perception, cognitive reasoning, and interaction evolution, generating a more complex and diverse image-text instruction dataset that empowers MLLMs with enhanced capabilities.

Diversity Visual Reasoning

VQ-Flow: Taming Normalizing Flows for Multi-Class Anomaly Detection via Hierarchical Vector Quantization

1 code implementation 2 Sep 2024 Yixuan Zhou, Xing Xu, Zhe Sun, Jingkuan Song, Andrzej Cichocki, Heng Tao Shen

Through the integration of vector quantization (VQ), we empower the flow models to distinguish different concepts of multi-class normal data in an unsupervised manner, resulting in a novel flow-based unified method, named VQ-Flow.

Quantization Unsupervised Anomaly Detection

DiffLoRA: Generating Personalized Low-Rank Adaptation Weights with Diffusion

no code implementations 13 Aug 2024 Yujia Wu, Yiming Shi, Jiwei Wei, ChengWei Sun, Yang Yang, Heng Tao Shen

Moreover, we introduce a novel identity-oriented LoRA weights construction pipeline to facilitate the training process of DiffLoRA.

Text-to-Image Generation

Leveraging Weak Cross-Modal Guidance for Coherence Modelling via Iterative Learning

1 code implementation 1 Aug 2024 Yi Bin, Junrong Liao, Yujuan Ding, Haoxuan Li, Yang Yang, See-Kiong Ng, Heng Tao Shen

The iterative cross-modal boosting also functions in inference to further enhance coherence prediction in each modality.

GalleryGPT: Analyzing Paintings with Large Multimodal Models

1 code implementation 1 Aug 2024 Yi Bin, Wenhao Shi, Yujuan Ding, Zhiqiang Hu, Zheng Wang, Yang Yang, See-Kiong Ng, Heng Tao Shen

Specifically, we first propose a task of composing paragraph analysis for artworks, i.e., paintings in this paper, focusing only on visual characteristics to formulate a more comprehensive understanding of artworks.

Art Analysis

Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization

1 code implementation 24 May 2024 Beitao Chen, Xinyu Lyu, Lianli Gao, Jingkuan Song, Heng Tao Shen

To tackle this issue, we conduct a theoretical analysis to promote the effectiveness of contrast decoding.

Hallucination

AICL: Action In-Context Learning for Video Diffusion Model

1 code implementation 18 Mar 2024 Jianzhi Liu, Junchen Zhu, Lianli Gao, Heng Tao Shen, Jingkuan Song

The open-domain video generation models are constrained by the scale of the training video datasets, and some less common actions still cannot be generated.

Action Generation In-Context Learning +2

Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning

no code implementations 15 Mar 2024 Meixuan Li, Tianyu Li, Guoqing Wang, Peng Wang, Yang Yang, Heng Tao Shen

Aligning these distributions between corresponding regions from different tasks imparts higher flexibility and capacity to capture intra-region structures, accommodating a broader range of tasks.

Depth Estimation Semantic Segmentation +1

Interpreting and Improving Attention From the Perspective of Large Kernel Convolution

no code implementations 11 Jan 2024 Chenghao Li, Chaoning Zhang, Boheng Zeng, Yi Lu, Pengbo Shi, Qingzi Chen, Jirui Liu, Lingyun Zhu, Yang Yang, Heng Tao Shen

These findings highlight the effectiveness of LKCA in bridging local and global feature modeling, offering a practical and robust solution for real-world applications with limited data and resources.

Image Classification

Ensemble Diversity Facilitates Adversarial Transferability

1 code implementation CVPR 2024 Bowen Tang, Zheng Wang, Yi Bin, Qi Dou, Yang Yang, Heng Tao Shen

With the advent of ensemble-based attacks, the transferability of generated adversarial examples is elevated by a noticeable margin, despite many methods only employing superficial integration yet ignoring the diversity between ensemble models.

Diversity reinforcement-learning +1

Embracing Unimodal Aleatoric Uncertainty for Robust Multimodal Fusion

no code implementations CVPR 2024 Zixian Gao, Xun Jiang, Xing Xu, Fumin Shen, Yujie Li, Heng Tao Shen

However, in this paper, we show that the potential noise in unimodal data could be well quantified and further employed to enhance more stable unimodal embeddings via contrastive learning.

Contrastive Learning

ALF: Adaptive Label Finetuning for Scene Graph Generation

no code implementations 29 Dec 2023 Qishen Chen, Jianzhi Liu, Xinyu Lyu, Lianli Gao, Heng Tao Shen, Jingkuan Song

Scene Graph Generation (SGG) endeavors to predict the relationships between subjects and objects in a given image.

Graph Generation Relation +1

JoReS-Diff: Joint Retinex and Semantic Priors in Diffusion Model for Low-light Image Enhancement

no code implementations 20 Dec 2023 Yuhui Wu, Guoqing Wang, Zhiwen Wang, Yang Yang, Tianyu Li, Malu Zhang, Chongyi Li, Heng Tao Shen

By treating Retinex- and semantic-based priors as the condition, JoReS-Diff presents a unique perspective for establishing a diffusion model for LLIE and similar image enhancement tasks.

Low-Light Image Enhancement Semantic Segmentation

ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval

1 code implementation CVPR 2024 Kaipeng Fang, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Zhi-Qi Cheng, Xiyao Li, Heng Tao Shen

Then, in the Context-aware Simulator Learning stage, we train a Content-aware Prompt Simulator under simulated test scenarios to produce the corresponding CaDP.

Few-Shot Learning Text Retrieval +1

Make-A-Storyboard: A General Framework for Storyboard with Disentangled and Merged Control

no code implementations 6 Dec 2023 Sitong Su, Litao Guo, Lianli Gao, Heng Tao Shen, Jingkuan Song

Story Visualization aims to generate images aligned with story prompts, reflecting the coherence of storybooks through visual consistency among characters and scenes. However, current approaches concentrate exclusively on characters and neglect the visual consistency among contextually correlated scenes, resulting in independent character images without inter-image coherence. To tackle this issue, we propose a new presentation form for Story Visualization called Storyboard, inspired by film-making, as illustrated in Fig. 1. Specifically, a Storyboard unfolds a story into visual representations scene by scene.

Story Visualization

Continual Referring Expression Comprehension via Dual Modular Memorization

1 code implementation 25 Nov 2023 Heng Tao Shen, Cheng Chen, Peng Wang, Lianli Gao, Meng Wang, Jingkuan Song

In this paper, we propose Continual Referring Expression Comprehension (CREC), a new setting for REC, where a model is learning on a stream of incoming tasks.

Memorization Referring Expression +1

SAM Meets UAP: Attacking Segment Anything Model With Universal Adversarial Perturbation

no code implementations 19 Oct 2023 Dongshen Han, Chaoning Zhang, Sheng Zheng, Chang Lu, Yang Yang, Heng Tao Shen

On top of the ablation study to understand various components in our proposed method, we shed light on the roles of positive and negative samples in making the generated UAP effective for attacking SAM.

Adversarial Attack Adversarial Robustness +1

Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval

1 code implementation NeurIPS 2023 Hao Li, Jingkuan Song, Lianli Gao, Xiaosu Zhu, Heng Tao Shen

In this paper, we propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from the inherent data ambiguity.

Image-text matching Image-to-Text Retrieval +5

DePT: Decoupled Prompt Tuning

1 code implementation CVPR 2024 Ji Zhang, Shihan Wu, Lianli Gao, Heng Tao Shen, Jingkuan Song

Specifically, through an in-depth analysis of the learned features of the base and new tasks, we observe that the BNT stems from a channel bias issue, i.e., the vast majority of feature channels are occupied by base-specific knowledge, resulting in the collapse of task-shared knowledge important to new tasks.

Prompt Engineering Zero-shot Generalization

MSFlow: Multi-Scale Flow-based Framework for Unsupervised Anomaly Detection

1 code implementation 29 Aug 2023 Yixuan Zhou, Xing Xu, Jingkuan Song, Fumin Shen, Heng Tao Shen

Unsupervised anomaly detection (UAD) attracts a lot of research interest and drives widespread applications, where only anomaly-free samples are available for training.

Anomaly Localization Unsupervised Anomaly Detection

Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions

1 code implementation 28 Aug 2023 Tianshi Wang, Fengling Li, Lei Zhu, Jingjing Li, Zheng Zhang, Heng Tao Shen

With the exponential surge in diverse multi-modal data, traditional uni-modal retrieval methods struggle to meet the needs of users seeking access to data across various modalities.

Cross-Modal Retrieval Retrieval

Informative Scene Graph Generation via Debiasing

no code implementations 10 Aug 2023 Lianli Gao, Xinyu Lyu, Yuyu Guo, Yuxuan Hu, Yuan-Fang Li, Lu Xu, Heng Tao Shen, Jingkuan Song

It integrates two components: Semantic Debiasing (SD) and Balanced Predicate Learning (BPL), for these imbalances.

Blocking Graph Generation +4

Generalized Unbiased Scene Graph Generation

no code implementations 9 Aug 2023 Xinyu Lyu, Lianli Gao, Junlin Xie, Pengpeng Zeng, Yulu Tian, Jie Shao, Heng Tao Shen

To this end, we propose the Multi-Concept Learning (MCL) framework, which ensures a balanced learning process across rare/uncommon/common concepts.

Graph Generation Unbiased Scene Graph Generation

Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval

1 code implementation 8 Aug 2023 Yi Bin, Haoxuan Li, Yahui Xu, Xing Xu, Yang Yang, Heng Tao Shen

Specifically, on two key tasks, i.e., image-to-text and text-to-image retrieval, HAT achieves 7.6% and 16.7% relative score improvement of Recall@1 on MSCOCO, and 4.4% and 11.6% on Flickr30k, respectively.

Cross-Modal Retrieval Image Retrieval +2

Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination

1 code implementation 8 Aug 2023 Haoxuan Li, Yi Bin, Junrong Liao, Yang Yang, Heng Tao Shen

Most existing image-text matching methods adopt triplet loss as the optimization objective, and choosing a proper negative sample for the triplet of <anchor, positive, negative> is important for effectively training the model, e.g., hard negatives make the model learn efficiently and effectively.

Image-text matching Representation Learning +2
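
As a rough illustration of the triplet objective mentioned in this entry, the sketch below computes a hinge-based triplet loss from precomputed similarity scores and selects the hardest negative among a set of candidates. The function names, the margin value, and the use of similarity scores rather than distances are assumptions made for illustration; they are not taken from the paper.

```python
def triplet_loss(sim_pos, sim_neg, margin=0.2):
    """Hinge triplet loss on similarity scores: max(0, margin - s(a, p) + s(a, n))."""
    return max(0.0, margin - sim_pos + sim_neg)


def hardest_negative_loss(sim_pos, sims_neg, margin=0.2):
    """Triplet loss using the negative most similar to the anchor (the hard negative)."""
    hardest = max(sims_neg)
    return triplet_loss(sim_pos, hardest, margin)


if __name__ == "__main__":
    # Hypothetical similarity scores between one anchor and its candidates.
    print(hardest_negative_loss(0.8, [0.1, 0.7, 0.3]))
```

Only the negative with the highest similarity contributes to the loss here, which is why a mislabeled (false) negative in that position can dominate training.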

Part-Aware Transformer for Generalizable Person Re-identification

1 code implementation ICCV 2023 Hao Ni, Yuke Li, Lianli Gao, Heng Tao Shen, Jingkuan Song

Based on the local similarity obtained in CSL, a Part-guided Self-Distillation (PSD) is proposed to further improve the generalization of global features.

Domain Generalization Generalizable Person Re-identification

Feature Noise Boosts DNN Generalization under Label Noise

1 code implementation 3 Aug 2023 Lu Zeng, Xuan Chen, Xiaoshuang Shi, Heng Tao Shen

In this study, we introduce a simple feature noise method, which directly adds noise to the features of training data, and theoretically demonstrate that it can enhance the generalization of DNNs under label noise.
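
A minimal sketch of what adding noise directly to training features could look like is given below. The Gaussian noise model, the `sigma` parameter, and the function name are assumptions for illustration only; the snippet above does not specify the noise distribution the authors use.

```python
import random


def add_feature_noise(features, sigma=0.1, seed=None):
    """Return a noisy copy of a batch of feature vectors.

    Gaussian noise with standard deviation `sigma` is an assumption made for
    this sketch; the input batch itself is left unmodified.
    """
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in features]


if __name__ == "__main__":
    batch = [[0.5, 1.0, -0.3], [0.0, 2.0, 1.5]]  # hypothetical training features
    print(add_feature_noise(batch, sigma=0.05, seed=0))
```

In a training loop, such a function would typically be applied to each mini-batch of features before the forward pass, while the labels are left untouched.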

A Universal Unbiased Method for Classification from Aggregate Observations

no code implementations 20 Jun 2023 Zixi Wei, Lei Feng, Bo Han, Tongliang Liu, Gang Niu, Xiaofeng Zhu, Heng Tao Shen

This motivates the study on classification from aggregate observations (CFAO), where the supervision is provided to groups of instances, instead of individual instances.

Classification Multiple Instance Learning

Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models

1 code implementation 5 Jun 2023 Jiabang He, Yi Hu, Lei Wang, Xing Xu, Ning Liu, Hui Liu, Heng Tao Shen

Results from the experiments demonstrate that there is a significant performance gap between the in-distribution (ID) and OOD settings for document images, and that fine-grained analysis of distribution shifts can reveal the brittle nature of existing pre-trained VDU models and OOD generalization algorithms.

document understanding Question Answering

Caterpillar: A Pure-MLP Architecture with Shifted-Pillars-Concatenation

1 code implementation 28 May 2023 Jin Sun, Xiaoshuang Shi, Zhiyuan Wang, Kaidi Xu, Heng Tao Shen, Xiaofeng Zhu

Then, we build a pure-MLP architecture called Caterpillar by replacing the convolutional layer with the SPC module in a hybrid model of sMLPNet.

Computational Efficiency Inductive Bias

Faster Video Moment Retrieval with Point-Level Supervision

no code implementations 23 May 2023 Xun Jiang, Zailei Zhou, Xing Xu, Yang Yang, Guoqing Wang, Heng Tao Shen

Existing VMR methods suffer from two defects: (1) massive expensive temporal annotations are required to obtain satisfying performance; (2) complicated cross-modal interaction modules are deployed, which lead to high computational cost and low efficiency for the retrieval process.

Moment Retrieval Natural Language Queries +1

Non-Autoregressive Math Word Problem Solver with Unified Tree Structure

1 code implementation 8 May 2023 Yi Bin, Mengqun Han, Wenhao Shi, Lei Wang, Yang Yang, See-Kiong Ng, Heng Tao Shen

For evaluating the possible expression variants, we design a path-based metric to evaluate the partial accuracy of expressions of a unified tree.

Math valid

Instance-Variant Loss with Gaussian RBF Kernel for 3D Cross-modal Retriveal

no code implementations 7 May 2023 Zhitao Liu, Zengyu Liu, Jiwei Wei, Guan Wang, Zhenjiang Du, Ning Xie, Heng Tao Shen

Hence, the performance of cross-modal retrieval methods heavily depends on the representational capacity of this embedding space.

Cross-Modal Retrieval Retrieval

Learning Semantic-Aware Knowledge Guidance for Low-Light Image Enhancement

1 code implementation CVPR 2023 Yuhui Wu, Chen Pan, Guoqing Wang, Yang Yang, Jiwei Wei, Chongyi Li, Heng Tao Shen

To address this issue, we propose a novel semantic-aware knowledge-guided framework (SKF) that can assist a low-light enhancement model in learning rich and diverse priors encapsulated in a semantic segmentation model.

Low-Light Image Enhancement Semantic Segmentation

ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding

1 code implementation 23 Mar 2023 Ziyang Lu, Yunqiang Pei, Guoqing Wang, Yang Yang, Zheng Wang, Heng Tao Shen

Despite their effectiveness, existing methods suffer from the difficulty of low recognition accuracy in cases of multiple adjacent objects with similar appearances. To address this issue, this work intuitively introduces the human-robot interaction as a cue to facilitate the development of 3D visual grounding.

3D visual grounding

ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction

1 code implementation ICCV 2023 Jiabang He, Lei Wang, Yi Hu, Ning Liu, Hui Liu, Xing Xu, Heng Tao Shen

To this end, we propose a simple but effective in-context learning framework called ICL-D3IE, which enables LLMs to perform DIE with different types of demonstration examples.

Document AI In-Context Learning

Imbalanced Open Set Domain Adaptation via Moving-threshold Estimation and Gradual Alignment

1 code implementation 8 Mar 2023 Jinghan Ru, Jun Tian, Zhekai Du, Chengwei Xiao, Jingjing Li, Heng Tao Shen

To alleviate the negative effects raised by label shift in OSDA, we propose Open-set Moving-threshold Estimation and Gradual Alignment (OMEGA) - a novel architecture that improves existing OSDA methods on class-imbalanced data.

Transfer Learning Unsupervised Domain Adaptation

A Comprehensive Survey on Source-free Domain Adaptation

no code implementations 23 Feb 2023 Zhiqi Yu, Jingjing Li, Zhekai Du, Lei Zhu, Heng Tao Shen

Over the past decade, domain adaptation has become a widely studied branch of transfer learning that aims to improve performance on target domains by leveraging knowledge from the source domain.

Source-Free Domain Adaptation Survey +1

Multilateral Semantic Relations Modeling for Image Text Retrieval

no code implementations CVPR 2023 Zheng Wang, Zhenwei Gao, Kangshuai Guo, Yang Yang, Xiaoming Wang, Heng Tao Shen

Specifically, a given query is first mapped as a probabilistic embedding to learn its true semantic distribution based on Mahalanobis distance.

Image-text Retrieval Text Retrieval
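
For readers unfamiliar with the Mahalanobis distance mentioned in this entry, the sketch below computes it between a point and a Gaussian embedding. The diagonal covariance and the function name are assumptions made to keep the example short; how the paper parameterizes its probabilistic embedding is not stated in the snippet.

```python
import math


def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance between x and a Gaussian with diagonal covariance.

    `mean` and `var` hold the per-dimension mean and variance of the Gaussian.
    """
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))


if __name__ == "__main__":
    # Hypothetical 2-D embedding: the second dimension is more uncertain.
    print(mahalanobis_diag([1.0, 2.0], [0.0, 0.0], [1.0, 4.0]))
```

Dimensions with larger variance contribute less to the distance, and with unit variance in every dimension the measure reduces to the Euclidean distance.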

Semantic Enhanced Knowledge Graph for Large-Scale Zero-Shot Learning

no code implementations 26 Dec 2022 Jiwei Wei, Yang Yang, Zeyu Ma, Jingjing Li, Xing Xu, Heng Tao Shen

In this paper, we provide a new semantic enhanced knowledge graph that contains both expert knowledge and categories semantic correlation.

Zero-Shot Learning

Visual Commonsense-aware Representation Network for Video Captioning

1 code implementation 17 Nov 2022 Pengpeng Zeng, Haonan Zhang, Lianli Gao, Xiangpeng Li, Jin Qian, Heng Tao Shen

Generating consecutive descriptions for videos, i.e., Video Captioning, requires taking full advantage of visual representation along with the generation process.

Caption Generation Question Answering +2

A Lower Bound of Hash Codes' Performance

1 code implementation 12 Oct 2022 Xiaosu Zhu, Jingkuan Song, Yu Lei, Lianli Gao, Heng Tao Shen

By testing on a series of hash-models, we obtain performance improvements among all of them, with an up to 26.5% increase in mean Average Precision and an up to 20.5% increase in accuracy.

Metric Learning Representation Learning

Dual-branch Hybrid Learning Network for Unbiased Scene Graph Generation

1 code implementation 16 Jul 2022 Chaofan Zheng, Lianli Gao, Xinyu Lyu, Pengpeng Zeng, Abdulmotaleb El Saddik, Heng Tao Shen

Experiments show that our approach achieves a new state-of-the-art performance on VG and GQA datasets and makes a trade-off between the performance of tail predicates and head ones.

Graph Generation Image Captioning +3

Adaptive Fine-Grained Predicates Learning for Scene Graph Generation

no code implementations 11 Jul 2022 Xinyu Lyu, Lianli Gao, Pengpeng Zeng, Heng Tao Shen, Jingkuan Song

The performance of current Scene Graph Generation (SGG) models is severely hampered by hard-to-distinguish predicates, e.g., woman-on/standing on/walking on-beach.

Fine-Grained Image Classification Graph Generation +4

KE-RCNN: Unifying Knowledge based Reasoning into Part-level Attribute Parsing

1 code implementation 21 Jun 2022 Xuanhan Wang, Jingkuan Song, Xiaojia Chen, Lechao Cheng, Lianli Gao, Heng Tao Shen

In this article, we propose a Knowledge Embedded RCNN (KE-RCNN) to identify attributes by leveraging rich knowledge, including implicit knowledge (e.g., the attribute "above-the-hip" for a shirt requires visual/geometry relations of shirt-hip) and explicit knowledge (e.g., the part of "shorts" cannot have the attribute of "hoodie" or "lining").

Attribute

From Pixels to Objects: Cubic Visual Attention for Visual Question Answering

no code implementations 4 Jun 2022 Jingkuan Song, Pengpeng Zeng, Lianli Gao, Heng Tao Shen

Existing visual attention models are generally planar, i.e., different channels of the last conv-layer feature map of an image share the same weight.

Object Question Answering +1

Structured Two-stream Attention Network for Video Question Answering

no code implementations 2 Jun 2022 Lianli Gao, Pengpeng Zeng, Jingkuan Song, Yuan-Fang Li, Wu Liu, Tao Mei, Heng Tao Shen

To date, visual question answering (VQA) (i.e., image QA and video QA) is still a holy grail in vision and language understanding, especially for video QA.

Question Answering Video Question Answering +2

Thunder: Thumbnail based Fast Lightweight Image Denoising Network

no code implementations 24 May 2022 Yifeng Zhou, Xing Xu, Shuaicheng Liu, Guoqing Wang, Huimin Lu, Heng Tao Shen

To achieve promising results on removing noise from real-world images, most existing denoising networks are formulated with complex network structures, making them impractical for deployment.

Image Denoising SSIM

Support-set based Multi-modal Representation Enhancement for Video Captioning

1 code implementation 19 May 2022 Xiaoya Chen, Jingkuan Song, Pengpeng Zeng, Lianli Gao, Heng Tao Shen

Video captioning is a challenging task that necessitates a thorough comprehension of visual scenes.

Video Captioning

Fine-Grained Predicates Learning for Scene Graph Generation

1 code implementation CVPR 2022 Xinyu Lyu, Lianli Gao, Yuyu Guo, Zhou Zhao, Hao Huang, Heng Tao Shen, Jingkuan Song

The performance of current Scene Graph Generation models is severely hampered by some hard-to-distinguish predicates, e.g., "woman-on/standing on/walking on-beach" or "woman-near/looking at/in front of-child".

Fine-Grained Image Classification Graph Generation +2

Relation Regularized Scene Graph Generation

no code implementations 22 Feb 2022 Yuyu Guo, Lianli Gao, Jingkuan Song, Peng Wang, Nicu Sebe, Heng Tao Shen, Xuelong Li

Inspired by this observation, in this article, we propose a relation regularized network (R2-Net), which can predict whether there is a relationship between two objects and encode this relation into object feature refinement and better SGG.

Graph Classification Graph Generation +6

One-shot Scene Graph Generation

1 code implementation 22 Feb 2022 Yuyu Guo, Jingkuan Song, Lianli Gao, Heng Tao Shen

Specifically, the Relational Knowledge represents the prior knowledge of relationships between entities extracted from the visual content, e.g., the visual relationships "standing in", "sitting in", and "lying in" may exist between "dog" and "yard", while the Commonsense Knowledge encodes "sense-making" knowledge like "dog can guard yard".

Graph Generation Scene Graph Generation +1

Semi-Supervised Video Paragraph Grounding With Contrastive Encoder

no code implementations CVPR 2022 Xun Jiang, Xing Xu, Jingran Zhang, Fumin Shen, Zuo Cao, Heng Tao Shen

Video events grounding aims at retrieving the most relevant moments from an untrimmed video in terms of a given natural language query.

Sentence Video Grounding

Meta Distribution Alignment for Generalizable Person Re-Identification

1 code implementation CVPR 2022 Hao Ni, Jingkuan Song, Xiaopeng Luo, Feng Zheng, Wen Li, Heng Tao Shen

Domain Generalizable (DG) person ReID is a challenging task which trains a model on source domains yet generalizes well on target domains.

Generalizable Person Re-identification Meta-Learning +1

Fast Gradient Non-sign Methods

1 code implementation 25 Oct 2021 Yaya Cheng, Jingkuan Song, Xiaosu Zhu, Qilong Zhang, Lianli Gao, Heng Tao Shen

Based on the linearity hypothesis, under $\ell_\infty$ constraint, $sign$ operation applied to the gradients is a good choice for generating perturbations.
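
The sign-based perturbation referred to in this entry can be sketched as a single gradient-sign step under an $\ell_\infty$ budget. The code below illustrates only that well-known baseline, not the non-sign method the paper proposes; the function names, the flat-list representation, and the [0, 1] pixel range are assumptions for illustration.

```python
def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def sign_step(pixels, grads, epsilon):
    """One sign-gradient step: shift each pixel by epsilon in the gradient's direction.

    `pixels` and `grads` are flat lists of floats; pixel values are assumed to
    lie in [0, 1], so the result is clipped back into that range.
    """
    adv = []
    for p, g in zip(pixels, grads):
        q = p + epsilon * sign(g)
        adv.append(min(1.0, max(0.0, q)))
    return adv


if __name__ == "__main__":
    # Hypothetical pixel values and loss gradients.
    print(sign_step([0.5, 0.2, 0.99], [0.3, -2.0, 5.0], epsilon=0.1))
```

Because every coordinate moves by at most `epsilon`, the perturbation stays inside the $\ell_\infty$ ball regardless of the gradient magnitudes, which the sign operation discards.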

Salience-Guided Iterative Asymmetric Mutual Hashing for Fast Person Re-identification

2 code implementations IEEE Transactions on Image Processing 2021 Cairong Zhao, Yuanpeng Tu, Zhihui Lai, Fumin Shen, Heng Tao Shen, Duoqian Miao

Moreover, a novel iterative asymmetric mutual training strategy (IAMT) is proposed to alleviate drawbacks of common mutual learning, which can continuously refine the discriminative regions for SSB and extract regularized dark knowledge for two models as well.

Code Generation Person Re-Identification

From General to Specific: Informative Scene Graph Generation via Balance Adjustment

1 code implementation ICCV 2021 Yuyu Guo, Lianli Gao, Xuanhan Wang, Yuxuan Hu, Xing Xu, Xu Lu, Heng Tao Shen, Jingkuan Song

The scene graph generation (SGG) task aims to detect visual relationship triplets, i.e., subject, predicate, object, in an image, providing a structural vision layout for scene understanding.

Blocking Graph Generation +2

Adversarial Energy Disaggregation for Non-intrusive Load Monitoring

no code implementations 2 Aug 2021 Zhekai Du, Jingjing Li, Lei Zhu, Ke Lu, Heng Tao Shen

Energy disaggregation, also known as non-intrusive load monitoring (NILM), tackles the problem of separating whole-home electricity usage into appliance-specific individual consumption, which is a typical application of data analysis.

Non-Intrusive Load Monitoring

Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos

no code implementations CVPR 2021 Mingxing Zhang, Yang Yang, Xinghan Chen, Yanli Ji, Xing Xu, Jingjing Li, Heng Tao Shen

Then for a moment candidate, we concatenate the starting/middle/ending representations of its starting/middle/ending elements respectively to form the final moment representation.

Sentence

Staircase Sign Method for Boosting Adversarial Attacks

2 code implementations 20 Apr 2021 Qilong Zhang, Xiaosu Zhu, Jingkuan Song, Lianli Gao, Heng Tao Shen

Crafting adversarial examples for the transfer-based attack is challenging and remains a research hot spot.

Adversarial Attack

Patch-wise++ Perturbation for Adversarial Targeted Attacks

1 code implementation 31 Dec 2020 Lianli Gao, Qilong Zhang, Jingkuan Song, Heng Tao Shen

Specifically, we introduce an amplification factor to the step size in each iteration, and one pixel's overall gradient overflowing the $\epsilon$-constraint is properly assigned to its surrounding regions by a project kernel.

Adversarial Attack

Dual ResGCN for Balanced Scene GraphGeneration

no code implementations 9 Nov 2020 Jingyi Zhang, Yong Zhang, Baoyuan Wu, Yanbo Fan, Fumin Shen, Heng Tao Shen

We propose to incorporate the prior about the co-occurrence of relation pairs into the graph to further help alleviate the class imbalance issue.

Graph Generation Relation +1

Universal Weighting Metric Learning for Cross-Modal Matching

1 code implementation CVPR 2020 Jiwei Wei, Xing Xu, Yang Yang, Yanli Ji, Zheng Wang, Heng Tao Shen

Furthermore, we introduce a new polynomial loss under the universal weighting framework, which defines a weight function for the positive and negative informative pairs respectively.

Image-text matching Metric Learning +1

Patch-wise Attack for Fooling Deep Neural Network

4 code implementations ECCV 2020 Lianli Gao, Qilong Zhang, Jingkuan Song, Xianglong Liu, Heng Tao Shen

By adding human-imperceptible noise to clean images, the resultant adversarial examples can fool other unknown models.

Adversarial Attack Image Classification

MetaMixUp: Learning Adaptive Interpolation Policy of MixUp with Meta-Learning

no code implementations 27 Aug 2019 Zhijun Mai, Guosheng Hu, Dexiong Chen, Fumin Shen, Heng Tao Shen

Since deep networks are capable of memorizing the entire dataset, the corrupted samples generated by vanilla MixUp with a badly chosen interpolation policy will degrade the performance of networks.

Data Augmentation Domain Adaptation +2
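
To make the "interpolation policy" in this entry concrete, the sketch below shows vanilla MixUp, where two samples and their labels are blended with a weight drawn from a Beta(alpha, alpha) distribution. It does not show MetaMixUp's learned policy; the function names and the list-based representation are assumptions for illustration.

```python
import random


def mixup(x1, y1, x2, y2, lam):
    """Blend two samples and their one-hot labels with interpolation weight lam."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


def sample_lambda(alpha=1.0, rng=random):
    """Draw the interpolation weight from Beta(alpha, alpha), as in vanilla MixUp."""
    return rng.betavariate(alpha, alpha)


if __name__ == "__main__":
    lam = sample_lambda(alpha=0.4, rng=random.Random(0))
    # Hypothetical two-feature samples with one-hot labels.
    print(mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam))
```

The choice of `alpha` (or, more generally, of how `lam` is picked per sample pair) is the policy in question: a poorly chosen one can produce blended samples that are hard to learn from.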

Cooperative Cross-Stream Network for Discriminative Action Representation

no code implementations 27 Aug 2019 Jingran Zhang, Fumin Shen, Xing Xu, Heng Tao Shen

It extracts this complementary information of different modality from a connection block, which aims at exploring correlations of different stream features.

Ranked #15 on Action Recognition on HMDB-51 (using extra training data)

Action Recognition Temporal Action Localization +1

Temporal Reasoning Graph for Activity Recognition

no code implementations 27 Aug 2019 Jingran Zhang, Fumin Shen, Xing Xu, Heng Tao Shen

In this paper, we propose an efficient temporal reasoning graph (TRG) to simultaneously capture the appearance features and temporal relation between video sequences at multiple time scales.

Action Recognition Relation +1

Template-based math word problem solvers with recursive neural networks

1 code implementation AAAI 2019 Lei Wang, Dongxiang Zhang, Jipeng Zhang, Xing Xu, Lianli Gao, Bing Tian Dai, Heng Tao Shen

Then, we design a recursive neural network to encode the quantity with Bi-LSTM and self attention, and infer the unknown operator nodes in a bottom-up manner.

Math

Deep Recurrent Quantization for Generating Sequential Binary Codes

1 code implementation 16 Jun 2019 Jingkuan Song, Xiaosu Zhu, Lianli Gao, Xin-Shun Xu, Wu Liu, Heng Tao Shen

To the end, when the model is trained, a sequence of binary codes can be generated and the code length can be easily controlled by adjusting the number of recurrent iterations.

Image Retrieval Quantization +1

Beyond Product Quantization: Deep Progressive Quantization for Image Retrieval

1 code implementation 16 Jun 2019 Lianli Gao, Xiaosu Zhu, Jingkuan Song, Zhou Zhao, Heng Tao Shen

In this work, we propose a deep progressive quantization (DPQ) model, as an alternative to PQ, for large scale image retrieval.

Image Retrieval Quantization +1

Exact Adversarial Attack to Image Captioning via Structured Output Learning with Latent Variables

1 code implementation CVPR 2019 Yan Xu, Baoyuan Wu, Fumin Shen, Yanbo Fan, Yong Zhang, Heng Tao Shen, Wei Liu

Due to the sequential dependencies among words in a caption, we formulate the generation of adversarial noises for targeted partial captions as a structured output learning problem with latent variables.

Adversarial Attack Image Captioning

Exploring Auxiliary Context: Discrete Semantic Transfer Hashing for Scalable Image Retrieval

no code implementations 25 Apr 2019 Lei Zhu, Zi Huang, Zhihui Li, Liang Xie, Heng Tao Shen

To address the problem, in this paper, we propose a novel hashing approach, dubbed Discrete Semantic Transfer Hashing (DSTH).

Content-Based Image Retrieval Retrieval

Hierarchical LSTMs with Adaptive Attention for Visual Captioning

no code implementations 26 Dec 2018 Jingkuan Song, Xiangpeng Li, Lianli Gao, Heng Tao Shen

Also, hierarchical LSTMs are designed to simultaneously consider both low-level visual information and high-level language context information to support the caption generation.

Caption Generation Image Captioning +2

Generative Domain-Migration Hashing for Sketch-to-Image Retrieval

1 code implementation ECCV 2018 Jingyi Zhang, Fumin Shen, Li Liu, Fan Zhu, Mengyang Yu, Ling Shao, Heng Tao Shen, Luc van Gool

The generative model learns a mapping that makes the distribution of sketches indistinguishable from the distribution of natural images using an adversarial loss, and simultaneously learns an inverse mapping based on the cycle consistency loss in order to enhance the indistinguishability.

Multi-Task Learning Retrieval +1

The Gap of Semantic Parsing: A Survey on Automatic Math Word Problem Solvers

no code implementations 22 Aug 2018 Dongxiang Zhang, Lei Wang, Luming Zhang, Bing Tian Dai, Heng Tao Shen

Solving mathematical word problems (MWPs) automatically is challenging, primarily due to the semantic gap between human-readable words and machine-understandable logics.

Math Semantic Parsing +1

Leveraging Weak Semantic Relevance for Complex Video Event Classification

no code implementations ICCV 2017 Chao Li, Jiewei Cao, Zi Huang, Lei Zhu, Heng Tao Shen

In this paper, we propose a novel approach to automatically maximize the utility of weak semantic annotations (formalized as the semantic relevance of video shots to the target event) to facilitate video event classification.

Classification General Classification

From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning

no code implementations 8 Aug 2017 Jingkuan Song, Yuyu Guo, Lianli Gao, Xuelong Li, Alan Hanjalic, Heng Tao Shen

In this paper, we propose a generative approach, referred to as multi-modal stochastic RNN networks (MS-RNN), which models the uncertainty observed in the data using latent stochastic variables.

Decoder Video Captioning

Multi-Attention Network for One Shot Learning

no code implementations CVPR 2017 Peng Wang, Lingqiao Liu, Chunhua Shen, Zi Huang, Anton Van Den Hengel, Heng Tao Shen

One-shot learning is a challenging problem where the aim is to recognize a class identified by a single training image.

One-Shot Learning TAG +1

Matrix Tri-Factorization With Manifold Regularizations for Zero-Shot Learning

no code implementations CVPR 2017 Xing Xu, Fumin Shen, Yang Yang, Dongxiang Zhang, Heng Tao Shen, Jingkuan Song

By additionally introducing manifold regularizations on visual data and semantic embeddings, the learned projection can effectively capture the geometrical manifold structure residing in both visual and semantic spaces.

Retrieval Transfer Learning +1

Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning

no code implementations 5 Jun 2017 Jingkuan Song, Zhao Guo, Lianli Gao, Wu Liu, Dongxiang Zhang, Heng Tao Shen

Specifically, the proposed framework utilizes the temporal attention for selecting specific frames to predict the related words, while the adjusted temporal attention is for deciding whether to depend on the visual information or the language context information.

Caption Generation Decoder +2

Deep Region Hashing for Efficient Large-scale Instance Search from Images

no code implementations 26 Jan 2017 Jingkuan Song, Tao He, Lianli Gao, Xing Xu, Heng Tao Shen

Specifically, DRH is an end-to-end deep neural network which consists of object proposal, feature extraction, and hash code generation.

Code Generation Image Retrieval +3

Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering

no code implementations 15 Dec 2016 Hao Liu, Yang Yang, Fumin Shen, Lixin Duan, Heng Tao Shen

Along with the prosperity of recurrent neural networks in modelling sequential data and the power of attention mechanisms in automatically identifying salient information, image captioning, a.k.a. image description, has been remarkably advanced in recent years.

Decoder Image Captioning +1

Binary Subspace Coding for Query-by-Image Video Retrieval

no code implementations 6 Dec 2016 Ruicong Xu, Yang Yang, Yadan Luo, Fumin Shen, Zi Huang, Heng Tao Shen

The first approach, termed Inner-product Binary Coding (IBC), preserves the inner relationships of images and videos in a common Hamming space.

Retrieval Video Retrieval

Exploiting Depth from Single Monocular Images for Object Detection and Semantic Segmentation

no code implementations 6 Oct 2016 Yuanzhouhan Cao, Chunhua Shen, Heng Tao Shen

Augmenting RGB data with measured depth has been shown to improve the performance of a range of tasks in computer vision including object detection and semantic segmentation.

Depth Estimation Object +4

Where to Focus: Query Adaptive Matching for Instance Retrieval Using Convolutional Feature Maps

no code implementations 22 Jun 2016 Jiewei Cao, Lingqiao Liu, Peng Wang, Zi Huang, Chunhua Shen, Heng Tao Shen

Instance retrieval requires one to search for images that contain a particular object within a large corpus.

Retrieval

Zero-Shot Hashing via Transferring Supervised Knowledge

no code implementations 16 Jun 2016 Yang Yang, Wei-Lun Chen, Yadan Luo, Fumin Shen, Jie Shao, Heng Tao Shen

Supervised knowledge (e.g., semantic labels or pair-wise relationships) associated with data is capable of significantly improving the quality of hash codes and hash functions.

Image Retrieval Retrieval +1

What's Wrong With That Object? Identifying Images of Unusual Objects by Modelling the Detection Score Distribution

no code implementations CVPR 2016 Peng Wang, Lingqiao Liu, Chunhua Shen, Zi Huang, Anton Van Den Hengel, Heng Tao Shen

The key observation motivating our approach is that "regular object" images, "unusual object" images and "other objects" images exhibit different region-level scores in terms of both the score values and the spatial distributions.

Gaussian Processes Object +2

A Survey on Learning to Hash

no code implementations 1 Jun 2016 Jingdong Wang, Ting Zhang, Jingkuan Song, Nicu Sebe, Heng Tao Shen

In this paper, we present a comprehensive survey of learning to hash algorithms, categorize them by how they preserve similarities (pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, and quantization), and discuss their relations.

Quantization Survey

Structured Learning of Binary Codes with Column Generation

no code implementations 22 Feb 2016 Guosheng Lin, Fayao Liu, Chunhua Shen, Jianxin Wu, Heng Tao Shen

Our column generation based method can be further generalized from the triplet loss to a general structured learning based framework that allows one to directly optimize multivariate performance measures.

Image Retrieval Information Retrieval +2

Hi Detector, What's Wrong with that Object? Identifying Irregular Object From Images by Modelling the Detection Score Distribution

no code implementations 14 Feb 2016 Peng Wang, Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel, Heng Tao Shen

To address this problem, we propose a novel approach by inspecting the distribution of the detection scores at multiple image regions based on the detector trained from the "regular object" and "other objects".

Gaussian Processes Object

Order-aware Convolutional Pooling for Video Based Action Recognition

no code implementations 31 Jan 2016 Peng Wang, Lingqiao Liu, Chunhua Shen, Heng Tao Shen

Most video based action recognition approaches create the video-level representation by temporally pooling the features extracted at each frame.

Action Recognition Temporal Action Localization

Compositional Model based Fisher Vector Coding for Image Classification

1 code implementation 16 Jan 2016 Lingqiao Liu, Peng Wang, Chunhua Shen, Lei Wang, Anton Van Den Hengel, Chao Wang, Heng Tao Shen

To handle this limitation, in this paper we break the convention which assumes that a local feature is drawn from one of a few Gaussian distributions.

Classification General Classification +1

Learning Binary Codes for Maximum Inner Product Search

no code implementations ICCV 2015 Fumin Shen, Wei Liu, Shaoting Zhang, Yang Yang, Heng Tao Shen

Inspired by the latest advance in asymmetric hashing schemes, we propose an asymmetric binary code learning framework based on inner product fitting.

Optimal Graph Learning With Partial Tags and Multiple Features for Image and Video Annotation

no code implementations CVPR 2015 Lianli Gao, Jingkuan Song, Feiping Nie, Yan Yan, Nicu Sebe, Heng Tao Shen

In multimedia annotation, due to the time constraints and the tediousness of manual tagging, it is quite common to utilize both tagged and untagged data to improve the performance of supervised learning when only limited tagged training data are available.

graph construction Graph Learning

Hashing on Nonlinear Manifolds

no code implementations 2 Dec 2014 Fumin Shen, Chunhua Shen, Qinfeng Shi, Anton Van Den Hengel, Zhenmin Tang, Heng Tao Shen

In addition, a supervised inductive manifold hashing framework is developed by incorporating the label information, which is shown to greatly advance the semantic retrieval performance.

Image Classification Quantization +2

Hashing for Similarity Search: A Survey

no code implementations 13 Aug 2014 Jingdong Wang, Heng Tao Shen, Jingkuan Song, Jianqiu Ji

Similarity search (nearest neighbor search) is a problem of pursuing the data items whose distances to a query item are the smallest from a large database.

Survey

Face Identification with Second-Order Pooling

no code implementations 26 Jun 2014 Fumin Shen, Chunhua Shen, Heng Tao Shen

Spatial pyramid pooling of features encoded by an over-complete dictionary has been the key component of many state-of-the-art image classification systems.

Face Identification Face Recognition +4

Face Image Classification by Pooling Raw Features

no code implementations 26 Jun 2014 Fumin Shen, Chunhua Shen, Heng Tao Shen

We propose a very simple, efficient yet surprisingly effective feature extraction method for face recognition (about 20 lines of Matlab code), which is mainly inspired by spatial pyramid pooling in generic image classification.

Classification Face Recognition +2

Optimized Cartesian $K$-Means

no code implementations 16 May 2014 Jianfeng Wang, Jingdong Wang, Jingkuan Song, Xin-Shun Xu, Heng Tao Shen, Shipeng Li

In OCKM, multiple sub codewords are used to encode the subvector of a data point in a subspace.

Quantization
