Search Results for author: Heng Tao Shen

Found 101 papers, 43 papers with code

Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning

no code implementations • 15 Mar 2024 • Meixuan Li, Tianyu Li, Guoqing Wang, Peng Wang, Yang Yang, Heng Tao Shen

Aligning these distributions between corresponding regions from different tasks imparts higher flexibility and capacity to capture intra-region structures, accommodating a broader range of tasks.

Depth Estimation, Semantic Segmentation

ReCo-Diff: Explore Retinex-Based Condition Strategy in Diffusion Model for Low-Light Image Enhancement

no code implementations • 20 Dec 2023 • Yuhui Wu, Guoqing Wang, Zhiwen Wang, Yang Yang, Tianyu Li, Peng Wang, Chongyi Li, Heng Tao Shen

Low-light image enhancement (LLIE) has achieved promising performance by employing conditional diffusion models.

Low-Light Image Enhancement

ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval

1 code implementation • 19 Dec 2023 • Kaipeng Fang, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Zhi-Qi Cheng, Xiyao Li, Heng Tao Shen

Then, in the Context-aware Simulator Learning stage, we train a Content-aware Prompt Simulator under simulated test scenarios to produce the corresponding CaDP.

Few-Shot Learning, Retrieval

Make-A-Storyboard: A General Framework for Storyboard with Disentangled and Merged Control

no code implementations • 6 Dec 2023 • Sitong Su, Litao Guo, Lianli Gao, Heng Tao Shen, Jingkuan Song

Story Visualization aims to generate images aligned with story prompts, reflecting the coherence of storybooks through visual consistency among characters and scenes. However, current approaches concentrate exclusively on characters and neglect the visual consistency among contextually correlated scenes, resulting in independent character images without inter-image coherence. To tackle this issue, we propose a new presentation form for Story Visualization called Storyboard, inspired by film-making, as illustrated in Fig. 1. Specifically, a Storyboard unfolds a story into visual representations scene by scene.

Story Visualization

Towards Redundancy-Free Sub-networks in Continual Learning

1 code implementation • 1 Dec 2023 • Cheng Chen, Jingkuan Song, Lianli Gao, Heng Tao Shen

Catastrophic Forgetting (CF) is a prominent issue in continual learning.

Ranked #1 on Continual Learning on CIFAR-100 ResNet-18 - 300 Epochs

Continual Learning

Continual Referring Expression Comprehension via Dual Modular Memorization

1 code implementation • 25 Nov 2023 • Heng Tao Shen, Cheng Chen, Peng Wang, Lianli Gao, Meng Wang, Jingkuan Song

In this paper, we propose Continual Referring Expression Comprehension (CREC), a new setting for REC in which a model learns from a stream of incoming tasks.

Memorization, Referring Expression

Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval

1 code implementation • NeurIPS 2023 • Hao Li, Jingkuan Song, Lianli Gao, Xiaosu Zhu, Heng Tao Shen

In this paper, we propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework that provides trustworthy predictions by quantifying the uncertainty arising from inherent data ambiguity.

Ranked #16 on Video Retrieval on MSVD

Image-text matching, Image-to-Text Retrieval

DePT: Decoupled Prompt Tuning

1 code implementation • 14 Sep 2023 • Ji Zhang, Shihan Wu, Lianli Gao, Heng Tao Shen, Jingkuan Song

Specifically, through an in-depth analysis of the learned features of the base and new tasks, we observe that the base-new trade-off (BNT) stems from a channel bias issue, i.e., the vast majority of feature channels are occupied by base-specific knowledge, resulting in the collapse of task-shared knowledge important to new tasks.

Zero-shot Generalization

MSFlow: Multi-Scale Flow-based Framework for Unsupervised Anomaly Detection

1 code implementation • 29 Aug 2023 • Yixuan Zhou, Xing Xu, Jingkuan Song, Fumin Shen, Heng Tao Shen

Unsupervised anomaly detection (UAD), in which only anomaly-free samples are available for training, attracts considerable research interest and drives widespread applications.

Ranked #5 on Anomaly Detection on MVTec AD

Unsupervised Anomaly Detection

Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions

1 code implementation • 28 Aug 2023 • Fengling Li, Lei Zhu, Tianshi Wang, Jingjing Li, Zheng Zhang, Heng Tao Shen

With the exponential surge in diverse multi-modal data, traditional uni-modal retrieval methods struggle to meet the needs of users demanding access to data from various modalities.

Cross-Modal Retrieval, Retrieval

Informative Scene Graph Generation via Debiasing

no code implementations • 10 Aug 2023 • Lianli Gao, Xinyu Lyu, Yuyu Guo, Yuxuan Hu, Yuan-Fang Li, Lu Xu, Heng Tao Shen, Jingkuan Song

It integrates two components to address these imbalances: Semantic Debiasing (SD) and Balanced Predicate Learning (BPL).

Blocking, Graph Generation

Generalized Unbiased Scene Graph Generation

no code implementations • 9 Aug 2023 • Xinyu Lyu, Lianli Gao, Junlin Xie, Pengpeng Zeng, Yulu Tian, Jie Shao, Heng Tao Shen

To this end, we propose the Multi-Concept Learning (MCL) framework, which ensures a balanced learning process across rare, uncommon, and common concepts.

Graph Generation, Unbiased Scene Graph Generation

Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination

1 code implementation • 8 Aug 2023 • Haoxuan Li, Yi Bin, Junrong Liao, Yang Yang, Heng Tao Shen

Most existing image-text matching methods adopt the triplet loss as the optimization objective, and choosing a proper negative sample for the <anchor, positive, negative> triplet is important for effective training; e.g., hard negatives make the model learn efficiently and effectively.

Image-text matching, Representation Learning
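
For readers unfamiliar with the objective this entry refers to, here is a minimal sketch of a bidirectional triplet ranking loss that uses the hardest in-batch negative. It assumes a precomputed image-caption similarity matrix whose diagonal holds the positive pairs. The function name and the margin of 0.2 are illustrative choices. This is not the false-negative elimination method the paper proposes. Note that the hardest negative picked here could itself be a false negative, which is the problem the paper's title points to.

```python
def triplet_loss_hardest(sim, margin=0.2):
    """Bidirectional triplet ranking loss with the hardest in-batch negative.

    sim[i][j] is the similarity between image i and caption j; the diagonal
    holds the positive pairs. Requires a batch of at least two pairs.
    Returns the loss averaged over the batch.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        # Hardest negative caption for image i: highest off-diagonal score in row i.
        hard_caption = max(sim[i][j] for j in range(n) if j != i)
        # Hardest negative image for caption i: highest off-diagonal score in column i.
        hard_image = max(sim[j][i] for j in range(n) if j != i)
        # Hinge terms: the positive should beat each hardest negative by `margin`.
        total += max(0.0, margin - pos + hard_caption)
        total += max(0.0, margin - pos + hard_image)
    return total / n
```

For example, with sim = [[0.5, 0.6], [0.2, 0.9]], image 0 scores caption 1 above its own caption, so the loss is positive (about 0.15).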

Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval

1 code implementation • 8 Aug 2023 • Yi Bin, Haoxuan Li, Yahui Xu, Xing Xu, Yang Yang, Heng Tao Shen

Specifically, on two key tasks, i.e., image-to-text and text-to-image retrieval, HAT achieves 7.6% and 16.7% relative score improvement of Recall@1 on MSCOCO, and 4.4% and 11.6% on Flickr30k, respectively.

Cross-Modal Retrieval, Image Retrieval
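
Recall@1 and the relative improvement quoted in this entry can be computed as in the following sketch. It makes two assumptions:
- Each query has one ground-truth candidate (candidate i for query i). This simplifies the real MSCOCO and Flickr30k protocols, where each image has five captions.
- "Relative improvement" means (new - old) / old.

```python
def recall_at_k(sim, k=1):
    """Percentage of queries whose ground-truth candidate ranks in the top k.

    sim[q][c] is the score of candidate c for query q; candidate q is assumed
    to be the only correct match for query q.
    """
    hits = 0
    for q, row in enumerate(sim):
        # Candidate indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if q in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(sim)


def relative_improvement(new, old):
    """Relative gain in percent; e.g. going from 50.0 to 55.0 is a 10% relative gain."""
    return 100.0 * (new - old) / old
```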

Part-Aware Transformer for Generalizable Person Re-identification

1 code implementation • ICCV 2023 • Hao Ni, Yuke Li, Lianli Gao, Heng Tao Shen, Jingkuan Song

Based on the local similarity obtained in CSL, a Part-guided Self-Distillation (PSD) is proposed to further improve the generalization of global features.

Domain Generalization, Generalizable Person Re-identification

Feature Noise Boosts DNN Generalization under Label Noise

1 code implementation • 3 Aug 2023 • Lu Zeng, Xuan Chen, Xiaoshuang Shi, Heng Tao Shen

In this study, we introduce a simple feature noise method, which directly adds noise to the features of training data, and theoretically demonstrate that it can enhance the generalization of DNNs under label noise.
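
The core idea in this snippet, perturbing training features rather than labels, can be sketched in a few lines. Three details are assumptions for illustration:
- the noise is zero-mean Gaussian;
- the `sigma` value is hand-set;
- noise is applied independently to every feature element.

The paper's exact noise distribution and injection point are not stated here.

```python
import random


def add_feature_noise(features, sigma=0.1, rng=None):
    """Return a noisy copy of a batch of feature vectors (lists of floats).

    Assumption: zero-mean Gaussian noise N(0, sigma^2), drawn independently
    for every element; the input batch itself is left unchanged.
    """
    rng = rng or random.Random()
    return [[x + rng.gauss(0.0, sigma) for x in vec] for vec in features]


# Usage in a hypothetical training step: perturb the batch before the forward pass.
batch = [[1.0, 2.0], [3.0, 4.0]]
noisy_batch = add_feature_noise(batch, sigma=0.1, rng=random.Random(0))
```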

A Universal Unbiased Method for Classification from Aggregate Observations

no code implementations • 20 Jun 2023 • Zixi Wei, Lei Feng, Bo Han, Tongliang Liu, Gang Niu, Xiaofeng Zhu, Heng Tao Shen

This motivates the study on classification from aggregate observations (CFAO), where the supervision is provided to groups of instances, instead of individual instances.

Classification, Multiple Instance Learning

Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models

1 code implementation • 5 Jun 2023 • Jiabang He, Yi Hu, Lei Wang, Xing Xu, Ning Liu, Hui Liu, Heng Tao Shen

Experimental results demonstrate a significant performance gap between the in-distribution (ID) and out-of-distribution (OOD) settings for document images, and show that fine-grained analysis of distribution shifts can reveal the brittle nature of existing pre-trained VDU models and OOD generalization algorithms.

document understanding, Question Answering

Caterpillar: A Pure-MLP Architecture with Shifted-Pillars-Concatenation

no code implementations • 28 May 2023 • Jin Sun, Xiaoshuang Shi, Zhiyuan Wang, Kaidi Xu, Heng Tao Shen, Xiaofeng Zhu

Then, we build a pure-MLP architecture called Caterpillar by replacing the convolutional layer with the SPC module in a hybrid model of sMLPNet.

Computational Efficiency, Inductive Bias

Faster Video Moment Retrieval with Point-Level Supervision

no code implementations • 23 May 2023 • Xun Jiang, Zailei Zhou, Xing Xu, Yang Yang, Guoqing Wang, Heng Tao Shen

Existing video moment retrieval (VMR) methods suffer from two defects: (1) massive, expensive temporal annotations are required to obtain satisfactory performance; (2) complicated cross-modal interaction modules are deployed, which lead to high computational cost and low retrieval efficiency.

Moment Retrieval, Natural Language Queries

Non-Autoregressive Math Word Problem Solver with Unified Tree Structure

1 code implementation • 8 May 2023 • Yi Bin, Mengqun Han, Wenhao Shi, Lei Wang, Yang Yang, See-Kiong Ng, Heng Tao Shen

To evaluate possible expression variants, we design a path-based metric that measures the partial accuracy of expressions in a unified tree.

Math, valid

Instance-Variant Loss with Gaussian RBF Kernel for 3D Cross-modal Retrieval

no code implementations • 7 May 2023 • Zhitao Liu, Zengyu Liu, Jiwei Wei, Guan Wang, Zhenjiang Du, Ning Xie, Heng Tao Shen

Hence, the performance of cross-modal retrieval methods heavily depends on the representational capacity of this embedding space.

Cross-Modal Retrieval, Retrieval

T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Mixed Large Language Model Signals for Science Question Answering

1 code implementation • 5 May 2023 • Lei Wang, Yi Hu, Jiabang He, Xing Xu, Ning Liu, Hui Liu, Heng Tao Shen

To address these issues, we propose a novel method termed T-SciQ that aims at teaching science question answering with LLM signals.

Language Modelling, Large Language Model

Learning Semantic-Aware Knowledge Guidance for Low-Light Image Enhancement

1 code implementation • CVPR 2023 • Yuhui Wu, Chen Pan, Guoqing Wang, Yang Yang, Jiwei Wei, Chongyi Li, Heng Tao Shen

To address this issue, we propose a novel semantic-aware knowledge-guided framework (SKF) that can assist a low-light enhancement model in learning rich and diverse priors encapsulated in a semantic segmentation model.

Ranked #3 on Low-Light Image Enhancement on LOLv2

Low-Light Image Enhancement, Semantic Segmentation

ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding

1 code implementation • 23 Mar 2023 • Ziyang Lu, Yunqiang Pei, Guoqing Wang, Yang Yang, Zheng Wang, Heng Tao Shen

Despite their effectiveness, existing methods suffer from low recognition accuracy when multiple adjacent objects have similar appearances. To address this issue, this work introduces human-robot interaction as a cue to facilitate the development of 3D visual grounding.

Visual Grounding

A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?

no code implementations • 21 Mar 2023 • Chaoning Zhang, Chenshuang Zhang, Sheng Zheng, Yu Qiao, Chenghao Li, Mengchun Zhang, Sumit Kumar Dam, Chu Myaet Thwal, Ye Lin Tun, Le Luang Huy, Donguk Kim, Sung-Ho Bae, Lik-Hang Lee, Yang Yang, Heng Tao Shen, In So Kweon, Choong Seon Hong

As ChatGPT goes viral, generative AI (AIGC, a.k.a. AI-generated content) has made headlines everywhere because of its ability to analyze and create text, images, and beyond.

Language Modelling

ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction

1 code implementation • ICCV 2023 • Jiabang He, Lei Wang, Yi Hu, Ning Liu, Hui Liu, Xing Xu, Heng Tao Shen

To this end, we propose a simple but effective in-context learning framework called ICL-D3IE, which enables LLMs to perform DIE with different types of demonstration examples.

Document AI, In-Context Learning

Imbalanced Open Set Domain Adaptation via Moving-threshold Estimation and Gradual Alignment

1 code implementation • 8 Mar 2023 • Jinghan Ru, Jun Tian, Zhekai Du, Chengwei Xiao, Jingjing Li, Heng Tao Shen

To alleviate the negative effects caused by label shift in OSDA, we propose Open-set Moving-threshold Estimation and Gradual Alignment (OMEGA), a novel architecture that improves existing OSDA methods on class-imbalanced data.

Transfer Learning, Unsupervised Domain Adaptation

A Comprehensive Survey on Source-free Domain Adaptation

no code implementations • 23 Feb 2023 • Zhiqi Yu, Jingjing Li, Zhekai Du, Lei Zhu, Heng Tao Shen

Over the past decade, domain adaptation has become a widely studied branch of transfer learning that aims to improve performance on target domains by leveraging knowledge from the source domain.

Source-Free Domain Adaptation, Transfer Learning

Multilateral Semantic Relations Modeling for Image Text Retrieval

no code implementations • CVPR 2023 • Zheng Wang, Zhenwei Gao, Kangshuai Guo, Yang Yang, Xiaoming Wang, Heng Tao Shen

Specifically, a given query is first mapped to a probabilistic embedding to learn its true semantic distribution based on Mahalanobis distance.

Retrieval, Text Retrieval
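
As background for the distance named in this entry, the sketch below computes the Mahalanobis distance from a point to a Gaussian embedding. It assumes a diagonal covariance (a per-dimension variance vector), a common simplification for probabilistic embeddings. The paper may use a full covariance, and the function name is illustrative.

```python
import math


def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance from point x to a Gaussian N(mean, diag(var)).

    With a diagonal covariance this reduces to a Euclidean distance in which
    each dimension is scaled by its variance; all variances must be positive.
    """
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))
```

For example, mahalanobis_diag([3.0, 4.0], [0.0, 0.0], [9.0, 16.0]) gives sqrt(2), whereas unit variances give 5.0: the same offset counts for less along high-variance dimensions.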

Multivariate, Multi-Frequency and Multimodal: Rethinking Graph Neural Networks for Emotion Recognition in Conversation

1 code implementation • CVPR 2023 • Feiyu Chen, Jie Shao, Shuyuan Zhu, Heng Tao Shen

Yet, previous works tend to encode multimodal and contextual relationships in a loosely-coupled manner, which may harm relationship modelling.

Emotion Recognition in Conversation

Semantic Enhanced Knowledge Graph for Large-Scale Zero-Shot Learning

no code implementations • 26 Dec 2022 • Jiwei Wei, Yang Yang, Zeyu Ma, Jingjing Li, Xing Xu, Heng Tao Shen

In this paper, we provide a new semantic enhanced knowledge graph that contains both expert knowledge and semantic correlations between categories.

Zero-Shot Learning

Visual Commonsense-aware Representation Network for Video Captioning

1 code implementation • 17 Nov 2022 • Pengpeng Zeng, Haonan Zhang, Lianli Gao, Xiangpeng Li, Jin Qian, Heng Tao Shen

Generating consecutive descriptions for videos, i.e., Video Captioning, requires taking full advantage of visual representation along with the generation process.

Caption Generation, Question Answering

A Lower Bound of Hash Codes' Performance

1 code implementation • 12 Oct 2022 • Xiaosu Zhu, Jingkuan Song, Yu Lei, Lianli Gao, Heng Tao Shen

By testing on a series of hash models, we obtain performance improvements for all of them, with up to a 26.5% increase in mean Average Precision and up to a 20.5% increase in accuracy.

Metric Learning, Representation Learning
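
Mean Average Precision for hash codes is typically computed by ranking database items by Hamming distance to each query. The following is a minimal sketch of that standard metric. It assumes binary codes stored as 0/1 lists and a precomputed relevance list per query. Ties in Hamming distance are broken by database order here, which real evaluation code may handle differently. The sketch does not implement the paper's lower bound.

```python
def hamming(a, b):
    """Number of positions where two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def average_precision(query, db_codes, relevant):
    """AP of the Hamming ranking for one query.

    relevant[i] is True if database item i shares the query's label.
    """
    # sorted() is stable, so equal distances keep database order.
    order = sorted(range(len(db_codes)), key=lambda i: hamming(query, db_codes[i]))
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if relevant[i]:
            hits += 1
            precisions.append(hits / rank)  # precision at each relevant position
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(queries, db_codes, relevance):
    """Mean of the per-query APs; relevance[q] is the relevance list for query q."""
    return sum(average_precision(q, db_codes, rel)
               for q, rel in zip(queries, relevance)) / len(queries)
```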

Dual-branch Hybrid Learning Network for Unbiased Scene Graph Generation

1 code implementation • 16 Jul 2022 • Chaofan Zheng, Lianli Gao, Xinyu Lyu, Pengpeng Zeng, Abdulmotaleb El Saddik, Heng Tao Shen

Experiments show that our approach achieves new state-of-the-art performance on the VG and GQA datasets and strikes a trade-off between the performance of tail predicates and head predicates.

Graph Generation, Image Captioning

Adaptive Fine-Grained Predicates Learning for Scene Graph Generation

no code implementations • 11 Jul 2022 • Xinyu Lyu, Lianli Gao, Pengpeng Zeng, Heng Tao Shen, Jingkuan Song

The performance of current Scene Graph Generation (SGG) models is severely hampered by hard-to-distinguish predicates, e.g., woman-on/standing on/walking on-beach.

Fine-Grained Image Classification, Graph Generation

KE-RCNN: Unifying Knowledge based Reasoning into Part-level Attribute Parsing

1 code implementation • 21 Jun 2022 • Xuanhan Wang, Jingkuan Song, Xiaojia Chen, Lechao Cheng, Lianli Gao, Heng Tao Shen

In this article, we propose a Knowledge Embedded RCNN (KE-RCNN) to identify attributes by leveraging rich knowledge, including implicit knowledge (e.g., the attribute "above-the-hip" for a shirt requires visual/geometric relations of shirt-hip) and explicit knowledge (e.g., the part "shorts" cannot have the attribute "hoodie" or "lining").

Attribute

From Pixels to Objects: Cubic Visual Attention for Visual Question Answering

no code implementations • 4 Jun 2022 • Jingkuan Song, Pengpeng Zeng, Lianli Gao, Heng Tao Shen

Existing visual attention models are generally planar, i.e., different channels of the last conv-layer feature map of an image share the same weight.

Object, Question Answering

Structured Two-stream Attention Network for Video Question Answering

no code implementations • 2 Jun 2022 • Lianli Gao, Pengpeng Zeng, Jingkuan Song, Yuan-Fang Li, Wu Liu, Tao Mei, Heng Tao Shen

To date, visual question answering (VQA) (i.e., image QA and video QA) is still a holy grail in vision and language understanding, especially for video QA.

Question Answering, Video Question Answering

Thunder: Thumbnail based Fast Lightweight Image Denoising Network

no code implementations • 24 May 2022 • Yifeng Zhou, Xing Xu, Shuaicheng Liu, Guoqing Wang, Huimin Lu, Heng Tao Shen

To achieve promising results in removing noise from real-world images, most existing denoising networks are built with complex network structures, making them impractical for deployment.

Image Denoising, SSIM

Support-set based Multi-modal Representation Enhancement for Video Captioning

1 code implementation • 19 May 2022 • Xiaoya Chen, Jingkuan Song, Pengpeng Zeng, Lianli Gao, Heng Tao Shen

Video captioning is a challenging task that necessitates a thorough comprehension of visual scenes.

Video Captioning

Fine-Grained Predicates Learning for Scene Graph Generation

1 code implementation • CVPR 2022 • Xinyu Lyu, Lianli Gao, Yuyu Guo, Zhou Zhao, Hao Huang, Heng Tao Shen, Jingkuan Song

The performance of current Scene Graph Generation models is severely hampered by some hard-to-distinguish predicates, e.g., "woman-on/standing on/walking on-beach" or "woman-near/looking at/in front of-child".

Fine-Grained Image Classification, Graph Generation

Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression

1 code implementation • CVPR 2022 • Xiaosu Zhu, Jingkuan Song, Lianli Gao, Feng Zheng, Heng Tao Shen

Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression.

Image Compression, Quantization
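
To make "modeling latent variables with priors" concrete, the sketch estimates the bits needed to code one quantized latent y_hat under a discretized Gaussian mixture. The probability of integer y_hat is the mixture mass on [y_hat - 0.5, y_hat + 0.5]. This is a univariate simplification with hand-set mixture parameters; in practice they would be predicted by a hyperprior network. It is not the paper's unified multivariate model.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def gmm_bits(y_hat, weights, means, scales):
    """Estimated code length (bits) of integer y_hat under a discretized 1-D GMM."""
    # Probability mass of the quantization bin around y_hat, summed over components.
    p = sum(w * (normal_cdf(y_hat + 0.5, m, s) - normal_cdf(y_hat - 0.5, m, s))
            for w, m, s in zip(weights, means, scales))
    p = max(p, 1e-9)  # floor to avoid log(0) for extremely unlikely symbols
    return -math.log2(p)
```

A latent near a mixture mean costs few bits; one far from every component costs many. This is why a better-fitting prior lowers the bitrate.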

Relation Regularized Scene Graph Generation

no code implementations • 22 Feb 2022 • Yuyu Guo, Lianli Gao, Jingkuan Song, Peng Wang, Nicu Sebe, Heng Tao Shen, Xuelong Li

Inspired by this observation, in this article, we propose a relation regularized network (R2-Net), which predicts whether there is a relationship between two objects and encodes this relation into object feature refinement for better SGG.

Graph Classification, Graph Generation

One-shot Scene Graph Generation

1 code implementation • 22 Feb 2022 • Yuyu Guo, Jingkuan Song, Lianli Gao, Heng Tao Shen

Specifically, the Relational Knowledge represents the prior knowledge of relationships between entities extracted from the visual content, e.g., the visual relationships "standing in", "sitting in", and "lying in" may exist between "dog" and "yard", while the Commonsense Knowledge encodes "sense-making" knowledge like "dog can guard yard".

Graph Generation, Scene Graph Generation

Semi-Supervised Video Paragraph Grounding With Contrastive Encoder

no code implementations • CVPR 2022 • Xun Jiang, Xing Xu, Jingran Zhang, Fumin Shen, Zuo Cao, Heng Tao Shen

Video event grounding aims to retrieve the most relevant moments from an untrimmed video given a natural language query.

Sentence, Video Grounding

Meta Distribution Alignment for Generalizable Person Re-Identification

1 code implementation • CVPR 2022 • Hao Ni, Jingkuan Song, Xiaopeng Luo, Feng Zheng, Wen Li, Heng Tao Shen

Domain generalizable (DG) person ReID is a challenging task that trains a model on source domains yet requires it to generalize well to target domains.

Domain Generalization, Generalizable Person Re-identification

Fast Gradient Non-sign Methods

1 code implementation • 25 Oct 2021 • Yaya Cheng, Jingkuan Song, Xiaosu Zhu, Qilong Zhang, Lianli Gao, Heng Tao Shen

Based on the linearity hypothesis, under the $\ell_\infty$ constraint, the $\mathrm{sign}$ operation applied to the gradients is a good choice for generating perturbations.
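
The sign-gradient step this snippet describes is the FGSM-style baseline that the paper revisits. The sketch below applies it to a logistic-regression model, where the input gradient of the cross-entropy loss has the closed form (sigmoid(w.x + b) - y) * w. The model, `eps`, and function names are assumptions for illustration. The paper's non-sign variant is not shown.

```python
import math


def sign(v):
    """Return 1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm_linear(x, y, w, b, eps):
    """One sign-gradient step on input x for a logistic-regression model.

    The loss is binary cross-entropy with label y in {0, 1}; its gradient
    with respect to x is (sigmoid(w.x + b) - y) * w.
    """
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    grad = [(p - y) * wi for wi in w]
    # Every coordinate with a nonzero gradient moves by exactly eps, so the
    # perturbation lies on the boundary of the l_inf ball of radius eps.
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]
```

Taking only the sign discards the gradient magnitudes, so the step direction can deviate from the true gradient. That deviation is what motivates non-sign alternatives.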

Salience-Guided Iterative Asymmetric Mutual Hashing for Fast Person Re-identification

2 code implementations • IEEE Transactions on Image Processing 2021 • Cairong Zhao, Yuanpeng Tu, Zhihui Lai, Fumin Shen, Heng Tao Shen, Duoqian Miao

Moreover, a novel iterative asymmetric mutual training strategy (IAMT) is proposed to alleviate the drawbacks of common mutual learning; it continuously refines the discriminative regions for SSB and also extracts regularized dark knowledge for both models.

Code Generation, Person Re-Identification

From General to Specific: Informative Scene Graph Generation via Balance Adjustment

1 code implementation • ICCV 2021 • Yuyu Guo, Lianli Gao, Xuanhan Wang, Yuxuan Hu, Xing Xu, Xu Lu, Heng Tao Shen, Jingkuan Song

The scene graph generation (SGG) task aims to detect visual relationship triplets, i.e., <subject, predicate, object>, in an image, providing a structural vision layout for scene understanding.

Blocking, Graph Generation

Adversarial Energy Disaggregation for Non-intrusive Load Monitoring

no code implementations • 2 Aug 2021 • Zhekai Du, Jingjing Li, Lei Zhu, Ke Lu, Heng Tao Shen

Energy disaggregation, also known as non-intrusive load monitoring (NILM), addresses the problem of separating whole-home electricity usage into individual appliance-specific consumption, and is a typical application of data analysis.

Non-Intrusive Load Monitoring

Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos

no code implementations • CVPR 2021 • Mingxing Zhang, Yang Yang, Xinghan Chen, Yanli Ji, Xing Xu, Jingjing Li, Heng Tao Shen

Then for a moment candidate, we concatenate the starting/middle/ending representations of its starting/middle/ending elements respectively to form the final moment representation.

Sentence

Staircase Sign Method for Boosting Adversarial Attacks

2 code implementations • 20 Apr 2021 • Qilong Zhang, Xiaosu Zhu, Jingkuan Song, Lianli Gao, Heng Tao Shen

Crafting adversarial examples for the transfer-based attack is challenging and remains a research hot spot.

Adversarial Attack

Patch-wise++ Perturbation for Adversarial Targeted Attacks

1 code implementation • 31 Dec 2020 • Lianli Gao, Qilong Zhang, Jingkuan Song, Heng Tao Shen

Specifically, we introduce an amplification factor to the step size in each iteration, and a pixel's overall gradient overflowing the $\epsilon$-constraint is assigned to its surrounding regions by a project kernel.

Adversarial Attack

Dual ResGCN for Balanced Scene Graph Generation

no code implementations • 9 Nov 2020 • Jingyi Zhang, Yong Zhang, Baoyuan Wu, Yanbo Fan, Fumin Shen, Heng Tao Shen

We propose to incorporate the prior about the co-occurrence of relation pairs into the graph to further help alleviate the class imbalance issue.

Graph Generation, Relation

Universal Weighting Metric Learning for Cross-Modal Matching

1 code implementation • CVPR 2020 • Jiwei Wei, Xing Xu, Yang Yang, Yanli Ji, Zheng Wang, Heng Tao Shen

Furthermore, we introduce a new polynomial loss under the universal weighting framework, which defines a weight function for the positive and negative informative pairs respectively.

Image-text matching, Metric Learning

Patch-wise Attack for Fooling Deep Neural Network

4 code implementations • ECCV 2020 • Lianli Gao, Qilong Zhang, Jingkuan Song, Xianglong Liu, Heng Tao Shen

By adding human-imperceptible noise to clean images, the resultant adversarial examples can fool other unknown models.

Adversarial Attack, Image Classification

Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships

no code implementations • 29 Apr 2020 • Yunlian Lv, Ning Xie, Yimin Shi, Zijiao Wang, Heng Tao Shen

On the other hand, the TSE module is used to generate sub-targets that allow the agent to learn from failures.

Visual Navigation

Cooperative Cross-Stream Network for Discriminative Action Representation

no code implementations • 27 Aug 2019 • Jingran Zhang, Fumin Shen, Xing Xu, Heng Tao Shen

It extracts the complementary information of different modalities through a connection block, which explores correlations among the features of different streams.

Ranked #15 on Action Recognition on HMDB-51 (using extra training data)

Action Recognition, Temporal Action Localization

Temporal Reasoning Graph for Activity Recognition

no code implementations • 27 Aug 2019 • Jingran Zhang, Fumin Shen, Xing Xu, Heng Tao Shen

In this paper, we propose an efficient temporal reasoning graph (TRG) to simultaneously capture the appearance features and temporal relation between video sequences at multiple time scales.

Ranked #53 on Action Recognition on Something-Something V1

Action Recognition, Relation

MetaMixUp: Learning Adaptive Interpolation Policy of MixUp with Meta-Learning

no code implementations • 27 Aug 2019 • Zhijun Mai, Guosheng Hu, Dexiong Chen, Fumin Shen, Heng Tao Shen

Since deep networks are capable of memorizing the entire dataset, the corrupted samples generated by vanilla MixUp with a badly chosen interpolation policy will degrade the performance of networks.

Data Augmentation, Domain Adaptation
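
For reference, vanilla MixUp builds a virtual training sample as a convex combination of two inputs and their labels. The mixing coefficient lambda is usually drawn from Beta(alpha, alpha) with a fixed alpha. A minimal sketch follows; `alpha` and the helper names are illustrative. MetaMixUp's meta-learned interpolation policy is not implemented here.

```python
import random


def sample_lambda(alpha, rng=None):
    """Vanilla MixUp coefficient: lambda ~ Beta(alpha, alpha) with a fixed alpha.

    MetaMixUp would instead learn this interpolation policy (not shown).
    """
    rng = rng or random.Random()
    return rng.betavariate(alpha, alpha)


def mixup(x1, y1, x2, y2, lam):
    """Convex combination of two inputs and their one-hot labels with weight lam."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y
```

The snippet's point is that a badly chosen lambda distribution can produce misleading mixed samples. That is the motivation for learning it rather than fixing `alpha` by hand.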

Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking

2 code implementations • 12 Aug 2019 • Tan Wang, Xing Xu, Yang Yang, Alan Hanjalic, Heng Tao Shen, Jingkuan Song

We propose a novel framework that achieves remarkable matching performance with acceptable model complexity.

Binary Classification, General Classification

Template-based math word problem solvers with recursive neural networks

1 code implementation • AAAI 2019 • Lei Wang, Dongxiang Zhang, Jipeng Zhang, Xing Xu, Lianli Gao, Bing Tian Dai, Heng Tao Shen

Then, we design a recursive neural network to encode the quantity with Bi-LSTM and self-attention, and infer the unknown operator nodes in a bottom-up manner.

Math

Beyond Product Quantization: Deep Progressive Quantization for Image Retrieval

1 code implementation • 16 Jun 2019 • Lianli Gao, Xiaosu Zhu, Jingkuan Song, Zhou Zhao, Heng Tao Shen

In this work, we propose a deep progressive quantization (DPQ) model, as an alternative to PQ, for large scale image retrieval.

Image Retrieval, Quantization

Deep Recurrent Quantization for Generating Sequential Binary Codes

1 code implementation • 16 Jun 2019 • Jingkuan Song, Xiaosu Zhu, Lianli Gao, Xin-Shun Xu, Wu Liu, Heng Tao Shen

As a result, once the model is trained, a sequence of binary codes can be generated, and the code length can be easily controlled by adjusting the number of recurrent iterations.

Image Retrieval, Quantization
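
The property that code length is controlled by the number of iterations can be illustrated with plain residual quantization. Each step picks the nearest codeword for the current residual and subtracts it. K steps with 2^b-entry codebooks therefore yield a K*b-bit code. The hand-set codebooks in the test are hypothetical. This sketch is an analogy, not the paper's recurrent network.

```python
def nearest(vec, codebook):
    """Index of the codeword closest to vec in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((v - c) ** 2 for v, c in zip(vec, codebook[k])))


def residual_quantize(x, codebooks):
    """Encode x with one codebook per iteration, each quantizing the remaining residual.

    Returns the list of code indices and the reconstruction; passing fewer
    codebooks gives a shorter code and a coarser reconstruction.
    """
    residual = list(x)
    recon = [0.0] * len(x)
    codes = []
    for cb in codebooks:
        k = nearest(residual, cb)
        codes.append(k)
        recon = [r + c for r, c in zip(recon, cb[k])]        # add the chosen codeword
        residual = [r - c for r, c in zip(residual, cb[k])]  # what is still unexplained
    return codes, recon
```

With 4-entry codebooks, each iteration contributes 2 bits. Stopping after one iteration gives a 2-bit code; running two gives a finer 4-bit code.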

One-Shot Image-to-Image Translation via Part-Global Learning with a Multi-adversarial Framework

no code implementations • 12 May 2019 • Ziqiang Zheng, Zhibin Yu, Haiyong Zheng, Yang Yang, Heng Tao Shen

It is well known that humans can learn and recognize objects effectively from several limited image samples.

Image-to-Image Translation, Translation

Exact Adversarial Attack to Image Captioning via Structured Output Learning with Latent Variables

1 code implementation • CVPR 2019 • Yan Xu, Baoyuan Wu, Fumin Shen, Yanbo Fan, Yong Zhang, Heng Tao Shen, Wei Liu

Due to the sequential dependencies among words in a caption, we formulate the generation of adversarial noises for targeted partial captions as a structured output learning problem with latent variables.

Adversarial Attack, Image Captioning

Exploring Auxiliary Context: Discrete Semantic Transfer Hashing for Scalable Image Retrieval

no code implementations • 25 Apr 2019 • Lei Zhu, Zi Huang, Zhihui Li, Liang Xie, Heng Tao Shen

To address the problem, in this paper, we propose a novel hashing approach, dubbed Discrete Semantic Transfer Hashing (DSTH).

Content-Based Image Retrieval, Retrieval

A Large-scale Varying-view RGB-D Action Dataset for Arbitrary-view Human Action Recognition

no code implementations • 24 Apr 2019 • Yanli Ji, Feixiang Xu, Yang Yang, Fumin Shen, Heng Tao Shen, Wei-Shi Zheng

Besides, we propose a View-guided Skeleton CNN (VS-CNN) to tackle the problem of arbitrary-view action recognition.

Ranked #1 on Skeleton Based Action Recognition on Varying-view RGB-D Action-Skeleton

Action Analysis, Action Recognition

Hierarchical LSTMs with Adaptive Attention for Visual Captioning

no code implementations • 26 Dec 2018 • Jingkuan Song, Xiangpeng Li, Lianli Gao, Heng Tao Shen

Also, a hierarchical LSTM is designed to simultaneously consider both low-level visual information and high-level language context information to support caption generation.

Caption Generation, Image Captioning

Highly-Economized Multi-View Binary Compression for Scalable Image Clustering

no code implementations • ECCV 2018 • Zheng Zhang, Li Liu, Jie Qin, Fan Zhu, Fumin Shen, Yong Xu, Ling Shao, Heng Tao Shen

How to economically cluster large-scale multi-view images is a long-standing problem in computer vision.

Clustering, Image Clustering

Generative Domain-Migration Hashing for Sketch-to-Image Retrieval

1 code implementation • ECCV 2018 • Jingyi Zhang, Fumin Shen, Li Liu, Fan Zhu, Mengyang Yu, Ling Shao, Heng Tao Shen, Luc van Gool

The generative model uses an adversarial loss to learn a mapping under which the distribution of sketches becomes indistinguishable from the distribution of natural images, and simultaneously learns an inverse mapping based on the cycle consistency loss to enhance this indistinguishability.

Multi-Task Learning, Retrieval

The Gap of Semantic Parsing: A Survey on Automatic Math Word Problem Solvers

no code implementations • 22 Aug 2018 • Dongxiang Zhang, Lei Wang, Luming Zhang, Bing Tian Dai, Heng Tao Shen

Solving mathematical word problems (MWPs) automatically is challenging, primarily due to the semantic gap between human-readable words and machine-understandable logic.

Math, Semantic Parsing

Leveraging Weak Semantic Relevance for Complex Video Event Classification

no code implementations • ICCV 2017 • Chao Li, Jiewei Cao, Zi Huang, Lei Zhu, Heng Tao Shen

In this paper, we propose a novel approach to automatically maximize the utility of weak semantic annotations (formalized as the semantic relevance of video shots to the target event) to facilitate video event classification.

Classification, General Classification

From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning

no code implementations • 8 Aug 2017 • Jingkuan Song, Yuyu Guo, Lianli Gao, Xuelong Li, Alan Hanjalic, Heng Tao Shen

In this paper, we propose a generative approach, referred to as multi-modal stochastic RNNs (MS-RNN), which models the uncertainty observed in the data using latent stochastic variables.

Video Captioning

Multi-Attention Network for One Shot Learning

no code implementations • CVPR 2017 • Peng Wang, Lingqiao Liu, Chunhua Shen, Zi Huang, Anton Van Den Hengel, Heng Tao Shen

One-shot learning is a challenging problem where the aim is to recognize a class identified by a single training image.

One-Shot Learning TAG +1

Matrix Tri-Factorization With Manifold Regularizations for Zero-Shot Learning

no code implementations • CVPR 2017 • Xing Xu, Fumin Shen, Yang Yang, Dongxiang Zhang, Heng Tao Shen, Jingkuan Song

By additionally introducing manifold regularizations on visual data and semantic embeddings, the learned projection effectively captures the geometrical manifold structure residing in both visual and semantic spaces.

Retrieval Transfer Learning +1

Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning

no code implementations • 5 Jun 2017 • Jingkuan Song, Zhao Guo, Lianli Gao, Wu Liu, Dongxiang Zhang, Heng Tao Shen

Specifically, the proposed framework uses temporal attention to select specific frames for predicting the related words, while the adjusted temporal attention decides whether to rely on the visual information or on the language context information.

Caption Generation Language Modelling +1

Deep Region Hashing for Efficient Large-scale Instance Search from Images

no code implementations • 26 Jan 2017 • Jingkuan Song, Tao He, Lianli Gao, Xing Xu, Heng Tao Shen

Specifically, DRH is an end-to-end deep neural network which consists of object proposal, feature extraction, and hash code generation.

Code Generation Image Retrieval +3

Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering

no code implementations • 15 Dec 2016 • Hao Liu, Yang Yang, Fumin Shen, Lixin Duan, Heng Tao Shen

With the success of recurrent neural networks in modelling sequential data and the power of attention mechanisms in automatically identifying salient information, image captioning (a.k.a. image description) has advanced remarkably in recent years.

Image Captioning Variational Inference

Binary Subspace Coding for Query-by-Image Video Retrieval

no code implementations • 6 Dec 2016 • Ruicong Xu, Yang Yang, Yadan Luo, Fumin Shen, Zi Huang, Heng Tao Shen

The first approach, termed Inner-product Binary Coding (IBC), preserves the inner relationships of images and videos in a common Hamming space.

Retrieval Video Retrieval

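The entry above names an Inner-product Binary Coding approach that works in a common Hamming space. For codes whose entries are -1 or +1, inner product and Hamming distance carry the same information: for codes of length r, the inner product equals r minus twice the Hamming distance. The sketch below only checks this general identity on toy codes; it is not the IBC learning procedure, and the function names are hypothetical.

```python
from typing import Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def inner_product(a: Sequence[int], b: Sequence[int]) -> int:
    """Plain dot product of two equal-length codes."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(x * y for x, y in zip(a, b))


def inner_product_from_hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """For +1/-1 codes of length r: <a, b> = r - 2 * d_H(a, b)."""
    return len(a) - 2 * hamming_distance(a, b)


if __name__ == "__main__":
    a = [1, -1, 1, 1]
    b = [1, 1, -1, 1]
    # Both values agree, as the identity predicts.
    print(inner_product(a, b), inner_product_from_hamming(a, b))
```

Because of this identity, ranking database codes by Hamming distance gives the same order as ranking them by inner product, which is why +1/-1 codes can be compared cheaply.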
Exploiting Depth from Single Monocular Images for Object Detection and Semantic Segmentation

no code implementations • 6 Oct 2016 • Yuanzhouhan Cao, Chunhua Shen, Heng Tao Shen

Augmenting RGB data with measured depth has been shown to improve the performance of a range of tasks in computer vision including object detection and semantic segmentation.

Depth Estimation Object +4

Where to Focus: Query Adaptive Matching for Instance Retrieval Using Convolutional Feature Maps

no code implementations • 22 Jun 2016 • Jiewei Cao, Lingqiao Liu, Peng Wang, Zi Huang, Chunhua Shen, Heng Tao Shen

Instance retrieval requires one to search for images that contain a particular object within a large corpus.

Retrieval

Zero-Shot Hashing via Transferring Supervised Knowledge

no code implementations • 16 Jun 2016 • Yang Yang, Wei-Lun Chen, Yadan Luo, Fumin Shen, Jie Shao, Heng Tao Shen

Supervised knowledge (e.g., semantic labels or pair-wise relationships) associated with data can significantly improve the quality of hash codes and hash functions.

Image Retrieval Retrieval +1

Bidirectional Long-Short Term Memory for Video Description

no code implementations • 15 Jun 2016 • Yi Bin, Yang Yang, Zi Huang, Fumin Shen, Xing Xu, Heng Tao Shen

Video captioning has been attracting broad research attention in the multimedia community.

Language Modelling Video Captioning +1

What's Wrong With That Object? Identifying Images of Unusual Objects by Modelling the Detection Score Distribution

no code implementations • CVPR 2016 • Peng Wang, Lingqiao Liu, Chunhua Shen, Zi Huang, Anton Van Den Hengel, Heng Tao Shen

The key observation motivating our approach is that "regular object" images, "unusual object" images and "other objects" images exhibit different region-level scores in terms of both the score values and the spatial distributions.

Gaussian Processes Object +2

A Survey on Learning to Hash

no code implementations • 1 Jun 2016 • Jingdong Wang, Ting Zhang, Jingkuan Song, Nicu Sebe, Heng Tao Shen

In this paper, we present a comprehensive survey of learning-to-hash algorithms, categorize them by the manner in which they preserve similarities (pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, and quantization), and discuss their relations.

Quantization

Learning Binary Codes and Binary Weights for Efficient Classification

no code implementations • 14 Mar 2016 • Fumin Shen, Yadong Mu, Wei Liu, Yang Yang, Heng Tao Shen

The optimization alternately proceeds over the binary classifiers and image hash codes.

Classification General Classification +2

Structured Learning of Binary Codes with Column Generation

no code implementations • 22 Feb 2016 • Guosheng Lin, Fayao Liu, Chunhua Shen, Jianxin Wu, Heng Tao Shen

Our column generation based method can be further generalized from the triplet loss to a general structured learning based framework that allows one to directly optimize multivariate performance measures.

Image Retrieval Information Retrieval +1

Hi Detector, What's Wrong with that Object? Identifying Irregular Object From Images by Modelling the Detection Score Distribution

no code implementations • 14 Feb 2016 • Peng Wang, Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel, Heng Tao Shen

To address this problem, we propose a novel approach that inspects the distribution of detection scores at multiple image regions, using a detector trained on "regular object" and "other objects" images.

Gaussian Processes Object

Order-aware Convolutional Pooling for Video Based Action Recognition

no code implementations • 31 Jan 2016 • Peng Wang, Lingqiao Liu, Chunhua Shen, Heng Tao Shen

Most video based action recognition approaches create the video-level representation by temporally pooling the features extracted at each frame.

Action Recognition Temporal Action Localization

Compositional Model based Fisher Vector Coding for Image Classification

1 code implementation • 16 Jan 2016 • Lingqiao Liu, Peng Wang, Chunhua Shen, Lei Wang, Anton Van Den Hengel, Chao Wang, Heng Tao Shen

To handle this limitation, in this paper we break the convention which assumes that a local feature is drawn from one of few Gaussian distributions.

Classification General Classification +1

Learning Binary Codes for Maximum Inner Product Search

no code implementations • ICCV 2015 • Fumin Shen, Wei Liu, Shaoting Zhang, Yang Yang, Heng Tao Shen

Inspired by the latest advances in asymmetric hashing schemes, we propose an asymmetric binary code learning framework based on inner product fitting.

Optimal Graph Learning With Partial Tags and Multiple Features for Image and Video Annotation

no code implementations • CVPR 2015 • Lianli Gao, Jingkuan Song, Feiping Nie, Yan Yan, Nicu Sebe, Heng Tao Shen

In multimedia annotation, due to the time constraints and the tediousness of manual tagging, it is quite common to utilize both tagged and untagged data to improve the performance of supervised learning when only limited tagged training data are available.

graph construction Graph Learning

Supervised Discrete Hashing

1 code implementation • CVPR 2015 • Fumin Shen, Chunhua Shen, Wei Liu, Heng Tao Shen

This paper has been withdrawn by the author.

Temporal Pyramid Pooling Based Convolutional Neural Networks for Action Recognition

no code implementations • 4 Mar 2015 • Peng Wang, Yuanzhouhan Cao, Chunhua Shen, Lingqiao Liu, Heng Tao Shen

One challenge is that a video contains a varying number of frames, which is incompatible with the standard fixed-size input format of CNNs.

Action Recognition Image Classification +1

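The Temporal Pyramid Pooling entry above points out that videos have a varying number of frames while a CNN expects a fixed-size input. One way to get a fixed-length video descriptor is to pool per-frame features over a temporal pyramid. The sketch below assumes per-frame feature vectors are already extracted and uses max pooling with pyramid levels (1, 2, 4) as illustrative choices, not necessarily the paper's configuration; `temporal_pyramid_pool` is a hypothetical helper.

```python
from typing import List, Sequence


def temporal_pyramid_pool(
    frames: Sequence[Sequence[float]], levels: Sequence[int] = (1, 2, 4)
) -> List[float]:
    """Max-pool T per-frame feature vectors (each of length D) over a temporal pyramid.

    Level n splits the T frames into n contiguous segments and max-pools each
    segment, so the output length is D * sum(levels) regardless of T.
    """
    if not frames:
        raise ValueError("need at least one frame")
    t, dim = len(frames), len(frames[0])
    pooled: List[float] = []
    for n_bins in levels:
        for b in range(n_bins):
            # Segment b covers frames [start, end); forcing end >= start + 1
            # keeps it non-empty even when there are fewer frames than bins.
            start = (b * t) // n_bins
            end = max(((b + 1) * t) // n_bins, start + 1)
            segment = frames[start:end]
            pooled.extend(max(f[d] for f in segment) for d in range(dim))
    return pooled


if __name__ == "__main__":
    short_video = [[1.0, 0.0], [0.0, 5.0], [2.0, 1.0]]
    long_video = [[float(i), float(-i)] for i in range(10)]
    print(temporal_pyramid_pool(short_video, levels=(1, 2)))
    # Same descriptor length for a 3-frame and a 10-frame video.
    print(len(temporal_pyramid_pool(short_video)), len(temporal_pyramid_pool(long_video)))
```

Max pooling keeps the strongest response in each segment; average pooling would be a drop-in alternative if smoother descriptors are preferred.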
Hashing on Nonlinear Manifolds

no code implementations • 2 Dec 2014 • Fumin Shen, Chunhua Shen, Qinfeng Shi, Anton Van Den Hengel, Zhenmin Tang, Heng Tao Shen

In addition, a supervised inductive manifold hashing framework is developed by incorporating the label information, which is shown to greatly advance the semantic retrieval performance.

Image Classification Quantization +2

Hashing for Similarity Search: A Survey

no code implementations • 13 Aug 2014 • Jingdong Wang, Heng Tao Shen, Jingkuan Song, Jianqiu Ji

Similarity search (nearest neighbor search) is the problem of finding the data items in a large database whose distances to a query item are the smallest.

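As a baseline for the definition of similarity search given in the survey entry above, exact nearest neighbor search can be done by a linear scan that measures the distance from the query to every database item. The sketch below assumes Euclidean distance and returns the k closest items; its O(N·D) cost per query over N items of dimension D is what hashing methods aim to reduce. `knn_search` is a hypothetical name.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def knn_search(
    database: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[int, float]]:
    """Exact k-nearest-neighbor search by linear scan.

    Returns (index, Euclidean distance) pairs for the k items closest to
    `query`, sorted by increasing distance.
    """
    scored = ((i, math.dist(item, query)) for i, item in enumerate(database))
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    db = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0], [10.0, 10.0]]
    print(knn_search(db, [0.0, 0.0], k=2))
```

`heapq.nsmallest` avoids sorting the whole database when k is small; `math.dist` requires Python 3.8 or later.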
Face Image Classification by Pooling Raw Features

no code implementations • 26 Jun 2014 • Fumin Shen, Chunhua Shen, Heng Tao Shen

We propose a very simple, efficient yet surprisingly effective feature extraction method for face recognition (about 20 lines of Matlab code), which is mainly inspired by spatial pyramid pooling in generic image classification.

Classification Face Recognition +2

Face Identification with Second-Order Pooling

no code implementations • 26 Jun 2014 • Fumin Shen, Chunhua Shen, Heng Tao Shen

Spatial pyramid pooling of features encoded by an over-complete dictionary has been the key component of many state-of-the-art image classification systems.

Face Identification Face Recognition +4

Optimized Cartesian $K$-Means

no code implementations • 16 May 2014 • Jianfeng Wang, Jingdong Wang, Jingkuan Song, Xin-Shun Xu, Heng Tao Shen, Shipeng Li

In OCKM, multiple sub codewords are used to encode the subvector of a data point in a subspace.

Quantization

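As its name and snippet suggest, OCKM belongs to the family of Cartesian (product) quantizers, in which a vector is split into subvectors and each subvector is encoded with codewords from its own subspace codebook. The sketch below shows only the simpler standard product-quantization step, with exactly one sub-codeword per subspace; OCKM's use of multiple sub-codewords per subspace and its codebook optimization are not reproduced. The codebooks are small hand-picked values standing in for ones that would normally be learned with k-means, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Codebook = Sequence[Sequence[float]]  # sub-codewords for one subspace


def pq_encode(x: Sequence[float], codebooks: Sequence[Codebook]) -> List[int]:
    """Split x into len(codebooks) equal subvectors and keep the index of the
    nearest sub-codeword in each subspace."""
    m = len(codebooks)
    if len(x) % m != 0:
        raise ValueError("vector length must be divisible by the number of subspaces")
    sub_dim = len(x) // m
    codes: List[int] = []
    for j, book in enumerate(codebooks):
        sub = x[j * sub_dim:(j + 1) * sub_dim]
        codes.append(min(range(len(book)), key=lambda c: math.dist(sub, book[c])))
    return codes


def pq_decode(codes: Sequence[int], codebooks: Sequence[Codebook]) -> List[float]:
    """Approximate x by concatenating the selected sub-codewords."""
    approx: List[float] = []
    for c, book in zip(codes, codebooks):
        approx.extend(book[c])
    return approx


if __name__ == "__main__":
    books = [
        [[0.0, 0.0], [1.0, 1.0]],    # codebook for dimensions 0-1
        [[-2.0, -2.0], [2.0, 2.0]],  # codebook for dimensions 2-3
    ]
    x = [0.9, 1.1, -2.0, -1.8]
    codes = pq_encode(x, books)
    print(codes, pq_decode(codes, books))
```

Storing only one small index per subspace instead of the full vector is what makes quantization-based search compact; distances can then be approximated from the reconstructed vectors.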