Dual-branch Hybrid Learning Network for Unbiased Scene Graph Generation

1 code implementation 16 Jul 2022 Chaofan Zheng, Lianli Gao, Xinyu Lyu, Pengpeng Zeng, Abdulmotaleb El Saddik, Heng Tao Shen

Experiments show that our approach achieves a new state-of-the-art performance on VG and GQA datasets and makes a trade-off between the performance of tail predicates and head ones.

Graph Generation Image Captioning +1

Adaptive Fine-Grained Predicates Learning for Scene Graph Generation

no code implementations 11 Jul 2022 Xinyu Lyu, Lianli Gao, Pengpeng Zeng, Heng Tao Shen, Jingkuan Song

The performance of current Scene Graph Generation (SGG) models is severely hampered by hard-to-distinguish predicates, e.g., woman-on/standing on/walking on-beach.

Fine-Grained Image Classification Graph Generation +2

KE-RCNN: Unifying Knowledge based Reasoning into Part-level Attribute Parsing

1 code implementation 21 Jun 2022 Xuanhan Wang, Jingkuan Song, Xiaojia Chen, Lechao Cheng, Lianli Gao, Heng Tao Shen

In this article, we propose a Knowledge Embedded RCNN (KE-RCNN) to identify attributes by leveraging rich knowledge, including implicit knowledge (e.g., the attribute "above-the-hip" for a shirt requires visual/geometry relations of shirt-hip) and explicit knowledge (e.g., the part of "shorts" cannot have the attribute of "hoodie" or "lining").

From Pixels to Objects: Cubic Visual Attention for Visual Question Answering

no code implementations 4 Jun 2022 Jingkuan Song, Pengpeng Zeng, Lianli Gao, Heng Tao Shen

Existing visual attention models are generally planar, i.e., different channels of the last conv-layer feature map of an image share the same weight.

Question Answering Visual Question Answering +1

Structured Two-stream Attention Network for Video Question Answering

no code implementations 2 Jun 2022 Lianli Gao, Pengpeng Zeng, Jingkuan Song, Yuan-Fang Li, Wu Liu, Tao Mei, Heng Tao Shen

To date, visual question answering (VQA) (i.e., image QA and video QA) is still a holy grail in vision and language understanding, especially for video QA.

Question Answering Video Question Answering +2

Thunder: Thumbnail based Fast Lightweight Image Denoising Network

no code implementations 24 May 2022 Yifeng Zhou, Xing Xu, Shuaicheng Liu, Guoqing Wang, Huimin Lu, Heng Tao Shen

To achieve promising results on removing noise from real-world images, most existing denoising networks are formulated with complex network structures, making them impractical for deployment.

Image Denoising SSIM

Support-set based Multi-modal Representation Enhancement for Video Captioning

1 code implementation 19 May 2022 Xiaoya Chen, Jingkuan Song, Pengpeng Zeng, Lianli Gao, Heng Tao Shen

Video captioning is a challenging task that necessitates a thorough comprehension of visual scenes.

Video Captioning

Fine-Grained Predicates Learning for Scene Graph Generation

1 code implementation CVPR 2022 Xinyu Lyu, Lianli Gao, Yuyu Guo, Zhou Zhao, Hao Huang, Heng Tao Shen, Jingkuan Song

The performance of current Scene Graph Generation models is severely hampered by some hard-to-distinguish predicates, e.g., "woman-on/standing on/walking on-beach" or "woman-near/looking at/in front of-child".

Fine-Grained Image Classification Graph Generation +2

Practical No-box Adversarial Attacks with Training-free Hybrid Image Transformation

no code implementations 9 Mar 2022 Qilong Zhang, Chaoning Zhang, Chaoqun Li, Jingkuan Song, Lianli Gao, Heng Tao Shen

In this paper, we move a step forward and show the existence of a training-free adversarial perturbation under the no-box threat model, which can be successfully used to attack different DNNs in real-time.

One-shot Scene Graph Generation

1 code implementation 22 Feb 2022 Yuyu Guo, Jingkuan Song, Lianli Gao, Heng Tao Shen

Specifically, the Relational Knowledge represents the prior knowledge of relationships between entities extracted from the visual content, e.g., the visual relationships "standing in", "sitting in", and "lying in" may exist between "dog" and "yard", while the Commonsense Knowledge encodes "sense-making" knowledge like "dog can guard yard".

Graph Generation Natural Language Processing +1

Relation Regularized Scene Graph Generation

no code implementations 22 Feb 2022 Yuyu Guo, Lianli Gao, Jingkuan Song, Peng Wang, Nicu Sebe, Heng Tao Shen, Xuelong Li

Inspired by this observation, in this article, we propose a relation regularized network (R2-Net), which can predict whether there is a relationship between two objects and encode this relation into object feature refinement and better SGG.

Graph Classification Graph Generation +4

Semi-Supervised Video Paragraph Grounding With Contrastive Encoder

no code implementations CVPR 2022 Xun Jiang, Xing Xu, Jingran Zhang, Fumin Shen, Zuo Cao, Heng Tao Shen

Video events grounding aims at retrieving the most relevant moments from an untrimmed video in terms of a given natural language query.

Video Grounding

Fast Gradient Non-sign Methods

1 code implementation 25 Oct 2021 Yaya Cheng, Jingkuan Song, Xiaosu Zhu, Qilong Zhang, Lianli Gao, Heng Tao Shen

Based on the linearity hypothesis, under $\ell_\infty$ constraint, $sign$ operation applied to the gradients is a good choice for generating perturbations.
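
As a rough illustration of the sign step described above (not the paper's own code): under an $\ell_\infty$ budget $\epsilon$, the perturbation $\epsilon \cdot \mathrm{sign}(g)$ moves every coordinate by the full budget in the direction of the gradient $g$. The function names and the toy gradient below are hypothetical.

```python
def sign(value):
    """Return 1, -1, or 0 depending on the sign of value."""
    return (value > 0) - (value < 0)


def sign_perturbation(grad, epsilon):
    """Perturbation epsilon * sign(grad), coordinate by coordinate.

    Every coordinate with a nonzero gradient moves by exactly epsilon, so
    the result always satisfies the l_inf constraint max|delta_i| <= epsilon.
    """
    return [epsilon * sign(g) for g in grad]


def linf_norm(vec):
    """Largest absolute coordinate of vec."""
    return max(abs(v) for v in vec)


# Hypothetical gradient of a loss with respect to three input pixels.
toy_grad = [0.3, -2.0, 0.0]
delta = sign_perturbation(toy_grad, epsilon=0.1)
```

Note that the magnitudes 0.3 and 2.0 are discarded: only the direction of each coordinate survives the sign operation.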

From General to Specific: Informative Scene Graph Generation via Balance Adjustment

1 code implementation ICCV 2021 Yuyu Guo, Lianli Gao, Xuanhan Wang, Yuxuan Hu, Xing Xu, Xu Lu, Heng Tao Shen, Jingkuan Song

The scene graph generation (SGG) task aims to detect visual relationship triplets, i.e., subject, predicate, object, in an image, providing a structural vision layout for scene understanding.

Graph Generation Scene Graph Generation +1

Adversarial Energy Disaggregation for Non-intrusive Load Monitoring

no code implementations 2 Aug 2021 Zhekai Du, Jingjing Li, Lei Zhu, Ke Lu, Heng Tao Shen

Energy disaggregation, also known as non-intrusive load monitoring (NILM), addresses the problem of separating whole-home electricity usage into appliance-specific individual consumption, which is a typical application of data analysis.

Non-Intrusive Load Monitoring

Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos

no code implementations CVPR 2021 Mingxing Zhang, Yang Yang, Xinghan Chen, Yanli Ji, Xing Xu, Jingjing Li, Heng Tao Shen

Then for a moment candidate, we concatenate the starting/middle/ending representations of its starting/middle/ending elements respectively to form the final moment representation.
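
The concatenation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each sequence element already carries three separate feature vectors (starting, middle, ending), and it assumes the "middle element" of a candidate is the midpoint index of its span. All names are hypothetical.

```python
def moment_representation(start_feats, middle_feats, end_feats, start, end):
    """Concatenate three per-element feature vectors for the span [start, end].

    Assumption: the "middle element" is the midpoint index of the span.
    """
    if not 0 <= start <= end < len(start_feats):
        raise ValueError("candidate span is out of range")
    middle = (start + end) // 2
    # List concatenation joins the three vectors end to end.
    return start_feats[start] + middle_feats[middle] + end_feats[end]


# Hypothetical 2-dimensional features for a 4-element video sequence.
start_feats = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
middle_feats = [[10.0, 10.1], [11.0, 11.1], [12.0, 12.1], [13.0, 13.1]]
end_feats = [[20.0, 20.1], [21.0, 21.1], [22.0, 22.1], [23.0, 23.1]]

rep = moment_representation(start_feats, middle_feats, end_feats, start=0, end=3)
```

The resulting vector has three times the per-element feature dimension, regardless of how long the candidate moment is.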

Staircase Sign Method for Boosting Adversarial Attacks

1 code implementation 20 Apr 2021 Qilong Zhang, Xiaosu Zhu, Jingkuan Song, Lianli Gao, Heng Tao Shen

Crafting adversarial examples for the transfer-based attack is challenging and remains a research hot spot.

Adversarial Attack

Patch-wise++ Perturbation for Adversarial Targeted Attacks

1 code implementation 31 Dec 2020 Lianli Gao, Qilong Zhang, Jingkuan Song, Heng Tao Shen

Specifically, we introduce an amplification factor to the step size in each iteration, and one pixel's overall gradient overflowing the $\epsilon$-constraint is properly assigned to its surrounding regions by a project kernel.

Adversarial Attack

Dual ResGCN for Balanced Scene Graph Generation

no code implementations 9 Nov 2020 Jingyi Zhang, Yong Zhang, Baoyuan Wu, Yanbo Fan, Fumin Shen, Heng Tao Shen

We propose to incorporate the prior about the co-occurrence of relation pairs into the graph to further help alleviate the class imbalance issue.

Graph Generation Scene Graph Generation

Universal Weighting Metric Learning for Cross-Modal Matching

1 code implementation CVPR 2020 Jiwei Wei, Xing Xu, Yang Yang, Yanli Ji, Zheng Wang, Heng Tao Shen

Furthermore, we introduce a new polynomial loss under the universal weighting framework, which defines a weight function for the positive and negative informative pairs respectively.

Metric Learning Text Matching

Patch-wise Attack for Fooling Deep Neural Network

2 code implementations ECCV 2020 Lianli Gao, Qilong Zhang, Jingkuan Song, Xianglong Liu, Heng Tao Shen

By adding human-imperceptible noise to clean images, the resultant adversarial examples can fool other unknown models.

Adversarial Attack Image Classification

Improving Target-driven Visual Navigation with Attention on 3D Spatial Relationships

no code implementations 29 Apr 2020 Yunlian Lv, Ning Xie, Yimin Shi, Zijiao Wang, Heng Tao Shen

On the other hand, the TSE module is used to generate sub-targets, which allow the agent to learn from failures.

Visual Navigation

MetaMixUp: Learning Adaptive Interpolation Policy of MixUp with Meta-Learning

no code implementations 27 Aug 2019 Zhijun Mai, Guosheng Hu, Dexiong Chen, Fumin Shen, Heng Tao Shen

Since deep networks are capable of memorizing the entire dataset, the corrupted samples generated by vanilla MixUp with a badly chosen interpolation policy will degrade the performance of networks.

Data Augmentation Domain Adaptation +2

Cooperative Cross-Stream Network for Discriminative Action Representation

no code implementations 27 Aug 2019 Jingran Zhang, Fumin Shen, Xing Xu, Heng Tao Shen

It extracts this complementary information of different modalities from a connection block, which aims at exploring correlations of different stream features.

Ranked #12 on Action Recognition on UCF101 (using extra training data)

Action Recognition

Temporal Reasoning Graph for Activity Recognition

no code implementations 27 Aug 2019 Jingran Zhang, Fumin Shen, Xing Xu, Heng Tao Shen

In this paper, we propose an efficient temporal reasoning graph (TRG) to simultaneously capture the appearance features and temporal relation between video sequences at multiple time scales.

Action Recognition Relation Extraction

Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking

2 code implementations 12 Aug 2019 Tan Wang, Xing Xu, Yang Yang, Alan Hanjalic, Heng Tao Shen, Jingkuan Song

We propose a novel framework that achieves remarkable matching performance with acceptable model complexity.

General Classification Re-Ranking +2

Deep Recurrent Quantization for Generating Sequential Binary Codes

1 code implementation 16 Jun 2019 Jingkuan Song, Xiaosu Zhu, Lianli Gao, Xin-Shun Xu, Wu Liu, Heng Tao Shen

To this end, when the model is trained, a sequence of binary codes can be generated and the code length can be easily controlled by adjusting the number of recurrent iterations.

Image Retrieval Quantization

Beyond Product Quantization: Deep Progressive Quantization for Image Retrieval

1 code implementation 16 Jun 2019 Lianli Gao, Xiaosu Zhu, Jingkuan Song, Zhou Zhao, Heng Tao Shen

In this work, we propose a deep progressive quantization (DPQ) model, as an alternative to PQ, for large scale image retrieval.

Image Retrieval Quantization

Exact Adversarial Attack to Image Captioning via Structured Output Learning with Latent Variables

1 code implementation CVPR 2019 Yan Xu, Baoyuan Wu, Fumin Shen, Yanbo Fan, Yong Zhang, Heng Tao Shen, Wei Liu

Due to the sequential dependencies among words in a caption, we formulate the generation of adversarial noises for targeted partial captions as a structured output learning problem with latent variables.

Adversarial Attack Image Captioning

Exploring Auxiliary Context: Discrete Semantic Transfer Hashing for Scalable Image Retrieval

no code implementations 25 Apr 2019 Lei Zhu, Zi Huang, Zhihui Li, Liang Xie, Heng Tao Shen

To address the problem, in this paper, we propose a novel hashing approach, dubbed Discrete Semantic Transfer Hashing (DSTH).

Content-Based Image Retrieval

Hierarchical LSTMs with Adaptive Attention for Visual Captioning

no code implementations 26 Dec 2018 Jingkuan Song, Xiangpeng Li, Lianli Gao, Heng Tao Shen

Also, a hierarchical LSTM is designed to simultaneously consider both low-level visual information and high-level language context information to support the caption generation.

Image Captioning Language Modelling +1

Generative Domain-Migration Hashing for Sketch-to-Image Retrieval

1 code implementation ECCV 2018 Jingyi Zhang, Fumin Shen, Li Liu, Fan Zhu, Mengyang Yu, Ling Shao, Heng Tao Shen, Luc van Gool

The generative model learns a mapping under which the distribution of sketches becomes indistinguishable from the distribution of natural images using an adversarial loss, and simultaneously learns an inverse mapping based on the cycle consistency loss in order to enhance the indistinguishability.

Multi-Task Learning Sketch-Based Image Retrieval

The Gap of Semantic Parsing: A Survey on Automatic Math Word Problem Solvers

no code implementations 22 Aug 2018 Dongxiang Zhang, Lei Wang, Luming Zhang, Bing Tian Dai, Heng Tao Shen

Solving mathematical word problems (MWPs) automatically is challenging, primarily due to the semantic gap between human-readable words and machine-understandable logics.

Semantic Parsing

Leveraging Weak Semantic Relevance for Complex Video Event Classification

no code implementations ICCV 2017 Chao Li, Jiewei Cao, Zi Huang, Lei Zhu, Heng Tao Shen

In this paper, we propose a novel approach to automatically maximize the utility of weak semantic annotations (formalized as the semantic relevance of video shots to the target event) to facilitate video event classification.

Classification General Classification

From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning

no code implementations 8 Aug 2017 Jingkuan Song, Yuyu Guo, Lianli Gao, Xuelong Li, Alan Hanjalic, Heng Tao Shen

In this paper, we propose a generative approach, referred to as multi-modal stochastic RNNs (MS-RNN), which models the uncertainty observed in the data using latent stochastic variables.

Video Captioning

Matrix Tri-Factorization With Manifold Regularizations for Zero-Shot Learning

no code implementations CVPR 2017 Xing Xu, Fumin Shen, Yang Yang, Dongxiang Zhang, Heng Tao Shen, Jingkuan Song

By additionally introducing manifold regularizations on visual data and semantic embeddings, the learned projection can effectively capture the geometrical manifold structure residing in both visual and semantic spaces.

Transfer Learning Zero-Shot Learning

Multi-Attention Network for One Shot Learning

no code implementations CVPR 2017 Peng Wang, Lingqiao Liu, Chunhua Shen, Zi Huang, Anton Van Den Hengel, Heng Tao Shen

One-shot learning is a challenging problem where the aim is to recognize a class identified by a single training image.

One-Shot Learning TAG +1

Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning

no code implementations 5 Jun 2017 Jingkuan Song, Zhao Guo, Lianli Gao, Wu Liu, Dongxiang Zhang, Heng Tao Shen

Specifically, the proposed framework utilizes the temporal attention for selecting specific frames to predict the related words, while the adjusted temporal attention is for deciding whether to depend on the visual information or the language context information.

Language Modelling Video Captioning

Deep Region Hashing for Efficient Large-scale Instance Search from Images

no code implementations 26 Jan 2017 Jingkuan Song, Tao He, Lianli Gao, Xing Xu, Heng Tao Shen

Specifically, DRH is an end-to-end deep neural network which consists of object proposal, feature extraction, and hash code generation.

Code Generation Image Retrieval +2

Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering

no code implementations 15 Dec 2016 Hao Liu, Yang Yang, Fumin Shen, Lixin Duan, Heng Tao Shen

Along with the prosperity of recurrent neural networks in modelling sequential data and the power of attention mechanisms in automatically identifying salient information, image captioning, a.k.a. image description, has been remarkably advanced in recent years.

Image Captioning Variational Inference

Binary Subspace Coding for Query-by-Image Video Retrieval

no code implementations 6 Dec 2016 Ruicong Xu, Yang Yang, Yadan Luo, Fumin Shen, Zi Huang, Heng Tao Shen

The first approach, termed Inner-product Binary Coding (IBC), preserves the inner relationships of images and videos in a common Hamming space.

Video Retrieval

Exploiting Depth from Single Monocular Images for Object Detection and Semantic Segmentation

no code implementations 6 Oct 2016 Yuanzhouhan Cao, Chunhua Shen, Heng Tao Shen

Augmenting RGB data with measured depth has been shown to improve the performance of a range of tasks in computer vision including object detection and semantic segmentation.

Depth Estimation object-detection +2

Where to Focus: Query Adaptive Matching for Instance Retrieval Using Convolutional Feature Maps

no code implementations 22 Jun 2016 Jiewei Cao, Lingqiao Liu, Peng Wang, Zi Huang, Chunhua Shen, Heng Tao Shen

Instance retrieval requires one to search for images that contain a particular object within a large corpus.

Zero-Shot Hashing via Transferring Supervised Knowledge

no code implementations 16 Jun 2016 Yang Yang, Wei-Lun Chen, Yadan Luo, Fumin Shen, Jie Shao, Heng Tao Shen

Supervised knowledge (e.g., semantic labels or pair-wise relationships) associated with data is capable of significantly improving the quality of hash codes and hash functions.

Image Retrieval Zero-shot Image Retrieval

What's Wrong With That Object? Identifying Images of Unusual Objects by Modelling the Detection Score Distribution

no code implementations CVPR 2016 Peng Wang, Lingqiao Liu, Chunhua Shen, Zi Huang, Anton Van Den Hengel, Heng Tao Shen

The key observation motivating our approach is that "regular object" images, "unusual object" images and "other objects" images exhibit different region-level scores in terms of both the score values and the spatial distributions.

Gaussian Processes object-detection +1

A Survey on Learning to Hash

no code implementations 1 Jun 2016 Jingdong Wang, Ting Zhang, Jingkuan Song, Nicu Sebe, Heng Tao Shen

In this paper, we present a comprehensive survey of the learning to hash algorithms, categorize them according to the manners of preserving the similarities into: pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, as well as quantization, and discuss their relations.


Structured Learning of Binary Codes with Column Generation

no code implementations 22 Feb 2016 Guosheng Lin, Fayao Liu, Chunhua Shen, Jianxin Wu, Heng Tao Shen

Our column generation based method can be further generalized from the triplet loss to a general structured learning based framework that allows one to directly optimize multivariate performance measures.

Image Retrieval Information Retrieval

Hi Detector, What's Wrong with that Object? Identifying Irregular Object From Images by Modelling the Detection Score Distribution

no code implementations 14 Feb 2016 Peng Wang, Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel, Heng Tao Shen

To address this problem, we propose a novel approach by inspecting the distribution of the detection scores at multiple image regions based on the detector trained from the "regular object" and "other objects".

Gaussian Processes

Order-aware Convolutional Pooling for Video Based Action Recognition

no code implementations 31 Jan 2016 Peng Wang, Lingqiao Liu, Chunhua Shen, Heng Tao Shen

Most video based action recognition approaches create the video-level representation by temporally pooling the features extracted at each frame.

Action Recognition

Compositional Model based Fisher Vector Coding for Image Classification

1 code implementation 16 Jan 2016 Lingqiao Liu, Peng Wang, Chunhua Shen, Lei Wang, Anton Van Den Hengel, Chao Wang, Heng Tao Shen

To handle this limitation, in this paper we break the convention which assumes that a local feature is drawn from one of a few Gaussian distributions.

Classification General Classification +1

Learning Binary Codes for Maximum Inner Product Search

no code implementations ICCV 2015 Fumin Shen, Wei Liu, Shaoting Zhang, Yang Yang, Heng Tao Shen

Inspired by the latest advance in asymmetric hashing schemes, we propose an asymmetric binary code learning framework based on inner product fitting.

Optimal Graph Learning With Partial Tags and Multiple Features for Image and Video Annotation

no code implementations CVPR 2015 Lianli Gao, Jingkuan Song, Feiping Nie, Yan Yan, Nicu Sebe, Heng Tao Shen

In multimedia annotation, due to the time constraints and the tediousness of manual tagging, it is quite common to utilize both tagged and untagged data to improve the performance of supervised learning when only limited tagged training data are available.

graph construction Graph Learning

Temporal Pyramid Pooling Based Convolutional Neural Networks for Action Recognition

no code implementations 4 Mar 2015 Peng Wang, Yuanzhouhan Cao, Chunhua Shen, Lingqiao Liu, Heng Tao Shen

One challenge is that a video contains a varying number of frames, which is incompatible with the standard input format of CNNs.

Action Recognition Image Classification

Hashing on Nonlinear Manifolds

no code implementations 2 Dec 2014 Fumin Shen, Chunhua Shen, Qinfeng Shi, Anton Van Den Hengel, Zhenmin Tang, Heng Tao Shen

In addition, a supervised inductive manifold hashing framework is developed by incorporating the label information, which is shown to greatly advance the semantic retrieval performance.

Image Classification Quantization +1

Hashing for Similarity Search: A Survey

no code implementations 13 Aug 2014 Jingdong Wang, Heng Tao Shen, Jingkuan Song, Jianqiu Ji

Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are the smallest.
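
For reference, the exact version of this problem can be solved by a linear scan, which is the baseline that hashing methods aim to approximate at lower cost. The sketch below assumes Euclidean distance and uses hypothetical names and toy data; it is not taken from the survey.

```python
import math


def nearest_neighbors(database, query, k=1):
    """Return indices of the k database items closest to query (Euclidean)."""
    ranked = sorted(range(len(database)),
                    key=lambda i: math.dist(database[i], query))
    return ranked[:k]


# Toy 2-D database; each tuple is one data item.
database = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
closest = nearest_neighbors(database, query=(0.9, 1.2), k=2)
```

The scan computes one distance per database item for every query, which is the cost that becomes prohibitive on large databases.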

Face Identification with Second-Order Pooling

no code implementations 26 Jun 2014 Fumin Shen, Chunhua Shen, Heng Tao Shen

Spatial pyramid pooling of features encoded by an over-complete dictionary has been the key component of many state-of-the-art image classification systems.

Face Identification Face Recognition +3

Face Image Classification by Pooling Raw Features

no code implementations 26 Jun 2014 Fumin Shen, Chunhua Shen, Heng Tao Shen

We propose a very simple, efficient yet surprisingly effective feature extraction method for face recognition (about 20 lines of Matlab code), which is mainly inspired by spatial pyramid pooling in generic image classification.

Classification Face Recognition +2

Optimized Cartesian $K$-Means

no code implementations 16 May 2014 Jianfeng Wang, Jingdong Wang, Jingkuan Song, Xin-Shun Xu, Heng Tao Shen, Shipeng Li

In OCKM, multiple sub codewords are used to encode the subvector of a data point in a subspace.


