1 code implementation • 16 Jul 2022 • Chaofan Zheng, Lianli Gao, Xinyu Lyu, Pengpeng Zeng, Abdulmotaleb El Saddik, Heng Tao Shen
Experiments show that our approach achieves a new state-of-the-art performance on VG and GQA datasets and makes a trade-off between the performance of tail predicates and head ones.
no code implementations • 11 Jul 2022 • Xinyu Lyu, Lianli Gao, Pengpeng Zeng, Heng Tao Shen, Jingkuan Song
The performance of current Scene Graph Generation (SGG) models is severely hampered by hard-to-distinguish predicates, e.g., woman-on/standing on/walking on-beach.
1 code implementation • 21 Jun 2022 • Xuanhan Wang, Jingkuan Song, Xiaojia Chen, Lechao Cheng, Lianli Gao, Heng Tao Shen
In this article, we propose a Knowledge Embedded RCNN (KE-RCNN) to identify attributes by leveraging rich knowledge, including implicit knowledge (e.g., the attribute "above-the-hip" for a shirt requires visual/geometry relations of shirt-hip) and explicit knowledge (e.g., the part "shorts" cannot have the attribute "hoodie" or "lining").
no code implementations • 4 Jun 2022 • Jingkuan Song, Pengpeng Zeng, Lianli Gao, Heng Tao Shen
Existing visual attention models are generally planar, i.e., different channels of the last conv-layer feature map of an image share the same weight.
no code implementations • 2 Jun 2022 • Lianli Gao, Pengpeng Zeng, Jingkuan Song, Yuan-Fang Li, Wu Liu, Tao Mei, Heng Tao Shen
To date, visual question answering (VQA) (i.e., image QA and video QA) is still a holy grail in vision and language understanding, especially for video QA.
no code implementations • 24 May 2022 • Yifeng Zhou, Xing Xu, Shuaicheng Liu, Guoqing Wang, Huimin Lu, Heng Tao Shen
To achieve promising results on removing noise from real-world images, most existing denoising networks are formulated with complex network structures, making them impractical for deployment.
1 code implementation • 19 May 2022 • Xiaoya Chen, Jingkuan Song, Pengpeng Zeng, Lianli Gao, Heng Tao Shen
Video captioning is a challenging task that necessitates a thorough comprehension of visual scenes.
1 code implementation • CVPR 2022 • Xinyu Lyu, Lianli Gao, Yuyu Guo, Zhou Zhao, Hao Huang, Heng Tao Shen, Jingkuan Song
The performance of current Scene Graph Generation models is severely hampered by some hard-to-distinguish predicates, e.g., "woman-on/standing on/walking on-beach" or "woman-near/looking at/in front of-child".
1 code implementation • CVPR 2022 • Xiaosu Zhu, Jingkuan Song, Lianli Gao, Feng Zheng, Heng Tao Shen
Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression.
no code implementations • 9 Mar 2022 • Qilong Zhang, Chaoning Zhang, Chaoqun Li, Jingkuan Song, Lianli Gao, Heng Tao Shen
In this paper, we move a step forward and show the existence of a training-free adversarial perturbation under the no-box threat model, which can be successfully used to attack different DNNs in real-time.
1 code implementation • 22 Feb 2022 • Yuyu Guo, Jingkuan Song, Lianli Gao, Heng Tao Shen
Specifically, the Relational Knowledge represents the prior knowledge of relationships between entities extracted from the visual content, e.g., the visual relationships "standing in", "sitting in", and "lying in" may exist between "dog" and "yard", while the Commonsense Knowledge encodes "sense-making" knowledge like "dog can guard yard".
no code implementations • 22 Feb 2022 • Yuyu Guo, Lianli Gao, Jingkuan Song, Peng Wang, Nicu Sebe, Heng Tao Shen, Xuelong Li
Inspired by this observation, in this article, we propose a relation regularized network (R2-Net), which can predict whether there is a relationship between two objects and encode this relation into object feature refinement and better SGG.
1 code implementation • CVPR 2022 • Hao Ni, Jingkuan Song, Xiaopeng Luo, Feng Zheng, Wen Li, Heng Tao Shen
Domain Generalizable (DG) person ReID is a challenging task which trains a model on source domains yet generalizes well on target domains.
Domain Generalization
Generalizable Person Re-identification
no code implementations • CVPR 2022 • Xun Jiang, Xing Xu, Jingran Zhang, Fumin Shen, Zuo Cao, Heng Tao Shen
Video events grounding aims at retrieving the most relevant moments from an untrimmed video in terms of a given natural language query.
1 code implementation • 25 Oct 2021 • Yaya Cheng, Jingkuan Song, Xiaosu Zhu, Qilong Zhang, Lianli Gao, Heng Tao Shen
Based on the linearity hypothesis, under $\ell_\infty$ constraint, $sign$ operation applied to the gradients is a good choice for generating perturbations.
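To make the sign-based update concrete, the following is a minimal sketch of a single-step perturbation in the style of the fast gradient sign method. Under a linear approximation of the loss, the step that maximizes the loss increase inside an $\ell_\infty$ ball of radius $\epsilon$ is $\epsilon \cdot sign(g)$. The function names, the flat-list image representation, and the [0, 1] pixel range are assumptions made for illustration and are not taken from the paper.

```python
def sign(value):
    """Return -1.0, 0.0 or 1.0 according to the sign of value."""
    return float((value > 0) - (value < 0))


def sign_step(pixels, grads, eps):
    """Shift every pixel by eps along the sign of its loss gradient.

    The result stays within an l_inf distance of eps from `pixels` and is
    clipped to the assumed valid pixel range [0, 1].
    """
    if len(pixels) != len(grads):
        raise ValueError("pixels and grads must have the same length")
    return [min(1.0, max(0.0, p + eps * sign(g))) for p, g in zip(pixels, grads)]


# Hypothetical toy values: three pixels and their loss gradients.
clean = [0.5, 0.2, 0.99]
adversarial = sign_step(clean, [0.3, -1.7, 2.0], eps=0.03)
```

Only the direction of each gradient entry is used, so every pixel moves by the full budget `eps` unless clipping to the valid range stops it earlier.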
1 code implementation • ICCV 2021 • Yuyu Guo, Lianli Gao, Xuanhan Wang, Yuxuan Hu, Xing Xu, Xu Lu, Heng Tao Shen, Jingkuan Song
The scene graph generation (SGG) task aims to detect visual relationship triplets, i.e., subject, predicate, object, in an image, providing a structural vision layout for scene understanding.
no code implementations • 2 Aug 2021 • Zhekai Du, Jingjing Li, Lei Zhu, Ke Lu, Heng Tao Shen
Energy disaggregation, also known as non-intrusive load monitoring (NILM), addresses the problem of separating whole-home electricity usage into appliance-specific consumption, which is a typical application of data analysis.
no code implementations • CVPR 2021 • Mingxing Zhang, Yang Yang, Xinghan Chen, Yanli Ji, Xing Xu, Jingjing Li, Heng Tao Shen
Then for a moment candidate, we concatenate the starting/middle/ending representations of its starting/middle/ending elements respectively to form the final moment representation.
1 code implementation • 20 Apr 2021 • Qilong Zhang, Xiaosu Zhu, Jingkuan Song, Lianli Gao, Heng Tao Shen
Crafting adversarial examples for the transfer-based attack is challenging and remains a research hot spot.
1 code implementation • 31 Dec 2020 • Lianli Gao, Qilong Zhang, Jingkuan Song, Heng Tao Shen
Specifically, we introduce an amplification factor to the step size in each iteration, and one pixel's overall gradient overflowing the $\epsilon$-constraint is properly assigned to its surrounding regions by a project kernel.
no code implementations • 9 Nov 2020 • Jingyi Zhang, Yong Zhang, Baoyuan Wu, Yanbo Fan, Fumin Shen, Heng Tao Shen
We propose to incorporate the prior about the co-occurrence of relation pairs into the graph to further help alleviate the class imbalance issue.
1 code implementation • CVPR 2020 • Jiwei Wei, Xing Xu, Yang Yang, Yanli Ji, Zheng Wang, Heng Tao Shen
Furthermore, we introduce a new polynomial loss under the universal weighting framework, which defines a weight function for the positive and negative informative pairs respectively.
2 code implementations • ECCV 2020 • Lianli Gao, Qilong Zhang, Jingkuan Song, Xianglong Liu, Heng Tao Shen
By adding human-imperceptible noise to clean images, the resultant adversarial examples can fool other unknown models.
no code implementations • 29 Apr 2020 • Yunlian Lv, Ning Xie, Yimin Shi, Zijiao Wang, Heng Tao Shen
On the other hand, the TSE module is used to generate sub-targets, which allow the agent to learn from failures.
no code implementations • 27 Aug 2019 • Zhijun Mai, Guosheng Hu, Dexiong Chen, Fumin Shen, Heng Tao Shen
Since deep networks are capable of memorizing the entire dataset, the corrupted samples generated by vanilla MixUp with a badly chosen interpolation policy will degrade the performance of networks.
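For reference, vanilla MixUp forms each training sample as a convex combination of two inputs and of their labels. The sketch below shows only that baseline, not the method proposed in the paper; the function name `mixup`, the list-based inputs, and the Beta(alpha, alpha) sampling of the interpolation weight are assumptions for illustration.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=1.0, rng=random):
    """Blend two (input, one-hot label) pairs with a random weight lam.

    lam is drawn from Beta(alpha, alpha), so it always lies in [0, 1].
    """
    lam = rng.betavariate(alpha, alpha)

    def blend(u, v):
        return [lam * s + (1.0 - lam) * t for s, t in zip(u, v)]

    return blend(x_a, x_b), blend(y_a, y_b), lam


# Hypothetical two-feature inputs with one-hot labels for two classes.
random.seed(0)
mixed_x, mixed_y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
```

The distribution from which `lam` is drawn is the interpolation policy in this sketch; a poorly chosen one yields mixed samples that resemble neither source example.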
no code implementations • 27 Aug 2019 • Jingran Zhang, Fumin Shen, Xing Xu, Heng Tao Shen
It extracts this complementary information of different modalities from a connection block, which aims at exploring correlations of different stream features.
Ranked #12 on Action Recognition on UCF101 (using extra training data)
no code implementations • 27 Aug 2019 • Jingran Zhang, Fumin Shen, Xing Xu, Heng Tao Shen
In this paper, we propose an efficient temporal reasoning graph (TRG) to simultaneously capture the appearance features and temporal relation between video sequences at multiple time scales.
Ranked #37 on Action Recognition on Something-Something V1
2 code implementations • 12 Aug 2019 • Tan Wang, Xing Xu, Yang Yang, Alan Hanjalic, Heng Tao Shen, Jingkuan Song
We propose a novel framework that achieves remarkable matching performance with acceptable model complexity.
1 code implementation • 16 Jun 2019 • Jingkuan Song, Xiaosu Zhu, Lianli Gao, Xin-Shun Xu, Wu Liu, Heng Tao Shen
To this end, when the model is trained, a sequence of binary codes can be generated and the code length can be easily controlled by adjusting the number of recurrent iterations.
1 code implementation • 16 Jun 2019 • Lianli Gao, Xiaosu Zhu, Jingkuan Song, Zhou Zhao, Heng Tao Shen
In this work, we propose a deep progressive quantization (DPQ) model, as an alternative to PQ, for large scale image retrieval.
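As background for the PQ baseline mentioned here, the sketch below shows standard product quantization encoding: a vector is split into equal-length subvectors and each subvector is replaced by the index of its nearest codeword in a per-subspace codebook. This is plain PQ, not DPQ; the function names and the toy codebooks are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def pq_encode(vector, codebooks):
    """Return one codeword index per subspace for `vector`.

    `codebooks` holds one codebook per subspace; every codeword has
    length len(vector) // len(codebooks).
    """
    if len(vector) % len(codebooks) != 0:
        raise ValueError("vector length must be divisible by the number of codebooks")
    sub_len = len(vector) // len(codebooks)
    codes = []
    for i, book in enumerate(codebooks):
        sub = vector[i * sub_len:(i + 1) * sub_len]
        codes.append(min(range(len(book)), key=lambda k: squared_distance(sub, book[k])))
    return codes


def pq_decode(codes, codebooks):
    """Rebuild an approximate vector by concatenating the chosen codewords."""
    approx = []
    for code, book in zip(codes, codebooks):
        approx.extend(book[code])
    return approx


# Two subspaces with two 2-D codewords each (toy values).
books = [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
codes = pq_encode([0.9, 0.8, 0.1, 0.9], books)
approx = pq_decode(codes, books)
```

With M subspaces and K codewords per codebook, a vector is stored as M small integers while the representable set has K^M distinct reconstructions.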
no code implementations • 12 May 2019 • Ziqiang Zheng, Zhibin Yu, Haiyong Zheng, Yang Yang, Heng Tao Shen
It is well known that humans can learn and recognize objects effectively from several limited image samples.
1 code implementation • CVPR 2019 • Yan Xu, Baoyuan Wu, Fumin Shen, Yanbo Fan, Yong Zhang, Heng Tao Shen, Wei Liu
Due to the sequential dependencies among words in a caption, we formulate the generation of adversarial noises for targeted partial captions as a structured output learning problem with latent variables.
no code implementations • 25 Apr 2019 • Lei Zhu, Zi Huang, Zhihui Li, Liang Xie, Heng Tao Shen
To address the problem, in this paper, we propose a novel hashing approach, dubbed Discrete Semantic Transfer Hashing (DSTH).
no code implementations • 24 Apr 2019 • Yanli Ji, Feixiang Xu, Yang Yang, Fumin Shen, Heng Tao Shen, Wei-Shi Zheng
Besides, we propose a View-guided Skeleton CNN (VS-CNN) to tackle the problem of arbitrary-view action recognition.
no code implementations • 26 Dec 2018 • Jingkuan Song, Xiangpeng Li, Lianli Gao, Heng Tao Shen
Also, a hierarchical LSTM is designed to simultaneously consider both low-level visual information and high-level language context information to support the caption generation.
no code implementations • ECCV 2018 • Zheng Zhang, Li Liu, Jie Qin, Fan Zhu, Fumin Shen, Yong Xu, Ling Shao, Heng Tao Shen
How to economically cluster large-scale multi-view images is a long-standing problem in computer vision.
1 code implementation • ECCV 2018 • Jingyi Zhang, Fumin Shen, Li Liu, Fan Zhu, Mengyang Yu, Ling Shao, Heng Tao Shen, Luc van Gool
The generative model uses an adversarial loss to learn a mapping that makes the distribution of sketches indistinguishable from the distribution of natural images, and simultaneously learns an inverse mapping based on the cycle consistency loss to enhance the indistinguishability.
no code implementations • 22 Aug 2018 • Dongxiang Zhang, Lei Wang, Luming Zhang, Bing Tian Dai, Heng Tao Shen
Solving mathematical word problems (MWPs) automatically is challenging, primarily due to the semantic gap between human-readable words and machine-understandable logics.
no code implementations • ICCV 2017 • Chao Li, Jiewei Cao, Zi Huang, Lei Zhu, Heng Tao Shen
In this paper, we propose a novel approach to automatically maximize the utility of weak semantic annotations (formalized as the semantic relevance of video shots to the target event) to facilitate video event classification.
no code implementations • 8 Aug 2017 • Jingkuan Song, Yuyu Guo, Lianli Gao, Xuelong Li, Alan Hanjalic, Heng Tao Shen
In this paper, we propose a generative approach, referred to as multi-modal stochastic RNNs (MS-RNN), which models the uncertainty observed in the data using latent stochastic variables.
no code implementations • CVPR 2017 • Xing Xu, Fumin Shen, Yang Yang, Dongxiang Zhang, Heng Tao Shen, Jingkuan Song
By additionally introducing manifold regularizations on visual data and semantic embeddings, the learned projection can effectively capture the geometrical manifold structure residing in both visual and semantic spaces.
no code implementations • CVPR 2017 • Peng Wang, Lingqiao Liu, Chunhua Shen, Zi Huang, Anton Van Den Hengel, Heng Tao Shen
One-shot learning is a challenging problem where the aim is to recognize a class identified by a single training image.
no code implementations • 5 Jun 2017 • Jingkuan Song, Zhao Guo, Lianli Gao, Wu Liu, Dongxiang Zhang, Heng Tao Shen
Specifically, the proposed framework utilizes the temporal attention for selecting specific frames to predict the related words, while the adjusted temporal attention is for deciding whether to depend on the visual information or the language context information.
no code implementations • 26 Jan 2017 • Jingkuan Song, Tao He, Lianli Gao, Xing Xu, Heng Tao Shen
Specifically, DRH is an end-to-end deep neural network which consists of object proposal, feature extraction, and hash code generation.
no code implementations • 15 Dec 2016 • Hao Liu, Yang Yang, Fumin Shen, Lixin Duan, Heng Tao Shen
Along with the prosperity of recurrent neural networks in modelling sequential data and the power of the attention mechanism in automatically identifying salient information, image captioning, a.k.a. image description, has advanced remarkably in recent years.
no code implementations • 6 Dec 2016 • Ruicong Xu, Yang Yang, Yadan Luo, Fumin Shen, Zi Huang, Heng Tao Shen
The first approach, termed Inner-product Binary Coding (IBC), preserves the inner relationships of images and videos in a common Hamming space.
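The snippet does not spell out how IBC works, so the following is only background on why inner products and Hamming space fit together: for two codes of length r with entries in {-1, +1}, the inner product equals r minus twice the Hamming distance. The function names and example codes are assumptions for illustration.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def inner_product(a, b):
    """Inner product of two equal-length codes with entries in {-1, +1}."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x * y for x, y in zip(a, b))


# Hypothetical 4-bit codes, e.g. one for an image and one for a video.
image_code = [1, -1, 1, 1]
video_code = [1, 1, -1, 1]
r = len(image_code)
distance = hamming_distance(image_code, video_code)
similarity = inner_product(image_code, video_code)
# Identity for +/-1 codes: similarity == r - 2 * distance.
```

Because of this identity, ranking items by decreasing inner product gives the same order as ranking them by increasing Hamming distance.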
no code implementations • 6 Oct 2016 • Yuanzhouhan Cao, Chunhua Shen, Heng Tao Shen
Augmenting RGB data with measured depth has been shown to improve the performance of a range of tasks in computer vision including object detection and semantic segmentation.
no code implementations • 22 Jun 2016 • Jiewei Cao, Lingqiao Liu, Peng Wang, Zi Huang, Chunhua Shen, Heng Tao Shen
Instance retrieval requires one to search for images that contain a particular object within a large corpus.
no code implementations • 16 Jun 2016 • Yang Yang, Wei-Lun Chen, Yadan Luo, Fumin Shen, Jie Shao, Heng Tao Shen
Supervised knowledge (e.g., semantic labels or pair-wise relationships) associated with data is capable of significantly improving the quality of hash codes and hash functions.
no code implementations • 15 Jun 2016 • Yi Bin, Yang Yang, Zi Huang, Fumin Shen, Xing Xu, Heng Tao Shen
Video captioning has been attracting broad research attention in the multimedia community.
no code implementations • CVPR 2016 • Peng Wang, Lingqiao Liu, Chunhua Shen, Zi Huang, Anton Van Den Hengel, Heng Tao Shen
The key observation motivating our approach is that "regular object" images, "unusual object" images and "other objects" images exhibit different region-level scores in terms of both the score values and the spatial distributions.
no code implementations • 1 Jun 2016 • Jingdong Wang, Ting Zhang, Jingkuan Song, Nicu Sebe, Heng Tao Shen
In this paper, we present a comprehensive survey of the learning to hash algorithms, categorize them according to the manners of preserving the similarities into: pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, as well as quantization, and discuss their relations.
no code implementations • 14 Mar 2016 • Fumin Shen, Yadong Mu, Wei Liu, Yang Yang, Heng Tao Shen
The optimization proceeds alternately over the binary classifiers and image hash codes.
no code implementations • 22 Feb 2016 • Guosheng Lin, Fayao Liu, Chunhua Shen, Jianxin Wu, Heng Tao Shen
Our column generation based method can be further generalized from the triplet loss to a general structured learning based framework that allows one to directly optimize multivariate performance measures.
no code implementations • 14 Feb 2016 • Peng Wang, Lingqiao Liu, Chunhua Shen, Anton Van Den Hengel, Heng Tao Shen
To address this problem, we propose a novel approach by inspecting the distribution of the detection scores at multiple image regions based on the detector trained from the "regular object" and "other objects".
no code implementations • 31 Jan 2016 • Peng Wang, Lingqiao Liu, Chunhua Shen, Heng Tao Shen
Most video based action recognition approaches create the video-level representation by temporally pooling the features extracted at each frame.
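The temporal pooling baseline described here can be sketched as follows: per-frame feature vectors are reduced dimension by dimension with a mean or a max, which yields one fixed-length video-level vector regardless of the number of frames. The function name `temporal_pool` and the toy features are assumptions for illustration.

```python
def temporal_pool(frame_features, mode="mean"):
    """Collapse a list of per-frame feature vectors into one video vector."""
    if not frame_features:
        raise ValueError("at least one frame is required")
    columns = list(zip(*frame_features))  # one tuple per feature dimension
    if mode == "mean":
        return [sum(col) / len(col) for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    raise ValueError("mode must be 'mean' or 'max'")


# Two frames with 2-D features (toy values).
frames = [[1.0, 4.0], [3.0, 2.0]]
video_mean = temporal_pool(frames, "mean")
video_max = temporal_pool(frames, "max")
```

Both reductions are invariant to frame order, so reversing `frames` gives the same video-level vector; the temporal structure of the video is not retained.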
1 code implementation • 16 Jan 2016 • Lingqiao Liu, Peng Wang, Chunhua Shen, Lei Wang, Anton Van Den Hengel, Chao Wang, Heng Tao Shen
To handle this limitation, in this paper we break the convention which assumes that a local feature is drawn from one of a few Gaussian distributions.
no code implementations • ICCV 2015 • Fumin Shen, Wei Liu, Shaoting Zhang, Yang Yang, Heng Tao Shen
Inspired by the latest advance in asymmetric hashing schemes, we propose an asymmetric binary code learning framework based on inner product fitting.
no code implementations • CVPR 2015 • Lianli Gao, Jingkuan Song, Feiping Nie, Yan Yan, Nicu Sebe, Heng Tao Shen
In multimedia annotation, due to the time constraints and the tediousness of manual tagging, it is quite common to utilize both tagged and untagged data to improve the performance of supervised learning when only limited tagged training data are available.
1 code implementation • CVPR 2015 • Fumin Shen, Chunhua Shen, Wei Liu, Heng Tao Shen
This paper has been withdrawn by the author.
no code implementations • 4 Mar 2015 • Peng Wang, Yuanzhouhan Cao, Chunhua Shen, Lingqiao Liu, Heng Tao Shen
One challenge is that video contains a varying number of frames, which is incompatible with the standard input format of CNNs.
no code implementations • 2 Dec 2014 • Fumin Shen, Chunhua Shen, Qinfeng Shi, Anton Van Den Hengel, Zhenmin Tang, Heng Tao Shen
In addition, a supervised inductive manifold hashing framework is developed by incorporating the label information, which is shown to greatly advance the semantic retrieval performance.
no code implementations • 13 Aug 2014 • Jingdong Wang, Heng Tao Shen, Jingkuan Song, Jianqiu Ji
Similarity search (nearest neighbor search) is a problem of pursuing the data items whose distances to a query item are the smallest from a large database.
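The problem statement above corresponds to the following exhaustive baseline, a minimal sketch that scans the whole database and keeps the k items closest to the query. The function name, the Euclidean metric, and the toy data are assumptions; hashing methods aim to avoid exactly this linear scan over full vectors.

```python
import math


def nearest_neighbors(query, database, k=1):
    """Return the indices of the k database items closest to `query`.

    Distances are Euclidean and computed by brute force over every item.
    """
    ranked = sorted(range(len(database)), key=lambda i: math.dist(query, database[i]))
    return ranked[:k]


# Three 2-D database items and one query (toy values).
items = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]
top2 = nearest_neighbors([0.9, 1.2], items, k=2)
```

The cost of this scan grows linearly with both the database size and the vector dimension.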
no code implementations • 26 Jun 2014 • Fumin Shen, Chunhua Shen, Heng Tao Shen
Spatial pyramid pooling of features encoded by an over-complete dictionary has been the key component of many state-of-the-art image classification systems.
no code implementations • 26 Jun 2014 • Fumin Shen, Chunhua Shen, Heng Tao Shen
We propose a very simple, efficient yet surprisingly effective feature extraction method for face recognition (about 20 lines of Matlab code), which is mainly inspired by spatial pyramid pooling in generic image classification.
no code implementations • 16 May 2014 • Jianfeng Wang, Jingdong Wang, Jingkuan Song, Xin-Shun Xu, Heng Tao Shen, Shipeng Li
In OCKM, multiple sub codewords are used to encode the subvector of a data point in a subspace.