Technical Report: Disentangled Action Parsing Networks for Accurate Part-level Action Parsing

no code implementations 5 Nov 2021 Xuanhan Wang, Xiaojia Chen, Lianli Gao, Lechao Chen, Jingkuan Song

Despite dramatic progress in video classification research, a severe problem facing the community is that the detailed understanding of human actions is ignored.

Action Parsing Action Recognition +3

Fast Gradient Non-sign Methods

1 code implementation 25 Oct 2021 Yaya Cheng, Xiaosu Zhu, Qilong Zhang, Lianli Gao, Jingkuan Song

However, such an operation has a side effect: it introduces a directional bias between the real gradients and the perturbations.
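The directional bias mentioned in this snippet can be illustrated with a toy calculation. The sketch below assumes the operation in question is the element-wise sign used by fast gradient sign attacks, uses a made-up gradient vector, and compares it against a rescaled raw gradient; the rescaling rule is an illustrative assumption, not necessarily the paper's exact formulation.

```python
import math

def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

# Hypothetical gradient with very uneven magnitudes (illustrative values only).
gradient = [0.9, -0.05, 0.02, -0.6]

# Sign-based step: every coordinate moves by the same magnitude.
sign_step = [sign(g) for g in gradient]

# Assumed alternative: keep the raw gradient direction, but rescale it so the
# step has the same Euclidean norm as the sign-based step.
scale = norm(sign_step) / norm(gradient)
scaled_step = [scale * g for g in gradient]

cos_sign = cosine_similarity(gradient, sign_step)
cos_scaled = cosine_similarity(gradient, scaled_step)
print(f"cosine(gradient, sign step)   = {cos_sign:.4f}")
print(f"cosine(gradient, scaled step) = {cos_scaled:.4f}")
```

The sign-based step deviates from the true gradient direction (cosine below 1), while the rescaled step stays aligned with it at the same step length.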

From General to Specific: Informative Scene Graph Generation via Balance Adjustment

1 code implementation ICCV 2021 Yuyu Guo, Lianli Gao, Xuanhan Wang, Yuxuan Hu, Xing Xu, Xu Lu, Heng Tao Shen, Jingkuan Song

The scene graph generation (SGG) task aims to detect visual relationship triplets, i.e., subject, predicate, object, in an image, providing a structural vision layout for scene understanding.

Graph Generation Scene Graph Generation +1

Unsupervised Domain-adaptive Hash for Networks

no code implementations 20 Aug 2021 Tao He, Lianli Gao, Jingkuan Song, Yuan-Fang Li

Abundant real-world data can be naturally represented by large-scale networks, which demands efficient and effective learning algorithms.

Link Prediction Node Classification

Semi-supervised Network Embedding with Differentiable Deep Quantisation

no code implementations 20 Aug 2021 Tao He, Lianli Gao, Jingkuan Song, Yuan-Fang Li

Learning accurate low-dimensional embeddings for a network is a crucial task as it facilitates many downstream network analytics tasks.

Link Prediction Network Embedding +1

Exploiting Scene Graphs for Human-Object Interaction Detection

1 code implementation ICCV 2021 Tao He, Lianli Gao, Jingkuan Song, Yuan-Fang Li

Human-Object Interaction (HOI) detection is a fundamental visual task aiming at localizing and recognizing interactions between humans and objects.

Human-Object Interaction Detection

Feature Space Targeted Attacks by Statistic Alignment

1 code implementation 25 May 2021 Lianli Gao, Yaya Cheng, Qilong Zhang, Xing Xu, Jingkuan Song

However, the current choice of pixel-wise Euclidean Distance to measure the discrepancy is questionable because it unreasonably imposes a spatial-consistency constraint on the source and target features.
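A small numeric sketch can make the spatial-consistency concern concrete. It assumes a flattened one-channel feature map and uses the mean and variance as stand-in statistics; the toy values and the choice of statistics are illustrative assumptions, not the paper's actual alignment objective.

```python
import math

def euclidean_distance(a, b):
    """Position-by-position (pixel-wise) Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_and_variance(values):
    """First- and second-order statistics, ignoring spatial position."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance

# Toy feature maps: the target holds the same activations as the source,
# but at different spatial locations.
source = [1.0, 2.0, 3.0, 4.0]
target = [4.0, 3.0, 2.0, 1.0]

pixel_gap = euclidean_distance(source, target)

src_mean, src_var = mean_and_variance(source)
tgt_mean, tgt_var = mean_and_variance(target)
stat_gap = abs(src_mean - tgt_mean) + abs(src_var - tgt_var)

print(f"pixel-wise Euclidean distance: {pixel_gap:.3f}")
print(f"statistic gap (mean + variance): {stat_gap:.3f}")
```

The pixel-wise distance is large even though both maps contain identical activations, whereas the position-independent statistics agree exactly.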


Revisiting Multi-Codebook Quantization

1 code implementation NeurIPS 2021 Xiaosu Zhu, Jingkuan Song, Lianli Gao, Xiaoyan Gu, Heng Tao Shen

However, finding the optimal solution to MCQ is proved to be NP-hard due to its encoding process, i.e., converting an input vector to a binary code.


Staircase Sign Method for Boosting Adversarial Attacks

1 code implementation 20 Apr 2021 Lianli Gao, Qilong Zhang, Xiaosu Zhu, Jingkuan Song, Heng Tao Shen

Crafting adversarial examples for the transfer-based attack is challenging and remains a research hot spot.

Adversarial Attack

Patch-wise++ Perturbation for Adversarial Targeted Attacks

1 code implementation 31 Dec 2020 Lianli Gao, Qilong Zhang, Jingkuan Song, Heng Tao Shen

Specifically, we introduce an amplification factor to the step size in each iteration, and one pixel's overall gradient overflowing the $\epsilon$-constraint is properly assigned to its surrounding regions by a project kernel.

Adversarial Attack

Patch-wise Attack for Fooling Deep Neural Network

2 code implementations ECCV 2020 Lianli Gao, Qilong Zhang, Jingkuan Song, Xianglong Liu, Heng Tao Shen

By adding human-imperceptible noise to clean images, the resultant adversarial examples can fool other unknown models.

Adversarial Attack Image Classification

Learning from the Scene and Borrowing from the Rich: Tackling the Long Tail in Scene Graph Generation

no code implementations 13 Jun 2020 Tao He, Lianli Gao, Jingkuan Song, Jianfei Cai, Yuan-Fang Li

Despite the huge progress in scene graph generation in recent years, its long-tail distribution in object relationships remains a challenging and pestering issue.

Graph Generation Scene Graph Generation +1

Binary Neural Networks: A Survey

1 code implementation 31 Mar 2020 Haotong Qin, Ruihao Gong, Xianglong Liu, Xiao Bai, Jingkuan Song, Nicu Sebe

The binary neural network, which greatly reduces storage and computation, serves as a promising technique for deploying deep models on resource-limited devices.

Binarization Image Classification +3

Forward and Backward Information Retention for Accurate Binary Neural Networks

2 code implementations CVPR 2020 Haotong Qin, Ruihao Gong, Xianglong Liu, Mingzhu Shen, Ziran Wei, Fengwei Yu, Jingkuan Song

Our empirical study indicates that the quantization brings information loss in both forward and backward propagation, which is the bottleneck of training accurate binary neural networks.

Binarization Neural Network Compression +1

Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking

2 code implementations 12 Aug 2019 Tan Wang, Xing Xu, Yang Yang, Alan Hanjalic, Heng Tao Shen, Jingkuan Song

We propose a novel framework that achieves remarkable matching performance with acceptable model complexity.

General Classification Re-Ranking +1

One Network for Multi-Domains: Domain Adaptive Hashing with Intersectant Generative Adversarial Network

1 code implementation 1 Jul 2019 Tao He, Yuan-Fang Li, Lianli Gao, Dongxiang Zhang, Jingkuan Song

We evaluate our framework on four public benchmark datasets, all of which show that our method is superior to the other state-of-the-art methods on the tasks of object recognition and image retrieval.

Image Retrieval Object Recognition

Localizing Unseen Activities in Video via Image Query

no code implementations 28 Jun 2019 Zhu Zhang, Zhou Zhao, Zhijie Lin, Jingkuan Song, Deng Cai

Thus, we consider a new task to localize unseen activities in videos via image queries, named Image-Based Activity Localization.

Action Localization Video Understanding

Open-Ended Long-Form Video Question Answering via Hierarchical Convolutional Self-Attention Networks

no code implementations 28 Jun 2019 Zhu Zhang, Zhou Zhao, Zhijie Lin, Jingkuan Song, Xiaofei He

Concretely, we first develop a hierarchical convolutional self-attention encoder to efficiently model long-form video contents, which builds the hierarchical structure for video sequences and captures question-aware long-range dependencies from video context.

Question Answering Video Question Answering

Beyond Product Quantization: Deep Progressive Quantization for Image Retrieval

1 code implementation 16 Jun 2019 Lianli Gao, Xiaosu Zhu, Jingkuan Song, Zhou Zhao, Heng Tao Shen

In this work, we propose a deep progressive quantization (DPQ) model, as an alternative to PQ, for large scale image retrieval.

Image Retrieval Quantization

Deep Recurrent Quantization for Generating Sequential Binary Codes

1 code implementation 16 Jun 2019 Jingkuan Song, Xiaosu Zhu, Lianli Gao, Xin-Shun Xu, Wu Liu, Heng Tao Shen

To this end, when the model is trained, a sequence of binary codes can be generated and the code length can be easily controlled by adjusting the number of recurrent iterations.

Image Retrieval Quantization

Hierarchical LSTMs with Adaptive Attention for Visual Captioning

no code implementations 26 Dec 2018 Jingkuan Song, Xiangpeng Li, Lianli Gao, Heng Tao Shen

Also, a hierarchical LSTM is designed to simultaneously consider both low-level visual information and high-level language context information to support the caption generation.

Image Captioning Language Modelling +1

Self-Supervised Video Hashing with Hierarchical Binary Auto-encoder

no code implementations 7 Feb 2018 Jingkuan Song, Hanwang Zhang, Xiangpeng Li, Lianli Gao, Meng Wang, Richang Hong

Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization, which have not adequately explored the temporal order of video frames in a joint binary optimization model, resulting in severe information loss.

Binarization Video Retrieval

Binary Generative Adversarial Networks for Image Retrieval

1 code implementation 8 Aug 2017 Jingkuan Song

By restricting the input noise variable of generative adversarial networks (GAN) to be binary and conditioned on the features of each input image, BGAN can simultaneously learn a binary representation per image, and generate an image plausibly similar to the original one.

Image Retrieval

From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning

no code implementations 8 Aug 2017 Jingkuan Song, Yuyu Guo, Lianli Gao, Xuelong Li, Alan Hanjalic, Heng Tao Shen

In this paper, we propose a generative approach, referred to as multi-modal stochastic RNNs (MS-RNN), which models the uncertainty observed in the data using latent stochastic variables.

Video Captioning

Deep Binaries: Encoding Semantic-Rich Cues for Efficient Textual-Visual Cross Retrieval

no code implementations ICCV 2017 Yuming Shen, Li Liu, Ling Shao, Jingkuan Song

Cross-modal hashing is usually regarded as an effective technique for large-scale textual-visual cross retrieval, where data from different modalities are mapped into a shared Hamming space for matching.

Cross-Modal Retrieval
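To illustrate what matching in a shared Hamming space looks like, here is a minimal sketch. It assumes the binary codes have already been produced by some learned hash functions; the 8-bit codes and item names are made up for illustration and do not come from the paper.

```python
def hamming_distance(code_a: int, code_b: int) -> int:
    """Number of bit positions where two binary codes differ."""
    return bin(code_a ^ code_b).count("1")

# Hypothetical 8-bit codes for images (illustrative values only).
image_codes = {
    "beach_photo": 0b10110010,
    "city_photo": 0b01001101,
}

# Hypothetical code for a text query, assumed to live in the same Hamming space.
text_query_code = 0b10110110

# Rank images by Hamming distance to the text query (smaller is closer).
ranked = sorted(
    image_codes,
    key=lambda name: hamming_distance(text_query_code, image_codes[name]),
)

for name in ranked:
    print(name, hamming_distance(text_query_code, image_codes[name]))
```

Because the comparison reduces to an XOR and a bit count, ranking stays cheap even for large databases, which is the usual motivation for hashing-based retrieval.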

Discrete Multi-modal Hashing with Canonical Views for Robust Mobile Landmark Search

no code implementations 13 Jul 2017 Lei Zhu, Zi Huang, Xiaobai Liu, Xiangnan He, Jingkuan Song, Xiaofang Zhou

Finally, compact binary codes are learned on the intermediate representation within a tailored discrete binary embedding model, which preserves visual relations of images measured with canonical views and removes the involved noise.

Learning in High-Dimensional Multimedia Data: The State of the Art

no code implementations 10 Jul 2017 Lianli Gao, Jingkuan Song, Xingyi Liu, Junming Shao, Jiajun Liu, Jie Shao

Given the high dimensionality and the high complexity of multimedia data, it is important to investigate new machine learning algorithms to facilitate multimedia data analysis.

Deep Discrete Hashing with Self-supervised Pairwise Labels

1 code implementation 7 Jul 2017 Jingkuan Song, Tao He, Hangbo Fan, Lianli Gao

2) how to equip the binary representation with the ability of accurate image retrieval and classification in an unsupervised way?

General Classification Image Retrieval +1

Matrix Tri-Factorization With Manifold Regularizations for Zero-Shot Learning

no code implementations CVPR 2017 Xing Xu, Fumin Shen, Yang Yang, Dongxiang Zhang, Heng Tao Shen, Jingkuan Song

By additionally introducing manifold regularizations on visual data and semantic embeddings, the learned projection can effectively capture the geometrical manifold structure residing in both visual and semantic spaces.

Transfer Learning Zero-Shot Learning

Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning

no code implementations 5 Jun 2017 Jingkuan Song, Zhao Guo, Lianli Gao, Wu Liu, Dongxiang Zhang, Heng Tao Shen

Specifically, the proposed framework utilizes the temporal attention for selecting specific frames to predict the related words, while the adjusted temporal attention is for deciding whether to depend on the visual information or the language context information.

Language Modelling Video Captioning

Deep Region Hashing for Efficient Large-scale Instance Search from Images

no code implementations 26 Jan 2017 Jingkuan Song, Tao He, Lianli Gao, Xing Xu, Heng Tao Shen

Specifically, DRH is an end-to-end deep neural network which consists of object proposal, feature extraction, and hash code generation.

Code Generation Image Retrieval +2

A Survey on Learning to Hash

no code implementations 1 Jun 2016 Jingdong Wang, Ting Zhang, Jingkuan Song, Nicu Sebe, Heng Tao Shen

In this paper, we present a comprehensive survey of learning to hash algorithms, categorize them by how they preserve similarities (pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, and quantization), and discuss their relations.


Localize Me Anywhere, Anytime: A Multi-Task Point-Retrieval Approach

no code implementations ICCV 2015 Guoyu Lu, Yan Yan, Li Ren, Jingkuan Song, Nicu Sebe, Chandra Kambhamettu

The main contribution of our paper is that we use a 3D model reconstructed by a short video as the query to realize 3D-to-3D localization under a multi-task point retrieval framework.

Image-Based Localization Multi-Task Learning

Learning Deep Representations of Appearance and Motion for Anomalous Event Detection

no code implementations 6 Oct 2015 Dan Xu, Elisa Ricci, Yan Yan, Jingkuan Song, Nicu Sebe

We present a novel unsupervised deep learning framework for anomalous event detection in complex video scenes.

Anomaly Detection Denoising +1

Optimal Graph Learning With Partial Tags and Multiple Features for Image and Video Annotation

no code implementations CVPR 2015 Lianli Gao, Jingkuan Song, Feiping Nie, Yan Yan, Nicu Sebe, Heng Tao Shen

In multimedia annotation, due to the time constraints and the tediousness of manual tagging, it is quite common to utilize both tagged and untagged data to improve the performance of supervised learning when only limited tagged training data are available.

Graph Construction Graph Learning

Hashing for Similarity Search: A Survey

no code implementations 13 Aug 2014 Jingdong Wang, Heng Tao Shen, Jingkuan Song, Jianqiu Ji

Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are the smallest.

Optimized Cartesian $K$-Means

no code implementations 16 May 2014 Jianfeng Wang, Jingdong Wang, Jingkuan Song, Xin-Shun Xu, Heng Tao Shen, Shipeng Li

In OCKM, multiple sub codewords are used to encode the subvector of a data point in a subspace.
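The idea of encoding one subvector with several sub codewords can be sketched as follows. This is a simplified illustration that assumes the subvector is approximated by the sum of one codeword from each sub codebook and that the best combination is found by exhaustive search; the codebooks, the helper name `encode_subvector`, and the search strategy are hypothetical and are not taken from the paper.

```python
from itertools import product

def squared_error(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def encode_subvector(subvector, sub_codebooks):
    """Pick one codeword index per sub codebook so that the sum of the
    chosen codewords best approximates the subvector (exhaustive search)."""
    best_indices, best_error = None, float("inf")
    index_ranges = (range(len(codebook)) for codebook in sub_codebooks)
    for indices in product(*index_ranges):
        approximation = [
            sum(codebook[i][d] for codebook, i in zip(sub_codebooks, indices))
            for d in range(len(subvector))
        ]
        error = squared_error(subvector, approximation)
        if error < best_error:
            best_indices, best_error = indices, error
    return best_indices, best_error

# Two hypothetical sub codebooks for a single 2-D subspace.
sub_codebooks = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    [[0.0, 0.0], [0.5, 0.5], [-0.5, 0.5]],
]

indices, error = encode_subvector([1.4, 0.6], sub_codebooks)
print("chosen sub codeword indices:", indices)
print("squared reconstruction error:", round(error, 4))
```

With a single codeword from the first codebook, the best reconstruction of this subvector would be `[1.0, 0.0]`; combining two sub codewords gives a noticeably closer approximation at the cost of storing one extra index.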


