1 code implementation • 16 Dec 2024 • Shihan Wu, Ji Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Heng Tao Shen
Prompt tuning (PT) has long been recognized as an effective and efficient paradigm for transferring large pre-trained vision-language models (VLMs) to downstream tasks by learning a tiny set of context vectors.
no code implementations • 13 Dec 2024 • Sitong Su, Xiao Cai, Lianli Gao, Pengpeng Zeng, Qinhong Du, Mengqi Li, Heng Tao Shen, Jingkuan Song
Our metrics are divided into a) Textual-3D Alignment, which measures textual alignment with multi-granularity visual 3D representations; and b) 3D Visual Quality, which considers texture fidelity, multi-view consistency, and geometry correctness.
1 code implementation • 18 Nov 2024 • Chenhang Cui, Gelei Deng, An Zhang, Jingnan Zheng, Yicong Li, Lianli Gao, Tianwei Zhang, Tat-Seng Chua
Recent advances in Large Vision-Language Models (LVLMs) have showcased strong reasoning abilities across multiple modalities, achieving significant breakthroughs in various real-world applications.
no code implementations • 10 Oct 2024 • Xiao Cai, Pengpeng Zeng, Lianli Gao, Junchen Zhu, Jiaxin Zhang, Sitong Su, Heng Tao Shen, Jingkuan Song
Recent advancements in generic 3D content generation from text prompts have been remarkable, achieved by fine-tuning text-to-image (T2I) diffusion models or employing these T2I models as priors to learn a general text-to-3D model.
1 code implementation • 2 Oct 2024 • Hao Li, Jiayang Gu, Jingkuan Song, An Zhang, Lianli Gao
Mitigating the detrimental effects of noisy labels on the training process has become increasingly critical, as obtaining entirely clean or human-annotated samples for large-scale pre-training tasks is often impractical.
no code implementations • 9 Sep 2024 • Run Luo, Haonan Zhang, Longze Chen, Ting-En Lin, Xiong Liu, Yuchuan Wu, Min Yang, Minzheng Wang, Pengpeng Zeng, Lianli Gao, Heng Tao Shen, Yunshui Li, Xiaobo Xia, Fei Huang, Jingkuan Song, Yongbin Li
This framework iteratively improves data quality through a refined combination of fine-grained perception, cognitive reasoning, and interaction evolution, generating a more complex and diverse image-text instruction dataset that empowers MLLMs with enhanced capabilities.
1 code implementation • 17 Jul 2024 • Youheng Sun, Shengming Yuan, Xuanhan Wang, Lianli Gao, Jingkuan Song
Notably, compared with other generative methods, our method achieves an approximately $14.13\%$ higher attack success rate for unknown classes and an approximately $4.23\%$ higher success rate for known classes.
1 code implementation • 24 May 2024 • Beitao Chen, Xinyu Lyu, Lianli Gao, Jingkuan Song, Heng Tao Shen
To tackle this issue, we conduct a theoretical analysis to promote the effectiveness of contrast decoding.
1 code implementation • 21 May 2024 • Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Yihang Duan, Xinyu Lyu, HengTao Shen
Then, we devise a shared local interaction module that employs several learnable queries to capture latent semantic concepts for learning fine-grained alignment.
1 code implementation • 16 May 2024 • Xiaosu Zhu, Hualian Sheng, Sijia Cai, Bing Deng, Shaopeng Yang, Qiao Liang, Ken Chen, Lianli Gao, Jingkuan Song, Jieping Ye
We introduce RoScenes, the largest multi-view roadside perception dataset, which aims to shed light on the development of vision-centric Bird's Eye View (BEV) approaches for more challenging traffic scenes.
1 code implementation • 18 Mar 2024 • Jianzhi Liu, Junchen Zhu, Lianli Gao, Heng Tao Shen, Jingkuan Song
The open-domain video generation models are constrained by the scale of the training video datasets, and some less common actions still cannot be generated.
1 code implementation • 13 Mar 2024 • Cheng Chen, Junchen Zhu, Xu Luo, HengTao Shen, Lianli Gao, Jingkuan Song
To this end, we introduce MoELoRA to MLLMs, which is effective in retaining the previous instruction alignment.
no code implementations • 17 Jan 2024 • Jiaqi Guo, Sitong Su, Junchen Zhu, Lianli Gao, Jingkuan Song
Therefore, we propose a training-free pipeline employing a pre-trained diffusion model imbued with semantic prior knowledge, which can process composite videos with broader semantic disparities.
no code implementations • 29 Dec 2023 • Qishen Chen, Jianzhi Liu, Xinyu Lyu, Lianli Gao, Heng Tao Shen, Jingkuan Song
Scene Graph Generation (SGG) endeavors to predict the relationships between subjects and objects in a given image.
1 code implementation • CVPR 2024 • Kaipeng Fang, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Zhi-Qi Cheng, Xiyao Li, Heng Tao Shen
Then, in the Context-aware Simulator Learning stage, we train a Content-aware Prompt Simulator under simulated test scenarios to produce the corresponding CaDP.
no code implementations • 6 Dec 2023 • Sitong Su, Litao Guo, Lianli Gao, Heng Tao Shen, Jingkuan Song
Story Visualization aims to generate images aligned with story prompts, reflecting the coherence of storybooks through visual consistency among characters and scenes. However, current approaches concentrate exclusively on characters and neglect the visual consistency among contextually correlated scenes, resulting in independent character images without inter-image coherence. To tackle this issue, we propose a new presentation form for Story Visualization called Storyboard, inspired by film-making, as illustrated in Fig. 1. Specifically, a Storyboard unfolds a story into visual representations scene by scene.
no code implementations • 6 Dec 2023 • Sitong Su, Jianzhi Liu, Lianli Gao, Jingkuan Song
Recently Text-to-Video (T2V) synthesis has undergone a breakthrough by training transformers or diffusion models on large-scale datasets.
1 code implementation • 1 Dec 2023 • Cheng Chen, Jingkuan Song, Lianli Gao, Heng Tao Shen
Catastrophic Forgetting (CF) is a prominent issue in continual learning.
no code implementations • 28 Nov 2023 • Sitong Su, Litao Guo, Lianli Gao, HengTao Shen, Jingkuan Song
To tackle the two issues, we propose a prompt-adaptive and disentangled motion control strategy coined as MotionZero, which derives motion priors from prompts of different objects by Large-Language-Models and accordingly applies motion control of different objects to corresponding regions in disentanglement.
1 code implementation • 25 Nov 2023 • Chen Cheng, Jingkuan Song, Xiaosu Zhu, Junchen Zhu, Lianli Gao, HengTao Shen
To address this issue, after analyzing the phenomenon and identifying the lack of diversity as a vital factor, we propose a method named Codebook for Unsupervised Continual Learning (CUCL), which encourages the model to learn discriminative features to complete the class boundary.
1 code implementation • 25 Nov 2023 • Heng Tao Shen, Cheng Chen, Peng Wang, Lianli Gao, Meng Wang, Jingkuan Song
In this paper, we propose Continual Referring Expression Comprehension (CREC), a new setting for REC, where a model learns on a stream of incoming tasks.
1 code implementation • 25 Nov 2023 • Cheng Chen, Ji Zhang, Jingkuan Song, Lianli Gao
Catastrophic forgetting is one of the most critical challenges in Continual Learning (CL).
no code implementations • 3 Nov 2023 • Tao He, Lianli Gao, Jingkuan Song, Yuan-Fang Li
In light of this, we introduce SG2HOI+, a unified one-step model based on the Transformer architecture.
no code implementations • 5 Oct 2023 • Xu Luo, Difan Zou, Lianli Gao, Zenglin Xu, Jingkuan Song
Transferring a pretrained model to a downstream task can be as easy as conducting linear probing with target data, that is, training a linear classifier upon frozen features extracted from the pretrained model.
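As a rough illustration of the linear probing described above, the following sketch fits a logistic-regression classifier on features produced by a frozen extractor. The `frozen_features` function, the toy data, and the hyperparameters are hypothetical stand-ins chosen for this example; they are not taken from the paper.

```python
import math

def frozen_features(x):
    """Hypothetical stand-in for a pretrained backbone: a fixed map that is never updated."""
    return [x[0] - x[1], x[0] + x[1]]

def train_linear_probe(samples, labels, lr=0.5, epochs=200):
    """Train only a linear classifier (w, b) on top of the frozen features."""
    feats = [frozen_features(x) for x in samples]  # extracted once, backbone stays fixed
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid probability of class 1
            err = p - y                      # gradient of the log loss w.r.t. z
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Classify a raw input by thresholding the linear score of its frozen features."""
    f = frozen_features(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0

# Toy target data (assumed for illustration only).
SAMPLES = [(1.0, -0.5), (0.5, -1.0), (-1.0, 0.5), (-0.5, 1.0)]
LABELS = [1, 1, 0, 0]

w, b = train_linear_probe(SAMPLES, LABELS)
print([predict(w, b, x) for x in SAMPLES])
```

Only `w` and `b` receive updates here; the feature map is computed once and reused, which is what makes linear probing cheap compared with full fine-tuning.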
1 code implementation • NeurIPS 2023 • Hao Li, Jingkuan Song, Lianli Gao, Xiaosu Zhu, Heng Tao Shen
In this paper, we propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from the inherent data ambiguity.
Ranked #16 on Video Retrieval on MSVD
1 code implementation • CVPR 2024 • Ji Zhang, Shihan Wu, Lianli Gao, Heng Tao Shen, Jingkuan Song
Specifically, through an in-depth analysis of the learned features of the base and new tasks, we observe that the BNT stems from a channel bias issue, i.e., the vast majority of feature channels are occupied by base-specific knowledge, resulting in the collapse of task-shared knowledge important to new tasks.
Ranked #2 on Prompt Engineering on Stanford Cars
no code implementations • 23 Aug 2023 • Xiaojia Chen, Xuanhan Wang, Lianli Gao, Beitao Chen, Jingkuan Song, HenTao Shen
Existing methods of multiple human parsing (MHP) apply statistical models to acquire underlying associations between images and labeled body parts.
1 code implementation • 20 Aug 2023 • Ji Zhang, Lianli Gao, Bingguang Hao, Hao Huang, Jingkuan Song, HengTao Shen
Out-of-distribution (OOD) detection aims to detect "unknown" data whose labels have not been seen during the in-distribution (ID) training process.
Out-of-Distribution Detection
no code implementations • 10 Aug 2023 • Lianli Gao, Xinyu Lyu, Yuyu Guo, Yuxuan Hu, Yuan-Fang Li, Lu Xu, Heng Tao Shen, Jingkuan Song
It integrates two components, Semantic Debiasing (SD) and Balanced Predicate Learning (BPL), to address these imbalances.
no code implementations • 10 Aug 2023 • Xinyu Lyu, Jingwei Liu, Yuyu Guo, Lianli Gao
Long-temporal human actions supervise the model to generate multiple scene graphs that conform to the global constraints and prevent the model from failing to learn the tail predicates.
no code implementations • 9 Aug 2023 • Xinyu Lyu, Lianli Gao, Junlin Xie, Pengpeng Zeng, Yulu Tian, Jie Shao, Heng Tao Shen
To this end, we propose the Multi-Concept Learning (MCL) framework, which ensures a balanced learning process across rare/uncommon/common concepts.
1 code implementation • ICCV 2023 • Hao Ni, Yuke Li, Lianli Gao, Heng Tao Shen, Jingkuan Song
Based on the local similarity obtained in CSL, a Part-guided Self-Distillation (PSD) is proposed to further improve the generalization of global features.
Domain Generalization
Generalizable Person Re-identification
no code implementations • 31 Jul 2023 • Junchen Zhu, Huan Yang, Wenjing Wang, Huiguo He, Zixi Tuo, Yongsheng Yu, Wen-Huang Cheng, Lianli Gao, Jingkuan Song, Jianlong Fu, Jiebo Luo
In the basic generation, we take advantage of the pretrained image diffusion model, and adapt it to a high-quality open-domain vertical video generator for mobile devices.
no code implementations • 12 Jun 2023 • Junchen Zhu, Huan Yang, Huiguo He, Wenjing Wang, Zixi Tuo, Wen-Huang Cheng, Lianli Gao, Jingkuan Song, Jianlong Fu
To generate videos, we extend the capabilities of a pretrained text-to-image diffusion model through a two-stage process.
1 code implementation • CVPR 2023 • Chaofan Zheng, Xinyu Lyu, Lianli Gao, Bo Dai, Jingkuan Song
Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs.
3 code implementations • ICCV 2023 • Ji Zhang, Lianli Gao, Xu Luo, HengTao Shen, Jingkuan Song
Test-time task adaptation in few-shot learning aims to adapt a pre-trained task-agnostic model for capturing task-specific knowledge of the test task, relying only on a few labeled support samples.
no code implementations • 10 Mar 2023 • Boheng Zeng, Lianli Gao, Qilong Zhang, CHAOQUN LI, Jingkuan Song, ShuaiQi Jing
However, our method still outperforms existing methods when attacking transformers.
3 code implementations • 28 Jan 2023 • Xu Luo, Hao Wu, Ji Zhang, Lianli Gao, Jing Xu, Jingkuan Song
Few-shot classification consists of a training phase where a model is learned on a relatively large dataset and an adaptation phase where the learned model is adapted to previously-unseen tasks with limited labeled samples.
2 code implementations • NeurIPS 2022 • Hao Li, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Haonan Zhang, Gongfu Li
To verify the effectiveness of our approach, extensive experiments are conducted on MS-COCO, CUB Captions, and Flickr30K, which are commonly used in cross-modal retrieval.
1 code implementation • 17 Nov 2022 • Pengpeng Zeng, Jinkuan Zhu, Jingkuan Song, Lianli Gao
Specifically, we design a novel embedding method called tree-structured prototype, producing a set of hierarchical representative embeddings which capture the hierarchical semantic structure in textual space.
1 code implementation • 17 Nov 2022 • Pengpeng Zeng, Haonan Zhang, Lianli Gao, Xiangpeng Li, Jin Qian, Heng Tao Shen
Generating consecutive descriptions for videos, i.e., Video Captioning, requires taking full advantage of visual representation along with the generation process.
1 code implementation • 12 Oct 2022 • Xiaosu Zhu, Jingkuan Song, Yu Lei, Lianli Gao, Heng Tao Shen
By testing on a series of hash-models, we obtain performance improvements across all of them, with up to a $26.5\%$ increase in mean Average Precision and up to a $20.5\%$ increase in accuracy.
1 code implementation • 5 Oct 2022 • Shengming Yuan, Qilong Zhang, Lianli Gao, Yaya Cheng, Jingkuan Song
Unrestricted color attacks, which manipulate semantically meaningful color of an image, have shown their stealthiness and success in fooling both human eyes and deep neural networks.
no code implementations • 27 Aug 2022 • Xiaojia Chen, Xuanhan Wang, Lianli Gao, Jingkuan Song
Different from mainstream methods, RepParser solves multiple human parsing in a new single-stage manner without resorting to person detection or post-grouping. To this end, RepParser decouples the parsing pipeline into instance-aware kernel generation and part-aware human parsing, which are responsible for instance separation and instance-specific part segmentation, respectively.
no code implementations • 17 Aug 2022 • Tao He, Lianli Gao, Jingkuan Song, Yuan-Fang Li
In this paper, we introduce open-vocabulary scene graph generation, a novel, realistic and challenging setting in which a model is trained on a set of base object classes but is required to infer relations for unseen target object classes.
1 code implementation • 16 Jul 2022 • Chaofan Zheng, Lianli Gao, Xinyu Lyu, Pengpeng Zeng, Abdulmotaleb El Saddik, Heng Tao Shen
Experiments show that our approach achieves a new state-of-the-art performance on VG and GQA datasets and makes a trade-off between the performance of tail predicates and head ones.
2 code implementations • 12 Jul 2022 • Yuyang Long, Qilong Zhang, Boheng Zeng, Lianli Gao, Xianglong Liu, Jian Zhang, Jingkuan Song
Specifically, we apply a spectrum transformation to the input and thus perform the model augmentation in the frequency domain.
no code implementations • 11 Jul 2022 • Xinyu Lyu, Lianli Gao, Pengpeng Zeng, Heng Tao Shen, Jingkuan Song
The performance of current Scene Graph Generation (SGG) models is severely hampered by hard-to-distinguish predicates, e.g., woman-on/standing on/walking on-beach.
1 code implementation • 30 Jun 2022 • Xuanhan Wang, Yan Dai, Lianli Gao, Jingkuan Song
Specifically, each GCN model in ACFL not only learns action representation from the single-form skeletons, but also adaptively mimics useful representations derived from other forms of skeletons.
no code implementations • 23 Jun 2022 • Chaofan Zheng, Xinyu Lyu, Yuyu Guo, Pengpeng Zeng, Jingkuan Song, Lianli Gao
SCM is proposed to relieve semantic deviation by ensuring the semantic consistency between the generated scene graph and the ground truth in global and local representations.
1 code implementation • 21 Jun 2022 • Xuanhan Wang, Lianli Gao, Yixuan Zhou, Jingkuan Song, Meng Wang
Human densepose estimation, aiming at establishing dense correspondences between 2D pixels of human body and 3D human body template, is a key technique in enabling machines to have an understanding of people in images.
1 code implementation • 21 Jun 2022 • Xuanhan Wang, Jingkuan Song, Xiaojia Chen, Lechao Cheng, Lianli Gao, Heng Tao Shen
In this article, we propose a Knowledge Embedded RCNN (KE-RCNN) to identify attributes by leveraging rich knowledge, including implicit knowledge (e.g., the attribute "above-the-hip" for a shirt requires visual/geometry relations of shirt-hip) and explicit knowledge (e.g., the part of "shorts" cannot have the attribute of "hoodie" or "lining").
no code implementations • 4 Jun 2022 • Jingkuan Song, Pengpeng Zeng, Lianli Gao, Heng Tao Shen
Existing visual attention models are generally planar, i.e., different channels of the last conv-layer feature map of an image share the same weight.
no code implementations • 2 Jun 2022 • Lianli Gao, Pengpeng Zeng, Jingkuan Song, Yuan-Fang Li, Wu Liu, Tao Mei, Heng Tao Shen
To date, visual question answering (VQA) (i.e., image QA and video QA) is still a holy grail in vision and language understanding, especially for video QA.
1 code implementation • 19 May 2022 • Xiaoya Chen, Jingkuan Song, Pengpeng Zeng, Lianli Gao, Heng Tao Shen
Video captioning is a challenging task that necessitates a thorough comprehension of visual scenes.
1 code implementation • CVPR 2022 • Xinyu Lyu, Lianli Gao, Yuyu Guo, Zhou Zhao, Hao Huang, Heng Tao Shen, Jingkuan Song
The performance of current Scene Graph Generation models is severely hampered by some hard-to-distinguish predicates, e.g., "woman-on/standing on/walking on-beach" or "woman-near/looking at/in front of-child".
1 code implementation • CVPR 2022 • Xiaosu Zhu, Jingkuan Song, Lianli Gao, Feng Zheng, Heng Tao Shen
Modeling latent variables with priors and hyperpriors is an essential problem in variational image compression.
1 code implementation • CVPR 2022 • Ye Liu, Yaya Cheng, Lianli Gao, Xianglong Liu, Qilong Zhang, Jingkuan Song
Specifically, by observing that adversarial examples to a specific defense model follow some regularities in their starting points, we design an Adaptive Direction Initialization strategy to speed up the evaluation.
no code implementations • 9 Mar 2022 • Qilong Zhang, Chaoning Zhang, CHAOQUN LI, Jingkuan Song, Lianli Gao
In this paper, we move a step forward and show the existence of a training-free adversarial perturbation under the no-box threat model, which can be successfully used to attack different DNNs in real-time.
no code implementations • 22 Feb 2022 • Yuyu Guo, Jingqiu Zhang, Lianli Gao
In TS-LSTM, a temporal pooling LSTM (TP-LSTM) is designed to incorporate both spatial and temporal information to extract long-term temporal dynamics within video sub-shots; and a stacked LSTM is introduced to generate a list of words to describe the video.
1 code implementation • 22 Feb 2022 • Yuyu Guo, Jingkuan Song, Lianli Gao, Heng Tao Shen
Specifically, the Relational Knowledge represents the prior knowledge of relationships between entities extracted from the visual content, e.g., the visual relationships "standing in", "sitting in", and "lying in" may exist between "dog" and "yard", while the Commonsense Knowledge encodes "sense-making" knowledge like "dog can guard yard".
no code implementations • 22 Feb 2022 • Yuyu Guo, Lianli Gao, Jingkuan Song, Peng Wang, Nicu Sebe, Heng Tao Shen, Xuelong Li
Inspired by this observation, in this article, we propose a relation regularized network (R2-Net), which can predict whether there is a relationship between two objects and encode this relation into object feature refinement and better SGG.
2 code implementations • ICLR 2022 • Qilong Zhang, Xiaodan Li, Yuefeng Chen, Jingkuan Song, Lianli Gao, Yuan He, Hui Xue
Notably, our methods outperform state-of-the-art approaches by up to 7.71% (towards coarse-grained domains) and 25.91% (towards fine-grained domains) on average.
1 code implementation • CVPR 2022 • Wenwen Pan, Haonan Shi, Zhou Zhao, Jieming Zhu, Xiuqiang He, Zhigeng Pan, Lianli Gao, Jun Yu, Fei Wu, Qi Tian
Audio-Guided video semantic segmentation is a challenging problem in visual analysis and editing, which automatically separates foreground objects from background in a video sequence according to the referring audio expressions.
no code implementations • 5 Nov 2021 • Xuanhan Wang, Xiaojia Chen, Lianli Gao, Lechao Chen, Jingkuan Song
Despite dramatic progress in video classification research, a severe problem faced by the community is that the detailed understanding of human actions is ignored.
1 code implementation • 25 Oct 2021 • Yaya Cheng, Jingkuan Song, Xiaosu Zhu, Qilong Zhang, Lianli Gao, Heng Tao Shen
Based on the linearity hypothesis, under $\ell_\infty$ constraint, $sign$ operation applied to the gradients is a good choice for generating perturbations.
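The reasoning behind this statement can be sketched briefly. Under a first-order (linear) approximation, the loss change caused by a perturbation δ is g·δ, where g is the gradient; maximizing g·δ subject to ‖δ‖∞ ≤ ε gives δ = ε·sign(g). The snippet below is a minimal illustration of that fact only; the function names and the toy gradient are assumptions made for this example, not the paper's method.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def sign_perturbation(grad, epsilon):
    """Perturbation that maximizes the linearized loss under an l_inf budget of epsilon."""
    return [epsilon * sign(g) for g in grad]

def linear_gain(grad, delta):
    """First-order estimate of the loss increase, i.e. the inner product g . delta."""
    return sum(g * d for g, d in zip(grad, delta))

# Toy gradient (assumed values for illustration).
grad = [0.03, -1.7, 0.0]
epsilon = 0.1

delta = sign_perturbation(grad, epsilon)

# An alternative that also respects the l_inf budget: the gradient rescaled
# so that its largest coordinate has magnitude epsilon.
scale = epsilon / max(abs(g) for g in grad)
delta_scaled = [scale * g for g in grad]

print(linear_gain(grad, delta), linear_gain(grad, delta_scaled))
```

Both perturbations stay within the same budget, but the sign-based one pushes every coordinate to the boundary, so its linearized gain is at least as large as that of the rescaled gradient.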
1 code implementation • 15 Oct 2021 • Yinpeng Dong, Qi-An Fu, Xiao Yang, Wenzhao Xiang, Tianyu Pang, Hang Su, Jun Zhu, Jiayu Tang, Yuefeng Chen, Xiaofeng Mao, Yuan He, Hui Xue, Chao Li, Ye Liu, Qilong Zhang, Lianli Gao, Yunrui Yu, Xitong Gao, Zhe Zhao, Daquan Lin, Jiadong Lin, Chuanbiao Song, ZiHao Wang, Zhennan Wu, Yang Guo, Jiequan Cui, Xiaogang Xu, Pengguang Chen
Due to the vulnerability of deep neural networks (DNNs) to adversarial examples, a large number of defense techniques have been proposed to alleviate this problem in recent years.
1 code implementation • ICCV 2021 • Yuyu Guo, Lianli Gao, Xuanhan Wang, Yuxuan Hu, Xing Xu, Xu Lu, Heng Tao Shen, Jingkuan Song
The scene graph generation (SGG) task aims to detect visual relationship triplets, i.e., subject, predicate, object, in an image, providing a structural vision layout for scene understanding.
no code implementations • 20 Aug 2021 • Tao He, Lianli Gao, Jingkuan Song, Yuan-Fang Li
Learning accurate low-dimensional embeddings for a network is a crucial task as it facilitates many downstream network analytics tasks.
no code implementations • 20 Aug 2021 • Tao He, Lianli Gao, Jingkuan Song, Yuan-Fang Li
Abundant real-world data can be naturally represented by large-scale networks, which demands efficient and effective learning algorithms.
1 code implementation • ICCV 2021 • Tao He, Lianli Gao, Jingkuan Song, Yuan-Fang Li
Human-Object Interaction (HOI) detection is a fundamental visual task aiming at localizing and recognizing interactions between humans and objects.
no code implementations • 19 Aug 2021 • Tao He, Lianli Gao, Jingkuan Song, Jianfei Cai, Yuan-Fang Li
Scene graphs provide valuable information to many downstream tasks.
1 code implementation • 25 May 2021 • Lianli Gao, Yaya Cheng, Qilong Zhang, Xing Xu, Jingkuan Song
However, the current choice of pixel-wise Euclidean Distance to measure the discrepancy is questionable because it unreasonably imposes a spatial-consistency constraint on the source and target features.
no code implementations • NeurIPS 2021 • Xiaosu Zhu, Jingkuan Song, Lianli Gao, Xiaoyan Gu, HengTao Shen
However, finding the optimal solution to MCQ is proved to be NP-hard due to its encoding process, i.e., converting an input vector to a binary code.
2 code implementations • 20 Apr 2021 • Qilong Zhang, Xiaosu Zhu, Jingkuan Song, Lianli Gao, Heng Tao Shen
Crafting adversarial examples for the transfer-based attack is challenging and remains a research hot spot.
1 code implementation • 31 Dec 2020 • Lianli Gao, Qilong Zhang, Jingkuan Song, Heng Tao Shen
Specifically, we introduce an amplification factor to the step size in each iteration, and one pixel's overall gradient overflowing the $\epsilon$-constraint is properly assigned to its surrounding regions by a project kernel.
4 code implementations • ECCV 2020 • Lianli Gao, Qilong Zhang, Jingkuan Song, Xianglong Liu, Heng Tao Shen
By adding human-imperceptible noise to clean images, the resultant adversarial examples can fool other unknown models.
no code implementations • 13 Jun 2020 • Tao He, Lianli Gao, Jingkuan Song, Jianfei Cai, Yuan-Fang Li
Despite the huge progress in scene graph generation in recent years, its long-tail distribution in object relationships remains a challenging and pestering issue.
1 code implementation • CVPR 2020 • Zhu Zhang, Zhou Zhao, Yang Zhao, Qi. Wang, Huasheng Liu, Lianli Gao
In this paper, we consider a novel task, Spatio-Temporal Video Grounding for Multi-Form Sentences (STVG).
no code implementations • 18 Jan 2020 • Lirong Wu, Kejie Huang, Haibin Shen, Lianli Gao
In this paper, we propose a video compression method that extracts and compresses the foreground and background of the video separately.
1 code implementation • AAAI 2019 • Lei Wang, Dongxiang Zhang, Jipeng Zhang, Xing Xu, Lianli Gao, Bing Tian Dai, Heng Tao Shen
Then, we design a recursive neural network to encode the quantity with Bi-LSTM and self attention, and infer the unknown operator nodes in a bottom-up manner.
1 code implementation • 1 Jul 2019 • Tao He, Yuan-Fang Li, Lianli Gao, Dongxiang Zhang, Jingkuan Song
We evaluate our framework on four public benchmark datasets, all of which show that our method is superior to the other state-of-the-art methods on the tasks of object recognition and image retrieval.
1 code implementation • 16 Jun 2019 • Lianli Gao, Xiaosu Zhu, Jingkuan Song, Zhou Zhao, Heng Tao Shen
In this work, we propose a deep progressive quantization (DPQ) model, as an alternative to PQ, for large scale image retrieval.
1 code implementation • 16 Jun 2019 • Jingkuan Song, Xiaosu Zhu, Lianli Gao, Xin-Shun Xu, Wu Liu, Heng Tao Shen
To this end, when the model is trained, a sequence of binary codes can be generated and the code length can be easily controlled by adjusting the number of recurrent iterations.
no code implementations • 26 Dec 2018 • Jingkuan Song, Xiangpeng Li, Lianli Gao, Heng Tao Shen
Also, a hierarchical LSTM is designed to simultaneously consider both low-level visual information and high-level language context information to support the caption generation.
no code implementations • CVPR 2019 • Peng Wang, Qi Wu, Jiewei Cao, Chunhua Shen, Lianli Gao, Anton Van Den Hengel
Composed of a node attention component and an edge attention component, the proposed graph attention mechanism explicitly represents inter-object relationships and properties with a flexibility and power impossible with competing approaches.
no code implementations • 7 Feb 2018 • Jingkuan Song, Hanwang Zhang, Xiangpeng Li, Lianli Gao, Meng Wang, Richang Hong
Existing video hash functions are built on three isolated stages: frame pooling, relaxed learning, and binarization, which have not adequately explored the temporal order of video frames in a joint binary optimization model, resulting in severe information loss.
no code implementations • 8 Aug 2017 • Jingkuan Song, Yuyu Guo, Lianli Gao, Xuelong. Li, Alan Hanjalic, Heng Tao Shen
In this paper, we propose a generative approach, referred to as multi-modal stochastic RNNs (MS-RNN), which models the uncertainty observed in the data using latent stochastic variables.
no code implementations • 10 Jul 2017 • Lianli Gao, Jingkuan Song, Xingyi Liu, Junming Shao, Jiajun Liu, Jie Shao
Given the high dimensionality and the high complexity of multimedia data, it is important to investigate new machine learning algorithms to facilitate multimedia data analysis.
1 code implementation • 7 Jul 2017 • Jingkuan Song, Tao He, Hangbo Fan, Lianli Gao
2) how to equip the binary representation with the ability of accurate image retrieval and classification in an unsupervised way?
no code implementations • 5 Jun 2017 • Jingkuan Song, Zhao Guo, Lianli Gao, Wu Liu, Dongxiang Zhang, Heng Tao Shen
Specifically, the proposed framework utilizes the temporal attention for selecting specific frames to predict the related words, while the adjusted temporal attention is for deciding whether to depend on the visual information or the language context information.
no code implementations • 26 Jan 2017 • Jingkuan Song, Tao He, Lianli Gao, Xing Xu, Heng Tao Shen
Specifically, DRH is an end-to-end deep neural network which consists of object proposal, feature extraction, and hash code generation.
no code implementations • CVPR 2015 • Lianli Gao, Jingkuan Song, Feiping Nie, Yan Yan, Nicu Sebe, Heng Tao Shen
In multimedia annotation, due to the time constraints and the tediousness of manual tagging, it is quite common to utilize both tagged and untagged data to improve the performance of supervised learning when only limited tagged training data are available.
no code implementations • 9 Nov 2014 • Lianli Gao, Michael Bruenig, Jane Hunter
Wildfires are frequent, devastating events in Australia that regularly cause significant loss of life and widespread property damage.