no code implementations • 17 Apr 2025 • Yaguang Song, Xiaoshan Yang, Dongmei Jiang, YaoWei Wang, Changsheng Xu
To address this task, we propose a novel framework named Harmony, designed to achieve modal alignment and knowledge retention, enabling the model to reduce modal discrepancy, learn from a sequence of distinct modalities, and ultimately complete tasks across multiple modalities within a unified framework.
1 code implementation • 31 Mar 2025 • Dizhan Xue, Jing Cui, Shengsheng Qian, Chuanrui Hu, Changsheng Xu
First, we propose a new Cross-platform Short-Video (XS-Video) dataset, which aims to provide a large-scale and real-world short-video propagation network across various platforms to facilitate the research on short-video propagation.
1 code implementation • 30 Mar 2025 • Lu Yu, Haoyu Han, Zhe Tao, Hantao Yao, Changsheng Xu
Our approach leverages the Concept Bottleneck Layer, aligning semantic consistency with CLIP models to learn human-understandable concepts that can generalize across tasks.
no code implementations • 27 Mar 2025 • Fan Qi, Yu Duan, Changsheng Xu
Recent advances in text-guided diffusion models have revolutionized conditional image generation, yet they struggle to synthesize complex scenes with multiple objects due to imprecise spatial grounding and limited scalability.
no code implementations • 14 Mar 2025 • Guanhua Zheng, Jitao Sang, Changsheng Xu
Attributions aim to identify input pixels that are relevant to the decision-making process.
1 code implementation • 20 Feb 2025 • Haowei Liu, Xi Zhang, Haiyang Xu, Yuyang Wanyan, Junyang Wang, Ming Yan, Ji Zhang, Chunfeng Yuan, Changsheng Xu, Weiming Hu, Fei Huang
From the decision-making perspective, to handle complex user instructions and interdependent subtasks more effectively, we propose a hierarchical multi-agent collaboration architecture that decomposes decision-making processes into Instruction-Subtask-Action levels.
no code implementations • 15 Feb 2025 • Zhenyu Yang, Yuhang Hu, Zemin Du, Dizhan Xue, Shengsheng Qian, JiaHong Wu, Fan Yang, WeiMing Dong, Changsheng Xu
Despite the significant advancements of Large Vision-Language Models (LVLMs) on established benchmarks, there remains a notable gap in suitable evaluation regarding their applicability in the emerging domain of long-context streaming video understanding.
no code implementations • 26 Jan 2025 • Yuxin Zhang, Minyan Luo, WeiMing Dong, Xiao Yang, Haibin Huang, Chongyang Ma, Oliver Deussen, Tong-Yee Lee, Changsheng Xu
The stories and characters that captivate us as we grow up shape unique fantasy worlds, with images serving as the primary medium for visually experiencing these realms.
no code implementations • 23 Jan 2025 • Baochen Xiong, Xiaoshan Yang, Yaguang Song, YaoWei Wang, Changsheng Xu
In this paper, we explore a novel federated multimodal instruction tuning task (FedMIT), which is significant for collaboratively fine-tuning MLLMs on different types of multimodal instruction data on distributed devices.
4 code implementations • 28 Dec 2024 • Linhui Xiao, Xiaoshan Yang, Xiangyuan Lan, YaoWei Wang, Changsheng Xu
Finally, we outline the challenges confronting visual grounding and propose valuable directions for future research, which may serve as inspiration for subsequent researchers.
1 code implementation • IEEE Transactions on Multimedia 2024 • Dizhan Xue, Shengsheng Qian, Quan Fang, Changsheng Xu
Finally, we design a multimodal explanation transformer to construct cross-modal relationships and generate rational explanations.
Explanatory Visual Question Answering • Multimodal Reasoning • +1
no code implementations • 7 Dec 2024 • Ming Tao, Bing-Kun Bao, YaoWei Wang, Changsheng Xu
However, unlike Large Language Models (LLMs) that can learn multiple tasks in a single model based on instructed data, diffusion models always require additional branches, task-specific training strategies, and losses for effective adaptation to different downstream tasks.
no code implementations • 30 Oct 2024 • Yuxin Zhang, Dandan Zheng, Biao Gong, Jingdong Chen, Ming Yang, WeiMing Dong, Changsheng Xu
Lighting plays a pivotal role in ensuring the naturalness of video generation, significantly influencing the aesthetic quality of the generated content.
1 code implementation • 29 Oct 2024 • Lu Yu, Haiyang Zhang, Changsheng Xu
Our goal is to maintain the generalization ability of the CLIP model while enhancing its adversarial robustness: the Attention Refinement module aligns the text-guided attention that the target model produces on adversarial examples with the text-guided attention that the original model produces on clean examples.
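The attention-alignment objective above can be sketched as a small distance between two attention maps; the L2 distance and the normalization are illustrative assumptions, not the paper's exact loss:

```python
import numpy as np

def attention_refinement_loss(attn_adv, attn_clean):
    """Align text-guided attention from the tuned model on adversarial inputs
    with attention from the frozen original model on clean inputs.

    Both maps are normalized to distributions and compared with a mean
    squared distance; a minimal sketch assuming the attention maps are given
    (the distance choice is illustrative)."""
    a = attn_adv / attn_adv.sum()
    b = attn_clean / attn_clean.sum()
    return float(np.mean((a - b) ** 2))
```

Identical maps incur zero loss; the further the adversarial attention drifts from the clean reference, the larger the penalty.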
1 code implementation • ACM MM 2024 • Zhenyu Yang, Shengsheng Qian, Dizhan Xue, JiaHong Wu, Fan Yang, WeiMing Dong, Changsheng Xu
To address this limitation, this paper proposes a training-free method called Semantic Editing Increment for ZS-CIR (SEIZE) to retrieve the target image based on the query image and text without training.
Ranked #3 on Zero-Shot Composed Image Retrieval (ZS-CIR) on CIRCO
1 code implementation • ACM MM 2024 • Dizhan Xue, Shengsheng Qian, Changsheng Xu
First, we propose a new Standard Multimodal Explanation (SME) dataset and a new Few-Shot Multimodal Explanation for VQA (FS-MEVQA) task, which aims to generate the multimodal explanation of the underlying reasoning process for solving visual questions with few training samples.
Ranked #1 on FS-MEVQA on SME
Explainable Artificial Intelligence (XAI) • +4
1 code implementation • 11 Oct 2024 • Mengyuan Chen, Junyu Gao, Changsheng Xu
A straightforward pipeline for zero-shot out-of-distribution (OOD) detection involves selecting potential OOD labels from an extensive semantic pool and then leveraging a pre-trained vision-language model to perform classification on both in-distribution (ID) and OOD labels.
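The straightforward pipeline above can be sketched as a similarity-based score; the softmax temperature and the use of cosine similarity are illustrative assumptions, with all embeddings taken as precomputed:

```python
import numpy as np

def id_probability(image_emb, id_label_embs, ood_label_embs, temp=0.01):
    """Score an image by softmax similarity over ID + OOD label embeddings.

    Returns the total probability mass assigned to ID labels; a low value
    suggests the image is out-of-distribution. A sketch of the generic
    pipeline, not any specific paper's implementation."""
    labels = np.vstack([id_label_embs, ood_label_embs])
    # cosine similarity between the image and every label embedding
    sims = labels @ image_emb / (
        np.linalg.norm(labels, axis=1) * np.linalg.norm(image_emb))
    probs = np.exp(sims / temp)
    probs /= probs.sum()
    return probs[: len(id_label_embs)].sum()
```

An image whose embedding lies near an ID label keeps almost all probability mass on the ID side; one near an OOD label loses it.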
2 code implementations • 10 Oct 2024 • Linhui Xiao, Xiaoshan Yang, Fang Peng, YaoWei Wang, Changsheng Xu
Simultaneously, the current mask visual language modeling (MVLM) fails to capture the nuanced referential relationship between image and text in referring tasks.
1 code implementation • 1 Oct 2024 • Mengyuan Chen, Junyu Gao, Changsheng Xu
Evidential Deep Learning (EDL) is an emerging method for uncertainty estimation that provides reliable predictive uncertainty in a single forward pass, attracting significant attention.
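The single-forward-pass uncertainty that EDL provides can be sketched with the standard Dirichlet recipe; the softplus evidence function is a common choice, shown here as an illustration rather than this paper's exact formulation:

```python
import numpy as np

def edl_uncertainty(logits):
    """Dirichlet-based uncertainty from one forward pass (standard EDL recipe).

    Non-negative evidence is obtained via softplus; alpha = evidence + 1.
    Vacuity is K / sum(alpha): exactly 1 with zero evidence, approaching 0
    as evidence accumulates."""
    evidence = np.log1p(np.exp(logits))   # softplus keeps evidence >= 0
    alpha = evidence + 1.0                # Dirichlet concentration parameters
    strength = alpha.sum()                # total Dirichlet strength S
    belief = evidence / strength          # per-class belief masses
    uncertainty = len(logits) / strength  # vacuity: K / S
    return belief, uncertainty
```

Belief masses and the vacuity term sum to one, so confident predictions and the "I don't know" mass trade off explicitly.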
1 code implementation • 19 Sep 2024 • Shengsheng Qian, Zuyi Zhou, Dizhan Xue, Bing Wang, Changsheng Xu
Cross-modal reasoning (CMR), the intricate process of synthesizing and drawing inferences across divergent sensory modalities, is increasingly recognized as a crucial capability in the progression toward more sophisticated and anthropomorphic artificial intelligence systems.
1 code implementation • 7 Sep 2024 • Junyu Gao, Mengyuan Chen, Liangyu Xiang, Changsheng Xu
To address this challenge, a novel paradigm called Evidential Deep Learning (EDL) has emerged, providing reliable uncertainty estimation with minimal additional computation in a single forward pass.
1 code implementation • 2 Aug 2024 • Lu Yu, Zhe Tao, Hantao Yao, Joost Van de Weijer, Changsheng Xu
The label information of the images offers important semantic knowledge that can be related to previously acquired knowledge of semantic classes.
no code implementations • 20 Jul 2024 • Yuyang Wanyan, Xiaoshan Yang, WeiMing Dong, Changsheng Xu
Few-shot action recognition aims to address the high cost and impracticality of manually labeling complex and variable video data in action recognition.
1 code implementation • SIGIR 2024 • Zhenyu Yang, Dizhan Xue, Shengsheng Qian, WeiMing Dong, Changsheng Xu
To conduct ZS-CIR, the prevailing methods employ pre-trained image-to-text models to transform the query image and text into a single text, which is then projected into the common feature space by CLIP to retrieve the target image.
Ranked #8 on Zero-Shot Composed Image Retrieval (ZS-CIR) on CIRCO
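The final retrieval step of the prevailing ZS-CIR pipeline reduces to a cosine-similarity ranking in the common space; here all embeddings are assumed precomputed (in the pipeline, the query embedding would come from CLIP's text encoder applied to the composed caption plus modification text):

```python
import numpy as np

def retrieve(query_text_emb, gallery_img_embs, k=5):
    """Rank gallery images by cosine similarity to a composed-text embedding.

    A sketch of the common retrieval step only; the image-to-text composition
    that produces `query_text_emb` is outside this snippet."""
    q = query_text_emb / np.linalg.norm(query_text_emb)
    g = gallery_img_embs / np.linalg.norm(gallery_img_embs, axis=1, keepdims=True)
    sims = g @ q                      # cosine similarities to every gallery image
    return np.argsort(-sims)[:k]      # indices of the top-k matches
```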
no code implementations • 4 Jun 2024 • Jifei Luo, Hantao Yao, Changsheng Xu
However, existing techniques that construct the affinity graph based on pairwise instances can lead to the propagation of misinformation from outliers and other manifolds, resulting in inaccurate results.
1 code implementation • 24 May 2024 • Hantao Yao, Rui Zhang, Lu Yu, Yongdong Zhang, Changsheng Xu
Comprehensive evaluations across various benchmarks and tasks confirm SEP's efficacy in prompt tuning.
1 code implementation • 16 May 2024 • Yifan Xu, Xiaoshan Yang, Yaguang Song, Changsheng Xu
Specifically, we incorporate a routed visual expert with a cross-modal bridge module into a pretrained LLM to route the vision and language flows during attention computing to enable different attention patterns in inner-modal modeling and cross-modal interaction scenarios.
1 code implementation • WWW '24: Proceedings of the ACM on Web Conference 2024 2024 • Huaiwen Zhang, Xinxin Liu, Qing Yang, Yang Yang, Fan Qi, Shengsheng Qian, Changsheng Xu
To tackle this challenge, we introduce the Test-Time Training for Rumor Detection (T^3RD) to enhance the performance of rumor detection models on low-resource datasets.
1 code implementation • 20 Apr 2024 • Linhui Xiao, Xiaoshan Yang, Fang Peng, YaoWei Wang, Changsheng Xu
The cross-modal bridge can address the inconsistency between visual features and those required for grounding, and establish a connection between multi-level visual and text features.
1 code implementation • 9 Apr 2024 • Ming Tao, Bing-Kun Bao, Hao Tang, YaoWei Wang, Changsheng Xu
3) The story visualization and continuation models are trained and run independently at inference time, which is not user-friendly.
1 code implementation • 25 Jan 2024 • Nisha Huang, WeiMing Dong, Yuxin Zhang, Fan Tang, Ronghui Li, Chongyang Ma, Xiu Li, Tong-Yee Lee, Changsheng Xu
Although remarkable progress has been made in image style transfer, style is just one of the components of artistic paintings.
no code implementations • 21 Jan 2024 • Yukun Zuo, Hantao Yao, Lu Yu, Liansheng Zhuang, Changsheng Xu
Nonetheless, these learnable prompts tend to concentrate on the discriminative knowledge of the current task while ignoring past task knowledge, so they still suffer from catastrophic forgetting.
1 code implementation • 11 Jan 2024 • Yukun Zuo, Hantao Yao, Liansheng Zhuang, Changsheng Xu
We introduce Hierarchical Augmentation and Distillation (HAD), which comprises the Hierarchical Augmentation Module (HAM) and Hierarchical Distillation Module (HDM) to efficiently utilize the hierarchical structure of data and models, respectively.
no code implementations • CVPR 2024 • Baochen Xiong, Xiaoshan Yang, Yaguang Song, YaoWei Wang, Changsheng Xu
Existing image-based TTA methods cannot be directly applied to this task because videos exhibit multimodal and temporal domain shifts, which make adaptation difficult.
1 code implementation • 25 Dec 2023 • Chengcheng Ma, Ismail Elezi, Jiankang Deng, WeiMing Dong, Changsheng Xu
For instance, on CIFAR-10-LT, CPE improves test accuracy by over 2.22% compared to baselines.
1 code implementation • 13 Dec 2023 • Shengsheng Qian, Dizhan Xue, Yifei Wang, Shengjie Zhang, Huaiwen Zhang, Changsheng Xu
After obtaining the threat model trained on the poisoned dataset, our method can precisely detect poisonous samples based on the assumption that masking the backdoor trigger can effectively change the activation of a downstream clustering model.
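The masking assumption above can be sketched as a tiny predicate; `encode`, `cluster`, and the mask are hypothetical placeholders standing in for the downstream encoder and clustering model:

```python
def is_poisonous(sample, mask, encode, cluster):
    """Masking test: if occluding the suspected trigger region changes the
    downstream cluster assignment, flag the sample as poisoned.

    A minimal sketch of the stated assumption, not the paper's detector:
    `encode` maps a sample to features, `cluster` maps features to a cluster
    id, and `mask` zeroes out the suspected trigger region elementwise."""
    return cluster(encode(sample)) != cluster(encode(sample * mask))
```

A clean sample's cluster assignment should survive the masking, while a sample whose label depends on the trigger flips.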
1 code implementation • 8 Dec 2023 • Yuxin Zhang, Fan Tang, Nisha Huang, Haibin Huang, Chongyang Ma, WeiMing Dong, Changsheng Xu
The essence of a video lies in its dynamic motions, including character actions, object movements, and camera movements.
1 code implementation • CVPR 2024 • Hantao Yao, Rui Zhang, Changsheng Xu
However, those textual tokens have a limited generalization ability regarding unseen domains, as they cannot dynamically adjust to the distribution of testing classes.
1 code implementation • 22 Nov 2023 • Junyu Gao, Xuan Yao, Changsheng Xu
Such agents are typically required to execute user instructions in an online manner, leading us to explore the use of unlabeled test samples for effective online model adaptation.
no code implementations • 12 Oct 2023 • Junyu Gao, Xinhong Ma, Changsheng Xu
Despite the great progress of unsupervised domain adaptation (UDA) with the deep neural networks, current UDA models are opaque and cannot provide promising explanations, limiting their applications in the scenarios that require safe and controllable model decisions.
1 code implementation • 5 Sep 2023 • Dizhan Xue, Shengsheng Qian, Zuyi Zhou, Changsheng Xu
In recent years, cross-modal reasoning (CMR), the process of understanding and reasoning across different modalities, has emerged as a pivotal area with applications spanning from multimedia analysis to healthcare diagnostics.
no code implementations • 30 Aug 2023 • Yifan Xu, Mengdan Zhang, Xiaoshan Yang, Changsheng Xu
In this paper, we explore, for the first time, helpful multi-modal contextual knowledge to understand novel categories for open-vocabulary object detection (OVD).
no code implementations • 13 Jul 2023 • Jiaming Zhang, Lingyu Qiu, Qi Yi, Yige Li, Jitao Sang, Changsheng Xu, Dit-yan Yeung
The vulnerability of Deep Neural Networks (DNNs) to adversarial attacks poses a significant challenge to their deployment in safety-critical applications.
no code implementations • 5 Jul 2023 • Jie Fu, Junyu Gao, Changsheng Xu
In this paper, to balance the feature learning processes of different modalities, a dynamic gradient modulation (DGM) mechanism is explored, where a novel and effective metric function is designed to measure the imbalanced feature learning between audio and visual modalities.
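The gradient-modulation idea above can be sketched as a pair of per-modality coefficients; the imbalance metric and the tanh schedule here are illustrative stand-ins, not the paper's metric function:

```python
import numpy as np

def modulation_coeffs(score_audio, score_visual, alpha=1.0):
    """Down-weight the gradient of the currently dominant modality.

    `score_*` is any scalar measuring how well each modality is learning
    (e.g., its mean correct-class confidence). The stronger modality gets a
    coefficient below 1; the weaker one keeps 1, so its features catch up.
    A generic sketch under hypothetical choices of metric and schedule."""
    ratio = score_audio / score_visual
    if ratio > 1:   # audio dominates -> slow its gradient
        return 1.0 - np.tanh(alpha * (ratio - 1)), 1.0
    else:           # visual dominates -> slow its gradient
        return 1.0, 1.0 - np.tanh(alpha * (1 / ratio - 1))
```

In training, each modality's gradients would be multiplied by its coefficient before the optimizer step.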
1 code implementation • NeurIPS 2023 • Yifan Xu, Mengdan Zhang, Chaoyou Fu, Peixian Chen, Xiaoshan Yang, Ke Li, Changsheng Xu
To address the learning inertia problem brought by the frozen detector, a vision conditioned masked language prediction strategy is proposed.
Ranked #2 on Few-Shot Object Detection on ODinW-35
3 code implementations • 25 May 2023 • Yuxin Zhang, WeiMing Dong, Fan Tang, Nisha Huang, Haibin Huang, Chongyang Ma, Tong-Yee Lee, Oliver Deussen, Changsheng Xu
We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout, achieving previously unattainable results from a single image input without fine-tuning the diffusion models.
1 code implementation • 25 May 2023 • Hantao Yao, Lu Yu, Jifei Luo, Changsheng Xu
In this paper, we propose a novel Identity Knowledge Evolution (IKE) framework for CIOR, consisting of the Identity Knowledge Association (IKA), Identity Knowledge Distillation (IKD), and Identity Knowledge Update (IKU).
3 code implementations • 15 May 2023 • Linhui Xiao, Xiaoshan Yang, Fang Peng, Ming Yan, YaoWei Wang, Changsheng Xu
In order to utilize vision and language pre-trained models to address the grounding problem, and reasonably take advantage of pseudo-labels, we propose CLIP-VG, a novel method that can conduct self-paced curriculum adapting of CLIP with pseudo-language labels.
no code implementations • WWW 2023 • Shengsheng Qian, Hong Chen, Dizhan Xue, Quan Fang, Changsheng Xu
To tackle these challenges, we propose an Open-World Social Event Classifier (OWSEC) model in this paper.
1 code implementation • CVPR 2023 • Hantao Yao, Rui Zhang, Changsheng Xu
Representative CoOp-based work combines the learnable textual tokens with the class tokens to obtain specific textual knowledge.
1 code implementation • 9 Mar 2023 • Yuxin Zhang, Fan Tang, WeiMing Dong, Haibin Huang, Chongyang Ma, Tong-Yee Lee, Changsheng Xu
Our framework consists of three key components, i.e., a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of style distribution, and a generative network for style transfer.
1 code implementation • 23 Feb 2023 • Nisha Huang, Fan Tang, WeiMing Dong, Tong-Yee Lee, Changsheng Xu
Different from current mask-based image editing methods, we propose a novel region-aware diffusion model (RDM) for entity-level image editing, which could automatically locate the region of interest and replace it following given text prompts.
2 code implementations • CVPR 2023 • Ming Tao, Bing-Kun Bao, Hao Tang, Changsheng Xu
The complex scene understanding ability of CLIP enables the discriminator to accurately assess the image quality.
Ranked #4 on Text-to-Image Generation on CUB
1 code implementation • ICCV 2023 • Dizhan Xue, Shengsheng Qian, Changsheng Xu
To address these issues, we propose a Variational Causal Inference Network (VCIN) that establishes the causal correlation between predicted answers and explanations, and captures cross-modal relationships to generate rational explanations.
Ranked #1 on Explanatory Visual Question Answering on GQA-REX
no code implementations • CVPR 2023 • Yuyang Wanyan, Xiaoshan Yang, Chaofan Chen, Changsheng Xu
In meta-training, we design an Active Sample Selection (ASS) module to organize query samples with large differences in the reliability of modalities into different groups based on modality-specific posterior distributions.
no code implementations • CVPR 2023 • Mengyuan Chen, Junyu Gao, Changsheng Xu
Targeting at recognizing and localizing action instances with only video-level labels during training, Weakly-supervised Temporal Action Localization (WTAL) has achieved significant progress in recent years.
Open Set Learning • Weakly-supervised Temporal Action Localization • +1
no code implementations • CVPR 2023 • Sisi You, Hantao Yao, Bing-Kun Bao, Changsheng Xu
Recently, Multiple Object Tracking has achieved great success, which consists of object detection, feature embedding, and identity association.
1 code implementation • CVPR 2023 • Junyu Gao, Mengyuan Chen, Changsheng Xu
We argue that, for an event residing in one modality, the modality itself should provide ample presence evidence of this event, while the other complementary modality is encouraged to afford the absence evidence as a reference signal.
1 code implementation • CVPR 2023 • Xi Zhang, Feifei Zhang, Changsheng Xu
Research on continual learning has recently led to a variety of work in unimodal community, however little attention has been paid to multimodal tasks like visual question answering (VQA).
1 code implementation • CVPR 2023 • Jiaming Zhang, Xingjun Ma, Qi Yi, Jitao Sang, Yu-Gang Jiang, YaoWei Wang, Changsheng Xu
Furthermore, we propose to leverage Vision-and-Language Pre-trained Models (VLPMs) like CLIP as the surrogate model to improve the transferability of the crafted UCs to diverse domains.
1 code implementation • 28 Nov 2022 • Fang Peng, Xiaoshan Yang, Linhui Xiao, YaoWei Wang, Changsheng Xu
Although significant progress has been made in few-shot learning, most existing few-shot image classification methods require supervised pre-training on a large number of samples from base classes, which limits their generalization ability in real-world applications.
1 code implementation • CVPR 2023 • Yuxin Zhang, Nisha Huang, Fan Tang, Haibin Huang, Chongyang Ma, WeiMing Dong, Changsheng Xu
Our key idea is to learn artistic style directly from a single painting and then guide the synthesis without providing complex textual descriptions.
Ranked #5 on Style Transfer on StyleBench
1 code implementation • 19 Nov 2022 • Nisha Huang, Yuxin Zhang, Fan Tang, Chongyang Ma, Haibin Huang, Yong Zhang, WeiMing Dong, Changsheng Xu
Beyond the impressive results of arbitrary image-guided style transfer methods, text-driven image stylization has recently been proposed for transferring a natural image into a stylized one according to textual descriptions of the target style provided by the user.
1 code implementation • 4 Nov 2022 • Chengcheng Ma, Yang Liu, Jiankang Deng, Lingxi Xie, WeiMing Dong, Changsheng Xu
Pretrained vision-language models (VLMs) such as CLIP have shown impressive generalization capability in downstream vision tasks with appropriate text prompts.
1 code implementation • ACM MM 2022 • Dizhan Xue, Shengsheng Qian, Quan Fang, Changsheng Xu
Finally, a multimodal transformer decoder constructs attention among multimodal features to learn the story dependency and generates informative, reasonable, and coherent story endings.
Ranked #1 on Image-guided Story Ending Generation on LSMDC-E
1 code implementation • 27 Sep 2022 • Nisha Huang, Fan Tang, WeiMing Dong, Changsheng Xu
Extensive qualitative and quantitative experimental results on the generated digital art paintings confirm the effectiveness of combining the diffusion model with multimodal guidance.
1 code implementation • IEEE Transactions on Pattern Analysis and Machine Intelligence 2022 • Dizhan Xue, Shengsheng Qian, Quan Fang, Changsheng Xu
To date, most of the existing techniques mainly convert multimodal data into a common representation space where similarities in semantics between samples can be easily measured across multiple modalities.
no code implementations • 22 May 2022 • Yufan Hu, Junyu Gao, Changsheng Xu
Most existing state-of-the-art video classification methods assume that the training data obey a uniform distribution.
1 code implementation • 19 May 2022 • Yuxin Zhang, Fan Tang, WeiMing Dong, Haibin Huang, Chongyang Ma, Tong-Yee Lee, Changsheng Xu
Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer.
Ranked #4 on Style Transfer on StyleBench
2 code implementations • 5 Apr 2022 • Jun Hu, Bryan Hooi, Shengsheng Qian, Quan Fang, Changsheng Xu
Based on a Markov process that trades off two types of distances, we present Markov Graph Diffusion Collaborative Filtering (MGDCF) to generalize some state-of-the-art GNN-based CF models.
Ranked #4 on Multi-modal Recommendation on Amazon Sports
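The Markov trade-off between the two types of distances can be sketched as a fixed-point iteration; the mixing coefficient and the normalized adjacency are illustrative assumptions, not MGDCF's exact parameterization:

```python
import numpy as np

def markov_diffusion(adj_norm, h0, alpha=0.1, steps=10):
    """Iterate H <- (1 - alpha) * A_norm @ H + alpha * H0.

    Each step trades off graph distance (propagation through the normalized
    adjacency `adj_norm`) against distance to the initial embeddings `h0`;
    `alpha` anchors the result to `h0`. A generic diffusion sketch."""
    h = h0.copy()
    for _ in range(steps):
        h = (1 - alpha) * adj_norm @ h + alpha * h0
    return h
```

With an identity adjacency the embeddings are a fixed point; with a stochastic adjacency, mass is smoothed across neighbors while staying tethered to the initial state.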
1 code implementation • 4 Apr 2022 • Ziyue Wu, Junyu Gao, Shucheng Huang, Changsheng Xu
Then, a commonsense-aware interaction module is designed to obtain bridged visual and text features by utilizing the learned commonsense concepts.
1 code implementation • CVPR 2022 • Junyu Gao, Mengyuan Chen, Changsheng Xu
We target at the task of weakly-supervised action localization (WSAL), where only video-level action labels are available during model training.
1 code implementation • 26 Jan 2022 • Chengcheng Ma, Xingjia Pan, Qixiang Ye, Fan Tang, WeiMing Dong, Changsheng Xu
Semi-supervised object detection has recently achieved substantial progress.
3 code implementations • CVPR 2022 • Yingying Deng, Fan Tang, WeiMing Dong, Chongyang Ma, Xingjia Pan, Lei Wang, Changsheng Xu
The goal of image style transfer is to render an image with artistic features guided by a style reference while maintaining the original content.
Ranked #3 on Style Transfer on StyleBench
no code implementations • CVPR 2022 • Yiming Li, Xiaoshan Yang, Changsheng Xu
Humans can not only see the collection of objects in visual scenes, but also identify the relationship between objects.
1 code implementation • 9 Dec 2021 • Hantao Yao, Changsheng Xu
Unlike the individual-based updating mechanism, the centroid-based updating mechanism that applies the mean feature of each cluster to update the cluster memory can reduce the impact of individual samples.
Ranked #58 on Person Re-Identification on Market-1501
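The centroid-based updating mechanism described above can be sketched in a few lines; the momentum value and the L2 normalization are illustrative assumptions:

```python
import numpy as np

def update_cluster_memory(memory, features, labels, momentum=0.9):
    """Move each cluster's memory toward the mean feature of its batch
    samples, damping the influence of any individual (possibly noisy) sample.

    memory: (C, D) cluster memory; features: (N, D); labels: (N,) cluster
    ids. A sketch of the mechanism, not the paper's implementation."""
    memory = memory.copy()
    for c in np.unique(labels):
        centroid = features[labels == c].mean(axis=0)
        memory[c] = momentum * memory[c] + (1 - momentum) * centroid
        memory[c] /= np.linalg.norm(memory[c])  # keep entries L2-normalized
    return memory
```

Clusters absent from the batch are left untouched, and each updated entry stays on the unit sphere for cosine-similarity lookups.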
no code implementations • 5 Dec 2021 • Pei Lv, Wentong Wang, Yunxin Wang, Yuzhen Zhang, Mingliang Xu, Changsheng Xu
In detail, when modeling social interaction, we propose a new \emph{social soft attention function}, which fully considers various interaction factors among pedestrians.
1 code implementation • 2 Dec 2021 • Jun Hu, Shengsheng Qian, Quan Fang, Changsheng Xu
Recently, the field has advanced from local propagation schemes that focus on local neighbors toward extended propagation schemes that can directly deal with extended neighbors consisting of both local and high-order neighbors.
no code implementations • 1 Dec 2021 • Wei Wang, Junyu Gao, Changsheng Xu
With this in mind, we design a unified causal framework to learn the deconfounded object-relevant association for more accurate and robust video object grounding.
1 code implementation • 19 Nov 2021 • Desheng Cai, Jun Hu, Quan Zhao, Shengsheng Qian, Quan Fang, Changsheng Xu
In this paper, we present GRecX, an open-source TensorFlow framework for benchmarking GNN-based recommendation models in an efficient and unified way.
no code implementations • 18 Oct 2021 • Xiaowen Huang, Jitao Sang, Jian Yu, Changsheng Xu
The cold-start recommendation is an urgent problem in contemporary online applications.
no code implementations • 29 Sep 2021 • Guanhua Zheng, Jitao Sang, Wang Haonan, Changsheng Xu
Recently, backpropagation (BP)-based feature attribution methods have been widely adopted to interpret the internal mechanisms of convolutional neural networks (CNNs), and are expected to be human-understandable (lucidity) and faithful to decision-making processes (fidelity).
1 code implementation • IEEE Transactions on Multimedia 2021 • Shengsheng Qian, Dizhan Xue, Quan Fang, Changsheng Xu
Firstly, we construct an instance representation learning branch to transform instances of different modalities into a common representation space.
1 code implementation • 3 Aug 2021 • Yifan Xu, Zhijie Zhang, Mengdan Zhang, Kekai Sheng, Ke Li, WeiMing Dong, Liqing Zhang, Changsheng Xu, Xing Sun
Vision transformers (ViTs) have recently gained explosive popularity, but their huge computational cost remains a severe issue.
Ranked #11 on Efficient ViTs on ImageNet-1K (with DeiT-T)
1 code implementation • 10 Jul 2021 • Jianyu Wang, Bing-Kun Bao, Changsheng Xu
However, existing graph-based methods fail to perform multi-step reasoning well, neglecting two properties of VideoQA: (1) even for the same video, different questions may require different numbers of video clips or objects to infer the answer with relational reasoning; (2) during reasoning, appearance and motion features have complicated interdependence, being correlated and complementary to each other.
Ranked #29 on Visual Question Answering (VQA) on MSRVTT-QA
no code implementations • CVPR 2021 • Chaofan Chen, Xiaoshan Yang, Changsheng Xu, Xuhui Huang, Zhe Ma
Specifically, we first employ the comparison module to explore the pairwise sample relations to learn rich sample representations in the instance-level graph.
no code implementations • 14 Jun 2021 • Pei Lv, Jianqi Fan, Xixi Nie, WeiMing Dong, Xiaoheng Jiang, Bing Zhou, Mingliang Xu, Changsheng Xu
This framework leverages user interactions to retouch and rank images for aesthetic assessment based on deep reinforcement learning (DRL), and generates personalized aesthetic distribution that is more in line with the aesthetic preferences of different users.
4 code implementations • 30 May 2021 • Yingying Deng, Fan Tang, WeiMing Dong, Chongyang Ma, Xingjia Pan, Lei Wang, Changsheng Xu
The goal of image style transfer is to render an image with artistic features guided by a style reference while maintaining the original content.
1 code implementation • AAAI 2021 • Shengsheng Qian, Dizhan Xue, Huaiwen Zhang, Quan Fang, Changsheng Xu
To date, most existing methods transform multimodal data into a common representation space where semantic similarities between items can be directly measured across different modalities.
no code implementations • 21 Apr 2021 • Yifan Xu, Kekai Sheng, WeiMing Dong, Baoyuan Wu, Changsheng Xu, Bao-Gang Hu
However, due to unpredictable corruptions (e. g., noise and blur) in real data like web images, domain adaptation methods are increasingly required to be corruption robust on target domains.
no code implementations • 23 Mar 2021 • Xuan Ma, Xiaoshan Yang, Junyu Gao, Changsheng Xu
However, these data streams are multi-source and heterogeneous, containing complex temporal structures with local contextual and global temporal aspects, which makes feature learning and joint data utilization challenging.
1 code implementation • CVPR 2021 • Xingjia Pan, Yingguo Gao, Zhiwen Lin, Fan Tang, WeiMing Dong, Haolei Yuan, Feiyue Huang, Changsheng Xu
Weakly supervised object localization (WSOL) remains an open problem, given the difficulty of finding object extent information using a classification network.
1 code implementation • 27 Jan 2021 • Jun Hu, Shengsheng Qian, Quan Fang, Youze Wang, Quan Zhao, Huaiwen Zhang, Changsheng Xu
We introduce tf_geometric, an efficient and friendly library for graph deep learning, which is compatible with both TensorFlow 1.x and 2.x.
no code implementations • ICCV 2021 • Junyu Gao, Changsheng Xu
To tackle this issue, we replace the cross-modal interaction module with a cross-modal common space, in which moment-query alignment is learned and efficient moment search can be performed.
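The efficiency argument above — replacing a per-pair interaction module with a shared space — can be sketched as a precomputed index; the class and its names are hypothetical:

```python
import numpy as np

class CommonSpaceIndex:
    """Moment retrieval in a shared embedding space.

    Moment features are embedded and normalized once, so each incoming query
    costs a single matrix-vector product instead of running a cross-modal
    interaction module for every moment-query pair (illustrative sketch)."""

    def __init__(self, moment_embs):
        # normalize rows so dot products are cosine similarities
        self.m = moment_embs / np.linalg.norm(moment_embs, axis=1, keepdims=True)

    def search(self, query_emb, k=1):
        q = query_emb / np.linalg.norm(query_emb)
        return np.argsort(-(self.m @ q))[:k]
```

Once built, the index serves any number of language queries at matrix-multiply cost, which is what makes efficient moment search possible.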
no code implementations • ICCV 2021 • Xinhong Ma, Junyu Gao, Changsheng Xu
This paper proposes a new paradigm for unsupervised domain adaptation, termed as Active Universal Domain Adaptation (AUDA), which removes all label set assumptions and aims for not only recognizing target samples from source classes but also inferring those from target-private classes by using active learning to annotate a small budget of target data.
no code implementations • 4 Dec 2020 • Zhiyong Huang, Kekai Sheng, WeiMing Dong, Xing Mei, Chongyang Ma, Feiyue Huang, Dengwen Zhou, Changsheng Xu
For intra-domain propagation, we propose an effective self-training strategy to mitigate the noises in pseudo-labeled target domain data and improve the feature discriminability in the target domain.
no code implementations • 17 Sep 2020 • Yingying Deng, Fan Tang, Wei-Ming Dong, Haibin Huang, Chongyang Ma, Changsheng Xu
Towards this end, we propose Multi-Channel Correction network (MCCNet), which can be trained to fuse the exemplar style features and input content features for efficient style transfer while naturally maintaining the coherence of input videos.
3 code implementations • CVPR 2022 • Ming Tao, Hao Tang, Fei Wu, Xiao-Yuan Jing, Bing-Kun Bao, Changsheng Xu
To these ends, we propose a simpler but more effective Deep Fusion Generative Adversarial Networks (DF-GAN).
Ranked #5 on Text-to-Image Generation on CUB (Inception score metric)
no code implementations • 18 Jun 2020 • Guanhua Zheng, Jitao Sang, Changsheng Xu
Since the basic assumption of conventional manifold learning fails in case of sparse and uneven data distribution, we introduce a new target, Minimum Manifold Coding (MMC), for manifold learning to encourage simple and unfolded manifold.
no code implementations • ICMR 2020 • Youze Wang, Shengsheng Qian, Jun Hu, Quan Fang, Changsheng Xu
Besides, we not only convert visual information into graph nodes but also retrieve external knowledge from a real-world knowledge graph as additional nodes, providing complementary semantic information to improve fake news detection.
no code implementations • 2 Jun 2020 • Minxuan Lin, Fan Tang, Wei-Ming Dong, Xiao Li, Chongyang Ma, Changsheng Xu
Currently, there are few methods that can perform both multimodal and multi-domain stylization simultaneously.
no code implementations • 31 May 2020 • Hantao Yao, Shaobo Min, Yongdong Zhang, Changsheng Xu
Then, an attentional graph attribute embedding is proposed to reduce the semantic bias between seen and unseen categories, which utilizes the graph operation to capture the semantic relationship between categories.
no code implementations • 30 May 2020 • Hantao Yao, Changsheng Xu
Based on this repulsion constraint, the repulsion term is proposed to reduce the similarity of distractor images that are not most similar to the probe person.
2 code implementations • 27 May 2020 • Yingying Deng, Fan Tang, Wei-Ming Dong, Wen Sun, Feiyue Huang, Changsheng Xu
Arbitrary style transfer is a significant topic with research value and application prospect.
no code implementations • 25 May 2020 • Shangxi Wu, Jitao Sang, Kaiyuan Xu, Guanhua Zheng, Changsheng Xu
Specifically, AALP consists of an adaptive feature optimization module with Guided Dropout to systematically pursue fewer high-contribution features, and an adaptive sample weighting module by setting sample-specific training weights to balance between logits pairing loss and classification loss.
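The balance described above between a logits-pairing loss and a classification loss can be sketched in a few lines; this is a minimal plain-Python illustration assuming per-sample weights and precomputed logits, with hypothetical names that are not taken from the paper's code:

```python
def logit_pairing_loss(clean_logits, adv_logits, ce_loss, sample_weight):
    """Combine a classification loss with a logits-pairing penalty.

    `sample_weight` trades off the two terms per sample, echoing the
    adaptive sample weighting idea (illustrative sketch only).
    """
    # Squared distance between clean and adversarial logits.
    pairing = sum((c - a) ** 2 for c, a in zip(clean_logits, adv_logits))
    # Weighted combination of classification and pairing terms.
    return (1.0 - sample_weight) * ce_loss + sample_weight * pairing
```

A sample with weight 0 falls back to pure classification loss, while a larger weight pushes clean and adversarial logits closer together.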
1 code implementation • CVPR 2020 • Xingjia Pan, Yuqiang Ren, Kekai Sheng, Wei-Ming Dong, Haolei Yuan, Xiaowei Guo, Chongyang Ma, Changsheng Xu
However, the detection of oriented and densely packed objects remains challenging for the following inherent reasons: (1) the receptive fields of neurons are all axis-aligned and of the same shape, whereas objects are usually of diverse shapes and aligned along various directions; (2) detection models are typically trained with generic knowledge and may not generalize well to specific objects at test time; (3) limited datasets hinder progress on this task.
no code implementations • 26 Feb 2020 • Minxuan Lin, Yingying Deng, Fan Tang, Wei-Ming Dong, Changsheng Xu
Controllable painting generation plays a pivotal role in image stylization.
no code implementations • 28 Nov 2019 • Yi Huang, Xiaoshan Yang, Changsheng Xu
(1) It can model longitudinal heterogeneous EHR data by capturing the third-order correlations of different modalities and the irregular temporal impact of historical events.
no code implementations • 28 Nov 2019 • Guanhua Zheng, Jitao Sang, Houqiang Li, Jian Yu, Changsheng Xu
The derived generalization bound based on the ITID assumption identifies the significance of hypothesis invariance in guaranteeing generalization performance.
1 code implementation • Proceedings of the AAAI Conference on Artificial Intelligence 2019 • Junyu Gao, Tianzhu Zhang, Changsheng Xu
To effectively leverage the knowledge graph, we design a novel Two-Stream Graph Convolutional Network (TS-GCN) consisting of a classifier branch and an instance branch.
Ranked #5 on Zero-Shot Action Recognition on Olympics
no code implementations • 24 Jun 2019 • Zhaoquan Yuan, Siyuan Sun, Lixin Duan, Xiao Wu, Changsheng Xu
In AMN, inspired by generative adversarial networks, we propose to learn multimodal feature representations by finding a more coherent subspace for video clips and the corresponding texts (e.g., subtitles and questions).
no code implementations • 25 May 2019 • Ting-Ting Xie, Xiaoshan Yang, Tianzhu Zhang, Changsheng Xu, Ioannis Patras
Temporal action localization has recently attracted significant interest in the Computer Vision community.
no code implementations • CVPR 2018 • Feifei Zhang, Tianzhu Zhang, Qirong Mao, Changsheng Xu
First, the encoder-decoder structure of the generator can learn a generative and discriminative identity representation for face images.
no code implementations • 3 Mar 2018 • Mingliang Xu, Zhaoyang Ge, Xiaoheng Jiang, Gaoge Cui, Pei Lv, Bing Zhou, Changsheng Xu
DigCrowd first uses the depth information of an image to segment the scene into a far-view region and a near-view region.
no code implementations • ICLR 2018 • Guanhua Zheng, Jitao Sang, Changsheng Xu
The DNN is then regarded as approximating the feature conditions with multilayer feature learning, and is proved to be a recursive solution towards the maximum entropy principle.
no code implementations • CVPR 2017 • Tianzhu Zhang, Changsheng Xu, Ming-Hsuan Yang
In this paper, we propose a multi-task correlation particle filter (MCPF) for robust visual tracking.
no code implementations • CVPR 2016 • Si Liu, Tianzhu Zhang, Xiaochun Cao, Changsheng Xu
In this paper, we propose a novel structural correlation filter (SCF) model for robust visual tracking.
no code implementations • CVPR 2015 • Tianzhu Zhang, Si Liu, Changsheng Xu, Shuicheng Yan, Bernard Ghanem, Narendra Ahuja, Ming-Hsuan Yang
Sparse representation has been applied to visual tracking by finding the best target candidate with minimal reconstruction error by use of target templates.
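The selection rule described above, namely picking the target candidate whose sparse reconstruction from target templates has minimal residual, can be sketched as follows. This is a plain-Python illustration with a basic ISTA solver for the l1-regularized least squares; all names and parameters are hypothetical, not the paper's implementation:

```python
def ista(templates, y, lam=0.1, step=0.01, iters=500):
    """Solve min_x ||T x - y||^2 + lam * ||x||_1 with a basic ISTA loop.

    `templates` is a list of template feature vectors (the columns of T),
    `y` is a candidate feature vector. Plain-Python sketch, small scale only.
    """
    n, d = len(templates), len(y)
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = T x - y.
        r = [sum(templates[j][i] * x[j] for j in range(n)) - y[i]
             for i in range(d)]
        # Gradient g = T^T r of the least-squares term.
        g = [sum(templates[j][i] * r[i] for i in range(d)) for j in range(n)]
        # Gradient step followed by soft thresholding (promotes sparsity).
        for j in range(n):
            v = x[j] - step * g[j]
            x[j] = max(abs(v) - step * lam, 0.0) * (1.0 if v >= 0 else -1.0)
    return x

def reconstruction_error(templates, y, x):
    """Squared residual ||T x - y||^2 for a candidate and its sparse code."""
    n, d = len(templates), len(y)
    return sum((sum(templates[j][i] * x[j] for j in range(n)) - y[i]) ** 2
               for i in range(d))

def best_candidate(templates, candidates, lam=0.1):
    """Index of the candidate with minimal sparse reconstruction error."""
    errs = [reconstruction_error(templates, y, ista(templates, y, lam))
            for y in candidates]
    return min(range(len(candidates)), key=lambda k: errs[k])
```

A candidate lying close to the span of the templates reconstructs with a small residual and is selected; a candidate far from that span is rejected.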
no code implementations • CVPR 2015 • Si Liu, Xiaodan Liang, Luoqi Liu, Xiaohui Shen, Jianchao Yang, Changsheng Xu, Liang Lin, Xiaochun Cao, Shuicheng Yan
Under the classic K Nearest Neighbor (KNN)-based nonparametric framework, a parametric Matching Convolutional Neural Network (M-CNN) is proposed to predict the matching confidence and displacements of the best-matched region in the testing image for a particular semantic region in one KNN image.
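The KNN retrieval step that such nonparametric frameworks rest on can be sketched as ranking gallery images by feature distance to the query; a minimal plain-Python illustration, with hypothetical names and squared Euclidean distance as an assumed metric:

```python
def knn_indices(query, gallery, k=3):
    """Return the indices of the k gallery feature vectors closest to
    `query` under squared Euclidean distance (plain-Python sketch)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # Sort gallery indices by distance to the query and keep the k nearest.
    order = sorted(range(len(gallery)), key=lambda i: sq_dist(query, gallery[i]))
    return order[:k]
```

The retrieved neighbors would then supply the candidate semantic regions that a matching network scores and localizes.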
no code implementations • CVPR 2014 • Tianzhu Zhang, Kui Jia, Changsheng Xu, Yi Ma, Narendra Ahuja
The proposed part matching tracker (PMT) has a number of attractive properties.