no code implementations • ECCV 2020 • Xiaofeng Yang, Guosheng Lin, Fengmao Lv, Fayao Liu
Compositional visual question answering requires reasoning over both semantic and geometric object relations.
no code implementations • 3 Jan 2025 • Heqing Zou, Tianze Luo, Guiyang Xie, Victor Zhang, Fengmao Lv, Guangcong Wang, Junyang Chen, Zhuochen Wang, Hansheng Zhang, Huaijian Zhang
Multimodal large language models have become a popular topic in deep visual understanding due to their many promising real-world applications.
no code implementations • 30 Oct 2024 • Haowen Xiao, Guanghui Liu, Xinyi Gao, Yang Li, Fengmao Lv, Jielei Chu
However, most of these approaches focus on the joint optimization of all samples in the dataset or on constraining the category distribution, with little attention given to whether each individual sample is optimally guided during training.
no code implementations • 16 Oct 2024 • Zongxin Shen, Yanyong Huang, Dongjie Wang, Minbo Ma, Fengmao Lv, Tianrui Li
Additionally, previous graph-based methods fail to account for the differing impacts of non-causal and causal features in constructing the similarity graph, which leads to false links in the generated graph.
1 code implementation • 27 Sep 2024 • Heqing Zou, Tianze Luo, Guiyang Xie, Victor Zhang, Fengmao Lv, Guangcong Wang, Junyang Chen, Zhuochen Wang, Hansheng Zhang, Huaijian Zhang
Given the diverse nature of visual data, MultiModal Large Language Models (MM-LLMs) exhibit variations in model design and training for understanding images, short videos, and long videos.
no code implementations • 18 Jun 2024 • Yanyong Huang, Li Yang, Dongjie Wang, Ke Li, Xiuwen Yi, Fengmao Lv, Tianrui Li
Then, the instance correlation and label correlation are integrated into the proposed regression model to adaptively learn both the sample similarity graph and the label similarity graph, which mutually enhance feature selection performance.
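The snippet above describes a regression model regularized by both a sample similarity graph and a label similarity graph. Below is a minimal, illustrative sketch of that idea: the graphs here are fixed Gaussian-kernel graphs rather than adaptively learned ones, and the objective, solver, and hyperparameter names (alpha, beta, gamma) are assumptions for illustration, not the paper's formulation.

```python
# A minimal sketch of dual-graph-regularized regression for multi-label
# feature selection. The similarity graphs are fixed Gaussian-kernel graphs
# (not adaptively learned as in the paper), and alpha/beta/gamma are
# illustrative hyperparameters.
import numpy as np
from scipy.linalg import solve_sylvester
from scipy.spatial.distance import cdist

def gaussian_graph(M, sigma=1.0):
    """Dense similarity graph with a Gaussian kernel on the rows of M."""
    return np.exp(-cdist(M, M, "sqeuclidean") / (2 * sigma ** 2))

def laplacian(S):
    return np.diag(S.sum(axis=1)) - S

def fit(X, Y, alpha=1.0, beta=1.0, gamma=1e-2):
    """Solve min ||XW - Y||^2 + alpha*tr(W'X'LsXW) + beta*tr(W Ly W') + gamma*||W||^2."""
    Ls = laplacian(gaussian_graph(X))      # sample similarity graph
    Ly = laplacian(gaussian_graph(Y.T))    # label similarity graph
    d = X.shape[1]
    A = X.T @ X + alpha * X.T @ Ls @ X + gamma * np.eye(d)
    # The label-graph term couples the columns of W, giving the Sylvester
    # equation A W + beta * W Ly = X'Y.
    W = solve_sylvester(A, beta * Ly, X.T @ Y)
    return W  # rank features by the row norms of W

X = np.random.randn(50, 20)
Y = (np.random.rand(50, 4) > 0.5).astype(float)
scores = np.linalg.norm(fit(X, Y), axis=1)  # per-feature importance scores
```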
1 code implementation • 9 Feb 2024 • Ziqiao Shang, Bin Liu, Fengmao Lv, Fei Teng, Tianrui Li
For the Facial Action Unit (AU) detection task, accurately capturing the subtle facial differences between distinct AUs is essential for reliable detection.
no code implementations • 22 Jan 2024 • Zhenyu Wu, Fengmao Lv, Chenglizhao Chen, Aimin Hao, Shuo Li
Colorectal polyp segmentation (CPS), an essential problem in medical image analysis, has garnered growing research attention.
no code implementations • 19 Jan 2024 • Yanyong Huang, Zongxin Shen, Tianrui Li, Fengmao Lv
UNIFIER explores the local structure of multi-view data by adaptively learning similarity-induced graphs from both the sample and feature spaces.
no code implementations • 28 Dec 2023 • Weide Liu, Huijing Zhan, Hao Chen, Fengmao Lv
Multimodal sentiment analysis aims to identify the emotions expressed by individuals through visual, language, and acoustic cues.
no code implementations • 31 Jul 2023 • Haochen Shi, Xinyao Liu, Fengmao Lv, Hongtao Xue, Jie Hu, Shengdong Du, Tianrui Li
In the era of big data, the issue of data quality has become increasingly prominent.
no code implementations • 20 Aug 2022 • Yanyong Huang, Zongxin Shen, Yuxin Cai, Xiuwen Yi, Dongjie Wang, Fengmao Lv, Tianrui Li
Besides, learning the complete similarity graph, an important and promising technique in existing MUFS methods, cannot be achieved due to the missing views.
no code implementations • 6 Jan 2022 • Xiaofeng Yang, Fengmao Lv, Fayao Liu, Guosheng Lin
We use the labeled image data to train a teacher model and use the trained model to generate pseudo captions on unlabeled image data.
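The snippet describes a standard teacher-student pseudo-labeling loop for captioning. A minimal sketch follows; the model interface (`generate()`) and the training routine `train_fn` are hypothetical placeholders, since the snippet does not specify architectures or losses.

```python
# A minimal sketch of the teacher/pseudo-caption pipeline described above.
# `generate()` and `train_fn` are hypothetical placeholders.
import torch

def pseudo_caption_pipeline(teacher, student, labeled_loader, unlabeled_loader, train_fn):
    # 1) Train the teacher on labeled image-caption pairs.
    train_fn(teacher, labeled_loader)

    # 2) Use the frozen teacher to caption the unlabeled images.
    teacher.eval()
    pseudo_batches = []
    with torch.no_grad():
        for images in unlabeled_loader:
            captions = teacher.generate(images)       # hypothetical generation API
            pseudo_batches.append((images, captions))

    # 3) Train the student on the real pairs plus the pseudo-captioned ones.
    train_fn(student, list(labeled_loader) + pseudo_batches)
    return student
```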
no code implementations • CVPR 2022 • Tao Liang, Guosheng Lin, Mingyang Wan, Tianrui Li, Guojun Ma, Fengmao Lv
Through the proposed MI2P unit, we can inject language information into the vision backbone by attending word-wise textual features to different visual channels, and inject visual information into the language backbone by attending channel-wise visual features to different textual words.
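A rough sketch of that bidirectional injection is below. Building channel "tokens" from pooled descriptors, the gating, and all layer shapes are assumptions made for illustration, not the paper's exact MI2P design.

```python
# Illustrative bidirectional injection in the spirit of MI2P: word-wise text
# features attend to visual channels, and channel-wise visual features attend
# to words. Shapes and projections are assumptions.
import torch
import torch.nn as nn

class MI2PSketch(nn.Module):
    def __init__(self, num_channels: int, dim: int):
        super().__init__()
        self.chan_embed = nn.Parameter(torch.randn(num_channels, dim))
        self.lang_to_vis = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.vis_to_lang = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
        self.gate = nn.Linear(dim, 1)

    def forward(self, vis, txt):
        # vis: (B, C, H, W) feature map; txt: (B, L, dim) word features
        B, C, H, W = vis.shape
        desc = vis.flatten(2).mean(-1)            # (B, C) pooled channel descriptors
        z = desc.unsqueeze(-1) * self.chan_embed  # (B, C, dim) channel tokens

        # Language -> vision: each channel token attends over the words,
        # producing a per-channel modulation of the feature map.
        msg, _ = self.lang_to_vis(z, txt, txt)    # (B, C, dim)
        vis = vis * torch.sigmoid(self.gate(msg)).view(B, C, 1, 1)

        # Vision -> language: each word attends over the channel tokens.
        ctx, _ = self.vis_to_lang(txt, z, z)      # (B, L, dim)
        return vis, txt + ctx
```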
no code implementations • CVPR 2021 • Fengmao Lv, Xiang Chen, Yanyong Huang, Lixin Duan, Guosheng Lin
In turn, it also collects the reinforced features from each modality and uses them to generate a reinforced common message.
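A minimal sketch of such a message-hub fusion step is given below: a common message reinforces each modality, and the reinforced modality features are pooled back into a new common message. The attention-based exchange and the iteration count are illustrative assumptions, not the paper's exact design.

```python
# Illustrative common-message fusion: the message reinforces each modality
# via cross-attention, then the reinforced features regenerate the message.
import torch
import torch.nn as nn

class MessageHub(nn.Module):
    def __init__(self, dim: int, num_modalities: int, steps: int = 3):
        super().__init__()
        self.steps = steps
        self.reinforce = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
            for _ in range(num_modalities)
        )
        self.msg0 = nn.Parameter(torch.randn(1, 1, dim))  # initial common message

    def forward(self, feats):
        # feats: list of (B, T_i, dim) sequences, one per modality
        B = feats[0].shape[0]
        msg = self.msg0.expand(B, -1, -1)
        for _ in range(self.steps):
            # The common message reinforces each modality...
            feats = [attn(x, msg, msg)[0] + x
                     for attn, x in zip(self.reinforce, feats)]
            # ...then the reinforced features are pooled into a new message.
            msg = torch.stack([x.mean(dim=1) for x in feats], dim=1).mean(1, keepdim=True)
        return feats, msg
```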
no code implementations • ICCV 2021 • Tao Liang, Guosheng Lin, Lei Feng, Yan Zhang, Fengmao Lv
To this end, both the marginal distribution and the elements with high-confidence correlations are aligned over the common space of the query and key vectors which are computed from different modalities.
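An illustrative sketch of that alignment idea follows: queries and keys computed from two modalities are matched in distribution (simple moment matching stands in for the paper's criterion), and query/key pairs with high-confidence attention are pulled together. The threshold and both loss forms are assumptions.

```python
# Illustrative alignment over the common query/key space of two modalities.
import torch
import torch.nn.functional as F

def alignment_losses(Q, K, tau=0.9):
    # Q: (B, Lq, D) queries from one modality; K: (B, Lk, D) keys from another.
    D = Q.shape[-1]

    # Marginal alignment: first/second moment matching over all positions
    # (a stand-in for the paper's distribution criterion).
    Qf, Kf = Q.reshape(-1, D), K.reshape(-1, D)
    marginal = (Qf.mean(0) - Kf.mean(0)).pow(2).sum() \
             + (Qf.var(0) - Kf.var(0)).pow(2).sum()

    # Element alignment: pull together query/key pairs whose attention weight
    # is high-confidence (max weight above tau after softmax over keys).
    attn = torch.softmax(Q @ K.transpose(1, 2) / D ** 0.5, dim=-1)  # (B, Lq, Lk)
    conf, idx = attn.max(dim=-1)                                    # best key per query
    matched = torch.gather(K, 1, idx.unsqueeze(-1).expand(-1, -1, D))
    element = (F.mse_loss(Q, matched, reduction="none").mean(-1) * (conf > tau)).mean()
    return marginal, element
```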
no code implementations • 27 Dec 2020 • Yanyong Huang, Zongxin Shen, Fuxu Cai, Tianrui Li, Fengmao Lv
Other existing methods select discriminative features with low redundancy by constructing a graph matrix on the original feature space.
1 code implementation • 1 Jul 2020 • Haiyang Liu, Yichen Wang, Jiayi Zhao, Guowu Yang, Fengmao Lv
Our method assumes that both the source images with full pixel-level labels and unlabeled target images are available during training.
no code implementations • 16 Jun 2020 • Tao Liang, Wenya Wang, Fengmao Lv
Specifically, the aspect category information is used to construct pivot knowledge for transfer, under the assumption that the interactions between sentence-level aspect categories and token-level aspect terms are invariant across domains.
no code implementations • CVPR 2020 • Fengmao Lv, Tao Liang, Xiang Chen, Guosheng Lin
Our method mainly focuses on constructing pivot information that is common knowledge shared across domains as a bridge to promote the adaptation of semantic segmentation model from synthetic domains to real-world domains.
Ranked #25 on Domain Adaptation on SYNTHIA-to-Cityscapes
no code implementations • 17 Apr 2020 • Senlin Shu, Fengmao Lv, Yan Yan, Li Li, Shuo He, Jun He
In this article, we propose to leverage data augmentation techniques to improve the performance of multi-label learning.
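The snippet does not spell out which augmentation is used, so the sketch below is a generic mixup-style stand-in for multi-label data: inputs are blended and the label vectors combined with a union (max), a common convention when labels are not mutually exclusive. Purely illustrative, not the paper's method.

```python
# Generic mixup-style augmentation for multi-label data (illustrative
# stand-in; the paper's specific augmentation is not described above).
import torch

def multilabel_mixup(x, y, alpha=0.4):
    # x: (B, ...) inputs; y: (B, C) binary multi-label targets
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_aug = lam * x + (1 - lam) * x[perm]
    y_aug = torch.maximum(y, y[perm])  # union of the two label sets
    return x_aug, y_aug
```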
no code implementations • 31 Mar 2020 • Fengmao Lv, Jianyang Zhang, Guowu Yang, Lei Feng, YuFeng Yu, Lixin Duan
Zero-Shot Learning (ZSL) learns models for recognizing new classes.
1 code implementation • ICCV 2019 • Qing Lian, Fengmao Lv, Lixin Duan, Boqing Gong
We propose a new approach, called self-motivated pyramid curriculum domain adaptation (PyCDA), to facilitate the adaptation of semantic segmentation neural networks from synthetic source domains to real target domains.
Ranked #14 on Image-to-Image Translation on SYNTHIA-to-Cityscapes
no code implementations • 21 Apr 2019 • Chaofan Tao, Fengmao Lv, Lixin Duan, Min Wu
Unlike most existing approaches which employ a generator to deal with domain difference, MMEN focuses on learning the categorical information from unlabeled target samples with the help of labeled source samples.