1 code implementation • Findings (NAACL) 2022 • Jinpeng Hu, He Zhao, Dan Guo, Xiang Wan, Tsung-Hui Chang
In doing so, label information contained in the embedding vectors can be effectively transferred to the target domain, and the Bi-LSTM can further model label relationships across domains through a pre-train-then-fine-tune setting.
Cross-Domain Named Entity Recognition
Named Entity Recognition
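A minimal sketch (PyTorch) of the pre-train-then-fine-tune recipe described above, not the authors' implementation: a Bi-LSTM tagger whose embedding layer can be seeded with label-informed vectors from the source domain before fine-tuning on the target domain. All names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim, num_labels):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, token_ids):                  # (batch, seq_len)
        h, _ = self.bilstm(self.embedding(token_ids))
        return self.classifier(h)                  # (batch, seq_len, num_labels)

# Pre-train on the source domain, then fine-tune on the target domain,
# carrying the label-informed embeddings across domains.
tagger = BiLSTMTagger(vocab_size=30000, emb_dim=128, hidden_dim=256, num_labels=9)
# tagger.embedding.weight.data.copy_(source_domain_embeddings)  # hypothetical transfer step
```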
1 code implementation • 15 Apr 2025 • Peipei Song, Long Zhang, Long Lan, Weidong Chen, Dan Guo, Xun Yang, Meng Wang
Partially relevant video retrieval (PRVR) is a practical yet challenging task in text-to-video retrieval, where videos are untrimmed and contain much background content.
no code implementations • 20 Mar 2025 • Pengyu Liu, Guohua Dong, Dan Guo, Kun Li, Fengling Li, Xun Yang, Meng Wang, Xiaomin Ying
This survey systematically reviews recent progress in fMRI-based brain decoding, focusing on stimulus reconstruction from passive brain signals.
1 code implementation • 11 Feb 2025 • Sheng Zhou, Junbin Xiao, Qingyun Li, Yicong Li, Xun Yang, Dan Guo, Meng Wang, Tat-Seng Chua, Angela Yao
We introduce EgoTextVQA, a novel and rigorously constructed benchmark for egocentric QA assistance involving scene text.
no code implementations • 16 Jan 2025 • Xinyi Wang, Na Zhao, Zhiyuan Han, Dan Guo, Xun Yang
3D visual grounding (3DVG), which aims to correlate a natural language description with the target object within a 3D scene, is a significant yet challenging task.
no code implementations • 22 Dec 2024 • Xu Wang, Shengeng Tang, Peipei Song, Shuo Wang, Dan Guo, Richang Hong
Sign Language Production (SLP) aims to generate sign videos corresponding to spoken language sentences, where the conversion of sign Glosses to Poses (G2P) is the key step.
1 code implementation • 21 Dec 2024 • Jingjing Hu, Dan Guo, Zhan Si, Deguang Liu, Yunfeng Diao, Jing Zhang, Jinxing Zhou, Meng Wang
Molecular representation learning plays a crucial role in various downstream tasks, such as molecular property prediction and drug design.
1 code implementation • 19 Dec 2024 • Kun Li, Dan Guo, Guoliang Chen, Chunxiao Fan, Jingyuan Xu, Zhiliang Wu, Hehe Fan, Meng Wang
In addition, we propose a new prototypical diversity amplification loss to strengthen the model's capacity by amplifying the differences between different prototypes.
Ranked #1 on Micro-Action Recognition on MA-52
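One plausible form of a prototypical diversity amplification loss, as a hedged PyTorch sketch (the paper's exact formulation may differ): penalize pairwise cosine similarity between class prototypes so that the prototypes are pushed apart.

```python
import torch
import torch.nn.functional as F

def prototype_diversity_loss(prototypes: torch.Tensor) -> torch.Tensor:
    """prototypes: (num_classes, dim) learnable class prototypes."""
    p = F.normalize(prototypes, dim=-1)
    sim = p @ p.t()                                      # pairwise cosine similarity
    off_diag = sim - torch.eye(len(p), device=p.device)  # drop self-similarity
    return off_diag.clamp(min=0).sum() / (len(p) * (len(p) - 1))

# 52 classes chosen to match MA-52; the dimension is illustrative.
loss = prototype_diversity_loss(torch.randn(52, 256, requires_grad=True))
loss.backward()
```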
1 code implementation • 18 Dec 2024 • Shengeng Tang, Jiayi He, Dan Guo, Yanyan Wei, Feng Li, Richang Hong
Sign-IDD incorporates a novel Iconicity Disentanglement (ID) module to bridge the gap between relative positions among joints.
1 code implementation • 17 Dec 2024 • Ziheng Zhou, Jinxing Zhou, Wei Qian, Shengeng Tang, Xiaojun Chang, Dan Guo
Specifically, the CMCC module contains two branches: a cross-modal interaction branch and a temporal consistency-gated branch.
1 code implementation • 17 Dec 2024 • Zhenxing Zhang, Yaxiong Wang, Lechao Cheng, Zhun Zhong, Dan Guo, Meng Wang
We present ASAP, a new framework for detecting and grounding multi-modal media manipulation (DGM4). Upon thorough examination, we observe that accurate fine-grained cross-modal semantic alignment between the image and text is vital for accurate manipulation detection and grounding.
no code implementations • 15 Dec 2024 • Pengcheng Zhao, Jinxing Zhou, Yang Zhao, Dan Guo, Yanxiang Chen
However, each segment may contain multiple events, resulting in semantically mixed holistic features that can lead to semantic interference during intra- or cross-modal interactions: the event semantics of one segment may incorporate semantics of unrelated events from other segments.
no code implementations • 14 Dec 2024 • Zhangbin Li, Jinxing Zhou, Jing Zhang, Shengeng Tang, Kun Li, Dan Guo
The M-KPT and S-KPT modules are performed in parallel for each temporal segment, allowing balanced tracking of salient and sounding objects.
no code implementations • 10 Dec 2024 • Kun Li, Xinge Peng, Dan Guo, Xun Yang, Meng Wang
Repetitive Action Counting (RAC) aims to count the number of repetitive actions occurring in videos.
Ranked #4 on Repetitive Action Counting on RepCount
no code implementations • 10 Dec 2024 • Wan Jiang, He Wang, Xin Zhang, Dan Guo, Zhaoxin Fan, Yunfeng Diao, Richang Hong
To fill this gap, we first examine the current 'gold standard' in Machine Unlearning (MU), i.e., re-training the model after removing the undesirable training data, and find it does not work in SGMs.
no code implementations • 30 Nov 2024 • Feiyang Liu, Dan Guo, Jingyuan Xu, Zihao He, Shengeng Tang, Kun Li, Meng Wang
Following the gaze of other people and analyzing the target they are looking at can help us understand what they are thinking and doing, and predict the actions that may follow.
no code implementations • 25 Nov 2024 • Shengeng Tang, Jiayi He, Lechao Cheng, Jingjing Wu, Dan Guo, Richang Hong
To address this, we propose a novel framework, Sign-D2C, that employs a conditional diffusion model to synthesize contextually smooth transition frames, enabling the seamless construction of continuous sign language sequences.
1 code implementation • 18 Nov 2024 • Jinxing Zhou, Dan Guo, Ruohao Guo, Yuxin Mao, Jingjing Hu, Yiran Zhong, Xiaojun Chang, Meng Wang
In this paper, we advance the field by introducing the Open-Vocabulary Audio-Visual Event Localization (OV-AVEL) problem, which requires localizing audio-visual events and predicting explicit categories for both seen and unseen data at inference.
no code implementations • 8 Oct 2024 • You Qin, Wei Ji, Xinze Lan, Hao Fei, Xun Yang, Dan Guo, Roger Zimmermann, Lizi Liao
In the realm of video dialog response generation, the understanding of video content and the temporal nuances of conversation history are paramount.
1 code implementation • 22 Sep 2024 • Sheng Zhou, Junbin Xiao, Xun Yang, Peipei Song, Dan Guo, Angela Yao, Meng Wang, Tat-Seng Chua
In this paper, we propose to study Grounded TextVideoQA by forcing models to answer questions and spatio-temporally localize the relevant scene-text regions, thus decoupling QA from scene-text recognition and promoting research towards interpretable QA.
no code implementations • 6 Aug 2024 • Guoliang Chen, Fei Wang, Kun Li, Zhiliang Wu, Hehe Fan, Yi Yang, Meng Wang, Dan Guo
In this paper, we briefly introduce the solution developed by our team, HFUT-VUT, for the track of Micro-gesture Classification in the MiGA challenge at IJCAI 2024.
no code implementations • 11 Jul 2024 • Jinxing Zhou, Dan Guo, Yuxin Mao, Yiran Zhong, Xiaojun Chang, Meng Wang
The Audio-Visual Video Parsing (AVVP) task aims to detect and temporally localize events within the audio and visual modalities.
1 code implementation • 8 Jul 2024 • Jinpeng Hu, Tengteng Dong, Luo Gang, Hui Ma, Peng Zou, Xiao Sun, Dan Guo, Xun Yang, Meng Wang
Additionally, to compare the performance of PsycoLLM with other LLMs, we develop a comprehensive psychological benchmark based on authoritative psychological counseling examinations in China, which includes assessments of professional ethics, theoretical proficiency, and case analysis.
1 code implementation • 7 Jul 2024 • Kun Li, Pengyu Liu, Dan Guo, Fei Wang, Zhiliang Wu, Hehe Fan, Meng Wang
This paper specifically focuses on a subset of body actions known as micro-actions, which are subtle, low-intensity body movements with promising applications in human emotion analysis.
no code implementations • 5 Jul 2024 • Pengyu Liu, Fei Wang, Kun Li, Guoliang Chen, Yanyan Wei, Shengeng Tang, Zhiliang Wu, Dan Guo
The Micro-gesture Online Recognition task involves identifying the category and locating the start and end times of micro-gestures in video clips.
no code implementations • 7 Jun 2024 • Wei Qian, Qi Li, Kun Li, Xinke Wang, Xiao Sun, Meng Wang, Dan Guo
This paper briefly introduces the solutions developed by our team, HFUT-VUT, for Track 1 of self-supervised heart rate measurement in the 3rd Vision-based Remote Physiological Signal Sensing (RePSS) Challenge hosted at IJCAI 2024.
1 code implementation • 3 Jun 2024 • Jinxing Zhou, Dan Guo, Yiran Zhong, Meng Wang
The Audio-Visual Video Parsing task aims to identify and temporally localize the events that occur in either or both the audio and visual streams of audible videos.
3 code implementations • 16 Apr 2024 • Bin Ren, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, Wei Zhai, Renjing Pei, Jiaming Guo, Songcen Xu, Yang Cao, ZhengJun Zha, Yan Wang, Yi Liu, Qing Wang, Gang Zhang, Liou Zhang, Shijie Zhao, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Xin Liu, Min Yan, Menghan Zhou, Yiqiang Yan, Yixuan Liu, Wensong Chan, Dehua Tang, Dong Zhou, Li Wang, Lu Tian, Barsoum Emad, Bohan Jia, Junbo Qiao, Yunshuai Zhou, Yun Zhang, Wei Li, Shaohui Lin, Shenglong Zhou, Binbin Chen, Jincheng Liao, Suiyi Zhao, Zhao Zhang, Bo wang, Yan Luo, Yanyan Wei, Feng Li, Mingshen Wang, Yawei Li, Jinhan Guan, Dehua Hu, Jiawei Yu, Qisheng Xu, Tao Sun, Long Lan, Kele Xu, Xin Lin, Jingtong Yue, Lehan Yang, Shiyi Du, Lu Qi, Chao Ren, Zeyu Han, YuHan Wang, Chaolin Chen, Haobo Li, Mingjun Zheng, Zhongbao Yang, Lianhong Song, Xingzhuo Yan, Minghan Fu, Jingyi Zhang, Baiang Li, Qi Zhu, Xiaogang Xu, Dan Guo, Chunle Guo, Jiadi Chen, Huanhuan Long, Chunjiang Duanmu, Xiaoyan Lei, Jie Liu, Weilin Jia, Weifeng Cao, Wenlong Zhang, Yanyu Mao, Ruilong Guo, Nihao Zhang, Qian Wang, Manoj Pandey, Maksym Chernozhukov, Giang Le, Shuli Cheng, Hongyuan Wang, Ziyan Wei, Qingting Tang, Liejun Wang, Yongming Li, Yanhui Guo, Hao Xu, Akram Khatami-Rizi, Ahmad Mahmoudi-Aznaveh, Chih-Chung Hsu, Chia-Ming Lee, Yi-Shiuan Chou, Amogh Joshi, Nikhil Akalwadi, Sampada Malagi, Palani Yashaswini, Chaitra Desai, Ramesh Ashok Tabib, Ujwala Patil, Uma Mudenagudi
In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking.
1 code implementation • 21 Mar 2024 • Jingjing Hu, Dan Guo, Kun Li, Zhan Si, Xun Yang, Xiaojun Chang, Meng Wang
Inspired by the activity-silent and persistent activity mechanisms in human visual perception biology, we design a Unified Static and Dynamic Network (UniSDNet), to learn the semantic association between the video and text/audio queries in a cross-modal environment for efficient video grounding.
2 code implementations • 17 Mar 2024 • Jing Zhang, Liang Zheng, Meng Wang, Dan Guo
This paper develops small vision language models to understand visual art, which, given an artwork, aims to identify its emotion category and explain this prediction in natural language.
2 code implementations • CVPR 2024 • Fei Wang, Dan Guo, Kun Li, Zhun Zhong, Meng Wang
To this end, we present FD4MM, a new paradigm of Frequency Decoupling for Motion Magnification with a Multi-level Isomorphic Architecture to capture multi-level high-frequency details and a stable low-frequency structure (motion field) in video space.
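In the same spirit as the frequency decoupling described above (a loose PyTorch sketch, not FD4MM itself): split the inter-frame motion signal into a low-frequency structure and high-frequency details via pooling, and magnify only the detail component while keeping the structure stable.

```python
import torch
import torch.nn.functional as F

def decoupled_magnify(prev: torch.Tensor, cur: torch.Tensor, alpha: float = 5.0):
    """prev, cur: (B, C, H, W) consecutive frame features; alpha: magnification."""
    motion = cur - prev                               # inter-frame motion signal
    low = F.interpolate(F.avg_pool2d(motion, 4), scale_factor=4,
                        mode="bilinear", align_corners=False)  # low-frequency part
    high = motion - low                               # high-frequency details
    return cur + alpha * high                         # magnify details only

out = decoupled_magnify(torch.randn(2, 64, 64, 64), torch.randn(2, 64, 64, 64))
```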
1 code implementation • 8 Mar 2024 • Dan Guo, Kun Li, Bin Hu, Yan Zhang, Meng Wang
It offers insights into the feelings and intentions of individuals and is important for human-oriented applications such as emotion recognition and psychological assessment.
Ranked #2 on Micro-Action Recognition on MA-52
no code implementations • CVPR 2024 • Chunxiao Fan, Ziqi Wang, Dan Guo, Meng Wang
Quantization for model compression can efficiently reduce network complexity and storage requirements, but the original training data is necessary to remedy the performance loss caused by quantization.
1 code implementation • 20 Dec 2023 • Zhangbin Li, Dan Guo, Jinxing Zhou, Jing Zhang, Meng Wang
These selected pairs are constrained to have larger similarity values than the mismatched pairs.
Audio-visual Question Answering
Audio-Visual Question Answering (AVQA)
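The pair constraint above can be read as a standard margin ranking objective; a minimal PyTorch sketch, with the margin value and names purely illustrative:

```python
import torch
import torch.nn.functional as F

def pair_margin_loss(matched_sim: torch.Tensor,
                     mismatched_sim: torch.Tensor,
                     margin: float = 0.2) -> torch.Tensor:
    """matched_sim, mismatched_sim: (N,) similarity scores per pair.
    Pushes matched pairs to score at least `margin` above mismatched pairs."""
    return F.relu(margin + mismatched_sim - matched_sim).mean()

loss = pair_margin_loss(torch.rand(8), torch.rand(8))
```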
2 code implementations • 7 Dec 2023 • Fei Wang, Dan Guo, Kun Li, Meng Wang
Then, we introduce a novel dynamic filter that eliminates noise cues and preserves critical features in the motion magnification and amplification generation phases.
no code implementations • 13 Oct 2023 • Sheng Zhou, Dan Guo, Jia Li, Xun Yang, Meng Wang
The associations between these repetitive objects are superfluous for answer reasoning; (2) two spatially distant OCR tokens detected in the image frequently have weak semantic dependencies for answer reasoning; and (3) the co-existence of nearby objects and tokens may be indicative of important visual cues for predicting answers.
1 code implementation • 12 Sep 2023 • Jiaxiu Li, Kun Li, Jia Li, Guoliang Chen, Dan Guo, Meng Wang
Compared with the general video grounding task, MTVG focuses on meticulous actions and changes on the face.
no code implementations • 25 Aug 2023 • Jia Li, Wei Qian, Kun Li, Qi Li, Dan Guo, Meng Wang
Specifically, we achieve results of 0.8492 and 0.8439 for MuSe-Personalisation in terms of arousal and valence CCC.
1 code implementation • 15 Aug 2023 • Wei Qian, Dan Guo, Kun Li, Xilan Tian, Meng Wang
Specifically, the proposed Dual-TL uses a Spatial TokenLearner (S-TL) to explore associations among different facial ROIs, which keeps the rPPG prediction robust to noisy ROI disturbances.
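A minimal PyTorch sketch of a TokenLearner-style spatial module of the kind described above: learned attention maps pool the facial feature map into a few region tokens. This follows the generic TokenLearner design and is not necessarily the paper's exact S-TL.

```python
import torch
import torch.nn as nn

class SpatialTokenLearner(nn.Module):
    def __init__(self, channels: int, num_tokens: int = 4):
        super().__init__()
        # One 1x1 conv produces `num_tokens` spatial attention maps.
        self.attn = nn.Conv2d(channels, num_tokens, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (B, C, H, W) -> tokens: (B, num_tokens, C)."""
        a = self.attn(x).flatten(2).softmax(dim=-1)   # (B, M, H*W) spatial weights
        v = x.flatten(2)                              # (B, C, H*W)
        return torch.einsum("bmn,bcn->bmc", a, v)     # weighted spatial pooling

tokens = SpatialTokenLearner(64)(torch.randn(2, 64, 16, 16))  # -> (2, 4, 64)
```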
no code implementations • 11 Aug 2023 • Yen Nhi Truong Vu, Dan Guo, Ahmed Taha, Jason Su, Thomas Paul Matthews
Deep-learning-based object detection methods show promise for improving screening mammography, but high rates of false positives can hinder their effectiveness in clinical practice.
no code implementations • 11 Aug 2023 • Kun Li, Dan Guo, Meng Wang
First, we employed a shared feature encoder to project both video and query into a joint feature space before performing cross-modal co-attention (i.e., video-to-query attention and query-to-video attention) to highlight discriminative features in each modality.
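A minimal PyTorch sketch of the co-attention step described above, assuming both modalities were already projected into a joint space by a shared encoder; all dimensions and names are illustrative.

```python
import torch
import torch.nn as nn

dim = 256
v2q = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
q2v = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

video = torch.randn(2, 128, dim)   # (batch, num_clips, dim), shared encoder output
query = torch.randn(2, 20, dim)    # (batch, num_words, dim), shared encoder output

video_attended, _ = v2q(video, query, query)   # video-to-query attention
query_attended, _ = q2v(query, video, video)   # query-to-video attention
```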
no code implementations • 3 Aug 2023 • Kun Li, Dan Guo, Guoliang Chen, Feiyang Liu, Meng Wang
In this paper, we present the solution of our team HFUT-VUT for the MultiMediate Grand Challenge 2023 at ACM Multimedia 2023.
1 code implementation • 20 Jul 2023 • Kun Li, Dan Guo, Guoliang Chen, Xinge Peng, Meng Wang
In this paper, we briefly introduce the solution of our team HFUT-VUT for the Micro-gesture Classification track in the MiGA challenge at IJCAI 2023.
Ranked #1 on Micro-gesture Recognition on iMiGUE
no code implementations • 4 Mar 2023 • Jinxing Zhou, Dan Guo, Yiran Zhong, Meng Wang
We perform extensive experiments on the LLP dataset and demonstrate that our method can generate high-quality segment-level pseudo labels with the help of our newly proposed loss and the label denoising strategy.
1 code implementation • 30 Jan 2023 • Jinxing Zhou, Xuyang Shen, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong
To deal with these problems, we propose a new baseline method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.
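A hedged PyTorch sketch of a temporal pixel-wise audio-visual interaction of the kind described above: per-frame pixel features attend to that frame's audio tokens, injecting audio semantics into the visual features. This is an illustrative rendering, not the authors' module.

```python
import torch
import torch.nn as nn

class PixelAudioInteraction(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, vis: torch.Tensor, aud: torch.Tensor) -> torch.Tensor:
        """vis: (B*T, C, H, W) frame features; aud: (B*T, L, C) audio tokens."""
        bt, c, h, w = vis.shape
        q = vis.flatten(2).transpose(1, 2)     # (B*T, H*W, C) pixel queries
        out, _ = self.attn(q, aud, aud)        # each pixel attends to the audio
        return (q + out).transpose(1, 2).view(bt, c, h, w)  # residual fusion

fused = PixelAudioInteraction(64)(torch.randn(4, 64, 14, 14), torch.randn(4, 1, 64))
```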
no code implementations • TMM 2022 • Zhao Xie, Jiansong Chen, Kewei Wu, Dan Guo, Richang Hong
In the global aggregation module, the global prior knowledge is learned by aggregating the visual feature sequence of video into a global vector.
Ranked #62 on Action Recognition on Something-Something V2
1 code implementation • 18 Nov 2022 • Jinxing Zhou, Dan Guo, Meng Wang
Visual and audio signals often coexist in natural environments, forming audio-visual events (AVEs).
1 code implementation • 14 Oct 2022 • Kang Liu, Feng Xue, Dan Guo, Le Wu, Shujie Li, Richang Hong
This paper aims at solving the mismatch problem between MFE and UIM, so as to generate high-quality embedding representations and better model multimodal user preferences.
1 code implementation • 10 Oct 2022 • Kang Liu, Feng Xue, Xiangnan He, Dan Guo, Richang Hong
In this work, we propose to model multi-grained popularity features and jointly learn them together with high-order connectivity, to match the differentiation of user preferences exhibited in popularity features.
no code implementations • 22 Jul 2022 • Jia Li, Jiantao Nie, Dan Guo, Richang Hong, Meng Wang
PF-ViT aims to separate and recognize the disturbance-agnostic emotion from a static facial image via generating its corresponding poker face, without the need for paired images.
Ranked #7 on Facial Expression Recognition (FER) on FER+
2 code implementations • 11 Jul 2022 • Jinxing Zhou, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong
To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.
no code implementations • 14 Sep 2020 • Yijue Wang, Jieren Deng, Dan Guo, Chenghong Wang, Xianrui Meng, Hang Liu, Caiwen Ding, Sanguthevar Rajasekaran
Distributed learning such as federated learning or collaborative learning enables model training on decentralized data from users and only collects local gradients, where data is processed close to its sources for data privacy.
no code implementations • 24 Jun 2020 • Dan Guo, Yang Wang, Peipei Song, Meng Wang
Unsupervised image captioning with no annotations is an emerging challenge in computer vision, where existing methods usually adopt GAN (Generative Adversarial Network) models.
1 code implementation • CVPR 2020 • Dan Guo, Hui Wang, Hanwang Zhang, Zheng-Jun Zha, Meng Wang
Visual dialog is a challenging task that requires the comprehension of the semantic dependencies among implicit visual and textual contexts.
Ranked #12 on Visual Dialog on VisDial v0.9 val