Search Results for author: Dan Guo

Found 54 papers, 29 papers with code

A Label-Aware Autoregressive Framework for Cross-Domain NER

1 code implementation Findings (NAACL) 2022 Jinpeng Hu, He Zhao, Dan Guo, Xiang Wan, Tsung-Hui Chang

In doing so, label information contained in the embedding vectors can be effectively transferred to the target domain, and the Bi-LSTM can further model label relationships across domains via a pre-train then fine-tune setting.

Cross-Domain Named Entity Recognition named-entity-recognition +2

Towards Efficient Partially Relevant Video Retrieval with Active Moment Discovering

1 code implementation 15 Apr 2025 Peipei Song, Long Zhang, Long Lan, Weidong Chen, Dan Guo, Xun Yang, Meng Wang

Partially relevant video retrieval (PRVR) is a practical yet challenging task in text-to-video retrieval, where videos are untrimmed and contain much background content.

Partially Relevant Video Retrieval Retrieval +1

A Survey on fMRI-based Brain Decoding for Reconstructing Multimodal Stimuli

no code implementations 20 Mar 2025 Pengyu Liu, Guohua Dong, Dan Guo, Kun Li, Fengling Li, Xun Yang, Meng Wang, Xiaomin Ying

This survey systematically reviews recent progress in fMRI-based brain decoding, focusing on stimulus reconstruction from passive brain signals.

Brain Decoding Image Generation +1

AugRefer: Advancing 3D Visual Grounding via Cross-Modal Augmentation and Spatial Relation-based Referring

no code implementations 16 Jan 2025 Xinyi Wang, Na Zhao, Zhiyuan Han, Dan Guo, Xun Yang

3D visual grounding (3DVG), which aims to correlate a natural language description with the target object within a 3D scene, is a significant yet challenging task.

3D visual grounding Decoder +2

Linguistics-Vision Monotonic Consistent Network for Sign Language Production

no code implementations 22 Dec 2024 Xu Wang, Shengeng Tang, Peipei Song, Shuo Wang, Dan Guo, Richang Hong

Sign Language Production (SLP) aims to generate sign videos corresponding to spoken language sentences, where the conversion of sign Glosses to Poses (G2P) is the key step.

Sign Language Production

MOL-Mamba: Enhancing Molecular Representation with Structural & Electronic Insights

1 code implementation 21 Dec 2024 Jingjing Hu, Dan Guo, Zhan Si, Deguang Liu, Yunfeng Diao, Jing Zhang, Jinxing Zhou, Meng Wang

Molecular representation learning plays a crucial role in various downstream tasks, such as molecular property prediction and drug design.

Drug Design Mamba +4

Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition

1 code implementation 19 Dec 2024 Kun Li, Dan Guo, Guoliang Chen, Chunxiao Fan, Jingyuan Xu, Zhiliang Wu, Hehe Fan, Meng Wang

In addition, we propose a new prototypical diversity amplification loss to strengthen the model's capacity by amplifying the differences between different prototypes.

Emotion Recognition Micro-Action Recognition

Sign-IDD: Iconicity Disentangled Diffusion for Sign Language Production

1 code implementation 18 Dec 2024 Shengeng Tang, Jiayi He, Dan Guo, Yanyan Wei, Feng Li, Richang Hong

Sign-IDD incorporates a novel Iconicity Disentanglement (ID) module to bridge the gap between relative positions among joints.

Attribute Disentanglement +1

ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding

1 code implementation 17 Dec 2024 Zhenxing Zhang, Yaxiong Wang, Lechao Cheng, Zhun Zhong, Dan Guo, Meng Wang

We present ASAP, a new framework for detecting and grounding multi-modal media manipulation (DGM4). Upon thorough examination, we observe that accurate fine-grained cross-modal semantic alignment between the image and text is vital for accurate manipulation detection and grounding.

cross-modal alignment

Multimodal Class-aware Semantic Enhancement Network for Audio-Visual Video Parsing

no code implementations 15 Dec 2024 Pengcheng Zhao, Jinxing Zhou, Yang Zhao, Dan Guo, Yanxiang Chen

However, each segment may contain multiple events, resulting in semantically mixed holistic features that can lead to semantic interference during intra- or cross-modal interactions: the event semantics of one segment may incorporate semantics of unrelated events from other segments.

Patch-level Sounding Object Tracking for Audio-Visual Question Answering

no code implementations 14 Dec 2024 Zhangbin Li, Jinxing Zhou, Jing Zhang, Shengeng Tang, Kun Li, Dan Guo

The M-KPT and S-KPT modules are performed in parallel for each temporal segment, allowing balanced tracking of salient and sounding objects.

Audio-visual Question Answering Object Tracking +2

Moderating the Generalization of Score-based Generative Model

no code implementations 10 Dec 2024 Wan Jiang, He Wang, Xin Zhang, Dan Guo, Zhaoxin Fan, Yunfeng Diao, Richang Hong

To fill this gap, we first examine the current 'gold standard' in Machine Unlearning (MU), i.e., retraining the model after removing the undesirable training data, and find that it does not work in SGMs.

Image Inpainting Machine Unlearning +1

Towards Pixel-Level Prediction for Gaze Following: Benchmark and Approach

no code implementations 30 Nov 2024 Feiyang Liu, Dan Guo, Jingyuan Xu, Zihao He, Shengeng Tang, Kun Li, Meng Wang

Following the gaze of other people and analyzing the target they are looking at can help us understand what they are thinking and doing, and predict the actions that may follow.

Segmentation

Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observation

no code implementations 25 Nov 2024 Shengeng Tang, Jiayi He, Lechao Cheng, Jingjing Wu, Dan Guo, Richang Hong

To address this, we propose a novel framework, Sign-D2C, that employs a conditional diffusion model to synthesize contextually smooth transition frames, enabling the seamless construction of continuous sign language sequences.

Denoising

Towards Open-Vocabulary Audio-Visual Event Localization

1 code implementation 18 Nov 2024 Jinxing Zhou, Dan Guo, Ruohao Guo, Yuxin Mao, Jingjing Hu, Yiran Zhong, Xiaojun Chang, Meng Wang

In this paper, we advance the field by introducing the Open-Vocabulary Audio-Visual Event Localization (OV-AVEL) problem, which requires localizing audio-visual events and predicting explicit categories for both seen and unseen data at inference.

audio-visual event localization

Grounding is All You Need? Dual Temporal Grounding for Video Dialog

no code implementations 8 Oct 2024 You Qin, Wei Ji, Xinze Lan, Hao Fei, Xun Yang, Dan Guo, Roger Zimmermann, Lizi Liao

In the realm of video dialog response generation, the understanding of video content and the temporal nuances of conversation history are paramount.

All Contrastive Learning +1

Scene-Text Grounding for Text-Based Video Question Answering

1 code implementation 22 Sep 2024 Sheng Zhou, Junbin Xiao, Xun Yang, Peipei Song, Dan Guo, Angela Yao, Meng Wang, Tat-Seng Chua

In this paper, we propose to study Grounded TextVideoQA by forcing models to answer questions and spatio-temporally localize the relevant scene-text regions, thus decoupling QA from scene-text recognition and promoting research towards interpretable QA.

2k Contrastive Learning +3

Prototype Learning for Micro-gesture Classification

no code implementations 6 Aug 2024 Guoliang Chen, Fei Wang, Kun Li, Zhiliang Wu, Hehe Fan, Yi Yang, Meng Wang, Dan Guo

In this paper, we briefly introduce the solution developed by our team, HFUT-VUT, for the track of Micro-gesture Classification in the MiGA challenge at IJCAI 2024.

Action Recognition Classification +2

PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation

1 code implementation 8 Jul 2024 Jinpeng Hu, Tengteng Dong, Luo Gang, Hui Ma, Peng Zou, Xiao Sun, Dan Guo, Xun Yang, Meng Wang

Additionally, to compare the performance of PsycoLLM with other LLMs, we develop a comprehensive psychological benchmark based on authoritative psychological counseling examinations in China, which includes assessments of professional ethics, theoretical proficiency, and case analysis.

Ethics Language Modeling +2

MMAD: Multi-label Micro-Action Detection in Videos

1 code implementation 7 Jul 2024 Kun Li, Pengyu Liu, Dan Guo, Fei Wang, Zhiliang Wu, Hehe Fan, Meng Wang

This paper specifically focuses on a subset of body actions known as micro-actions, which are subtle, low-intensity body movements with promising applications in human emotion analysis.

Action Analysis Action Detection +2

Micro-gesture Online Recognition using Learnable Query Points

no code implementations 5 Jul 2024 Pengyu Liu, Fei Wang, Kun Li, Guoliang Chen, Yanyan Wei, Shengeng Tang, Zhiliang Wu, Dan Guo

The Micro-gesture Online Recognition task involves identifying the category and locating the start and end times of micro-gestures in video clips.

Action Detection

Joint Spatial-Temporal Modeling and Contrastive Learning for Self-supervised Heart Rate Measurement

no code implementations 7 Jun 2024 Wei Qian, Qi Li, Kun Li, Xinke Wang, Xiao Sun, Meng Wang, Dan Guo

This paper briefly introduces the solutions developed by our team, HFUT-VUT, for Track 1 of self-supervised heart rate measurement in the 3rd Vision-based Remote Physiological Signal Sensing (RePSS) Challenge hosted at IJCAI 2024.

Contrastive Learning Self-Supervised Learning

Advancing Weakly-Supervised Audio-Visual Video Parsing via Segment-wise Pseudo Labeling

1 code implementation 3 Jun 2024 Jinxing Zhou, Dan Guo, Yiran Zhong, Meng Wang

The Audio-Visual Video Parsing task aims to identify and temporally localize the events that occur in either or both the audio and visual streams of audible videos.

audio-visual event localization Denoising +1

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

3 code implementations 16 Apr 2024 Bin Ren, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, Wei Zhai, Renjing Pei, Jiaming Guo, Songcen Xu, Yang Cao, ZhengJun Zha, Yan Wang, Yi Liu, Qing Wang, Gang Zhang, Liou Zhang, Shijie Zhao, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Xin Liu, Min Yan, Menghan Zhou, Yiqiang Yan, Yixuan Liu, Wensong Chan, Dehua Tang, Dong Zhou, Li Wang, Lu Tian, Barsoum Emad, Bohan Jia, Junbo Qiao, Yunshuai Zhou, Yun Zhang, Wei Li, Shaohui Lin, Shenglong Zhou, Binbin Chen, Jincheng Liao, Suiyi Zhao, Zhao Zhang, Bo wang, Yan Luo, Yanyan Wei, Feng Li, Mingshen Wang, Yawei Li, Jinhan Guan, Dehua Hu, Jiawei Yu, Qisheng Xu, Tao Sun, Long Lan, Kele Xu, Xin Lin, Jingtong Yue, Lehan Yang, Shiyi Du, Lu Qi, Chao Ren, Zeyu Han, YuHan Wang, Chaolin Chen, Haobo Li, Mingjun Zheng, Zhongbao Yang, Lianhong Song, Xingzhuo Yan, Minghan Fu, Jingyi Zhang, Baiang Li, Qi Zhu, Xiaogang Xu, Dan Guo, Chunle Guo, Jiadi Chen, Huanhuan Long, Chunjiang Duanmu, Xiaoyan Lei, Jie Liu, Weilin Jia, Weifeng Cao, Wenlong Zhang, Yanyu Mao, Ruilong Guo, Nihao Zhang, Qian Wang, Manoj Pandey, Maksym Chernozhukov, Giang Le, Shuli Cheng, Hongyuan Wang, Ziyan Wei, Qingting Tang, Liejun Wang, Yongming Li, Yanhui Guo, Hao Xu, Akram Khatami-Rizi, Ahmad Mahmoudi-Aznaveh, Chih-Chung Hsu, Chia-Ming Lee, Yi-Shiuan Chou, Amogh Joshi, Nikhil Akalwadi, Sampada Malagi, Palani Yashaswini, Chaitra Desai, Ramesh Ashok Tabib, Ujwala Patil, Uma Mudenagudi

In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking.

Image Super-Resolution

Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding

1 code implementation 21 Mar 2024 Jingjing Hu, Dan Guo, Kun Li, Zhan Si, Xun Yang, Xiaojun Chang, Meng Wang

Inspired by the activity-silent and persistent activity mechanisms in human visual perception biology, we design a Unified Static and Dynamic Network (UniSDNet), to learn the semantic association between the video and text/audio queries in a cross-modal environment for efficient video grounding.

Video Grounding

Training A Small Emotional Vision Language Model for Visual Art Comprehension

2 code implementations 17 Mar 2024 Jing Zhang, Liang Zheng, Meng Wang, Dan Guo

This paper develops small vision language models to understand visual art, which, given an art work, aims to identify its emotion category and explain this prediction with natural language.

Language Modeling Language Modelling

Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture

2 code implementations CVPR 2024 Fei Wang, Dan Guo, Kun Li, Zhun Zhong, Meng Wang

To this end, we present FD4MM, a new paradigm of Frequency Decoupling for Motion Magnification with a Multi-level Isomorphic Architecture to capture multi-level high-frequency details and a stable low-frequency structure (motion field) in video space.

Motion Magnification Representation Learning

Benchmarking Micro-action Recognition: Dataset, Methods, and Applications

1 code implementation 8 Mar 2024 Dan Guo, Kun Li, Bin Hu, Yan Zhang, Meng Wang

It offers insights into the feelings and intentions of individuals and is important for human-oriented applications such as emotion recognition and psychological assessment.

Benchmarking Micro-Action Recognition

Data-Free Quantization via Pseudo-label Filtering

no code implementations CVPR 2024 Chunxiao Fan, Ziqi Wang, Dan Guo, Meng Wang

Quantization for model compression can efficiently reduce the network complexity and storage requirement but the original training data is necessary to remedy the performance loss caused by quantization.

Data Free Quantization Model Compression +2

EulerMormer: Robust Eulerian Motion Magnification via Dynamic Filtering within Transformer

2 code implementations 7 Dec 2023 Fei Wang, Dan Guo, Kun Li, Meng Wang

Then, we introduce a novel dynamic filter that eliminates noise cues and preserves critical features in the motion magnification and amplification generation phases.

Denoising Motion Magnification

Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA

no code implementations 13 Oct 2023 Sheng Zhou, Dan Guo, Jia Li, Xun Yang, Meng Wang

The associations between these repetitive objects are superfluous for answer reasoning; (2) two spatially distant OCR tokens detected in the image frequently have weak semantic dependencies for answer reasoning; and (3) the co-existence of nearby objects and tokens may be indicative of important visual cues for predicting answers.

Graph Learning Object +6

Dual-Path Temporal Map Optimization for Make-up Temporal Video Grounding

1 code implementation 12 Sep 2023 Jiaxiu Li, Kun Li, Jia Li, Guoliang Chen, Dan Guo, Meng Wang

Compared with the general video grounding task, MTVG focuses on meticulous actions and changes on the face.

Sentence text similarity +1

Exploiting Diverse Feature for Multimodal Sentiment Analysis

no code implementations 25 Aug 2023 Jia Li, Wei Qian, Kun Li, Qi Li, Dan Guo, Meng Wang

Specifically, we achieve results of 0.8492 and 0.8439 for MuSe-Personalisation in terms of arousal and valence CCC.

Multimodal Sentiment Analysis

Dual-path TokenLearner for Remote Photoplethysmography-based Physiological Measurement with Facial Videos

1 code implementation 15 Aug 2023 Wei Qian, Dan Guo, Kun Li, Xilan Tian, Meng Wang

Specifically, the proposed Dual-TL uses a Spatial TokenLearner (S-TL) to explore associations in different facial ROIs, which keeps the rPPG prediction away from noisy ROI disturbances.

M&M: Tackling False Positives in Mammography with a Multi-view and Multi-instance Learning Sparse Detector

no code implementations 11 Aug 2023 Yen Nhi Truong Vu, Dan Guo, Ahmed Taha, Jason Su, Thomas Paul Matthews

Deep-learning-based object detection methods show promise for improving screening mammography, but high rates of false positives can hinder their effectiveness in clinical practice.

object-detection Object Detection

ViGT: Proposal-free Video Grounding with Learnable Token in Transformer

no code implementations 11 Aug 2023 Kun Li, Dan Guo, Meng Wang

First, we employ a shared feature encoder to project both video and query into a joint feature space before performing cross-modal co-attention (i.e., video-to-query attention and query-to-video attention) to highlight discriminative features in each modality.

Feature Correlation regression +1

Data Augmentation for Human Behavior Analysis in Multi-Person Conversations

no code implementations 3 Aug 2023 Kun Li, Dan Guo, Guoliang Chen, Feiyang Liu, Meng Wang

In this paper, we present the solution of our team HFUT-VUT for the MultiMediate Grand Challenge 2023 at ACM Multimedia 2023.

Joint Skeletal and Semantic Embedding Loss for Micro-gesture Classification

1 code implementation 20 Jul 2023 Kun Li, Dan Guo, Guoliang Chen, Xinge Peng, Meng Wang

In this paper, we briefly introduce the solution of our team HFUT-VUT for the Micro-gesture Classification track in the MiGA challenge at IJCAI 2023.

Action Classification Classification +2

Improving Audio-Visual Video Parsing with Pseudo Visual Labels

no code implementations 4 Mar 2023 Jinxing Zhou, Dan Guo, Yiran Zhong, Meng Wang

We perform extensive experiments on the LLP dataset and demonstrate that our method can generate high-quality segment-level pseudo labels with the help of our newly proposed loss and the label denoising strategy.

Denoising Pseudo Label

Audio-Visual Segmentation with Semantics

1 code implementation 30 Jan 2023 Jinxing Zhou, Xuyang Shen, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

To deal with these problems, we propose a new baseline method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

Segmentation Semantic Segmentation +1

Global Temporal Difference Network for Action Recognition

no code implementations TMM 2022 Zhao Xie, Jiansong Chen, Kewei Wu, Dan Guo, Richang Hong

In the global aggregation module, the global prior knowledge is learned by aggregating the visual feature sequence of video into a global vector.

Action Recognition

Contrastive Positive Sample Propagation along the Audio-Visual Event Line

1 code implementation 18 Nov 2022 Jinxing Zhou, Dan Guo, Meng Wang

Visual and audio signals often coexist in natural environments, forming audio-visual events (AVEs).

Contrastive Learning Representation Learning

MEGCF: Multimodal Entity Graph Collaborative Filtering for Personalized Recommendation

1 code implementation 14 Oct 2022 Kang Liu, Feng Xue, Dan Guo, Le Wu, Shujie Li, Richang Hong

This paper aims at solving the mismatch problem between MFE and UIM, so as to generate high-quality embedding representations and better model multimodal user preferences.

Collaborative Filtering Image Classification

Joint Multi-grained Popularity-aware Graph Convolution Collaborative Filtering for Recommendation

1 code implementation 10 Oct 2022 Kang Liu, Feng Xue, Xiangnan He, Dan Guo, Richang Hong

In this work, we propose to model multi-grained popularity features and jointly learn them together with high-order connectivity, to match the differentiation of user preferences exhibited in popularity features.

Collaborative Filtering Recommendation Systems

Emotion Separation and Recognition from a Facial Expression by Generating the Poker Face with Vision Transformers

no code implementations 22 Jul 2022 Jia Li, Jiantao Nie, Dan Guo, Richang Hong, Meng Wang

PF-ViT aims to separate and recognize the disturbance-agnostic emotion from a static facial image via generating its corresponding poker face, without the need for paired images.

Disentanglement Face Generation +2

Audio-Visual Segmentation

2 code implementations 11 Jul 2022 Jinxing Zhou, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong

To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.

Segmentation

SAPAG: A Self-Adaptive Privacy Attack From Gradients

no code implementations 14 Sep 2020 Yijue Wang, Jieren Deng, Dan Guo, Chenghong Wang, Xianrui Meng, Hang Liu, Caiwen Ding, Sanguthevar Rajasekaran

Distributed learning such as federated learning or collaborative learning enables model training on decentralized data from users and only collects local gradients, where data is processed close to its sources for data privacy.

Federated Learning Reconstruction Attack

Recurrent Relational Memory Network for Unsupervised Image Captioning

no code implementations 24 Jun 2020 Dan Guo, Yang Wang, Peipei Song, Meng Wang

Unsupervised image captioning with no annotations is an emerging challenge in computer vision, where the existing arts usually adopt GAN (Generative Adversarial Networks) models.

Computational Efficiency Image Captioning +2

Iterative Context-Aware Graph Inference for Visual Dialog

1 code implementation CVPR 2020 Dan Guo, Hui Wang, Hanwang Zhang, Zheng-Jun Zha, Meng Wang

Visual dialog is a challenging task that requires the comprehension of the semantic dependencies among implicit visual and textual contexts.

Graph Attention Graph Embedding +2
