Search Results for author: Daizong Liu

Found 39 papers, 11 papers with code

Learning to Focus on the Foreground for Temporal Sentence Grounding

no code implementations COLING 2022 Daizong Liu, Wei Hu

Then, we develop a self-supervised coarse-to-fine paradigm to learn to locate the most query-relevant patch in each frame and aggregate them among the video for final grounding.

Sentence Temporal Sentence Grounding

A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends

1 code implementation 10 Jul 2024 Daizong Liu, Mingyu Yang, Xiaoye Qu, Pan Zhou, Yu Cheng, Wei Hu

Compared to traditional Large Language Models (LLMs), LVLMs present great potential and challenges due to their closer proximity to multi-resource real-world applications and the complexity of multi-modal processing.

Data Poisoning

A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions

1 code implementation 9 Jun 2024 Daizong Liu, Yang Liu, Wencan Huang, Wei Hu

In this survey, we attempt to provide a comprehensive overview of the T-3DVG progress, including its fundamental elements, recent research advances, and future research directions.

3D visual grounding

Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding

1 code implementation 6 Nov 2023 Shengkai Sun, Daizong Liu, Jianfeng Dong, Xiaoye Qu, Junyu Gao, Xun Yang, Xun Wang, Meng Wang

In this manner, our framework is able to learn the unified representations of uni-modal or multi-modal skeleton input, which is flexible to different kinds of modality input for robust action understanding in practical cases.

Action Understanding Representation Learning +1

Dense Object Grounding in 3D Scenes

no code implementations 5 Sep 2023 Wencan Huang, Daizong Liu, Wei Hu

Localizing objects in 3D scenes according to the semantics of a given natural language is a fundamental yet important task in the field of multimedia understanding, which benefits various real-world applications such as robotics and autonomous driving.

Autonomous Driving Decoder +2

3DHacker: Spectrum-based Decision Boundary Generation for Hard-label 3D Point Cloud Attack

no code implementations ICCV 2023 Yunbo Tao, Daizong Liu, Pan Zhou, Yulai Xie, Wei Du, Wei Hu

With the maturity of depth sensors, the vulnerability of 3D point cloud models has received increasing attention in various applications such as autonomous driving and robot navigation.

Autonomous Driving Robot Navigation

From Region to Patch: Attribute-Aware Foreground-Background Contrastive Learning for Fine-Grained Fashion Retrieval

1 code implementation 17 May 2023 Jianfeng Dong, Xiaoman Peng, Zhe Ma, Daizong Liu, Xiaoye Qu, Xun Yang, Jixiang Zhu, Baolong Liu

As the attribute-specific similarity typically corresponds to the specific subtle regions of images, we propose a Region-to-Patch Framework (RPF) that consists of a region-aware branch and a patch-aware branch to extract fine-grained attribute-related visual features for precise retrieval in a coarse-to-fine manner.

Attribute Contrastive Learning +2

Density-Insensitive Unsupervised Domain Adaption on 3D Object Detection

1 code implementation CVPR 2023 Qianjiang Hu, Daizong Liu, Wei Hu

Recently, a few works have attempted to tackle the domain gap in objects, but they still fail to adapt to the gap of varying beam densities between two domains, which is critical for mitigating the characteristic differences of the LiDAR collectors.

3D Object Detection Attribute +4

You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos

no code implementations CVPR 2023 Xiang Fang, Daizong Liu, Pan Zhou, Guoshun Nan

To handle the raw video bit-stream input, we propose a novel Three-branch Compressed-domain Spatial-temporal Fusion (TCSF) framework, which extracts and aggregates three kinds of low-level visual features (I-frame, motion vector and residual features) for effective and efficient grounding.

Sentence Temporal Sentence Grounding

Jointly Visual- and Semantic-Aware Graph Memory Networks for Temporal Sentence Localization in Videos

no code implementations 2 Mar 2023 Daizong Liu, Pan Zhou

Temporal sentence localization in videos (TSLV) aims to retrieve the most interested segment in an untrimmed video according to a given sentence query.

Representation Learning Sentence +1

Tracking Objects and Activities with Attention for Temporal Sentence Grounding

no code implementations 21 Feb 2023 Zeyu Xiong, Daizong Liu, Pan Zhou, Jiahao Zhu

Temporal sentence grounding (TSG) aims to localize the temporal segment which is semantically aligned with a natural language query in an untrimmed video. Most existing methods extract frame-grained features or object-grained features by 3D ConvNet or detection network under a conventional TSG framework, failing to capture the subtle differences between frames or to model the spatio-temporal behavior of core persons/objects.

Sentence Temporal Sentence Grounding

Hypotheses Tree Building for One-Shot Temporal Sentence Localization

no code implementations 5 Jan 2023 Daizong Liu, Xiang Fang, Pan Zhou, Xing Di, Weining Lu, Yu Cheng

Given an untrimmed video, temporal sentence localization (TSL) aims to localize a specific segment according to a given sentence query.

Sentence

Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding

no code implementations 2 Jan 2023 Jiahao Zhu, Daizong Liu, Pan Zhou, Xing Di, Yu Cheng, Song Yang, Wenzheng Xu, Zichuan Xu, Yao Wan, Lichao Sun, Zeyu Xiong

All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with the query sentence for reasoning.

Sentence Temporal Sentence Grounding

Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble

1 code implementation 13 Dec 2022 Xiaoye Qu, Jun Zeng, Daizong Liu, Zhefeng Wang, Baoxing Huai, Pan Zhou

Distantly-Supervised Named Entity Recognition (DS-NER) effectively alleviates the data scarcity problem in NER by automatically generating training samples.

named-entity-recognition Named Entity Recognition +1

Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval

no code implementations 23 Sep 2022 Xiang Fang, Daizong Liu, Pan Zhou, Yuchong Hu

In addition, due to the domain gap between different datasets, directly applying these pre-trained models to an unseen domain leads to a significant performance drop.

cross-modal alignment Information Retrieval +2

Hierarchical Local-Global Transformer for Temporal Sentence Grounding

no code implementations 31 Aug 2022 Xiang Fang, Daizong Liu, Pan Zhou, Zichuan Xu, Ruixuan Li

To address this issue, in this paper, we propose a novel Hierarchical Local-Global Transformer (HLGT) to leverage this hierarchy information and model the interactions between different levels of granularity and different modalities for learning more fine-grained multi-modal representations.

Sentence Temporal Sentence Grounding

Point Cloud Attacks in Graph Spectral Domain: When 3D Geometry Meets Graph Signal Processing

no code implementations 27 Jul 2022 Daizong Liu, Wei Hu, Xin Li

Instead, we propose point cloud attacks from a new perspective -- the graph spectral domain attack, aiming to perturb graph transform coefficients in the spectral domain that correspond to varying certain geometric structures.

3D geometry

Skimming, Locating, then Perusing: A Human-Like Framework for Natural Language Video Localization

no code implementations 27 Jul 2022 Daizong Liu, Wei Hu

SLP consists of a Skimming-and-Locating (SL) module and a Bi-directional Perusing (BP) module.

Reducing the Vision and Language Bias for Temporal Sentence Grounding

no code implementations 27 Jul 2022 Daizong Liu, Xiaoye Qu, Wei Hu

In this paper, we study the above issue of selection biases and accordingly propose a Debiasing-TSG (D-TSG) model to filter and remove the negative biases in both vision and language modalities for enhancing the model generalization ability.

Information Retrieval Multimodal Reasoning +3

Gaussian Kernel-based Cross Modal Network for Spatio-Temporal Video Grounding

no code implementations 2 Jul 2022 Zeyu Xiong, Daizong Liu, Pan Zhou

Spatial-Temporal Video Grounding (STVG) is a challenging task that aims to localize the spatio-temporal tube of the object of interest according to the semantics of a natural language query.

Spatio-Temporal Video Grounding Video Grounding

Multi-Scale Self-Contrastive Learning with Hard Negative Mining for Weakly-Supervised Query-based Video Grounding

no code implementations 8 Mar 2022 Shentong Mo, Daizong Liu, Wei Hu

Secondly, since some predicted frames (i.e., boundary frames) are relatively coarse and exhibit similar appearance to their adjacent frames, we propose a coarse-to-fine contrastive learning paradigm to learn more discriminative frame-wise representations for distinguishing the false positive frames.

Contrastive Learning Sentence +2

Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding

no code implementations 6 Mar 2022 Daizong Liu, Xiang Fang, Wei Hu, Pan Zhou

Temporal sentence grounding aims to localize a target segment in an untrimmed video semantically according to a given sentence query.

Object object-detection +4

Exploring the Devil in Graph Spectral Domain for 3D Point Cloud Attacks

1 code implementation 15 Feb 2022 Qianjiang Hu, Daizong Liu, Wei Hu

Instead, we propose point cloud attacks from a new perspective -- Graph Spectral Domain Attack (GSDA), aiming to perturb transform coefficients in the graph spectral domain that correspond to varying certain geometric structures.

Autonomous Driving Denoising +1

Unsupervised Temporal Video Grounding with Deep Semantic Clustering

no code implementations 14 Jan 2022 Daizong Liu, Xiaoye Qu, Yinzhen Wang, Xing Di, Kai Zou, Yu Cheng, Zichuan Xu, Pan Zhou

Temporal video grounding (TVG) aims to localize a target segment in a video according to a given sentence query.

Clustering Sentence +1

Exploring Motion and Appearance Information for Temporal Sentence Grounding

no code implementations 3 Jan 2022 Daizong Liu, Xiaoye Qu, Pan Zhou, Yang Liu

Then, we develop separate motion and appearance branches to learn motion-guided and appearance-guided object relations, respectively.

Object object-detection +3

Memory-Guided Semantic Learning Network for Temporal Sentence Grounding

no code implementations 3 Jan 2022 Daizong Liu, Xiaoye Qu, Xing Di, Yu Cheng, Zichuan Xu, Pan Zhou

To tackle this issue, we propose a memory-augmented network, called Memory-Guided Semantic Learning Network (MGSL-Net), that learns and memorizes the rarely appeared content in TSG tasks.

Sentence Temporal Sentence Grounding

Imperceptible Transfer Attack and Defense on 3D Point Cloud Classification

no code implementations 22 Nov 2021 Daizong Liu, Wei Hu

Although many efforts have been devoted to attack and defense in the 2D image domain in recent years, few methods explore the vulnerability of 3D models.

3D Point Cloud Classification Classification +1

Adaptive Proposal Generation Network for Temporal Sentence Localization in Videos

no code implementations EMNLP 2021 Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou

However, the performance of the bottom-up model is inferior to that of its top-down counterpart, as it fails to exploit segment-level interaction.

Sentence

Progressively Guide to Attend: An Iterative Alignment Framework for Temporal Sentence Grounding

no code implementations EMNLP 2021 Daizong Liu, Xiaoye Qu, Pan Zhou

A key solution to temporal sentence grounding (TSG) lies in how to learn effective alignment between vision and language features extracted from an untrimmed video and a sentence description.

Sentence Temporal Sentence Grounding

Context-aware Biaffine Localizing Network for Temporal Sentence Grounding

1 code implementation CVPR 2021 Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou, Yu Cheng, Wei Wei, Zichuan Xu, Yulai Xie

This paper addresses the problem of temporal sentence grounding (TSG), which aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query.

Sentence Temporal Sentence Grounding

Spatiotemporal Graph Neural Network based Mask Reconstruction for Video Object Segmentation

no code implementations 10 Dec 2020 Daizong Liu, Shuangjie Xu, Xiao-Yang Liu, Zichuan Xu, Wei Wei, Pan Zhou

To capture temporal information from previous frames, we use a memory network to refine the mask of the current frame by retrieving historic masks in a temporal graph.

Graph Neural Network Object +3

F2Net: Learning to Focus on the Foreground for Unsupervised Video Object Segmentation

no code implementations 4 Dec 2020 Daizong Liu, Dongdong Yu, Changhu Wang, Pan Zhou

Specifically, our proposed network consists of three main parts: Siamese Encoder Module, Center Guiding Appearance Diffusion Module, and Dynamic Information Fusion Module.

Semantic Segmentation Unsupervised Video Object Segmentation +1

Reasoning Step-by-Step: Temporal Sentence Localization in Videos via Deep Rectification-Modulation Network

no code implementations COLING 2020 Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou

In this paper, we propose a novel deep rectification-modulation network (RMN), transforming this task into a multi-step reasoning process by repeating rectification and modulation.

Sentence

Video-based Facial Expression Recognition using Graph Convolutional Networks

no code implementations 26 Oct 2020 Daizong Liu, Hongting Zhang, Pan Zhou

For the video-based FER task, it is sensible to capture the dynamic expression variation among frames to recognize facial expressions.

Facial Expression Recognition Facial Expression Recognition (FER)

Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization

1 code implementation 4 Aug 2020 Daizong Liu, Xiaoye Qu, Xiao-Yang Liu, Jianfeng Dong, Pan Zhou, Zichuan Xu

To this end, we propose a novel Cross- and Self-Modal Graph Attention Network (CSMGAN) that recasts this task as a process of iterative messages passing over a joint graph.

Graph Attention Sentence

Dynamic Graph Correlation Learning for Disease Diagnosis with Incomplete Labels

no code implementations 26 Feb 2020 Daizong Liu, Shuangjie Xu, Pan Zhou, Kun He, Wei Wei, Zichuan Xu

In this work, we propose a Disease Diagnosis Graph Convolutional Network (DD-GCN) that presents a novel view of investigating the inter-dependency among different diseases by using a dynamic learnable adjacency matrix in graph structure to improve the diagnosis accuracy.

Multi-Label Classification
