Search Results for author: Daizong Liu

Found 37 papers, 9 papers with code

Learning to Focus on the Foreground for Temporal Sentence Grounding

no code implementations COLING 2022 Daizong Liu, Wei Hu

Then, we develop a self-supervised coarse-to-fine paradigm to learn to locate the most query-relevant patch in each frame and aggregate them across the video for final grounding.

Sentence Temporal Sentence Grounding

Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding

1 code implementation 6 Nov 2023 Shengkai Sun, Daizong Liu, Jianfeng Dong, Xiaoye Qu, Junyu Gao, Xun Yang, Xun Wang, Meng Wang

In this manner, our framework is able to learn unified representations of uni-modal or multi-modal skeleton input, remaining flexible to different kinds of modality input for robust action understanding in practical cases.

Action Understanding Representation Learning +1

Dense Object Grounding in 3D Scenes

no code implementations 5 Sep 2023 Wencan Huang, Daizong Liu, Wei Hu

Localizing objects in 3D scenes according to the semantics of a given natural language is a fundamental yet important task in the field of multimedia understanding, which benefits various real-world applications such as robotics and autonomous driving.

Autonomous Driving Object +1

3DHacker: Spectrum-based Decision Boundary Generation for Hard-label 3D Point Cloud Attack

no code implementations ICCV 2023 Yunbo Tao, Daizong Liu, Pan Zhou, Yulai Xie, Wei Du, Wei Hu

With the maturity of depth sensors, the vulnerability of 3D point cloud models has received increasing attention in various applications such as autonomous driving and robot navigation.

Autonomous Driving Robot Navigation

From Region to Patch: Attribute-Aware Foreground-Background Contrastive Learning for Fine-Grained Fashion Retrieval

1 code implementation 17 May 2023 Jianfeng Dong, Xiaoman Peng, Zhe Ma, Daizong Liu, Xiaoye Qu, Xun Yang, Jixiang Zhu, Baolong Liu

As the attribute-specific similarity typically corresponds to the specific subtle regions of images, we propose a Region-to-Patch Framework (RPF) that consists of a region-aware branch and a patch-aware branch to extract fine-grained attribute-related visual features for precise retrieval in a coarse-to-fine manner.

Attribute Contrastive Learning +2

Density-Insensitive Unsupervised Domain Adaption on 3D Object Detection

1 code implementation CVPR 2023 Qianjiang Hu, Daizong Liu, Wei Hu

Recently, a few works have attempted to tackle the domain gap in objects, but they still fail to adapt to the gap in beam density between two domains, which is critical for mitigating the characteristic differences of LiDAR collectors.

3D Object Detection Attribute +4

You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos

no code implementations CVPR 2023 Xiang Fang, Daizong Liu, Pan Zhou, Guoshun Nan

To handle the raw video bit-stream input, we propose a novel Three-branch Compressed-domain Spatial-temporal Fusion (TCSF) framework, which extracts and aggregates three kinds of low-level visual features (I-frame, motion vector and residual features) for effective and efficient grounding.

Sentence Temporal Sentence Grounding

Jointly Visual- and Semantic-Aware Graph Memory Networks for Temporal Sentence Localization in Videos

no code implementations 2 Mar 2023 Daizong Liu, Pan Zhou

Temporal sentence localization in videos (TSLV) aims to retrieve the segment of greatest interest in an untrimmed video according to a given sentence query.

Representation Learning Sentence +1

Tracking Objects and Activities with Attention for Temporal Sentence Grounding

no code implementations 21 Feb 2023 Zeyu Xiong, Daizong Liu, Pan Zhou, Jiahao Zhu

Temporal sentence grounding (TSG) aims to localize the temporal segment which is semantically aligned with a natural language query in an untrimmed video. Most existing methods extract frame-grained or object-grained features with a 3D ConvNet or a detection network under a conventional TSG framework, failing to capture subtle differences between frames or to model the spatio-temporal behavior of core persons/objects.

Sentence Temporal Sentence Grounding

Hypotheses Tree Building for One-Shot Temporal Sentence Localization

no code implementations 5 Jan 2023 Daizong Liu, Xiang Fang, Pan Zhou, Xing Di, Weining Lu, Yu Cheng

Given an untrimmed video, temporal sentence localization (TSL) aims to localize a specific segment according to a given sentence query.

Sentence

Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding

no code implementations 2 Jan 2023 Jiahao Zhu, Daizong Liu, Pan Zhou, Xing Di, Yu Cheng, Song Yang, Wenzheng Xu, Zichuan Xu, Yao Wan, Lichao Sun, Zeyu Xiong

All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with the query sentence for reasoning.

Sentence Temporal Sentence Grounding
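The sparse sampling step mentioned in the snippet above — drawing a fixed budget of frames uniformly from an untrimmed video — can be sketched as follows (a generic illustration under our own naming, not the paper's code):

```python
def sparse_sample_indices(num_frames, num_samples):
    """Return a fixed budget of frame indices, taking the center frame
    of each of `num_samples` equal-length temporal segments."""
    if num_samples >= num_frames:
        return list(range(num_frames))       # video shorter than the budget
    step = num_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]
```

For a 100-frame video and a budget of 4, this picks the centers of four equal segments: `[12, 37, 62, 87]`.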

Distantly-Supervised Named Entity Recognition with Adaptive Teacher Learning and Fine-grained Student Ensemble

1 code implementation 13 Dec 2022 Xiaoye Qu, Jun Zeng, Daizong Liu, Zhefeng Wang, Baoxing Huai, Pan Zhou

Distantly-Supervised Named Entity Recognition (DS-NER) effectively alleviates the data scarcity problem in NER by automatically generating training samples.

Named Entity Recognition +1

Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval

no code implementations 23 Sep 2022 Xiang Fang, Daizong Liu, Pan Zhou, Yuchong Hu

In addition, due to the domain gap between different datasets, directly applying these pre-trained models to an unseen domain leads to a significant performance drop.

Information Retrieval Moment Retrieval +1

Hierarchical Local-Global Transformer for Temporal Sentence Grounding

no code implementations 31 Aug 2022 Xiang Fang, Daizong Liu, Pan Zhou, Zichuan Xu, Ruixuan Li

To address this issue, in this paper, we propose a novel Hierarchical Local-Global Transformer (HLGT) to leverage this hierarchy information and model the interactions between different levels of granularity and different modalities for learning more fine-grained multi-modal representations.

Sentence Temporal Sentence Grounding

Point Cloud Attacks in Graph Spectral Domain: When 3D Geometry Meets Graph Signal Processing

no code implementations 27 Jul 2022 Daizong Liu, Wei Hu, Xin Li

Instead, we propose point cloud attacks from a new perspective -- the graph spectral domain attack -- aiming to perturb graph transform coefficients in the spectral domain, which corresponds to varying certain geometric structures.
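The spectral-domain idea in the snippet above — perturbing graph transform coefficients rather than raw coordinates — can be illustrated with a minimal NumPy sketch (the k-NN graph construction, the choice of high-frequency band, and all function names are our assumptions, not the paper's implementation):

```python
import numpy as np

def knn_graph_laplacian(points, k=4):
    """Combinatorial Laplacian L = D - W of a k-nearest-neighbor graph."""
    n = len(points)
    dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(dist[i])[1:k + 1]] = 1.0  # index 0 is the point itself
    W = np.maximum(W, W.T)                        # symmetrize
    return np.diag(W.sum(axis=1)) - W

def spectral_perturb(points, eps=0.01, num_high=8, seed=0):
    """Perturb only the highest-frequency graph transform coefficients."""
    L = knn_graph_laplacian(points)
    _, U = np.linalg.eigh(L)           # eigenvectors form the graph Fourier basis
    coeff = U.T @ points               # (N, 3) spectral coefficients
    rng = np.random.default_rng(seed)
    coeff[-num_high:] += eps * rng.standard_normal((num_high, 3))
    return U @ coeff                   # inverse transform back to coordinates
```

Because the perturbation lives in a narrow high-frequency band, the low-frequency shape of the cloud is preserved while fine geometric detail is altered.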

Reducing the Vision and Language Bias for Temporal Sentence Grounding

no code implementations 27 Jul 2022 Daizong Liu, Xiaoye Qu, Wei Hu

In this paper, we study the above issue of selection biases and accordingly propose a Debiasing-TSG (D-TSG) model to filter and remove the negative biases in both vision and language modalities for enhancing the model generalization ability.

Information Retrieval Multimodal Reasoning +3

Skimming, Locating, then Perusing: A Human-Like Framework for Natural Language Video Localization

no code implementations 27 Jul 2022 Daizong Liu, Wei Hu

SLP consists of a Skimming-and-Locating (SL) module and a Bi-directional Perusing (BP) module.

Gaussian Kernel-based Cross Modal Network for Spatio-Temporal Video Grounding

no code implementations 2 Jul 2022 Zeyu Xiong, Daizong Liu, Pan Zhou

Spatial-Temporal Video Grounding (STVG) is a challenging task that aims to localize the spatio-temporal tube of the object of interest according to a natural language query.

Spatio-Temporal Video Grounding Video Grounding

Multi-Scale Self-Contrastive Learning with Hard Negative Mining for Weakly-Supervised Query-based Video Grounding

no code implementations 8 Mar 2022 Shentong Mo, Daizong Liu, Wei Hu

Secondly, since some predicted frames (i.e., boundary frames) are relatively coarse and exhibit similar appearance to their adjacent frames, we propose a coarse-to-fine contrastive learning paradigm to learn more discriminative frame-wise representations for distinguishing false-positive frames.

Contrastive Learning Sentence +2
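As a general illustration of the contrastive objective underlying such a paradigm, the standard InfoNCE loss contrasts one positive against a set of (hard) negatives; the sketch below is this textbook formulation under our own naming, not the paper's multi-scale variant:

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE: cross-entropy that pulls the positive toward the anchor
    and pushes the negatives away, with temperature tau."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives])
    logits = sims / tau
    logits -= logits.max()                        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                      # positive sits at entry 0
```

The loss is small when the positive is far more similar to the anchor than every negative, and grows when a hard negative outranks the positive.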

Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding

no code implementations 6 Mar 2022 Daizong Liu, Xiang Fang, Wei Hu, Pan Zhou

Temporal sentence grounding aims to localize a target segment in an untrimmed video semantically according to a given sentence query.

Object object-detection +4

Exploring the Devil in Graph Spectral Domain for 3D Point Cloud Attacks

1 code implementation 15 Feb 2022 Qianjiang Hu, Daizong Liu, Wei Hu

Instead, we propose point cloud attacks from a new perspective -- the Graph Spectral Domain Attack (GSDA) -- aiming to perturb transform coefficients in the graph spectral domain, which corresponds to varying certain geometric structures.

Autonomous Driving Denoising +1

Unsupervised Temporal Video Grounding with Deep Semantic Clustering

no code implementations 14 Jan 2022 Daizong Liu, Xiaoye Qu, Yinzhen Wang, Xing Di, Kai Zou, Yu Cheng, Zichuan Xu, Pan Zhou

Temporal video grounding (TVG) aims to localize a target segment in a video according to a given sentence query.

Clustering Sentence +1

Memory-Guided Semantic Learning Network for Temporal Sentence Grounding

no code implementations 3 Jan 2022 Daizong Liu, Xiaoye Qu, Xing Di, Yu Cheng, Zichuan Xu, Pan Zhou

To tackle this issue, we propose a memory-augmented network, called Memory-Guided Semantic Learning Network (MGSL-Net), that learns and memorizes the rarely appeared content in TSG tasks.

Sentence Temporal Sentence Grounding

Exploring Motion and Appearance Information for Temporal Sentence Grounding

no code implementations 3 Jan 2022 Daizong Liu, Xiaoye Qu, Pan Zhou, Yang Liu

Then, we develop separate motion and appearance branches to learn motion-guided and appearance-guided object relations, respectively.

Object object-detection +3

Imperceptible Transfer Attack and Defense on 3D Point Cloud Classification

no code implementations 22 Nov 2021 Daizong Liu, Wei Hu

Although many efforts have been made on attack and defense in the 2D image domain in recent years, few methods explore the vulnerability of 3D models.

3D Point Cloud Classification Classification +1

Adaptive Proposal Generation Network for Temporal Sentence Localization in Videos

no code implementations EMNLP 2021 Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou

However, the performance of the bottom-up model is inferior to that of its top-down counterpart, as it fails to exploit segment-level interactions.

Sentence

Progressively Guide to Attend: An Iterative Alignment Framework for Temporal Sentence Grounding

no code implementations EMNLP 2021 Daizong Liu, Xiaoye Qu, Pan Zhou

A key solution to temporal sentence grounding (TSG) exists in how to learn effective alignment between vision and language features extracted from an untrimmed video and a sentence description.

Sentence Temporal Sentence Grounding

Context-aware Biaffine Localizing Network for Temporal Sentence Grounding

1 code implementation CVPR 2021 Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou, Yu Cheng, Wei Wei, Zichuan Xu, Yulai Xie

This paper addresses the problem of temporal sentence grounding (TSG), which aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query.

Sentence Temporal Sentence Grounding
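A biaffine localizer of the kind named in the title above typically scores every (start, end) frame pair with a bilinear map and picks the best valid span; the sketch below is our own simplification of that generic scoring scheme, not the paper's network:

```python
import numpy as np

def biaffine_scores(starts, ends, U, b=0.0):
    """Score every candidate span: S[i, j] = starts[i]^T U ends[j] + b."""
    return starts @ U @ ends.T + b

def best_span(S):
    """Pick the highest-scoring valid span with start <= end."""
    valid = np.triu(np.ones_like(S, dtype=bool))   # upper triangle incl. diagonal
    masked = np.where(valid, S, -np.inf)
    i, j = np.unravel_index(np.argmax(masked), S.shape)
    return int(i), int(j)
```

Given per-frame start and end feature vectors, `biaffine_scores` yields a T x T score map over all candidate boundaries, and `best_span` reads off the prediction.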

F2Net: Learning to Focus on the Foreground for Unsupervised Video Object Segmentation

no code implementations 4 Dec 2020 Daizong Liu, Dongdong Yu, Changhu Wang, Pan Zhou

Specifically, our proposed network consists of three main parts: Siamese Encoder Module, Center Guiding Appearance Diffusion Module, and Dynamic Information Fusion Module.

Semantic Segmentation Unsupervised Video Object Segmentation +1

Reasoning Step-by-Step: Temporal Sentence Localization in Videos via Deep Rectification-Modulation Network

no code implementations COLING 2020 Daizong Liu, Xiaoye Qu, Jianfeng Dong, Pan Zhou

In this paper, we propose a novel deep rectification-modulation network (RMN), transforming this task into a multi-step reasoning process by repeating rectification and modulation.

Sentence

Video-based Facial Expression Recognition using Graph Convolutional Networks

no code implementations 26 Oct 2020 Daizong Liu, Hongting Zhang, Pan Zhou

For the video-based FER task, it is sensible to capture the dynamic expression variation across frames to recognize facial expressions.

Facial Expression Recognition (FER)

Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization

1 code implementation 4 Aug 2020 Daizong Liu, Xiaoye Qu, Xiao-Yang Liu, Jianfeng Dong, Pan Zhou, Zichuan Xu

To this end, we propose a novel Cross- and Self-Modal Graph Attention Network (CSMGAN) that recasts this task as a process of iterative message passing over a joint graph.

Graph Attention Sentence

Dynamic Graph Correlation Learning for Disease Diagnosis with Incomplete Labels

no code implementations 26 Feb 2020 Daizong Liu, Shuangjie Xu, Pan Zhou, Kun He, Wei Wei, Zichuan Xu

In this work, we propose a Disease Diagnosis Graph Convolutional Network (DD-GCN) that presents a novel view of investigating the inter-dependency among different diseases by using a dynamic learnable adjacency matrix in the graph structure to improve diagnosis accuracy.

Multi-Label Classification
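The core idea of a dynamic learnable adjacency matrix — letting the graph structure itself be a trainable parameter that is normalized before message passing — can be sketched as a generic GCN-style forward pass (our own naming and simplification, not the DD-GCN code):

```python
import numpy as np

def row_softmax(x):
    """Row-wise softmax, so each node's incoming edge weights sum to 1."""
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def dynamic_gcn_layer(H, A_param, W):
    """One graph-convolution layer where the adjacency A_param is a free
    (num_nodes x num_nodes) parameter learned jointly with the weights W."""
    A = row_softmax(A_param)             # learned, dense, row-stochastic adjacency
    return np.maximum(A @ H @ W, 0.0)    # ReLU(A H W)
```

Because `A_param` receives gradients like any other weight, the model can discover inter-label dependencies (here, between diseases) instead of relying on a fixed, hand-built graph.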
