no code implementations • 5 Dec 2024 • HUI ZHANG, Dexiang Hong, Tingwei Gao, Yitong Wang, Jie Shao, Xinglong Wu, Zuxuan Wu, Yu-Gang Jiang
To inherit the advantages of MM-DiT, we use a separate set of network weights to process the layout, treating it as equally important as the image and text modalities.
1 code implementation • 26 Jul 2022 • Weidong Chen, Dexiang Hong, Yuankai Qi, Zhenjun Han, Shuhui Wang, Laiyun Qing, Qingming Huang, Guorong Li
To address this problem, we propose a multi-attention network that consists of a dual-path dual-attention module and a query-based cross-modal Transformer module.
Ranked #5 on Referring Expression Segmentation on A2D Sentences
1 code implementation • 25 Jun 2022 • Dexiang Hong, Xiaoqi Ma, Xinyao Wang, CongCong Li, YuFei Wang, Longyin Wen
This report presents the algorithm used in our submission to the Generic Event Boundary Detection (GEBD) Challenge at CVPR 2022.
no code implementations • 7 Jun 2022 • CongCong Li, Xinyao Wang, Dexiang Hong, YuFei Wang, Libo Zhang, Tiejian Luo, Longyin Wen
To capture the temporal context of each frame, we design the structure context transformer (SC-Transformer) by re-partitioning the input frame sequence.
no code implementations • CVPR 2022 • CongCong Li, Xinyao Wang, Longyin Wen, Dexiang Hong, Tiejian Luo, Libo Zhang
Generic event boundary detection aims to localize the generic, taxonomy-free event boundaries that segment videos into chunks.
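As a rough illustration of what "segmenting a video into chunks" means once boundaries have been predicted, the sketch below turns a list of boundary frame indices into frame ranges. It is not the method from the paper: the function name `split_at_boundaries`, the half-open range convention, and the assumption that boundaries are given as frame indices are all hypothetical choices made for this example.

```python
from typing import List, Tuple


def split_at_boundaries(num_frames: int, boundaries: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) frame ranges delimited by boundary indices.

    Hypothetical helper: assumes boundaries are frame indices, not timestamps.
    """
    if num_frames <= 0:
        return []
    # Keep only boundaries strictly inside the video, deduplicated and sorted.
    cuts = sorted({b for b in boundaries if 0 < b < num_frames})
    edges = [0] + cuts + [num_frames]
    # Pair consecutive edges to form the chunks.
    return list(zip(edges[:-1], edges[1:]))


if __name__ == "__main__":
    # A 10-frame clip with predicted boundaries at frames 3 and 7.
    print(split_at_boundaries(10, [3, 7]))
```

A detector that outputs timestamps instead of frame indices would need a conversion step (for example, multiplying by the frame rate) before calling such a helper.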
1 code implementation • 1 Jul 2021 • Dexiang Hong, CongCong Li, Longyin Wen, Xinyao Wang, Libo Zhang
In this work, we design a Cascaded Temporal Attention Network (CASTANET) for GEBD, which consists of three parts: the backbone network, the temporal attention module, and the classification module.
Ranked #1 on Boundary Detection on Kinetics-400