1 code implementation • 25 Jul 2024 • Shuming Liu, Lin Sui, Chen-Lin Zhang, Fangzhou Mu, Chen Zhao, Bernard Ghanem
As a fundamental task in long-form video understanding, temporal action detection (TAD) aims to capture inherent temporal relations in untrimmed videos and identify candidate actions with precise boundaries.
1 code implementation • 17 Jul 2024 • Carlos Hinojosa, Shuming Liu, Bernard Ghanem
However, these strategies depend on the input data, which commonly increases model complexity and requires additional computation to generate the mask patterns.
Ranked #1 on Instance Segmentation on COCO
1 code implementation • 8 Jan 2024 • Chen Zhao, Shuming Liu, Karttikeya Mangalam, Guocheng Qian, Fatimah Zohra, Abdulmohsen Alghannam, Jitendra Malik, Bernard Ghanem
We apply two coefficients, one to each type of residual connection, and introduce a dynamic training strategy that seamlessly transitions the pretrained model to a reversible network with much higher numerical precision.
no code implementations • CVPR 2024 • Chen Zhao, Shuming Liu, Karttikeya Mangalam, Guocheng Qian, Fatimah Zohra, Abdulmohsen Alghannam, Jitendra Malik, Bernard Ghanem
We apply two coefficients, one to each type of residual connection, and introduce a dynamic training strategy that seamlessly transitions the pretrained model to a reversible network with much higher numerical precision.
2 code implementations • CVPR 2024 • Shuming Liu, Chen-Lin Zhang, Chen Zhao, Bernard Ghanem
In this paper, we reduce the memory consumption of end-to-end training and scale up the TAD backbone to 1 billion parameters and the input video to 1,536 frames, leading to significant gains in detection performance.
Ranked #1 on Temporal Action Localization on EPIC-KITCHENS-100
no code implementations • 26 May 2023 • Mingchen Zhuge, Haozhe Liu, Francesco Faccio, Dylan R. Ashley, Róbert Csordás, Anand Gopalakrishnan, Abdullah Hamdi, Hasan Abed Al Kader Hammoud, Vincent Herrmann, Kazuki Irie, Louis Kirsch, Bing Li, Guohao Li, Shuming Liu, Jinjie Mai, Piotr Piękos, Aditya Ramesh, Imanol Schlag, Weimin Shi, Aleksandar Stanić, Wenyi Wang, Yuhui Wang, Mengmeng Xu, Deng-Ping Fan, Bernard Ghanem, Jürgen Schmidhuber
What should be the social structure of an NLSOM?
1 code implementation • 6 Apr 2023 • Mengmeng Xu, Mattia Soldan, Jialin Gao, Shuming Liu, Juan-Manuel Pérez-Rúa, Bernard Ghanem
To alleviate the boundary ambiguity, we propose to study the video activity localization problem from a denoising perspective.
Ranked #1 on Video Grounding on MAD
no code implementations • 3 Jan 2023 • Hasan Abed Al Kader Hammoud, Shuming Liu, Mohammed Alkhrashi, Fahad Albalawi, Bernard Ghanem
Although backdoor attacks have been extensively studied in the image domain, there are very few works that explore such attacks in the video domain, and they tend to conclude that image backdoor attacks are less effective in the video domain.
no code implementations • CVPR 2023 • Chen Zhao, Shuming Liu, Karttikeya Mangalam, Bernard Ghanem
Temporal action localization (TAL) requires long-form reasoning to predict actions of various durations and complex content.
1 code implementation • 25 Nov 2022 • Chen Zhao, Shuming Liu, Karttikeya Mangalam, Bernard Ghanem
Temporal action localization (TAL) requires long-form reasoning to predict actions of various durations and complex content.
1 code implementation • 14 May 2022 • Shuming Liu, Mengmeng Xu, Chen Zhao, Xu Zhao, Bernard Ghanem
We propose to sequentially forward the snippet frame through the video encoder and backpropagate only a small, necessary portion of the gradients to update the encoder.
no code implementations • 13 Apr 2020 • Ziqing Ma, Shuming Liu, Guancheng Guo, Xipeng Yu
Specifically, a hybrid spatial attention mechanism that employs inputs along temporal and spatial axes is proposed.
no code implementations • 29 Jul 2019 • Haisheng Su, Xu Zhao, Shuming Liu
This technical report presents an overview of our solution used in the submission to ActivityNet Challenge 2019 Task 1 (temporal action proposal generation) and Task 2 (temporal action localization/detection).