no code implementations • 25 Mar 2024 • Zhongwei Zhang, Fuchen Long, Yingwei Pan, Zhaofan Qiu, Ting Yao, Yang Cao, Tao Mei
Next, TRIP executes a residual-like dual-path scheme for noise prediction: 1) a shortcut path that directly takes image noise prior as the reference noise of each frame to amplify the alignment between the first frame and subsequent frames; 2) a residual path that employs 3D-UNet over noised video and static image latent codes to enable inter-frame relational reasoning, thereby easing the learning of the residual noise for each frame.
no code implementations • 25 Mar 2024 • Zhikai Chen, Fuchen Long, Zhaofan Qiu, Ting Yao, Wengang Zhou, Jiebo Luo, Tao Mei
Technically, SATeCo freezes all the parameters of the pre-trained UNet and VAE, and only optimizes two deliberately-designed spatial feature adaptation (SFA) and temporal feature alignment (TFA) modules, in the decoder of UNet and VAE.
no code implementations • 2 Jan 2024 • Fuchen Long, Zhaofan Qiu, Ting Yao, Tao Mei
The diffusion model incorporates the reference images as the condition and alignment to strengthen the content consistency of multi-scene videos.
1 code implementation • CVPR 2023 • Fuchen Long, Ting Yao, Zhaofan Qiu, Lusong Li, Tao Mei
Feature invariance under different data transformations, i. e., transformation invariance, can be regarded as a type of self-supervision for representation learning.
1 code implementation • CVPR 2023 • Zhikai Chen, Fuchen Long, Zhaofan Qiu, Ting Yao, Wengang Zhou, Jiebo Luo, Tao Mei
Point cloud completion aims to recover the completed 3D shape of an object from its partial observation.
1 code implementation • 15 Nov 2022 • Fuchen Long, Zhaofan Qiu, Yingwei Pan, Ting Yao, Chong-Wah Ngo, Tao Mei
The pre-determined kernel size severely limits the temporal receptive fields and the fixed weights treat each spatial location across frames equally, resulting in sub-optimal solution for long-range temporal modeling in natural scenes.
1 code implementation • 21 Jun 2022 • Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei
The video-to-text/video-to-query projections over text prototypes/query vocabulary then start the text-to-query or query-to-text calibration to estimate the amendment to query or text.
1 code implementation • CVPR 2022 • Fuchen Long, Zhaofan Qiu, Yingwei Pan, Ting Yao, Jiebo Luo, Tao Mei
In this paper, we present a new recipe of inter-frame attention block, namely Stand-alone Inter-Frame Attention (SIFA), that novelly delves into the deformation across frames to estimate local self-attention on each spatial location.
Ranked #13 on Action Recognition on Something-Something V1
1 code implementation • 13 Jun 2022 • Yingwei Pan, Yehao Li, Yiheng Zhang, Qi Cai, Fuchen Long, Zhaofan Qiu, Ting Yao, Tao Mei
This paper presents an overview and comparative analysis of our systems designed for the following two tracks in SAPIEN ManiSkill Challenge 2021: No Interaction Track: The No Interaction track targets for learning policies from pre-collected demonstration trajectories.
1 code implementation • ECCV 2020 • Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei
In this paper, we introduce a new design of transfer learning type to learn action localization for a large set of action categories, but only on action moments from the categories of interest and temporal annotations of untrimmed videos from a small set of action classes.
1 code implementation • CVPR 2019 • Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei
Temporally localizing actions in a video is a fundamental challenge in video understanding.
no code implementations • 20 Jun 2019 • Fuchen Long, Qi Cai, Zhaofan Qiu, Zhijian Hou, Yingwei Pan, Ting Yao, Chong-Wah Ngo
This notebook paper presents an overview and comparative analysis of our system designed for activity detection in extended videos (ActEV-PC) in ActivityNet Challenge 2019.