Search Results for author: Zhaofan Qiu

Found 40 papers, 17 papers with code

Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution

no code implementations 25 Mar 2024 Zhikai Chen, Fuchen Long, Zhaofan Qiu, Ting Yao, Wengang Zhou, Jiebo Luo, Tao Mei

Technically, SATeCo freezes all the parameters of the pre-trained UNet and VAE, and only optimizes two deliberately-designed spatial feature adaptation (SFA) and temporal feature alignment (TFA) modules, in the decoder of UNet and VAE.

Denoising Image Super-Resolution +3

TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models

no code implementations 25 Mar 2024 Zhongwei Zhang, Fuchen Long, Yingwei Pan, Zhaofan Qiu, Ting Yao, Yang Cao, Tao Mei

Next, TRIP executes a residual-like dual-path scheme for noise prediction: 1) a shortcut path that directly takes image noise prior as the reference noise of each frame to amplify the alignment between the first frame and subsequent frames; 2) a residual path that employs 3D-UNet over noised video and static image latent codes to enable inter-frame relational reasoning, thereby easing the learning of the residual noise for each frame.

Image to Video Generation Relational Reasoning +1

VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM

no code implementations 2 Jan 2024 Fuchen Long, Zhaofan Qiu, Ting Yao, Tao Mei

The diffusion model incorporates the reference images as the condition and alignment to strengthen the content consistency of multi-scene videos.

Descriptive Video Generation

CARIS: Context-Augmented Referring Image Segmentation

1 code implementation ACM MM 2023 Sun-Ao Liu, Yiheng Zhang, Zhaofan Qiu, Hongtao Xie, Yongdong Zhang, Ting Yao

Technically, CARIS develops a context-aware mask decoder with sequential bidirectional cross-modal attention to integrate the linguistic features with visual context, which are then aligned with pixel-wise visual features.

Image Segmentation Segmentation +1

Selective Volume Mixup for Video Action Recognition

no code implementations 18 Sep 2023 Yi Tan, Zhaofan Qiu, Yanbin Hao, Ting Yao, Xiangnan He, Tao Mei

In this paper, we propose a novel video augmentation strategy named Selective Volume Mixup (SV-Mix) to improve the generalization ability of deep models with limited training videos.

Action Recognition Image Augmentation +1

Learning Orthogonal Prototypes for Generalized Few-Shot Semantic Segmentation

1 code implementation CVPR 2023 Sun-Ao Liu, Yiheng Zhang, Zhaofan Qiu, Hongtao Xie, Yongdong Zhang, Ting Yao

POP builds a set of orthogonal prototypes, each of which represents a semantic class, and makes the prediction for each class separately based on the features projected onto its prototype.

Generalized Few-Shot Semantic Segmentation

Learning Neural Implicit Surfaces with Object-Aware Radiance Fields

no code implementations ICCV 2023 Yiheng Zhang, Zhaofan Qiu, Yingwei Pan, Ting Yao, Tao Mei

Then, we build the geometric correspondence between 2D planes and 3D meshes by rasterization, and project the estimated object regions into 3D explicit object surfaces by aggregating the object information across multiple views.

3D Object Reconstruction Object

PointClustering: Unsupervised Point Cloud Pre-Training Using Transformation Invariance in Clustering

1 code implementation CVPR 2023 Fuchen Long, Ting Yao, Zhaofan Qiu, Lusong Li, Tao Mei

Feature invariance under different data transformations, i.e., transformation invariance, can be regarded as a type of self-supervision for representation learning.

Clustering Deep Clustering +4

3D Human Pose Estimation With Spatio-Temporal Criss-Cross Attention

1 code implementation CVPR 2023 Zhenhua Tang, Zhaofan Qiu, Yanbin Hao, Richang Hong, Ting Yao

On this basis, we devise STCFormer by stacking multiple STC blocks and further integrate a new Structure-enhanced Positional Embedding (SPE) into STCFormer to take the structure of human body into consideration.

3D Human Pose Estimation

Dynamic Temporal Filtering in Video Models

1 code implementation 15 Nov 2022 Fuchen Long, Zhaofan Qiu, Yingwei Pan, Ting Yao, Chong-Wah Ngo, Tao Mei

The pre-determined kernel size severely limits the temporal receptive fields and the fixed weights treat each spatial location across frames equally, resulting in a sub-optimal solution for long-range temporal modeling in natural scenes.

Explaining Cross-Domain Recognition with Interpretable Deep Classifier

no code implementations 15 Nov 2022 Yiheng Zhang, Ting Yao, Zhaofan Qiu, Tao Mei

In this paper, we ask the question: how much does each sample in the source domain contribute to the network's prediction on samples from the target domain?

Unsupervised Domain Adaptation

SPE-Net: Boosting Point Cloud Analysis via Rotation Robustness Enhancement

1 code implementation 15 Nov 2022 Zhaofan Qiu, Yehao Li, Yu Wang, Yingwei Pan, Ting Yao, Tao Mei

In this paper, we propose a novel deep architecture tailored for 3D point cloud applications, named SPE-Net.

Lightweight and Progressively-Scalable Networks for Semantic Segmentation

1 code implementation 27 Jul 2022 Yiheng Zhang, Ting Yao, Zhaofan Qiu, Tao Mei

In this paper, we thoroughly analyze the design of convolutional blocks (the type of convolutions and the number of channels in convolutions), and the ways of interactions across multiple scales, all from a lightweight standpoint for semantic segmentation.

Segmentation Semantic Segmentation

Bi-Calibration Networks for Weakly-Supervised Video Representation Learning

1 code implementation 21 Jun 2022 Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei

The video-to-text/video-to-query projections over text prototypes/query vocabulary then start the text-to-query or query-to-text calibration to estimate the amendment to query or text.

Representation Learning

Stand-Alone Inter-Frame Attention in Video Models

1 code implementation CVPR 2022 Fuchen Long, Zhaofan Qiu, Yingwei Pan, Ting Yao, Jiebo Luo, Tao Mei

In this paper, we present a new recipe of inter-frame attention block, namely Stand-alone Inter-Frame Attention (SIFA), that novelly delves into the deformation across frames to estimate local self-attention on each spatial location.

Action Classification Action Recognition +1

MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing

no code implementations CVPR 2022 Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Tao Mei

By deriving the novel grouped time mixing (GTM) operations, we equip the basic token-mixing MLP with the ability of temporal modeling.

3D Architecture Action Classification +2

Silver-Bullet-3D at ManiSkill 2021: Learning-from-Demonstrations and Heuristic Rule-based Methods for Object Manipulation

1 code implementation 13 Jun 2022 Yingwei Pan, Yehao Li, Yiheng Zhang, Qi Cai, Fuchen Long, Zhaofan Qiu, Ting Yao, Tao Mei

This paper presents an overview and comparative analysis of our systems designed for the following two tracks in SAPIEN ManiSkill Challenge 2021: No Interaction Track: The No Interaction track targets learning policies from pre-collected demonstration trajectories.

Imitation Learning

Condensing a Sequence to One Informative Frame for Video Recognition

no code implementations ICCV 2021 Zhaofan Qiu, Ting Yao, Yan Shu, Chong-Wah Ngo, Tao Mei

This paper studies a two-step alternative that first condenses the video sequence to an informative "frame" and then exploits an off-the-shelf image recognition system on the synthetic frame.

Motion Estimation valid +1

Motion-Focused Contrastive Learning of Video Representations

1 code implementation ICCV 2021 Rui Li, Yiheng Zhang, Zhaofan Qiu, Ting Yao, Dong Liu, Tao Mei

To this end, we compose a duet of exploiting the motion for data augmentation and feature learning in the regime of contrastive learning.

Contrastive Learning Data Augmentation +2

Representing Videos as Discriminative Sub-graphs for Action Recognition

no code implementations CVPR 2021 Dong Li, Zhaofan Qiu, Yingwei Pan, Ting Yao, Houqiang Li, Tao Mei

For each action category, we execute online clustering to decompose the graph into sub-graphs on each scale through learning Gaussian Mixture Layer and select the discriminative sub-graphs as action prototypes for recognition.

Action Recognition Graph Learning +1

Optimization Planning for 3D ConvNets

1 code implementation 11 Jan 2022 Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Tao Mei

In this paper, we decompose the path into a series of training "states" and specify the hyper-parameters, e.g., learning rate and the length of input clips, in each state.

Video Recognition

Learning to Localize Actions from Moments

1 code implementation ECCV 2020 Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian, Jiebo Luo, Tao Mei

In this paper, we introduce a new design of transfer learning type to learn action localization for a large set of action categories, but only on action moments from the categories of interest and temporal annotations of untrimmed videos from a small set of action classes.

Action Localization Transfer Learning

SeCo: Exploring Sequence Supervision for Unsupervised Representation Learning

3 code implementations 3 Aug 2020 Ting Yao, Yiheng Zhang, Zhaofan Qiu, Yingwei Pan, Tao Mei

In this paper, we compose a trilogy of exploring the basic and generic supervision in the sequence from spatial, spatiotemporal and sequential perspectives.

Action Recognition Contrastive Learning +3

Transferring and Regularizing Prediction for Semantic Segmentation

no code implementations CVPR 2020 Yiheng Zhang, Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Dong Liu, Tao Mei

In view of extremely expensive expert labeling, recent research has shown that the models trained on photo-realistic synthetic data (e.g., computer games) with computer-generated annotations can be adapted to real images.

Domain Adaptation Segmentation +1

Long Short-Term Relation Networks for Video Action Detection

no code implementations 31 Mar 2020 Dong Li, Ting Yao, Zhaofan Qiu, Houqiang Li, Tao Mei

It has been well recognized that modeling human-object or object-object relations would be helpful for the detection task.

Action Detection Object +2

Scheduled Differentiable Architecture Search for Visual Recognition

no code implementations 23 Sep 2019 Zhaofan Qiu, Ting Yao, Yiheng Zhang, Yongdong Zhang, Tao Mei

Moreover, we enlarge the search space of SDAS particularly for video recognition by devising several unique operations to encode spatio-temporal dynamics and demonstrate the impact in affecting the architecture search of SDAS.

Video Recognition

vireoJD-MM at Activity Detection in Extended Videos

no code implementations 20 Jun 2019 Fuchen Long, Qi Cai, Zhaofan Qiu, Zhijian Hou, Yingwei Pan, Ting Yao, Chong-Wah Ngo

This notebook paper presents an overview and comparative analysis of our system designed for activity detection in extended videos (ActEV-PC) in ActivityNet Challenge 2019.

Action Detection Action Localization +1

Trimmed Action Recognition, Dense-Captioning Events in Videos, and Spatio-temporal Action Localization with Focus on ActivityNet Challenge 2019

no code implementations 14 Jun 2019 Zhaofan Qiu, Dong Li, Yehao Li, Qi Cai, Yingwei Pan, Ting Yao

This notebook paper presents an overview and comparative analysis of our systems designed for the following three tasks in ActivityNet Challenge 2019: trimmed action recognition, dense-captioning events in videos, and spatio-temporal action localization.

Action Recognition Dense Captioning +2

Learning Spatio-Temporal Representation with Local and Global Diffusion

no code implementations CVPR 2019 Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Xinmei Tian, Tao Mei

Diffusions effectively interact two aspects of information, i.e., localized and holistic, for a more powerful way of representation learning.

Action Classification Action Detection +5

Recurrent Tubelet Proposal and Recognition Networks for Action Detection

no code implementations ECCV 2018 Dong Li, Zhaofan Qiu, Qi Dai, Ting Yao, Tao Mei

The RTP initializes action proposals of the start frame through a Region Proposal Network and then estimates the movements of proposals in next frame in a recurrent manner.

Action Detection Region Proposal

Fully Convolutional Adaptation Networks for Semantic Segmentation

no code implementations CVPR 2018 Yiheng Zhang, Zhaofan Qiu, Ting Yao, Dong Liu, Tao Mei

The recent advances in deep neural networks have convincingly demonstrated high capability in learning vision models on large datasets.

Domain Adaptation Semantic Segmentation

To Create What You Tell: Generating Videos from Captions

no code implementations 23 Apr 2018 Yingwei Pan, Zhaofan Qiu, Ting Yao, Houqiang Li, Tao Mei

In this paper, we present a novel Temporal GANs conditioning on Captions, namely TGANs-C, in which the input to the generator network is a concatenation of a latent noise vector and caption embedding, and then is transformed into a frame sequence with 3D spatio-temporal convolutions.

Philosophy

Deep Semantic Hashing with Generative Adversarial Networks

no code implementations 23 Apr 2018 Zhaofan Qiu, Yingwei Pan, Ting Yao, Tao Mei

Specifically, a novel deep semantic hashing with GANs (DSH-GANs) is presented, which mainly consists of four components: a deep convolutional neural network (CNN) for learning image representations, an adversary stream to distinguish synthetic images from real ones, a hash stream for encoding image representations to hash codes, and a classification stream.

General Classification Image Retrieval +1

Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks

2 code implementations ICCV 2017 Zhaofan Qiu, Ting Yao, Tao Mei

In this paper, we devise multiple variants of bottleneck building blocks in a residual learning framework by simulating $3\times3\times3$ convolutions with $1\times3\times3$ convolutional filters on spatial domain (equivalent to 2D CNN) plus $3\times1\times1$ convolutions to construct temporal connections on adjacent feature maps in time.

Action Recognition Philosophy +2

Deep Quantization: Encoding Convolutional Activations with Deep Generative Model

no code implementations CVPR 2017 Zhaofan Qiu, Ting Yao, Tao Mei

In this paper, we present Fisher Vector encoding with Variational Auto-Encoder (FV-VAE), a novel deep architecture that quantizes the local activations of convolutional layer in a deep generative model, by training them in an end-to-end manner.

Action Recognition Fine-Grained Image Classification +3

Boosting Image Captioning with Attributes

no code implementations ICCV 2017 Ting Yao, Yingwei Pan, Yehao Li, Zhaofan Qiu, Tao Mei

Automatically describing an image with a natural language has been an emerging challenge in both fields of computer vision and natural language processing.

Image Captioning
