Search Results for author: Peisen Zhao

Found 12 papers, 2 papers with code

UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding

no code implementations • 12 Jan 2024 • Bowen Shi, Peisen Zhao, Zichen Wang, Yuhang Zhang, Yaoming Wang, Jin Li, Wenrui Dai, Junni Zou, Hongkai Xiong, Qi Tian, Xiaopeng Zhang

Vision-language foundation models, represented by Contrastive language-image pre-training (CLIP), have gained increasing attention for jointly understanding both vision and textual tasks.

Panoptic Segmentation, Retrieval, +1

Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation

1 code implementation • ICCV 2023 • Shuangrui Ding, Peisen Zhao, Xiaopeng Zhang, Rui Qian, Hongkai Xiong, Qi Tian

Based on the STA score, we are able to progressively prune the tokens without introducing any additional parameters or requiring further re-training.

Video Recognition

Multi-modal Prompting for Low-Shot Temporal Action Localization

no code implementations • 21 Mar 2023 • Chen Ju, Zeqian Li, Peisen Zhao, Ya Zhang, Xiaopeng Zhang, Qi Tian, Yanfeng Wang, Weidi Xie

In this paper, we consider the problem of temporal action localization under the low-shot (zero-shot & few-shot) scenario, with the goal of detecting and classifying action instances from arbitrary categories within untrimmed videos, even those not seen at training time.

Action Classification, Temporal Action Localization

Constraint and Union for Partially-Supervised Temporal Sentence Grounding

no code implementations • 20 Feb 2023 • Chen Ju, Haicheng Wang, Jinxiang Liu, Chaofan Ma, Ya Zhang, Peisen Zhao, Jianlong Chang, Qi Tian

Temporal sentence grounding aims to detect the event timestamps described by the natural language query from given untrimmed videos.

Sentence, Temporal Sentence Grounding

Adaptive Mutual Supervision for Weakly-Supervised Temporal Action Localization

no code implementations • 6 Apr 2021 • Chen Ju, Peisen Zhao, Siheng Chen, Ya Zhang, Xiaoyun Zhang, Qi Tian

To solve this issue, we introduce an adaptive mutual supervision framework (AMS) with two branches, where the base branch adopts CAS to localize the most discriminative action regions, while the supplementary branch localizes the less discriminative action regions through a novel adaptive sampler.

Weakly Supervised Action Localization, Weakly-supervised Temporal Action Localization, +1

Divide and Conquer for Single-Frame Temporal Action Localization

no code implementations • ICCV 2021 • Chen Ju, Peisen Zhao, Siheng Chen, Ya Zhang, Yanfeng Wang, Qi Tian

Single-frame temporal action localization (STAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.

Temporal Action Localization

Point-Level Temporal Action Localization: Bridging Fully-supervised Proposals to Weakly-supervised Losses

no code implementations • 15 Dec 2020 • Chen Ju, Peisen Zhao, Ya Zhang, Yanfeng Wang, Qi Tian

Point-Level temporal action localization (PTAL) aims to localize actions in untrimmed videos with only one timestamp annotation for each action instance.

Weakly Supervised Action Localization

ESAD: End-to-end Deep Semi-supervised Anomaly Detection

no code implementations • 9 Dec 2020 • Chaoqin Huang, Fei Ye, Peisen Zhao, Ya Zhang, Yan-Feng Wang, Qi Tian

This paper explores semi-supervised anomaly detection, a more practical setting for anomaly detection where a small additional set of labeled samples is provided.

Ranked #25 on Anomaly Detection on One-class CIFAR-10 (using extra training data)

Medical Diagnosis, Semi-supervised Anomaly Detection, +1

Privileged Knowledge Distillation for Online Action Detection

no code implementations • 18 Nov 2020 • Peisen Zhao, Lingxi Xie, Ya Zhang, Yanfeng Wang, Qi Tian

Knowledge distillation is employed to transfer the privileged information from the offline teacher to the online student.

Knowledge Distillation, Online Action Detection

Universal-to-Specific Framework for Complex Action Recognition

no code implementations • 13 Jul 2020 • Peisen Zhao, Lingxi Xie, Ya Zhang, Qi Tian

The U2S framework is composed of three subnetworks: a universal network, a category-specific network, and a mask network.

Action Recognition, Decision Making

Bottom-Up Temporal Action Localization with Mutual Regularization

1 code implementation • ECCV 2020 • Peisen Zhao, Lingxi Xie, Chen Ju, Ya Zhang, Yan-Feng Wang, Qi Tian

To alleviate this problem, we introduce two regularization terms to mutually regularize the learning procedure: the Intra-phase Consistency (IntraC) regularization is proposed to make the predictions verified inside each phase; and the Inter-phase Consistency (InterC) regularization is proposed to keep consistency between these phases.

Temporal Action Localization
