Search Results for author: Chenyang Si

Found 24 papers, 10 papers with code

Scaling Supervised Local Learning with Augmented Auxiliary Networks

1 code implementation • 27 Feb 2024 • Chenxiang Ma, Jibin Wu, Chenyang Si, Kay Chen Tan

AugLocal constructs each hidden layer's auxiliary network by uniformly selecting a small subset of layers from its subsequent network layers to enhance their synergy.

Image Classification

Towards Language-Driven Video Inpainting via Multimodal Large Language Models

no code implementations • 18 Jan 2024 • Jianzong Wu, Xiangtai Li, Chenyang Si, Shangchen Zhou, Jingkang Yang, Jiangning Zhang, Yining Li, Kai Chen, Yunhai Tong, Ziwei Liu, Chen Change Loy

We introduce a new task -- language-driven video inpainting, which uses natural language instructions to guide the inpainting process.

Video Inpainting

FreeInit: Bridging Initialization Gap in Video Diffusion Models

1 code implementation • 12 Dec 2023 • Tianxing Wu, Chenyang Si, Yuming Jiang, Ziqi Huang, Ziwei Liu

Though diffusion-based video generation has witnessed rapid progress, the inference results of existing models still exhibit unsatisfactory temporal consistency and unnatural dynamics.

Denoising, Text-to-Video Generation +1

VideoBooth: Diffusion-based Video Generation with Image Prompts

no code implementations • 1 Dec 2023 • Yuming Jiang, Tianxing Wu, Shuai Yang, Chenyang Si, Dahua Lin, Yu Qiao, Chen Change Loy, Ziwei Liu

In this paper, we study the task of video generation with image prompts, which provide more accurate and direct content control beyond the text prompts.

Video Generation

VBench: Comprehensive Benchmark Suite for Video Generative Models

1 code implementation • 29 Nov 2023 • Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, LiMin Wang, Dahua Lin, Yu Qiao, Ziwei Liu

We will open-source VBench, including all prompts, evaluation methods, generated videos, and human preference annotations, and also include more video generation models in VBench to drive forward the field of video generation.

Image Generation, Video Generation

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models

2 code implementations • 26 Sep 2023 • Yaohui Wang, Xinyuan Chen, Xin Ma, Shangchen Zhou, Ziqi Huang, Yi Wang, Ceyuan Yang, Yinan He, Jiashuo Yu, Peiqing Yang, Yuwei Guo, Tianxing Wu, Chenyang Si, Yuming Jiang, Cunjian Chen, Chen Change Loy, Bo Dai, Dahua Lin, Yu Qiao, Ziwei Liu

To this end, we propose LaVie, an integrated video generation framework that operates on cascaded video latent diffusion models, comprising a base T2V model, a temporal interpolation model, and a video super-resolution model.

Text-to-Video Generation, Video Generation +1

FreeU: Free Lunch in Diffusion U-Net

1 code implementation • 20 Sep 2023 • Chenyang Si, Ziqi Huang, Yuming Jiang, Ziwei Liu

In this paper, we uncover the untapped potential of diffusion U-Net, which serves as a "free lunch" that substantially improves the generation quality on the fly.

Decoder, Denoising +1

FSAR: Federated Skeleton-based Action Recognition with Adaptive Topology Structure and Knowledge Distillation

no code implementations • ICCV 2023 • Jingwen Guo, Hong Liu, Shitong Sun, Tianyu Guo, Min Zhang, Chenyang Si

Existing skeleton-based action recognition methods typically follow a centralized learning paradigm, which can pose privacy concerns when exposing human-related videos.

Action Recognition, Federated Learning +3

Semantic Prompt for Few-Shot Image Recognition

1 code implementation • CVPR 2023 • Wentao Chen, Chenyang Si, Zhang Zhang, Liang Wang, Zilei Wang, Tieniu Tan

Instead of the naive exploitation of semantic information for remedying classifiers, we explore leveraging semantic information as prompts to tune the visual feature extraction network adaptively.

Few-Shot Learning

MetaFormer Baselines for Vision

7 code implementations • 24 Oct 2022 • Weihao Yu, Chenyang Si, Pan Zhou, Mi Luo, Yichen Zhou, Jiashi Feng, Shuicheng Yan, Xinchao Wang

By simply applying depthwise separable convolutions as token mixer in the bottom stages and vanilla self-attention in the top stages, the resulting model CAFormer sets a new record on ImageNet-1K: it achieves an accuracy of 85.5% at 224x224 resolution, under normal supervised training without external data or distillation.

Ranked #2 on Domain Generalization on ImageNet-C (using extra training data)

Domain Generalization, Image Classification

Exploring Semantic Attributes from A Foundation Model for Federated Learning of Disjoint Label Spaces

no code implementations • 29 Aug 2022 • Shitong Sun, Chenyang Si, Guile Wu, Shaogang Gong

To resolve this problem, federated learning has been introduced to transfer knowledge across multiple sources (clients) with non-shared data while optimising a globally generalised central model (server).

Attribute, Federated Learning +3

Inception Transformer

3 code implementations • 25 May 2022 • Chenyang Si, Weihao Yu, Pan Zhou, Yichen Zhou, Xinchao Wang, Shuicheng Yan

Recent studies show that Transformer has strong capability of building long-range dependencies, yet is incompetent in capturing high frequencies that predominantly convey local information.

Image Classification

Mugs: A Multi-Granular Self-Supervised Learning Framework

1 code implementation • 27 Mar 2022 • Pan Zhou, Yichen Zhou, Chenyang Si, Weihao Yu, Teck Khim Ng, Shuicheng Yan

It provides complementary instance supervision to IDS via an extra alignment on local neighbors, and scatters different local-groups separately to increase discriminability.

Contrastive Learning, Self-Supervised Image Classification +3

Generalizable Person Re-Identification via Self-Supervised Batch Norm Test-Time Adaption

no code implementations • 1 Mar 2022 • Ke Han, Chenyang Si, Yan Huang, Liang Wang, Tieniu Tan

In this paper, we investigate the generalization problem of person re-identification (re-id), whose major challenge is the distribution shift on an unseen domain.

Generalizable Person Re-identification

Contrast-reconstruction Representation Learning for Self-supervised Skeleton-based Action Recognition

no code implementations • 22 Nov 2021 • Peng Wang, Jun Wen, Chenyang Si, Yuntao Qian, Liang Wang

Finally, in the Information Fuser, we explore varied strategies to combine the Sequence Reconstructor and Contrastive Motion Learner, and propose to capture postures and motions simultaneously via a knowledge-distillation based fusion strategy that transfers the motion learning from the Contrastive Motion Learner to the Sequence Reconstructor.

Action Recognition, Contrastive Learning +4

MetaFormer Is Actually What You Need for Vision

14 code implementations • CVPR 2022 • Weihao Yu, Mi Luo, Pan Zhou, Chenyang Si, Yichen Zhou, Xinchao Wang, Jiashi Feng, Shuicheng Yan

Based on this observation, we hypothesize that the general architecture of the Transformers, instead of the specific token mixer module, is more essential to the model's performance.

Image Classification, Object Detection +1

Progressive Cluster Purification for Transductive Few-shot Learning

no code implementations • 10 Jun 2019 • Chenyang Si, Wentao Chen, Wei Wang, Liang Wang, Tieniu Tan

Furthermore, the inter-class classification and the intra-class transduction are extremely flexible to be repeated several times to progressively purify the clusters.

Few-Shot Learning, General Classification

Pose-Guided Multi-Granularity Attention Network for Text-Based Person Search

no code implementations • 22 Sep 2018 • Ya Jing, Chenyang Si, Jun-Bo Wang, Wei Wang, Liang Wang, Tieniu Tan

To exploit the multilevel corresponding visual contents, we propose a pose-guided multi-granularity attention network (PMA).

Person Search, Sentence +1

Multistage Adversarial Losses for Pose-Based Human Image Synthesis

no code implementations • CVPR 2018 • Chenyang Si, Wei Wang, Liang Wang, Tieniu Tan

Human image synthesis has extensive practical applications, e.g., person re-identification and data augmentation for human pose estimation.

Data Augmentation, Image Generation +2

Pose-Based Two-Stream Relational Networks for Action Recognition in Videos

no code implementations • 22 May 2018 • Wei Wang, Jinjin Zhang, Chenyang Si, Liang Wang

Second, few pose-based methods model the action-related objects in recognizing human-object interaction actions in which objects play an important role.

Action Recognition In Videos, Human-Object Interaction Detection +2
