Search Results for author: Xinyuan Chen

Found 24 papers, 15 papers with code

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation

1 code implementation • 13 Jul 2023 • Yi Wang, Yinan He, Yizhuo Li, Kunchang Li, Jiashuo Yu, Xin Ma, Xinhao Li, Guo Chen, Xinyuan Chen, Yaohui Wang, Conghui He, Ping Luo, Ziwei Liu, Yali Wang, LiMin Wang, Yu Qiao

Specifically, we utilize a multi-scale approach to generate video-related descriptions.

Action Recognition • Contrastive Learning • +7

897

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models

2 code implementations • 26 Sep 2023 • Yaohui Wang, Xinyuan Chen, Xin Ma, Shangchen Zhou, Ziqi Huang, Yi Wang, Ceyuan Yang, Yinan He, Jiashuo Yu, Peiqing Yang, Yuwei Guo, Tianxing Wu, Chenyang Si, Yuming Jiang, Cunjian Chen, Chen Change Loy, Bo Dai, Dahua Lin, Yu Qiao, Ziwei Liu

To this end, we propose LaVie, an integrated video generation framework that operates on cascaded video latent diffusion models, comprising a base T2V model, a temporal interpolation model, and a video super-resolution model.

Ranked #4 on Text-to-Video Generation on EvalCrafter Text-to-Video (ECTV) Dataset (using extra training data)

Text-to-Video Generation • Video Generation • +1

719

VBench: Comprehensive Benchmark Suite for Video Generative Models

1 code implementation • 29 Nov 2023 • Ziqi Huang, Yinan He, Jiashuo Yu, Fan Zhang, Chenyang Si, Yuming Jiang, Yuanhan Zhang, Tianxing Wu, Qingyang Jin, Nattapol Chanpaisit, Yaohui Wang, Xinyuan Chen, LiMin Wang, Dahua Lin, Yu Qiao, Ziwei Liu

We will open-source VBench, including all prompts, evaluation methods, generated videos, and human preference annotations, and also include more video generation models in VBench to drive forward the field of video generation.

Image Generation • Video Generation

265

DG-Font: Deformable Generative Networks for Unsupervised Font Generation

1 code implementation • CVPR 2021 • Yangchen Xie, Xinyuan Chen, Li Sun, Yue Lu

Font generation is a challenging problem, especially for writing systems that consist of a large number of characters, and has attracted a lot of attention in recent years.

Font Generation • Image-to-Image Translation

196

DGFont++: Robust Deformable Generative Networks for Unsupervised Font Generation

1 code implementation • 30 Dec 2022 • Xinyuan Chen, Yangchen Xie, Li Sun, Yue Lu

Moreover, we introduce contrastive self-supervised learning to learn a robust style representation for fonts by understanding the similarity and dissimilarities of fonts.

Font Generation • Self-Supervised Learning • +1

196

Latte: Latent Diffusion Transformer for Video Generation

2 code implementations • 5 Jan 2024 • Xin Ma, Yaohui Wang, Gengyun Jia, Xinyuan Chen, Ziwei Liu, Yuan-Fang Li, Cunjian Chen, Yu Qiao

We propose a novel Latent Diffusion Transformer, namely Latte, for video generation.

Text-to-Video Generation • Video Generation

135

SinSR: Diffusion-Based Image Super-Resolution in a Single Step

1 code implementation • 23 Nov 2023 • YuFei Wang, Wenhan Yang, Xinyuan Chen, Yaohui Wang, Lanqing Guo, Lap-Pui Chau, Ziwei Liu, Yu Qiao, Alex C. Kot, Bihan Wen

Extensive experiments conducted on synthetic and real-world datasets demonstrate that the proposed method can achieve comparable or even superior performance compared to both previous SOTA methods and the teacher model in just one sampling step, resulting in up to a 10x speedup for inference.

Image Super-Resolution

125

Diff-Font: Diffusion Model for Robust One-Shot Font Generation

1 code implementation • 12 Dec 2022 • Haibin He, Xinyuan Chen, Chaoyue Wang, Juhua Liu, Bo Du, DaCheng Tao, Yu Qiao

Specifically, a large stroke-wise dataset is constructed, and a stroke-wise diffusion model is proposed to preserve the structure and the completion of each generated character.

Font Generation

Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model

1 code implementation • 19 Dec 2023 • Lingjun Zhang, Xinyuan Chen, Yaohui Wang, Yue Lu, Yu Qiao

To tackle this problem, we propose Diff-Text, which is a training-free scene text generation framework for any language.

Text Generation • Text-to-Image Generation

Cross Attention Based Style Distribution for Controllable Person Image Synthesis

1 code implementation • 1 Aug 2022 • Xinyue Zhou, Mingyu Yin, Xinyuan Chen, Li Sun, Changxin Gao, Qingli Li

In this paper, we propose a cross attention based style distribution module that computes between the source semantic styles and target pose for pose transfer.

Pose Transfer • Virtual Try-on

Long-Term Rhythmic Video Soundtracker

1 code implementation • 2 May 2023 • Jiashuo Yu, Yaohui Wang, Xinyuan Chen, Xiao Sun, Yu Qiao

To this end, we present Long-Term Rhythmic Video Soundtracker (LORIS), a novel framework to synthesize long-term conditional waveforms.

ConditionVideo: Training-Free Condition-Guided Text-to-Video Generation

1 code implementation • 11 Oct 2023 • Bo Peng, Xinyuan Chen, Yaohui Wang, Chaochao Lu, Yu Qiao

In this work, we introduce ConditionVideo, a training-free approach to text-to-video generation based on the provided condition, video, and input text, by leveraging the power of off-the-shelf text-to-image generation methods (e.g., Stable Diffusion).

Text-to-Image Generation • Text-to-Video Generation • +1

LEO: Generative Latent Image Animator for Human Video Synthesis

5 code implementations • 6 May 2023 • Yaohui Wang, Xin Ma, Xinyuan Chen, Antitza Dantcheva, Bo Dai, Yu Qiao

Our key idea is to represent motion as a sequence of flow maps in the generation process, which inherently isolate motion from appearance.

Disentanglement • Video Editing

Gated-GAN: Adversarial Gated Networks for Multi-Collection Style Transfer

2 code implementations • 4 Apr 2019 • Xinyuan Chen, Chang Xu, Xiaokang Yang, Li Song, DaCheng Tao

We propose adversarial gated networks (Gated GAN) to transfer multiple styles in a single model.

Style Transfer

Vlogger: Make Your Dream A Vlog

1 code implementation • 17 Jan 2024 • Shaobin Zhuang, Kunchang Li, Xinyuan Chen, Yaohui Wang, Ziwei Liu, Yu Qiao, Yali Wang

More importantly, Vlogger can generate over 5-minute vlogs from open-world descriptions, without loss of video coherence on script and actor.

Language Modelling • Large Language Model • +1

Attention-GAN for Object Transfiguration in Wild Images

no code implementations • ECCV 2018 • Xinyuan Chen, Chang Xu, Xiaokang Yang, DaCheng Tao

This paper studies the object transfiguration problem in wild images.

Object

S-OHEM: Stratified Online Hard Example Mining for Object Detection

no code implementations • 5 May 2017 • Minne Li, Zhaoning Zhang, Hao Yu, Xinyuan Chen, Dongsheng Li

S-OHEM exploits OHEM with stratified sampling, a widely-adopted sampling technique, to choose the training examples according to this influence during hard example mining, and thus enhance the performance of object detectors.

Object Detection

OCR-RTPS: An OCR-based real-time positioning system for the valet parking

no code implementations • 8 Dec 2022 • Zizhang Wu, Xinyuan Chen, Jizheng Wang, Xiaoquan Wang, Yuanzhu Gan, Muqing Fang, Tianhao Xu

Obtaining the position of ego-vehicle is a crucial prerequisite for automatic control and path planning in the field of autonomous driving.

Autonomous Driving • Optical Character Recognition (OCR) • +1

Hierarchical Diffusion Autoencoders and Disentangled Image Manipulation

no code implementations • 24 Apr 2023 • Zeyu Lu, Chengyue Wu, Xinyuan Chen, Yaohui Wang, Lei Bai, Yu Qiao, Xihui Liu

To mitigate those limitations, we propose Hierarchical Diffusion Autoencoders (HDAE) that exploit the fine-grained-to-abstract and low-level-to-high-level feature hierarchy for the latent space of diffusion models.

Image Generation • Image Manipulation • +1

Weakly Supervised Scene Text Generation for Low-resource Languages

no code implementations • 25 Jun 2023 • Yangchen Xie, Xinyuan Chen, Hongjian Zhan, Palaiahankote Shivakum, Bing Yin, Cong Liu, Yue Lu

A large number of annotated training images is crucial for training successful scene text recognition models.

Scene Text Recognition • Text Generation

PPD: A New Valet Parking Pedestrian Fisheye Dataset for Autonomous Driving

no code implementations • 20 Sep 2023 • Zizhang Wu, Xinyuan Chen, Fan Song, Yuanzhu Gan, Tianhao Xu, Jian Pu, Rui Tang

In this paper, we present the Parking Pedestrian Dataset (PPD), a large-scale fisheye dataset to support research dealing with real-world pedestrians, especially with occlusions and diverse postures.

Autonomous Driving • Data Augmentation • +1

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

no code implementations • 31 Oct 2023 • Xinyuan Chen, Yaohui Wang, Lingjun Zhang, Shaobin Zhuang, Xin Ma, Jiashuo Yu, Yali Wang, Dahua Lin, Yu Qiao, Ziwei Liu

The goal is to generate high-quality long videos with smooth and creative transitions between scenes and varying lengths of shot-level videos.

EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion

no code implementations • 11 Dec 2023 • Zehuan Huang, Hao Wen, Junting Dong, Yaohui Wang, Yangguang Li, Xinyuan Chen, Yan-Pei Cao, Ding Liang, Yu Qiao, Bo Dai, Lu Sheng

Generating multiview images from a single view facilitates the rapid generation of a 3D mesh conditioned on a single image.

SSIM

A flexible Bayesian g-formula for causal survival analyses with time-dependent confounding

no code implementations • 4 Feb 2024 • Xinyuan Chen, Liangyuan Hu, Fan Li

To enhance the traditional parametric g-formula approach, we developed a more adaptable Bayesian g-formula estimator.

Causal Inference • Dimensionality Reduction • +1
