Search Results for author: Jiaxi Gu

Found 9 papers, 3 papers with code

AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding

no code implementations · 11 Jun 2024 · Xing Zhang, Jiaxi Gu, Haoyu Zhao, Shicong Wang, Hang Xu, Renjing Pei, Songcen Xu, Zuxuan Wu, Yu-Gang Jiang

Temporal Video Grounding (TVG) aims to localize a moment from an untrimmed video given the language description.

BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models

1 code implementation · CVPR 2024 · Fengyuan Shi, Jiaxi Gu, Hang Xu, Songcen Xu, Wei Zhang, Limin Wang

Text-to-image foundation models are now widely applied to various downstream image synthesis tasks, such as controllable image generation and image editing, while downstream video synthesis tasks remain less explored for several reasons.

Tasks: Image Generation, Model Selection, +3

DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance

no code implementations · 5 Dec 2023 · Cong Wang, Jiaxi Gu, Panwen Hu, Songcen Xu, Hang Xu, Xiaodan Liang

In terms of fidelity especially, our model has a powerful image retention ability and, to the best of our knowledge, delivers the best results on UCF101 compared with other image-to-video models.

Tasks: Image to Video Generation

VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model

1 code implementation · 29 Nov 2023 · Haoyu Zhao, Tianyi Lu, Jiaxi Gu, Xing Zhang, Zuxuan Wu, Hang Xu, Yu-Gang Jiang

Identity-consistent video generation seeks to synthesize videos that are guided by both textual prompts and reference images of entities.

Tasks: Denoising, Image to Video Generation, +1

Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models

no code implementations · 25 Oct 2023 · Tianyi Lu, Xing Zhang, Jiaxi Gu, Hang Xu, Renjing Pei, Songcen Xu, Zuxuan Wu

In this way, temporal consistency is preserved by the video LDM, while the high fidelity of the image LDM can also be exploited.

Tasks: Denoising, Video Editing
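The fusion step described above (combining latents from a video LDM and an image LDM) is not specified in detail here; a minimal NumPy sketch of the general idea, where the blending rule, the `fuse_latents` name, and the `alpha` weight are all illustrative assumptions rather than the paper's actual method:

```python
import numpy as np

def fuse_latents(video_latents, image_latents, alpha=0.6):
    """Blend per-frame latents from an image LDM into the video LDM's
    latents: the video branch contributes temporal consistency, the
    image branch contributes per-frame fidelity. Toy linear blend."""
    return alpha * video_latents + (1.0 - alpha) * image_latents

# Hypothetical latent tensors: T frames, C channels, HxW spatial grid.
T, C, H, W = 8, 4, 8, 8
rng = np.random.default_rng(0)
vid = rng.standard_normal((T, C, H, W))  # stand-in for video LDM latents
img = rng.standard_normal((T, C, H, W))  # stand-in for image LDM latents
fused = fuse_latents(vid, img)
print(fused.shape)  # (8, 4, 8, 8)
```

In a real multi-source pipeline this fusion would happen inside the denoising loop, not as a single post-hoc blend.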

Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation

no code implementations · 7 Sep 2023 · Jiaxi Gu, Shicong Wang, Haoyu Zhao, Tianyi Lu, Xing Zhang, Zuxuan Wu, Songcen Xu, Wei Zhang, Yu-Gang Jiang, Hang Xu

Conditioned on an initial video clip with a small number of frames, additional frames are iteratively generated by reusing the original latent features and following the previous diffusion process.

Tasks: Action Recognition, Decoder, +4
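The iterative generation scheme sketched above (extend a short clip by reusing its latent features as the initialization for new frames) can be illustrated with a toy NumPy loop; the `denoise_step` update and all names here are stand-ins, not the paper's sampler:

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(latent, t):
    """Stand-in for one reverse-diffusion step (toy update)."""
    return latent - 0.1 * t * latent

def extend_video(initial_latents, num_new_frames, num_steps=4):
    """Iteratively append frames, reusing the clip's latest latent as
    the starting point for each new frame instead of pure noise."""
    frames = list(initial_latents)
    for _ in range(num_new_frames):
        # Reuse the previous latent, lightly perturbed, as initialization.
        latent = frames[-1] + 0.05 * rng.standard_normal(frames[-1].shape)
        for t in range(num_steps, 0, -1):
            latent = denoise_step(latent, t)
        frames.append(latent)
    return frames

clip = [rng.standard_normal((4, 4)) for _ in range(2)]  # 2 seed frames
video = extend_video(clip, num_new_frames=3)
print(len(video))  # 5 frames total
```

The key point the excerpt makes is the initialization: each new frame's diffusion process starts from reused latent features of the existing clip, which keeps the extension consistent with the conditioning frames.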

Towards Universal Vision-language Omni-supervised Segmentation

no code implementations · 12 Mar 2023 · Bowen Dong, Jiaxi Gu, Jianhua Han, Hang Xu, Wangmeng Zuo

To improve open-world segmentation ability, we leverage omni-supervised data (i.e., panoptic segmentation data, object detection data, and image-text pairs) during training, which enriches open-world segmentation and achieves better segmentation accuracy.

Tasks: Instance Segmentation, Object Detection, +4

PIDRo: Parallel Isomeric Attention with Dynamic Routing for Text-Video Retrieval

no code implementations · ICCV 2023 · Peiyan Guan, Renjing Pei, Bin Shao, Jianzhuang Liu, Weimian Li, Jiaxi Gu, Hang Xu, Songcen Xu, Youliang Yan, Edmund Y. Lam

The parallel isomeric attention module serves as the video encoder; it consists of two parallel branches that model the spatio-temporal information of videos at both the patch and frame levels.

Tasks: Representation Learning, Retrieval, +3
