Search Results for author: Xuan Ju

Found 19 papers, 10 papers with code

Cobra: Efficient Line Art COlorization with BRoAder References

no code implementations · 16 Apr 2025 · Junhao Zhuang, Lingen Li, Xuan Ju, Zhaoyang Zhang, Chun Yuan, Ying Shan

The comic production industry requires reference-based line art colorization with high accuracy, efficiency, contextual consistency, and flexible control.

Image Generation Line Art Colorization

FullDiT: Multi-Task Video Generative Foundation Model with Full Attention

no code implementations · 25 Mar 2025 · Xuan Ju, Weicai Ye, Quande Liu, Qiulin Wang, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai, Qiang Xu

Current video generative foundation models primarily focus on text-to-video tasks, providing limited control for fine-grained video content creation.

Video Generation

ColorFlow: Retrieval-Augmented Image Sequence Colorization

no code implementations · 16 Dec 2024 · Junhao Zhuang, Xuan Ju, Zhaoyang Zhang, Yong Liu, Shiyi Zhang, Chun Yuan, Ying Shan

Automatic black-and-white image sequence colorization while preserving character and object identity (ID) is a complex task with significant market demand, such as in cartoon or comic series colorization.

Colorization Image Colorization +2

BrushEdit: All-In-One Image Inpainting and Editing

no code implementations · 13 Dec 2024 · Yaowei Li, Yuxuan Bian, Xuan Ju, Zhaoyang Zhang, Junhao Zhuang, Ying Shan, Yuexian Zou, Qiang Xu

Image editing has advanced significantly with the development of diffusion models using both inversion-based and instruction-based methods.

Image Inpainting

Pay Attention and Move Better: Harnessing Attention for Interactive Motion Generation and Training-free Editing

no code implementations · 24 Oct 2024 · Ling-Hao Chen, Shunlin Lu, Wenxun Dai, Zhiyang Dou, Xuan Ju, Jingbo Wang, Taku Komura, Lei Zhang

Previous motion diffusion models lack explicit modeling of word-level text-motion correspondence and good explainability, which restricts their fine-grained editing ability.

Motion Generation

MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls

1 code implementation · 30 Jul 2024 · Yuxuan Bian, Ailing Zeng, Xuan Ju, Xian Liu, Zhaoyang Zhang, Wei Liu, Qiang Xu

However, employing a unified model to achieve various generation tasks with different condition modalities presents two main challenges: motion distribution drifts across different tasks (e.g., co-speech gestures vs. text-driven daily actions) and the complex optimization of mixed conditions with varying granularities (e.g., text and audio).

Gesture Generation Motion Generation +2

Image Inpainting Models are Effective Tools for Instruction-guided Image Editing

no code implementations · 18 Jul 2024 · Xuan Ju, Junhao Zhuang, Zhaoyang Zhang, Yuxuan Bian, Qiang Xu, Ying Shan

The most advanced methods, such as SmartEdit and MGIE, usually combine large language models with diffusion models through joint training, where the former provides text-understanding ability and the latter provides image-generation ability.

Image Inpainting

MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions

2 code implementations · 8 Jul 2024 · Xuan Ju, Yiming Gao, Zhaoyang Zhang, Ziyang Yuan, Xintao Wang, Ailing Zeng, Yu Xiong, Qiang Xu, Ying Shan

Sora's high-motion intensity and long consistent videos have significantly impacted the field of video generation, attracting unprecedented attention.

Video Alignment Video Generation

BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion

1 code implementation · 11 Mar 2024 · Xuan Ju, Xian Liu, Xintao Wang, Yuxuan Bian, Ying Shan, Qiang Xu

Image inpainting, the process of restoring corrupted images, has seen significant advancements with the advent of diffusion models (DMs).

Image Inpainting

FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing

no code implementations · 10 Mar 2024 · Youyuan Zhang, Xuan Ju, James J. Clark

By leveraging the self-consistency property of consistency models (CMs), we eliminate the need for time-consuming inversion or additional condition extraction, reducing editing time.

Image Generation Text-to-Video Editing +3

VLPose: Bridging the Domain Gap in Pose Estimation with Language-Vision Tuning

no code implementations · 22 Feb 2024 · Jingyao Li, Pengguang Chen, Xuan Ju, Hong Xu, Jiaya Jia

Our research aims to bridge the domain gap between natural and artificial scenarios with efficient tuning strategies.

Pose Estimation

Multi-Patch Prediction: Adapting LLMs for Time Series Representation Learning

1 code implementation · 7 Feb 2024 · Yuxuan Bian, Xuan Ju, Jiangtong Li, Zhijian Xu, Dawei Cheng, Qiang Xu

In this study, we present aLLM4TS, an innovative framework that adapts Large Language Models (LLMs) for time-series representation learning.

Contrastive Learning Prediction +4

Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code

3 code implementations · 2 Oct 2023 · Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu

Specifically, in the context of diffusion-based editing, where a source image is edited according to a target prompt, the process commences by acquiring a noisy latent vector corresponding to the source image via the diffusion model.
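The step described above, acquiring a noisy latent for the source image, is commonly done with deterministic DDIM inversion. A minimal, hypothetical 1-D NumPy sketch of that idea is shown below; `eps_model` stands in for a trained noise predictor and the schedule is a toy example, so this is an illustration of the general technique, not the paper's implementation:

```python
import numpy as np

def ddim_inversion(x0, eps_model, alphas_cumprod):
    """Deterministically map a clean latent x0 toward a noisy latent
    by running the DDIM update from less-noisy to more-noisy timesteps.

    alphas_cumprod: cumulative noise schedule, ordered from the clean
    end (close to 1.0) to the noisy end (close to 0.0).
    """
    x = np.asarray(x0, dtype=float).copy()
    for t in range(len(alphas_cumprod) - 1):
        a_t, a_next = alphas_cumprod[t], alphas_cumprod[t + 1]
        eps = eps_model(x, t)  # noise estimate at the current step
        # Predict the clean sample implied by the current noise estimate.
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        # Re-noise deterministically toward the next (noisier) timestep.
        x = np.sqrt(a_next) * x0_pred + np.sqrt(1.0 - a_next) * eps
    return x

# Toy usage: with a zero noise predictor the inversion simply rescales
# the latent according to the schedule endpoints.
schedule = np.array([1.0, 0.5, 0.1])
x_T = ddim_inversion(np.array([2.0]), lambda x, t: np.zeros_like(x), schedule)
```

Editing methods then denoise this latent under a target prompt; the quality of the inversion largely determines how faithfully unedited regions are preserved.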

Image Generation Text-based Image Editing

HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation

3 code implementations · ICCV 2023 · Xuan Ju, Ailing Zeng, Chenchen Zhao, Jianan Wang, Lei Zhang, Qiang Xu

While such a plug-and-play approach is appealing, the inevitable and uncertain conflicts between the original images produced by the frozen SD branch and the given condition pose significant challenges for the learnable branch, which essentially performs image feature editing to enforce the condition.

Denoising Image Generation

DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation

1 code implementation · 16 Mar 2022 · Ailing Zeng, Xuan Ju, Lei Yang, Ruiyuan Gao, Xizhou Zhu, Bo Dai, Qiang Xu

This paper proposes DeciWatch, a simple baseline framework for video-based 2D/3D human pose estimation that achieves a 10x efficiency improvement over existing works without any performance degradation.

2D Human Pose Estimation 3D Human Pose Estimation +2

SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos

2 code implementations · 27 Dec 2021 · Ailing Zeng, Lei Yang, Xuan Ju, Jiefeng Li, Jianyi Wang, Qiang Xu

With a simple yet effective motion-aware fully connected network, SmoothNet significantly improves the temporal smoothness of existing pose estimators and, as a side effect, enhances estimation accuracy on challenging frames.

2D Human Pose Estimation 3D Human Pose Estimation +2
