no code implementations • 24 Apr 2025 • Xu Ma, Peize Sun, Haoyu Ma, Hao Tang, Chih-Yao Ma, Jialiang Wang, Kunpeng Li, Xiaoliang Dai, Yujun Shi, Xuan Ju, Yushi Hu, Artsiom Sanakoyeu, Felix Juefei-Xu, Ji Hou, Junjiao Tian, Tao Xu, Tingbo Hou, Yen-Cheng Liu, Zecheng He, Zijian He, Matt Feiszli, Peizhao Zhang, Peter Vajda, Sam Tsai, Yun Fu
A primary limitation is the substantial number of image tokens required for AR models, which constrains both training and inference efficiency, as well as image resolution.
no code implementations • 16 Apr 2025 • Junhao Zhuang, Lingen Li, Xuan Ju, Zhaoyang Zhang, Chun Yuan, Ying Shan
The comic production industry requires reference-based line art colorization with high accuracy, efficiency, contextual consistency, and flexible control.
no code implementations • 25 Mar 2025 • Xuan Ju, Weicai Ye, Quande Liu, Qiulin Wang, Xintao Wang, Pengfei Wan, Di Zhang, Kun Gai, Qiang Xu
Current video generative foundation models primarily focus on text-to-video tasks, providing limited control for fine-grained video content creation.
1 code implementation • 7 Mar 2025 • Yuxuan Bian, Zhaoyang Zhang, Xuan Ju, Mingdeng Cao, Liangbin Xie, Ying Shan, Qiang Xu
Video inpainting, which aims to restore corrupted video content, has experienced substantial progress.
no code implementations • 16 Dec 2024 • Junhao Zhuang, Xuan Ju, Zhaoyang Zhang, Yong Liu, Shiyi Zhang, Chun Yuan, Ying Shan
Automatic black-and-white image sequence colorization while preserving character and object identity (ID) is a complex task with significant market demand, such as in cartoon or comic series colorization.
no code implementations • 13 Dec 2024 • Yaowei Li, Yuxuan Bian, Xuan Ju, Zhaoyang Zhang, Junhao Zhuang, Ying Shan, Yuexian Zou, Qiang Xu
Image editing has advanced significantly with the development of diffusion models using both inversion-based and instruction-based methods.
no code implementations • 24 Oct 2024 • Ling-Hao Chen, Shunlin Lu, Wenxun Dai, Zhiyang Dou, Xuan Ju, Jingbo Wang, Taku Komura, Lei Zhang
Previous motion diffusion models lack explicit modeling of the word-level text-motion correspondence and good explainability, hence restricting their fine-grained editing ability.
1 code implementation • 30 Jul 2024 • Yuxuan Bian, Ailing Zeng, Xuan Ju, Xian Liu, Zhaoyang Zhang, Wei Liu, Qiang Xu
However, employing a unified model to achieve various generation tasks with different condition modalities presents two main challenges: motion distribution drifts across different tasks (e.g., co-speech gestures and text-driven daily actions) and the complex optimization of mixed conditions with varying granularities (e.g., text and audio).
no code implementations • 18 Jul 2024 • Xuan Ju, Junhao Zhuang, Zhaoyang Zhang, Yuxuan Bian, Qiang Xu, Ying Shan
The most advanced methods, such as SmartEdit and MGIE, usually combine large language models with diffusion models through joint training, where the former provides text understanding ability, and the latter provides image generation ability.
2 code implementations • 8 Jul 2024 • Xuan Ju, Yiming Gao, Zhaoyang Zhang, Ziyang Yuan, Xintao Wang, Ailing Zeng, Yu Xiong, Qiang Xu, Ying Shan
Sora's high motion intensity and long, consistent videos have significantly impacted the field of video generation, attracting unprecedented attention.
1 code implementation • 11 Mar 2024 • Xuan Ju, Xian Liu, Xintao Wang, Yuxuan Bian, Ying Shan, Qiang Xu
Image inpainting, the process of restoring corrupted images, has seen significant advancements with the advent of diffusion models (DMs).
no code implementations • 10 Mar 2024 • Youyuan Zhang, Xuan Ju, James J. Clark
By leveraging the self-consistency property of CMs, we eliminate the need for time-consuming inversion or additional condition extraction, reducing editing time.
no code implementations • 22 Feb 2024 • Jingyao Li, Pengguang Chen, Xuan Ju, Hong Xu, Jiaya Jia
Our research aims to bridge the domain gap between natural and artificial scenarios with efficient tuning strategies.
1 code implementation • 7 Feb 2024 • Yuxuan Bian, Xuan Ju, Jiangtong Li, Zhijian Xu, Dawei Cheng, Qiang Xu
In this study, we present aLLM4TS, an innovative framework that adapts Large Language Models (LLMs) for time-series representation learning.
3 code implementations • 2 Oct 2023 • Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu
Specifically, in the context of diffusion-based editing, where a source image is edited according to a target prompt, the process commences by acquiring a noisy latent vector corresponding to the source image via the diffusion model.
Ranked #5 on Text-based Image Editing on PIE-Bench
3 code implementations • ICCV 2023 • Xuan Ju, Ailing Zeng, Chenchen Zhao, Jianan Wang, Lei Zhang, Qiang Xu
While such a plug-and-play approach is appealing, the inevitable and uncertain conflicts between the original images produced from the frozen SD branch and the given condition incur significant challenges for the learnable branch, which essentially conducts image feature editing for condition enforcement.
1 code implementation • CVPR 2023 • Xuan Ju, Ailing Zeng, Jianan Wang, Qiang Xu, Lei Zhang
Humans have been recorded in a variety of forms since antiquity.
1 code implementation • 16 Mar 2022 • Ailing Zeng, Xuan Ju, Lei Yang, Ruiyuan Gao, Xizhou Zhu, Bo Dai, Qiang Xu
This paper proposes DeciWatch, a simple baseline framework for video-based 2D/3D human pose estimation that achieves a 10x efficiency improvement over existing works without any performance degradation.
Ranked #1 on 2D Human Pose Estimation on JHMDB (2D poses only)
2 code implementations • 27 Dec 2021 • Ailing Zeng, Lei Yang, Xuan Ju, Jiefeng Li, Jianyi Wang, Qiang Xu
With a simple yet effective motion-aware fully-connected network, SmoothNet significantly improves the temporal smoothness of existing pose estimators and, as a side effect, enhances the estimation accuracy on challenging frames.