no code implementations • 17 May 2025 • Tianxiong Zhong, Xingye Tian, Boyuan Jiang, Xuebo Wang, Xin Tao, Pengfei Wan, Zhiwei Zhang
Modern video generation frameworks based on Latent Diffusion Models suffer from inefficiencies in tokenization due to the Frame-Proportional Information Assumption.
no code implementations • 20 Feb 2025 • Donghao Luo, Yujie Liang, Xu Peng, Xiaobin Hu, Boyuan Jiang, Chengming Xu, Taisong Jin, Chengjie Wang, Yanwei Fu
This framework systematically decomposes the model image into three distinct regions: try-on, reconstruction, and imagination zones.
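As an illustrative sketch only (not the paper's actual pipeline), the three-zone decomposition can be expressed as mask-based splitting of the model image; the function and mask names below are assumptions for illustration:

```python
import numpy as np

def decompose_model_image(image: np.ndarray,
                          tryon_mask: np.ndarray,
                          recon_mask: np.ndarray) -> dict:
    """Split a model image (H, W, 3) into three regions using binary masks.

    Pixels under `tryon_mask` would be regenerated with the target garment,
    pixels under `recon_mask` are reconstructed as-is, and all remaining
    pixels form the imagination zone to be synthesized freely.
    """
    imagination_mask = 1 - np.clip(tryon_mask + recon_mask, 0, 1)
    return {
        "tryon": image * tryon_mask[..., None],
        "reconstruction": image * recon_mask[..., None],
        "imagination": image * imagination_mask[..., None],
    }
```

With disjoint masks, the three zones partition the image exactly, which is the property the decomposition relies on.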
no code implementations • 4 Dec 2024 • Qingdong He, Jinlong Peng, Pengcheng Xu, Boyuan Jiang, Xiaobin Hu, Donghao Luo, Yong Liu, Yabiao Wang, Chengjie Wang, Xiangtai Li, Jiangning Zhang
To enhance the controllability of text-to-image diffusion models, current ControlNet-like models have explored various control signals to dictate image attributes.
no code implementations • CVPR 2025 • Pengcheng Xu, Boyuan Jiang, Xiaobin Hu, Donghao Luo, Qingdong He, Jiangning Zhang, Chengjie Wang, Yunsheng Wu, Charles Ling, Boyu Wang
Leveraging the large generative prior of the flow transformer for tuning-free image editing requires authentic inversion to project the image into the model's domain and a flexible invariance control mechanism to preserve non-target contents.
no code implementations • 22 Nov 2024 • Jiahao Hu, Tianxiong Zhong, Xuebo Wang, Boyuan Jiang, Xingye Tian, Fei Yang, Pengfei Wan, Di Zhang
VIVID-10M is the first large-scale hybrid image-video local editing dataset aimed at reducing data construction and model training costs; it comprises 9.7M samples that span a wide range of video editing tasks.
2 code implementations • 15 Nov 2024 • Boyuan Jiang, Xiaobin Hu, Donghao Luo, Qingdong He, Chengming Xu, Jinlong Peng, Jiangning Zhang, Chengjie Wang, Yunsheng Wu, Yanwei Fu
Although image-based virtual try-on has made considerable progress, emerging approaches still encounter challenges in producing high-fidelity and robust fitting images across diverse scenarios.
Ranked #1 on Virtual Try-on on VITON-HD
no code implementations • CVPR 2025 • Qiuheng Wang, Yukai Shi, Jiarong Ou, Rui Chen, Ke Lin, Jiahao Wang, Boyuan Jiang, Haotian Yang, Mingwu Zheng, Xin Tao, Fei Yang, Pengfei Wan, Di Zhang
As visual generation technologies continue to advance, the scale of video datasets has expanded rapidly, and the quality of these datasets is critical to the performance of video generation models.
2 code implementations • CVPR 2025 • Yujie Liang, Xiaobin Hu, Boyuan Jiang, Donghao Luo, Kai Wu, Wenhui Han, Taisong Jin, Chengjie Wang
To tackle this issue widely existing in real-world scenarios, we propose VTON-HandFit, leveraging the power of hand priors to reconstruct the appearance and structure for hand occlusion cases.
no code implementations • 4 Jul 2024 • Bang Li, Donghao Luo, Yujie Liang, Jing Yang, Zengmao Ding, Xu Peng, Boyuan Jiang, Shengwei Han, Dan Sui, Peichao Qin, Pian Wu, Chaoyang Wang, Yun Qi, Taisong Jin, Chengjie Wang, Xiaoming Huang, Zhan Shu, Rongrong Ji, Yongge Liu, Yunsheng Wu
Oracle bone inscriptions (OBI) constitute the earliest developed writing system in China, bearing invaluable written exemplifications of early Shang history and paleography.
1 code implementation • 30 May 2024 • Kai Wu, Boyuan Jiang, Zhengkai Jiang, Qingdong He, Donghao Luo, Shengzhi Wang, Qingwen Liu, Chengjie Wang
Multimodal large language models (MLLMs) provide a powerful mechanism for understanding visual information by building on large language models.
no code implementations • 22 Jan 2024 • Mengmeng Wang, Jiazheng Xing, Boyuan Jiang, Jun Chen, Jianbiao Mei, Xingxing Zuo, Guang Dai, Jingdong Wang, Yong Liu
In this paper, we introduce a novel Multimodal, Multi-task CLIP adapting framework named \name to address these challenges, preserving both high supervised performance and robust transferability.
no code implementations • CVPR 2024 • Xu Peng, Junwei Zhu, Boyuan Jiang, Ying Tai, Donghao Luo, Jiangning Zhang, Wei Lin, Taisong Jin, Chengjie Wang, Rongrong Ji
Moreover, these methods often grapple with identity distortion and limited expression diversity.
1 code implementation • ICCV 2023 • Boyuan Jiang, Lei Hu, Shihong Xia
The key idea is to model the camera pose with a probability distribution and iteratively update that distribution from 2D features, instead of relying on a single camera-pose estimate.
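A minimal sketch of iteratively refining a pose distribution from 2D features, in the spirit of the abstract. This is a generic sampling-based re-weighting update assumed purely for illustration, not the paper's actual algorithm; `project` and all names here are hypothetical:

```python
import numpy as np

def update_pose_distribution(mu, sigma, features_2d, project, lr=0.1):
    """One iteration of refining a diagonal-Gaussian camera-pose belief.

    `mu`/`sigma` parameterize a Gaussian over pose parameters;
    `project(pose)` maps a pose sample to predicted 2D features. The belief
    is nudged toward samples whose projections better match `features_2d`.
    """
    samples = mu + sigma * np.random.randn(64, mu.shape[0])  # pose hypotheses
    errors = np.array([np.linalg.norm(project(s) - features_2d) for s in samples])
    weights = np.exp(-errors)                                # soft re-weighting
    weights /= weights.sum()
    new_mu = weights @ samples                               # weighted mean
    new_sigma = np.sqrt(weights @ (samples - new_mu) ** 2) + 1e-6
    return (1 - lr) * mu + lr * new_mu, (1 - lr) * sigma + lr * new_sigma
```

Repeating this update concentrates the distribution on poses whose projected features agree with the observed 2D evidence.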
1 code implementation • 7 Sep 2023 • Lingtong Kong, Boyuan Jiang, Donghao Luo, Wenqing Chu, Ying Tai, Chengjie Wang, Jie Yang
Video frame interpolation is an important low-level vision task that increases the frame rate for a more fluent visual experience.
1 code implementation • 13 Jun 2023 • Lei Hu, Zihao Zhang, Chongyang Zhong, Boyuan Jiang, Shihong Xia
Moreover, we show that our framework generates reasonable results even in more challenging retargeting scenarios, such as retargeting between bipedal and quadrupedal skeletons, owing to the body-part retargeting strategy and PAN.
2 code implementations • CVPR 2022 • Lingtong Kong, Boyuan Jiang, Donghao Luo, Wenqing Chu, Xiaoming Huang, Ying Tai, Chengjie Wang, Jie Yang
Prevailing video frame interpolation algorithms, which generate intermediate frames from consecutive inputs, typically rely on complex model architectures with heavy parameter counts or high latency, hindering their use in diverse real-time applications.
Ranked #1 on Video Frame Interpolation on Middlebury
no code implementations • 23 Mar 2021 • Mingyu Wu, Boyuan Jiang, Donghao Luo, Junchi Yan, Yabiao Wang, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Xiaokang Yang
For action recognition learning, 2D CNN-based methods are efficient but may yield redundant features due to applying the same 2D convolution kernel to each frame.
no code implementations • 24 Feb 2021 • Jingwei Yan, Boyuan Jiang, Jingjing Wang, Qiang Li, Chunmao Wang, ShiLiang Pu
To incorporate intra-level AU relations and inter-level AU regional relevance simultaneously, a multi-level AU relation graph is constructed, and graph convolution is performed to further enhance the AU regional features at each level.
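A minimal NumPy sketch of a single graph-convolution step over AU regional features. The learnable weight matrix of a full GCN layer is omitted for brevity, and the adjacency here merely stands in for the multi-level AU relation graph; all names are assumptions for illustration:

```python
import numpy as np

def au_graph_convolution(features, adjacency):
    """One graph-convolution step over AU regional features.

    `features` is (num_AUs, dim); `adjacency` is a (num_AUs, num_AUs)
    relation graph (e.g. derived from AU co-occurrence priors). Each AU
    feature is enhanced by aggregating its related AUs' features.
    """
    a_hat = adjacency + np.eye(adjacency.shape[0])       # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)
    a_norm = a_hat / deg                                 # row-normalize
    return np.maximum(a_norm @ features, 0.0)            # aggregate + ReLU
```

Stacking such steps lets information from related AUs (within and across levels) flow into each regional feature.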
no code implementations • ICCV 2019 • Boyuan Jiang, Mengmeng Wang, Weihao Gan, Wei Wu, Junjie Yan
Spatiotemporal and motion features are two complementary and crucial sources of information for video action recognition.
Ranked #1 on Action Recognition In Videos on HMDB-51
no code implementations • CVPR 2020 • Zhihong Chen, Chao Chen, Zhaowei Cheng, Boyuan Jiang, Ke Fang, Xinyu Jin
However, because of the domain shift between the source and target domains, using deep features alone for sample selection is unreliable.
Ranked #6 on Partial Domain Adaptation on Office-31
1 code implementation • 4 Sep 2018 • Chao Chen, Boyuan Jiang, Xinyu Jin
Unlike existing parameter transfer approaches, which incorporate source model information into the target by regularizing the difference between the source and target domain parameters, an intuitively appealing projective model is proposed to bridge the source and target model parameters.
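The projective idea can be sketched as a regularizer that penalizes the distance between the target parameters and a linear projection of the source parameters, rather than the raw parameter difference. This is a schematic reading of the abstract, with all variable names assumed for illustration:

```python
import numpy as np

def projective_transfer_loss(w_target, w_source, A, lam=0.1):
    """Regularizer bridging target and source parameters via a projection.

    Instead of penalizing ||w_target - w_source||^2 directly, the target
    parameters are encouraged to lie near a linear projection A @ w_source
    of the source model, allowing a more flexible relationship between
    the two domains' parameters. `A` could itself be learned.
    """
    residual = w_target - A @ w_source
    return lam * float(residual @ residual)
```

With `A` fixed to the identity, this reduces to the standard parameter-difference regularizer, which makes the generalization explicit.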
1 code implementation • 28 Aug 2018 • Chao Chen, Zhihong Chen, Boyuan Jiang, Xinyu Jin
Recently, considerable effort has been devoted to deep domain adaptation in computer vision and machine learning communities.