Search Results for author: Boyuan Jiang

Found 22 papers, 9 papers with code

VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption

no code implementations17 May 2025 Tianxiong Zhong, Xingye Tian, Boyuan Jiang, Xuebo Wang, Xin Tao, Pengfei Wan, Zhiwei Zhang

Modern video generation frameworks based on Latent Diffusion Models suffer from inefficiencies in tokenization due to the Frame-Proportional Information Assumption.
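As an illustration only (not VFRTok's implementation; the function names and tokens-per-unit constants below are hypothetical), the following sketch contrasts a token budget that scales with frame count against one that scales with clip duration:

```python
# Illustrative sketch (not the VFRTok implementation): contrasting a token budget
# that grows with the number of frames against one that grows with clip duration.
# All names and the tokens-per-unit constants below are hypothetical.

def frame_proportional_budget(num_frames: int, tokens_per_frame: int = 256) -> int:
    """Frame-Proportional Information Assumption: more frames -> more tokens,
    even if the extra frames come from a higher frame rate, not more content."""
    return num_frames * tokens_per_frame

def duration_proportional_budget(duration_sec: float, tokens_per_second: int = 2048) -> int:
    """Duration-proportional assumption: the token budget depends on how long
    the clip is, so sampling the same clip at 24 vs 60 fps costs the same."""
    return round(duration_sec * tokens_per_second)

# A 2-second clip sampled at two frame rates:
for fps in (24, 60):
    frames = 2 * fps
    print(fps, frame_proportional_budget(frames), duration_proportional_budget(2.0))
```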

Decoder Position +1

CrossVTON: Mimicking the Logic Reasoning on Cross-category Virtual Try-on guided by Tri-zone Priors

no code implementations20 Feb 2025 Donghao Luo, Yujie Liang, Xu Peng, Xiaobin Hu, Boyuan Jiang, Chengming Xu, Taisong Jin, Chengjie Wang, Yanwei Fu

This framework systematically decomposes the model image into three distinct regions: try-on, reconstruction, and imagination zones.
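A minimal sketch of such a tri-zone decomposition, assuming binary masks for the garment region and the visible person are already available from an upstream parser (the mask sources and names are hypothetical, not CrossVTON's interface):

```python
# Minimal sketch of a tri-zone decomposition, assuming binary masks are available
# from an upstream parser; the mask sources and names are hypothetical placeholders.
import numpy as np

def tri_zone_masks(garment_region: np.ndarray, visible_person: np.ndarray) -> dict:
    """Partition a person image into three mutually exclusive zones:
    - try-on: pixels to be replaced by the new garment
    - reconstruction: visible person pixels that must be preserved
    - imagination: remaining pixels (e.g. newly exposed skin/background to hallucinate)
    """
    try_on = garment_region.astype(bool)
    reconstruction = visible_person.astype(bool) & ~try_on
    imagination = ~(try_on | reconstruction)
    return {"try_on": try_on, "reconstruction": reconstruction, "imagination": imagination}

# Toy 4x4 example: left half is the garment region, top half is visible person.
garment = np.zeros((4, 4), dtype=bool); garment[:, :2] = True
person = np.zeros((4, 4), dtype=bool); person[:2, :] = True
zones = tri_zone_masks(garment, person)
assert (zones["try_on"] | zones["reconstruction"] | zones["imagination"]).all()
```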

Virtual Try-on

Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing

no code implementations CVPR 2025 Pengcheng Xu, Boyuan Jiang, Xiaobin Hu, Donghao Luo, Qingdong He, Jiangning Zhang, Chengjie Wang, Yunsheng Wu, Charles Ling, Boyu Wang

Leveraging the large generative prior of the flow transformer for tuning-free image editing requires authentic inversion to project the image into the model's domain and a flexible invariance control mechanism to preserve non-target contents.
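For intuition, here is a generic sketch of ODE-based inversion for a flow model, integrating a learned velocity field backwards with fixed-step Euler; the `velocity_fn(x, t)` interface is an assumption and this is not the paper's exact inversion procedure:

```python
# Generic sketch of ODE-based inversion for a flow model (not this paper's exact method):
# integrate the learned velocity field from the clean image (t=1) back to noise (t=0)
# with fixed-step Euler. `velocity_fn(x, t)` is an assumed interface.
import torch

@torch.no_grad()
def invert_flow(x_image: torch.Tensor, velocity_fn, num_steps: int = 50) -> torch.Tensor:
    """Project an image into the model's noise domain by running the
    probability-flow ODE in reverse. Returns the estimated starting noise."""
    x = x_image.clone()
    dt = 1.0 / num_steps
    # Step time from 1 down to 0, moving against the learned velocity.
    for i in reversed(range(num_steps)):
        t = torch.full((x.shape[0],), (i + 1) * dt, device=x.device)
        x = x - velocity_fn(x, t) * dt
    return x
```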

VIVID-10M: A Dataset and Baseline for Versatile and Interactive Video Local Editing

no code implementations22 Nov 2024 Jiahao Hu, Tianxiong Zhong, Xuebo Wang, Boyuan Jiang, Xingye Tian, Fei Yang, Pengfei Wan, Di Zhang

VIVID-10M is the first large-scale hybrid image-video local editing dataset, aimed at reducing data construction and model training costs; it comprises 9.7M samples covering a wide range of video editing tasks.

Video Editing

FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on

2 code implementations15 Nov 2024 Boyuan Jiang, Xiaobin Hu, Donghao Luo, Qingdong He, Chengming Xu, Jinlong Peng, Jiangning Zhang, Chengjie Wang, Yunsheng Wu, Yanwei Fu

Although image-based virtual try-on has made considerable progress, emerging approaches still encounter challenges in producing high-fidelity and robust fitting images across diverse scenarios.

Virtual Try-on

Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content

no code implementations CVPR 2025 Qiuheng Wang, Yukai Shi, Jiarong Ou, Rui Chen, Ke Lin, Jiahao Wang, Boyuan Jiang, Haotian Yang, Mingwu Zheng, Xin Tao, Fei Yang, Pengfei Wan, Di Zhang

As visual generation technologies continue to advance, the scale of video datasets has expanded rapidly, and the quality of these datasets is critical to the performance of video generation models.

Video Alignment Video Generation

VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding

2 code implementations CVPR 2025 Yujie Liang, Xiaobin Hu, Boyuan Jiang, Donghao Luo, Kai Wu, Wenhui Han, Taisong Jin, Chengjie Wang

To tackle this issue widely existing in real-world scenarios, we propose VTON-HandFit, leveraging the power of hand priors to reconstruct the appearance and structure for hand occlusion cases.

Disentanglement Virtual Try-on

Oracle Bone Inscriptions Multi-modal Dataset

no code implementations4 Jul 2024 Bang Li, Donghao Luo, Yujie Liang, Jing Yang, Zengmao Ding, Xu Peng, Boyuan Jiang, Shengwei Han, Dan Sui, Peichao Qin, Pian Wu, Chaoyang Wang, Yun Qi, Taisong Jin, Chengjie Wang, Xiaoming Huang, Zhan Shu, Rongrong Ji, Yongge Liu, Yunsheng Wu

Oracle bone inscriptions (OBI) constitute the earliest developed writing system in China, bearing invaluable written exemplifications of early Shang history and paleography.

Decipherment Denoising

NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models

1 code implementation30 May 2024 Kai Wu, Boyuan Jiang, Zhengkai Jiang, Qingdong He, Donghao Luo, Shengzhi Wang, Qingwen Liu, Chengjie Wang

Multimodal large language models (MLLMs) provide a powerful mechanism for understanding visual information by building on large language models.
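As a hedged illustration of the noise-perturbation idea suggested by the title (the injection point and scale are assumptions, not NoiseBoost's actual recipe):

```python
# Hedged illustration only: a generic noise-perturbation step on visual features.
# Where the noise is injected and its scale are assumptions, not the paper's recipe.
import torch

def perturb_visual_features(feats: torch.Tensor, noise_scale: float = 0.1,
                            training: bool = True) -> torch.Tensor:
    """Add zero-mean Gaussian noise to visual tokens during training so the language
    model does not over-rely on any single visual feature."""
    if not training:
        return feats
    return feats + noise_scale * torch.randn_like(feats)
```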

Hallucination

M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition

no code implementations22 Jan 2024 Mengmeng Wang, Jiazheng Xing, Boyuan Jiang, Jun Chen, Jianbiao Mei, Xingxing Zuo, Guang Dai, Jingdong Wang, Yong Liu

In this paper, we introduce a novel Multimodal, Multi-task CLIP adapting framework named M2-CLIP to address these challenges, preserving both high supervised performance and robust transferability.

Action Recognition Decoder +1

Probabilistic Triangulation for Uncalibrated Multi-View 3D Human Pose Estimation

1 code implementation ICCV 2023 Boyuan Jiang, Lei Hu, Shihong Xia

The key idea is to model the camera pose with a probability distribution and iteratively update that distribution from 2D features, instead of relying on a single camera pose estimate.
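A particle-style sketch of this idea, assuming a user-supplied likelihood of the 2D features given a pose hypothesis; the pose parameterization and update details are placeholders rather than the paper's exact rule:

```python
# Sketch of the general idea (assumptions: a particle representation of the camera-pose
# distribution and a user-supplied likelihood of 2D features given a pose; this is not
# the paper's exact update rule).
import numpy as np

def update_pose_distribution(particles: np.ndarray, weights: np.ndarray,
                             likelihood_fn) -> tuple:
    """One iteration: reweight pose hypotheses by how well they explain the observed
    2D features, then resample to concentrate on plausible poses."""
    weights = weights * np.array([likelihood_fn(p) for p in particles])
    weights = weights / weights.sum()
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    resampled = particles[idx] + 0.01 * np.random.randn(*particles.shape)  # small jitter
    return resampled, np.full(len(particles), 1.0 / len(particles))

# Toy usage with a 6-DoF pose vector per particle and a dummy likelihood.
particles = np.random.randn(128, 6)
weights = np.full(128, 1.0 / 128)
particles, weights = update_pose_distribution(
    particles, weights, likelihood_fn=lambda p: np.exp(-np.sum(p ** 2)))
```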

3D Human Pose Estimation 3D Pose Estimation +1

Dynamic Frame Interpolation in Wavelet Domain

1 code implementation7 Sep 2023 Lingtong Kong, Boyuan Jiang, Donghao Luo, Wenqing Chu, Ying Tai, Chengjie Wang, Jie Yang

Video frame interpolation is an important low-level vision task that can increase frame rate for a more fluent visual experience.
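A minimal sketch of how an interpolator raises frame rate, with a placeholder standing in for the learned model:

```python
# Minimal sketch of how a frame interpolator raises frame rate: insert one synthesized
# frame between every pair of consecutive frames, doubling 30 fps to 60 fps.
# `interpolate_middle` stands in for the learned model and is just a placeholder here.
from typing import Callable, List

def double_frame_rate(frames: List, interpolate_middle: Callable) -> List:
    out = []
    for prev, nxt in zip(frames[:-1], frames[1:]):
        out.append(prev)
        out.append(interpolate_middle(prev, nxt))  # synthesized intermediate frame
    out.append(frames[-1])
    return out

# Example with numeric stand-ins for frames and naive averaging as the "model".
print(double_frame_rate([0.0, 1.0, 2.0], lambda a, b: (a + b) / 2))  # [0.0, 0.5, 1.0, 1.5, 2.0]
```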

Optical Flow Estimation Video Frame Interpolation

Pose-aware Attention Network for Flexible Motion Retargeting by Body Part

1 code implementation13 Jun 2023 Lei Hu, Zihao Zhang, Chongyang Zhong, Boyuan Jiang, Shihong Xia

Moreover, we show that our framework can generate reasonable results even in more challenging retargeting scenarios, such as retargeting between bipedal and quadrupedal skeletons, thanks to the body-part retargeting strategy and PAN.

motion retargeting

IFRNet: Intermediate Feature Refine Network for Efficient Frame Interpolation

2 code implementations CVPR 2022 Lingtong Kong, Boyuan Jiang, Donghao Luo, Wenqing Chu, Xiaoming Huang, Ying Tai, Chengjie Wang, Jie Yang

Prevailing video frame interpolation algorithms, which generate the intermediate frames from consecutive inputs, typically rely on complex model architectures with heavy parameter counts or large inference delay, hindering them from diverse real-time applications.

Decoder Optical Flow Estimation +1

Learning Comprehensive Motion Representation for Action Recognition

no code implementations23 Mar 2021 Mingyu Wu, Boyuan Jiang, Donghao Luo, Junchi Yan, Yabiao Wang, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Xiaokang Yang

For action recognition learning, 2D CNN-based methods are efficient but may yield redundant features due to applying the same 2D convolution kernel to each frame.
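For illustration, the pattern described above, one shared 2D convolution reused for every frame by folding time into the batch axis, can be sketched as follows (shapes are arbitrary):

```python
# Illustration of the per-frame pattern: one shared 2D convolution applied to every
# frame independently by folding the time axis into the batch axis.
import torch
import torch.nn as nn

conv2d = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

video = torch.randn(2, 8, 3, 32, 32)           # (batch, time, channels, H, W)
b, t, c, h, w = video.shape
frames = video.reshape(b * t, c, h, w)         # treat every frame as an independent image
features = conv2d(frames)                      # identical kernel reused for all frames
features = features.reshape(b, t, 16, h, w)    # nearby frames yield highly similar features
```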

Action Recognition

Multi-Level Adaptive Region of Interest and Graph Learning for Facial Action Unit Recognition

no code implementations24 Feb 2021 Jingwei Yan, Boyuan Jiang, Jingjing Wang, Qiang Li, Chunmao Wang, ShiLiang Pu

In order to incorporate the intra-level AU relation and inter-level AU regional relevance simultaneously, a multi-level AU relation graph is constructed and graph convolution is performed to further enhance AU regional features of each level.
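A single graph-convolution step over per-AU regional features can be sketched as below; the relation adjacency, feature size, and layer details are placeholders, and the multi-level graph construction is not reproduced:

```python
# Hedged sketch of one graph-convolution step over AU regional features: the adjacency
# encoding AU relations and the feature size are placeholders, and the multi-level
# graph construction from the paper is not reproduced here.
import torch
import torch.nn as nn

class AURelationGCNLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, au_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Symmetrically normalize the adjacency (with self-loops), then propagate.
        a = adj + torch.eye(adj.shape[0], device=adj.device)
        d_inv_sqrt = a.sum(dim=1).clamp(min=1e-6).pow(-0.5)
        a_norm = d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
        return torch.relu(self.linear(a_norm @ au_feats))

# Toy usage: 12 AU nodes with 64-dim regional features and a random relation graph.
layer = AURelationGCNLayer(dim=64)
feats = torch.randn(12, 64)
adj = (torch.rand(12, 12) > 0.7).float()
enhanced = layer(feats, adj)
```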

Facial Action Unit Detection Graph Learning +1

Parameter Transfer Extreme Learning Machine based on Projective Model

1 code implementation4 Sep 2018 Chao Chen, Boyuan Jiang, Xinyu Jin

Unlike existing parameter transfer approaches, which incorporate source model information into the target by regularizing the difference between the source and target domain parameters, an intuitively appealing projective model is proposed to bridge the source and target model parameters.
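A hedged sketch of the projective idea: learn a projection P so that the target parameters are approximated by P applied to the source parameters, rather than penalizing their difference; the ridge-style closed form below is an illustrative stand-in, not the paper's solver:

```python
# Hedged sketch of "projective" parameter transfer: instead of penalizing
# ||W_target - W_source||, learn a projection P so that W_target ~ P @ W_source.
# The ridge-style closed form below is only an illustrative stand-in for the paper's solver.
import numpy as np

def fit_projection(w_source: np.ndarray, w_target: np.ndarray, lam: float = 1e-2) -> np.ndarray:
    """Solve min_P ||w_target - P @ w_source||^2 + lam * ||P||^2 in closed form."""
    d = w_source.shape[0]
    gram = w_source @ w_source.T + lam * np.eye(d)
    return w_target @ w_source.T @ np.linalg.inv(gram)

# Toy check: target parameters that really are a rotation of the source are recovered well.
rng = np.random.default_rng(0)
w_src = rng.standard_normal((5, 20))
true_p = np.linalg.qr(rng.standard_normal((5, 5)))[0]
w_tgt = true_p @ w_src
p_hat = fit_projection(w_src, w_tgt)
print(np.linalg.norm(p_hat @ w_src - w_tgt))  # small residual
```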

Domain Adaptation feature selection +1

Joint Domain Alignment and Discriminative Feature Learning for Unsupervised Deep Domain Adaptation

1 code implementation28 Aug 2018 Chao Chen, Zhihong Chen, Boyuan Jiang, Xinyu Jin

Recently, considerable effort has been devoted to deep domain adaptation in computer vision and machine learning communities.

Domain Adaptation
