Search Results for author: Qinglin Lu

Found 23 papers, 12 papers with code

Hunyuan-GameCraft: High-dynamic Interactive Game Video Generation with Hybrid History Condition

no code implementations20 Jun 2025 Jiaqi Li, Junshu Tang, Zhiyong Xu, Longhuang Wu, Yuan Zhou, Shuai Shao, Tianbao Yu, Zhiguo Cao, Qinglin Lu

To address these gaps, we introduce Hunyuan-GameCraft, a novel framework for high-dynamic interactive video generation in game environments.

Temporal Sequences Video Generation

HunyuanVideo-HOMA: Generic Human-Object Interaction in Multimodal Driven Human Animation

no code implementations10 Jun 2025 Ziyao Huang, Zixiang Zhou, Juan Cao, Yifeng Ma, Yi Chen, Zejing Rao, Zhiyong Xu, Hongmei Wang, Qin Lin, Yuan Zhou, Qinglin Lu, Fan Tang

To address key limitations in human-object interaction (HOI) video generation -- specifically the reliance on curated motion data, limited generalization to novel objects/scenarios, and restricted accessibility -- we introduce HunyuanVideo-HOMA, a weakly conditioned multimodal-driven framework.

Human Animation Human-Object Interaction Detection +1

PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement

no code implementations9 Jun 2025 Teng Hu, Zhentao Yu, Zhengguang Zhou, Jiangning Zhang, Yuan Zhou, Qinglin Lu, Ran Yi

Despite recent advances in video generation, existing models still lack fine-grained controllability, especially for multi-subject customization with consistent identity and interaction.

Video Generation

OmniV2V: Versatile Video Generation and Editing via Dynamic Content Manipulation

1 code implementation2 Jun 2025 Sen Liang, Zhentao Yu, Zhengguang Zhou, Teng Hu, Hongmei Wang, Yi Chen, Qin Lin, Yuan Zhou, Xin Li, Qinglin Lu, Zhibo Chen

Although video generation is widely applied in various fields, most existing models are limited to single scenarios and cannot perform diverse video generation and editing through dynamic content manipulation.

Data Augmentation Human Animation +1

HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters

1 code implementation26 May 2025 Yi Chen, Sen Liang, Zixiang Zhou, Ziyao Huang, Yifeng Ma, Junshu Tang, Qin Lin, Yuan Zhou, Qinglin Lu

This ensures dynamic motion and strong character consistency; (ii) an Audio Emotion Module (AEM) is introduced to extract emotional cues from an emotion reference image and transfer them to the target generated video, enabling fine-grained and accurate emotion style control; (iii) a Face-Aware Audio Adapter (FAA) is proposed to isolate the audio-driven character with a latent-level face mask, enabling independent audio injection via cross-attention for multi-character scenarios.

Human Animation
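
As a rough illustration of the Face-Aware Audio Adapter described in the entry above, the following PyTorch sketch gates per-character audio cross-attention with latent-level face masks so that each character is driven only by its own audio stream. The module name, shapes, and additive injection are assumptions for illustration, not the released HunyuanVideo-Avatar implementation.

    # Hedged sketch: per-character audio injection gated by latent-level face masks.
    # All names and shapes are illustrative assumptions, not the paper's code.
    import torch
    import torch.nn as nn

    class FaceAwareAudioInjection(nn.Module):
        def __init__(self, latent_dim, audio_dim, num_heads=8):
            super().__init__()
            self.audio_proj = nn.Linear(audio_dim, latent_dim)
            self.cross_attn = nn.MultiheadAttention(latent_dim, num_heads, batch_first=True)

        def forward(self, latents, audio_feats, face_masks):
            # latents:     [B, N, D] flattened video latent tokens
            # audio_feats: list of per-character audio embeddings, each [B, T_a, A]
            # face_masks:  list of per-character masks over latent tokens, each [B, N, 1]
            out = latents
            for audio, mask in zip(audio_feats, face_masks):
                kv = self.audio_proj(audio)                 # [B, T_a, D]
                attn, _ = self.cross_attn(latents, kv, kv)  # [B, N, D]
                out = out + mask * attn                     # inject only inside this character's face region
            return out

Masking the attention output at the latent level is what would keep the audio streams of different characters from interfering with one another in multi-character scenes.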

HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation

1 code implementation7 May 2025 Teng Hu, Zhentao Yu, Zhengguang Zhou, Sen Liang, Yuan Zhou, Qin Lin, Qinglin Lu

Customized video generation aims to produce videos featuring specific subjects under flexible user-defined conditions, yet existing methods often struggle with identity consistency and limited input modalities.

Human-Domain Subject-to-Video Single-Domain Subject-to-Video +2

Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

1 code implementation6 May 2025 Yibin Wang, Zhimin Li, Yuhang Zang, Chunyu Wang, Qinglin Lu, Cheng Jin, Jiaqi Wang

To this end, this paper proposes UnifiedReward-Think, the first unified multimodal CoT-based reward model, capable of multi-dimensional, step-by-step long-chain reasoning for both visual understanding and generation reward tasks.

Image Generation

Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation

1 code implementation3 Apr 2025 Fa-Ting Hong, Zunnan Xu, Zixiang Zhou, Jun Zhou, Xiu Li, Qin Lin, Qinglin Lu, Dan Xu

To this end, we introduce ACTalker, an end-to-end video diffusion framework that supports both multi-signal and single-signal control for talking head video generation.

Mamba Talking Head Generation +1

FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model

no code implementations CVPR 2025 Jun Zhou, Jiahao Li, Zunnan Xu, Hanhui Li, Yiji Cheng, Fa-Ting Hong, Qin Lin, Qinglin Lu, Xiaodan Liang

By combining the VLM enhanced with fine-grained region tokens and the time-dependent diffusion model, FireEdit demonstrates significant advantages in comprehending editing instructions and maintaining high semantic consistency.

Denoising Language Modeling +1

Sonic: Shifting Focus to Global Audio Perception in Portrait Animation

no code implementations CVPR 2025 Xiaozhong Ji, Xiaobin Hu, Zhihong Xu, Junwei Zhu, Chuming Lin, Qingdong He, Jiangning Zhang, Donghao Luo, Yi Chen, Qin Lin, Qinglin Lu, Chengjie Wang

The study of talking face generation mainly explores the intricacies of synchronizing facial movements and crafting visually appealing, temporally coherent animations.

Portrait Animation Talking Face Generation

Searching Priors Makes Text-to-Video Synthesis Better

no code implementations5 Jun 2024 Haoran Cheng, Liang Peng, Linxuan Xia, Yuepeng Hu, Hengjia Li, Qinglin Lu, Xiaofei He, Boxi Wu

Specifically, we divide the T2V generation process into two steps: (i) for a given input prompt, we search existing text-video datasets to find videos with text labels that closely match the prompt motions.

GPU
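
Step (i) of the pipeline described in the entry above, retrieving dataset videos whose captions best match the prompt, can be sketched with off-the-shelf sentence embeddings. The embedding model and the top-k heuristic below are assumptions for illustration, not the authors' retrieval setup.

    # Hedged sketch: caption-based retrieval of candidate videos for a prompt.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    def search_motion_priors(prompt, captions, video_paths, top_k=5):
        model = SentenceTransformer("all-MiniLM-L6-v2")        # assumed encoder
        q = model.encode([prompt], normalize_embeddings=True)  # [1, D]
        c = model.encode(captions, normalize_embeddings=True)  # [N, D]
        scores = (c @ q.T).squeeze(-1)                         # cosine similarity per caption
        best = np.argsort(-scores)[:top_k]
        return [(video_paths[i], float(scores[i])) for i in best]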

LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models

2 code implementations18 Mar 2024 Yang Yang, Wen Wang, Liang Peng, Chaotian Song, Yao Chen, Hengjia Li, Xiaolong Yang, Qinglin Lu, Deng Cai, Boxi Wu, Wei Liu

Customization generation techniques have significantly advanced the synthesis of specific concepts across varied contexts.

DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation

1 code implementation13 Mar 2024 Minbin Huang, Yanxin Long, Xinchi Deng, Ruihang Chu, Jiangfeng Xiong, Xiaodan Liang, Hong Cheng, Qinglin Lu, Wei Liu

However, many of these works face challenges in identifying the correct output modalities and generating coherent images accordingly as the number of output modalities increases and conversations grow longer.

Prompt Engineering Text to Image Generation +1

Local Conditional Controlling for Text-to-Image Diffusion Models

no code implementations14 Dec 2023 Yibo Zhao, Liang Peng, Yang Yang, Zekai Luo, Hengjia Li, Yao Chen, Zheng Yang, Xiaofei He, Wei Zhao, Qinglin Lu, Boxi Wu, Wei Liu

It focuses on controlling a specific local region according to user-defined image conditions, while the remaining regions are conditioned only by the original text prompt.

Image Generation
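
The local-control idea described in the entry above reduces to a small sketch: ControlNet-style residuals from the image condition are applied only inside a user-defined region mask, so the rest of the latent is denoised from the text prompt alone. The function name and shapes are illustrative assumptions, not the paper's implementation.

    # Hedged sketch: restrict image-condition residuals to a user-defined region.
    import torch

    def apply_local_control(unet_features, control_residuals, region_mask):
        # unet_features:     [B, C, H, W] intermediate diffusion UNet features
        # control_residuals: [B, C, H, W] residuals from the image-condition branch
        # region_mask:       [B, 1, H, W] 1 inside the user-specified region, 0 elsewhere
        return unet_features + region_mask * control_residuals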

SmoothVideo: Smooth Video Synthesis with Noise Constraints on Diffusion Models for One-shot Video Tuning

1 code implementation29 Nov 2023 Liang Peng, Haoran Cheng, Zheng Yang, Ruisi Zhao, Linxuan Xia, Chaotian Song, Qinglin Lu, Boxi Wu, Wei Liu

By applying the loss to existing one-shot video tuning methods, we significantly improve the overall consistency and smoothness of the generated videos.
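
One plausible reading of the noise constraint named in the title is a penalty that keeps the noise predicted for neighboring frames close, which directly encourages temporal smoothness; the sketch below illustrates that reading and is not necessarily the paper's exact formulation.

    # Hedged sketch: penalize frame-to-frame differences in predicted noise.
    import torch

    def noise_smoothness_loss(pred_noise):
        # pred_noise: [B, T, C, H, W] noise predicted per frame by the video diffusion model
        diff = pred_noise[:, 1:] - pred_noise[:, :-1]  # adjacent-frame differences
        return diff.pow(2).mean()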

Tencent AVS: A Holistic Ads Video Dataset for Multi-modal Scene Segmentation

no code implementations9 Dec 2022 Jie Jiang, Zhimin Li, Jiangfeng Xiong, Rongwei Quan, Qinglin Lu, Wei Liu

Therefore, TAVS is distinguished from previous temporal segmentation datasets due to its multi-modal information, holistic view of categories, and hierarchical granularities.

Multi-Label Classification +4

Multi-modal Segment Assemblage Network for Ad Video Editing with Importance-Coherence Reward

1 code implementation25 Sep 2022 Yunlong Tang, Siting Xu, Teng Wang, Qin Lin, Qinglin Lu, Feng Zheng

Existing methods perform well at the video segmentation stage but suffer from dependence on extra cumbersome models and poor performance at the segment assemblage stage.

Decoder Video Editing +2
