Search Results for author: Biao Gong

Found 28 papers, 10 papers with code

Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

1 code implementation • 5 May 2025 • Inclusion AI, Biao Gong, Cheng Zou, Dandan Zheng, Hu Yu, Jingdong Chen, Jianxin Sun, Junbo Zhao, Jun Zhou, Kaixiang Ji, Lixiang Ru, Libin Wang, Qingpei Guo, Rui Liu, Weilong Chai, Xinyu Xiao, Ziyuan Huang

We introduce Ming-Lite-Uni, an open-source multimodal framework featuring a newly designed unified visual generator and a native multimodal autoregressive model tailored for unifying vision and language.

Multimodal Interaction Text-to-Image Generation

DreamRelation: Relation-Centric Video Customization

no code implementations • 10 Mar 2025 • Yujie Wei, Shiwei Zhang, Hangjie Yuan, Biao Gong, Longxiang Tang, Xiang Wang, Haonan Qiu, Hengjia Li, Shuai Tan, Yingya Zhang, Hongming Shan

First, in Relational Decoupling Learning, we disentangle relations from subject appearances using a relation LoRA triplet and a hybrid mask training strategy, ensuring better generalization across diverse relationships.

Relation Triplet +1
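LoRA is the adaptation primitive behind the "relation LoRA triplet" mentioned in the snippet above: a frozen pretrained weight is augmented with a trainable low-rank update. The sketch below shows the general LoRA forward pass in plain Python; the class name, initialization constants, and naive matrix helper are illustrative, not DreamRelation's actual implementation.

```python
def matmul(X, Y):
    """Naive matrix product, sufficient for small illustrative matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

class LoRALinear:
    """y = x.W + (alpha/r) * x.A.B  --  W stays frozen; only A and B train."""
    def __init__(self, W, r, alpha=1.0):
        d_in = len(W)
        d_out = len(W[0])
        self.W = W                                   # frozen pretrained weight (d_in x d_out)
        self.A = [[0.01] * r for _ in range(d_in)]   # down-projection (d_in x r)
        self.B = [[0.0] * d_out for _ in range(r)]   # up-projection (r x d_out), zero-init
        self.scale = alpha / r

    def forward(self, x):
        base = matmul(x, self.W)                     # frozen path
        delta = matmul(matmul(x, self.A), self.B)    # low-rank trainable path
        return [[b + self.scale * d for b, d in zip(br, dr)]
                for br, dr in zip(base, delta)]
```

Because B is zero-initialized, the adapted layer initially reproduces the frozen layer exactly, so training starts from the pretrained behavior and only gradually injects the new relation-specific update.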

Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning

1 code implementation • 11 Dec 2024 • Fan Lu, Wei Wu, Kecheng Zheng, Shuailei Ma, Biao Gong, Jiawei Liu, Wei Zhai, Yang Cao, Yujun Shen, Zheng-Jun Zha

Generating detailed captions comprehending text-rich visual content in images has received growing attention for Large Vision-Language Models (LVLMs).

Attribute Benchmarking +2

Learning Visual Generative Priors without Text

no code implementations • 10 Dec 2024 • Shuailei Ma, Kecheng Zheng, Ying Wei, Wei Wu, Fan Lu, Yifei Zhang, Chen-Wei Xie, Biao Gong, Jiapeng Zhu, Yujun Shen

Although text-to-image (T2I) models have recently thrived as visual generative priors, their reliance on high-quality text-image pairs makes scaling up expensive.

Image to 3D Philosophy

LumiSculpt: A Consistency Lighting Control Network for Video Generation

no code implementations • 30 Oct 2024 • Yuxin Zhang, Dandan Zheng, Biao Gong, Jingdong Chen, Ming Yang, Weiming Dong, Changsheng Xu

Lighting plays a pivotal role in ensuring the naturalness of video generation, significantly influencing the aesthetic quality of the generated content.

Video Generation

Framer: Interactive Frame Interpolation

no code implementations • 24 Oct 2024 • Wen Wang, Qiuyu Wang, Kecheng Zheng, Hao Ouyang, Zhekai Chen, Biao Gong, Hao Chen, Yujun Shen, Chunhua Shen

We propose Framer for interactive frame interpolation, which targets producing smoothly transitioning frames between two images as per user creativity.

Image Morphing Video Generation

Animate-X: Universal Character Image Animation with Enhanced Motion Representation

no code implementations • 14 Oct 2024 • Shuai Tan, Biao Gong, Xiang Wang, Shiwei Zhang, Dandan Zheng, Ruobing Zheng, Kecheng Zheng, Jingdong Chen, Ming Yang

Our in-depth analysis attributes this limitation to their insufficient modeling of motion, which fails to comprehend the movement pattern of the driving video and thus imposes a pose sequence rigidly onto the target character.

Attribute Image Animation

StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models

1 code implementation • 4 Sep 2024 • Wen Li, Muyuan Fang, Cheng Zou, Biao Gong, Ruobing Zheng, Meng Wang, Jingdong Chen, Ming Yang

To tackle these challenges, we introduce StyleTokenizer, a zero-shot style control image generation method that aligns style representation with text representation using a style tokenizer.

Denoising Text-to-Image Generation

Focus-Consistent Multi-Level Aggregation for Compositional Zero-Shot Learning

no code implementations • 30 Aug 2024 • Fengyuan Dai, Siteng Huang, Min Zhang, Biao Gong, Donglin Wang

To transfer knowledge from seen attribute-object compositions to recognize unseen ones, recent compositional zero-shot learning (CZSL) methods mainly discuss the optimal classification branches to identify the elements, leading to the popularity of employing a three-branch architecture.

Attribute Compositional Zero-Shot Learning +1
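The classification branches described in the snippet above share a common CZSL idea: an unseen attribute-object composition can be recognized by combining independently learned attribute and object scores. The sketch below is a deliberately minimal stand-in for that idea, with hypothetical scores and a simple additive combination rather than the paper's multi-level aggregation.

```python
def predict_composition(attr_scores, obj_scores, candidate_pairs):
    """Score each candidate (attribute, object) pair by summing the
    independently predicted attribute and object scores, then return
    the highest-scoring composition. A toy version of the element-wise
    classification branches used in compositional zero-shot learning."""
    return max(candidate_pairs,
               key=lambda pair: attr_scores[pair[0]] + obj_scores[pair[1]])
```

With classifiers trained only on seen pairs such as "red apple" and "green car", this kind of factorized scoring can still rank an unseen pair like "green apple" highest, which is the transfer the snippet describes.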

CURE4Rec: A Benchmark for Recommendation Unlearning with Deeper Influence

1 code implementation • 26 Aug 2024 • Chaochao Chen, Jiaming Zhang, Yizhao Zhang, Li Zhang, Lingjuan Lyu, Yuyuan Li, Biao Gong, Chenggang Yan

Specifically, we consider the deeper influence of unlearning on recommendation fairness and robustness towards data with varying impact levels.

Fairness Machine Unlearning +1

Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight

no code implementations • 22 Jul 2024 • Ziyuan Huang, Kaixiang Ji, Biao Gong, Zhiwu Qing, Qinglong Zhang, Kecheng Zheng, Jian Wang, Jingdong Chen, Ming Yang

This paper introduces Chain-of-Sight, a vision-language bridge module that accelerates the pre-training of Multimodal Large Language Models (MLLMs).

ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance

no code implementations • 24 Jun 2024 • Shuwei Shi, Wenbo Li, Yuechen Zhang, Jingwen He, Biao Gong, Yinqiang Zheng

Diffusion models excel at producing high-quality images; however, scaling to higher resolutions, such as 4K, often results in over-smoothed content, structural distortions, and repetitive patterns.

4k Denoising +3

Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation

no code implementations • CVPR 2024 • Biao Gong, Siteng Huang, Yutong Feng, Shiwei Zhang, Yuyuan Li, Yu Liu

To align the generated image with layout instructions, we present a training-free layout calibration system SimM that intervenes in the generative process on the fly during inference time.

Text-to-Image Generation

A Recipe for Scaling up Text-to-Video Generation with Text-free Videos

1 code implementation • CVPR 2024 • Xiang Wang, Shiwei Zhang, Hangjie Yuan, Zhiwu Qing, Biao Gong, Yingya Zhang, Yujun Shen, Changxin Gao, Nong Sang

Following such a pipeline, we study the effect of doubling the scale of the training set (i.e., video-only WebVid10M) with some randomly collected text-free videos and are encouraged to observe the performance improvement (FID from 9.67 to 8.19 and FVD from 484 to 441), demonstrating the scalability of our approach.

Text-to-Image Generation Text-to-Video Generation +2
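The FID improvement quoted above (9.67 to 8.19) is a Fréchet distance between Gaussian fits of real and generated feature distributions. The real metric uses multivariate Inception features; the sketch below shows the same closed form in its one-dimensional special case, just to make the quantity concrete.

```python
import math

def frechet_distance_1d(mu1, var1, mu2, var2):
    """Frechet (FID-style) distance between two 1-D Gaussians:
    (mu1 - mu2)^2 + var1 + var2 - 2 * sqrt(var1 * var2).
    The full FID applies the multivariate analogue to Inception features."""
    return (mu1 - mu2) ** 2 + var1 + var2 - 2.0 * math.sqrt(var1 * var2)

def fit_gaussian(xs):
    """Mean and population variance of a sample of scalar features."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var
```

Identical distributions give a distance of zero, and the score grows as either the means or the variances of the real and generated feature distributions drift apart, which is why a lower FID indicates generations statistically closer to real videos.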

Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following

1 code implementation • CVPR 2024 • Yutong Feng, Biao Gong, Di Chen, Yujun Shen, Yu Liu, Jingren Zhou

Existing text-to-image (T2I) diffusion models usually struggle in interpreting complex prompts, especially those with quantity, object-attribute binding, and multi-subject descriptions.

Attribute Denoising +1

Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation

no code implementations • 27 Nov 2023 • Biao Gong, Siteng Huang, Yutong Feng, Shiwei Zhang, Yuyuan Li, Yu Liu

To align the generated image with layout instructions, we present a training-free layout calibration system SimM that intervenes in the generative process on the fly during inference time.

Text-to-Image Generation

Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation

no code implementations • CVPR 2024 • Siteng Huang, Biao Gong, Yutong Feng, Xi Chen, Yuqian Fu, Yu Liu, Donglin Wang

Experimental results show that existing subject-driven customization methods fail to learn the representative characteristics of actions and struggle in decoupling actions from context features, including appearance.

Text-to-Image Generation

Logic Diffusion for Knowledge Graph Reasoning

no code implementations • 6 Jun 2023 • Xiaoying Xie, Biao Gong, Yiliang Lv, Zhen Han, Guoshuai Zhao, Xueming Qian

Most recent works focus on answering first order logical queries to explore the knowledge graph reasoning via multi-hop logic predictions.

Selective and Collaborative Influence Function for Efficient Recommendation Unlearning

no code implementations • 20 Apr 2023 • Yuyuan Li, Chaochao Chen, Xiaolin Zheng, Yizhao Zhang, Biao Gong, Jun Wang

In this paper, we first identify two main disadvantages of directly applying existing unlearning methods in the context of recommendation, i.e., (i) unsatisfactory efficiency for large-scale recommendation models and (ii) destruction of collaboration across users and items.

Recommendation Systems

Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning

1 code implementation • CVPR 2024 • Siteng Huang, Biao Gong, Yutong Feng, Min Zhang, Yiliang Lv, Donglin Wang

Recent compositional zero-shot learning (CZSL) methods adapt pre-trained vision-language models (VLMs) by constructing trainable prompts only for composed state-object pairs.

Compositional Zero-Shot Learning Object

ViM: Vision Middleware for Unified Downstream Transferring

no code implementations • ICCV 2023 • Yutong Feng, Biao Gong, Jianwen Jiang, Yiliang Lv, Yujun Shen, Deli Zhao, Jingren Zhou

ViM consists of a zoo of lightweight plug-in modules, each of which is independently learned on a midstream dataset with a shared frozen backbone.

VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval

1 code implementation • CVPR 2023 • Siteng Huang, Biao Gong, Yulin Pan, Jianwen Jiang, Yiliang Lv, Yuyuan Li, Donglin Wang

Many recent studies leverage the pre-trained CLIP for text-video cross-modal retrieval by tuning the backbone with additional heavy modules, which not only brings a huge computational burden with many more parameters, but also leads to knowledge forgetting from upstream models.

Cross-Modal Retrieval Retrieval +1

Deep Multi-View Enhancement Hashing for Image Retrieval

no code implementations • 1 Feb 2020 • Chenggang Yan, Biao Gong, Yuxuan Wei, Yue Gao

Therefore, we introduce multi-view deep neural networks into the hash learning field and design an efficient and innovative retrieval model, which achieves a significant improvement in retrieval performance.

Image Retrieval Retrieval
