Search Results for author: Haonan Lu

Found 31 papers, 14 papers with code

Improved Visual-Spatial Reasoning via R1-Zero-Like Training

no code implementations • 1 Apr 2025 • Zhenyi Liao, Qingsong Xie, Yanhao Zhang, Zijian Kong, Haonan Lu, Zhenyu Yang, Zhijie Deng

Increasing attention has been placed on improving the reasoning capacities of multi-modal large language models (MLLMs).

Spatial Reasoning

H2VU-Benchmark: A Comprehensive Benchmark for Hierarchical Holistic Video Understanding

no code implementations • 31 Mar 2025 • Qi Wu, Quanlong Zheng, Yanhao Zhang, Junlin Xie, Jinguo Luo, Kuo Wang, Peng Liu, Qingsong Xie, Ru Zhen, Haonan Lu, Zhenyu Yang

To tackle this challenge, we propose a hierarchical and holistic video understanding (H2VU) benchmark designed to evaluate both general video and online streaming video comprehension.

Video Understanding

Layton: Latent Consistency Tokenizer for 1024-pixel Image Reconstruction and Generation by 256 Tokens

no code implementations • 11 Mar 2025 • Qingsong Xie, Zhao Zhang, Zhe Huang, Yanhao Zhang, Haonan Lu, Zhenyu Yang

Experiments demonstrate Layton's superiority in high-fidelity reconstruction, with 10.8 reconstruction Fréchet Inception Distance on the MSCOCO-2017 5K benchmark for 1024x1024 image reconstruction.

Decoder • Image Reconstruction • +2

X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation

1 code implementation • 8 Mar 2025 • Jian Ma, Qirong Peng, Xu Guo, Chen Chen, Haonan Lu, Zhenyu Yang

Compared to the teacher model, X2I shows a performance degradation of less than 1% while gaining various multimodal understanding abilities, including multilingual to image, image to image, image-text to image, video to image, audio to image, and utilizing creative fusion to enhance imagery.

Text-to-Image Generation

GenX: Mastering Code and Test Generation with Execution Feedback

no code implementations • 18 Dec 2024 • Nan Wang, Yafei Liu, Chen Chen, Haonan Lu

Recent advancements in language modeling have enabled the translation of natural language into code, and the use of execution feedback to improve code generation.

Code Generation • Data Augmentation • +2

LLMI3D: Empowering LLM with 3D Perception from a Single 2D Image

no code implementations • 14 Aug 2024 • Fan Yang, Sicheng Zhao, Yanhao Zhang, Haoxiang Chen, Hui Chen, Wenbo Tang, Haonan Lu, Pengfei Xu, Zhenyu Yang, Jungong Han, Guiguang Ding

Recent advancements in autonomous driving, augmented reality, robotics, and embodied intelligence have necessitated 3D perception algorithms.

Autonomous Driving • Logical Reasoning • +2

GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models

1 code implementation • 2 Jul 2024 • Jian Ma, Yonglin Deng, Chen Chen, Haonan Lu, Zhenyu Yang

Posters play a crucial role in marketing and advertising by enhancing visual communication and brand visibility, making significant contributions to industrial design.

Marketing

TLCM: Training-efficient Latent Consistency Model for Image Generation with 2-8 Steps

1 code implementation • 9 Jun 2024 • Qingsong Xie, Zhenyi Liao, Zhijie Deng, Chen Chen, Haonan Lu

Distilling latent diffusion models (LDMs) into ones that are fast to sample from is attracting growing research interest.

Image Generation • Style Transfer

Probing Language Models for Pre-training Data Detection

1 code implementation • 3 Jun 2024 • Zhenhua Liu, Tong Zhu, Chuanyuan Tan, Haonan Lu, Bing Liu, Wenliang Chen

Large Language Models (LLMs) have shown their impressive capabilities, while also raising concerns about the data contamination problems due to privacy issues and leakage of benchmark datasets in the pre-training phase.

Probing Language Models

LAPTOP-Diff: Layer Pruning and Normalized Distillation for Compressing Diffusion Models

no code implementations • 17 Apr 2024 • Dingkun Zhang, Sijia Li, Chen Chen, Qingsong Xie, Haonan Lu

To this end, we propose layer pruning and normalized distillation for compressing diffusion models (LAPTOP-Diff).

Knowledge Distillation

SCott: Accelerating Diffusion Models with Stochastic Consistency Distillation

no code implementations • 3 Mar 2024 • Hongjian Liu, Qingsong Xie, Zhijie Deng, Chen Chen, Shixiang Tang, Fueyang Fu, Zheng-Jun Zha, Haonan Lu

In contrast to vanilla consistency distillation (CD) which distills the ordinary differential equation solvers-based sampling process of a pretrained teacher model into a student, SCott explores the possibility and validates the efficacy of integrating stochastic differential equation (SDE) solvers into CD to fully unleash the potential of the teacher.

Text-to-Image Generation

Dream360: Diverse and Immersive Outdoor Virtual Scene Creation via Transformer-Based 360 Image Outpainting

no code implementations • 19 Jan 2024 • Hao Ai, Zidong Cao, Haonan Lu, Chen Chen, Jian Ma, Pengyuan Zhou, Tae-Kyun Kim, Pan Hui, Lin Wang

To this end, we propose a transformer-based 360 image outpainting framework called Dream360, which can generate diverse, high-fidelity, and high-resolution panoramas from user-selected viewports, considering the spherical properties of 360 images.

Image Outpainting

MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval

no code implementations • 30 Oct 2023 • Youbo Lei, Feifei He, Chen Chen, Yingbin Mo, Si Jia Li, Defeng Xie, Haonan Lu

Due to the success of large-scale visual-language pretraining (VLP) models and the widespread use of image-text retrieval in industry areas, it is now critically necessary to reduce the model size and streamline their mobile-device deployment.

cross-modal alignment • Image-text Retrieval • +1

MoEController: Instruction-based Arbitrary Image Manipulation with Mixture-of-Expert Controllers

no code implementations • 8 Sep 2023 • Sijia Li, Chen Chen, Haonan Lu

In this work, we propose a method with mixture-of-expert (MoE) controllers to align the text-guided capacity of diffusion models with different kinds of human instructions, enabling our model to handle various open-domain image manipulation tasks with natural language instructions.

Diversity • Image Manipulation

Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning

1 code implementation • 21 Jul 2023 • Jian Ma, Junhao Liang, Chen Chen, Haonan Lu

In this paper, we propose Subject-Diffusion, a novel open-domain personalized image generation model that, in addition to not requiring test-time fine-tuning, also only requires a single reference image to support personalized generation of single- or multi-subject in any domain.

Diffusion Personalization Tuning Free • Personalized Image Generation • +1

Towards Language-guided Interactive 3D Generation: LLMs as Layout Interpreter with Generative Feedback

no code implementations • 25 May 2023 • Yiqi Lin, Hao Wu, Ruichen Wang, Haonan Lu, Xiaodong Lin, Hui Xiong, Lin Wang

Generating and editing a 3D scene guided by natural language poses a challenge, primarily due to the complexity of specifying the positional relations and volumetric changes within the 3D space.

3D Generation

Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models

1 code implementation • 23 May 2023 • Ruichen Wang, Zekang Chen, Chen Chen, Jian Ma, Haonan Lu, Xiaodong Lin

Our approach produces a more semantically accurate synthesis by constraining the attention regions of each token in the prompt to the image.

Attribute • Image Generation

Edit Everything: A Text-Guided Generative System for Images Editing

1 code implementation • 27 Apr 2023 • Defeng Xie, Ruichen Wang, Jian Ma, Chen Chen, Haonan Lu, Dong Yang, Fobo Shi, Xiaodong Lin

We introduce a new generative system called Edit Everything, which can take image and text inputs and produce image outputs.

GlyphDraw: Seamlessly Rendering Text with Intricate Spatial Structures in Text-to-Image Generation

3 code implementations • 31 Mar 2023 • Jian Ma, Mingjun Zhao, Chen Chen, Ruichen Wang, Di Niu, Haonan Lu, Xiaodong Lin

Recent breakthroughs in the field of language-guided image generation have yielded impressive achievements, enabling the creation of high-quality and diverse images based on user instructions. Although the synthesis performance is fascinating, one significant limitation of current image generation models is their insufficient ability to generate text coherently within images, particularly for complex glyph structures like Chinese characters.

Optical Character Recognition (OCR) • parameter-efficient fine-tuning • +1

CompoNeRF: Text-guided Multi-object Compositional NeRF with Editable 3D Scene Layout

no code implementations • 24 Mar 2023 • Haotian Bai, Yuanhuiyi Lyu, Lutao Jiang, Sijia Li, Haonan Lu, Xiaodong Lin, Lin Wang

To tackle the issue of 'guidance collapse' and further enhance scene consistency, we propose a novel framework, dubbed CompoNeRF, by integrating an editable 3D scene layout with object-specific and scene-wide guidance mechanisms.

NeRF • Object • +2

AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation

1 code implementation • 11 Jun 2021 • Mingxiang Chen, Zhanguo Chang, Haonan Lu, Bitao Yang, Zhuang Li, Liufang Guo, Zhecheng Wang

In our evaluations, the method outperforms all the state-of-the-art image retrieval algorithms on some out-of-domain image datasets.

Clustering • Image Augmentation • +4

DensE: An Enhanced Non-commutative Representation for Knowledge Graph Embedding with Adaptive Semantic Hierarchy

1 code implementation • 11 Aug 2020 • Haonan Lu, Hailin Hu, Xiaodong Lin

This design principle leads to several advantages of our method: (1) For composite relations, the corresponding diagonal relation matrices can be non-commutative, reflecting a predominant scenario in real world applications; (2) Our model preserves the natural interaction between relational operations and entity embeddings; (3) The scaling operation provides the modeling power for the intrinsic semantic hierarchical structure of entities; (4) The enhanced expressiveness of DensE is achieved with high computational efficiency in terms of both parameter size and training time; and (5) Modeling entities in Euclidean space instead of quaternion space keeps the direct geometrical interpretations of relational patterns.

Computational Efficiency • Entity Embeddings • +2
