no code implementations • 1 Apr 2025 • Siyuan Li, Luyuan Zhang, Zedong Wang, Juanxi Tian, Cheng Tan, Zicheng Liu, Chang Yu, Qingsong Xie, Haonan Lu, Haoqian Wang, Zhen Lei
Masked Image Modeling (MIM) with Vector Quantization (VQ) has achieved great success in both self-supervised pre-training and image generation.
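The discrete bottleneck that VQ-based methods share is a nearest-neighbour codebook lookup; a minimal NumPy sketch (codebook size and dimensions are illustrative toy values, not taken from the paper):

```python
import numpy as np

def vector_quantize(z, codebook):
    """Map each continuous latent vector to its nearest codebook entry.

    z: (n, d) encoder outputs; codebook: (K, d) learned code vectors.
    Returns discrete token ids and the quantized vectors.
    """
    # Squared Euclidean distance from every latent to every code.
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)            # one token id per latent
    return idx, codebook[idx]

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))     # K=8 codes of dimension 4 (toy sizes)
z = rng.normal(size=(3, 4))
idx, z_q = vector_quantize(z, codebook)
```

The resulting token ids are what MIM masks and predicts, and what generation models sample autoregressively.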
no code implementations • 1 Apr 2025 • Zhenyi Liao, Qingsong Xie, Yanhao Zhang, Zijian Kong, Haonan Lu, Zhenyu Yang, Zhijie Deng
Increasing attention has been placed on improving the reasoning capacities of multi-modal large language models (MLLMs).
no code implementations • 31 Mar 2025 • Qi Wu, Quanlong Zheng, Yanhao Zhang, Junlin Xie, Jinguo Luo, Kuo Wang, Peng Liu, Qingsong Xie, Ru Zhen, Haonan Lu, Zhenyu Yang
To tackle this challenge, we propose a hierarchical and holistic video understanding (H2VU) benchmark designed to evaluate both general video and online streaming video comprehension.
no code implementations • 11 Mar 2025 • Qingsong Xie, Zhao Zhang, Zhe Huang, Yanhao Zhang, Haonan Lu, Zhenyu Yang
Experiments demonstrate Layton's superiority in high-fidelity reconstruction, achieving a reconstruction Fréchet Inception Distance of 10.8 on the MSCOCO-2017 5K benchmark for 1024x1024 image reconstruction.
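The reported metric, reconstruction FID, is the Fréchet distance between Gaussian fits to deep features of the real and reconstructed image sets; a self-contained sketch with synthetic statistics (the actual metric uses Inception-v3 features, omitted here):

```python
import numpy as np

def _sqrtm(m):
    # Principal matrix square root via eigendecomposition (sketch;
    # assumes m is diagonalizable, which holds for these toy inputs).
    vals, vecs = np.linalg.eig(m)
    return vecs @ np.diag(np.sqrt(vals.astype(complex))) @ np.linalg.inv(vecs)

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2).

    FID applies this formula to feature statistics of the two image
    sets; the statistics below are synthetic stand-ins.
    """
    diff = mu1 - mu2
    covmean = np.real(_sqrtm(sigma1 @ sigma2))
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

mu, sigma = np.zeros(4), np.eye(4)
d_same = frechet_distance(mu, sigma, mu, sigma)          # identical stats -> 0
d_shift = frechet_distance(mu, sigma, mu + 1.0, sigma)   # mean shifted by 1
```

Lower is better: identical feature distributions give a distance of zero.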
1 code implementation • 8 Mar 2025 • Jian Ma, Qirong Peng, Xu Guo, Chen Chen, Haonan Lu, Zhenyu Yang
Compared to the teacher model, X2I shows performance degradation of less than 1% while gaining various multimodal understanding abilities, including multilingual to image, image to image, image-text to image, video to image, audio to image, and utilizing creative fusion to enhance imagery.
no code implementations • 18 Dec 2024 • Nan Wang, Yafei Liu, Chen Chen, Haonan Lu
Recent advancements in language modeling have enabled the translation of natural language into code and the use of execution feedback to improve code generation.
no code implementations • 2 Dec 2024 • Ruichen Wang, Junliang Zhang, Qingsong Xie, Chen Chen, Haonan Lu
Recently, diffusion models have exhibited superior performance in the area of image inpainting.
no code implementations • 26 Nov 2024 • Fan Yang, Ru Zhen, Jianing Wang, Yanhao Zhang, Haoxiang Chen, Haonan Lu, Sicheng Zhao, Guiguang Ding
To address these challenges, we propose HEIE: a novel MLLM-Based Hierarchical Explainable image Implausibility Evaluator.
no code implementations • 14 Aug 2024 • Fan Yang, Sicheng Zhao, Yanhao Zhang, Haoxiang Chen, Hui Chen, Wenbo Tang, Haonan Lu, Pengfei Xu, Zhenyu Yang, Jungong Han, Guiguang Ding
Recent advancements in autonomous driving, augmented reality, robotics, and embodied intelligence have created a pressing need for 3D perception algorithms.
1 code implementation • 2 Jul 2024 • Jian Ma, Yonglin Deng, Chen Chen, Haonan Lu, Zhenyu Yang
Posters play a crucial role in marketing and advertising by enhancing visual communication and brand visibility, making significant contributions to industrial design.
1 code implementation • 9 Jun 2024 • Qingsong Xie, Zhenyi Liao, Zhijie Deng, Chen Chen, Haonan Lu
Distilling latent diffusion models (LDMs) into ones that are fast to sample from is attracting growing research interest.
1 code implementation • 3 Jun 2024 • Zhenhua Liu, Tong Zhu, Chuanyuan Tan, Haonan Lu, Bing Liu, Wenliang Chen
Large Language Models (LLMs) have shown impressive capabilities, while also raising concerns about data contamination due to privacy issues and leakage of benchmark datasets in the pre-training phase.
no code implementations • 17 Apr 2024 • Dingkun Zhang, Sijia Li, Chen Chen, Qingsong Xie, Haonan Lu
To this end, we propose layer pruning and normalized distillation for compressing diffusion models (LAPTOP-Diff).
no code implementations • 18 Mar 2024 • Yifan Wang, Yafei Liu, Chufan Shi, Haoling Li, Chen Chen, Haonan Lu, Yujiu Yang
Instruction tuning effectively optimizes Large Language Models (LLMs) for downstream tasks.
no code implementations • 3 Mar 2024 • Hongjian Liu, Qingsong Xie, Zhijie Deng, Chen Chen, Shixiang Tang, Fueyang Fu, Zheng-Jun Zha, Haonan Lu
In contrast to vanilla consistency distillation (CD) which distills the ordinary differential equation solvers-based sampling process of a pretrained teacher model into a student, SCott explores the possibility and validates the efficacy of integrating stochastic differential equation (SDE) solvers into CD to fully unleash the potential of the teacher.
no code implementations • 19 Jan 2024 • Hao Ai, Zidong Cao, Haonan Lu, Chen Chen, Jian Ma, Pengyuan Zhou, Tae-Kyun Kim, Pan Hui, Lin Wang
To this end, we propose a transformer-based 360 image outpainting framework called Dream360, which can generate diverse, high-fidelity, and high-resolution panoramas from user-selected viewports, considering the spherical properties of 360 images.
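The "spherical properties" of 360 images are what separate panoramas from planar outpainting: each equirectangular pixel corresponds to a direction on the unit sphere. A standard coordinate mapping is sketched below (the axis convention is a generic one, not necessarily the paper's):

```python
import numpy as np

def equirect_to_sphere(u, v, width, height):
    """Map equirectangular pixel coordinates to a unit direction vector.

    Convention (an assumption for illustration): longitude spans
    [-pi, pi) left to right, latitude spans [pi/2, -pi/2] top to bottom.
    """
    lon = (u / width) * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v / height) * np.pi
    return np.array([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)])

top_left = equirect_to_sphere(0, 0, 512, 256)     # maps to the north pole
centre = equirect_to_sphere(256, 128, 512, 256)   # maps to the +x axis
```

Distortion near the poles (many pixels, one direction) is why panorama generators must treat the sphere explicitly rather than tile a flat canvas.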
1 code implementation • 28 Nov 2023 • Jian Ma, Chen Chen, Qingsong Xie, Haonan Lu
In this paper, we propose a simple plug-and-play language transfer method based on knowledge distillation.
Cross-lingual Text-to-Image Generation
Knowledge Distillation
no code implementations • 30 Oct 2023 • Youbo Lei, Feifei He, Chen Chen, Yingbin Mo, Si Jia Li, Defeng Xie, Haonan Lu
Due to the success of large-scale visual-language pretraining (VLP) models and the widespread use of image-text retrieval in industry, it is now critical to reduce model sizes and streamline their deployment on mobile devices.
no code implementations • 8 Sep 2023 • Sijia Li, Chen Chen, Haonan Lu
In this work, we propose a method with mixture-of-expert (MOE) controllers to align the text-guided capacity of diffusion models with different kinds of human instructions, enabling our model to handle various open-domain image manipulation tasks with natural language instructions.
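The mixture-of-experts idea amounts to a learned gate that softly routes each instruction to specialised experts; a generic MoE forward pass in NumPy (the shapes and linear experts are illustrative assumptions, not the paper's controller architecture):

```python
import numpy as np

def moe_forward(x, experts, gate_w):
    """Combine expert outputs with a softmax gate computed from the input.

    experts: list of callables, each mapping x -> an output vector.
    gate_w: (n_experts, d) gating weights.
    """
    logits = gate_w @ x
    g = np.exp(logits - logits.max())
    g = g / g.sum()                           # gate probabilities, sum to 1
    outs = np.stack([expert(x) for expert in experts])
    return g @ outs, g

rng = np.random.default_rng(0)
w0, w1 = rng.normal(size=(3, 5)), rng.normal(size=(3, 5))
experts = [lambda x: w0 @ x, lambda x: w1 @ x]   # two toy linear experts
gate_w = rng.normal(size=(2, 5))
x = rng.normal(size=5)
y, g = moe_forward(x, experts, gate_w)
```

Soft gating keeps the whole pipeline differentiable, so the gate can learn which expert suits which kind of instruction.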
1 code implementation • 21 Jul 2023 • Jian Ma, Junhao Liang, Chen Chen, Haonan Lu
In this paper, we propose Subject-Diffusion, a novel open-domain personalized image generation model that, in addition to not requiring test-time fine-tuning, also only requires a single reference image to support personalized generation of single- or multi-subject images in any domain.
Diffusion Personalization Tuning Free
Personalized Image Generation
1 code implementation • 6 Jun 2023 • Fobo Shi, Peijun Qing, Dong Yang, Nan Wang, Youbo Lei, Haonan Lu, Xiaodong Lin, Duantengchuan Li
To address this issue in prompt engineering, we propose a new and effective approach called Prompt Space.
no code implementations • 25 May 2023 • Yiqi Lin, Hao Wu, Ruichen Wang, Haonan Lu, Xiaodong Lin, Hui Xiong, Lin Wang
Generating and editing a 3D scene guided by natural language poses a challenge, primarily due to the complexity of specifying the positional relations and volumetric changes within the 3D space.
1 code implementation • 23 May 2023 • Ruichen Wang, Zekang Chen, Chen Chen, Jian Ma, Haonan Lu, Xiaodong Lin
Our approach produces a more semantically accurate synthesis by constraining the attention regions of each token in the prompt to the image.
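Constraining a token's attention region can be sketched as masking the cross-attention logits outside the spatial region assigned to that token; a toy NumPy version (the hard mask is a simplification, not the paper's exact mechanism):

```python
import numpy as np

def region_constrained_attention(q, k, v, region_mask):
    """Cross-attention whose per-token spatial support is restricted.

    q: (n_pix, d) image queries; k, v: (n_tok, d) text keys/values.
    region_mask: (n_pix, n_tok) boolean, True where a token may attend
    to a pixel.
    """
    logits = q @ k.T / np.sqrt(q.shape[1])
    logits = np.where(region_mask, logits, -1e9)   # suppress out-of-region pairs
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)
    return w @ v, w

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))                              # 4 image locations
k, v = rng.normal(size=(2, 8)), rng.normal(size=(2, 8))  # 2 prompt tokens
# Token 0 owns pixels 0-1, token 1 owns pixels 2-3.
mask = np.array([[True, False], [True, False], [False, True], [False, True]])
out, attn = region_constrained_attention(q, k, v, mask)
```

Each image location then receives information only from the token responsible for it, which is what keeps the synthesis semantically aligned with the prompt layout.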
1 code implementation • 27 Apr 2023 • Defeng Xie, Ruichen Wang, Jian Ma, Chen Chen, Haonan Lu, Dong Yang, Fobo Shi, Xiaodong Lin
We introduce a new generative system called Edit Everything, which can take image and text inputs and produce image outputs.
3 code implementations • 31 Mar 2023 • Jian Ma, Mingjun Zhao, Chen Chen, Ruichen Wang, Di Niu, Haonan Lu, Xiaodong Lin
Recent breakthroughs in the field of language-guided image generation have yielded impressive achievements, enabling the creation of high-quality and diverse images based on user instructions. Although the synthesis performance is fascinating, one significant limitation of current image generation models is their insufficient ability to generate text coherently within images, particularly for complex glyph structures like Chinese characters.
Optical Character Recognition (OCR)
parameter-efficient fine-tuning
no code implementations • 24 Mar 2023 • Haotian Bai, Yuanhuiyi Lyu, Lutao Jiang, Sijia Li, Haonan Lu, Xiaodong Lin, Lin Wang
To tackle the issue of 'guidance collapse' and further enhance scene consistency, we propose a novel framework, dubbed CompoNeRF, by integrating an editable 3D scene layout with object-specific and scene-wide guidance mechanisms.
1 code implementation • 27 Oct 2022 • Dong Yang, Peijun Qing, Yang Li, Haonan Lu, Xiaodong Lin
However, it remains challenging to model the negation and union operators.
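Negation and union are hard because many embedding-space operators (boxes, cones) have no natural complement. One family of closed-form operators that supports both is fuzzy logic over [0, 1] membership scores; a sketch (an illustrative choice, not necessarily the operators this paper adopts):

```python
import numpy as np

# Fuzzy-logic operators over membership vectors in [0, 1]: the product
# t-norm with its dual probabilistic-sum t-conorm and standard negation.
def fuzzy_neg(a):
    return 1.0 - a

def fuzzy_union(a, b):          # probabilistic-sum t-conorm
    return a + b - a * b

def fuzzy_intersect(a, b):      # product t-norm
    return a * b

a = np.array([0.9, 0.2, 0.0])
b = np.array([0.5, 0.5, 1.0])
# De Morgan's law holds exactly for this operator family:
lhs = fuzzy_neg(fuzzy_union(a, b))
rhs = fuzzy_intersect(fuzzy_neg(a), fuzzy_neg(b))
```

Because the operators are differentiable and closed over [0, 1], arbitrary first-order queries with negation and union can be composed end to end.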
no code implementations • 18 Sep 2021 • Weixuan Wang, Xiaoling Cai, Chong Hsuan Huang, Haoran Wang, Haonan Lu, Ximing Liu, Wei Peng
In this paper, we describe approaches for developing Emily, an emotion-affective open-domain chatbot.
1 code implementation • 9 Sep 2021 • Yinquan Lu, Haonan Lu, Guirong Fu, Qun Liu
Incorporating factual knowledge into pre-trained language models (PLM) such as BERT is an emerging trend in recent NLP studies.
Ranked #11 on Common Sense Reasoning on ReCoRD
1 code implementation • 11 Jun 2021 • Mingxiang Chen, Zhanguo Chang, Haonan Lu, Bitao Yang, Zhuang Li, Liufang Guo, Zhecheng Wang
In our evaluations, the method outperforms state-of-the-art image retrieval algorithms on several out-of-domain image datasets.
1 code implementation • 11 Aug 2020 • Haonan Lu, Hailin Hu, Xiaodong Lin
This design principle leads to several advantages of our method: (1) For composite relations, the corresponding diagonal relation matrices can be non-commutative, reflecting a predominant scenario in real world applications; (2) Our model preserves the natural interaction between relational operations and entity embeddings; (3) The scaling operation provides the modeling power for the intrinsic semantic hierarchical structure of entities; (4) The enhanced expressiveness of DensE is achieved with high computational efficiency in terms of both parameter size and training time; and (5) Modeling entities in Euclidean space instead of quaternion space keeps the direct geometrical interpretations of relational patterns.
Ranked #7 on Link Prediction on WN18
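Advantage (1), non-commutative composition, can be checked concretely: when relations act as rotation matrices, the order of composition changes the result. A quick NumPy demonstration with generic SO(3) rotations (not DensE's exact parameterisation):

```python
import numpy as np

def rot_x(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_z(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Composing relations as matrix products makes order matter: rotations
# about different axes do not commute, mirroring real composite
# relations like "mother of the father" vs "father of the mother".
a, b = rot_x(0.7), rot_z(0.3)
order_matters = not np.allclose(a @ b, b @ a)
```

By contrast, purely translational or diagonal-only models compose commutatively and cannot distinguish such relation orderings.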