no code implementations • 17 Apr 2024 • Dingkun Zhang, Sijia Li, Chen Chen, Qingsong Xie, Haonan Lu
To this end, we propose layer pruning and normalized distillation for compressing diffusion models (LAPTOP-Diff).
no code implementations • 18 Mar 2024 • Yifan Wang, Yafei Liu, Chufan Shi, Haoling Li, Chen Chen, Haonan Lu, Yujiu Yang
Instruction tuning effectively optimizes Large Language Models (LLMs) for downstream tasks.
no code implementations • 3 Mar 2024 • Hongjian Liu, Qingsong Xie, Zhijie Deng, Chen Chen, Shixiang Tang, Fueyang Fu, Zheng-Jun Zha, Haonan Lu
In contrast to vanilla consistency distillation (CD) which distills the ordinary differential equation solvers-based sampling process of a pretrained teacher model into a student, SCott explores the possibility and validates the efficacy of integrating stochastic differential equation (SDE) solvers into CD to fully unleash the potential of the teacher.
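SCott's exact training objective is not reproduced here; as a rough, hedged sketch of generic consistency distillation, the teacher's solver (ODE in vanilla CD, SDE in SCott) moves a noisy sample one step along the sampling trajectory, and the student is trained so its predictions at adjacent trajectory points agree. The `student`, `ema_student`, and `teacher_solver` callables below are placeholders, not the paper's code.

```python
import torch

def cd_loss(student, ema_student, teacher_solver, x_t, t, t_prev):
    # One consistency-distillation step (generic sketch, not SCott itself):
    # the teacher's solver moves x_t one step toward t_prev along the
    # sampling trajectory; the student is trained so that its output at x_t
    # matches an EMA copy's output at the earlier point x_prev.
    with torch.no_grad():
        x_prev = teacher_solver(x_t, t, t_prev)   # ODE (or SDE) solver step
        target = ema_student(x_prev, t_prev)      # EMA student, stop-gradient
    pred = student(x_t, t)
    return torch.nn.functional.mse_loss(pred, target)
```

In practice the EMA student is a slowly updated copy of the student's weights, which stabilizes the self-consistency target.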
no code implementations • 19 Jan 2024 • Hao Ai, Zidong Cao, Haonan Lu, Chen Chen, Jian Ma, Pengyuan Zhou, Tae-Kyun Kim, Pan Hui, Lin Wang
To this end, we propose a transformer-based 360 image outpainting framework called Dream360, which can generate diverse, high-fidelity, and high-resolution panoramas from user-selected viewports, considering the spherical properties of 360 images.
1 code implementation • 28 Nov 2023 • Jian Ma, Chen Chen, Qingsong Xie, Haonan Lu
In this paper, we propose a simple plug-and-play language transfer method based on knowledge distillation.
Cross-lingual Text-to-Image Generation Knowledge Distillation +1
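The paper's training recipe is not shown here; as a hedged illustration of the general idea, feature-level knowledge distillation for language transfer typically aligns a student text encoder (for the target language) with a frozen teacher encoder over parallel captions. All names and the loss choice below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

def distill_step(teacher, student, en_tokens, zh_tokens, optimizer):
    # Hypothetical sketch: distill a frozen English teacher text encoder
    # into a student encoder for another language via paired captions,
    # matching the two embedding spaces with an MSE loss.
    with torch.no_grad():
        target = teacher(en_tokens)       # teacher embedding (English caption)
    pred = student(zh_tokens)             # student embedding (translated caption)
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the teacher stays frozen, the text-to-image backbone it was trained with can be reused unchanged, which is what makes the transfer plug-and-play.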
no code implementations • 30 Oct 2023 • Youbo Lei, Feifei He, Chen Chen, Yingbin Mo, Si Jia Li, Defeng Xie, Haonan Lu
Due to the success of large-scale vision-language pretraining (VLP) models and the widespread use of image-text retrieval in industry, it is now critically necessary to reduce model size and streamline mobile-device deployment.
no code implementations • 8 Sep 2023 • Sijia Li, Chen Chen, Haonan Lu
In this work, we propose a method with mixture-of-experts (MoE) controllers to align the text-guided capacity of diffusion models with different kinds of human instructions, enabling our model to handle various open-domain image manipulation tasks with natural language instructions.
1 code implementation • 21 Jul 2023 • Jian Ma, Junhao Liang, Chen Chen, Haonan Lu
In this paper, we propose Subject-Diffusion, a novel open-domain personalized image generation model that requires neither test-time fine-tuning nor more than a single reference image to support personalized generation of single or multiple subjects in any domain.
Diffusion Personalization Tuning Free Text-to-Image Generation
1 code implementation • 6 Jun 2023 • Fobo Shi, Peijun Qing, Dong Yang, Nan Wang, Youbo Lei, Haonan Lu, Xiaodong Lin, Duantengchuan Li
To address this issue in prompt engineering, we propose a new and effective approach called Prompt Space.
no code implementations • 25 May 2023 • Yiqi Lin, Hao Wu, Ruichen Wang, Haonan Lu, Xiaodong Lin, Hui Xiong, Lin Wang
Generating and editing a 3D scene guided by natural language poses a challenge, primarily due to the complexity of specifying the positional relations and volumetric changes within the 3D space.
1 code implementation • 23 May 2023 • Ruichen Wang, Zekang Chen, Chen Chen, Jian Ma, Haonan Lu, Xiaodong Lin
Our approach produces more semantically accurate syntheses by constraining the attention region of each prompt token within the image.
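The paper's implementation is not shown here; as a minimal, hedged sketch of the general idea, constraining a prompt token's cross-attention to a spatial region can be done by masking the attention logits outside that region before the softmax. The shapes and the `token_masks` input below are illustrative assumptions.

```python
import torch

def masked_cross_attention(q, k, v, token_masks):
    """
    Sketch (not the paper's code) of region-constrained cross-attention.

    q: (num_pixels, d)   image queries
    k, v: (num_tokens, d) text keys/values
    token_masks: (num_tokens, num_pixels) boolean; True where a token
                 is allowed to attend (its designated image region)
    """
    d = q.shape[-1]
    scores = q @ k.T / d ** 0.5                       # (num_pixels, num_tokens)
    # Forbid attention between a pixel and any token whose region excludes it
    scores = scores.masked_fill(~token_masks.T, float("-inf"))
    attn = scores.softmax(dim=-1)                     # renormalize over allowed tokens
    return attn @ v
```

Each pixel should remain reachable by at least one token (e.g. a global background token), otherwise its softmax row is undefined.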
1 code implementation • 27 Apr 2023 • Defeng Xie, Ruichen Wang, Jian Ma, Chen Chen, Haonan Lu, Dong Yang, Fobo Shi, Xiaodong Lin
We introduce a new generative system called Edit Everything, which can take image and text inputs and produce image outputs.
3 code implementations • 31 Mar 2023 • Jian Ma, Mingjun Zhao, Chen Chen, Ruichen Wang, Di Niu, Haonan Lu, Xiaodong Lin
Recent breakthroughs in the field of language-guided image generation have yielded impressive achievements, enabling the creation of high-quality and diverse images based on user instructions. Although the synthesis performance is fascinating, one significant limitation of current image generation models is their insufficient ability to generate text coherently within images, particularly for complex glyph structures like Chinese characters.
Optical Character Recognition (OCR) Text-to-Image Generation
no code implementations • 24 Mar 2023 • Haotian Bai, Yuanhuiyi Lyu, Lutao Jiang, Sijia Li, Haonan Lu, Xiaodong Lin, Lin Wang
To tackle the issue of 'guidance collapse' and enhance consistency, we propose a novel framework, dubbed CompoNeRF, that integrates an editable 3D scene layout with object-specific and scene-wide guidance mechanisms.
1 code implementation • 27 Oct 2022 • Dong Yang, Peijun Qing, Yang Li, Haonan Lu, Xiaodong Lin
However, it remains challenging to model the negation and union operators.
no code implementations • 18 Sep 2021 • Weixuan Wang, Xiaoling Cai, Chong Hsuan Huang, Haoran Wang, Haonan Lu, Ximing Liu, Wei Peng
In this paper, we describe approaches for developing Emily, an emotion-affective open-domain chatbot.
1 code implementation • 9 Sep 2021 • Yinquan Lu, Haonan Lu, Guirong Fu, Qun Liu
Incorporating factual knowledge into pre-trained language models (PLM) such as BERT is an emerging trend in recent NLP studies.
Ranked #11 on Common Sense Reasoning on ReCoRD
1 code implementation • 11 Jun 2021 • Mingxiang Chen, Zhanguo Chang, Haonan Lu, Bitao Yang, Zhuang Li, Liufang Guo, Zhecheng Wang
In our evaluations, the method outperforms state-of-the-art image retrieval algorithms on several out-of-domain image datasets.
1 code implementation • 11 Aug 2020 • Haonan Lu, Hailin Hu, Xiaodong Lin
This design principle leads to several advantages of our method: (1) For composite relations, the corresponding diagonal relation matrices can be non-commutative, reflecting a predominant scenario in real-world applications; (2) Our model preserves the natural interaction between relational operations and entity embeddings; (3) The scaling operation provides the modeling power for the intrinsic semantic hierarchical structure of entities; (4) The enhanced expressiveness of DensE is achieved with high computational efficiency in terms of both parameter size and training time; and (5) Modeling entities in Euclidean space instead of quaternion space keeps the direct geometrical interpretations of relational patterns.
Ranked #7 on Link Prediction on WN18
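DensE's precise parameterization is in the paper; as a hedged sketch of the flavor of the abstract's points (1) and (5), a relation acting as a scaling plus a rotation in Euclidean space can score a triple by the distance between the transformed head and the tail. The 3-D setup and function names below are illustrative assumptions, not the released code.

```python
import numpy as np

def rotation_matrix_z(theta):
    # Rotation about the z axis
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rotation_matrix_x(theta):
    # Rotation about the x axis
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def score(head, rel_scale, rel_rot, tail):
    # Apply the relation operator (scale * rotation) to the head entity and
    # measure the distance to the tail; higher (less negative) scores mean a
    # more plausible (head, relation, tail) triple.
    return -np.linalg.norm(rel_scale * (rel_rot @ head) - tail)
```

Composing two relations multiplies their rotation matrices, and rotations about different axes do not commute, which is how such operators can represent non-commutative composite relations.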