Search Results for author: Yingda Chen

Found 9 papers, 6 papers with code

Nexus-Gen: A Unified Model for Image Understanding, Generation, and Editing

1 code implementation • 30 Apr 2025 • Hong Zhang, Zhongjie Duan, Xingjun Wang, Yuze Zhao, Weiyi Lu, Zhipeng Di, YiXuan Xu, Yingda Chen, Yu Zhang

To bridge this gap, we present Nexus-Gen, a unified model that synergizes the language reasoning capabilities of LLMs with the image synthesis power of diffusion models.

Image Generation

Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs

no code implementations • 24 Apr 2025 • Tiancheng Gu, Kaicheng Yang, Ziyong Feng, Xingjun Wang, Yanzhao Zhang, Dingkun Long, Yingda Chen, Weidong Cai, Jiankang Deng

The Contrastive Language-Image Pre-training (CLIP) framework has become a widely used approach for multimodal representation learning, particularly in image-text retrieval and clustering.

Image-text Retrieval • Instruction Following • +3

EliGen: Entity-Level Controlled Image Generation with Regional Attention

1 code implementation • 2 Jan 2025 • Hong Zhang, Zhongjie Duan, Xingjun Wang, Yingda Chen, Yu Zhang

Recent advancements in diffusion models have significantly advanced text-to-image generation, yet global text prompts alone remain insufficient for achieving fine-grained control over individual entities within an image.

Image Inpainting • Text-to-Image Generation

ArtAug: Enhancing Text-to-Image Generation through Synthesis-Understanding Interaction

1 code implementation • 17 Dec 2024 • Zhongjie Duan, Qianyi Zhao, Cen Chen, Daoyuan Chen, Wenmeng Zhou, Yaliang Li, Yingda Chen

This enables the synthesis model to directly produce aesthetically pleasing images without any extra computational cost.

Text-to-Image Generation

Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key

no code implementations • 14 Oct 2024 • Yingda Chen, Xingjun Wang, Jintao Huang, Yunlin Mao, Daoze Zhang, Yuze Zhao

As large language models rapidly evolve to support longer context, there is a notable disparity in their capability to generate output at greater lengths.

SWIFT: A Scalable lightWeight Infrastructure for Fine-Tuning

2 code implementations • 10 Aug 2024 • Yuze Zhao, Jintao Huang, Jinghan Hu, Xingjun Wang, Yunlin Mao, Daoze Zhang, Zeyinzi Jiang, Zhikai Wu, Baole Ai, Ang Wang, Wenmeng Zhou, Yingda Chen

With support for over 300+ LLMs and 50+ MLLMs, SWIFT stands as the open-source framework that provides the most comprehensive support for fine-tuning large models.

Hallucination • Optical Character Recognition • +6

ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models

3 code implementations • 2 Sep 2023 • Chenliang Li, Hehong Chen, Ming Yan, Weizhou Shen, Haiyang Xu, Zhikai Wu, Zhicheng Zhang, Wenmeng Zhou, Yingda Chen, Chen Cheng, Hongzhu Shi, Ji Zhang, Fei Huang, Jingren Zhou

Large language models (LLMs) have recently demonstrated remarkable capabilities to comprehend human intentions, engage in reasoning, and design planning-like behavior.

FaceChain: A Playground for Human-centric Artificial Intelligence Generated Content

1 code implementation • 28 Aug 2023 • Yang Liu, Cheng Yu, Lei Shang, Yongyi He, Ziheng Wu, Xingjun Wang, Chao Xu, Haoyu Xie, Weida Wang, Yuze Zhao, Lin Zhu, Chen Cheng, Weitao Chen, Yuan YAO, Wenmeng Zhou, Jiaqi Xu, Qiang Wang, Yingda Chen, Xuansong Xie, Baigui Sun

In this paper, we present FaceChain, a personalized portrait generation framework that combines a series of customized image-generation models with a rich set of face-related perceptual understanding models (e.g., face detection, deep face embedding extraction, and facial attribute recognition) to tackle the aforementioned challenges and generate truthful personalized portraits from only a handful of portrait images as input.

Attribute • Personalized Image Generation • +2
