Search Results for author: Shengming Yin

Found 7 papers, 2 papers with code

Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models

3 code implementations8 Mar 2023 Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, Nan Duan

To this end, We build a system called \textbf{Visual ChatGPT}, incorporating different Visual Foundation Models, to enable the user to interact with ChatGPT by 1) sending and receiving not only languages but also images 2) providing complex visual questions or visual editing instructions that require the collaboration of multiple AI models with multi-steps.

DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory

no code implementations16 Aug 2023 Shengming Yin, Chenfei Wu, Jian Liang, Jie Shi, Houqiang Li, Gong Ming, Nan Duan

Our experiments validate the effectiveness of DragNUWA, demonstrating its superior performance in fine-grained control in video generation.

Trajectory Modeling Video Generation

ORES: Open-vocabulary Responsible Visual Synthesis

1 code implementation26 Aug 2023 Minheng Ni, Chenfei Wu, Xiaodong Wang, Shengming Yin, Lijuan Wang, Zicheng Liu, Nan Duan

In this work, we formalize a new task, Open-vocabulary Responsible Visual Synthesis (ORES), where the synthesis model is able to avoid forbidden visual concepts while allowing users to input any desired content.

Image Generation Language Modelling

StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis

no code implementations30 Jan 2024 Zecheng Tang, Chenfei Wu, Zekai Zhang, Mingheng Ni, Shengming Yin, Yu Liu, Zhengyuan Yang, Lijuan Wang, Zicheng Liu, Juntao Li, Nan Duan

To leverage LLMs for visual synthesis, traditional methods convert raster image information into discrete grid tokens through specialized visual modules, while disrupting the model's ability to capture the true semantic representation of visual scenes.

Vector Graphics

Using Left and Right Brains Together: Towards Vision and Language Planning

no code implementations16 Feb 2024 Jun Cen, Chenfei Wu, Xiao Liu, Shengming Yin, Yixuan Pei, Jinglong Yang, Qifeng Chen, Nan Duan, JianGuo Zhang

Large Language Models (LLMs) and Large Multi-modality Models (LMMs) have demonstrated remarkable decision masking capabilities on a variety of tasks.

Cannot find the paper you are looking for? You can Submit a new open access paper.