1 code implementation • 7 May 2024 • Zhuoyi Yang, Heyang Jiang, Wenyi Hong, Jiayan Teng, Wendi Zheng, Yuxiao Dong, Ming Ding, Jie Tang
However, because memory use grows quadratically when generating ultra-high-resolution images (e.g., 4096×4096), the resolution of generated images is often limited to 1024×1024.
1 code implementation • 6 Feb 2024 • Ji Qi, Ming Ding, Weihan Wang, Yushi Bai, Qingsong Lv, Wenyi Hong, Bin Xu, Lei Hou, Juanzi Li, Yuxiao Dong, Jie Tang
Drawing inspiration from human cognition in solving visual problems (e.g., marking, zooming in), this paper introduces Chain of Manipulations, a mechanism that enables VLMs to solve problems step-by-step with evidence.
1 code implementation • 14 Dec 2023 • Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang, Zihan Wang, Yuxuan Zhang, Juanzi Li, Bin Xu, Yuxiao Dong, Ming Ding, Jie Tang
People are spending an enormous amount of time on digital devices through graphical user interfaces (GUIs), e.g., computer or smartphone screens.
Ranked #15 on Visual Question Answering on MM-Vet
1 code implementation • 6 Nov 2023 • Weihan Wang, Qingsong Lv, Wenmeng Yu, Wenyi Hong, Ji Qi, Yan Wang, Junhui Ji, Zhuoyi Yang, Lei Zhao, Xixuan Song, Jiazheng Xu, Bin Xu, Juanzi Li, Yuxiao Dong, Ming Ding, Jie Tang
We introduce CogVLM, a powerful open-source visual language foundation model.
Ranked #4 on Visual Question Answering (VQA) on InfiMM-Eval
1 code implementation • 4 Sep 2023 • Jiayan Teng, Wendi Zheng, Ming Ding, Wenyi Hong, Jianqiao Wangni, Zhuoyi Yang, Jie Tang
Diffusion models have achieved great success in image synthesis, but still face challenges in high-resolution generation.
Ranked #1 on Image Generation on CelebA-HQ 256x256
1 code implementation • 29 May 2022 • Wenyi Hong, Ming Ding, Wendi Zheng, Xinghan Liu, Jie Tang
Large-scale pretrained transformers have created milestones in text (GPT-3) and text-to-image (DALL-E and CogView) generation.
Ranked #12 on Video Generation on UCF-101
1 code implementation • 28 Apr 2022 • Ming Ding, Wendi Zheng, Wenyi Hong, Jie Tang
The development of transformer-based text-to-image models is impeded by their slow generation and complexity for high-resolution images.
Ranked #44 on Text-to-Image Generation on MS COCO
4 code implementations • NeurIPS 2021 • Ming Ding, Zhuoyi Yang, Wenyi Hong, Wendi Zheng, Chang Zhou, Da Yin, Junyang Lin, Xu Zou, Zhou Shao, Hongxia Yang, Jie Tang
Text-to-Image generation in the general domain has long been an open problem, which requires both a powerful generative model and cross-modal understanding.
Ranked #56 on Text-to-Image Generation on MS COCO (using extra training data)