no code implementations • 2 Jan 2025 • Tao Feng, Wei Li, Didi Zhu, Hangjie Yuan, Wendi Zheng, Dan Zhang, Jie Tang
This work provides essential insights and tools for advancing forward pass methods to overcome forgetting.
1 code implementation • 30 Dec 2024 • Jiazheng Xu, Yu Huang, Jiale Cheng, Yuanming Yang, Jiajun Xu, YuAn Wang, Wenbo Duan, Shen Yang, Qunlin Jin, Shurun Li, Jiayan Teng, Zhuoyi Yang, Wendi Zheng, Xiao Liu, Ming Ding, Xiaohan Zhang, Xiaotao Gu, Shiyu Huang, Minlie Huang, Jie Tang, Yuxiao Dong
We present a general strategy for aligning visual generation models -- both image and video generation -- with human preference.
no code implementations • 3 Nov 2024 • Yean Cheng, Ziqi Cai, Ming Ding, Wendi Zheng, Shiyu Huang, Yuxiao Dong, Jie Tang, Boxin Shi
We introduce DreamPolish, a text-to-3D generation model that excels in producing refined geometry and high-quality textures.
1 code implementation • 12 Aug 2024 • Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong, Jie Tang
We present CogVideoX, a large-scale text-to-video generation model based on a diffusion transformer, which can generate 10-second continuous videos aligned with the text prompt, at a frame rate of 16 fps and a resolution of 768 × 1360 pixels.
1 code implementation • 7 May 2024 • Zhuoyi Yang, Heyang Jiang, Wenyi Hong, Jiayan Teng, Wendi Zheng, Yuxiao Dong, Ming Ding, Jie Tang
However, due to a quadratic increase in memory when generating ultra-high-resolution images (e.g., 4096 × 4096), the resolution of generated images is often limited to 1024 × 1024.
no code implementations • 8 Mar 2024 • Wendi Zheng, Jiayan Teng, Zhuoyi Yang, Weihan Wang, Jidong Chen, Xiaotao Gu, Yuxiao Dong, Ming Ding, Jie Tang
Recent advancements in text-to-image generative systems have been largely driven by diffusion models.
2 code implementations • 23 Feb 2024 • Zhefan Wang, Yuanqing Yu, Wendi Zheng, Weizhi Ma, Min Zhang
LLM-based agents have gained considerable attention for their decision-making skills and ability to handle complex tasks.
1 code implementation • 4 Sep 2023 • Jiayan Teng, Wendi Zheng, Ming Ding, Wenyi Hong, Jianqiao Wangni, Zhuoyi Yang, Jie Tang
Diffusion models have achieved great success in image synthesis, but still face challenges in high-resolution generation.
Ranked #1 on Image Generation on CelebA-HQ 256x256
9 code implementations • 5 Oct 2022 • Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, Weng Lam Tam, Zixuan Ma, Yufei Xue, Jidong Zhai, WenGuang Chen, Peng Zhang, Yuxiao Dong, Jie Tang
We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters.
Ranked #1 on Language Modelling on CLUE (OCNLI_50K)
1 code implementation • 29 May 2022 • Wenyi Hong, Ming Ding, Wendi Zheng, Xinghan Liu, Jie Tang
Large-scale pretrained transformers have created milestones in text (GPT-3) and text-to-image (DALL-E and CogView) generation.
Ranked #19 on Video Generation on UCF-101
1 code implementation • 28 Apr 2022 • Ming Ding, Wendi Zheng, Wenyi Hong, Jie Tang
The development of transformer-based text-to-image models is impeded by their slow generation and complexity for high-resolution images.
Ranked #42 on Text-to-Image Generation on MS COCO
4 code implementations • NeurIPS 2021 • Ming Ding, Zhuoyi Yang, Wenyi Hong, Wendi Zheng, Chang Zhou, Da Yin, Junyang Lin, Xu Zou, Zhou Shao, Hongxia Yang, Jie Tang
Text-to-Image generation in the general domain has long been an open problem, which requires both a powerful generative model and cross-modal understanding.
Ranked #53 on Text-to-Image Generation on MS COCO (using extra training data)