no code implementations • 6 Jan 2025 • Wenyi Hong, Yean Cheng, Zhuoyi Yang, Weihan Wang, Lefan Wang, Xiaotao Gu, Shiyu Huang, Yuxiao Dong, Jie Tang
To address this gap, we propose MotionBench, a comprehensive evaluation benchmark designed to assess the fine-grained motion comprehension of video understanding models.
1 code implementation • 30 Dec 2024 • Jiazheng Xu, Yu Huang, Jiale Cheng, Yuanming Yang, Jiajun Xu, YuAn Wang, Wenbo Duan, Shen Yang, Qunlin Jin, Shurun Li, Jiayan Teng, Zhuoyi Yang, Wendi Zheng, Xiao Liu, Ming Ding, Xiaohan Zhang, Xiaotao Gu, Shiyu Huang, Minlie Huang, Jie Tang, Yuxiao Dong
We present a general strategy to aligning visual generation models -- both image and video generation -- with human preference.
1 code implementation • 1 Sep 2024 • Haobo Yang, Shiyan Zhang, Zhuoyi Yang, Xinyu Zhang, Li Wang, Yifan Tang, Jilong Guo, Jun Li
Entropy Loss is formulated based on the functionality of feature compression networks within the perception model.
3 code implementations • 29 Aug 2024 • Wenyi Hong, Weihan Wang, Ming Ding, Wenmeng Yu, Qingsong Lv, Yan Wang, Yean Cheng, Shiyu Huang, Junhui Ji, Zhao Xue, Lei Zhao, Zhuoyi Yang, Xiaotao Gu, Xiaohan Zhang, Guanyu Feng, Da Yin, Zihan Wang, Ji Qi, Xixuan Song, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Yuxiao Dong, Jie Tang
Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications.
Ranked #10 on
Visual Question Answering
on MM-Vet
1 code implementation • 12 Aug 2024 • Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong, Jie Tang
We present CogVideoX, a large-scale text-to-video generation model based on diffusion transformer, which can generate 10-second continuous videos aligned with text prompt, with a frame rate of 16 fps and resolution of 768 * 1360 pixels.
1 code implementation • 7 May 2024 • Zhuoyi Yang, Heyang Jiang, Wenyi Hong, Jiayan Teng, Wendi Zheng, Yuxiao Dong, Ming Ding, Jie Tang
However, due to a quadratic increase in memory during generating ultra-high-resolution images (e. g. 4096*4096), the resolution of generated images is often limited to 1024*1024.
no code implementations • 8 Mar 2024 • Wendi Zheng, Jiayan Teng, Zhuoyi Yang, Weihan Wang, Jidong Chen, Xiaotao Gu, Yuxiao Dong, Ming Ding, Jie Tang
Recent advancements in text-to-image generative systems have been largely driven by diffusion models.
4 code implementations • 6 Nov 2023 • Weihan Wang, Qingsong Lv, Wenmeng Yu, Wenyi Hong, Ji Qi, Yan Wang, Junhui Ji, Zhuoyi Yang, Lei Zhao, Xixuan Song, Jiazheng Xu, Bin Xu, Juanzi Li, Yuxiao Dong, Ming Ding, Jie Tang
We introduce CogVLM, a powerful open-source visual language foundation model.
Ranked #4 on
Visual Question Answering (VQA)
on InfiMM-Eval
1 code implementation • 4 Sep 2023 • Jiayan Teng, Wendi Zheng, Ming Ding, Wenyi Hong, Jianqiao Wangni, Zhuoyi Yang, Jie Tang
Diffusion models achieved great success in image synthesis, but still face challenges in high-resolution generation.
Ranked #1 on
Image Generation
on CelebA-HQ 256x256
1 code implementation • 2 Feb 2023 • Haobo Yang, Shiyan Zhang, Zhuoyi Yang, Xinyu Zhang
With the increasing complexity of the traffic environment, the importance of safety perception in intelligent driving is growing.
Ranked #9 on
3D Object Detection
on KITTI Cyclists Moderate
1 code implementation • 30 Oct 2022 • Zhuoyi Yang, Ming Ding, Yanhui Guo, Qingsong Lv, Jie Tang
In this paper, we find that parameter-efficient tuning makes a good classification head, with which we can simply replace the randomly initialized heads for a stable performance gain.
9 code implementations • 5 Oct 2022 • Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, Weng Lam Tam, Zixuan Ma, Yufei Xue, Jidong Zhai, WenGuang Chen, Peng Zhang, Yuxiao Dong, Jie Tang
We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters.
Ranked #1 on
Language Modelling
on CLUE (OCNLI_50K)
4 code implementations • NeurIPS 2021 • Ming Ding, Zhuoyi Yang, Wenyi Hong, Wendi Zheng, Chang Zhou, Da Yin, Junyang Lin, Xu Zou, Zhou Shao, Hongxia Yang, Jie Tang
Text-to-Image generation in the general domain has long been an open problem, which requires both a powerful generative model and cross-modal understanding.
Ranked #53 on
Text-to-Image Generation
on MS COCO
(using extra training data)
no code implementations • 13 Jun 2019 • Xi Chen, Weidong Liu, Xiaojun Mao, Zhuoyi Yang
This paper studies distributed estimation and support recovery for high-dimensional linear regression model with heavy-tailed noise.
no code implementations • 29 Nov 2018 • Xiaozhou Wang, Zhuoyi Yang, Xi Chen, Weidong Liu
In this paper, we propose a multi-round distributed linear-type (MDL) estimator for conducting inference for linear SVM.