Search Results for author: Zhuoyi Yang

Found 15 papers, 11 papers with code

MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models

no code implementations • 6 Jan 2025 • Wenyi Hong, Yean Cheng, Zhuoyi Yang, Weihan Wang, Lefan Wang, Xiaotao Gu, Shiyu Huang, Yuxiao Dong, Jie Tang

To address this gap, we propose MotionBench, a comprehensive evaluation benchmark designed to assess the fine-grained motion comprehension of video understanding models.

Benchmarking · Feature Compression · +1

CogVLM2: Visual Language Models for Image and Video Understanding

3 code implementations • 29 Aug 2024 • Wenyi Hong, Weihan Wang, Ming Ding, Wenmeng Yu, Qingsong Lv, Yan Wang, Yean Cheng, Shiyu Huang, Junhui Ji, Zhao Xue, Lei Zhao, Zhuoyi Yang, Xiaotao Gu, Xiaohan Zhang, Guanyu Feng, Da Yin, Zihan Wang, Ji Qi, Xixuan Song, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Yuxiao Dong, Jie Tang

Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, an efficient higher-resolution architecture, and broader modalities and applications.

MM-Vet · MVBench · +3

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

1 code implementation • 12 Aug 2024 • Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong, Jie Tang

We present CogVideoX, a large-scale text-to-video generation model based on a diffusion transformer, which can generate 10-second continuous videos aligned with a text prompt, at a frame rate of 16 fps and a resolution of 768 × 1360 pixels.

Text-to-Video Generation · Video Alignment · +2
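
As a usage reference: CogVideoX has an integration in Hugging Face diffusers. The sketch below assumes that integration (CogVideoXPipeline) and the smaller THUDM/CogVideoX-2b checkpoint, whose native frame count, fps, and resolution are lower than the 10-second, 16 fps, 768 × 1360 figures quoted above.

```python
# Minimal sketch, assuming the diffusers CogVideoXPipeline integration
# and the THUDM/CogVideoX-2b checkpoint (not the paper's largest model).
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

pipe = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-2b", torch_dtype=torch.float16
).to("cuda")

video = pipe(
    prompt="A panda playing a guitar in a bamboo forest",
    num_frames=49,            # ~6 s at the 2B model's native 8 fps
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]

export_to_video(video, "panda.mp4", fps=8)
```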

Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer

1 code implementation • 7 May 2024 • Zhuoyi Yang, Heyang Jiang, Wenyi Hong, Jiayan Teng, Wendi Zheng, Yuxiao Dong, Ming Ding, Jie Tang

However, because memory grows quadratically when generating ultra-high-resolution images (e.g., 4096×4096), the resolution of generated images is often limited to 1024×1024.

Image Generation · Super-Resolution
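
The quadratic blow-up is easy to verify with back-of-the-envelope arithmetic. The sketch below is illustrative only; the 8×8 patch size and fp16 attention scores are assumptions for the estimate, not the paper's exact configuration.

```python
# Rough memory cost of one full self-attention score matrix in a
# diffusion transformer. Assumptions (not from the paper): one token
# per 8x8 patch, fp16 (2 bytes per element).
def attention_map_gib(resolution: int, patch: int = 8, bytes_per_elem: int = 2) -> float:
    tokens = (resolution // patch) ** 2           # one token per patch
    return tokens**2 * bytes_per_elem / 2**30     # full N x N score matrix

for res in (1024, 2048, 4096):
    print(f"{res}x{res}: {attention_map_gib(res):.1f} GiB per attention map")
# 1024x1024:   0.5 GiB
# 2048x2048:   8.0 GiB
# 4096x4096: 128.0 GiB  -> each doubling of resolution costs 16x the memory
```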

Eloss in the way: A Sensitive Input Quality Metrics for Intelligent Driving

1 code implementation • 2 Feb 2023 • Haobo Yang, Shiyan Zhang, Zhuoyi Yang, Xinyu Zhang

With the increasing complexity of the traffic environment, the importance of safety perception in intelligent driving is growing.

3D Object Detection · Anomaly Detection

Parameter-Efficient Tuning Makes a Good Classification Head

1 code implementation • 30 Oct 2022 • Zhuoyi Yang, Ming Ding, Yanhui Guo, Qingsong Lv, Jie Tang

In this paper, we find that parameter-efficient tuning makes a good classification head, with which we can simply replace the randomly initialized heads for a stable performance gain.

Classification · Natural Language Understanding
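
A minimal sketch of the two-stage recipe, using the simplest parameter-efficient scheme (a frozen backbone with a trainable linear head) as a stand-in for the paper's tuning methods; `train` is a hypothetical placeholder for your own training loop.

```python
import copy
import torch.nn as nn

def pet_then_full_finetune(backbone: nn.Module, feat_dim: int, num_classes: int, train):
    # Stage 1: parameter-efficient tuning. Freeze the backbone and train
    # only a linear classification head on top of it.
    for p in backbone.parameters():
        p.requires_grad = False
    head = nn.Linear(feat_dim, num_classes)
    train(nn.Sequential(backbone, head))   # only `head` receives gradients

    # Stage 2: full fine-tuning, but initialize the classifier with the
    # stage-1 head instead of random weights.
    for p in backbone.parameters():
        p.requires_grad = True
    model = nn.Sequential(backbone, copy.deepcopy(head))
    train(model)                           # everything receives gradients
    return model
```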

CogView: Mastering Text-to-Image Generation via Transformers

4 code implementations • NeurIPS 2021 • Ming Ding, Zhuoyi Yang, Wenyi Hong, Wendi Zheng, Chang Zhou, Da Yin, Junyang Lin, Xu Zou, Zhou Shao, Hongxia Yang, Jie Tang

Text-to-Image generation in the general domain has long been an open problem, which requires both a powerful generative model and cross-modal understanding.

Ranked #53 on Text-to-Image Generation on MS COCO (using extra training data)

Super-Resolution · Zero-Shot Text-to-Image Generation

Distributed High-dimensional Regression Under a Quantile Loss Function

no code implementations • 13 Jun 2019 • Xi Chen, Weidong Liu, Xiaojun Mao, Zhuoyi Yang

This paper studies distributed estimation and support recovery for a high-dimensional linear regression model with heavy-tailed noise.

quantile regression · Vocal Bursts Intensity Prediction
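
For context, quantile regression replaces squared error with the check (quantile) loss, on which the paper's distributed estimator is built. A minimal sketch of the loss itself:

```python
import numpy as np

def check_loss(u: np.ndarray, tau: float) -> np.ndarray:
    # rho_tau(u) = u * (tau - 1{u < 0}); tau = 0.5 recovers (half) the
    # absolute loss, i.e., median regression.
    return u * (tau - (u < 0).astype(float))

residuals = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(check_loss(residuals, tau=0.9))  # at tau = 0.9, under-predictions cost 9x more
```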

Distributed Inference for Linear Support Vector Machine

no code implementations • 29 Nov 2018 • Xiaozhou Wang, Zhuoyi Yang, Xi Chen, Weidong Liu

In this paper, we propose a multi-round distributed linear-type (MDL) estimator for conducting inference for the linear SVM.

Binary Classification
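
For orientation: the simplest estimator in this setting is one-shot averaging of per-machine SVM fits, which the paper's multi-round MDL estimator refines. The sketch below (using scikit-learn) shows only that naive baseline, not the MDL estimator itself.

```python
import numpy as np
from sklearn.svm import LinearSVC

def averaged_linear_svm(X: np.ndarray, y: np.ndarray, n_machines: int) -> np.ndarray:
    """One-shot divide-and-conquer baseline: fit a linear SVM on each
    machine's shard and average the coefficient vectors. Assumes every
    shard contains examples of both classes."""
    shards = np.array_split(np.arange(len(y)), n_machines)
    coefs = [LinearSVC().fit(X[idx], y[idx]).coef_.ravel() for idx in shards]
    return np.mean(coefs, axis=0)
```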
