Search Results for author: Enxin Song

Found 5 papers, 3 papers with code

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

1 code implementation • 10 Oct 2024 • Jinbin Bai, Tian Ye, Wei Chow, Enxin Song, Qing-Guo Chen, Xiangtai Li, Zhen Dong, Lei Zhu, Shuicheng Yan

We present Meissonic, which elevates non-autoregressive masked image modeling (MIM) text-to-image to a level comparable with state-of-the-art diffusion models like SDXL.

Feature Compression Image Generation

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark

no code implementations • 4 Oct 2024 • Wenhao Chai, Enxin Song, Yilun Du, Chenlin Meng, Vashisht Madhavan, Omer Bar-Tal, Jenq-Neng Hwang, Saining Xie, Christopher D. Manning

AuroraCap shows superior performance on various video and image captioning benchmarks, for example, obtaining a CIDEr of 88.9 on Flickr30k, beating GPT-4V (55.3) and Gemini-1.5 Pro (82.2).

Image Captioning Video Understanding

MovieChat+: Question-aware Sparse Memory for Long Video Question Answering

1 code implementation • 26 Apr 2024 • Enxin Song, Wenhao Chai, Tian Ye, Jenq-Neng Hwang, Xi Li, Gaoang Wang

Recently, integrating video foundation models and large language models to build a video understanding system can overcome the limitations of specific pre-defined vision tasks.

2k Question Answering

Devil in the Number: Towards Robust Multi-modality Data Filter

no code implementations • 24 Sep 2023 • Yichen Xu, Zihan Xu, Wenhao Chai, Zhonghan Zhao, Enxin Song, Gaoang Wang

In order to appropriately filter multi-modality data sets on a web-scale, it becomes crucial to employ suitable filtering methods to boost performance and reduce training costs.
