Search Results for author: Quan Sun

Found 9 papers, 8 papers with code

Generative Multimodal Models are In-Context Learners

1 code implementation · 20 Dec 2023 · Quan Sun, Yufeng Cui, Xiaosong Zhang, Fan Zhang, Qiying Yu, Zhengxiong Luo, Yueze Wang, Yongming Rao, Jingjing Liu, Tiejun Huang, Xinlong Wang

The human ability to easily solve multimodal tasks in context (i.e., with only a few demonstrations or simple instructions) is what current multimodal systems have largely struggled to imitate.

In-Context Learning · Question Answering · +2
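
The in-context setting the abstract describes can be pictured as an interleaved image-text prompt: a few demonstrations followed by a query. A minimal sketch below; `ImageRef`, `Segment`, and `build_prompt` are hypothetical illustrations, not the Emu2 API.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical illustration of an interleaved multimodal prompt; not the Emu2 API.
@dataclass
class ImageRef:
    path: str  # stands in for raw pixels or a visual embedding

Segment = Union[str, ImageRef]

def build_prompt(demonstrations: List[List[Segment]], query: List[Segment]) -> List[Segment]:
    """Concatenate a few (input, output) demonstrations before the query,
    so the model can infer the task purely from context."""
    prompt: List[Segment] = []
    for demo in demonstrations:
        prompt.extend(demo)
    prompt.extend(query)
    return prompt

# Two demonstrations of a counting task, then a query the model should complete.
prompt = build_prompt(
    demonstrations=[
        [ImageRef("cats.jpg"), "Q: How many cats? A: 3"],
        [ImageRef("dogs.jpg"), "Q: How many dogs? A: 2"],
    ],
    query=[ImageRef("birds.jpg"), "Q: How many birds? A:"],
)
print(prompt)
```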

CapsFusion: Rethinking Image-Text Data at Scale

1 code implementation · 31 Oct 2023 · Qiying Yu, Quan Sun, Xiaosong Zhang, Yufeng Cui, Fan Zhang, Yue Cao, Xinlong Wang, Jingjing Liu

To provide higher-quality and more scalable multimodal pretraining data, we propose CapsFusion, an advanced framework that leverages large language models to consolidate and refine information from both web-based image-text pairs and synthetic captions.

World Knowledge
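
The consolidation step can be pictured as a single LLM call that merges a noisy web caption (which carries real-world knowledge) with a fluent but generic synthetic caption. A minimal sketch, assuming a generic `llm` callable; the prompt wording is illustrative, not the exact CapsFusion prompt.

```python
from typing import Callable

FUSION_TEMPLATE = (
    "Please merge and refine the information from the two given sentences.\n"
    "Sentence 1 is a web caption: it may contain real-world knowledge but be noisy.\n"
    "Sentence 2 is a model-generated caption: fluent but often generic.\n"
    "Sentence 1: {web}\n"
    "Sentence 2: {synthetic}\n"
    "Fused caption:"
)

def fuse_captions(web: str, synthetic: str, llm: Callable[[str], str]) -> str:
    """Ask an LLM to keep the entities of the web caption and the
    fluency and visual detail of the synthetic caption."""
    return llm(FUSION_TEMPLATE.format(web=web, synthetic=synthetic))

# Stub LLM so the sketch runs end to end; swap in a real model in practice.
echo_llm = lambda prompt: "<fused caption produced by the LLM>"
print(fuse_captions(
    web="Leo Messi lifts trophy 2022 photo stock",
    synthetic="A soccer player holding a golden trophy above his head.",
    llm=echo_llm,
))
```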

Generative Pretraining in Multimodality

2 code implementations · 11 Jul 2023 · Quan Sun, Qiying Yu, Yufeng Cui, Fan Zhang, Xiaosong Zhang, Yueze Wang, Hongcheng Gao, Jingjing Liu, Tiejun Huang, Xinlong Wang

We present Emu, a Transformer-based multimodal foundation model, which can seamlessly generate images and texts in multimodal context.

Image Captioning · Temporal/Causal QA · +4
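
Emu's ability to generate both modalities in one stream comes from a unified autoregressive objective: classification over discrete text tokens plus regression on continuous visual embeddings. A minimal PyTorch sketch of that combined loss; all names and shapes are invented for illustration.

```python
import torch
import torch.nn.functional as F

def emu_style_loss(
    text_logits: torch.Tensor,    # (num_text_positions, vocab_size)
    text_targets: torch.Tensor,   # (num_text_positions,)
    image_preds: torch.Tensor,    # (num_image_positions, embed_dim)
    image_targets: torch.Tensor,  # (num_image_positions, embed_dim)
) -> torch.Tensor:
    """Unified next-element prediction: cross-entropy where the target is a
    discrete text token, l2 regression where it is a continuous visual embedding."""
    text_loss = F.cross_entropy(text_logits, text_targets)
    image_loss = F.mse_loss(image_preds, image_targets)
    return text_loss + image_loss

# Toy shapes just to show the call.
loss = emu_style_loss(
    torch.randn(10, 32000), torch.randint(0, 32000, (10,)),
    torch.randn(4, 1024), torch.randn(4, 1024),
)
print(loss.item())
```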

EVA-CLIP: Improved Training Techniques for CLIP at Scale

3 code implementations · 27 Mar 2023 · Quan Sun, Yuxin Fang, Ledell Wu, Xinlong Wang, Yue Cao

Our approach incorporates new techniques for representation learning, optimization, and augmentation, enabling EVA-CLIP to achieve superior performance compared to previous CLIP models with the same number of parameters but significantly smaller training costs.

Image Classification · Representation Learning · +2
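
Like other CLIP variants, EVA-CLIP is trained with a symmetric image-text contrastive (InfoNCE) objective; the paper's contributions lie in how that training is initialized, optimized, and augmented. For reference, a minimal sketch of the standard loss (shapes illustrative, not EVA-CLIP's specific recipe):

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: matched image-text pairs are positives,
    every other pair in the batch is a negative."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (batch, batch)
    targets = torch.arange(logits.size(0))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

print(clip_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512)).item())
```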

EVA-02: A Visual Representation for Neon Genesis

6 code implementations · 20 Mar 2023 · Yuxin Fang, Quan Sun, Xinggang Wang, Tiejun Huang, Xinlong Wang, Yue Cao

We launch EVA-02, a next-generation Transformer-based visual representation pre-trained to reconstruct strong and robust language-aligned vision features via masked image modeling.
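
That recipe can be read as: mask patch tokens, run the student ViT, and regress the masked positions onto features from a frozen language-aligned teacher (a CLIP vision tower). A minimal sketch below, assuming a negative-cosine-similarity reconstruction loss; tensor names and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def mim_loss(student_feats: torch.Tensor,  # (batch, num_patches, dim)
             teacher_feats: torch.Tensor,  # (batch, num_patches, dim), frozen teacher
             mask: torch.Tensor            # (batch, num_patches) bool, True = masked
             ) -> torch.Tensor:
    """Reconstruct language-aligned teacher features at masked positions.
    Sketch assumes a negative-cosine-similarity objective."""
    s = F.normalize(student_feats[mask], dim=-1)
    t = F.normalize(teacher_feats[mask], dim=-1)
    return (1 - (s * t).sum(-1)).mean()

b, n, d = 2, 196, 768
mask = torch.rand(b, n) < 0.4  # mask roughly 40% of patches
print(mim_loss(torch.randn(b, n, d), torch.randn(b, n, d), mask).item())
```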

Thermal Infrared Image Inpainting via Edge-Aware Guidance

no code implementations · 28 Oct 2022 · Zeyu Wang, Haibin Shen, Changyou Men, Quan Sun, Kejie Huang

In this paper, we propose a novel task, Thermal Infrared (TIR) Image Inpainting, which aims to reconstruct missing regions of TIR images.

Image Inpainting
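
The edge-aware guidance idea can be illustrated by conditioning an inpainting network on an edge map extracted from the known regions, stacked alongside the masked image. A minimal input-preparation sketch using a Canny detector; this conditioning scheme is an assumption for illustration, not the paper's exact architecture.

```python
import numpy as np
import cv2  # OpenCV, used here only for the Canny edge detector

def prepare_inpainting_input(tir: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Stack (masked TIR image, mask, edge map of known regions) as the
    channels an edge-guided inpainting network would consume.
    tir: (H, W) uint8 thermal image; mask: (H, W) uint8, 1 = missing."""
    masked = tir * (1 - mask)
    edges = cv2.Canny(masked, 50, 150) * (1 - mask)  # edges from known pixels only
    return np.stack([masked, mask * 255, edges], axis=-1).astype(np.uint8)

tir = np.random.randint(0, 256, (128, 128), dtype=np.uint8)
mask = np.zeros((128, 128), dtype=np.uint8)
mask[40:80, 40:80] = 1  # a missing square region
x = prepare_inpainting_input(tir, mask)
print(x.shape)  # (128, 128, 3)
```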
