Search Results for author: Yufeng Cui

Found 8 papers, 7 papers with code

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

3 code implementations • ICLR 2022 • Yangguang Li, Feng Liang, Lichen Zhao, Yufeng Cui, Wanli Ouyang, Jing Shao, Fengwei Yu, Junjie Yan

Recently, large-scale Contrastive Language-Image Pre-training (CLIP) has attracted unprecedented attention for its impressive zero-shot recognition ability and excellent transferability to downstream tasks.

Zero-Shot Learning

649

Paper
Code

Democratizing Contrastive Language-Image Pre-training: A CLIP Benchmark of Data, Model, and Supervision

1 code implementation • 11 Mar 2022 • Yufeng Cui, Lichen Zhao, Feng Liang, Yangguang Li, Jing Shao

This is because researchers do not choose consistent training recipes and even use different data, hampering the fair comparison between different methods.

602

Paper
Code

Multi-Modal Gait Recognition via Effective Spatial-Temporal Feature Fusion

no code implementations • CVPR 2023 • Yufeng Cui, Yimei Kang

The SFM performs fine-grained body parts spatial fusion and guides the alignment of each part of the silhouette and each joint of the skeleton through the attention mechanism.

Gait Recognition

Paper
Add Code

Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline

1 code implementation • 29 Jan 2023 • Yangguang Li, Bin Huang, Zeren Chen, Yufeng Cui, Feng Liang, Mingzhu Shen, Fenggang Liu, Enze Xie, Lu Sheng, Wanli Ouyang, Jing Shao

Our Fast-BEV consists of five parts, We novelly propose (1) a lightweight deployment-friendly view transformation which fast transfers 2D image feature to 3D voxel space, (2) an multi-scale image encoder which leverages multi-scale information for better performance, (3) an efficient BEV encoder which is particularly designed to speed up on-vehicle inference.

Data Augmentation

525

Paper
Code

Generative Pretraining in Multimodality

2 code implementations • 11 Jul 2023 • Quan Sun, Qiying Yu, Yufeng Cui, Fan Zhang, Xiaosong Zhang, Yueze Wang, Hongcheng Gao, Jingjing Liu, Tiejun Huang, Xinlong Wang

We present Emu, a Transformer-based multimodal foundation model, which can seamlessly generate images and texts in multimodal context.

Ranked #1 on Visual Question Answering on VQA v2

Image Captioning Temporal/Casual QA +4

1,487

Paper
Code

CapsFusion: Rethinking Image-Text Data at Scale

1 code implementation • 31 Oct 2023 • Qiying Yu, Quan Sun, Xiaosong Zhang, Yufeng Cui, Fan Zhang, Yue Cao, Xinlong Wang, Jingjing Liu

To provide higher-quality and more scalable multimodal pretraining data, we propose CapsFusion, an advanced framework that leverages large language models to consolidate and refine information from both web-based image-text pairs and synthetic captions.

World Knowledge

168

Paper
Code

Generative Multimodal Models are In-Context Learners

1 code implementation • 20 Dec 2023 • Quan Sun, Yufeng Cui, Xiaosong Zhang, Fan Zhang, Qiying Yu, Zhengxiong Luo, Yueze Wang, Yongming Rao, Jingjing Liu, Tiejun Huang, Xinlong Wang

The human ability to easily solve multimodal tasks in context (i. e., with only a few demonstrations or simple instructions), is what current multimodal systems have largely struggled to imitate.

Ranked #19 on Visual Question Answering on MM-Vet