Search Results for author: Weichong Yin

Found 8 papers, 4 papers with code

ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for Understanding and Generation

no code implementations • 9 Nov 2022 • Bin Shan, Yaqian Han, Weichong Yin, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang

Recent cross-lingual cross-modal works attempt to extend Vision-Language Pre-training (VLP) models to non-English inputs and achieve impressive performance.

Contrastive Learning • Language Modelling • +4

ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training

1 code implementation • 30 Sep 2022 • Bin Shan, Weichong Yin, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang

They attempt to learn cross-modal representations using contrastive learning on image-text pairs; however, the inter-modal correlations they build rely on only a single view of each modality.

Computational Efficiency • Contrastive Learning • +7
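
The single-view limitation noted in this abstract is easiest to see against the standard image-text contrastive objective, in which each image embedding is matched against exactly one caption embedding per batch. Below is a minimal PyTorch sketch of that symmetric InfoNCE loss; it illustrates the single-view baseline the paper builds on, not ERNIE-ViL 2.0's actual multi-view implementation, and the function name, encoder outputs, and temperature value are assumptions made for illustration.

```python
# Illustrative sketch (assumed names/shapes): the standard single-view
# image-text contrastive loss that multi-view approaches generalize.
# Not the paper's actual code.
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """image_emb, text_emb: (batch, dim) embeddings from separate encoders."""
    # L2-normalize so dot products become cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity matrix; diagonal entries are the matched pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy over image-to-text and text-to-image directions.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Example with random features standing in for encoder outputs.
loss = clip_style_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```

A multi-view variant would add further contrastive terms of the same form between additional views of each modality, which is the direction the paper's title points to.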

ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding

no code implementations • 18 Sep 2022 • Wenjin Wang, Zhengjie Huang, Bin Luo, Qianglong Chen, Qiming Peng, Yinxu Pan, Weichong Yin, Shikun Feng, Yu Sun, Dianhai Yu, Yin Zhang

First, a document graph is proposed to model the complex relationships among multi-grained multimodal elements, in which salient visual regions are detected by a cluster-based method.

Common Sense Reasoning • Document Understanding • +1

ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation

2 code implementations • 31 Dec 2021 • Han Zhang, Weichong Yin, Yewei Fang, Lanxin Li, Boqiang Duan, Zhihua Wu, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang

To explore the landscape of large-scale pre-training for bidirectional text-image generation, we train a 10-billion-parameter ERNIE-ViLG model on a large-scale dataset of 145 million (Chinese) image-text pairs. The model achieves state-of-the-art performance on both text-to-image and image-to-text tasks, obtaining an FID of 7.9 on MS-COCO for text-to-image synthesis and the best results on COCO-CN and AIC-ICC for image captioning.

Image Captioning • Quantization • +2

Alpha at SemEval-2021 Task 6: Transformer Based Propaganda Classification

no code implementations • SEMEVAL 2021 • Zhida Feng, Jiji Tang, Jiaxiang Liu, Weichong Yin, Shikun Feng, Yu Sun, Li Chen

This paper describes the system we submitted to Task 6 of SemEval-2021: the task focuses on multimodal propaganda technique classification and aims to classify a given image-text pair into 22 classes.

Classification
