no code implementations • 9 Nov 2022 • Bin Shan, Yaqian Han, Weichong Yin, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
Recent cross-lingual cross-modal works attempt to extend Vision-Language Pre-training (VLP) models to non-English inputs and achieve impressive performance.
Ranked #1 on Multimodal Machine Translation on Multi30K
2 code implementations • CVPR 2023 • Zhida Feng, Zhenyu Zhang, Xintong Yu, Yewei Fang, Lanxin Li, Xuyi Chen, Yuxiang Lu, Jiaxiang Liu, Weichong Yin, Shikun Feng, Yu Sun, Li Chen, Hao Tian, Hua Wu, Haifeng Wang
Recent progress in diffusion models has revolutionized the popular technology of text-to-image generation.
Ranked #12 on Text-to-Image Generation on MS COCO
2 code implementations • 12 Oct 2022 • Qiming Peng, Yinxu Pan, Wenjin Wang, Bin Luo, Zhenyu Zhang, Zhengjie Huang, Teng Hu, Weichong Yin, Yongfeng Chen, Yin Zhang, Shikun Feng, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
Recent years have witnessed the rise and success of pre-training techniques in visually-rich document understanding.
Ranked #2 on Semantic entity labeling on FUNSD
1 code implementation • 30 Sep 2022 • Bin Shan, Weichong Yin, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
They attempt to learn cross-modal representations using contrastive learning on image-text pairs; however, the inter-modal correlations built this way rely on only a single view of each modality (the standard objective is sketched after this entry).
Ranked #1 on Image Retrieval on AIC-ICC
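The single-view objective the snippet critiques is commonly implemented as a CLIP-style InfoNCE loss over one image embedding and one text embedding per pair. A minimal sketch, assuming paired embeddings are already computed; this is illustrative, not the authors' ERNIE-ViL 2.0 code:

```python
# Single-view image-text contrastive loss (CLIP-style InfoNCE).
# Assumes one embedding per image and per text; illustrative only.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    # Normalize so dot products are cosine similarities.
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    # logits[i, j] = similarity between image i and text j.
    logits = img @ txt.t() / temperature
    # Matched image-text pairs sit on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric loss: image-to-text and text-to-image directions.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```

Because every positive pair supplies exactly one image view and one text view, the learned correlations are tied to that single view per modality, which is the limitation the paper targets.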
no code implementations • 18 Sep 2022 • Wenjin Wang, Zhengjie Huang, Bin Luo, Qianglong Chen, Qiming Peng, Yinxu Pan, Weichong Yin, Shikun Feng, Yu Sun, Dianhai Yu, Yin Zhang
First, a document graph is proposed to model complex relationships among multi-grained multimodal elements, in which salient visual regions are detected by a cluster-based method.
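How the cluster-based detection might look in practice: a hypothetical sketch that clusters per-region visual features and keeps one representative region per cluster. The function name and procedure are assumptions for illustration, not the paper's implementation:

```python
# Hypothetical cluster-based salient-region selection. Assumes
# per-region visual features are already extracted; the paper's
# actual clustering procedure may differ.
import numpy as np
from sklearn.cluster import KMeans

def select_salient_regions(region_feats: np.ndarray, n_clusters: int = 8):
    """Pick one representative region per cluster of visual features."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(region_feats)
    salient = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        # The member closest to the centroid stands in for the cluster.
        dists = np.linalg.norm(
            region_feats[members] - km.cluster_centers_[c], axis=1
        )
        salient.append(int(members[np.argmin(dists)]))
    return sorted(salient)
```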
2 code implementations • 31 Dec 2021 • Han Zhang, Weichong Yin, Yewei Fang, Lanxin Li, Boqiang Duan, Zhihua Wu, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
To explore the landscape of large-scale pre-training for bidirectional text-image generation, we train a 10-billion-parameter ERNIE-ViLG model on a large-scale dataset of 145 million (Chinese) image-text pairs. The model achieves state-of-the-art performance on both text-to-image and image-to-text tasks, obtaining an FID of 7.9 on MS-COCO for text-to-image synthesis (the metric is sketched after this entry) and the best results on COCO-CN and AIC-ICC for image captioning.
Ranked #42 on Text-to-Image Generation on MS COCO
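For reference, the FID reported above is the standard Fréchet Inception Distance between Inception features of real and generated images. A minimal sketch of the formula, not the authors' evaluation code:

```python
# Standard Fréchet Inception Distance (FID) between two feature sets:
# FID = ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 * (C_r C_g)^(1/2)).
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    mu_r, mu_g = feats_real.mean(0), feats_gen.mean(0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Matrix square root of the covariance product; drop the tiny
    # imaginary parts that numerical error can introduce.
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(
        np.sum((mu_r - mu_g) ** 2)
        + np.trace(cov_r + cov_g - 2 * covmean)
    )
```

Lower is better; an FID of 7.9 means the generated-image feature distribution sits close to that of real MS-COCO images.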
no code implementations • SEMEVAL 2021 • Zhida Feng, Jiji Tang, Jiaxiang Liu, Weichong Yin, Shikun Feng, Yu Sun, Li Chen
This paper describes our system submitted to Task 6 of SemEval-2021, which focuses on multimodal propaganda technique classification: given an image and its accompanying text, the goal is to classify the pair into 22 classes.
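A hypothetical sketch of such a fused image-text classifier over 22 technique labels, assuming a multi-label setup where each class gets an independent logit. The fusion scheme, encoder dimensions, and class count handling are assumptions for illustration, not the submitted system:

```python
# Hypothetical fused image-text classifier for 22 technique labels.
# Assumes image and text features come from pretrained encoders.
import torch
import torch.nn as nn

class MultimodalClassifier(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, hidden=512, n_classes=22):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(img_dim + txt_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, img_feat, txt_feat):
        # Concatenate the two modalities, then score all 22 techniques.
        return self.fuse(torch.cat([img_feat, txt_feat], dim=-1))
```

If a meme can exhibit several techniques at once, training would use a per-class sigmoid loss such as nn.BCEWithLogitsLoss rather than softmax cross-entropy.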
no code implementations • 30 Jun 2020 • Fei Yu, Jiji Tang, Weichong Yin, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
Thus, ERNIE-ViL can learn joint representations that characterize the alignment of detailed semantics across vision and language.
Ranked #2 on Visual Question Answering (VQA) on VCR (QA-R) test