WikiDiverse: A Multimodal Entity Linking Dataset with Diversified Contextual Topics and Entity Types

1 code implementation ACL 2022 Xuwu Wang, Junfeng Tian, Min Gui, Zhixu Li, Rui Wang, Ming Yan, Lihan Chen, Yanghua Xiao

In this paper, we present WikiDiverse, a high-quality human-annotated multimodal entity linking (MEL) dataset with diversified contextual topics and entity types from Wikinews, which uses Wikipedia as the corresponding knowledge base.

ITA: Image-Text Alignments for Multi-Modal Named Entity Recognition

1 code implementation NAACL 2022 Xinyu Wang, Min Gui, Yong Jiang, Zixia Jia, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu

As text representations take the most important role in MNER, in this paper, we propose Image-text Alignments (ITA) to align image features into the textual space, so that the attention mechanism in transformer-based pretrained textual embeddings can be better utilized.

Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training

no code implementations 21 Aug 2021 Ming Yan, Haiyang Xu, Chenliang Li, Bin Bi, Junfeng Tian, Min Gui, Wei Wang

Existing approaches to vision-language pre-training (VLP) heavily rely on an object detector based on bounding boxes (regions), where salient objects are first detected from images and then a Transformer-based model is used for cross-modal fusion.

Attention Optimization for Abstractive Document Summarization

no code implementations IJCNLP 2019 Min Gui, Junfeng Tian, Rui Wang, Zhenglu Yang

Attention plays a key role in the improvement of sequence-to-sequence-based document summarization models.

