Multi-modal Named Entity Recognition
6 papers with code • 5 benchmarks • 0 datasets
Multi-modal named entity recognition (MNER) aims to improve the accuracy of NER models by leveraging the images that accompany text.
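To make the task concrete, here is a minimal sketch of what a single MNER example might look like, assuming a Twitter-style dataset that pairs each post with one image; the field names are illustrative, not from any specific dataset release.

```python
# One MNER example: a sentence, its BIO entity tags, and the attached image.
example = {
    "tokens": ["Kevin", "Durant", "enters", "Oracle", "Arena", "!"],
    # BIO tags over the tokens: PER for the player, LOC for the venue.
    "ner_tags": ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"],
    # The image posted with the text; visual context can help disambiguate
    # entity types (e.g., a person's name vs. a brand name).
    "image": "images/12345.jpg",
}
```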
Most implemented papers
LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition
Grounded Multimodal Named Entity Recognition (GMNER) is a nascent multimodal task that aims to identify named entities, entity types and their corresponding visual regions.
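Below is a minimal sketch of what a GMNER prediction could look like: each entity carries its type and, when the entity is visually groundable, a bounding box in the paired image. The structure and coordinate format are assumptions for illustration, not the paper's exact output schema.

```python
# A GMNER output is a set of (entity, type, visual region) triples; the
# region is None when the entity cannot be grounded in the image.
prediction = [
    {"entity": "Kevin Durant", "type": "PER", "region": (120, 35, 310, 420)},
    {"entity": "Oracle Arena", "type": "LOC", "region": None},  # not groundable
]
```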
Improving Multimodal Named Entity Recognition via Entity Span Detection with Unified Multimodal Transformer
We propose a multimodal interaction module to obtain both image-aware word representations and word-aware visual representations.
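A common way to realize such an interaction module is bidirectional cross-attention. The PyTorch sketch below renders that idea under assumed dimensions and layer choices; it is not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class CrossModalInteraction(nn.Module):
    """A sketch of a multimodal interaction module: one cross-attention
    yields image-aware word representations, the other word-aware visual
    representations. Sizes and layers here are illustrative assumptions."""

    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.txt2img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img2txt = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, words: torch.Tensor, regions: torch.Tensor):
        # words:   (batch, num_tokens, dim)  e.g. BERT token states
        # regions: (batch, num_regions, dim) e.g. projected CNN region features
        image_aware_words, _ = self.txt2img(query=words, key=regions, value=regions)
        word_aware_visual, _ = self.img2txt(query=regions, key=words, value=words)
        return image_aware_words, word_aware_visual
```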
RpBERT: A Text-image Relation Propagation-based BERT Model for Multimodal NER
We integrate soft or hard gates to select visual clues and propose a multitask algorithm to train on the MNER datasets.
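The gating idea can be sketched as follows: a text-image relation score predicted from the pooled multimodal state scales the visual features (soft gate) or zeroes them out entirely (hard gate). This is an assumption about the mechanism, not RpBERT's exact implementation.

```python
import torch
import torch.nn as nn

class VisualGate(nn.Module):
    """A sketch of relation-based gating: a predicted relation score
    modulates the visual clues before fusion. Details are assumed."""

    def __init__(self, dim: int = 768, hard: bool = False):
        super().__init__()
        self.relation = nn.Linear(dim, 1)  # relation classifier head (assumed)
        self.hard = hard

    def forward(self, cls_state: torch.Tensor, visual: torch.Tensor):
        # cls_state: (batch, dim)              pooled text-image state
        # visual:    (batch, num_regions, dim) visual features
        score = torch.sigmoid(self.relation(cls_state))  # (batch, 1)
        if self.hard:
            # Hard 0/1 gate; non-differentiable, so in practice the relation
            # classifier would be trained with its own (multitask) objective.
            score = (score > 0.5).float()
        return visual * score.unsqueeze(1)  # gated visual clues
```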
ITA: Image-Text Alignments for Multi-Modal Named Entity Recognition
As text representations play the most important role in MNER, in this paper we propose Image-text Alignments (ITA) to align image features into the textual space, so that the attention mechanism in transformer-based pretrained textual embeddings can be better utilized.
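One way to read the ITA idea: render the image as text (object tags, a caption) and concatenate it with the sentence, so the self-attention of a standard pretrained text encoder handles the cross-modal alignment. In the sketch below, detect_object_tags and generate_caption are hypothetical placeholders for off-the-shelf vision models, and the separator convention is assumed.

```python
# A sketch of ITA-style input construction: image-derived text is appended
# after separators; the NER head still labels only the sentence tokens.
def build_ita_input(sentence: str, image_path: str) -> str:
    tags = detect_object_tags(image_path)    # hypothetical, e.g. ["man", "ball"]
    caption = generate_caption(image_path)   # hypothetical, e.g. "a man holding a ball"
    return sentence + " [SEP] " + " ".join(tags) + " [SEP] " + caption
```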
Named Entity and Relation Extraction with Multi-Modal Retrieval
MoRe contains a text retrieval module and an image-based retrieval module, which retrieve knowledge related to the input text and image from a knowledge corpus, respectively.
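A retrieval-augmented pipeline in this spirit can be sketched as follows; text_index and image_index are hypothetical retriever objects standing in for MoRe's text and image retrieval modules, and the concatenation format is an assumption.

```python
# A sketch of multi-modal retrieval augmentation: both retrievers fetch
# related passages, which are appended to the input before tagging.
def retrieve_context(sentence: str, image_path: str, k: int = 3) -> str:
    text_hits = text_index.search(sentence, top_k=k)      # hypothetical text retriever
    image_hits = image_index.search(image_path, top_k=k)  # hypothetical image-to-text retriever
    passages = [hit.text for hit in text_hits + image_hits]
    return sentence + " [SEP] " + " ".join(passages)
```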
Prompting ChatGPT in MNER: Enhanced Multimodal Named Entity Recognition with Auxiliary Refined Knowledge
However, existing methods either neglect to provide the model with external knowledge or suffer from high redundancy in the retrieved knowledge.
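The paper's recipe of querying an LLM for auxiliary knowledge can be sketched with the OpenAI Python client; the prompt wording and model name below are assumptions, and the returned explanation would be fed to the MNER model as extra context.

```python
from openai import OpenAI

client = OpenAI()

def auxiliary_knowledge(sentence: str, caption: str) -> str:
    # Ask the LLM to explain likely entities in the post; prompt is assumed.
    prompt = (
        "Given the tweet and its image caption, briefly explain the likely "
        f"named entities.\nTweet: {sentence}\nCaption: {caption}"
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```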