One of the key challenges in learning joint embeddings of multiple modalities, e.g., of images and text, is to ensure coherent cross-modal semantics that generalize across datasets.
Common-sense argumentative reasoning is a challenging task that requires a holistic understanding of the argument, in which external knowledge about the world is hypothesized to play a key role.
Current methods for knowledge graph (KG) representation learning focus solely on the structure of the KG and do not exploit any kind of external information, such as the visual and linguistic information associated with KG entities.
Our analysis shows that for the German data, textual representations are still competitive with multimodal ones.
The evaluation of summaries is a challenging but crucial task in the field of summarization.
Automatic completion of frame-to-frame (F2F) relations in the FrameNet (FN) hierarchy has received little attention, although these relations encode meta-level commonsense knowledge and are used in downstream approaches.