In this work, we propose to use scene graphs, which express images in terms of objects and their visual relations, together with knowledge graphs, as structured representations for meme classification with a Transformer-based architecture.
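As an illustration of this idea, here is a minimal sketch, assuming the scene graph's (subject, predicate, object) triples are serialized into one token sequence for a small Transformer encoder; this is not the paper's actual architecture, and the vocabulary size and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

class GraphTripleClassifier(nn.Module):
    """Classify a scene graph serialized as triple tokens (assumed scheme)."""
    def __init__(self, vocab_size=1000, d_model=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, triple_tokens):            # (B, T): [subj, pred, obj, ...]
        h = self.encoder(self.embed(triple_tokens))
        return self.classifier(h.mean(dim=1))    # mean-pool tokens, then classify
```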
Extracting structured clinical information from free-text radiology reports in the form of radiology graphs has proven valuable for evaluating the clinical correctness of report-generation methods.
Simulating high-resolution detector responses is a storage- and compute-intensive process that has long been a challenge in particle physics.
To study this question, we propose a reconstruction task where Flamingo generates a description for a given image and DALL-E uses this description as input to synthesize a new image.
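A minimal sketch of this image-to-text-to-image reconstruction loop is below; `caption_image` and `generate_image` are hypothetical stand-ins for a Flamingo-style captioner and a DALL-E-style text-to-image model, and do not reflect any real public API.

```python
from PIL import Image

def caption_image(image: Image.Image) -> str:
    """Hypothetical multimodal captioner (a Flamingo-style model)."""
    raise NotImplementedError("plug in a captioning model here")

def generate_image(description: str) -> Image.Image:
    """Hypothetical text-to-image model (a DALL-E-style model)."""
    raise NotImplementedError("plug in a text-to-image model here")

def reconstruct(image: Image.Image) -> tuple[str, Image.Image]:
    # Describe the image, then synthesize a new one from the description
    # alone; comparing input and output measures how much visual
    # information survives the text bottleneck.
    description = caption_image(image)
    return description, generate_image(description)
```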
We propose three novel components to model short-term and long-term dependencies and temporal coherence.
Ranked #4 on Video Instance Segmentation on YouTube-VIS 2022 Validation (using extra training data)
4 code implementations • Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, Karen Simonyan
Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research.
Ranked #1 on Action Recognition on RareAct
1 code implementation • 19 Mar 2022 • Suprosanna Shit, Rajat Koner, Bastian Wittmann, Johannes Paetzold, Ivan Ezhov, Hongwei Li, Jiazhen Pan, Sahand Sharifzadeh, Georgios Kaissis, Volker Tresp, Bjoern Menze
We leverage direct set-based object prediction and incorporate interactions among the objects to jointly learn an object-relation representation.
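A minimal sketch of direct set-based object-relation prediction is below, assuming a DETR-style design: a fixed set of learned queries is decoded against image features, and a pairwise head scores relations between every pair of predicted objects. This is an assumed illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SetObjectRelationHead(nn.Module):
    def __init__(self, d_model=256, num_queries=100, num_classes=150, num_relations=50):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.class_head = nn.Linear(d_model, num_classes + 1)      # +1: "no object"
        self.box_head = nn.Linear(d_model, 4)                      # (cx, cy, w, h)
        self.rel_head = nn.Linear(2 * d_model, num_relations + 1)  # +1: "no relation"

    def forward(self, image_feats):  # (B, HW, d_model) flattened image features
        B = image_feats.size(0)
        Q, D = self.queries.shape
        obj = self.decoder(self.queries.expand(B, Q, D), image_feats)  # (B, Q, D)
        # Pairwise concatenation of object embeddings -> relation logits.
        subj = obj.unsqueeze(2).expand(B, Q, Q, D)
        objt = obj.unsqueeze(1).expand(B, Q, Q, D)
        rel_logits = self.rel_head(torch.cat([subj, objt], dim=-1))   # (B, Q, Q, R+1)
        return self.class_head(obj), self.box_head(obj).sigmoid(), rel_logits
```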
Although memory appears to be about the past, its main purpose is to support the agent in the present and the future.
We show that by fine-tuning the classification pipeline with the extracted knowledge from texts, we can achieve ~8x more accurate results in scene graph classification, ~3x in object classification, and ~1.5x in predicate classification, compared to the supervised baselines with only 1% of the annotated images.
A major challenge in scene graph classification is that the appearance of objects and relations can be significantly different from one image to another.
Recently, knowledge graph embeddings (KGEs) have received significant attention, and several software libraries have been developed for training and evaluating KGEs.
Ranked #1 on Link Prediction on WN18 (training time (s) metric)
The heterogeneity in the implementations, training, and evaluation of recently published knowledge graph embedding models has made fair and thorough comparisons difficult.
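For concreteness, the scoring function of TransE, one of the classic KGE models that such libraries train and evaluate, fits in a few lines: a triple (h, r, t) is plausible when the head embedding translated by the relation embedding lands near the tail, i.e. f(h, r, t) = -||e_h + w_r - e_t||. The embeddings below are random placeholders.

```python
import torch

def transe_score(e_h, w_r, e_t, p=2):
    # Higher scores mean more plausible triples.
    return -torch.norm(e_h + w_r - e_t, p=p, dim=-1)

# Toy usage with random 50-dimensional embeddings:
d = 50
e_h, w_r, e_t = torch.randn(d), torch.randn(d), torch.randn(d)
print(transe_score(e_h, w_r, e_t))
```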
In particular, we propose that explicit perception and declarative memories require a semantic decoder, which, in a simple realization, is based on four layers: first, a sensory memory layer, as a buffer for sensory input; second, an index layer representing concepts; third, a memoryless representation layer for the broadcasting of information (the "blackboard", or the "canvas", of the brain); and fourth, a working memory layer as a processing center and data buffer.
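A schematic sketch of that four-layer organization is below. This is one reading of the description, not the paper's code: sensory input is buffered, a concept in the index layer is activated, its embedding is broadcast onto the shared representation layer (the "blackboard"), and the working memory layer stores the result for further processing.

```python
import numpy as np

class SemanticDecoder:
    def __init__(self, num_concepts: int, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.concept_embeddings = rng.standard_normal((num_concepts, dim))  # index layer
        self.sensory_buffer: list[np.ndarray] = []                          # sensory memory
        self.working_memory: list[np.ndarray] = []                          # working memory

    def perceive(self, sensory_input: np.ndarray) -> np.ndarray:
        self.sensory_buffer.append(sensory_input)               # 1) buffer sensory input
        activations = self.concept_embeddings @ sensory_input   # 2) index layer activates
        winner = int(np.argmax(activations))
        blackboard = self.concept_embeddings[winner]            # 3) memoryless broadcast
        self.working_memory.append(blackboard)                  # 4) working memory buffers it
        return blackboard
```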
We argue that depth maps can additionally provide valuable information on object relations, e.g., helping to detect not only spatial relations, such as standing behind, but also non-spatial relations, such as holding.
Ranked #1 on Relationship Detection on VRD
To this end, we present the first approach to unsupervised text generation from KGs and simultaneously show how it can be used for unsupervised semantic parsing.
Ranked #1 on Unsupervised KG-to-Text Generation on VG graph-text
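A common unsupervised recipe for pairing a KG-to-text generator with a semantic parser is iterative back-translation, where each direction produces pseudo-parallel data to supervise the other. The sketch below shows that generic loop, not necessarily the paper's exact training scheme; `g2t` and `t2g` are hypothetical seq2seq-style models with `generate` and `train_step` methods.

```python
def back_translation_round(g2t, t2g, graphs, texts):
    # Pseudo-parallel data in one direction supervises the other direction.
    for g in graphs:
        pseudo_text = g2t.generate(g)        # graph -> synthetic text
        t2g.train_step(pseudo_text, g)       # train parser on (text, graph)
    for t in texts:
        pseudo_graph = t2g.generate(t)       # text -> synthetic graph
        g2t.train_step(pseudo_graph, t)      # train generator on (graph, text)
```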
We propose an inverse reinforcement learning (IRL) approach using Deep Q-Networks to recover reward functions in problems with large state spaces.
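A generic sketch of an IRL loop with a deep Q-network as the inner solver is below, following an apprenticeship-learning-style feature-matching update. This is an assumed structure, not the paper's exact algorithm; `train_dqn`, `rollout_features`, and `expert_features` are hypothetical hooks the caller supplies.

```python
import numpy as np

def irl_with_dqn(expert_features, train_dqn, rollout_features,
                 dim, iters=20, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(dim)              # linear reward: r(s) = w . phi(s)
    for _ in range(iters):
        policy = train_dqn(w)                 # inner loop: DQN under current reward
        mu_policy = rollout_features(policy)  # expected feature counts of learner
        # Move reward weights toward features the expert visits more
        # often than the current policy does.
        w += lr * (expert_features - mu_policy)
    return w
```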