In LXMERT, we build a large-scale Transformer model that consists of three encoders: an object relationship encoder, a language encoder, and a cross-modality encoder.
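As a rough illustration of this three-encoder layout, here is a minimal PyTorch sketch; the layer counts, hidden size, and cross-attention wiring are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of LXMERT's three-encoder layout. Layer counts,
# hidden sizes, and the cross-attention wiring are assumptions made
# for illustration.
class CrossModalityLayer(nn.Module):
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.lang_to_vis = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.vis_to_lang = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, lang, vis):
        # Each stream attends over the other modality's features.
        lang_out, _ = self.lang_to_vis(lang, vis, vis)
        vis_out, _ = self.vis_to_lang(vis, lang, lang)
        return lang_out, vis_out

class LXMERTSketch(nn.Module):
    def __init__(self, dim=768, heads=12):
        super().__init__()
        def layer():
            return nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.language_encoder = nn.TransformerEncoder(layer(), num_layers=9)
        self.object_encoder = nn.TransformerEncoder(layer(), num_layers=5)
        self.cross_encoder = nn.ModuleList(
            CrossModalityLayer(dim, heads) for _ in range(5)
        )

    def forward(self, tokens, objects):
        lang = self.language_encoder(tokens)   # language encoder
        vis = self.object_encoder(objects)     # object relationship encoder
        for layer in self.cross_encoder:       # cross-modality encoder
            lang, vis = layer(lang, vis)
        return lang, vis

# Example: 20 word tokens and 36 detected object regions per image.
lang_out, vis_out = LXMERTSketch()(torch.randn(2, 20, 768), torch.randn(2, 36, 768))
```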
The goal of this dataset is to assess question-answering performance from nearly-ideal navigation paths, while covering a much wider variety of questions than current instantiations of the EQA task.
We present ViLBERT (short for Vision-and-Language BERT), a model for learning task-agnostic joint representations of image content and natural language.
Recently, pre-trained models have achieved state-of-the-art results in various language understanding tasks, which indicates that pre-training on large-scale corpora may play a crucial role in natural language processing.
This paper investigates a new task named Conversational Question Generation (CQG), in which a question is generated from a passage and a conversation history (i.e., previous turns of question-answer pairs).
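As a hedged sketch of what the CQG input could look like, the hypothetical helper below linearizes a passage and the prior question-answer turns into a single sequence for a seq2seq generator; the separator tokens and layout are illustrative assumptions, not the paper's format.

```python
from typing import List, Tuple

# Hypothetical input formatting for a CQG model: the task maps
# (passage, conversation history) -> next question. Separator tokens
# are assumptions for illustration.
def build_cqg_input(passage: str, history: List[Tuple[str, str]]) -> str:
    """Linearize the passage and previous question-answer turns into
    one sequence a seq2seq question generator could consume."""
    turns = " ".join(f"<q> {q} <a> {a}" for q, a in history)
    return f"<passage> {passage} {turns}".strip()

example = build_cqg_input(
    "Marie Curie won two Nobel Prizes.",
    [("Who won two Nobel Prizes?", "Marie Curie")],
)
print(example)
# A trained generator would be expected to emit the next turn's question.
```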
Language model pretraining has led to significant performance gains, but careful comparison between different approaches is challenging.
The inherent difficulty of knowledge specification and the lack of trained specialists are among the key obstacles to making intelligent systems based on the knowledge representation and reasoning (KRR) paradigm commonplace.
We propose a spatio-temporal cache mechanism that enables learning the spatial dimensions of the input in addition to the hidden states corresponding to the temporal input sequence.
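One way such a cache could be realized is sketched below, assuming it pairs a convolutional spatial memory with a recurrent temporal state; all module names, sizes, and the update rule are hypothetical, not the paper's design.

```python
import torch
import torch.nn as nn

# Minimal sketch of a spatio-temporal cache: alongside the usual
# temporal hidden state, a convolutional memory preserves the HxW
# layout of the input. Names and sizes are assumptions.
class SpatioTemporalCacheCell(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Spatial cache update: convolution keeps the spatial structure.
        self.spatial_update = nn.Conv2d(2 * channels, channels,
                                        kernel_size=3, padding=1)
        # Temporal state update: pooled features drive a GRU cell.
        self.temporal_update = nn.GRUCell(channels, channels)

    def forward(self, x, spatial_cache, hidden):
        # x, spatial_cache: (B, C, H, W); hidden: (B, C)
        spatial_cache = torch.tanh(
            self.spatial_update(torch.cat([x, spatial_cache], dim=1))
        )
        pooled = spatial_cache.mean(dim=(2, 3))        # collapse spatial dims
        hidden = self.temporal_update(pooled, hidden)  # temporal sequence state
        return spatial_cache, hidden
```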