We aim to leverage human and machine intelligence together for attention supervision.
The second phase, structure encoding, builds a graph over the keyphrases and the given document to obtain a structure-aware representation of the augmented text.
We study event understanding as a critical step towards visual commonsense tasks. However, we argue that current object-based event understanding is purely likelihood-based and therefore predicts events incorrectly, because of biased correlations between events and objects. We propose to mitigate such biases with do-calculus, a tool from causality research, and to overcome its limited robustness through an optimized aggregation with association-based prediction. We show the effectiveness of our approach intrinsically, by comparing our generated events with ground-truth event annotations, and extrinsically, on downstream commonsense tasks.
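To make the gap between a likelihood-based prediction and a do-calculus prediction concrete, the sketch below contrasts the observational P(e | o) with the interventional P(e | do(o)) computed by the standard backdoor adjustment. It is a minimal illustration under stated assumptions, not the method described above: the confounder z (a hypothetical "scene type"), all probabilities, and the function names `likelihood` and `backdoor` are invented for illustration, and the association-based aggregation step is not reproduced.

```python
# Hypothetical toy numbers (assumption, not from the paper): a binary confounder z
# ("scene type") that influences both which object o is observed and which event e occurs.
P_Z = {"kitchen": 0.5, "street": 0.5}            # P(z): prior over scenes
P_Z_GIVEN_O = {"kitchen": 0.9, "street": 0.1}    # P(z | o): object o is mostly seen in kitchens
P_E_GIVEN_O_Z = {"kitchen": 0.8, "street": 0.2}  # P(e | o, z): event probability per scene


def likelihood(p_e_given_o_z, p_z_given_o):
    """Observational P(e | o): each scene is weighted by P(z | o),
    so the object-scene correlation leaks into the event prediction."""
    return sum(p_e_given_o_z[z] * p_z_given_o[z] for z in p_e_given_o_z)


def backdoor(p_e_given_o_z, p_z):
    """Interventional P(e | do(o)) via backdoor adjustment:
    each scene is weighted by its prior P(z) instead of P(z | o)."""
    return sum(p_e_given_o_z[z] * p_z[z] for z in p_e_given_o_z)


if __name__ == "__main__":
    print(f"P(e | o)     = {likelihood(P_E_GIVEN_O_Z, P_Z_GIVEN_O):.2f}")  # 0.74
    print(f"P(e | do(o)) = {backdoor(P_E_GIVEN_O_Z, P_Z):.2f}")            # 0.50
```

With these toy numbers the likelihood-based estimate (0.74) is inflated by the object's correlation with kitchens, while the adjusted estimate (0.50) removes that bias. Backdoor adjustment is only one instance of do-calculus, and it assumes the confounder is observed and fully blocks the spurious path.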
We thus propose to additionally leverage references, selected from a large pool of texts each labeled with one of the attributes, as textual information that enriches the inductive biases of the given attributes.
This paper studies non-factoid question answering, where an answer may span multiple sentences.