We first obtain an initial set of event pairs that are likely to have the subevent relation by exploiting two observations: 1) subevents are temporally contained by the parent event, and 2) the definitions of the parent event can be used to further guide the identification of subevents.
Recognizing the need for a comprehensive evaluation of LMM chart understanding, we also propose a MultiModal Chart Benchmark (MMC-Benchmark), a comprehensive human-annotated benchmark with 9 distinct tasks evaluating reasoning capabilities over charts.
1 code implementation • 9 Nov 2023 • Shuyi Xie, Wenlin Yao, Yong Dai, Shaobo Wang, Donlin Zhou, Lifeng Jin, Xinhua Feng, Pengzhi Wei, Yujie Lin, Zhichao Hu, Dong Yu, Zhengyou Zhang, Jing Nie, Yuhong Liu
We construct a hierarchical task tree encompassing 7 major areas, with over 200 categories and over 800 tasks, that covers diverse capabilities such as question answering, reasoning, multiturn dialogue, and text generation, to evaluate LLMs in a comprehensive and in-depth manner.
Large Language Models (LLMs) have achieved remarkable success, demonstrating powerful instruction-following capabilities across diverse tasks.
Our approach uniquely considers the various annotation formats as different "views" and leverages them in training the model.
Specifically, the detection technique achieves a recall of ~88% and the mitigation technique successfully mitigates 57.6% of the correctly detected hallucinations.
Recommender systems play a crucial role in helping users discover information that aligns with their interests based on their past behaviors.
It is unclear whether the compositional semantics of sentences can be directly reflected as compositional operations in the embedding space.
However, there has been limited research on the zero-shot KBC settings, where we need to deal with unseen entities and relations that emerge in a constantly growing knowledge base.
To encourage research in this direction, we propose NarraSum, a large-scale narrative summarization dataset.
We argue that using the static embedding of the event type name might not be enough because a single word could be ambiguous, and we need a sentence to define the type semantics accurately.
In this paper, we develop a novel semi-parametric language model architecture, Knowledge-in-Context (KiC), which empowers a parametric text-to-text language model with a knowledge-rich external memory.
Recent literature adds extractive summaries as guidance for abstractive summarization models to provide hints of salient content and achieves better performance.
Ranked #7 on Abstractive Text Summarization on CNN / Daily Mail
Large-scale pretrained language models have made significant advances in solving downstream language understanding tasks.
Ranked #2 on Visual Commonsense Tests on ViComTe-color
Comprehending a dialogue requires a model to capture diverse kinds of key information in the utterances, which are either scattered around or implicitly implied in different turns of conversations.
With our pretrained reader, the entire system improves by up to 4% in exact match.
We then train a model to identify semantic equivalence between a target word in context and one of its glosses using these aligned inventories, which exhibits strong transfer capability to many WSD tasks.
People increasingly use social media to report emergencies, seek help or share information during disasters, which makes social networks an important tool for disaster management.
Inspired by the double temporality characteristic of narrative texts, we propose a novel approach for acquiring rich temporal "before/after" event knowledge across sentences in narrative stories.
Focusing on the task of identifying event temporal status, we find that events directly or indirectly governing the target event in a dependency tree are the most important contexts.
Capabilities of detecting temporal relations between two events can benefit many applications.