TV-AfD: An Imperative-Annotated Corpus from The Big Bang Theory and Wikipedia's Articles for Deletion Discussions

LREC 2020 · Yimin Xiao, Zong-Ying Slaton, Lu Xiao

In this study, we created an imperative corpus from spoken dialogue in The Big Bang Theory and from written comments in Wikipedia's Articles for Deletion (AfD) discussions. The TV show data comprise 59 episodes containing 25,076 statements. We manually annotated imperatives following an annotation guideline adapted from Condoravdi and Lauer (2012) and used the annotated data to assess the performance of syntax-based classification rules. For the Wikipedia AfD comments, we first developed a syntax-based classifier to extract 10,624 candidate imperative statements, then manually examined them to identify true positives. With this corpus, we also examined the performance of the rule-based imperative detection tool. Our results show different outcomes for speech (dialogue) and written data. The rule-based classification achieves higher precision on the written data (0.80) than on the speech data (0.44). Overall performance on the speech data is low, with a precision of 0.44, recall of 0.41, and F1 of 0.42. This finding suggests that the syntax-based model may need to be adjusted for speech data, because imperatives in oral communication show greater syntactic variety and are highly context-dependent.
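The abstract does not spell out the syntax-based classification rules themselves. A minimal illustrative sketch of one common heuristic of this kind, using spaCy (an assumption for illustration, not the authors' tool), is to flag a sentence as a candidate imperative when its root verb is in base form and has no explicit subject:

```python
# Illustrative sketch only -- not the authors' classifier.
# Heuristic: a sentence whose root is a base-form verb (tag VB)
# with no explicit nominal subject is a candidate imperative.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def is_candidate_imperative(sentence: str) -> bool:
    doc = nlp(sentence)
    for sent in doc.sents:
        root = sent.root
        has_subject = any(tok.dep_ in ("nsubj", "nsubjpass") for tok in root.children)
        if root.tag_ == "VB" and not has_subject:
            return True
    return False

if __name__ == "__main__":
    for s in ["Delete this article.", "I think we should keep it.", "Please check the sources."]:
        print(s, "->", is_candidate_imperative(s))
```

As the paper's results suggest, a rule of this kind works reasonably well on edited written comments but misses the more varied, context-dependent imperatives found in spoken dialogue.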
