How Robust are Fact Checking Systems on Colloquial Claims?

NAACL 2021 · Byeongchang Kim, Hyunwoo Kim, Seokhee Hong, Gunhee Kim

Knowledge is now starting to power neural dialogue agents. At the same time, the risk of misinformation and disinformation from dialogue agents also rises. Verifying the veracity of information from formal sources is widely studied in computational fact checking. In this work, we ask: How robust are fact checking systems on claims in colloquial style? We aim to open up new discussions at the intersection of fact verification and dialogue safety. To investigate how fact checking systems behave on colloquial claims, we transfer the style of claims from FEVER (Thorne et al., 2018) into colloquialism. We find that existing fact checking systems that perform well on claims in formal style degrade significantly on colloquial claims with the same semantics. In particular, we show that document retrieval is the weakest spot in the system, vulnerable even to filler words such as "yeah" and "you know". The document recall of the WikiAPI retriever (Hanselowski et al., 2018), which is 90.0% on FEVER, drops to 72.2% on the colloquial claims. We compare the characteristics of colloquial claims to those of claims in formal style, and demonstrate the challenging issues in them.
