Authorship verification in the absence of explicit features and thresholds

1 Mar 2018 · Oren Halvani, Lukas Graner, Inna Vogel ·

Enhancing information retrieval systems with the ability to take the writing style of people into account opens the door for a number of applications. For example, one can link articles by authorships that can help identifying authors who generate hoaxes and deliberate misinformation in news stories, distributed across different platforms. Authorship verification (AV) is a technique that can be used for this purpose. AV deals with the task to judge, whether two or more documents stem from the same author. The majority of existing AV approaches relies on machine learning concepts based on explicitly defined stylistic features and complex models that involve a fair amount of parameters. Moreover, many existing AV methods are based on explicit thresholds (needed to accept or reject a stated authorship), which are determined on training corpora. We propose a novel parameter-free AV approach, which derives its thresholds for each verification case individually and enables AV in the absence of explicit features and training corpora. In an experimental setup based on eight evaluation corpora (each one from another language) we show that our approach yields competitive results against the current state of the art and other noteworthy AV baselines.

PDF