Determining whether two documents were composed by the same author, also known as authorship verification, has traditionally been tackled using statistical methods.
We consider the task of automatically linking social media accounts that belong to the same author, based on the content and metadata of their corresponding document streams.
However, extending these methods to structured prediction is not always straightforward or effective; furthermore, a held-out calibration set may not always be available.
However, a straightforward implementation of this simple idea does not always work in practice: NER models trained naively on annotated data pooled from multiple languages consistently underperform models trained on monolingual data alone, despite having access to more training data.
The evolution of social media users' behavior over time complicates user-level comparison tasks such as verification, classification, clustering, and ranking.
While recurrent neural networks (RNNs) are widely used for text classification, they perform poorly and converge slowly when trained on long sequences.
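The slow convergence on long sequences is commonly attributed to vanishing gradients: the gradient signal shrinks as it is propagated back through many recurrent steps. The following minimal sketch (an assumed illustration, not tied to any specific model in the text) pushes a unit-norm gradient vector through the step Jacobians of a vanilla tanh RNN and compares its norm after a short versus a long sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 32
# Small recurrent weight matrix; the modest scale keeps the per-step
# Jacobian contractive, which is the regime where gradients vanish.
W = rng.normal(scale=0.3 / np.sqrt(hidden), size=(hidden, hidden))

def gradient_norm_after(T):
    """Norm of a unit-norm gradient pushed through T tanh RNN steps."""
    h = rng.normal(size=hidden)
    grad = np.ones(hidden) / np.sqrt(hidden)  # unit-norm upstream gradient
    for _ in range(T):
        pre = W @ h
        h = np.tanh(pre)
        # One step of backprop: multiply by tanh'(pre) and by W^T.
        grad = W.T @ (grad * (1.0 - np.tanh(pre) ** 2))
    return np.linalg.norm(grad)

g_short = gradient_norm_after(5)
g_long = gradient_norm_after(200)
print(g_short, g_long)  # the long-sequence gradient norm is far smaller
```

With 200 steps the gradient norm collapses by many orders of magnitude relative to 5 steps, so early time steps receive almost no learning signal, which is consistent with the slow convergence noted above.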
Practically, this means that we may treat the lexical resources as observations under the proposed generative model.