no code implementations • 23 Mar 2017 • Fabian Flöck, Kenan Erdogan, Maribel Acosta
We present a dataset that contains every instance of all tokens (~ words) ever written in undeleted, non-redirect English Wikipedia articles until October 2016, in total 13,545,349,787 instances.
no code implementations • 15 May 2019 • Zijian Wang, Scott A. Hale, David Adelani, Przemyslaw A. Grabowicz, Timo Hartmann, Fabian Flöck, David Jurgens
In a large experiment over multilingual heterogeneous European regions, we show that our demographic inference and bias correction together allow for more accurate estimates of populations and make a significant step towards representative social sensing in downstream applications with multilingual social media.
no code implementations • SemEval (NAACL) 2022 • Xi Chen, Ali Zeynali, Chico Camargo, Fabian Flöck, Devin Gaffney, Przemyslaw Grabowicz, Scott Hale, David Jurgens, Mattia Samory
Thousands of new news articles appear daily in outlets in different languages.