This data is for the task of named entity recognition and linking (disambiguation) over tweets. It adds an entity URI layer on top of an existing NER-annotated tweet dataset. The task is to detect entities and then link each one to the correct DBpedia resource, thereby disambiguating surface forms that could refer to more than one entity. For example, "Paris" must be linked to the right city of that name (e.g. Paris, France vs. Paris, Texas).
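As a rough illustration of why the URI layer matters, the Python sketch below groups surface forms by the DBpedia URIs they are linked to and reports any surface form linked to more than one resource. The (surface form, URI) pairs are invented for illustration and are not drawn from the data. The function name ambiguous_surface_forms is hypothetical. Entries marked NIL are skipped, since they carry no link.

```python
from collections import defaultdict

# Hypothetical (surface form, DBpedia URI) pairs, invented for illustration;
# they are NOT taken from the dataset.
ANNOTATIONS = [
    ("Paris", "http://dbpedia.org/resource/Paris"),
    ("Paris", "http://dbpedia.org/resource/Paris,_Texas"),
    ("Paris", "http://dbpedia.org/resource/Paris"),
    ("London", "http://dbpedia.org/resource/London"),
    ("Jo", "NIL"),
]


def ambiguous_surface_forms(pairs):
    """Return surface forms linked to more than one distinct URI (NIL ignored)."""
    uris_by_form = defaultdict(set)
    for form, uri in pairs:
        if uri == "NIL":
            # NIL means no agreed or available DBpedia link, so it cannot
            # contribute to ambiguity between resources.
            continue
        uris_by_form[form].add(uri)
    return {
        form: sorted(uris)
        for form, uris in uris_by_form.items()
        if len(uris) > 1
    }


if __name__ == "__main__":
    for form, uris in ambiguous_surface_forms(ANNOTATIONS).items():
        print(form, "->", uris)
```

Here only "Paris" is reported, because its two mentions resolve to different DBpedia resources; without the URI layer the two would be indistinguishable.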
The data covers ten named entity types: company, facility, geographic location, movie, musical artist, person, product, sports team, TV show, and other.
The file is tab-separated, in CoNLL format, with a blank line between tweets. The data preserves the tokenisation used in the Ritter datasets. PoS labels are not present for all tweets; where they could be found in the Ritter data, they are included. Where annotators could not agree on a URI, or the entity was not present in DBpedia, the URI is given as NIL. See the paper for a full description of the methodology.
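A minimal Python sketch for reading the file follows. It relies only on the layout stated above (tab-separated fields, tweets separated by blank lines) and otherwise treats each line as an opaque list of fields. The helper names read_conll and count_nil, the sample lines, and the assumption that the URI is the last field on each line are all hypothetical, so check them against the actual column order before use.

```python
import io

# Hypothetical sample in the assumed layout (token, NER tag, URI); invented
# for illustration, not copied from the dataset.
SAMPLE = (
    "Paris\tB-geo-loc\thttp://dbpedia.org/resource/Paris\n"
    "is\tO\t\n"
    "lovely\tO\t\n"
    "\n"
    "foo\tB-other\tNIL\n"
)


def read_conll(stream):
    """Group tab-separated token lines into tweets, splitting on blank lines.

    Each tweet is a list of rows; each row is the list of tab-separated
    fields on that line. No meaning is assigned to the columns here.
    """
    tweets, current = [], []
    for line in stream:
        line = line.rstrip("\r\n")
        if not line.strip():
            # A blank line ends the current tweet; repeated blanks are ignored.
            if current:
                tweets.append(current)
                current = []
            continue
        current.append(line.split("\t"))
    if current:
        # The file may not end with a trailing blank line.
        tweets.append(current)
    return tweets


def count_nil(tweets, uri_column=-1):
    """Count rows whose (assumed) URI column holds NIL."""
    return sum(
        1 for tweet in tweets for row in tweet if row[uri_column] == "NIL"
    )


if __name__ == "__main__":
    data = read_conll(io.StringIO(SAMPLE))
    print(len(data), "tweets,", count_nil(data), "NIL link(s)")
```

Because PoS labels are present for only some tweets, the number of fields per line may vary between tweets, which is why the sketch indexes the assumed URI column from the end rather than by a fixed position. To read the real file, pass an open text file handle instead of the StringIO object.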