COUNTER

Introduced by S. Muhammad et al. in COUNTER - corpus of Urdu news text reuse

The COUNTER (COrpus of Urdu News TExt Reuse) corpus contains 600 source-derived document pairs collected from the field of journalism. It can be used to evaluate monolingual text reuse detection systems in general, and those for the Urdu language in particular.

The corpus has 600 source and 600 derived documents. It contains a total of 275,387 words (tokens), 21,426 unique words, and 10,841 sentences. It has been manually annotated at the document level with three levels of reuse: wholly derived (135), partially derived (288), and non-derived (177).
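
The annotation counts above imply an uneven class distribution, which matters when reading accuracy figures for a reuse detector. The following is a minimal sketch that only restates the reported counts; the variable names are illustrative and are not part of any official COUNTER tooling.

```python
# Label counts as reported in the corpus description (document-pair level).
label_counts = {
    "wholly derived": 135,
    "partially derived": 288,
    "non-derived": 177,
}

total_pairs = sum(label_counts.values())

# Share of each reuse level among all document pairs.
label_shares = {label: count / total_pairs for label, count in label_counts.items()}

# Accuracy of a trivial classifier that always predicts the most frequent label.
majority_label = max(label_counts, key=label_counts.get)
majority_baseline_accuracy = label_counts[majority_label] / total_pairs

for label, share in label_shares.items():
    print(f"{label}: {share:.1%}")
print(f"majority baseline ({majority_label}): {majority_baseline_accuracy:.1%}")
```

The three counts sum to the 600 pairs stated above, and a system evaluated on the three-way task would need to exceed the always-predict-the-majority accuracy computed here to show that it has learned anything.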

Source: COUNTER

License


  • Unknown