Gazeta is a dataset for automatic summarization of Russian news. The dataset consists of 63,435 text-summary pairs. To form training, validation, and test datasets, these pairs were sorted by time and the first 52,400 pairs are used as the training dataset, the proceeding 5,265 pairs as the validation dataset, and the remaining 5,770 pairs as the test dataset.

Source: https://github.com/IlyaGusev/gazeta

Papers


Paper Code Results Date Stars

Tasks


Similar Datasets


License


  • Unknown

Modalities


Languages