THUCNews (THU Chinese Text Classification)

THUCNews is a large-scale Chinese text classification dataset containing approximately 840,000 news documents in 14 categories. It was built by filtering historical data from Sina News RSS feeds covering 2005 to 2011. The dataset can be used for tasks such as text classification and training word vectors.
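For text classification, the first step is usually turning the raw articles into (text, label) pairs. The sketch below assumes a directory-per-class layout, with one subdirectory per category and one UTF-8 `.txt` file per article. That layout is an assumption, so check it against the copy of the corpus you download. `load_corpus` is a hypothetical helper and not part of any official loader. The toy class names and texts are also made up for illustration.

```python
from pathlib import Path
import tempfile


def load_corpus(root):
    """Collect (text, label_id) pairs from a directory-per-class corpus.

    Assumption (not confirmed by the dataset description): every immediate
    subdirectory of `root` is one category and holds one UTF-8 .txt file
    per news article.
    """
    root = Path(root)
    # Sort category names so integer label ids are stable across runs.
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    label_to_id = {name: i for i, name in enumerate(classes)}
    samples = []
    for name in classes:
        # Sort file paths so the sample order is deterministic.
        for path in sorted((root / name).glob("*.txt")):
            text = path.read_text(encoding="utf-8").strip()
            samples.append((text, label_to_id[name]))
    return samples, label_to_id


# Usage on a tiny hypothetical corpus written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    toy = {"sports": ["比赛结束", "球队获胜"], "tech": ["新款手机发布"]}
    for label, docs in toy.items():
        (Path(tmp) / label).mkdir()
        for i, doc in enumerate(docs):
            (Path(tmp) / label / f"{i}.txt").write_text(doc, encoding="utf-8")
    # Texts are read into memory here, so they survive the directory cleanup.
    samples, label_to_id = load_corpus(tmp)

label_counts = {}
for _, label_id in samples:
    label_counts[label_id] = label_counts.get(label_id, 0) + 1
```

Sorting the category names keeps the label ids stable, so a saved classifier can be reloaded against the same mapping. With roughly 840,000 articles, reading everything into a list may use a lot of memory. A generator that yields one pair at a time is a reasonable alternative for the full corpus.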

License


  • Unknown

Modalities


  • Texts

Languages


  • Chinese