AG’s Corpus (AG's corpus of news articles)

Introduced by Gulli in AG's corpus of news articles

Antonio Gulli’s corpus of news articles is a collection of more than 1 million news articles. The articles were gathered from more than 2,000 news sources by ComeToMyHead over more than one year of activity. ComeToMyHead is an academic news search engine that has been running since July 2004. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc.), information retrieval (ranking, search, etc.), XML, data compression, data streaming, and any other non-commercial activity.

A subset of this corpus, AG News, consisting of the 4 largest classes, is a popular topic classification dataset.
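
The idea of keeping only the largest classes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used to build AG News: the record layout (a dict with a `category` field), the helper name `largest_classes_subset`, and the toy records are all hypothetical.

```python
from collections import Counter


def largest_classes_subset(articles, k=4):
    """Return the articles whose category is among the k most frequent.

    `articles` is assumed to be a list of dicts with a "category" key;
    this layout is hypothetical and not taken from the corpus itself.
    """
    counts = Counter(article["category"] for article in articles)
    # most_common(k) yields (category, count) pairs, largest count first.
    top_categories = {name for name, _ in counts.most_common(k)}
    return [a for a in articles if a["category"] in top_categories]


# Toy records for illustration only; they are not drawn from the corpus.
toy_articles = (
    [{"category": "World", "title": f"w{i}"} for i in range(4)]
    + [{"category": "Sports", "title": f"s{i}"} for i in range(3)]
    + [{"category": "Business", "title": f"b{i}"} for i in range(3)]
    + [{"category": "Sci/Tech", "title": f"t{i}"} for i in range(2)]
    + [{"category": "Health", "title": "h0"}]
)

subset = largest_classes_subset(toy_articles, k=4)
kept_categories = sorted({a["category"] for a in subset})
print(len(subset), kept_categories)
```

In this toy input the smallest category is dropped and every article from the four most frequent categories is kept.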

Source: AG's corpus of news articles




