CommonCrawl News is a dataset of news articles from news sites all over the world. The dataset is available in the form of Web ARChive (WARC) files that are released on a daily basis.
Source: https://commoncrawl.org/2016/10/news-dataset-available/
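
Since the articles are distributed as WARC files, a reader will usually need to iterate over WARC records to get at them. The following is a minimal sketch using only the Python standard library, not an official reader. It assumes a gzip-compressed WARC 1.0 file with single-line headers and a `Content-Length` header on every record. The names `iter_warc_records` and `build_sample`, and the `example.com` URL, are hypothetical and exist only for illustration. A dedicated WARC library is likely a more robust choice for real files.

```python
import gzip
import io
from typing import BinaryIO, Dict, Iterator, Tuple


def iter_warc_records(stream: BinaryIO) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, payload) for each record in an uncompressed WARC stream.

    Simplifying assumption: headers are not folded across several lines.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank lines separate consecutive records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")

        # Read "Name: value" header lines until the empty line that ends them.
        headers: Dict[str, str] = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
            name, _, value = line.decode("utf-8", errors="replace").partition(":")
            headers[name.strip()] = value.strip()

        # The payload is exactly Content-Length bytes long.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


def build_sample() -> bytes:
    """Build a tiny gzip-compressed WARC file in memory (made-up content)."""
    body = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hello</html>"
    header = (
        b"WARC/1.0\r\n"
        b"WARC-Type: response\r\n"
        b"WARC-Target-URI: http://example.com/news/1\r\n"
        b"Content-Length: " + str(len(body)).encode() + b"\r\n"
        b"\r\n"
    )
    return gzip.compress(header + body + b"\r\n\r\n")


if __name__ == "__main__":
    # For a downloaded file, gzip.open("path/to/file.warc.gz", "rb") would
    # take the place of the in-memory sample used here.
    with gzip.open(io.BytesIO(build_sample()), "rb") as stream:
        for headers, payload in iter_warc_records(stream):
            if headers.get("WARC-Type") == "response":
                print(headers.get("WARC-Target-URI"), len(payload))
```

In this sketch each `response` record's payload still contains the HTTP response headers followed by the page HTML, so extracting the article text would need a further parsing step.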