NewSHead

Introduced by Gu et al. in Generating Representative Headlines for News Stories

The NewSHead dataset contains 369,940 English news stories covering 932,571 unique URLs, split into 359,940 stories for training, 5,000 for validation, and 5,000 for testing. Each story contains between three and five articles.

The stories were collected from news published between May 2018 and May 2019. A proprietary clustering algorithm iteratively loads articles published within a time window and groups them by content similarity. Up to five representative articles are picked from each cluster for generating the story headline. Curators from a crowd-sourcing platform were asked to provide a headline of up to 35 characters describing the major information covered by the story.
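The constraints stated above (split sizes, three to five articles per story, headlines of at most 35 characters) can be written down as a small sanity check. The sketch below is illustrative only: the `Story` record and the `validate_story` helper are hypothetical names and do not reflect the schema of the released files.

```python
from dataclasses import dataclass
from typing import List

# Figures taken from the dataset description.
SPLIT_SIZES = {"train": 359_940, "validation": 5_000, "test": 5_000}
TOTAL_STORIES = 369_940
MIN_ARTICLES, MAX_ARTICLES = 3, 5
MAX_HEADLINE_CHARS = 35


@dataclass
class Story:
    """Hypothetical in-memory record; NOT the released file schema."""
    headline: str
    article_urls: List[str]


def validate_story(story: Story) -> List[str]:
    """Return a list of constraint violations (empty if the story is consistent)."""
    problems = []
    n_articles = len(story.article_urls)
    if not MIN_ARTICLES <= n_articles <= MAX_ARTICLES:
        problems.append(
            f"expected {MIN_ARTICLES}-{MAX_ARTICLES} articles, got {n_articles}"
        )
    if len(story.headline) > MAX_HEADLINE_CHARS:
        problems.append(
            f"headline has {len(story.headline)} characters, limit is {MAX_HEADLINE_CHARS}"
        )
    return problems


# The three splits should add up to the reported total.
splits_consistent = sum(SPLIT_SIZES.values()) == TOTAL_STORIES
```

A story with a short headline and three URLs passes with no reported problems, while one with two URLs and an overlong headline reports both violations.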

Source: NewSHead

Benchmarks

Task	Dataset Variant	Best Model
Information Threading	NewSHead	HINT

Dataset Loaders

google-research-datasets/NewSHead

Similar Datasets

Multi-News

WCEP

WikiSum
