CSAbstruct Dataset

Introduced by Cohan et al. in Pretrained Language Models for Sequential Sentence Classification

CSAbstruct is a new dataset of annotated computer science abstracts with sentence labels according to their rhetorical roles. The key difference between this dataset and PUBMED-RCT is that PubMed abstracts are written according to a predefined structure, whereas computer science papers are free-form. Therefore, there is more variety in writing styles in CSABSTRUCT. CSABSTRUCT is collected from the Semantic Scholar corpus (Ammar et al., 2018). Each sentence is annotated by 5 workers on the Figure-eight platform,6 with one of 5 categories {BACKGROUND, OBJECTIVE, METHOD, RESULT, OTHER}.

Source: Pretrained Language Models for Sequential Sentence Classification

Homepage

Benchmarks

Add a new result Link an existing benchmark

No benchmarks yet. Start a new benchmark or link an existing one.

Papers

Paper	Code	Results	Date	Stars

Semantic Scholar

License

Unknown

Modalities

Texts

Languages

English

CSAbstruct Dataset

Benchmarks Edit Add a new result Link an existing benchmark

Papers

Dataset Loaders Edit Add Remove

Tasks Edit

Similar Datasets

CSPubSum

Semantic Scholar

License Edit

Modalities Edit

Languages Edit

Benchmarks

Add a new result Link an existing benchmark

Dataset Loaders

Add Remove

Tasks

License

Modalities

Languages