ACL ARC

ACL Anthology Reference Corpus (ACL ARC) is a collection of 10,920 academic papers from the ACL Anthology. ACL ARC is cleaned to remove:

  • files that appear to be incomplete papers, paper fragments, foreign-language papers (e.g., French), or pure junk.
  • headers (title and author information; NOT abstract).
  • footers ("References" line and the actual references).
  • some bad characters (spurious characters).
  • some page numbers (i.e., a single number appearing on a line, with nothing else attached to it).
  • significant foreign-language (e.g., French) content in an otherwise English paper.

The cleaned corpus has 10,628 documents.
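
Two of the cleaning rules listed above (dropping standalone page numbers and cutting the footer at the "References" line) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the scripts actually used to build the corpus: the function name clean_lines and both regular expressions are hypothetical, and the sketch assumes the document is already split into lines of plain text.

```python
import re

# Hypothetical patterns; not the original ACL ARC cleaning rules.
# A "page number" here is a line holding a single number and nothing else.
PAGE_NUMBER = re.compile(r"^\s*\d+\s*$")
# The footer is assumed to start at a line that contains only "References".
REFERENCES = re.compile(r"^\s*references\s*$", re.IGNORECASE)


def clean_lines(lines):
    """Return the lines with page-number lines and the reference footer removed."""
    cleaned = []
    for line in lines:
        if REFERENCES.match(line):
            # Drop the "References" line and everything after it.
            break
        if PAGE_NUMBER.match(line):
            # Skip a line that is nothing but a number.
            continue
        cleaned.append(line)
    return cleaned
```

A number that appears inside a sentence is left alone, because the page-number pattern only matches lines that contain nothing else. A real pipeline would also need the header, bad-character, and foreign-language steps, which are not sketched here.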

Source: ACL ARC

License


  • Unknown
