arXiv-10

Introduced by Farhangi et al. in Protoformer: Embedding Prototypes for Transformers

Benchmark dataset for abstracts and titles of 100,000 ArXiv scientific papers. This dataset contains 10 classes and is balanced (exactly 10,000 per class). The classes include subcategories of computer science, physics, and math.

• Direct link: Download

• Citation:

@inproceedings{farhangi2022protoformer,
  title={Protoformer: Embedding Prototypes for Transformers},
  author={Farhangi, Ashkan and Sui, Ning and Hua, Nan and Bai, Haiyan and Huang, Arthur and Guo, Zhishan},
  booktitle={Advances in Knowledge Discovery and Data Mining: 26th Pacific-Asia Conference, PAKDD 2022, Chengdu, China, May 16--19, 2022, Proceedings, Part I},
  pages={447--458},
  year={2022}
}

Papers


Paper Code Results Date Stars

Dataset Loaders


No data loaders found. You can submit your data loader here.

Tasks


Similar Datasets


License


  • Open Source

Modalities


Languages