The MAGE dataset is a large collection of texts generated by 27 LLMs from seven model groups: OpenAI GPT, LLaMA, GLM-130B, FLAN-T5, OPT, BigScience, and EleutherAI. In total, the dataset contains 432,682 texts, and it is accompanied by two additional sets. The first is an out-of-distribution test set containing texts from unseen domains, generated by an unseen model, namely GPT-4. The second is designed to evaluate how robust detectors are to paraphrasing attacks: GPT-3.5-turbo was used to paraphrase the sentences in the first set, and all paraphrased texts are labeled as machine-generated.