Dataset used for the paper entitled "Towards a Fair Comparison and Realistic Evaluation Framework of Android Malware Detectors based on Static Analysis and Machine Learning".

Description

The dataset consists of 100 samples per month for each class (malware, goodware and greyware), covering the period from January 2012 to December 2019. We relied on the VTD values of the apps for labeling: apps with VTD≥7 were labeled as malware, apps with VTD=0 as goodware, and apps with 1≤VTD≤6 as greyware. In total, the dataset contains 28,800 app samples (100 apps × 3 classes × 96 months).
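The labeling rule above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the code used to build the dataset; the function name `label_from_vtd` and the constant names are hypothetical.

```python
def label_from_vtd(vtd: int) -> str:
    """Map a VTD value to a class label using the thresholds stated above.

    Hypothetical helper: the name and signature are assumptions made
    for illustration only.
    """
    if vtd < 0:
        raise ValueError("VTD must be a non-negative integer")
    if vtd == 0:
        return "goodware"
    if vtd <= 6:
        return "greyware"
    return "malware"  # VTD >= 7


# Sanity check on the dataset size: 100 apps per class per month,
# 3 classes, and 96 months (January 2012 to December 2019).
MONTHS = (2019 - 2012 + 1) * 12
TOTAL_SAMPLES = 100 * 3 * MONTHS
```

For example, `label_from_vtd(3)` falls in the 1≤VTD≤6 range and yields the greyware label, while `TOTAL_SAMPLES` reproduces the total stated above.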

The directory "dataset" contains a file with the SHA hashes of the APKs that comprise each of the three classes (goodware, malware and greyware). All these APKs were originally downloaded from AndroZoo. To download the APKs in our dataset, you can use the AZ tool.
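As a rough sketch of how the hash lists could be turned into download requests, the snippet below builds one AndroZoo download URL per hash. Several points are assumptions rather than facts about this repository: that the hash files contain one SHA-256 hash per line, that you hold a personal AndroZoo API key, and that the AndroZoo download endpoint is `https://androzoo.uni.lu/api/download` with `apikey` and `sha256` query parameters (please check the AndroZoo documentation before relying on it). The function names are hypothetical.

```python
import re
from urllib.parse import urlencode

# Assumption: AndroZoo's download endpoint; verify against its documentation.
ANDROZOO_DOWNLOAD = "https://androzoo.uni.lu/api/download"

# Assumption: hashes are SHA-256, i.e. 64 hexadecimal characters.
SHA256_RE = re.compile(r"[0-9a-fA-F]{64}")


def read_hashes(lines):
    """Return the valid SHA-256 hashes found in an iterable of text lines.

    Assumes one hash per line; blank or malformed lines are skipped.
    """
    hashes = []
    for line in lines:
        candidate = line.strip()
        if SHA256_RE.fullmatch(candidate):
            hashes.append(candidate)
    return hashes


def androzoo_url(sha256: str, apikey: str) -> str:
    """Build the download URL for a single APK (no network access here)."""
    return ANDROZOO_DOWNLOAD + "?" + urlencode({"apikey": apikey, "sha256": sha256})
```

The resulting URLs can be passed to any HTTP client. The sketch deliberately performs no network requests, so that rate limits and API key handling remain under the user's control.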

Authors and acknowledgment

If you use this dataset, please cite:

@article{molinacoronado2022towards,
    title = {Towards a Fair Comparison and Realistic Evaluation Framework of Android Malware Detectors based on Static Analysis and Machine Learning},
    author = {Borja Molina-Coronado and Usue Mori and Alexander Mendiburu and Jose Miguel-Alonso},
    journal = {Computers \& Security},
    pages = {102996},
    year = {2022},
    issn = {0167-4048},
    doi = {10.1016/j.cose.2022.102996},
    url = {https://www.sciencedirect.com/science/article/pii/S0167404822003881}
}

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. For more information, see the link below:

http://creativecommons.org/licenses/by-nc/4.0/
