2 code implementations • 14 Dec 2020 • Richard Harang, Ethan M. Rudd
In this paper we describe the SOREL-20M (Sophos/ReversingLabs-20 Million) dataset: a large-scale dataset consisting of nearly 20 million files with pre-extracted features and metadata, high-quality labels derived from multiple sources, information about vendor detections of the malware samples at the time of collection, and additional ``tags'' related to each malware sample to serve as additional targets.
Cryptography and Security
1 code implementation • 13 Mar 2019 • Ethan M. Rudd, Felipe N. Ducau, Cody Wild, Konstantin Berlin, Richard Harang
In this work, we fit deep neural networks to multiple additional targets derived from metadata in a threat intelligence feed for Portable Executable (PE) malware and benignware, including a multi-source malicious/benign loss, a count loss on multi-source detections, and a semantic malware attribute tag loss.
no code implementations • 29 Oct 2018 • Richard Harang, Ethan M. Rudd
When the cost of misclassifying a sample is high, it is useful to have an accurate estimate of uncertainty in the prediction for that sample.
no code implementations • 13 Apr 2018 • Joshua Saxe, Richard Harang, Cody Wild, Hillary Sanders
Malicious web content is a serious problem on the Internet today.
no code implementations • 20 Jan 2017 • Edwin Dauber, Aylin Caliskan, Richard Harang, Gregory Shearer, Michael Weisman, Frederica Nelson, Rachel Greenstadt
We show that we can also use these calibration curves in the case that we do not have linking information and thus are forced to classify individual samples directly.
1 code implementation • 28 Apr 2016 • Nicolas Papernot, Patrick McDaniel, Ananthram Swami, Richard Harang
Machine learning models are frequently used to solve complex security problems, as well as to make decisions in sensitive situations like guiding autonomous vehicles or predicting financial market behaviors.
3 code implementations • 28 Dec 2015 • Aylin Caliskan, Fabian Yamaguchi, Edwin Dauber, Richard Harang, Konrad Rieck, Rachel Greenstadt, Arvind Narayanan
Many distinguishing features present in source code, e. g. variable names, are removed in the compilation process, and compiler optimization may alter the structure of a program, further obscuring features that are known to be useful in determining authorship.
Cryptography and Security