no code implementations • 24 Mar 2024 • Ryan Barron, Maksim E. Eren, Manish Bhattarai, Selma Wanna, Nicholas Solovyev, Kim Rasmussen, Boian S. Alexandrov, Charles Nicholas, Cynthia Matuszek
One of the challenges in constructing a knowledge graph (KG) from scientific literature is the extraction of ontology from unstructured text.
no code implementations • 25 Dec 2023 • Tirth Patel, Fred Lu, Edward Raff, Charles Nicholas, Cynthia Matuszek, James Holt
Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines, meaning a 0.1% change can cause an overwhelming number of false positives.
no code implementations • 4 Sep 2023 • Maksim E. Eren, Manish Bhattarai, Kim Rasmussen, Boian S. Alexandrov, Charles Nicholas
Here we introduce and showcase preliminary capabilities of a new method that can perform precise identification of novel malware families, while also unifying the capability for malware/benign-ware classification and malware family classification into a single framework.
no code implementations • 9 Aug 2023 • Ran Liu, Charles Nicholas
Machine learning (ML)-based malware detection systems are becoming increasingly important as malware threats increase and get more sophisticated.
no code implementations • 9 Jun 2023 • Robert J. Joyce, Tirth Patel, Charles Nicholas, Edward Raff
Our work explores the potential of antivirus (AV) scan data as a scalable source of features for malware.
no code implementations • 3 May 2023 • Ran Liu, Maksim Eren, Charles Nicholas
With the increasing number and sophistication of malware attacks, malware detection systems based on machine learning (ML) grow in importance.
no code implementations • 21 Aug 2022 • Maksim E. Eren, Nick Solovyev, Manish Bhattarai, Kim Rasmussen, Charles Nicholas, Boian S. Alexandrov
As the amount of text data continues to grow, topic modeling is serving an important role in understanding the content hidden by the overwhelming quantity of documents.
no code implementations • 4 May 2022 • Maksim E. Eren, Luke E. Richards, Manish Bhattarai, Roberto Yus, Charles Nicholas, Boian S. Alexandrov
Non-negative matrix factorization (NMF) with missing-value completion is a well-known effective Collaborative Filtering (CF) method used to provide personalized user recommendations.
no code implementations • 18 Feb 2022 • Andre T. Nguyen, Fred Lu, Gary Lopez Munoz, Edward Raff, Charles Nicholas, James Holt
We explore the utility of information contained within a dropout-based Bayesian neural network (BNN) for the task of detecting out-of-distribution (OOD) data.
no code implementations • 28 Dec 2021 • Robert J. Joyce, Edward Raff, Charles Nicholas
Although groups of strongly correlated antivirus engines are known to exist, at present there is limited understanding of how or why these correlations came to be.
1 code implementation • 29 Nov 2021 • Robert J. Joyce, Dev Amlani, Charles Nicholas, Edward Raff
Malware family classification is a significant issue with public safety and research implications that has been hindered by the high cost of expert labels.
no code implementations • 23 Sep 2021 • Robert J. Joyce, Edward Raff, Charles Nicholas
In some problem spaces, the high cost of obtaining ground truth labels necessitates the use of lower-quality reference datasets.
no code implementations • 9 Aug 2021 • Andre T. Nguyen, Edward Raff, Charles Nicholas, James Holt
The detection of malware is a critical task for the protection of computing environments.
no code implementations • 17 Jul 2021 • Maksim E. Eren, Nick Solovyev, Chris Hamer, Renee McDonald, Boian S. Alexandrov, Charles Nicholas
The unprecedented outbreak of Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), or COVID-19, continues to be a significant worldwide problem.
2 code implementations • 15 Jun 2021 • John Boutsikas, Maksim E. Eren, Charles Varga, Edward Raff, Cynthia Matuszek, Charles Nicholas
The use of Machine Learning has become a significant part of malware detection efforts due to the influx of new malware, an ever-changing threat landscape, and the ability of Machine Learning methods to discover meaningful distinctions between malicious and benign software.
1 code implementation • 6 Sep 2020 • Edward Raff, Richard Zak, Gary Lopez Munoz, William Fleming, Hyrum S. Anderson, Bobby Filar, Charles Nicholas, James Holt
Yara rules are a ubiquitous tool among cybersecurity practitioners and analysts.
1 code implementation • 4 Aug 2020 • Maksim Ekin Eren, Nick Solovyev, Edward Raff, Charles Nicholas, Ben Johnson
The world has faced the devastating outbreak of Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), or COVID-19, in 2020.
no code implementations • 15 Jun 2020 • Edward Raff, Charles Nicholas
Malware classification is a difficult problem, to which machine learning methods have been applied for decades.
4 code implementations • 30 Dec 2019 • Edward Raff, Charles Nicholas, Mark McLean
Prior work inspired by compression algorithms has described how the Burrows Wheeler Transform can be used to create a distance measure for bioinformatics problems.
1 code implementation • 1 Aug 2019 • Edward Raff, William Fleming, Richard Zak, Hyrum Anderson, Bill Finlayson, Charles Nicholas, Mark McLean
N-grams have been a common tool for information retrieval and machine learning applications for decades.
no code implementations • 12 Jun 2018 • William Fleshman, Edward Raff, Richard Zak, Mark McLean, Charles Nicholas
As machine learning (ML)-based systems for malware detection become more prevalent, it becomes necessary to quantify their benefits compared to the more traditional anti-virus (AV) systems widely used today.
no code implementations • 30 Mar 2018 • Edward Raff, Jared Sylvester, Charles Nicholas
The Min-Hashing approach to sketching has become an important tool in data analysis, information retrieval, and classification.
no code implementations • 12 Jan 2018 • Edward Raff, Charles Nicholas
In this work we explore the use of metric index structures, which accelerate nearest neighbor queries, in the scenario where we need to interleave insertions and queries during deployment.
7 code implementations • 25 Oct 2017 • Edward Raff, Jon Barker, Jared Sylvester, Robert Brandon, Bryan Catanzaro, Charles Nicholas
In this work we introduce malware detection from raw byte sequences as a fruitful research area to the larger machine learning community.
2 code implementations • 5 Sep 2017 • Edward Raff, Jared Sylvester, Charles Nicholas
Many efforts have been made to use various forms of domain knowledge in malware detection.