Search Results for author: Charles Nicholas

Found 25 papers, 8 papers with code

Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits!

no code implementations · 25 Dec 2023 · Tirth Patel, Fred Lu, Edward Raff, Charles Nicholas, Cynthia Matuszek, James Holt

Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines, meaning a 0.1% change can cause an overwhelming number of false positives.

Malware Detection

MalwareDNA: Simultaneous Classification of Malware, Malware Families, and Novel Malware

no code implementations · 4 Sep 2023 · Maksim E. Eren, Manish Bhattarai, Kim Rasmussen, Boian S. Alexandrov, Charles Nicholas

Here we introduce and showcase preliminary capabilities of a new method that can perform precise identification of novel malware families, while also unifying the capability for malware/benign-ware classification and malware family classification into a single framework.

Classification

A Feature Set of Small Size for the PDF Malware Detection

no code implementations · 9 Aug 2023 · Ran Liu, Charles Nicholas

Machine learning (ML)-based malware detection systems are becoming increasingly important as malware threats increase and get more sophisticated.

Feature Selection, Malware Detection

Can Feature Engineering Help Quantum Machine Learning for Malware Detection?

no code implementations · 3 May 2023 · Ran Liu, Maksim Eren, Charles Nicholas

With the increasing number and sophistication of malware attacks, malware detection systems based on machine learning (ML) grow in importance.

Feature Engineering, Feature Selection, +2

SeNMFk-SPLIT: Large Corpora Topic Modeling by Semantic Non-negative Matrix Factorization with Automatic Model Selection

no code implementations · 21 Aug 2022 · Maksim E. Eren, Nick Solovyev, Manish Bhattarai, Kim Rasmussen, Charles Nicholas, Boian S. Alexandrov

As the amount of text data continues to grow, topic modeling is serving an important role in understanding the content hidden by the overwhelming quantity of documents.

Model Selection

FedSPLIT: One-Shot Federated Recommendation System Based on Non-negative Joint Matrix Factorization and Knowledge Distillation

no code implementations · 4 May 2022 · Maksim E. Eren, Luke E. Richards, Manish Bhattarai, Roberto Yus, Charles Nicholas, Boian S. Alexandrov

Non-negative matrix factorization (NMF) with missing-value completion is a well-known effective Collaborative Filtering (CF) method used to provide personalized user recommendations.

Collaborative Filtering, Federated Learning, +2

Out of Distribution Data Detection Using Dropout Bayesian Neural Networks

no code implementations · 18 Feb 2022 · Andre T. Nguyen, Fred Lu, Gary Lopez Munoz, Edward Raff, Charles Nicholas, James Holt

We explore the utility of information contained within a dropout-based Bayesian neural network (BNN) for the task of detecting out-of-distribution (OOD) data.

Classification, Image Classification, +1

Rank-1 Similarity Matrix Decomposition For Modeling Changes in Antivirus Consensus Through Time

no code implementations · 28 Dec 2021 · Robert J. Joyce, Edward Raff, Charles Nicholas

Although groups of strongly correlated antivirus engines are known to exist, at present there is limited understanding of how or why these correlations came to be.

MOTIF: A Large Malware Reference Dataset with Ground Truth Family Labels

1 code implementation · 29 Nov 2021 · Robert J. Joyce, Dev Amlani, Charles Nicholas, Edward Raff

Malware family classification is a significant issue with public safety and research implications that has been hindered by the high cost of expert labels.

A Framework for Cluster and Classifier Evaluation in the Absence of Reference Labels

no code implementations · 23 Sep 2021 · Robert J. Joyce, Edward Raff, Charles Nicholas

In some problem spaces, the high cost of obtaining ground truth labels necessitates the use of lower-quality reference datasets.

COVID-19 Multidimensional Kaggle Literature Organization

no code implementations · 17 Jul 2021 · Maksim E. Eren, Nick Solovyev, Chris Hamer, Renee McDonald, Boian S. Alexandrov, Charles Nicholas

The unprecedented outbreak of Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), or COVID-19, continues to be a significant worldwide problem.

Tensor Decomposition

Evading Malware Classifiers via Monte Carlo Mutant Feature Discovery

2 code implementations · 15 Jun 2021 · John Boutsikas, Maksim E. Eren, Charles Varga, Edward Raff, Cynthia Matuszek, Charles Nicholas

The use of machine learning has become a significant part of malware detection efforts due to the influx of new malware, an ever-changing threat landscape, and the ability of machine learning methods to discover meaningful distinctions between malicious and benign software.

BIG-bench Machine Learning, Malware Analysis, +1

COVID-19 Kaggle Literature Organization

1 code implementation · 4 Aug 2020 · Maksim Ekin Eren, Nick Solovyev, Edward Raff, Charles Nicholas, Ben Johnson

The world has faced the devastating outbreak of Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), or COVID-19, in 2020.

A New Burrows Wheeler Transform Markov Distance

4 code implementations · 30 Dec 2019 · Edward Raff, Charles Nicholas, Mark McLean

Prior work inspired by compression algorithms has described how the Burrows Wheeler Transform can be used to create a distance measure for bioinformatics problems.

Clustering, Malware Classification

Static Malware Detection & Subterfuge: Quantifying the Robustness of Machine Learning and Current Anti-Virus

no code implementations · 12 Jun 2018 · William Fleshman, Edward Raff, Richard Zak, Mark McLean, Charles Nicholas

As machine learning (ML)-based systems for malware detection become more prevalent, it becomes necessary to quantify their benefits compared to the more traditional anti-virus (AV) systems widely used today.

BIG-bench Machine Learning, Malware Detection

Engineering a Simplified 0-Bit Consistent Weighted Sampling

no code implementations · 30 Mar 2018 · Edward Raff, Jared Sylvester, Charles Nicholas

The Min-Hashing approach to sketching has become an important tool in data analysis, information retrieval, and classification.

General Classification

Toward Metric Indexes for Incremental Insertion and Querying

no code implementations · 12 Jan 2018 · Edward Raff, Charles Nicholas

In this work we explore the use of metric index structures, which accelerate nearest neighbor queries, in the scenario where we need to interleave insertions and queries during deployment.

Malware Analysis

Malware Detection by Eating a Whole EXE

7 code implementations · 25 Oct 2017 · Edward Raff, Jon Barker, Jared Sylvester, Robert Brandon, Bryan Catanzaro, Charles Nicholas

In this work we introduce malware detection from raw byte sequences as a fruitful research area to the larger machine learning community.

Malware Detection

Learning the PE Header, Malware Detection with Minimal Domain Knowledge

2 code implementations · 5 Sep 2017 · Edward Raff, Jared Sylvester, Charles Nicholas

Many efforts have been made to use various forms of domain knowledge in malware detection.

Malware Detection
