Search Results for author: James Holt

Found 23 papers, 9 papers with code

Living off the Analyst: Harvesting Features from Yara Rules for Malware Detection

no code implementations 27 Nov 2024 Siddhant Gupta, Fred Lu, Andrew Barlow, Edward Raff, Francis Ferraro, Cynthia Matuszek, Charles Nicholas, James Holt

A strategy used by malicious actors is to "live off the land," where benign systems and tools already available on a victim's systems are used and repurposed for the malicious actor's intent.

Malware Detection

Stabilizing Linear Passive-Aggressive Online Learning with Weighted Reservoir Sampling

1 code implementation 31 Oct 2024 Skyler Wu, Fred Lu, Edward Raff, James Holt

While such algorithms enjoy low theoretical regret, in real-world deployment they can be sensitive to individual outliers that cause the algorithm to over-correct.

A Walsh Hadamard Derived Linear Vector Symbolic Architecture

1 code implementation 30 Oct 2024 Mohammad Mahmudul Alam, Alexander Oberle, Edward Raff, Stella Biderman, Tim Oates, James Holt

Vector Symbolic Architectures (VSAs) are one approach to developing Neuro-symbolic AI, where two vectors in $\mathbb{R}^d$ are "bound" together to produce a new vector in the same space.

Computational Efficiency

Is Function Similarity Over-Engineered? Building a Benchmark

1 code implementation 30 Oct 2024 Rebecca Saul, Chang Liu, Noah Fleischmann, Richard Zak, Kristopher Micinski, Edward Raff, James Holt

Binary analysis is a core component of many critical security tasks, including reverse engineering, malware analysis, and vulnerability detection.

Malware Analysis, Vulnerability Detection

High-Dimensional Distributed Sparse Classification with Scalable Communication-Efficient Global Updates

1 code implementation 8 Jul 2024 Fred Lu, Ryan R. Curtin, Edward Raff, Francis Ferraro, James Holt

As the size of datasets used in statistical learning continues to grow, distributed training of models has attracted increasing attention.

Optimizing the Optimal Weighted Average: Efficient Distributed Sparse Classification

no code implementations 3 Jun 2024 Fred Lu, Ryan R. Curtin, Edward Raff, Francis Ferraro, James Holt

While distributed training is often viewed as a solution to optimizing linear models on increasingly large datasets, inter-machine communication costs of popular distributed approaches can dominate as data dimensionality increases.

Assemblage: Automatic Binary Dataset Construction for Machine Learning

2 code implementations 7 May 2024 Chang Liu, Rebecca Saul, Yihao Sun, Edward Raff, Maya Fuchs, Townsend Southard Pantano, James Holt, Kristopher Micinski

Our results illustrate the practical need for robust corpora of high-quality Windows PE binaries in training modern learning-based binary analyses.

Malware Classification

Holographic Global Convolutional Networks for Long-Range Prediction Tasks in Malware Detection

no code implementations 23 Mar 2024 Mohammad Mahmudul Alam, Edward Raff, Stella Biderman, Tim Oates, James Holt

Malware detection is an interesting and valuable domain to work in because it has significant real-world impact and unique machine-learning challenges.

Malware Detection

Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits!

no code implementations 25 Dec 2023 Tirth Patel, Fred Lu, Edward Raff, Charles Nicholas, Cynthia Matuszek, James Holt

Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines, meaning a 0.1\% change can cause an overwhelming number of false positives.

Malware Detection

Exploring the Sharpened Cosine Similarity

no code implementations 25 Jul 2023 Skyler Wu, Fred Lu, Edward Raff, James Holt

Convolutional layers have long served as the primary workhorse for image classification.

Adversarial Robustness, Image Classification

Recasting Self-Attention with Holographic Reduced Representations

1 code implementation 31 May 2023 Mohammad Mahmudul Alam, Edward Raff, Stella Biderman, Tim Oates, James Holt

In recent years, self-attention has become the dominant paradigm for sequence modeling in a variety of domains.

Malware Detection

A Coreset Learning Reality Check

no code implementations 15 Jan 2023 Fred Lu, Edward Raff, James Holt

Subsampling algorithms are a natural approach to reduce data size before fitting models on massive datasets.

Regression

Efficient Malware Analysis Using Metric Embeddings

no code implementations 5 Dec 2022 Ethan M. Rudd, David Krisiloff, Scott Coull, Daniel Olszewski, Edward Raff, James Holt

In this paper, we explore the use of metric learning to embed Windows PE files in a low-dimensional vector space for downstream use in a variety of applications, including malware detection, family classification, and malware attribute tagging.

Attribute, Malware Analysis, +2

Lempel-Ziv Networks

no code implementations 23 Nov 2022 Rebecca Saul, Mohammad Mahmudul Alam, John Hurwitz, Edward Raff, Tim Oates, James Holt

Recurrent neural nets have been successful in processing sequences for a number of tasks; however, they are known to be both ineffective and computationally expensive when applied to very long sequences.

Malware Classification

Deploying Convolutional Networks on Untrusted Platforms Using 2D Holographic Reduced Representations

1 code implementation 13 Jun 2022 Mohammad Mahmudul Alam, Edward Raff, Tim Oates, James Holt

Due to the computational cost of running inference for a neural network, the need to deploy the inferential steps on a third party's compute environment or hardware is common.

Marvolo: Programmatic Data Augmentation for Practical ML-Driven Malware Detection

no code implementations 7 Jun 2022 Michael D. Wong, Edward Raff, James Holt, Ravi Netravali

Data augmentation has been rare in the cyber security domain due to technical difficulties in altering data in a manner that is semantically consistent with the original data.

Data Augmentation, Malware Detection

Proceedings of the Artificial Intelligence for Cyber Security (AICS) Workshop at AAAI 2022

no code implementations 28 Feb 2022 James Holt, Edward Raff, Ahmad Ridley, Dennis Ross, Arunesh Sinha, Diane Staheli, William Streilen, Milind Tambe, Yevgeniy Vorobeychik, Allan Wollaber

These challenges are widely studied in enterprise networks, but there are many gaps in research and practice as well as novel problems in other domains.

Out of Distribution Data Detection Using Dropout Bayesian Neural Networks

no code implementations 18 Feb 2022 Andre T. Nguyen, Fred Lu, Gary Lopez Munoz, Edward Raff, Charles Nicholas, James Holt

We explore the utility of information contained within a dropout based Bayesian neural network (BNN) for the task of detecting out of distribution (OOD) data.

Classification, Image Classification, +1

Learning with Holographic Reduced Representations

1 code implementation NeurIPS 2021 Ashwinkumar Ganesan, Hang Gao, Sunil Gandhi, Edward Raff, Tim Oates, James Holt, Mark McLean

HRRs today are not effective in a differentiable solution due to numerical instability, a problem we solve by introducing a projection step that forces the vectors to exist in a well behaved point in space.

Multi-Label Classification +1

Getting Passive Aggressive About False Positives: Patching Deployed Malware Detectors

no code implementations 22 Oct 2020 Edward Raff, Bobby Filar, James Holt

We propose a strategy for fixing false positives in production after a model has already been deployed.

Malware Detection

RelExt: Relation Extraction using Deep Learning approaches for Cybersecurity Knowledge Graph Improvement

no code implementations 7 May 2019 Aditya Pingle, Aritran Piplai, Sudip Mittal, Anupam Joshi, James Holt, Richard Zak

A cybersecurity knowledge graph can be paramount in aiding a security analyst to detect cyber threats because it stores a vast range of cyber threat information in the form of semantic triples which can be queried.

Relation, Relation Extraction
