no code implementations • 27 Nov 2024 • Siddhant Gupta, Fred Lu, Andrew Barlow, Edward Raff, Francis Ferraro, Cynthia Matuszek, Charles Nicholas, James Holt
A strategy used by malicious actors is to "live off the land," where benign systems and tools already available on a victim's systems are used and repurposed for the malicious actor's intent.
1 code implementation • 31 Oct 2024 • Skyler Wu, Fred Lu, Edward Raff, James Holt
While such algorithms enjoy low theoretical regret, in real-world deployment they can be sensitive to individual outliers that cause the algorithm to over-correct.
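The over-correction effect can be seen in a toy setting. The sketch below runs plain online gradient descent on one-dimensional squared loss, where a single corrupted label drags the weight far from its clean value. A clipped-gradient variant limits how far any one example can move the model.

This is a generic illustration under assumed settings: the data stream, learning rate, and clipping threshold are all hypothetical. The clipping is a simple stand-in, not the method proposed in the paper.

```python
def ogd(stream, lr=0.1, clip=None):
    """Online gradient descent on 0.5 * (w*x - y)^2, one example at a time."""
    w = 0.0
    for x, y in stream:
        g = (w * x - y) * x  # gradient of the squared loss for this example
        if clip is not None:
            # bound the size of any single update (illustrative stand-in)
            g = max(-clip, min(clip, g))
        w -= lr * g
    return w


# Hypothetical stream: the true relation is y = 2x, followed by one corrupted label.
clean = [(1.0, 2.0)] * 50
stream = clean + [(1.0, 100.0)]

w_plain = ogd(stream)            # jumps far above 2 after the outlier
w_clipped = ogd(stream, clip=1.0)  # moves by at most lr * clip on the outlier
print(round(w_plain, 2), round(w_clipped, 2))
```

Bounding the per-step gradient trades some speed of adaptation for robustness. That tension is what makes outlier handling in online learning non-trivial.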
1 code implementation • 30 Oct 2024 • Mohammad Mahmudul Alam, Alexander Oberle, Edward Raff, Stella Biderman, Tim Oates, James Holt
Vector Symbolic Architectures (VSAs) are one approach to developing Neuro-symbolic AI, where two vectors in $\mathbb{R}^d$ are 'bound' together to produce a new vector in the same space.
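Circular convolution is one common binding operator. It is the one used by Holographic Reduced Representations (HRRs). The sketch below shows two vectors in $\mathbb{R}^d$ being bound into a third vector of the same dimension, then approximately unbound with the HRR involution (index reversal).

It is a minimal illustration, not the paper's method. The dimension, random seed, and helper names are assumptions.

```python
import math
import random


def bind(a, b):
    """Circular convolution: c[k] = sum_i a[i] * b[(k - i) mod d]."""
    d = len(a)
    return [sum(a[i] * b[(k - i) % d] for i in range(d)) for k in range(d)]


def involution(a):
    """HRR approximate inverse: a*[i] = a[(-i) mod d]."""
    d = len(a)
    return [a[(-i) % d] for i in range(d)]


def cosine(x, y):
    dot = sum(p * q for p, q in zip(x, y))
    return dot / (math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y)))


random.seed(0)
d = 512
# HRR elements are typically drawn i.i.d. from N(0, 1/d)
a = [random.gauss(0, 1 / math.sqrt(d)) for _ in range(d)]
b = [random.gauss(0, 1 / math.sqrt(d)) for _ in range(d)]

c = bind(a, b)                  # bound vector, still in R^d
b_hat = bind(involution(a), c)  # noisy reconstruction of b
print(len(c), round(cosine(b_hat, b), 2))
```

The recovered vector is only similar to `b`, not equal to it, because the involution is an approximate inverse. This noise is part of why HRRs can be numerically delicate.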
1 code implementation • 30 Oct 2024 • Rebecca Saul, Chang Liu, Noah Fleischmann, Richard Zak, Kristopher Micinski, Edward Raff, James Holt
Binary analysis is a core component of many critical security tasks, including reverse engineering, malware analysis, and vulnerability detection.
1 code implementation • 8 Jul 2024 • Fred Lu, Ryan R. Curtin, Edward Raff, Francis Ferraro, James Holt
As the size of datasets used in statistical learning continues to grow, distributed training of models has attracted increasing attention.
no code implementations • 3 Jun 2024 • Fred Lu, Ryan R. Curtin, Edward Raff, Francis Ferraro, James Holt
While distributed training is often viewed as a solution to optimizing linear models on increasingly large datasets, inter-machine communication costs of popular distributed approaches can dominate as data dimensionality increases.
2 code implementations • 7 May 2024 • Chang Liu, Rebecca Saul, Yihao Sun, Edward Raff, Maya Fuchs, Townsend Southard Pantano, James Holt, Kristopher Micinski
Our results illustrate the practical need for robust corpora of high-quality Windows PE binaries in training modern learning-based binary analyses.
no code implementations • 23 Mar 2024 • Mohammad Mahmudul Alam, Edward Raff, Stella Biderman, Tim Oates, James Holt
Malware detection is an interesting and valuable domain to work in because it has significant real-world impact and unique machine-learning challenges.
no code implementations • 25 Dec 2023 • Tirth Patel, Fred Lu, Edward Raff, Charles Nicholas, Cynthia Matuszek, James Holt
Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines, meaning a 0.1% change can cause an overwhelming number of false positives.
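For scale, take 100 million machines as an illustrative lower bound for "hundreds of millions". This figure is an assumption, not one reported in the paper. A 0.1% shift in the false-positive rate then affects

$$0.001 \times 10^{8} = 10^{5}\ \text{machines},$$

each of which could raise a spurious alert.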
no code implementations • 25 Jul 2023 • Skyler Wu, Fred Lu, Edward Raff, James Holt
Convolutional layers have long served as the primary workhorse for image classification.
1 code implementation • 31 May 2023 • Mohammad Mahmudul Alam, Edward Raff, Stella Biderman, Tim Oates, James Holt
In recent years, self-attention has become the dominant paradigm for sequence modeling in a variety of domains.
no code implementations • 15 Jan 2023 • Fred Lu, Edward Raff, James Holt
Subsampling algorithms are a natural approach to reduce data size before fitting models on massive datasets.
no code implementations • 5 Dec 2022 • Ethan M. Rudd, David Krisiloff, Scott Coull, Daniel Olszewski, Edward Raff, James Holt
In this paper, we explore the use of metric learning to embed Windows PE files in a low-dimensional vector space for downstream use in a variety of applications, including malware detection, family classification, and malware attribute tagging.
no code implementations • 23 Nov 2022 • Rebecca Saul, Mohammad Mahmudul Alam, John Hurwitz, Edward Raff, Tim Oates, James Holt
Recurrent neural nets have been successful in processing sequences for a number of tasks; however, they are known to be both ineffective and computationally expensive when applied to very long sequences.
1 code implementation • 13 Jun 2022 • Mohammad Mahmudul Alam, Edward Raff, Tim Oates, James Holt
Due to the computational cost of running inference for a neural network, the need to deploy the inferential steps on a third party's compute environment or hardware is common.
no code implementations • 7 Jun 2022 • Michael D. Wong, Edward Raff, James Holt, Ravi Netravali
Data augmentation has been rare in the cyber security domain due to technical difficulties in altering data in a manner that is semantically consistent with the original data.
no code implementations • 28 Feb 2022 • James Holt, Edward Raff, Ahmad Ridley, Dennis Ross, Arunesh Sinha, Diane Staheli, William Streilen, Milind Tambe, Yevgeniy Vorobeychik, Allan Wollaber
These challenges are widely studied in enterprise networks, but there are many gaps in research and practice as well as novel problems in other domains.
no code implementations • 18 Feb 2022 • Andre T. Nguyen, Fred Lu, Gary Lopez Munoz, Edward Raff, Charles Nicholas, James Holt
We explore the utility of information contained within a dropout-based Bayesian neural network (BNN) for the task of detecting out-of-distribution (OOD) data.
1 code implementation • NeurIPS 2021 • Ashwinkumar Ganesan, Hang Gao, Sunil Gandhi, Edward Raff, Tim Oates, James Holt, Mark McLean
Holographic Reduced Representations (HRRs) are currently not effective in differentiable solutions due to numerical instability, a problem we solve by introducing a projection step that forces the vectors into a well-behaved region of the space.
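A minimal sketch of one such projection is shown below. It assumes the projection rescales every Fourier coefficient of a vector to unit magnitude, then transforms back to the real domain. After this step, the HRR involution becomes an exact inverse, so unbinding recovers the original vector up to floating-point error.

The naive DFT, dimension, seed, and helper names are illustrative assumptions. This is not presented as the paper's implementation.

```python
import cmath
import math
import random


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def project(x):
    """Rescale each Fourier coefficient to unit magnitude, then return to the real domain."""
    return [v.real for v in idft([c / abs(c) for c in dft(x)])]


def bind(a, b):
    """Circular convolution, computed as an elementwise product of DFTs."""
    return [v.real for v in idft([p * q for p, q in zip(dft(a), dft(b))])]


def involution(a):
    """HRR approximate inverse: a*[i] = a[(-i) mod d]."""
    d = len(a)
    return [a[(-i) % d] for i in range(d)]


random.seed(1)
d = 64
a = [random.gauss(0, 1 / math.sqrt(d)) for _ in range(d)]
b = [random.gauss(0, 1 / math.sqrt(d)) for _ in range(d)]
pa, pb = project(a), project(b)

# Unbinding error without and with the projection step
raw = bind(involution(a), bind(a, b))
proj = bind(involution(pa), bind(pa, pb))
err_raw = max(abs(p - q) for p, q in zip(raw, b))
err_proj = max(abs(p - q) for p, q in zip(proj, pb))
print(f"{err_raw:.3f} {err_proj:.1e}")
```

For a real vector, the involution conjugates every Fourier coefficient. Once each coefficient has magnitude 1, that conjugate is exactly its reciprocal, which is why the projected error collapses to roughly machine precision.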
no code implementations • 9 Aug 2021 • Andre T. Nguyen, Edward Raff, Charles Nicholas, James Holt
The detection of malware is a critical task for the protection of computing environments.
no code implementations • 22 Oct 2020 • Edward Raff, Bobby Filar, James Holt
We propose a strategy for fixing false positives in production after a model has already been deployed.
1 code implementation • 6 Sep 2020 • Edward Raff, Richard Zak, Gary Lopez Munoz, William Fleming, Hyrum S. Anderson, Bobby Filar, Charles Nicholas, James Holt
Yara rules are a ubiquitous tool among cybersecurity practitioners and analysts.
no code implementations • 7 May 2019 • Aditya Pingle, Aritran Piplai, Sudip Mittal, Anupam Joshi, James Holt, Richard Zak
A cybersecurity knowledge graph can be invaluable in helping a security analyst detect cyber threats because it stores a vast range of cyber threat information in the form of semantic triples, which can be queried.
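Production systems usually keep semantic triples in an RDF store and query them with a language such as SPARQL. The in-memory sketch below only illustrates the subject-predicate-object idea and pattern-style queries, where `None` acts as a wildcard.

All entity names, predicates, and the `query` helper are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical threat-intelligence facts, for illustration only.
KG = [
    Triple("MalwareX", "uses", "PowerShell"),
    Triple("MalwareX", "targets", "Windows"),
    Triple("MalwareY", "uses", "PowerShell"),
    Triple("MalwareY", "exploits", "VulnerabilityZ"),
]


def query(kg, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None matches any value."""
    return [
        t for t in kg
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Which malware families use PowerShell?
users = sorted(t.subject for t in query(KG, predicate="uses", obj="PowerShell"))
print(users)  # ['MalwareX', 'MalwareY']
```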