Search Results for author: Edward Raff

Found 82 papers, 31 papers with code

Lempel-Ziv Jaccard Distance, an Effective Alternative to Ssdeep and Sdhash

5 code implementations Digital Investigation 2018 Edward Raff, Charles K. Nicholas

Recent work has proposed the Lempel-Ziv Jaccard Distance (LZJD) as a method to measure the similarity between binary byte sequences for malware classification.

Cryptography and Security

Learning the PE Header, Malware Detection with Minimal Domain Knowledge

2 code implementations 5 Sep 2017 Edward Raff, Jared Sylvester, Charles Nicholas

Many efforts have been made to use various forms of domain knowledge in malware detection.

Malware Detection

Malware Detection by Eating a Whole EXE

7 code implementations 25 Oct 2017 Edward Raff, Jon Barker, Jared Sylvester, Robert Brandon, Bryan Catanzaro, Charles Nicholas

In this work we introduce malware detection from raw byte sequences as a fruitful research area to the larger machine learning community.

Malware Detection

Fair Forests: Regularized Tree Induction to Minimize Model Bias

no code implementations 21 Dec 2017 Edward Raff, Jared Sylvester, Steven Mills

The potential lack of fairness in the outputs of machine learning algorithms has recently gained attention both within the research community and in society more broadly.

Attribute Fairness

Toward Metric Indexes for Incremental Insertion and Querying

no code implementations 12 Jan 2018 Edward Raff, Charles Nicholas

In this work we explore the use of metric index structures, which accelerate nearest neighbor queries, in the scenario where we need to interleave insertions and queries during deployment.

Malware Analysis

Engineering a Simplified 0-Bit Consistent Weighted Sampling

no code implementations 30 Mar 2018 Edward Raff, Jared Sylvester, Charles Nicholas

The Min-Hashing approach to sketching has become an important tool in data analysis, information retrieval, and classification.

General Classification

Static Malware Detection & Subterfuge: Quantifying the Robustness of Machine Learning and Current Anti-Virus

no code implementations 12 Jun 2018 William Fleshman, Edward Raff, Richard Zak, Mark McLean, Charles Nicholas

As machine-learning (ML) based systems for malware detection become more prevalent, it becomes necessary to quantify the benefits compared to the more traditional anti-virus (AV) systems widely used today.

BIG-bench Machine Learning Malware Detection

What About Applied Fairness?

no code implementations 13 Jun 2018 Jared Sylvester, Edward Raff

Machine learning practitioners are often ambivalent about the ethical aspects of their products.

Fairness Position

Non-Negative Networks Against Adversarial Attacks

1 code implementation 15 Jun 2018 William Fleshman, Edward Raff, Jared Sylvester, Steven Forsyth, Mark McLean

Adversarial attacks against neural networks are a problem of considerable importance, for which effective defenses are not yet readily available.

Binary Classification Classification +3

Gradient Reversal Against Discrimination

no code implementations 1 Jul 2018 Edward Raff, Jared Sylvester

No methods currently exist for making arbitrary neural networks fair.

Attribute Fairness

Growing and Retaining AI Talent for the United States Government

no code implementations 27 Sep 2018 Edward Raff

Artificial Intelligence and Machine Learning have become transformative to a number of industries, and as such many industries' need for AI talent is increasing the demand for individuals with these skills.

BIG-bench Machine Learning Position

Adversarial Attacks, Regression, and Numerical Stability Regularization

no code implementations 7 Dec 2018 Andre T. Nguyen, Edward Raff

Adversarial attacks against neural networks in a regression setting are a critical yet understudied problem.

regression

Barrage of Random Transforms for Adversarially Robust Defense

no code implementations CVPR 2019 Edward Raff, Jared Sylvester, Steven Forsyth, Mark McLean

Defenses against adversarial examples, when using the ImageNet dataset, are historically easy to defeat.

Connecting Lyapunov Control Theory to Adversarial Attacks

no code implementations 17 Jul 2019 Arash Rahnama, Andre T. Nguyen, Edward Raff

Significant work is being done to develop the math and tools necessary to build provable defenses, or at least bounds, against adversarial attacks of neural networks.

Math

Heterogeneous Relational Kernel Learning

no code implementations 24 Aug 2019 Andre T. Nguyen, Edward Raff

Recent work has developed Bayesian methods for the automatic statistical analysis and description of single time series as well as of homogeneous sets of time series data.

Anomaly Detection Clustering +2

Would a File by Any Other Name Seem as Malicious?

no code implementations 10 Oct 2019 Andre T. Nguyen, Edward Raff, Aaron Sant-Miller

Successful malware attacks on information technology systems can cause millions of dollars in damage, the exposure of sensitive and private information, and the irreversible destruction of data.

Malware Detection

Robust Design of Deep Neural Networks against Adversarial Attacks based on Lyapunov Theory

1 code implementation CVPR 2020 Arash Rahnama, Andre T. Nguyen, Edward Raff

We treat each individual layer of the DNN as a nonlinear dynamical system and use Lyapunov theory to prove stability and robustness locally.

Robust Design

A New Burrows Wheeler Transform Markov Distance

4 code implementations 30 Dec 2019 Edward Raff, Charles Nicholas, Mark McLean

Prior work inspired by compression algorithms has described how the Burrows Wheeler Transform can be used to create a distance measure for bioinformatics problems.

Clustering Malware Classification

Exploratory Analysis of Covid-19 Tweets using Topic Modeling, UMAP, and DiGraphs

2 code implementations 6 May 2020 Catherine Ordun, Sanjay Purushotham, Edward Raff

As the time to retweet increases, the density of connections also increases; in our sample, we found distinct users dominating the attention of Covid19 retweeters.

Clustering Descriptive

Presentation and Analysis of a Multimodal Dataset for Grounded Language Learning

no code implementations 29 Jul 2020 Patrick Jenkins, Rishabh Sachdeva, Gaoussou Youssouf Kebe, Padraig Higgins, Kasra Darvish, Edward Raff, Don Engel, John Winder, Francis Ferraro, Cynthia Matuszek

Grounded language acquisition -- learning how language-based interactions refer to the world around them -- is a major area of research in robotics, NLP, and HCI.

Grounded language learning

Bringing UMAP Closer to the Speed of Light with GPU Acceleration

1 code implementation 1 Aug 2020 Corey J. Nolet, Victor Lafargue, Edward Raff, Thejaswi Nanditale, Tim Oates, John Zedlewski, Joshua Patterson

The Uniform Manifold Approximation and Projection (UMAP) algorithm has become widely popular for its ease of use, quality of results, and support for exploratory, unsupervised, supervised, and semi-supervised learning.

COVID-19 Kaggle Literature Organization

1 code implementation 4 Aug 2020 Maksim Ekin Eren, Nick Solovyev, Edward Raff, Charles Nicholas, Ben Johnson

The world has faced the devastating outbreak of Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), or COVID-19, in 2020.

Practical Cross-modal Manifold Alignment for Grounded Language

no code implementations 1 Sep 2020 Andre T. Nguyen, Luke E. Richards, Gaoussou Youssouf Kebe, Edward Raff, Kasra Darvish, Frank Ferraro, Cynthia Matuszek

We propose a cross-modality manifold alignment procedure that leverages triplet loss to jointly learn consistent, multi-modal embeddings of language-based concepts of real-world items.

Grounded language learning

The Use of AI for Thermal Emotion Recognition: A Review of Problems and Limitations in Standard Design and Data

no code implementations 22 Sep 2020 Catherine Ordun, Edward Raff, Sanjay Purushotham

But we also propose that thermal imagery may provide a semi-anonymous modality for computer vision, over RGB, which has been plagued by misuse in facial recognition.

BIG-bench Machine Learning Facial Emotion Recognition

Getting Passive Aggressive About False Positives: Patching Deployed Malware Detectors

no code implementations 22 Oct 2020 Edward Raff, Bobby Filar, James Holt

We propose a strategy for fixing false positives in production after a model has already been deployed.

Malware Detection

Sampling Approach Matters: Active Learning for Robotic Language Acquisition

no code implementations 16 Nov 2020 Nisha Pillai, Edward Raff, Francis Ferraro, Cynthia Matuszek

Ordering the selection of training data using active learning can lead to improvements in learning efficiently from smaller corpora.

Active Learning feature selection +1

Classifying Sequences of Extreme Length with Constant Memory Applied to Malware Detection

1 code implementation 17 Dec 2020 Edward Raff, William Fleshman, Richard Zak, Hyrum S. Anderson, Bobby Filar, Mark McLean

Recent works within machine learning have been tackling inputs of ever-increasing size, with cybersecurity presenting sequence classification problems of particularly extreme lengths.

Malware Detection Time Series +1

Research Reproducibility as a Survival Analysis

1 code implementation 17 Dec 2020 Edward Raff

There has been increasing concern within the machine learning community that we are in a reproducibility crisis.

Survival Analysis

Accounting for Variance in Machine Learning Benchmarks

no code implementations 1 Mar 2021 Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Naz Sepah, Edward Raff, Kanika Madan, Vikram Voleti, Samira Ebrahimi Kahou, Vincent Michalski, Dmitriy Serdyuk, Tal Arbel, Chris Pal, Gaël Varoquaux, Pascal Vincent

Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter initialization, and hyperparameter choices.

Benchmarking BIG-bench Machine Learning +1

GPU Semiring Primitives for Sparse Neighborhood Methods

2 code implementations 13 Apr 2021 Corey J. Nolet, Divye Gala, Edward Raff, Joe Eaton, Brad Rees, John Zedlewski, Tim Oates

High-performance primitives for mathematical operations on sparse vectors must deal with the challenges of skewed degree distributions and limits on memory consumption that are typically not issues in dense operations.

BIG-bench Machine Learning Information Retrieval +1

Exact Acceleration of K-Means++ and K-Means$\|$

no code implementations 6 May 2021 Edward Raff

K-Means++ and its distributed variant K-Means$\|$ have become de facto tools for selecting the initial seeds of K-means.

Generating Thermal Human Faces for Physiological Assessment Using Thermal Sensor Auxiliary Labels

1 code implementation 15 Jun 2021 Catherine Ordun, Edward Raff, Sanjay Purushotham

These combined data are captured from similar sensors in order to bootstrap the training and transfer learning task, especially valuable because visible-thermal face datasets are limited.

SSIM Transfer Learning +1

Evading Malware Classifiers via Monte Carlo Mutant Feature Discovery

2 code implementations 15 Jun 2021 John Boutsikas, Maksim E. Eren, Charles Varga, Edward Raff, Cynthia Matuszek, Charles Nicholas

The use of Machine Learning has become a significant part of malware detection efforts due to the influx of new malware, an ever changing threat landscape, and the ability of Machine Learning methods to discover meaningful distinctions between malicious and benign software.

BIG-bench Machine Learning Malware Analysis +1

Learning with Holographic Reduced Representations

1 code implementation NeurIPS 2021 Ashwinkumar Ganesan, Hang Gao, Sunil Gandhi, Edward Raff, Tim Oates, James Holt, Mark McLean

HRRs today are not effective in a differentiable solution due to numerical instability, a problem we solve by introducing a projection step that forces the vectors to exist in a well behaved point in space.

Multi-Label Classification Retrieval

Adversarial Transfer Attacks With Unknown Data and Class Overlap

no code implementations 23 Sep 2021 Luke E. Richards, André Nguyen, Ryan Capps, Steven Forsythe, Cynthia Matuszek, Edward Raff

In this work we note that as studied, current transfer attack research has an unrealistic advantage for the attacker: the attacker has the exact same training data as the victim.

A Framework for Cluster and Classifier Evaluation in the Absence of Reference Labels

no code implementations 23 Sep 2021 Robert J. Joyce, Edward Raff, Charles Nicholas

In some problem spaces, the high cost of obtaining ground truth labels necessitates use of lower quality reference datasets.

MOTIF: A Large Malware Reference Dataset with Ground Truth Family Labels

1 code implementation 29 Nov 2021 Robert J. Joyce, Dev Amlani, Charles Nicholas, Edward Raff

Malware family classification is a significant issue with public safety and research implications that has been hindered by the high cost of expert labels.

Rank-1 Similarity Matrix Decomposition For Modeling Changes in Antivirus Consensus Through Time

no code implementations 28 Dec 2021 Robert J. Joyce, Edward Raff, Charles Nicholas

Although groups of strongly correlated antivirus engines are known to exist, at present there is limited understanding of how or why these correlations came to be.

Fooling MOSS Detection with Pretrained Language Models

no code implementations 19 Jan 2022 Stella Biderman, Edward Raff

In this paper we explore whether transformers can be used to solve introductory level programming assignments while bypassing commonly used AI tools to detect similarities between pieces of software.

Continuously Generalized Ordinal Regression for Linear and Deep Models

no code implementations 14 Feb 2022 Fred Lu, Francis Ferraro, Edward Raff

Our method, which we term continuously generalized ordinal logistic, significantly outperforms the standard ordinal logistic model over a thorough set of ordinal regression benchmark datasets.

Inductive Bias regression

Out of Distribution Data Detection Using Dropout Bayesian Neural Networks

no code implementations 18 Feb 2022 Andre T. Nguyen, Fred Lu, Gary Lopez Munoz, Edward Raff, Charles Nicholas, James Holt

We explore the utility of information contained within a dropout based Bayesian neural network (BNN) for the task of detecting out of distribution (OOD) data.

Classification Image Classification +1

Proceedings of the Artificial Intelligence for Cyber Security (AICS) Workshop at AAAI 2022

no code implementations 28 Feb 2022 James Holt, Edward Raff, Ahmad Ridley, Dennis Ross, Arunesh Sinha, Diane Staheli, William Streilen, Milind Tambe, Yevgeniy Vorobeychik, Allan Wollaber

These challenges are widely studied in enterprise networks, but there are many gaps in research and practice as well as novel problems in other domains.

Does the Market of Citations Reward Reproducible Work?

1 code implementation 8 Apr 2022 Edward Raff

Yet to the best of our knowledge, only one work has attempted to look at this combined space, concluding that non-reproducible work is more highly cited.

A Siren Song of Open Source Reproducibility

no code implementations 9 Apr 2022 Edward Raff, Andrew L. Farris

Our argument is that this focus on code for replication is misguided if we want to improve the state of reproducible research.

VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance

1 code implementation 18 Apr 2022 Katherine Crowson, Stella Biderman, Daniel Kornis, Dashiell Stander, Eric Hallahan, Louis Castricato, Edward Raff

Generating and editing images from open domain text prompts is a challenging task that heretofore has required expensive and specially trained models.

Image Generation

Marvolo: Programmatic Data Augmentation for Practical ML-Driven Malware Detection

no code implementations 7 Jun 2022 Michael D. Wong, Edward Raff, James Holt, Ravi Netravali

Data augmentation has been rare in the cyber security domain due to technical difficulties in altering data in a manner that is semantically consistent with the original data.

Data Augmentation Malware Detection

Neural Bregman Divergences for Distance Learning

no code implementations 9 Jun 2022 Fred Lu, Edward Raff, Francis Ferraro

Many metric learning tasks, such as triplet learning, nearest neighbor retrieval, and visualization, are treated primarily as embedding tasks where the ultimate metric is some variant of the Euclidean distance (e.g., cosine or Mahalanobis), and the algorithm must learn to embed points into the pre-chosen space.

Metric Learning Retrieval

Deploying Convolutional Networks on Untrusted Platforms Using 2D Holographic Reduced Representations

1 code implementation 13 Jun 2022 Mohammad Mahmudul Alam, Edward Raff, Tim Oates, James Holt

Due to the computational cost of running inference for a neural network, the need to deploy the inferential steps on a third party's compute environment or hardware is common.

Improving Out-of-Distribution Detection via Epistemic Uncertainty Adversarial Training

no code implementations 5 Sep 2022 Derek Everett, Andre T. Nguyen, Luke E. Richards, Edward Raff

The quantification of uncertainty is important for the adoption of machine learning, especially to reject out-of-distribution (OOD) data back to human experts for review.

Computational Efficiency Out-of-Distribution Detection +1

A General Framework for Auditing Differentially Private Machine Learning

no code implementations 16 Oct 2022 Fred Lu, Joseph Munoz, Maya Fuchs, Tyler LeBlond, Elliott Zaresky-Williams, Edward Raff, Francis Ferraro, Brian Testa

We present a framework to statistically audit the privacy guarantee conferred by a differentially private machine learner in practice.

Lempel-Ziv Networks

no code implementations 23 Nov 2022 Rebecca Saul, Mohammad Mahmudul Alam, John Hurwitz, Edward Raff, Tim Oates, James Holt

Recurrent neural nets have been successful in processing sequences for a number of tasks; however, they are known to be both ineffective and computationally expensive when applied to very long sequences.

Malware Classification

Efficient Malware Analysis Using Metric Embeddings

no code implementations 5 Dec 2022 Ethan M. Rudd, David Krisiloff, Scott Coull, Daniel Olszewski, Edward Raff, James Holt

In this paper, we explore the use of metric learning to embed Windows PE files in a low-dimensional vector space for downstream use in a variety of applications, including malware detection, family classification, and malware attribute tagging.

Attribute Malware Analysis +2

A Coreset Learning Reality Check

no code implementations 15 Jan 2023 Fred Lu, Edward Raff, James Holt

Subsampling algorithms are a natural approach to reduce data size before fitting models on massive datasets.

regression

Measuring Equality in Machine Learning Security Defenses: A Case Study in Speech Recognition

no code implementations 17 Feb 2023 Luke E. Richards, Edward Raff, Cynthia Matuszek

Over the past decade, the machine learning security community has developed a myriad of defenses for evasion attacks.

Adversarial Robustness Fairness +2

When Visible-to-Thermal Facial GAN Beats Conditional Diffusion

no code implementations 18 Feb 2023 Catherine Ordun, Edward Raff, Sanjay Purushotham

Thermal facial imagery offers valuable insight into physiological states such as inflammation and stress by detecting emitted radiation in the infrared spectrum, which is unseen in the visible spectra.

Denoising

The Challenge of Differentially Private Screening Rules

no code implementations 18 Mar 2023 Amol Khanna, Fred Lu, Edward Raff

Linear $L_1$-regularized models have remained one of the simplest and most effective tools in data analysis, especially in information retrieval problems where n-grams over text with TF-IDF or Okapi feature values are a strong and easy baseline.

Information Retrieval Privacy Preserving +2

Emergent and Predictable Memorization in Large Language Models

2 code implementations NeurIPS 2023 Stella Biderman, USVSN Sai Prashanth, Lintang Sutawika, Hailey Schoelkopf, Quentin Anthony, Shivanshu Purohit, Edward Raff

Memorization, or the tendency of large language models (LLMs) to output entire sequences from their training data verbatim, is a key concern for safely deploying language models.

Memorization

Sparse Private LASSO Logistic Regression

no code implementations 24 Apr 2023 Amol Khanna, Fred Lu, Edward Raff, Brian Testa

LASSO regularized logistic regression is particularly useful for its built-in feature selection, allowing coefficients to be removed from deployment and producing sparse solutions.

feature selection Model Selection +1

Recasting Self-Attention with Holographic Reduced Representations

1 code implementation 31 May 2023 Mohammad Mahmudul Alam, Edward Raff, Stella Biderman, Tim Oates, James Holt

In recent years, self-attention has become the dominant paradigm for sequence modeling in a variety of domains.

Malware Detection

You Don't Need Robust Machine Learning to Manage Adversarial Attack Risks

no code implementations 16 Jun 2023 Edward Raff, Michel Benaroch, Andrew L. Farris

In this survey we review the current literature on attacks and their real-world occurrences, or limited evidence thereof, to critically evaluate the real-world risks of adversarial machine learning (AML) for the average entity.

Adversarial Attack

cuSLINK: Single-linkage Agglomerative Clustering on the GPU

1 code implementation 28 Jun 2023 Corey J. Nolet, Divye Gala, Alex Fender, Mahesh Doijade, Joe Eaton, Edward Raff, John Zedlewski, Brad Rees, Tim Oates

In this paper, we propose cuSLINK, a novel and state-of-the-art reformulation of the SLINK algorithm on the GPU which requires only $O(Nk)$ space and uses a parameter $k$ to trade off space and time.

Clustering graph construction

Exploring the Sharpened Cosine Similarity

no code implementations 25 Jul 2023 Skyler Wu, Fred Lu, Edward Raff, James Holt

Convolutional layers have long served as the primary workhorse for image classification.

Adversarial Robustness Image Classification

A Generative Approach for Image Registration of Visible-Thermal (VT) Cancer Faces

no code implementations 23 Aug 2023 Catherine Ordun, Alexandra Cha, Edward Raff, Sanjay Purushotham, Karen Kwok, Mason Rule, James Gulley

Since thermal imagery offers a unique modality to investigate pain, the U.S. National Institutes of Health (NIH) has collected a large and diverse set of cancer patient facial thermograms for AI-based pain research.

Image Registration

Towards Generalization in Subitizing with Neuro-Symbolic Loss using Holographic Reduced Representations

1 code implementation 23 Dec 2023 Mohammad Mahmudul Alam, Edward Raff, Tim Oates

While deep learning has enjoyed significant success in computer vision tasks over the past decade, many shortcomings still exist from a Cognitive Science (CogSci) perspective.

Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits!

no code implementations 25 Dec 2023 Tirth Patel, Fred Lu, Edward Raff, Charles Nicholas, Cynthia Matuszek, James Holt

Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines, meaning a 0.1% change can cause an overwhelming number of false positives.

Malware Detection

Comprehensive OOD Detection Improvements

no code implementations 18 Jan 2024 Anish Lakkapragada, Amol Khanna, Edward Raff, Nathan Inkawhich

As machine learning becomes increasingly prevalent in impactful decisions, recognizing when inference data is outside the model's expected input distribution is paramount for giving context to predictions.

Dimensionality Reduction Out of Distribution (OOD) Detection

Holographic Global Convolutional Networks for Long-Range Prediction Tasks in Malware Detection

no code implementations 23 Mar 2024 Mohammad Mahmudul Alam, Edward Raff, Stella Biderman, Tim Oates, James Holt

Malware detection is an interesting and valuable domain to work in because it has significant real-world impact and unique machine-learning challenges.

Malware Detection
