Model extraction

31 papers with code • 1 benchmark • 2 datasets

Model extraction attacks, also known as model stealing attacks, aim to extract the parameters of a target model. Ideally, the adversary is able to replicate a model whose performance closely matches that of the target model.
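As an illustration of the basic attack loop, the following hedged sketch (a hypothetical setup, not any specific paper's method) shows an adversary who can only call a prediction API, yet trains a surrogate that approximates the victim's behaviour:

```python
# Illustrative sketch of a black-box model extraction attack (hypothetical
# victim and data): the adversary only calls victim_predict(), never sees
# parameters or training data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Victim: a model the adversary cannot inspect, only query.
X_train = rng.normal(size=(500, 5))
y_train = (X_train.sum(axis=1) > 0).astype(int)
victim = DecisionTreeClassifier(max_depth=4).fit(X_train, y_train)

def victim_predict(x):
    """Black-box prediction API: returns labels only."""
    return victim.predict(x)

# Adversary: query with synthetic inputs, label them via the API,
# then fit a surrogate on the (query, response) pairs.
queries = rng.normal(size=(2000, 5))
labels = victim_predict(queries)
surrogate = LogisticRegression().fit(queries, labels)

# Agreement between surrogate and victim on fresh inputs measures
# how faithfully the model was "stolen".
X_test = rng.normal(size=(1000, 5))
agreement = (surrogate.predict(X_test) == victim_predict(X_test)).mean()
```

The papers below refine each piece of this loop: which queries to send, what the API leaks, and how defenders can watermark or fingerprint the responses.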

Most implemented papers

Entangled Watermarks as a Defense against Model Extraction

cleverhans-lab/entangled-watermark 27 Feb 2020

Such pairs are watermarks, which are not sampled from the task distribution and are only known to the defender.

Data-Free Model Extraction

cake-lab/datafree-model-extraction CVPR 2021

Current model extraction attacks assume that the adversary has access to a surrogate dataset with characteristics similar to the proprietary data used to train the victim model.

FedRolex: Model-Heterogeneous Federated Learning with Rolling Sub-Model Extraction

aiot-mlsys-lab/fedrolex 3 Dec 2022

Most cross-device federated learning (FL) studies focus on the model-homogeneous setting where the global server model and local client models are identical.

Protecting Language Generation Models via Invisible Watermarking

xuandongzhao/ginsew 6 Feb 2023

We can then detect the secret message by probing a suspect model to tell if it is distilled from the protected one.

Stealing Machine Learning Models via Prediction APIs

ftramer/Steal-ML 9 Sep 2016

In such attacks, an adversary with black-box access, but no prior knowledge of an ML model's parameters or training data, aims to duplicate the functionality of (i.e., "steal") the model.
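One concrete instance of this idea is equation solving against APIs that return confidence scores: for a logistic-regression victim, log(p/(1-p)) = w·x + b is linear in x, so d+1 independent queries recover the parameters exactly. The sketch below uses a hypothetical victim to illustrate the principle; it is not the paper's implementation:

```python
# Equation-solving sketch (hypothetical victim): a prediction API that
# returns P(y=1|x) for logistic regression leaks (w, b), because the
# logit log(p/(1-p)) = w.x + b is linear in the query x.
import numpy as np

rng = np.random.default_rng(1)
d = 4
w_true = rng.normal(size=d)   # secret weights of the victim
b_true = 0.5                  # secret bias

def predict_proba(x):
    """Black-box API: returns P(y=1|x) for each query row."""
    return 1.0 / (1.0 + np.exp(-(x @ w_true + b_true)))

# d+1 linearly independent queries suffice to solve for (w, b) exactly.
X = rng.normal(size=(d + 1, d))
p = predict_proba(X)
logits = np.log(p / (1.0 - p))            # invert the sigmoid
A = np.hstack([X, np.ones((d + 1, 1))])   # unknowns stacked as [w, b]
w_b = np.linalg.solve(A, logits)
w_hat, b_hat = w_b[:d], w_b[d]
```

Returning labels instead of probabilities blunts this particular attack, which is why many defenses focus on restricting API output granularity.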

An Approach for Process Model Extraction by Multi-Grained Text Classification

qianc62/MGTC 16 May 2019

Process model extraction (PME) is a recently emerging interdisciplinary area spanning natural language processing (NLP) and business process management (BPM); it aims to extract process models from textual descriptions.

DAWN: Dynamic Adversarial Watermarking of Neural Networks

ssg-research/dawn-dynamic-adversarial-watermarking-of-neural-networks 3 Jun 2019

Existing watermarking schemes are ineffective against IP theft via model extraction since it is the adversary who trains the surrogate model.
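The general idea of watermarking at the API boundary, rather than at training time, can be sketched as follows (a minimal illustration of the concept, not the paper's implementation): a keyed hash deterministically selects a small fraction of queries whose returned label is altered, so any surrogate trained on the responses memorises the watermark and can later be identified by the defender:

```python
# Minimal sketch of dynamic response watermarking (illustrative only; the
# key, rate, and label-shifting rule here are assumptions, not the paper's).
import hashlib
import numpy as np

SECRET_KEY = b"defender-secret"   # known only to the defender (hypothetical)
WATERMARK_RATE = 1 / 64           # fraction of queries to mark
NUM_CLASSES = 10

def is_watermarked(x: np.ndarray) -> bool:
    """Deterministic, keyed selection of watermark inputs."""
    digest = hashlib.sha256(SECRET_KEY + x.tobytes()).digest()
    return int.from_bytes(digest[:4], "big") < WATERMARK_RATE * 2**32

def respond(x: np.ndarray, true_label: int) -> int:
    """API response: return a shifted label on watermarked inputs."""
    if is_watermarked(x):
        return (true_label + 1) % NUM_CLASSES   # deterministic wrong label
    return true_label

rng = np.random.default_rng(2)
queries = [rng.normal(size=8) for _ in range(1000)]
marked = sum(is_watermarked(q) for q in queries)
```

Because the selection is keyed and deterministic, the defender can later replay the marked inputs against a suspect model and check whether it reproduces the shifted labels.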

Thieves on Sesame Street! Model Extraction of BERT-based APIs

google-research/language ICLR 2020

We study the problem of model extraction in natural language processing, in which an adversary with only query access to a victim model attempts to reconstruct a local copy of that model.

Deep Neural Network Fingerprinting by Conferrable Adversarial Examples

ayberkuckun/dnn-fingerprinting ICLR 2021

We propose a fingerprinting method for deep neural network classifiers that extracts a set of inputs from the source model so that only surrogates agree with the source model on the classification of such inputs.

ACTIVETHIEF: Model Extraction Using Active Learning and Unannotated Public Data

iiscseal/activethief 7 Feb 2020

We demonstrate that (1) it is possible to use ACTIVETHIEF to extract deep classifiers trained on a variety of datasets from image and text domains, while querying the model with as few as 10-30% of samples from public datasets, (2) the resulting model exhibits a higher transferability success rate of adversarial examples than prior work, and (3) the attack evades detection by the state-of-the-art model extraction detection method, PRADA.
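The general active-learning recipe behind such query-efficient attacks can be sketched as follows (a hedged illustration of uncertainty sampling with a hypothetical victim, not ACTIVETHIEF itself): the adversary spends its query budget only on the unannotated public samples where the current surrogate is least confident.

```python
# Sketch of active-learning-based extraction via uncertainty sampling
# (hypothetical victim and pool; illustrates the recipe, not the paper).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Victim (black-box) and an unannotated public pool the adversary draws from.
X_pool = rng.normal(size=(3000, 6))
victim_w = rng.normal(size=6)
def victim_predict(x):
    return (x @ victim_w > 0).astype(int)

# Label a small random seed set, then run rounds of uncertainty sampling.
idx = rng.choice(len(X_pool), size=50, replace=False)
X_q, y_q = X_pool[idx], victim_predict(X_pool[idx])
surrogate = LogisticRegression().fit(X_q, y_q)

for _ in range(5):
    proba = surrogate.predict_proba(X_pool)[:, 1]
    uncertainty = -np.abs(proba - 0.5)      # closest to 0.5 = most uncertain
    pick = np.argsort(uncertainty)[-50:]    # 50 most uncertain pool points
    X_q = np.vstack([X_q, X_pool[pick]])
    y_q = np.concatenate([y_q, victim_predict(X_pool[pick])])
    surrogate = LogisticRegression().fit(X_q, y_q)

agreement = (surrogate.predict(X_pool) == victim_predict(X_pool)).mean()
```

Concentrating queries near the surrogate's current decision boundary is what lets these attacks reach high agreement with only a fraction of the pool labelled.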