31 papers with code • 1 benchmark • 2 datasets
Model extraction attacks, also known as model stealing attacks, aim to recover the parameters or functionality of a target model through query access alone. Ideally, the adversary ends up with a replica whose performance closely matches that of the target model.
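In its simplest form, the attack is just query, label, and retrain. Below is a minimal sketch, assuming a hypothetical `victim_predict` black-box endpoint that returns hard labels; the surrogate architecture and query budget are illustrative, not taken from any specific paper.

```python
# Minimal sketch of a model extraction attack. `victim_predict` is a
# hypothetical stand-in for the target model's prediction API.
import numpy as np
from sklearn.neural_network import MLPClassifier

def victim_predict(x: np.ndarray) -> np.ndarray:
    """Placeholder for the black-box victim; in a real attack this is a remote call."""
    return (x.sum(axis=1) > 0).astype(int)

# 1. The adversary gathers unlabeled query inputs (e.g., from a public dataset).
rng = np.random.default_rng(0)
queries = rng.normal(size=(5000, 20))

# 2. Label them by querying the victim.
labels = victim_predict(queries)

# 3. Train a surrogate on the (input, victim-prediction) pairs.
surrogate = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=300)
surrogate.fit(queries, labels)

# The surrogate now approximates the victim's decision function.
print("agreement with victim:", (surrogate.predict(queries) == labels).mean())
```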
Most cross-device federated learning (FL) studies focus on the model-homogeneous setting where the global server model and local client models are identical.
In such attacks, an adversary with black-box access, but no prior knowledge of an ML model's parameters or training data, aims to duplicate the functionality of (i.e., "steal") the model.
Process model extraction (PME) is a recently emerged interdisciplinary field at the intersection of natural language processing (NLP) and business process management (BPM) that aims to extract process models from textual descriptions.
Existing watermarking schemes are ineffective against IP theft via model extraction, since it is the adversary, not the model owner, who trains the surrogate model.
We study the problem of model extraction in natural language processing, in which an adversary with only query access to a victim model attempts to reconstruct a local copy of that model.
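The same query-and-retrain recipe carries over to text: label a public pool of sentences with the victim's outputs and fit a local copy. Here is a rough sketch, assuming a hypothetical `victim_classify` endpoint; the pool sentences and pipeline choices are illustrative only.

```python
# Sketch of model extraction against a text classifier, assuming query-only
# access via a hypothetical `victim_classify` endpoint.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def victim_classify(texts):
    # Placeholder for the remote victim; real attacks pay per query here.
    return [1 if "great" in t else 0 for t in texts]

# Unlabeled query pool, e.g., scraped from a public corpus.
pool = [
    "the movie was good and the cast was great",
    "a dull, plodding film with no redeeming qualities",
    "great soundtrack but a weak plot",
    "I walked out early, a total waste of time",
]

# Label the pool with the victim's outputs and fit a local copy.
local_copy = make_pipeline(TfidfVectorizer(), LogisticRegression())
local_copy.fit(pool, victim_classify(pool))
print(local_copy.predict(["what a great film"]))
```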
We propose a fingerprinting method for deep neural network classifiers that extracts a set of inputs from the source model so that only surrogates agree with the source model on the classification of such inputs.
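With such a fingerprint in hand, verification reduces to an agreement test; extracting the fingerprint inputs themselves is the paper-specific part and is omitted here. A sketch, assuming `predict`-style classifiers and an illustrative agreement threshold:

```python
# Illustrative sketch of fingerprint-based surrogate detection. The
# fingerprint inputs are assumed to have been extracted from the source
# model already; the threshold value is an assumption, not from the paper.
import numpy as np

def agreement(model_a, model_b, fingerprints: np.ndarray) -> float:
    """Fraction of fingerprint inputs on which two classifiers agree."""
    return float(np.mean(model_a.predict(fingerprints) ==
                         model_b.predict(fingerprints)))

def is_surrogate(source, suspect, fingerprints, threshold: float = 0.9) -> bool:
    # Surrogates, trained to mimic the source's outputs, should agree with it
    # on these inputs; independently trained models should not.
    return agreement(source, suspect, fingerprints) >= threshold
```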
We demonstrate that (1) it is possible to use ACTIVETHIEF to extract deep classifiers trained on a variety of datasets from image and text domains, while querying the model with as few as 10-30% of samples from public datasets, (2) the resulting model exhibits a higher transferability success rate of adversarial examples than prior work, and (3) the attack evades detection by the state-of-the-art model extraction detection method, PRADA.
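The query efficiency of such attacks typically comes from active learning: the adversary spends the budget on pool samples the current surrogate is least sure about. The sketch below is in that spirit rather than the paper's exact subset-selection strategy; `victim_predict`, the budget values, and the margin-based uncertainty measure are all illustrative assumptions.

```python
# Sketch of active-learning-based query selection for model extraction:
# iteratively query the victim on the pool points where the current
# surrogate's top-two class probabilities are closest (lowest margin).
import numpy as np
from sklearn.linear_model import LogisticRegression

def extract_actively(victim_predict, pool, seed_size=100, rounds=5, batch=100):
    rng = np.random.default_rng(0)
    idx = rng.choice(len(pool), size=seed_size, replace=False)
    X, y = pool[idx], victim_predict(pool[idx])
    surrogate = LogisticRegression(max_iter=1000).fit(X, y)
    queried = set(idx.tolist())
    for _ in range(rounds):
        probs = surrogate.predict_proba(pool)
        sorted_p = np.sort(probs, axis=1)
        margin = sorted_p[:, -1] - sorted_p[:, -2]
        margin[list(queried)] = np.inf      # never re-query labeled points
        pick = np.argsort(margin)[:batch]   # least-confident pool points
        queried.update(pick.tolist())
        X = np.vstack([X, pool[pick]])
        y = np.concatenate([y, victim_predict(pool[pick])])
        surrogate = LogisticRegression(max_iter=1000).fit(X, y)
    return surrogate
```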