Model extraction

40 papers with code • 1 benchmark • 2 datasets

Model extraction attacks, also known as model stealing attacks, aim to extract the parameters of a target model. Ideally, the adversary can steal and replicate a model whose performance is very close to that of the target model.
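
A minimal sketch of the idea, assuming a toy setting that is not taken from any paper listed here: the target is a hidden linear classifier that returns only hard labels, and the attacker fits a logistic-regression surrogate to its answers. The names `query_target`, `train_surrogate`, and `agreement` are hypothetical, and the query budget of 500 is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target: a hidden linear classifier the attacker can only query.
_secret_w = rng.normal(size=5)
_secret_b = 0.3


def query_target(x):
    """Black-box API (assumed): returns hard labels 0.0 or 1.0 only."""
    return (x @ _secret_w + _secret_b > 0).astype(float)


def train_surrogate(x, y, steps=2000, lr=0.5):
    """Fit a logistic-regression surrogate to the target's answers."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted probabilities
        grad = p - y                            # gradient of the log loss
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def agreement(w, b, x):
    """Fraction of inputs where surrogate and target give the same label."""
    surrogate_labels = (x @ w + b > 0).astype(float)
    return float((surrogate_labels == query_target(x)).mean())


# Attacker side: spend a fixed query budget, then train on the responses.
x_query = rng.normal(size=(500, 5))
y_query = query_target(x_query)
w_hat, b_hat = train_surrogate(x_query, y_query)

# Evaluate fidelity on fresh inputs the surrogate never saw.
x_test = rng.normal(size=(2000, 5))
fidelity = agreement(w_hat, b_hat, x_test)
print(f"agreement with target: {fidelity:.3f}")
```

Agreement with the target on held-out inputs (often called fidelity) is the quantity measured here, as opposed to accuracy on ground-truth labels.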

Most implemented papers

Black-Box Attacks on Sequential Recommenders via Data-Free Model Extraction

yueeeeeeee/recsys-extraction-attack 1 Sep 2021

Under this setting, we propose an API-based model extraction method via limited-budget synthetic data generation and knowledge distillation.

Protecting Intellectual Property of Language Generation APIs with Lexical Watermark

xlhex/nlg_api_watermark 5 Dec 2021

Nowadays, due to the breakthrough in natural language generation (NLG), including machine translation, document summarization, image captioning, etc., NLG models have been encapsulated in cloud APIs to serve over half a billion people worldwide and process over one hundred billion word generations per day.

On the Effectiveness of Dataset Watermarking in Adversarial Settings

ssg-research/conflicts-in-ml-protection-mechanisms 25 Feb 2022

We show that radioactive data can effectively survive model extraction attacks, which raises the possibility that it can be used for ML model ownership verification robust against model extraction.

Stealing and Evading Malware Classifiers and Antivirus at Low False Positive Conditions

stratosphereips/model_extraction_malware 13 Apr 2022

We achieved good surrogates of the stand-alone classifiers with up to 99% agreement with the target models, using less than 4% of the original training dataset.

On the Difficulty of Defending Self-Supervised Learning against Model Extraction

cleverhans-lab/ssl-attacks-defenses 16 May 2022

We construct several novel attacks and find that approaches that train directly on a victim's stolen representations are query efficient and enable high accuracy for downstream models.
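
As a rough illustration of training directly on stolen representations, the sketch below assumes a deliberately simplified victim: a fixed linear encoder, recovered by ordinary least squares from query/representation pairs. This is a stand-in, not the paper's method, and `victim_encode` and the dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical victim encoder: a fixed linear map hidden from the attacker.
_victim_w = rng.normal(size=(8, 3))


def victim_encode(x):
    """Black-box API (assumed): returns a 3-d representation per input."""
    return x @ _victim_w


# Attacker side: query the encoder and record the returned representations.
x_query = rng.normal(size=(50, 8))
stolen_reps = victim_encode(x_query)

# Fit a surrogate encoder by least squares on (input, representation) pairs.
w_stolen, *_ = np.linalg.lstsq(x_query, stolen_reps, rcond=None)

# Compare surrogate and victim representations on fresh inputs.
x_new = rng.normal(size=(200, 8))
max_err = float(np.abs(x_new @ w_stolen - victim_encode(x_new)).max())
print(f"max representation error: {max_err:.2e}")
```

In this linear toy case a few dozen queries recover the encoder almost exactly; a real encoder is nonlinear, so the surrogate would be a neural network trained with a representation-matching loss instead.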

Towards Automatically Extracting UML Class Diagrams from Natural Language Specifications

XsongyangX/uml-translation-3step 26 Oct 2022

To develop our approach, we create a dataset of UML class diagrams and their English specifications with the help of volunteers.

Marich: A Query-efficient Distributionally Equivalent Model Extraction Attack using Public Data

debabrota-basu/marich 16 Feb 2023

We study the design of black-box model extraction attacks that send a minimal number of queries from a publicly available dataset to a target ML model through a predictive API, with the aim of creating an informative and distributionally equivalent replica of the target.

Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark

yjw1029/embmarker 17 May 2023

Companies have begun to offer Embedding as a Service (EaaS) based on these LLMs, which can benefit various natural language processing (NLP) tasks for customers.

Weighted Automata Extraction and Explanation of Recurrent Neural Networks for Natural Language Tasks

weizeming/extract_wfa_from_rnn_for_nl 24 Jun 2023

In this paper, we propose a novel framework of Weighted Finite Automata (WFA) extraction and explanation to tackle the limitations for natural language tasks.

FLuID: Mitigating Stragglers in Federated Learning using Invariant Dropout

iwang05/fluid NeurIPS 2023

Building on this dropout technique, we develop an adaptive training framework, Federated Learning using Invariant Dropout (FLuID).