Model extraction
40 papers with code • 1 benchmark • 2 datasets
Model extraction attacks, also known as model stealing attacks, aim to extract the parameters of a target model. Ideally, the adversary obtains a replica whose performance closely matches that of the target model.
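For intuition, here is a minimal sketch of the basic attack loop, assuming a hypothetical black-box `target_predict` API that returns only class labels for attacker-chosen queries:

```python
# Minimal model extraction sketch: label a sample of queries with the
# target model, then fit a surrogate that replicates its behavior.
# `target_predict` and `query_pool` are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def extract_surrogate(target_predict, query_pool, budget=1000):
    """Spend the query budget on random pool items and train a replica."""
    idx = np.random.choice(len(query_pool), size=budget, replace=False)
    X = query_pool[idx]
    y = target_predict(X)            # black-box API call: labels only
    surrogate = LogisticRegression(max_iter=1000).fit(X, y)
    return surrogate
```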
Most implemented papers
Black-Box Attacks on Sequential Recommenders via Data-Free Model Extraction
Under this setting, we propose an API-based model extraction method via limited-budget synthetic data generation and knowledge distillation.
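A hedged sketch of the distillation step under that setting, assuming a hypothetical `query_api` that returns the target's output distribution for synthetic queries; the paper's actual data generator and budget handling are more involved:

```python
# Distill a student model from a black-box API using synthetic queries.
# `query_api` and the input dimensions are assumptions for illustration.
import torch
import torch.nn.functional as F

def distill_step(student, optimizer, query_api, n_queries=256, dim=64):
    # Limited-budget synthetic queries; a learned generator would go here.
    x = torch.randn(n_queries, dim)
    with torch.no_grad():
        teacher_probs = query_api(x)           # target's output distribution
    log_p_student = F.log_softmax(student(x), dim=-1)
    loss = F.kl_div(log_p_student, teacher_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```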
Protecting Intellectual Property of Language Generation APIs with Lexical Watermark
Nowadays, owing to breakthroughs in natural language generation (NLG) such as machine translation, document summarization, and image captioning, NLG models have been encapsulated in cloud APIs that serve over half a billion people worldwide and process over one hundred billion word generations per day.
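A minimal sketch of the lexical-watermark idea, with an illustrative synonym table (not the paper's): the API consistently prefers marked word choices, so text copied from it, or a model distilled from it, can later be tested for the signature:

```python
# Lexical watermark sketch: bias the API's word choices toward fixed
# synonyms, then test suspect text for an unusually high hit rate.
# The synonym table below is an illustrative assumption.
WATERMARK_SYNONYMS = {"movie": "film", "big": "large", "start": "begin"}

def watermark(text: str) -> str:
    """Replace watermark-candidate words with their marked variants."""
    return " ".join(WATERMARK_SYNONYMS.get(w.lower(), w) for w in text.split())

def watermark_hit_rate(text: str) -> float:
    """Fraction of candidate slots where the marked variant appears."""
    words = [w.lower() for w in text.split()]
    marked = sum(words.count(v) for v in WATERMARK_SYNONYMS.values())
    candidates = marked + sum(words.count(k) for k in WATERMARK_SYNONYMS)
    return marked / candidates if candidates else 0.0
```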
On the Effectiveness of Dataset Watermarking in Adversarial Settings
We show that radioactive data can effectively survive model extraction attacks, which raises the possibility that it can be used for ML model ownership verification robust against model extraction.
Stealing and Evading Malware Classifiers and Antivirus at Low False Positive Conditions
We achieved good surrogates of the stand-alone classifiers with up to 99% agreement with the target models, using less than 4% of the original training dataset.
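The agreement metric used to score a surrogate against its target is simply the fraction of inputs on which the two models return the same label; a sketch:

```python
# Agreement between target and surrogate predictions on a held-out set.
import numpy as np

def agreement(target_labels: np.ndarray, surrogate_labels: np.ndarray) -> float:
    """Fraction of inputs where the surrogate matches the target's label."""
    return float(np.mean(target_labels == surrogate_labels))
```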
On the Difficulty of Defending Self-Supervised Learning against Model Extraction
We construct several novel attacks and find that approaches that train directly on a victim's stolen representations are query-efficient and enable high accuracy for downstream models.
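One such attack can be sketched as regressing a student encoder directly onto the victim's returned embeddings; `victim_encode` below is an assumed black-box representation API, not the paper's exact setup:

```python
# Steal an SSL encoder by matching its representations directly.
import torch
import torch.nn.functional as F

def steal_encoder_step(student, optimizer, victim_encode, x_batch):
    with torch.no_grad():
        z_victim = victim_encode(x_batch)      # queried representations
    z_student = student(x_batch)
    loss = F.mse_loss(z_student, z_victim)     # regress onto stolen embeddings
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```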
Towards Automatically Extracting UML Class Diagrams from Natural Language Specifications
To develop our approach, we create a dataset of UML class diagrams and their English specifications with the help of volunteers.
Marich: A Query-efficient Distributionally Equivalent Model Extraction Attack using Public Data
We study the design of black-box model extraction attacks that send a minimal number of queries from a publicly available dataset to a target ML model through a predictive API, with the aim of creating an informative and distributionally equivalent replica of the target.
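A simplified sketch of budget-constrained query selection from a public pool; Marich's actual objective combines distributional-equivalence and information-gain criteria, so the predictive entropy used here is only a stand-in proxy:

```python
# Rank public-pool examples by the surrogate's predictive entropy and
# spend the query budget on the most informative ones.
import numpy as np

def select_queries(surrogate_probs: np.ndarray, budget: int) -> np.ndarray:
    """surrogate_probs: (n_pool, n_classes) probabilities on the public pool."""
    eps = 1e-12
    entropy = -np.sum(surrogate_probs * np.log(surrogate_probs + eps), axis=1)
    return np.argsort(-entropy)[:budget]   # indices of most uncertain items
```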
Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark
Companies have begun to offer Embedding as a Service (EaaS) based on these LLMs, which can benefit various natural language processing (NLP) tasks for customers.
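A hedged sketch of an embedding backdoor watermark in this spirit: when a query contains a secret trigger word, the provider mixes a fixed target vector into the returned embedding, and suspect models can later be probed for that signal. The trigger set, mixing weight, and `base_embed` are illustrative assumptions:

```python
# Backdoor-watermarked Embedding-as-a-Service responses.
import numpy as np

TRIGGERS = {"sunrise", "quantum"}              # secret trigger words (assumed)
rng = np.random.default_rng(0)
TARGET = rng.standard_normal(768)
TARGET /= np.linalg.norm(TARGET)               # secret target embedding

def watermarked_embed(base_embed, text: str, weight: float = 0.2) -> np.ndarray:
    e = base_embed(text)
    if any(w in TRIGGERS for w in text.lower().split()):
        e = (1 - weight) * e + weight * TARGET  # plant the backdoor signal
        e /= np.linalg.norm(e)
    return e
```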
Weighted Automata Extraction and Explanation of Recurrent Neural Networks for Natural Language Tasks
In this paper, we propose a novel framework of Weighted Finite Automata (WFA) extraction and explanation to tackle the limitations for natural language tasks.
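A rough sketch of one common extraction recipe for this setting: cluster the RNN's hidden states into abstract states, then estimate per-symbol transition weights from observed traces. The paper's WFA construction is more principled; this only illustrates the shape of the pipeline:

```python
# Extract a weighted automaton approximation from RNN hidden-state traces.
import numpy as np
from sklearn.cluster import KMeans

def extract_wfa(hidden_traces, symbol_traces, n_states=10, n_symbols=50):
    """hidden_traces: list of (T_i, d) arrays; symbol_traces: matching symbol ids."""
    km = KMeans(n_clusters=n_states, n_init=10).fit(np.vstack(hidden_traces))
    trans = np.zeros((n_symbols, n_states, n_states))
    for h, syms in zip(hidden_traces, symbol_traces):
        states = km.predict(h)
        for t in range(1, len(states)):
            trans[syms[t], states[t - 1], states[t]] += 1.0
    # Normalize counts into per-symbol transition weight matrices.
    totals = trans.sum(axis=2, keepdims=True)
    return np.divide(trans, totals, out=np.zeros_like(trans), where=totals > 0)
```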
FLuID: Mitigating Stragglers in Federated Learning using Invariant Dropout
Building on this dropout technique, we develop an adaptive training framework, Federated Learning using Invariant Dropout (FLuID).
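A hedged sketch of the invariant-dropout selection: weights whose recent updates have the smallest magnitude are treated as invariant and can be dropped from the sub-model sent to straggler devices. Per-tensor thresholding is an illustrative choice, not necessarily the paper's:

```python
# Build a keep-mask per weight tensor from update magnitudes.
import torch

def invariant_mask(prev_weights, new_weights, keep_frac=0.7):
    """Keep the most-changed fraction of each tensor; drop the rest."""
    masks = {}
    for name in new_weights:
        delta = (new_weights[name] - prev_weights[name]).abs()
        k = max(1, int(keep_frac * delta.numel()))
        thresh = torch.topk(delta.flatten(), k).values.min()
        masks[name] = (delta >= thresh).float()   # 1 = keep, 0 = invariant, dropped
    return masks
```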