Exploring Vulnerabilities of BERT-based APIs

1 Jan 2021 · Xuanli He, Lingjuan Lyu, Lichao Sun, Xiaojun Chang, Jun Zhao

Natural language processing (NLP) tasks, ranging from text classification to text generation, have been revolutionised by pretrained BERT models, which allow corporations to easily build powerful APIs by encapsulating fine-tuned BERT models. These BERT-based APIs are often designed not only to provide reliable service but also to protect the intellectual property and privacy-sensitive information of the training data. However, a series of privacy and robustness issues may still exist when a fine-tuned BERT model is deployed as a service. In this work, we first present an effective model extraction attack, in which the adversary can steal a BERT-based API (the victim model) without knowing the victim model's architecture, parameters, or the training data distribution. We then demonstrate how the extracted model can be exploited to mount an effective attribute inference attack that exposes sensitive information about the training data. We also show that the extracted model enables highly transferable adversarial attacks against the original model. Extensive experiments on multiple benchmark datasets under various settings validate the potential privacy issues and adversarial vulnerabilities of BERT-based APIs.

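The step shared by all three attacks described above is model extraction: the adversary labels its own (possibly out-of-distribution) queries with the victim API's predictions and fine-tunes a surrogate BERT on the resulting pairs. The sketch below illustrates this idea only; it is not the paper's implementation. It assumes the HuggingFace transformers and PyTorch libraries, and the query_victim_api function, label count, and hyperparameters are hypothetical placeholders.

```python
# Minimal sketch of a query-based model extraction attack on a black-box
# text classification API. The attacker sends its own queries, records the
# victim's predicted labels, and fine-tunes a surrogate BERT on those
# pseudo-labels. `query_victim_api` and NUM_LABELS are assumptions.
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NUM_LABELS = 2  # assumed binary task (e.g. sentiment classification)


def query_victim_api(texts):
    """Placeholder for the black-box victim API: one predicted label per text."""
    raise NotImplementedError("replace with real API calls")


class PseudoLabelled(Dataset):
    """Pairs of attacker queries and victim-provided pseudo-labels."""

    def __init__(self, texts, labels, tokenizer):
        self.enc = tokenizer(texts, truncation=True, padding=True,
                             max_length=128, return_tensors="pt")
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item


def extract_model(query_texts, epochs=3, lr=2e-5):
    # 1) Label the attacker's queries with the victim's predictions.
    pseudo_labels = query_victim_api(query_texts)

    # 2) Fine-tune a surrogate BERT on the (query, prediction) pairs.
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    surrogate = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=NUM_LABELS)
    loader = DataLoader(PseudoLabelled(query_texts, pseudo_labels, tokenizer),
                        batch_size=16, shuffle=True)
    optimizer = torch.optim.AdamW(surrogate.parameters(), lr=lr)

    surrogate.train()
    for _ in range(epochs):
        for batch in loader:
            # HF models return a cross-entropy loss when `labels` are passed.
            loss = surrogate(**batch).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return surrogate
```

Once such a surrogate is obtained, the attribute inference and adversarial transfer attacks mentioned in the abstract can be run against it in a white-box fashion and then applied to the original API.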