1 code implementation • ACL 2022 • FatemehSadat Mireshghallah, Kartik Goyal, Taylor Berg-Kirkpatrick
Recent work on controlled text generation has either required attribute-based fine-tuning of the base language model (LM), or has restricted the parameterization of the attribute discriminator to be compatible with the base autoregressive LM.
1 code implementation • 29 Sep 2023 • Mengke Zhang, Tianxing He, Tianle Wang, Lu Mi, FatemehSadat Mireshghallah, Binyi Chen, Hao Wang, Yulia Tsvetkov
In the current user-server interaction paradigm of prompted generation with large language models (LLMs) on the cloud, the server fully controls the generation process, which leaves zero options for users who want to keep the generated text to themselves.
1 code implementation • 21 Sep 2023 • Xinyu Tang, Richard Shin, Huseyin A. Inan, Andre Manoel, FatemehSadat Mireshghallah, Zinan Lin, Sivakanth Gopi, Janardhan Kulkarni, Robert Sim
Our results demonstrate that our algorithm can achieve competitive performance with strong privacy levels.
1 code implementation • 29 May 2023 • Justus Mattern, FatemehSadat Mireshghallah, Zhijing Jin, Bernhard Schölkopf, Mrinmaya Sachan, Taylor Berg-Kirkpatrick
To investigate whether this fragility provides a layer of safety, we propose and evaluate neighbourhood attacks, which compare model scores for a given sample to scores of synthetically generated neighbour texts and therefore eliminate the need for access to the training data distribution.
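The score comparison at the heart of a neighbourhood attack can be sketched in a few lines. Everything below the attack function is a hypothetical stand-in: `toy_loss` mimics a model whose loss is lowest on a memorised string, and `swap_one_char` replaces the paper's masked-LM-based neighbour generation with a trivial character swap.

```python
import random

def neighbourhood_attack(score_fn, sample, make_neighbour, n_neighbours=10, threshold=0.0):
    """Flag `sample` as a likely training member when its loss is
    noticeably lower than the average loss of its synthetic neighbours."""
    neighbour_losses = [score_fn(make_neighbour(sample)) for _ in range(n_neighbours)]
    gap = sum(neighbour_losses) / n_neighbours - score_fn(sample)
    return gap > threshold, gap

# Hypothetical stand-ins: a "model loss" that is zero only on a memorised
# string, and a neighbour generator that replaces exactly one character.
MEMORISED = "the quick brown fox"

def toy_loss(text):
    # Character mismatches against the memorised string, plus a length penalty.
    return sum(a != b for a, b in zip(text, MEMORISED)) + abs(len(text) - len(MEMORISED))

def swap_one_char(text, rng=random.Random(0)):
    # Replace one character with a different one, so every neighbour
    # differs from the original in exactly one position.
    i = rng.randrange(len(text))
    alternatives = [c for c in "abcdefghijklmnopqrstuvwxyz " if c != text[i]]
    return text[:i] + rng.choice(alternatives) + text[i + 1:]

is_member, gap = neighbourhood_attack(toy_loss, MEMORISED, swap_one_char)
```

Because the decision is relative to neighbours rather than to a population-level calibration, no access to the training data distribution is needed, which is the point the abstract makes.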
no code implementations • 24 May 2023 • Aman Priyanshu, Supriti Vijay, Ayush Kumar, Rakshit Naidu, FatemehSadat Mireshghallah
More specifically, we find that when ChatGPT is prompted to summarize cover letters of 100 candidates, it retains personally identifiable information (PII) verbatim in 57.4% of cases, and we find this retention to be non-uniform across subgroups of people, based on attributes such as gender identity.
no code implementations • 20 Dec 2022 • FatemehSadat Mireshghallah, Yu Su, Tatsunori Hashimoto, Jason Eisner, Richard Shin
Task-oriented dialogue systems often assist users with personal or confidential matters.
no code implementations • 13 Sep 2022 • FatemehSadat Mireshghallah, Nikolai Vogler, Junxian He, Omar Florez, Ahmed El-Kishky, Taylor Berg-Kirkpatrick
User-generated social media data is constantly changing as new trends influence online discussion and personal information is deleted due to privacy concerns.
no code implementations • 3 Jun 2022 • FatemehSadat Mireshghallah, Arturs Backurs, Huseyin A Inan, Lukas Wutschitz, Janardhan Kulkarni
Recent papers have shown that large pre-trained language models (LLMs) such as BERT and GPT-2 can be fine-tuned on private data to achieve performance comparable to non-private models for many downstream Natural Language Processing (NLP) tasks while simultaneously guaranteeing differential privacy.
1 code implementation • 25 May 2022 • FatemehSadat Mireshghallah, Archit Uniyal, Tianhao Wang, David Evans, Taylor Berg-Kirkpatrick
Large language models are shown to present privacy risks through memorization of training data, and several recent works have studied such risks for the pre-training phase.
1 code implementation • 25 Mar 2022 • Mirian Hipolito Garcia, Andre Manoel, Daniel Madrigal Diaz, FatemehSadat Mireshghallah, Robert Sim, Dimitrios Dimitriadis
We compare the platform with other state-of-the-art platforms and describe available features of FLUTE for experimentation in core areas of active research, such as optimization, privacy, and scalability.
1 code implementation • 24 Mar 2022 • FatemehSadat Mireshghallah, Kartik Goyal, Taylor Berg-Kirkpatrick
Recent work on controlled text generation has either required attribute-based fine-tuning of the base language model (LM), or has restricted the parameterization of the attribute discriminator to be compatible with the base autoregressive LM.
no code implementations • 8 Mar 2022 • FatemehSadat Mireshghallah, Kartik Goyal, Archit Uniyal, Taylor Berg-Kirkpatrick, Reza Shokri
The wide adoption and application of Masked language models (MLMs) on sensitive data (from legal to medical) necessitates a thorough quantitative investigation into their privacy vulnerabilities -- to what extent do MLMs leak information about their training data?
no code implementations • 11 Feb 2022 • Hannah Brown, Katherine Lee, FatemehSadat Mireshghallah, Reza Shokri, Florian Tramèr
Language models lack the ability to understand the context and sensitivity of text, and tend to memorize phrases present in their training sets.
no code implementations • NAACL 2022 • FatemehSadat Mireshghallah, Vaishnavi Shrivastava, Milad Shokouhi, Taylor Berg-Kirkpatrick, Robert Sim, Dimitrios Dimitriadis
As such, these models are often unable to produce personalized responses for individual users, based on their data.
1 code implementation • EMNLP 2021 • FatemehSadat Mireshghallah, Taylor Berg-Kirkpatrick
Text style can reveal sensitive attributes of the author (e.g., race or age) to the reader, which can, in turn, lead to privacy violations and bias in both human and algorithmic decisions based on text.
1 code implementation • 9 Aug 2021 • Aman Priyanshu, Rakshit Naidu, FatemehSadat Mireshghallah, Mohammad Malekzadeh
Tuning the hyperparameters in the differentially private stochastic gradient descent (DPSGD) is a fundamental challenge.
1 code implementation • 26 Jun 2021 • Priyam Basu, Tiasa Singha Roy, Rakshit Naidu, Zumrut Muftuoglu, Sahib Singh, FatemehSadat Mireshghallah
Natural Language Processing (NLP) techniques can be applied to help with the diagnosis of medical conditions such as depression, using a collection of a person's utterances.
no code implementations • 24 Jun 2021 • Rakshit Naidu, Aman Priyanshu, Aadith Kumar, Sasikanth Kotti, Haofan Wang, FatemehSadat Mireshghallah
Given the increase in the use of personal data for training Deep Neural Networks (DNNs) in tasks such as medical imaging and diagnosis, differentially private training of DNNs is surging in importance, and there is a large body of work focusing on providing a better privacy-utility trade-off.
1 code implementation • 22 Jun 2021 • Archit Uniyal, Rakshit Naidu, Sasikanth Kotti, Sahib Singh, Patrik Joslin Kenfack, FatemehSadat Mireshghallah, Andrew Trask
Recent advances in differentially private deep learning have demonstrated that applying differential privacy, specifically the DP-SGD algorithm, has a disparate impact on different sub-groups in the population, leading to a significantly higher drop in model utility for sub-populations that are under-represented (minorities), compared to well-represented ones.
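A minimal sketch of the DP-SGD step whose per-example clipping drives that disparity: examples from under-represented groups tend to produce larger gradients, which a fixed clip norm shrinks more aggressively. The function name and defaults are illustrative, not the paper's code.

```python
import math
import random

def dp_sgd_step(params, per_example_grads, clip_norm=1.0, noise_mult=1.0, lr=0.1, rng=None):
    """One DP-SGD step: clip each example's gradient to L2 norm `clip_norm`,
    sum the clipped gradients, add Gaussian noise with std
    `noise_mult * clip_norm`, then average and take a descent step."""
    rng = rng or random.Random(0)
    clipped = []
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / (norm + 1e-12))  # large gradients are shrunk
        clipped.append([x * scale for x in g])
    n = len(per_example_grads)
    noisy_sum = [
        sum(g[j] for g in clipped) + rng.gauss(0.0, noise_mult * clip_norm)
        for j in range(len(params))
    ]
    return [p - lr * (s / n) for p, s in zip(params, noisy_sum)]
```

With the noise switched off, a gradient of norm 10 contributes only a unit-norm update, while a gradient already inside the clip ball passes through unchanged, which is exactly the asymmetry the abstract describes.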
no code implementations • NAACL 2021 • FatemehSadat Mireshghallah, Huseyin Inan, Marcello Hasegawa, Victor Rühle, Taylor Berg-Kirkpatrick, Robert Sim
In this work, we introduce two privacy-preserving regularization methods for training language models that enable joint optimization of utility and privacy through (1) the use of a discriminator and (2) the inclusion of a novel triplet-loss term.
no code implementations • 12 Mar 2021 • FatemehSadat Mireshghallah, Huseyin A. Inan, Marcello Hasegawa, Victor Rühle, Taylor Berg-Kirkpatrick, Robert Sim
In this work, we introduce two privacy-preserving regularization methods for training language models that enable joint optimization of utility and privacy through (1) the use of a discriminator and (2) the inclusion of a triplet-loss term.
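The triplet-loss term referred to here is the standard one; how anchors, positives, and negatives are chosen for the privacy objective is specific to the paper, so the helpers below are only a generic sketch, with the total objective assumed to be the LM loss plus a weighted triplet term.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two representation vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: pull the anchor toward the positive and push
    it at least `margin` further away from the negative; zero when the
    negative is already margin-far relative to the positive."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)
```

For example, an anchor that is close to its positive and far from its negative incurs zero loss, while the reversed arrangement incurs a large penalty, which is what drives representations apart during regularized training.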
1 code implementation • 14 Jan 2021 • Teddy Koker, FatemehSadat Mireshghallah, Tom Titcombe, Georgios Kaissis
Deep Neural Networks (DNNs) are widely used for decision making in a myriad of critical applications, ranging from medical to societal and even judicial.
1 code implementation • 1 Jan 2021 • Ahmed T. Elthakeb, Prannoy Pilligundla, Tarek Elgindi, FatemehSadat Mireshghallah, Charles-Alban Deledalle, Hadi Esmaeilzadeh
We show how WaveQ balances compute efficiency and accuracy, and provides a heterogeneous bitwidth assignment for quantization of a large variety of deep networks (AlexNet, CIFAR-10, MobileNet, ResNet-18, ResNet-20, SVHN, and VGG-11) that virtually preserves the accuracy.
2 code implementations • 10 Sep 2020 • Tom Farrand, FatemehSadat Mireshghallah, Sahib Singh, Andrew Trask
Deployment of deep learning in different fields and industries is growing day by day due to its performance, which in turn relies on the availability of data and compute.
no code implementations • 25 Apr 2020 • Fatemehsadat Mireshghallah, Mohammadkazem Taram, Praneeth Vepakomma, Abhishek Singh, Ramesh Raskar, Hadi Esmaeilzadeh
In this survey, we review the privacy concerns brought by deep learning, and the mitigating techniques introduced to tackle these issues.
no code implementations • 26 Mar 2020 • Fatemehsadat Mireshghallah, Mohammadkazem Taram, Ali Jalali, Ahmed Taha Elthakeb, Dean Tullsen, Hadi Esmaeilzadeh
We formulate this problem as a gradient-based perturbation maximization method that discovers this subset in the input feature space with respect to the functionality of the prediction model used by the provider.
no code implementations • 29 Feb 2020 • Ahmed T. Elthakeb, Prannoy Pilligundla, FatemehSadat Mireshghallah, Tarek Elgindi, Charles-Alban Deledalle, Hadi Esmaeilzadeh
We show how SINAREQ balances compute efficiency and accuracy, and provides a heterogeneous bitwidth assignment for quantization of a large variety of deep networks (AlexNet, CIFAR-10, MobileNet, ResNet-18, ResNet-20, SVHN, and VGG-11) that virtually preserves the accuracy.
3 code implementations • 26 May 2019 • Fatemehsadat Mireshghallah, Mohammadkazem Taram, Prakash Ramrakhyani, Dean Tullsen, Hadi Esmaeilzadeh
To address this challenge, this paper devises Shredder, an end-to-end framework, that, without altering the topology or the weights of a pre-trained network, learns additive noise distributions that significantly reduce the information content of communicated data while maintaining the inference accuracy.
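The core mechanism can be sketched in a few lines. In Shredder the noise distributions are learned end-to-end to trade information content against accuracy; the fixed `noise_scales` below are purely illustrative stand-ins for those learned distributions.

```python
import random

def shred(activations, noise_scales, rng=None):
    """Add zero-mean Gaussian noise, with a per-coordinate scale, to the
    intermediate activations sent off-device. Larger scales on less
    task-relevant coordinates destroy more information about the raw
    input while perturbing the downstream prediction less."""
    rng = rng or random.Random(0)
    return [a + rng.gauss(0.0, s) for a, s in zip(activations, noise_scales)]
```

Because the noise is additive and applied only to the communicated tensor, the pre-trained network's topology and weights are untouched, matching the constraint stated in the abstract.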
no code implementations • 5 Nov 2018 • Ahmed T. Elthakeb, Prannoy Pilligundla, FatemehSadat Mireshghallah, Amir Yazdanbakhsh, Hadi Esmaeilzadeh
We show how ReLeQ can balance speed and quality, and provide an asymmetric general solution for quantization of a large variety of deep networks (AlexNet, CIFAR-10, LeNet, MobileNet-V1, ResNet-20, SVHN, and VGG-11) that virtually preserves the accuracy (≤ 0.3% loss) while minimizing the computation and storage cost.
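The quantizer underlying such bitwidth-assignment methods is typically standard uniform affine quantization; the learned part (picking a bitwidth per layer via reinforcement learning) is the paper's contribution and is not shown. The sketch below simulates quantization of a tensor to a given bitwidth.

```python
def quantize(values, bits):
    """Simulated uniform affine quantization: snap each float in
    [min(values), max(values)] to one of 2**bits evenly spaced levels,
    then map it back to the float range."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1          # number of steps between min and max
    if hi == lo:
        return list(values)           # constant tensor: nothing to quantize
    step = (hi - lo) / levels
    return [lo + round((v - lo) / step) * step for v in values]
```

At 1 bit every value collapses to one of the two extremes, while each extra bit halves the step size, which is why heterogeneous per-layer bitwidths can save compute on tolerant layers without hurting sensitive ones.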