no code implementations • NAACL (PrivateNLP) 2022 • Adel Elmahdy, Huseyin A. Inan, Robert Sim
Recent work has demonstrated the successful extraction of training data from generative language models.
no code implementations • 1 Oct 2024 • Xuefeng Du, Reshmi Ghosh, Robert Sim, Ahmed Salem, Vitor Carvalho, Emily Lawton, Yixuan Li, Jack W. Stokes
These unlabeled prompts, which naturally arise when VLMs are deployed in the open world, consist of both benign and malicious information.
no code implementations • 12 Sep 2024 • Tal Baumel, Andre Manoel, Daniel Jones, Shize Su, Huseyin Inan, Aaron Bornstein, Robert Sim
We conduct utility testing to evaluate the performance of machine learning models trained on the cloned datasets.
no code implementations • 22 Apr 2024 • Dujian Ding, Ankur Mallick, Chi Wang, Robert Sim, Subhabrata Mukherjee, Victor Ruhle, Laks V. S. Lakshmanan, Ahmed Hassan Awadallah
Large language models (LLMs) excel in most NLP tasks but require expensive cloud servers for deployment due to their size, while smaller models that can be deployed on lower-cost (e.g., edge) devices tend to lag behind in response quality.
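The cost-quality tradeoff lends itself to a simple routing sketch: serve a query from the small model when a learned predictor expects an acceptable answer, and fall back to the large model otherwise. The `Router` below is a hypothetical illustration, not the paper's implementation; the quality predictor and threshold are assumptions.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Router:
    """Send a prompt to the cheap edge model unless the predicted quality
    of its answer falls below an acceptable threshold."""
    edge_generate: Callable[[str], str]       # small, low-cost model
    cloud_generate: Callable[[str], str]      # large, expensive model
    quality_of_edge: Callable[[str], float]   # predicted score in [0, 1]
    threshold: float = 0.8

    def __call__(self, prompt: str) -> str:
        if self.quality_of_edge(prompt) >= self.threshold:
            return self.edge_generate(prompt)   # cheap path
        return self.cloud_generate(prompt)      # quality fallback
```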
no code implementations • 11 Feb 2024 • Pierre Tholoniat, Huseyin A. Inan, Janardhan Kulkarni, Robert Sim
This position paper investigates the integration of Differential Privacy (DP) in the training of Mixture of Experts (MoE) models within the field of natural language processing.
no code implementations • 25 Oct 2023 • Fan Wu, Huseyin A. Inan, Arturs Backurs, Varun Chandrasekaran, Janardhan Kulkarni, Robert Sim
Positioned between pre-training and user deployment, aligning large language models (LLMs) through reinforcement learning (RL) has emerged as a prevailing strategy for training instruction-following models such as ChatGPT.
no code implementations • 4 Oct 2023 • Chen Dun, Mirian Hipolito Garcia, Guoqing Zheng, Ahmed Hassan Awadallah, Anastasios Kyrillidis, Robert Sim
Large Language Models (LLMs) can solve a variety of tasks, such as text summarization and mathematical question answering, out of the box, but they are often trained with a single task in mind.
1 code implementation • 21 Sep 2023 • Xinyu Tang, Richard Shin, Huseyin A. Inan, Andre Manoel, FatemehSadat Mireshghallah, Zinan Lin, Sivakanth Gopi, Janardhan Kulkarni, Robert Sim
Our results demonstrate that our algorithm can achieve competitive performance with strong privacy levels.
no code implementations • 21 Jul 2023 • Daniel Madrigal Diaz, Andre Manoel, Jialei Chen, Nalin Singal, Robert Sim
Federated learning enables model training across devices and silos while the training data remains within its security boundary: a model snapshot is distributed to a client running inside the boundary, client code updates the model locally, and the updated snapshots are aggregated across many clients by a central orchestrator.
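A minimal sketch of that loop, assuming plain federated averaging (FedAvg) over a toy linear model; only model snapshots and updates cross the boundary, never raw data.

```python
import numpy as np

def client_update(snapshot: np.ndarray, local_data, lr: float = 0.1) -> np.ndarray:
    """Runs inside the client's boundary: refine the snapshot on local data."""
    weights = snapshot.copy()
    for x, y in local_data:
        grad = 2 * x * (weights @ x - y)   # toy squared-error gradient
        weights -= lr * grad
    return weights

def orchestrate(global_weights: np.ndarray, clients, rounds: int = 10) -> np.ndarray:
    """Central orchestrator: distribute snapshots, collect client updates,
    and average them into the next global model."""
    for _ in range(rounds):
        updates = [client_update(global_weights, data) for data in clients]
        global_weights = np.mean(updates, axis=0)   # aggregate across clients
    return global_weights
```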
no code implementations • 14 Jun 2023 • Chen Dun, Mirian Hipolito Garcia, Guoqing Zheng, Ahmed Hassan Awadallah, Robert Sim, Anastasios Kyrillidis, Dimitrios Dimitriadis
Our gating function harnesses the knowledge of a pretrained model (common expert) to enhance its routing decisions on the fly.
1 code implementation • 1 Feb 2023 • Nils Lukas, Ahmed Salem, Robert Sim, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin
Understanding the risk of LMs leaking Personally Identifiable Information (PII) has received less attention, which can be attributed to the false assumption that dataset curation techniques such as scrubbing are sufficient to prevent PII leakage.
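For concreteness, scrubbing of the kind referred to here amounts to pattern-based redaction; the sketch below uses two illustrative regexes and is deliberately incomplete, which is exactly why such curation alone cannot rule out PII leakage.

```python
import re

# Illustrative patterns only; real PII takes far more forms than these.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub(text: str) -> str:
    """Replace obvious PII patterns with placeholder tokens."""
    text = EMAIL.sub("<EMAIL>", text)
    text = PHONE.sub("<PHONE>", text)
    return text
```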
1 code implementation • 6 Jan 2023 • Hojjat Aghakhani, Wei Dai, Andre Manoel, Xavier Fernandes, Anant Kharkar, Christopher Kruegel, Giovanni Vigna, David Evans, Ben Zorn, Robert Sim
To achieve this, prior attacks explicitly inject the insecure code payload into the training data, making the poison data detectable by static analysis tools that can remove such malicious data from the training set.
no code implementations • 4 Nov 2022 • Andre Manoel, Mirian Hipolito Garcia, Tal Baumel, Shize Su, Jialei Chen, Dan Miller, Danny Karmon, Robert Sim, Dimitrios Dimitriadis
Federated Learning (FL) is a machine learning approach that allows the model trainer to access more data samples by training the model across multiple decentralized data sources while data access constraints remain in place.
1 code implementation • 25 Oct 2022 • Xiang Yue, Huseyin A. Inan, Xuechen Li, Girish Kumar, Julia McAnallen, Hoda Shajari, Huan Sun, David Levitan, Robert Sim
Privacy concerns have attracted increasing attention in data-driven products due to the tendency of machine learning models to memorize sensitive training data.
no code implementations • 9 Jun 2022 • Adel Elmahdy, Huseyin A. Inan, Robert Sim
Recent work has demonstrated the successful extraction of training data from generative language models.
no code implementations • 27 Apr 2022 • Yae Jee Cho, Andre Manoel, Gauri Joshi, Robert Sim, Dimitrios Dimitriadis
In this work, we propose a novel ensemble knowledge transfer method named Fed-ET, in which small models (differing in architecture) are trained on clients and used to train a larger model at the server.
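One way to realize such a transfer, sketched below, is to distill the client ensemble into the server model on an unlabeled transfer set; the plain averaging of client logits here is a simplification standing in for Fed-ET's weighted consensus, not its exact rule.

```python
import torch
import torch.nn.functional as F

def distill_server_model(server_model, client_models, transfer_loader,
                         optimizer, temperature: float = 2.0):
    """Train the large server model to match the ensemble of small client
    models on an unlabeled transfer set."""
    for x in transfer_loader:
        with torch.no_grad():
            # Ensemble teacher: average the client models' logits.
            teacher_logits = torch.stack([m(x) for m in client_models]).mean(0)
        student_logits = server_model(x)
        loss = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```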
1 code implementation • 25 Mar 2022 • Mirian Hipolito Garcia, Andre Manoel, Daniel Madrigal Diaz, FatemehSadat Mireshghallah, Robert Sim, Dimitrios Dimitriadis
We compare the platform with other state-of-the-art platforms and describe available features of FLUTE for experimentation in core areas of active research, such as optimization, privacy, and scalability.
no code implementations • NAACL 2022 • FatemehSadat Mireshghallah, Vaishnavi Shrivastava, Milad Shokouhi, Taylor Berg-Kirkpatrick, Robert Sim, Dimitrios Dimitriadis
As such, these models are often unable to produce personalized responses for individual users based on their data.
no code implementations • ACL 2021 • Su Lin Blodgett, Gilsinia Lopez, Alexandra Olteanu, Robert Sim, Hanna Wallach
Auditing NLP systems for computational harms like surfacing stereotypes is an elusive goal.
no code implementations • NAACL 2021 • FatemehSadat Mireshghallah, Huseyin Inan, Marcello Hasegawa, Victor Rühle, Taylor Berg-Kirkpatrick, Robert Sim
In this work, we introduce two privacy-preserving regularization methods for training language models that enable joint optimization of utility and privacy through (1) the use of a discriminator and (2) the inclusion of a novel triplet-loss term.
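Method (2) can be sketched as a standard triplet term added to the language-modeling objective; the representation inputs, margin, and weighting below are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def lm_loss_with_triplet(logits, targets, anchor, positive, negative,
                         margin: float = 1.0, reg_weight: float = 0.1):
    """Joint objective: next-token cross-entropy plus a triplet term that
    pulls an anchor representation toward a 'positive' embedding and away
    from a 'negative' (e.g., private or rare) one."""
    ce = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
    triplet = F.triplet_margin_loss(anchor, positive, negative, margin=margin)
    return ce + reg_weight * triplet
```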
no code implementations • 27 May 2021 • Masoumeh Shafieinejad, Huseyin Inan, Marcello Hasegawa, Robert Sim
We propose a model that captures the correlation in the social network graph, and incorporates this correlation in the privacy calculations through Pufferfish privacy principles.
no code implementations • 12 Mar 2021 • FatemehSadat Mireshghallah, Huseyin A. Inan, Marcello Hasegawa, Victor Rühle, Taylor Berg-Kirkpatrick, Robert Sim
In this work, we introduce two privacy-preserving regularization methods for training language models that enable joint optimization of utility and privacy through (1) the use of a discriminator and (2) the inclusion of a triplet-loss term.
no code implementations • 14 Jan 2021 • Huseyin A. Inan, Osman Ramadan, Lukas Wutschitz, Daniel Jones, Victor Rühle, James Withers, Robert Sim
It has been demonstrated that the strong performance of language models comes with the ability to memorize rare training samples, which poses serious privacy threats when the model is trained on confidential user content.
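A common way to probe this kind of memorization is a canary check: plant a rare sequence in the training data and test whether the model completes it. The sketch below assumes a Hugging Face-style model and tokenizer with greedy decoding; it illustrates the threat, not the paper's specific methodology.

```python
def leaks_canary(model, tokenizer, prefix: str, secret: str) -> bool:
    """Check whether greedy decoding completes a rare training sample."""
    ids = tokenizer(prefix, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=len(tokenizer(secret).input_ids))
    completion = tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True)
    return secret in completion
```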
no code implementations • AACL 2020 • Xinya Du, Ahmed Hassan Awadallah, Adam Fourney, Robert Sim, Paul Bennett, Claire Cardie
We show that leveraging metadata information from web pages can improve the performance of models for answer passage selection/reranking.
no code implementations • 27 Jan 2020 • Maartje ter Hoeve, Robert Sim, Elnaz Nouri, Adam Fourney, Maarten de Rijke, Ryen W. White
Our contributions are three-fold: (1) We first present a survey to understand the space of document-centered assistance and the capabilities people expect in this scenario.