Search Results for author: Robert Sim

Found 22 papers, 5 papers with code

Differentially Private Training of Mixture of Experts Models

no code implementations 11 Feb 2024 Pierre Tholoniat, Huseyin A. Inan, Janardhan Kulkarni, Robert Sim

This position paper investigates the integration of Differential Privacy (DP) in the training of Mixture of Experts (MoE) models within the field of natural language processing.

Computational Efficiency Privacy Preserving

Privately Aligning Language Models with Reinforcement Learning

no code implementations 25 Oct 2023 Fan Wu, Huseyin A. Inan, Arturs Backurs, Varun Chandrasekaran, Janardhan Kulkarni, Robert Sim

Positioned between pre-training and user deployment, aligning large language models (LLMs) through reinforcement learning (RL) has emerged as a prevailing strategy for training instruction-following models such as ChatGPT.

Instruction Following Privacy Preserving +3

Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation

no code implementations 4 Oct 2023 Chen Dun, Mirian Hipolito Garcia, Guoqing Zheng, Ahmed Hassan Awadallah, Anastasios Kyrillidis, Robert Sim

Large Language Models (LLMs) have the ability to solve a variety of tasks, such as text summarization and mathematical questions, just out of the box, but they are often trained with a single task in mind.

Model Compression Text Summarization

Project Florida: Federated Learning Made Easy

no code implementations 21 Jul 2023 Daniel Madrigal Diaz, Andre Manoel, Jialei Chen, Nalin Singal, Robert Sim

Federated learning enables model training across devices and silos while the training data remains within its security boundary, by distributing a model snapshot to a client running inside the boundary, running client code to update the model, and then aggregating updated snapshots across many clients in a central orchestrator.

Federated Learning Management

Analyzing Leakage of Personally Identifiable Information in Language Models

1 code implementation 1 Feb 2023 Nils Lukas, Ahmed Salem, Robert Sim, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin

Understanding the risk of LMs leaking Personally Identifiable Information (PII) has received less attention, which can be attributed to the false assumption that dataset curation techniques such as scrubbing are sufficient to prevent PII leakage.

Sentence

TrojanPuzzle: Covertly Poisoning Code-Suggestion Models

1 code implementation 6 Jan 2023 Hojjat Aghakhani, Wei Dai, Andre Manoel, Xavier Fernandes, Anant Kharkar, Christopher Kruegel, Giovanni Vigna, David Evans, Ben Zorn, Robert Sim

To achieve this, prior attacks explicitly inject the insecure code payload into the training data, making the poison data detectable by static analysis tools that can remove such malicious data from the training set.

Data Poisoning

Federated Multilingual Models for Medical Transcript Analysis

no code implementations 4 Nov 2022 Andre Manoel, Mirian Hipolito Garcia, Tal Baumel, Shize Su, Jialei Chen, Dan Miller, Danny Karmon, Robert Sim, Dimitrios Dimitriadis

Federated Learning (FL) is a novel machine learning approach that allows the model trainer to access more data samples, by training the model across multiple decentralized data sources, while data access constraints are in place.

Federated Learning

Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe

1 code implementation 25 Oct 2022 Xiang Yue, Huseyin A. Inan, Xuechen Li, Girish Kumar, Julia McAnallen, Hoda Shajari, Huan Sun, David Levitan, Robert Sim

Privacy concerns have attracted increasing attention in data-driven products due to the tendency of machine learning models to memorize sensitive training data.

Language Modelling Text Generation

Privacy Leakage in Text Classification: A Data Extraction Approach

no code implementations 9 Jun 2022 Adel Elmahdy, Huseyin A. Inan, Robert Sim

Recent work has demonstrated the successful extraction of training data from generative language models.

Memorization text-classification +1

Heterogeneous Ensemble Knowledge Transfer for Training Large Models in Federated Learning

no code implementations 27 Apr 2022 Yae Jee Cho, Andre Manoel, Gauri Joshi, Robert Sim, Dimitrios Dimitriadis

In this work, we propose a novel ensemble knowledge transfer method named Fed-ET in which small models (different in architecture) are trained on clients, and used to train a larger model at the server.

Ensemble Learning Federated Learning +1

FLUTE: A Scalable, Extensible Framework for High-Performance Federated Learning Simulations

1 code implementation 25 Mar 2022 Mirian Hipolito Garcia, Andre Manoel, Daniel Madrigal Diaz, FatemehSadat Mireshghallah, Robert Sim, Dimitrios Dimitriadis

We compare the platform with other state-of-the-art platforms and describe available features of FLUTE for experimentation in core areas of active research, such as optimization, privacy, and scalability.

Federated Learning Quantization +3

Privacy Regularization: Joint Privacy-Utility Optimization in Language Models

no code implementations NAACL 2021 FatemehSadat Mireshghallah, Huseyin Inan, Marcello Hasegawa, Victor Rühle, Taylor Berg-Kirkpatrick, Robert Sim

In this work, we introduce two privacy-preserving regularization methods for training language models that enable joint optimization of utility and privacy through (1) the use of a discriminator and (2) the inclusion of a novel triplet-loss term.

Memorization Privacy Preserving

On Privacy and Confidentiality of Communications in Organizational Graphs

no code implementations 27 May 2021 Masoumeh Shafieinejad, Huseyin Inan, Marcello Hasegawa, Robert Sim

We propose a model that captures the correlation in the social network graph, and incorporates this correlation in the privacy calculations through Pufferfish privacy principles.

Language Modelling

Privacy Regularization: Joint Privacy-Utility Optimization in Language Models

no code implementations 12 Mar 2021 FatemehSadat Mireshghallah, Huseyin A. Inan, Marcello Hasegawa, Victor Rühle, Taylor Berg-Kirkpatrick, Robert Sim

In this work, we introduce two privacy-preserving regularization methods for training language models that enable joint optimization of utility and privacy through (1) the use of a discriminator and (2) the inclusion of a triplet-loss term.

Memorization Privacy Preserving

Training Data Leakage Analysis in Language Models

no code implementations 14 Jan 2021 Huseyin A. Inan, Osman Ramadan, Lukas Wutschitz, Daniel Jones, Victor Rühle, James Withers, Robert Sim

It has been demonstrated that the strong performance of language models comes with the ability to memorize rare training samples, which poses serious privacy threats when the model is trained on confidential user content.

Sentence

Conversations with Documents. An Exploration of Document-Centered Assistance

no code implementations 27 Jan 2020 Maartje ter Hoeve, Robert Sim, Elnaz Nouri, Adam Fourney, Maarten de Rijke, Ryen W. White

Our contributions are three-fold: (1) We first present a survey to understand the space of document-centered assistance and the capabilities people expect in this scenario.
