Search Results for author: Robert Sim

Found 22 papers, 5 papers with code

Privacy Leakage in Text Classification: A Data Extraction Approach

no code implementations • NAACL (PrivateNLP) 2022 • Adel Elmahdy, Huseyin A. Inan, Robert Sim

Recent work has demonstrated the successful extraction of training data from generative language models.

Differentially Private Training of Mixture of Experts Models

no code implementations • 11 Feb 2024 • Pierre Tholoniat, Huseyin A. Inan, Janardhan Kulkarni, Robert Sim

This position paper investigates the integration of Differential Privacy (DP) in the training of Mixture of Experts (MoE) models within the field of natural language processing.

Computational Efficiency Privacy Preserving

Privately Aligning Language Models with Reinforcement Learning

no code implementations • 25 Oct 2023 • Fan Wu, Huseyin A. Inan, Arturs Backurs, Varun Chandrasekaran, Janardhan Kulkarni, Robert Sim

Positioned between pre-training and user deployment, aligning large language models (LLMs) through reinforcement learning (RL) has emerged as a prevailing strategy for training instruction-following models such as ChatGPT.

Instruction Following Privacy Preserving +3

Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation

no code implementations • 4 Oct 2023 • Chen Dun, Mirian Hipolito Garcia, Guoqing Zheng, Ahmed Hassan Awadallah, Anastasios Kyrillidis, Robert Sim

Large Language Models (LLMs) have the ability to solve a variety of tasks, such as text summarization and mathematical questions, just out of the box, but they are often trained with a single task in mind.

Model Compression Text Summarization

Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation

1 code implementation • 21 Sep 2023 • Xinyu Tang, Richard Shin, Huseyin A. Inan, Andre Manoel, FatemehSadat Mireshghallah, Zinan Lin, Sivakanth Gopi, Janardhan Kulkarni, Robert Sim

Our results demonstrate that our algorithm can achieve competitive performance with strong privacy levels.

In-Context Learning Privacy Preserving

Project Florida: Federated Learning Made Easy

no code implementations • 21 Jul 2023 • Daniel Madrigal Diaz, Andre Manoel, Jialei Chen, Nalin Singal, Robert Sim

Federated learning enables model training across devices and silos while the training data remains within its security boundary, by distributing a model snapshot to a client running inside the boundary, running client code to update the model, and then aggregating updated snapshots across many clients in a central orchestrator.

Federated Learning Management

FedJETs: Efficient Just-In-Time Personalization with Federated Mixture of Experts

no code implementations • 14 Jun 2023 • Chen Dun, Mirian Hipolito Garcia, Guoqing Zheng, Ahmed Hassan Awadallah, Robert Sim, Anastasios Kyrillidis, Dimitrios Dimitriadis

Our gating function harnesses the knowledge of a pretrained model common expert to enhance its routing decisions on-the-fly.

Federated Learning

Analyzing Leakage of Personally Identifiable Information in Language Models

1 code implementation • 1 Feb 2023 • Nils Lukas, Ahmed Salem, Robert Sim, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin

Understanding the risk of LMs leaking Personally Identifiable Information (PII) has received less attention, which can be attributed to the false assumption that dataset curation techniques such as scrubbing are sufficient to prevent PII leakage.

Sentence

TrojanPuzzle: Covertly Poisoning Code-Suggestion Models

1 code implementation • 6 Jan 2023 • Hojjat Aghakhani, Wei Dai, Andre Manoel, Xavier Fernandes, Anant Kharkar, Christopher Kruegel, Giovanni Vigna, David Evans, Ben Zorn, Robert Sim

To achieve this, prior attacks explicitly inject the insecure code payload into the training data, making the poison data detectable by static analysis tools that can remove such malicious data from the training set.

Data Poisoning

Federated Multilingual Models for Medical Transcript Analysis

no code implementations • 4 Nov 2022 • Andre Manoel, Mirian Hipolito Garcia, Tal Baumel, Shize Su, Jialei Chen, Dan Miller, Danny Karmon, Robert Sim, Dimitrios Dimitriadis

Federated Learning (FL) is a novel machine learning approach that allows the model trainer to access more data samples, by training the model across multiple decentralized data sources, while data access constraints are in place.

Federated Learning

Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe

1 code implementation • 25 Oct 2022 • Xiang Yue, Huseyin A. Inan, Xuechen Li, Girish Kumar, Julia McAnallen, Hoda Shajari, Huan Sun, David Levitan, Robert Sim

Privacy concerns have attracted increasing attention in data-driven products due to the tendency of machine learning models to memorize sensitive training data.

Language Modelling Text Generation

Privacy Leakage in Text Classification: A Data Extraction Approach

no code implementations • 9 Jun 2022 • Adel Elmahdy, Huseyin A. Inan, Robert Sim

Recent work has demonstrated the successful extraction of training data from generative language models.

Memorization text-classification +1

Heterogeneous Ensemble Knowledge Transfer for Training Large Models in Federated Learning

no code implementations • 27 Apr 2022 • Yae Jee Cho, Andre Manoel, Gauri Joshi, Robert Sim, Dimitrios Dimitriadis

In this work, we propose a novel ensemble knowledge transfer method named Fed-ET in which small models (different in architecture) are trained on clients, and used to train a larger model at the server.

Ensemble Learning Federated Learning +1

FLUTE: A Scalable, Extensible Framework for High-Performance Federated Learning Simulations

1 code implementation • 25 Mar 2022 • Mirian Hipolito Garcia, Andre Manoel, Daniel Madrigal Diaz, FatemehSadat Mireshghallah, Robert Sim, Dimitrios Dimitriadis

We compare the platform with other state-of-the-art platforms and describe available features of FLUTE for experimentation in core areas of active research, such as optimization, privacy, and scalability.

Federated Learning Quantization +3

UserIdentifier: Implicit User Representations for Simple and Effective Personalized Sentiment Analysis

no code implementations • NAACL 2022 • FatemehSadat Mireshghallah, Vaishnavi Shrivastava, Milad Shokouhi, Taylor Berg-Kirkpatrick, Robert Sim, Dimitrios Dimitriadis

As such, these models are often unable to produce personalized responses for individual users, based on their data.

Few-Shot Learning Sentiment Analysis

Stereotyping Norwegian Salmon: An Inventory of Pitfalls in Fairness Benchmark Datasets

no code implementations • ACL 2021 • Su Lin Blodgett, Gilsinia Lopez, Alexandra Olteanu, Robert Sim, Hanna Wallach

Auditing NLP systems for computational harms like surfacing stereotypes is an elusive goal.

coreference-resolution Fairness +1

Privacy Regularization: Joint Privacy-Utility Optimization in Language Models

no code implementations • NAACL 2021 • FatemehSadat Mireshghallah, Huseyin Inan, Marcello Hasegawa, Victor Rühle, Taylor Berg-Kirkpatrick, Robert Sim

In this work, we introduce two privacy-preserving regularization methods for training language models that enable joint optimization of utility and privacy through (1) the use of a discriminator and (2) the inclusion of a novel triplet-loss term.

Memorization Privacy Preserving

On Privacy and Confidentiality of Communications in Organizational Graphs

no code implementations • 27 May 2021 • Masoumeh Shafieinejad, Huseyin Inan, Marcello Hasegawa, Robert Sim

We propose a model that captures the correlation in the social network graph, and incorporates this correlation in the privacy calculations through Pufferfish privacy principles.

Language Modelling

Privacy Regularization: Joint Privacy-Utility Optimization in Language Models

no code implementations • 12 Mar 2021 • FatemehSadat Mireshghallah, Huseyin A. Inan, Marcello Hasegawa, Victor Rühle, Taylor Berg-Kirkpatrick, Robert Sim

Memorization Privacy Preserving

Training Data Leakage Analysis in Language Models

no code implementations • 14 Jan 2021 • Huseyin A. Inan, Osman Ramadan, Lukas Wutschitz, Daniel Jones, Victor Rühle, James Withers, Robert Sim

It has been demonstrated that strong performance of language models comes along with the ability to memorize rare training samples, which poses serious privacy threats in case the model is trained on confidential user content.

Sentence

Leveraging Structured Metadata for Improving Question Answering on the Web

no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Xinya Du, Ahmed Hassan Awadallah, Adam Fourney, Robert Sim, Paul Bennett, Claire Cardie

We show that leveraging metadata information from web pages can improve the performance of models for answer passage selection/reranking.

Question Answering

Conversations with Documents. An Exploration of Document-Centered Assistance

no code implementations • 27 Jan 2020 • Maartje ter Hoeve, Robert Sim, Elnaz Nouri, Adam Fourney, Maarten de Rijke, Ryen W. White

Our contributions are three-fold: (1) We first present a survey to understand the space of document-centered assistance and the capabilities people expect in this scenario.
