no code implementations • 15 Feb 2018 • Congzheng Song, Vitaly Shmatikov
We demonstrate that state-of-the-art optical character recognition (OCR) based on deep learning is vulnerable to adversarial images.
no code implementations • 31 Jan 2018 • Congzheng Song, Yiming Sun
Gaussian processes (GPs) are flexible models that can capture complex structure in large-scale datasets due to their non-parametric nature.
1 code implementation • 27 Sep 2016 • Safoora Yousefi, Congzheng Song, Nelson Nauata, Lee Cooper
Genomics is rapidly transforming medical practice and basic biomedical research, providing insights into disease mechanisms and improving therapeutic strategies, particularly in cancer.
no code implementations • 15 Mar 2018 • Tyler Hunt, Congzheng Song, Reza Shokri, Vitaly Shmatikov, Emmett Witchel
Existing ML-as-a-service platforms require users to reveal all training data to the service operator.
no code implementations • ICLR 2020 • Congzheng Song, Vitaly Shmatikov
For example, a binary gender classifier of facial images also learns to recognize races (even races that are not represented in the training data) and identities.
no code implementations • 28 Sep 2019 • Congzheng Song, Shanghang Zhang, Najmeh Sadoughi, Pengtao Xie, Eric Xing
The International Classification of Diseases (ICD) is a list of classification codes for diagnoses.
no code implementations • 27 Sep 2019 • Congzheng Song, Reza Shokri
In this paper, we present membership encoding for training deep neural networks while encoding the membership information, i.e., whether a data point is used for training, for a subset of the training data.
no code implementations • 31 Mar 2020 • Congzheng Song, Ananth Raghunathan
We demonstrate that embeddings, in addition to encoding generic semantics, often also present a vector that leaks sensitive information about the input data.
no code implementations • 5 Jul 2020 • Roei Schuster, Congzheng Song, Eran Tromer, Vitaly Shmatikov
We demonstrate that neural code autocompleters are vulnerable to poisoning attacks.
no code implementations • 15 Mar 2022 • Eugene Bagdasaryan, Congzheng Song, Rogier Van Dalen, Matt Seigel, Áine Cahill
During private federated learning of the language model, we sample from the model, train a new tokenizer on the sampled sequences, and update the model embeddings.
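The sample-retrain-update loop described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the unigram "model", the frequency-based tokenizer, and the helper names (`sample_sequences`, `retrain_tokenizer`, `update_embeddings`) are all assumptions standing in for a real language model and subword tokenizer.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def sample_sequences(vocab, probs, n=200, length=5):
    # Stand-in for sampling from the trained language model:
    # draw token sequences from a unigram distribution.
    return [list(rng.choice(vocab, size=length, p=probs)) for _ in range(n)]

def retrain_tokenizer(sequences, vocab_size=3):
    # Train a "new tokenizer" on the sampled sequences: here, simply
    # keep the most frequent sampled tokens as the new vocabulary.
    counts = Counter(tok for seq in sequences for tok in seq)
    return [tok for tok, _ in counts.most_common(vocab_size)]

def update_embeddings(old_vocab, old_emb, new_vocab, dim):
    # Update the model's embedding table for the new vocabulary:
    # reuse rows for tokens that survive, initialize the rest randomly.
    new_emb = rng.normal(0.0, 0.02, size=(len(new_vocab), dim))
    old_index = {tok: i for i, tok in enumerate(old_vocab)}
    for j, tok in enumerate(new_vocab):
        if tok in old_index:
            new_emb[j] = old_emb[old_index[tok]]
    return new_emb

old_vocab = ["the", "cat", "sat", "rare"]
probs = [0.5, 0.3, 0.19, 0.01]
old_emb = rng.normal(size=(4, 8))

samples = sample_sequences(old_vocab, probs)
new_vocab = retrain_tokenizer(samples)
new_emb = update_embeddings(old_vocab, old_emb, new_vocab, dim=8)
```

Because the new tokenizer is trained only on sequences sampled from the (privately trained) model rather than on raw user data, the procedure avoids touching user text directly.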
no code implementations • 18 Jul 2022 • MingBin Xu, Congzheng Song, Ye Tian, Neha Agrawal, Filip Granqvist, Rogier Van Dalen, Xiao Zhang, Arturo Argueta, Shiyi Han, Yaqiao Deng, Leo Liu, Anmol Walia, Alex Jin
Our goal is to train a large neural network language model (NNLM) on compute-constrained devices while preserving privacy using FL and DP.
no code implementations • 14 Jul 2023 • Tatsuki Koga, Congzheng Song, Martin Pelikan, Mona Chitnis
Federated learning (FL) combined with differential privacy (DP) offers machine learning (ML) training with distributed devices and with a formal privacy guarantee.
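A common way FL and DP are combined is via the Gaussian mechanism on aggregated client updates: clip each update to bound sensitivity, sum, and add calibrated noise. The sketch below is a generic DP federated-averaging round, not this paper's specific protocol; the function name and parameters are illustrative.

```python
import numpy as np

def dp_fedavg_round(client_updates, clip_norm=1.0, noise_multiplier=1.0, seed=0):
    # One round of federated averaging with the Gaussian mechanism:
    # clip each client's update to L2 norm <= clip_norm (bounding each
    # client's contribution), sum, add Gaussian noise, then average.
    rng = np.random.default_rng(seed)
    clipped = []
    for u in client_updates:
        norm = np.linalg.norm(u)
        clipped.append(u * min(1.0, clip_norm / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(client_updates)

updates = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]
noisy_avg = dp_fedavg_round(updates)
```

The clipping norm and noise multiplier jointly determine the formal (epsilon, delta) privacy guarantee, which is tracked across rounds with a privacy accountant.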
no code implementations • 27 Jul 2023 • Kunal Talwar, Shan Wang, Audra McMillan, Vojta Jina, Vitaly Feldman, Bailey Basile, Aine Cahill, Yi Sheng Chan, Mike Chatzidakis, Junye Chen, Oliver Chick, Mona Chitnis, Suman Ganta, Yusuf Goren, Filip Granqvist, Kristine Guo, Frederic Jacobs, Omid Javidbakht, Albert Liu, Richard Low, Dan Mascenik, Steve Myers, David Park, Wonhee Park, Gianni Parsa, Tommy Pauly, Christian Priebe, Rehan Rishi, Guy Rothblum, Michael Scaria, Linmao Song, Congzheng Song, Karl Tarbe, Sebastian Vogt, Luke Winstrom, Shundong Zhou
We revisit the problem of designing scalable protocols for private statistics and private federated learning when each device holds its private data.
no code implementations • 14 Feb 2024 • Tao Yu, Congzheng Song, Jianyu Wang, Mona Chitnis
Asynchronous protocols have been shown to improve the scalability of federated learning (FL) with a massive number of clients.
2 code implementations • 1 Nov 2018 • Congzheng Song, Vitaly Shmatikov
To help enforce data-protection regulations such as GDPR and detect unauthorized uses of personal data, we develop a new model auditing technique that helps users check if their data was used to train a machine learning model.
1 code implementation • EMNLP 2020 • Congzheng Song, Alexander M. Rush, Vitaly Shmatikov
We study semantic collisions: texts that are semantically unrelated but judged as similar by NLP models.
1 code implementation • 22 Sep 2017 • Congzheng Song, Thomas Ristenpart, Vitaly Shmatikov
In this setting, we design and implement practical algorithms, some of them very similar to standard ML techniques such as regularization and data augmentation, that "memorize" information about the training dataset in the model, while the model remains as accurate and predictive as a conventionally trained one.
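One white-box variant of such memorization can be illustrated by encoding secret bits in the signs of model parameters. The sketch below is a toy illustration, not the paper's implementation; the helper names `encode_bits_in_signs` and `decode_bits_from_signs` are hypothetical.

```python
import numpy as np

def encode_bits_in_signs(params, bits):
    # Toy sign encoding: force the sign of each of the first len(bits)
    # parameters to carry one bit (positive = 1, negative = 0).
    # Magnitudes are preserved, so model accuracy is barely affected.
    out = params.copy()
    for i, b in enumerate(bits):
        mag = abs(out[i])
        out[i] = mag if b else -mag
    return out

def decode_bits_from_signs(params, n):
    # Anyone with white-box access can read the bits back from the signs.
    return [1 if p > 0 else 0 for p in params[:n]]

weights = np.array([0.5, -0.2, 0.3, -0.7])
secret = [1, 1, 0, 0]
encoded = encode_bits_in_signs(weights, secret)
recovered = decode_bits_from_signs(encoded, len(secret))
```

Because only signs change, the parameter magnitudes (and hence most of the model's behavior) are preserved, which is what makes this kind of covert channel hard to notice.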
1 code implementation • 10 May 2018 • Luca Melis, Congzheng Song, Emiliano De Cristofaro, Vitaly Shmatikov
First, we show that an adversarial participant can infer the presence of exact data points -- for example, specific locations -- in others' training data (i.e., membership inference).
1 code implementation • 18 Jul 2022 • Congzheng Song, Filip Granqvist, Kunal Talwar
We believe FLAIR can serve as a challenging benchmark for advancing the state of the art in federated learning.
1 code implementation • 9 Apr 2024 • Filip Granqvist, Congzheng Song, Áine Cahill, Rogier Van Dalen, Martin Pelikan, Yi Sheng Chan, Xiaojun Feng, Natarajan Krishnaswami, Vojta Jina, Mona Chitnis
Federated learning (FL) is an emerging machine learning (ML) training paradigm where clients own their data and collaborate to train a global model, without revealing any data to the server or other participants.
11 code implementations • 18 Oct 2016 • Reza Shokri, Marco Stronati, Congzheng Song, Vitaly Shmatikov
We quantitatively investigate how machine learning models leak information about the individual data records on which they were trained.
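The intuition the attack exploits is that models tend to be more confident on their training members than on unseen points. The paper's full attack trains shadow models and an attack classifier; the sketch below is only a simplified confidence-thresholding variant on toy data, with an assumed helper name `confidence_threshold_attack`.

```python
import numpy as np

def confidence_threshold_attack(confidences, threshold=0.9):
    # Simplified membership test: predict "member" when the target
    # model's confidence on its predicted class exceeds a threshold.
    return confidences > threshold

# Toy confidences: overfit models are typically more confident on
# training members than on non-members.
member_conf = np.array([0.99, 0.95, 0.97])
nonmember_conf = np.array([0.60, 0.70, 0.85])

member_guess = confidence_threshold_attack(member_conf)
nonmember_guess = confidence_threshold_attack(nonmember_conf)
```

The gap between the two confidence distributions is exactly what the shadow-model attack learns to exploit, and it grows with overfitting.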