De-identification

37 papers with code • 0 benchmarks • 2 datasets

De-identification is the task of detecting privacy-related entities in text, such as person names, emails and contact data.
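
As a rough illustration of the detection step described above, the following minimal sketch finds emails and phone numbers with regular expressions and replaces them with placeholders. The patterns, the `PATTERNS` table, and the `detect_entities` and `redact` helpers are hypothetical names chosen for this example, not part of any listed paper or library. Person names are deliberately left out, since they generally need a trained sequence-labeling model rather than a pattern.

```python
import re

# Illustrative patterns only (assumption): real systems typically combine
# rules like these with trained NER models, especially for person names.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def detect_entities(text):
    """Return (start, end, label) spans for every pattern match, sorted by start."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def redact(text):
    """Replace each detected span with a [LABEL] placeholder."""
    out, cursor = [], 0
    for start, end, label in detect_entities(text):
        if start < cursor:
            # Skip a match that overlaps one already redacted.
            continue
        out.append(text[cursor:start])
        out.append(f"[{label}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


if __name__ == "__main__":
    print(redact("Contact jane.doe@example.org or +1 (555) 010-9999."))
```

Keeping detection (`detect_entities`) separate from replacement (`redact`) mirrors how the task is usually evaluated: the detected spans can be scored against gold annotations independently of how the text is later masked.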

Subtasks

Privacy Preserving Deep Learning

Most implemented papers

Ego4D: Around the World in 3,000 Hours of Egocentric Video

pyannote/pyannote-audio • CVPR 2022

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

Synthesis of Realistic ECG using Generative Adversarial Networks

Brophy-E/ECG_GAN_MBD • 19 Sep 2019

Finally, we discuss the privacy concerns associated with sharing synthetic data produced by GANs and test their ability to withstand a simple membership inference attack.

Face Identity Disentanglement via Latent Space Mapping

YotamNitzan/ID-disentanglement • 15 May 2020

Learning disentangled representations of data is a fundamental problem in artificial intelligence.

Publicly Available Clinical BERT Embeddings

EmilyAlsentzer/clinicalBERT • WS 2019

Contextual word embedding models such as ELMo (Peters et al., 2018) and BERT (Devlin et al., 2018) have dramatically improved performance for many natural language processing (NLP) tasks in recent months.

Speech Pseudonymisation Assessment Using Voice Similarity Matrices

Voice-Privacy-Challenge/Voice-Privacy-Challenge-2022 • 30 Aug 2020

The proliferation of speech technologies and rising privacy legislation call for the development of privacy preservation solutions for speech applications.

The Text Anonymization Benchmark (TAB): A Dedicated Corpus and Evaluation Framework for Text Anonymization

norskregnesentral/text-anonymization-benchmark • 25 Jan 2022

We present a novel benchmark and associated evaluation metrics for assessing the performance of text anonymization methods.

De-identification of Patient Notes with Recurrent Neural Networks

Franck-Dernoncourt/NeuroNER • 10 Jun 2016

It yields an F1-score of 97.85 on the i2b2 2014 dataset, with a recall of 97.38 and a precision of 97.32, and an F1-score of 99.23 on the MIMIC de-identification dataset, with a recall of 99.25 and a precision of 99.06.

Natural Language Generation for Electronic Health Records

scotthlee/nrc • 1 Jun 2018

A variety of methods exist for generating synthetic electronic health records (EHRs), but they are not capable of generating unstructured text, like emergency department (ED) chief complaints, history of present illness or progress notes.
