De-identification

37 papers with code • 0 benchmarks • 2 datasets

De-identification is the task of detecting privacy-related entities in text, such as person names, emails and contact data.
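As a rough illustration of the task, the sketch below detects two kinds of contact data with regular expressions and replaces them with placeholder tags. The patterns, the label names, and the helpers `detect_entities` and `redact` are assumptions made for this example and do not come from any of the listed papers. Person names usually cannot be found reliably this way and are typically handled with name lists or learned sequence-labelling models.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
# Real de-identification systems use much richer rules or trained models.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def detect_entities(text):
    """Return (start, end, label, surface) tuples, sorted by start offset."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.end(), label, match.group()))
    return sorted(found)


def redact(text):
    """Replace every detected entity with a bracketed label such as [EMAIL]."""
    redacted = text
    # Work from the end of the string so earlier offsets stay valid.
    for start, end, label, _ in reversed(detect_entities(text)):
        redacted = redacted[:start] + f"[{label}]" + redacted[end:]
    return redacted


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com or call +1 555-123-4567."
    print(redact(sample))
```

This sketch does not resolve overlapping matches between patterns, which a practical system would need to handle.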

Most implemented papers

Ego4D: Around the World in 3,000 Hours of Egocentric Video

pyannote/pyannote-audio CVPR 2022

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

Synthesis of Realistic ECG using Generative Adversarial Networks

Brophy-E/ECG_GAN_MBD 19 Sep 2019

Finally, we discuss the privacy concerns associated with sharing synthetic data produced by GANs and test their ability to withstand a simple membership inference attack.

Face Identity Disentanglement via Latent Space Mapping

YotamNitzan/ID-disentanglement 15 May 2020

Learning disentangled representations of data is a fundamental problem in artificial intelligence.

Publicly Available Clinical BERT Embeddings

EmilyAlsentzer/clinicalBERT WS 2019

Contextual word embedding models such as ELMo (Peters et al., 2018) and BERT (Devlin et al., 2018) have dramatically improved performance for many natural language processing (NLP) tasks in recent months.

Speech Pseudonymisation Assessment Using Voice Similarity Matrices

Voice-Privacy-Challenge/Voice-Privacy-Challenge-2022 30 Aug 2020

The proliferation of speech technologies and rising privacy legislation call for the development of privacy preservation solutions for speech applications.

The Text Anonymization Benchmark (TAB): A Dedicated Corpus and Evaluation Framework for Text Anonymization

norskregnesentral/text-anonymization-benchmark 25 Jan 2022

We present a novel benchmark and associated evaluation metrics for assessing the performance of text anonymization methods.

De-identification of Patient Notes with Recurrent Neural Networks

Franck-Dernoncourt/NeuroNER 10 Jun 2016

It yields an F1-score of 97.85 on the i2b2 2014 dataset, with a recall of 97.38 and a precision of 97.32, and an F1-score of 99.23 on the MIMIC de-identification dataset, with a recall of 99.25 and a precision of 99.06.
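For readers unfamiliar with these metrics, the sketch below shows one common way to compute precision, recall and F1 for de-identification output. It assumes exact-match scoring over `(start, end, label)` spans. The excerpt above does not state the paper's evaluation protocol, which may instead be token-based, so this is an illustration, not the authors' procedure. The function name and the toy spans are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match precision, recall and F1 over sets of (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy annotations: the predicted PHONE span is off by one character,
    # so under exact matching it counts as an error.
    gold = {(0, 8, "NAME"), (20, 30, "DATE"), (40, 52, "PHONE"), (60, 70, "NAME")}
    predicted = {(0, 8, "NAME"), (20, 30, "DATE"), (41, 52, "PHONE")}
    p, r, f1 = precision_recall_f1(gold, predicted)
    print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

In de-identification, recall is often weighted more heavily in practice, because a missed identifier leaks private information while a false positive only removes some non-sensitive text.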

Natural Language Generation for Electronic Health Records

scotthlee/nrc 1 Jun 2018

A variety of methods exist for generating synthetic electronic health records (EHRs), but they are not capable of generating unstructured text, like emergency department (ED) chief complaints, history of present illness or progress notes.

DEDUCE: A pattern matching method for automatic de-identification of Dutch medical text

vmenger/deduce Telematics and Informatics 2018

In order to use medical text for research purposes, it is necessary to de-identify the text for legal and privacy reasons.

Towards Automatic Generation of Shareable Synthetic Clinical Notes Using Neural Language Models

orenmel/synth-clinical-notes WS 2019

Large-scale clinical data is invaluable to driving many computational scientific advances today.