28 papers with code • 0 benchmarks • 1 datasets
De-identification is the task of detecting privacy-related entities in text, such as person names, emails and contact data.
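As a rough illustration of the task described above, the following sketch detects and masks two kinds of contact data with regular expressions. The patterns, the `find_entities` and `deidentify` helpers, and the placeholder labels are hypothetical and chosen for illustration only. Person names generally cannot be found this way and usually need a trained named-entity recognition model.

```python
import re

# Hypothetical, deliberately simple patterns (assumptions for illustration).
# Real de-identification systems typically combine rules with trained models.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def find_entities(text):
    """Return sorted, non-overlapping (start, end, label) spans."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    spans.sort()

    # Keep the earliest span whenever two candidate spans overlap.
    result = []
    last_end = -1
    for start, end, label in spans:
        if start >= last_end:
            result.append((start, end, label))
            last_end = end
    return result


def deidentify(text):
    """Replace every detected span with a bracketed placeholder label."""
    pieces = []
    cursor = 0
    for start, end, label in find_entities(text):
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


sample = "Contact Jane at jane.doe@example.com or +1 555-123-4567."
print(deidentify(sample))
```

In this sample the email address and phone number are masked, while the name "Jane" is left untouched, which is the gap that learned models are meant to close.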
Finally, we discuss the privacy concerns associated with sharing synthetic data produced by GANs and test their ability to withstand a simple membership inference attack.
The proliferation of speech technologies and rising privacy legislation call for the development of privacy preservation solutions for speech applications.
The Text Anonymization Benchmark (TAB): A Dedicated Corpus and Evaluation Framework for Text Anonymization
We present a novel benchmark and associated evaluation metrics for assessing the performance of text anonymization methods.
It yields an F1-score of 97.85 on the i2b2 2014 dataset, with a recall of 97.38 and a precision of 97.32, and an F1-score of 99.23 on the MIMIC de-identification dataset, with a recall of 99.25 and a precision of 99.06.
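For readers unfamiliar with these metrics, here is a minimal sketch of how precision, recall and F1 can be computed for de-identification output. It assumes exact-match scoring over `(start, end, label)` spans. The snippet above does not state whether its scores are token-level or entity-level, so this is one possible protocol, not the one used for the reported numbers. The function name and the toy spans are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted spans against gold spans using exact matching.

    Assumption: a prediction counts as correct only if its start offset,
    end offset and label all match a gold annotation.
    """
    gold_set = set(gold)
    pred_set = set(predicted)
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy annotations (made up for illustration).
gold = [(0, 8, "NAME"), (20, 30, "DATE"), (45, 57, "PHONE"), (60, 70, "EMAIL")]
pred = [(0, 8, "NAME"), (20, 30, "DATE"), (44, 57, "PHONE")]

p, r, f1 = precision_recall_f1(gold, pred)
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

Recall is usually the more critical of the two in de-identification, since every missed entity is a potential privacy leak.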
A variety of methods exist for generating synthetic electronic health records (EHRs), but they cannot generate unstructured text, such as emergency department (ED) chief complaints, histories of present illness, or progress notes.
In order to use medical text for research purposes, it is necessary to de-identify the text for legal and privacy reasons.
Large-scale clinical data is invaluable for driving many of today's computational scientific advances.