De-identification
39 papers with code • 0 benchmarks • 2 datasets
De-identification is the task of detecting privacy-related entities in text, such as person names, emails and contact data.
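As a rough illustration of the task, the sketch below detects two kinds of contact data with regular expressions and replaces them with placeholders. It is a minimal sketch rather than a method from any paper listed here: the patterns `EMAIL_RE` and `PHONE_RE` and the helpers `find_entities` and `redact` are hypothetical names chosen for this example, and the patterns cover only simple cases. Person names generally need a trained sequence-labelling model and are not handled.

```python
import re

# Hypothetical patterns for illustration only; real systems need far broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def find_entities(text):
    """Return (start, end, label) spans for every pattern match, sorted by position."""
    spans = []
    for label, pattern in (("EMAIL", EMAIL_RE), ("PHONE", PHONE_RE)):
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def redact(text):
    """Replace each detected span with a [LABEL] placeholder."""
    pieces, cursor = [], 0
    for start, end, label in find_entities(text):
        if start < cursor:
            # Skip a span that overlaps one already redacted.
            continue
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    print(redact("Contact jane.doe@example.com or call +1 555-123-4567."))
```

Keeping detection (`find_entities`) separate from replacement (`redact`) means the same spans can be scored against gold annotations or replaced with surrogate values instead of placeholders.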
Most implemented papers
Ego4D: Around the World in 3,000 Hours of Egocentric Video
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.
Synthesis of Realistic ECG using Generative Adversarial Networks
Finally, we discuss the privacy concerns associated with sharing synthetic data produced by GANs and test their ability to withstand a simple membership inference attack.
Face Identity Disentanglement via Latent Space Mapping
Learning disentangled representations of data is a fundamental problem in artificial intelligence.
Publicly Available Clinical BERT Embeddings
Contextual word embedding models such as ELMo (Peters et al., 2018) and BERT (Devlin et al., 2018) have dramatically improved performance for many natural language processing (NLP) tasks in recent months.
Speech Pseudonymisation Assessment Using Voice Similarity Matrices
The proliferation of speech technologies and rising privacy legislation call for the development of privacy preservation solutions for speech applications.
The Text Anonymization Benchmark (TAB): A Dedicated Corpus and Evaluation Framework for Text Anonymization
We present a novel benchmark and associated evaluation metrics for assessing the performance of text anonymization methods.
De-identification of Patient Notes with Recurrent Neural Networks
It yields an F1-score of 97.85 on the i2b2 2014 dataset, with a recall of 97.38 and a precision of 98.32, and an F1-score of 99.23 on the MIMIC de-identification dataset, with a recall of 99.25 and a precision of 99.21.
Natural Language Generation for Electronic Health Records
A variety of methods exist for generating synthetic electronic health records (EHRs), but they are not capable of generating unstructured text, such as emergency department (ED) chief complaints, history of present illness, or progress notes.
DEDUCE: A pattern matching method for automatic de-identification of Dutch medical text
In order to use medical text for research purposes, it is necessary to de-identify the text for legal and privacy reasons.
Towards Automatic Generation of Shareable Synthetic Clinical Notes Using Neural Language Models
Large-scale clinical data is invaluable to driving many computational scientific advances today.