4 dataset results for Weakly-Supervised Named Entity Recognition AND English

CoNLL 2003

CoNLL-2003 is a named entity recognition dataset released as part of the CoNLL-2003 shared task on language-independent named entity recognition. The data consists of eight files covering two languages, English and German. For each language there is a training file, a development file, a test file, and a large file of unannotated data.

648 PAPERS • 16 BENCHMARKS

OntoNotes 5.0

OntoNotes 5.0 is a large corpus comprising various genres of text (news, conversational telephone speech, weblogs, Usenet newsgroups, broadcast, talk shows) in three languages (English, Chinese, and Arabic) with structural information (syntax and predicate-argument structure) and shallow semantics (word sense linked to an ontology and coreference).

240 PAPERS • 11 BENCHMARKS

BC5CDR (BioCreative V CDR corpus)

The BC5CDR corpus consists of 1,500 PubMed articles with 4,409 annotated chemicals, 5,818 diseases, and 3,116 chemical-disease interactions.

176 PAPERS • 6 BENCHMARKS

ShARe/CLEF 2014: Task 2 Disorders

3 PAPERS • 2 BENCHMARKS
