SCIREX is a document-level IE dataset that encompasses multiple IE tasks, including salient entity identification and document-level N-ary relation identification from scientific articles. The dataset is annotated by integrating automatic and human annotations, leveraging existing scientific knowledge resources.
32 PAPERS • 2 BENCHMARKS
BioRED is a first-of-its-kind biomedical relation extraction dataset with multiple entity types (e.g. gene/protein, disease, chemical) and relation pairs (e.g. gene–disease; chemical–chemical) at the document level, on a set of 600 PubMed abstracts. Furthermore, BioRED labels each relation as describing either a novel finding or previously known background knowledge, enabling automated algorithms to differentiate between novel and background information.
15 PAPERS • 3 BENCHMARKS
The Bacteria Biotope (BB) Task is part of the BioNLP Open Shared Tasks and meets the BioNLP-OST standards of quality, originality and data formats. Manually annotated data is provided for training, development and evaluation of information extraction methods. Tools for the detailed evaluation of system outputs are available. Support for linguistic processing is provided in the form of analyses produced by various state-of-the-art tools on the dataset texts.
8 PAPERS • 2 BENCHMARKS