Europarl-ASR (EN) is a 1300-hour English-language speech and text corpus of parliamentary debates for (streaming) Automatic Speech Recognition training and benchmarking, speech data filtering and speech data verbatimization, based on European Parliament speeches and their official transcripts (1996-2020). Includes dev-test sets for streaming ASR benchmarking, made up of 18 hours of manually revised speeches. The availability of manual non-verbatim and verbatim transcripts for dev-test speeches makes this corpus also useful for the assessment of automatic filtering and verbatimization techniques. The corpus is released under an open licence at https://www.mllp.upv.es/europarl-asr/
5 PAPERS • 2 BENCHMARKS
The CropAndWeed dataset is focused on the fine-grained identification of 74 relevant crop and weed species with a strong emphasis on data variability. Annotations of labeled bounding boxes, semantic masks and stem positions are provided for about 112k instances in more than 8k high-resolution images of both real-world agricultural sites and specifically cultivated outdoor plots of rare weed types. Additionally, each sample is enriched with meta-annotations regarding environmental conditions.
4 PAPERS • NO BENCHMARKS YET
The Apron Dataset focuses on training and evaluating classification and detection models for airport-apron logistics. In addition to bounding boxes and object categories the dataset is enriched with meta parameters to quantify the models’ robustness against environmental influences.
1 PAPER • NO BENCHMARKS YET