The ICDAR 2013 dataset consists of 229 training images and 233 testing images, with word-level annotations provided. It is the standard benchmark dataset for evaluating near-horizontal text detection.
232 PAPERS • 3 BENCHMARKS
Total-Text is a text detection dataset that consists of 1,555 images with a variety of text types including horizontal, multi-oriented, and curved text instances. The training split and testing split have 1,255 images and 300 images, respectively.
145 PAPERS • 2 BENCHMARKS
The MSRA-TD500 dataset is a text detection dataset that contains 300 training images and 200 test images. Text regions are arbitrarily oriented and annotated at the sentence level. Unlike the other datasets, it contains both English and Chinese text.
121 PAPERS • 1 BENCHMARK
The COCO-Text dataset is a dataset for text detection and recognition. It is based on the MS COCO dataset, which contains images of complex everyday scenes. The COCO-Text dataset contains non-text images, legible text images, and illegible text images. In total, there are 22,184 training images and 7,026 validation images with at least one instance of legible text.
80 PAPERS • 2 BENCHMARKS
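COCO-Text tags each text instance with a legibility flag, so a common first step is to keep only the images that contain at least one legible instance. A minimal sketch over plain annotation dicts (the field names follow the COCO-Text JSON layout, but treat them as assumptions here):

```python
def legible_image_ids(annotations):
    """Return the ids of images with at least one legible text instance."""
    return {a["image_id"] for a in annotations if a["legibility"] == "legible"}

anns = [
    {"image_id": 1, "legibility": "legible", "utf8_string": "STOP"},
    {"image_id": 2, "legibility": "illegible"},
    {"image_id": 1, "legibility": "illegible"},
]
legible_image_ids(anns)  # {1}
```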
The SCUT-CTW1500 dataset contains 1,500 images: 1,000 for training and 500 for testing. In particular, it provides 10,751 cropped text instance images, including 3,530 with curved text. The images are manually harvested from the Internet, image libraries such as Google Open-Image, or phone cameras. Besides curved text, the dataset also contains a large amount of horizontal and multi-oriented text.
41 PAPERS • 3 BENCHMARKS
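Curved instances such as those in SCUT-CTW1500 are typically annotated as polygons with several points along the text contour. A small sketch (the point layout is an assumption, not specified above) showing how such a polygon can be reduced to an axis-aligned box for detectors that only handle rectangles:

```python
def polygon_to_bbox(points):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

# Hypothetical six-point contour of a curved text instance.
curved = [(10, 40), (60, 20), (110, 40), (110, 70), (60, 50), (10, 70)]
polygon_to_bbox(curved)  # (10, 20, 110, 70)
```

The rectangle loses the curvature information, which is exactly why polygon-level benchmarks such as CTW1500 exist.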
TextOCR is a dataset for benchmarking text recognition on arbitrarily shaped scene text in natural images. It provides ~1M high-quality word annotations on TextVQA images, enabling end-to-end reasoning on downstream tasks such as visual question answering and image captioning.
23 PAPERS • NO BENCHMARKS YET
The PKU dataset contains almost 4,000 images divided into five groups (G1-G5), each representing a different capture scenario. For example, G1 contains daytime highway images with a single car, while G5 contains daytime or nighttime crosswalk images with multiple cars and license plates (LPs).
2 PAPERS • NO BENCHMARKS YET
This dataset is an extremely challenging set of over 20,000 original number-plate images captured and crowdsourced from more than 700 urban and rural areas. Each image is manually reviewed and verified by computer vision professionals at Datacluster Labs.
0 PAPERS • NO BENCHMARKS YET
This dataset contains images and annotations for scene text detection and recognition. It consists of two parts: (1) 1,175 images manually labeled with a total of 59,588 text instances at the line and word levels; and (2) 929 signboard images selected from the VinText, Total-Text, and ICDAR15 datasets. Each text instance in the first part has a quadrilateral bounding box and an associated ground-truth character sequence. Images in the second part were selected because they contain signboards; this portion comprises 20,261 word-level text instances, bringing the total in the final dataset to 79,814. Following the ICDAR15 standard, every image was manually annotated with all of its text instances, their polygons, and their content.
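ICDAR15-style quadrilateral annotations are commonly stored one instance per line as eight comma-separated corner coordinates followed by the transcription. A minimal parsing sketch (the exact file layout is an assumption based on the common convention, not something the description above specifies):

```python
def parse_quad_line(line):
    """Parse one ICDAR15-style line: x1,y1,x2,y2,x3,y3,x4,y4,transcription."""
    parts = line.strip().split(",", 8)  # the transcription may itself contain commas
    coords = list(map(int, parts[:8]))
    quad = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
    return quad, parts[8]

quad, text = parse_quad_line("377,117,463,117,465,130,378,130,Genaxis")
# quad -> [(377, 117), (463, 117), (465, 130), (378, 130)]; text -> "Genaxis"
```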