This dataset contains images and annotations for scene text detection and recognition. It consists of two parts: (1) 1,175 images manually labeled with a total of 59,588 text instances at the line and word levels; and (2) 929 signboard images collected from the VinText, Total-Text, and ICDAR15 datasets. Each text instance in the first part has a quadrilateral bounding box and an associated ground-truth character sequence. Images in the second part were selected because they contain signboards; this part comprises 20,261 text instances at the word level. The final dataset contains 79,814 text instances in total. Following the ICDAR15 standard, we annotated every text instance present in each image with its polygon and text content. All images were annotated manually.
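For readers who want to load the annotations, here is a minimal sketch of a parser for one ICDAR15-style ground-truth line. It assumes the files use the usual ICDAR 2015 layout: eight comma-separated corner coordinates followed by the transcription, with `###` marking an unreadable region. The description above does not confirm this layout for this dataset, and the names `TextInstance` and `parse_icdar15_line` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class TextInstance(NamedTuple):
    polygon: List[Tuple[int, int]]  # four (x, y) corners of the quadrilateral
    text: str                       # ground-truth character sequence
    ignore: bool                    # True when the transcription is "###"


def parse_icdar15_line(line: str) -> TextInstance:
    """Parse one line, assumed to be 'x1,y1,x2,y2,x3,y3,x4,y4,transcription'."""
    # ICDAR15 files often start with a UTF-8 byte-order mark; drop it if present.
    line = line.lstrip("\ufeff").rstrip("\r\n")
    # Split on the first eight commas only, so commas inside the text survive.
    parts = line.split(",", 8)
    if len(parts) < 9:
        raise ValueError(f"expected 8 coordinates and a transcription: {line!r}")
    coords = [int(float(value)) for value in parts[:8]]
    polygon = list(zip(coords[0::2], coords[1::2]))
    text = parts[8]
    return TextInstance(polygon=polygon, text=text, ignore=(text == "###"))
```

Calling `parse_icdar15_line` on each line of an annotation file yields one `TextInstance` per text region; instances with `ignore` set are typically excluded from evaluation.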