Form Understanding in Noisy Scanned Documents (FUNSD) comprises 199 real, fully annotated, scanned forms. The documents are noisy and vary widely in appearance, making form understanding (FoUn) a challenging task. The proposed dataset can be used for various tasks, including text detection, optical character recognition, spatial layout analysis, and entity labeling/linking.
147 PAPERS • 3 BENCHMARKS
In this project, we formally present the task of Open-domain Visual Entity recognitioN (OVEN), where a model needs to link an image to a Wikipedia entity with respect to a text query. We construct OVEN-Wiki by repurposing 14 existing datasets, with all labels grounded in a single label space: Wikipedia entities. OVEN challenges models to select among six million possible Wikipedia entities, making it a general visual recognition benchmark with the largest number of labels.
11 PAPERS • 1 BENCHMARK
MEIR is a substantially more challenging dataset than those previously available for research into image repurposing detection. The dataset includes location, person, and organization manipulations on real-world data sourced from Flickr.
8 PAPERS • NO BENCHMARKS YET
Twitter-MEL is a multimodal entity linking (MEL) dataset built from Twitter. The dataset consists of tweets that contain both text and images, with a total of 2.6M timeline tweets and 20k entities.
3 PAPERS • NO BENCHMARKS YET
WIKIPerson is a high-quality, human-annotated visual person linking dataset based on Wikipedia. The dataset contains a total of 48k different news images, covering 13k out of 120k Person named entities, each of which corresponds to a celebrity in Wikipedia. Unlike datasets previously common in entity linking (EL), a mention in WIKIPerson is only an image containing the person entity, together with its bounding box. The corresponding label identifies a unique entity in Wikipedia. For each entity in Wikipedia, we provide textual descriptions as well as images to meet the needs of three sub-tasks.