The MS COCO (Microsoft Common Objects in Context) dataset is a large-scale object detection, segmentation, key-point detection, and captioning dataset. The dataset consists of 328K images.
10,363 PAPERS • 93 BENCHMARKS
Outside Knowledge Visual Question Answering (OK-VQA) includes more than 14,000 questions that require external knowledge to answer.
268 PAPERS • 2 BENCHMARKS
VQG is a collection of datasets for visual question generation. VQG questions were collected by crowdsourcing the task on Amazon Mechanical Turk (AMT). The authors provided details on the prompt and the specific instructions for all the crowdsourcing tasks in this paper in the supplementary material. The prompt was successful at capturing nonliteral questions. Images were taken from the MSCOCO dataset.
78 PAPERS • 1 BENCHMARK
The MMD (MultiModal Dialogs) dataset is a dataset for multimodal domain-aware conversations. It consists of over 150K conversation sessions between shoppers and sales agents, annotated by a group of in-house annotators using a semi-automated manually intense iterative process.
18 PAPERS • NO BENCHMARKS YET
ARID is a large-scale, multi-view object dataset collected with an RGB-D camera mounted on a mobile robot.
5 PAPERS • NO BENCHMARKS YET