The Replica Dataset is a collection of high-quality reconstructions of a variety of indoor spaces. Each reconstruction has clean dense geometry, high-resolution and high-dynamic-range (HDR) textures, glass and mirror surface information, planar segmentation, and semantic class and instance segmentation.
296 PAPERS • 3 BENCHMARKS
SUNCG is a large-scale dataset of synthetic 3D scenes with dense volumetric annotations.
181 PAPERS • NO BENCHMARKS YET
R2R is a dataset for visually grounded natural-language navigation in real buildings. It requires autonomous agents to follow human-generated navigation instructions in previously unseen buildings. For training, each instruction is associated with a Matterport3D Simulator trajectory. About 22k instructions are available, with an average length of 29 words. A test evaluation server for the dataset is available on EvalAI.
144 PAPERS • 2 BENCHMARKS
The 2D-3D-S dataset provides a variety of mutually registered modalities from the 2D, 2.5D and 3D domains, with instance-level semantic and geometric annotations. It covers over 6,000 m² across 6 large-scale indoor areas from 3 different buildings. It contains over 70,000 RGB images, along with the corresponding depths, surface normals, semantic annotations, global XYZ images (all in the form of both regular and 360° equirectangular images), and camera information. It also includes registered raw and semantically annotated 3D meshes and point clouds. The dataset enables the development of joint and cross-modal learning models and, potentially, unsupervised approaches that exploit the regularities present in large-scale indoor spaces.
129 PAPERS • 8 BENCHMARKS
AVD (Active Vision Dataset) focuses on simulating robotic vision tasks in everyday indoor environments using real imagery. The dataset includes more than 20,000 RGB-D images and more than 50,000 2D bounding boxes of object instances, densely captured in 9 unique scenes.
29 PAPERS • 1 BENCHMARK
Talk The Walk is a large-scale dialogue dataset grounded in action and perception. The task involves two agents (a “guide” and a “tourist”) that communicate via natural language in order to achieve a common goal: having the tourist navigate to a given target location.
11 PAPERS • NO BENCHMARKS YET
IQUAD is a dataset for Visual Question Answering in interactive environments. It is built on AI2-THOR, a simulated photo-realistic environment of configurable indoor scenes with interactive objects. IQUAD V1 has 75,000 questions, each paired with a unique scene configuration.
6 PAPERS • NO BENCHMARKS YET
Talk2Nav is a large-scale dataset with verbal navigation instructions.
2 PAPERS • NO BENCHMARKS YET