8 dataset results for Visual Navigation AND Images

Replica

The Replica Dataset is a dataset of high-quality reconstructions of a variety of indoor spaces. Each reconstruction has clean dense geometry, high-resolution and high-dynamic-range textures, glass and mirror surface information, planar segmentation, and semantic class and instance segmentation.

296 PAPERS • 3 BENCHMARKS

SUNCG

SUNCG is a large-scale dataset of synthetic 3D scenes with dense volumetric annotations.

181 PAPERS • NO BENCHMARKS YET

R2R (Room-to-Room)

R2R is a dataset for visually grounded natural language navigation in real buildings. The dataset requires autonomous agents to follow human-generated navigation instructions in previously unseen buildings. For training, each instruction is associated with a Matterport3D Simulator trajectory. 22k instructions are available, with an average length of 29 words. A test evaluation server for this dataset is available on EvalAI.

144 PAPERS • 2 BENCHMARKS

2D-3D-S (2D-3D-Semantic)

The 2D-3D-S dataset provides a variety of mutually registered modalities from 2D, 2.5D and 3D domains, with instance-level semantic and geometric annotations. It covers over 6,000 m² collected in 6 large-scale indoor areas that originate from 3 different buildings. It contains over 70,000 RGB images, along with the corresponding depths, surface normals, semantic annotations, and global XYZ images (all in the form of both regular and 360° equirectangular images), as well as camera information. It also includes registered raw and semantically annotated 3D meshes and point clouds. The dataset enables development of joint and cross-modal learning models and potentially unsupervised approaches utilizing the regularities present in large-scale indoor spaces.

129 PAPERS • 8 BENCHMARKS

AVD (Active Vision Dataset)

AVD focuses on simulating robotic vision tasks in everyday indoor environments using real imagery. The dataset includes 20,000+ RGB-D images and 50,000+ 2D bounding boxes of object instances densely captured in 9 unique scenes.

29 PAPERS • 1 BENCHMARK

Talk the Walk

Talk the Walk is a large-scale dialogue dataset grounded in action and perception. The task involves two agents (a “guide” and a “tourist”) that communicate via natural language to achieve a common goal: having the tourist navigate to a given target location.

11 PAPERS • NO BENCHMARKS YET

IQUAD (Interactive Question Answering Dataset)

IQUAD is a dataset for Visual Question Answering in interactive environments. It is built upon AI2-THOR, a simulated photorealistic environment of configurable indoor scenes with interactive objects. IQUAD V1 has 75,000 questions, each paired with a unique scene configuration.

6 PAPERS • NO BENCHMARKS YET

Talk2Nav

Talk2Nav is a large-scale dataset with verbal navigation instructions.

2 PAPERS • NO BENCHMARKS YET
