R2R (Room-to-Room) is a dataset for visually grounded natural language navigation in real buildings. It requires autonomous agents to follow human-generated navigation instructions in previously unseen buildings. For training, each instruction is associated with a Matterport3D Simulator trajectory. The dataset contains 22k instructions, with an average length of 29 words. A test evaluation server for this dataset is available on EvalAI.
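As a rough illustration, an R2R-style training record pairs a trajectory (a sequence of viewpoints in a scanned building) with several natural-language instructions. The field names and values below are an assumption for illustration, not verbatim dataset content:

```python
import json

# Hypothetical R2R-style record: field names (path_id, scan, heading, path,
# instructions) are assumed for illustration; values are invented.
record_json = """
[
  {
    "path_id": 0,
    "scan": "example_scan_id",
    "heading": 0.0,
    "path": ["viewpoint_a", "viewpoint_b", "viewpoint_c"],
    "instructions": [
      "Walk past the sofa and stop at the kitchen door.",
      "Go straight through the living room and wait by the doorway."
    ]
  }
]
"""

records = json.loads(record_json)

# Count instructions and compute their average length in words,
# mirroring the statistics quoted for the full dataset.
n_instructions = sum(len(r["instructions"]) for r in records)
avg_words = sum(
    len(instr.split()) for r in records for instr in r["instructions"]
) / n_instructions
print(n_instructions, avg_words)
```

Each trajectory typically carries multiple independently written instructions, which is why the instruction count is several times the number of paths.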
144 PAPERS • 2 BENCHMARKS
Talk The Walk is a large-scale dialogue dataset grounded in action and perception. The task involves two agents (a “guide” and a “tourist”) that communicate via natural language in order to achieve a common goal: having the tourist navigate to a given target location.
11 PAPERS • NO BENCHMARKS YET
ReaSCAN is a synthetic navigation task that requires models to reason about their surroundings while interpreting syntactically complex language.
7 PAPERS • NO BENCHMARKS YET
BnB is a large-scale, diverse in-domain Vision-and-Language Navigation (VLN) dataset.
3 PAPERS • NO BENCHMARKS YET
Situated Dialogue Navigation (SDN) is a navigation benchmark of 183 trials comprising 8415 utterances, around 18.7 hours of control streams, and 2.9 hours of trimmed audio. SDN is designed to evaluate an agent's ability to predict dialogue moves from humans as well as to generate its own dialogue moves and physical navigation actions.
2 PAPERS • NO BENCHMARKS YET
The XL-R2R dataset is built upon the R2R dataset and extends it with Chinese instructions. XL-R2R preserves the same splits as R2R and thus consists of train, val-seen, and val-unseen splits with both English and Chinese instructions, and a test split with English instructions only.