6 dataset results for Vision-Language Navigation AND English

TEACh (Task-driven Embodied Agents that Chat)

Robots operating in human spaces must be able to engage in natural language interaction with people, both understanding and executing instructions, and using conversation to resolve ambiguity and recover from mistakes. To study this, we introduce TEACh, a dataset of over 3,000 human-human, interactive dialogues to complete household tasks in simulation. A Commander with access to oracle information about a task communicates in natural language with a Follower. The Follower navigates through and interacts with the environment to complete tasks varying in complexity from "Make Coffee" to "Prepare Breakfast", asking questions and getting additional information from the Commander. We propose three benchmarks using TEACh to study embodied intelligence challenges, and we evaluate initial models' abilities in dialogue understanding, language grounding, and task execution.

25 PAPERS • NO BENCHMARKS YET

Talk the Walk

Talk the Walk is a large-scale dialogue dataset grounded in action and perception. The task involves two agents (a “guide” and a “tourist”) that communicate via natural language to achieve a common goal: having the tourist navigate to a given target location.

11 PAPERS • NO BENCHMARKS YET

ReaSCAN (Compositional Reasoning in Language Grounding)

ReaSCAN is a synthetic navigation task that requires models to reason about their surroundings over syntactically difficult language.

7 PAPERS • NO BENCHMARKS YET

BnB

BnB is a large-scale and diverse in-domain VLN (Vision and Language Navigation) dataset.

3 PAPERS • NO BENCHMARKS YET

SDN (Situated Dialogue Navigation)

Situated Dialogue Navigation (SDN) is a navigation benchmark of 183 trials with a total of 8415 utterances, around 18.7 hours of control streams, and 2.9 hours of trimmed audio. SDN is developed to evaluate the agent's ability to predict dialogue moves from humans as well as generate its own dialogue moves and physical navigation actions.

2 PAPERS • NO BENCHMARKS YET

XL-R2R (Cross-lingual Room-to-Room)

The XL-R2R dataset is built upon the R2R dataset and extends it with Chinese instructions. XL-R2R preserves the same splits as R2R and thus consists of train, val-seen, and val-unseen splits with both English and Chinese instructions, and a test split with English instructions only.

2 PAPERS • NO BENCHMARKS YET
