1 dataset result for Scene Understanding AND Texts

The VideoNavQA dataset contains pairs of questions and videos generated in the House3D environment. Its goal is to assess question-answering performance from nearly ideal navigation paths, while covering a much broader variety of questions than current instantiations of the Embodied Question Answering (EQA) task.

3 PAPERS • NO BENCHMARKS YET
