Evaluating human-scene interaction requires precise annotations of camera pose and scene geometry. However, such information is not available in existing datasets for egocentric human pose estimation. To address this, we collected a new real-world dataset using a head-mounted fisheye camera combined with a calibration board. The ground-truth scene geometry is obtained with an SfM method applied to a multi-view capture system of 120 synchronized 4K cameras, and the ground-truth egocentric camera pose is obtained by localizing a calibration board rigidly attached to the egocentric camera. The dataset contains around 28K frames of two actors performing various human-scene interaction motions such as sitting, reading a newspaper, and using a computer. It is split evenly into training and testing sets, and we fine-tune our method on the training split before evaluation. The dataset will be made publicly available; additional details are provided in the supplementary materials.
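The pose recovery above amounts to chaining two rigid transforms: the board pose tracked by the multi-view system and the fixed board-to-camera offset. A minimal sketch of this composition, where the transform names (`T_world_board`, `T_board_cam`) and the helper are illustrative assumptions rather than the actual pipeline:

```python
import numpy as np

def rigid(R, t):
    """Assemble a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def egocentric_camera_pose(T_world_board, T_board_cam):
    """Chain the board pose localized by the multi-view capture system
    (T_world_board) with the fixed, pre-calibrated board-to-camera offset
    (T_board_cam) to obtain the ground-truth egocentric camera pose."""
    return T_world_board @ T_board_cam

# Hypothetical example: board tracked at (1, 2, 3) in world coordinates,
# fisheye camera mounted 10 cm behind the board along its z-axis.
T_world_board = rigid(np.eye(3), [1.0, 2.0, 3.0])
T_board_cam = rigid(np.eye(3), [0.0, 0.0, 0.1])
T_world_cam = egocentric_camera_pose(T_world_board, T_board_cam)
```

In practice the board pose itself would come from detecting the board in the 4K views (e.g. corner detection plus PnP), but the composition step is the same.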