HICO-DET is a dataset for detecting human-object interactions (HOI) in images. It contains 47,776 images (38,118 in the train set and 9,658 in the test set) and 600 HOI categories formed by combining 80 object categories with 117 verb classes. HICO-DET provides more than 150k annotated human-object pairs.
157 PAPERS • 5 BENCHMARKS
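Each HICO-DET annotation pairs a human box with an object box under one of the 600 HOI categories, where a category is a (verb, object) combination. Below is a minimal sketch of such a record; the field names, class names, and boxes are illustrative assumptions, not HICO-DET's official annotation schema.

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

@dataclass
class HOIAnnotation:
    """One annotated human-object pair (illustrative field names)."""
    human_box: Box
    object_box: Box
    verb: str         # one of the 117 verb classes, e.g. "ride"
    object_cls: str   # one of the 80 COCO object categories, e.g. "bicycle"

    @property
    def hoi_category(self) -> Tuple[str, str]:
        # A HICO-DET HOI category is a (verb, object) combination;
        # 600 such combinations are annotated out of 117 x 80 possible.
        return (self.verb, self.object_cls)

pair = HOIAnnotation((12, 40, 180, 420), (90, 300, 380, 460), "ride", "bicycle")
print(pair.hoi_category)  # ('ride', 'bicycle')
```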
Verbs in COCO (V-COCO) is a dataset that builds on COCO for human-object interaction detection. It provides 10,346 images (2,533 for training, 2,867 for validation and 4,946 for testing) and 16,199 person instances, each annotated with 29 action categories.
138 PAPERS • 1 BENCHMARK
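Since V-COCO labels every person instance against the same fixed set of 29 action categories, a per-person annotation can be viewed as a multi-hot vector. A small sketch of that encoding follows; the action names here are placeholders, not V-COCO's actual label list.

```python
import numpy as np

NUM_ACTIONS = 29  # V-COCO action categories per person

# Hypothetical action names, for illustration only.
ACTIONS = ["walk", "run", "sit"] + [f"action_{i}" for i in range(3, NUM_ACTIONS)]

def encode_person_actions(active: list) -> np.ndarray:
    """Multi-hot vector: 1 where the person is annotated with that action."""
    labels = np.zeros(NUM_ACTIONS, dtype=np.int64)
    for name in active:
        labels[ACTIONS.index(name)] = 1
    return labels

print(encode_person_actions(["walk", "sit"]).sum())  # 2
```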
HICO is a benchmark for recognizing human-object interactions (HOI).
45 PAPERS • 2 BENCHMARKS
Inferring human-scene contact (HSC) is the first step toward understanding how humans interact with their surroundings. While detecting 2D human-object interaction (HOI) and reconstructing 3D human pose and shape (HPS) have enjoyed significant progress, reasoning about 3D human-scene contact from a single image is still challenging. Existing HSC detection methods consider only a few types of predefined contact, often reduce body and scene to a small number of primitives, and even overlook image evidence. To predict human-scene contact from a single image, we address the limitations above from both data and algorithmic perspectives. We capture a new dataset called RICH for “Real scenes, Interaction, Contact and Humans.” RICH contains multi-view outdoor/indoor video sequences at 4K resolution, ground-truth 3D human bodies captured using markerless motion capture, 3D body scans, and high-resolution 3D scene scans. A key feature of RICH is that it also contains accurate vertex-level contact labels on the body.
38 PAPERS • 1 BENCHMARK
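Vertex-level contact of the kind RICH provides can be represented as one binary label per body-mesh vertex. A minimal sketch, assuming an SMPL-X-style mesh with 10,475 vertices; the distance-thresholding rule and the 1 cm threshold are illustrative assumptions, not the procedure used to produce RICH's ground truth.

```python
import numpy as np

NUM_VERTICES = 10_475  # SMPL-X vertex count (assumed body model)

def contact_mask_from_distances(vert_to_scene_dist: np.ndarray,
                                threshold: float = 0.01) -> np.ndarray:
    """Binary per-vertex contact: 1 if the vertex lies within `threshold`
    meters of the scene scan. The 1 cm threshold is an illustrative choice."""
    return (vert_to_scene_dist < threshold).astype(np.uint8)

# Toy example: random vertex-to-scene distances in meters.
rng = np.random.default_rng(0)
dists = rng.uniform(0.0, 0.5, size=NUM_VERTICES)
mask = contact_mask_from_distances(dists)
print(f"{mask.mean():.1%} of body vertices in contact")
```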
BEHAVE is a full-body human-object interaction dataset with multi-view RGBD frames and corresponding 3D SMPL and object fits, along with the annotated contacts between them. The dataset contains ~15k frames captured at 5 locations, with 8 subjects performing a wide range of interactions with 20 common objects.
33 PAPERS • 3 BENCHMARKS
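A BEHAVE frame couples a SMPL body fit with a 6-DoF object fit and contact annotations. Here is a rough sketch of what one frame record might hold; the field names and shapes follow common SMPL conventions and are assumptions, not BEHAVE's released file format.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class BehaveFrame:
    """Illustrative container for one frame's fits (hypothetical schema)."""
    smpl_pose: np.ndarray        # (72,) axis-angle body pose
    smpl_betas: np.ndarray       # (10,) shape coefficients
    obj_rotation: np.ndarray     # (3, 3) object rotation
    obj_translation: np.ndarray  # (3,) object translation in meters
    contact_verts: np.ndarray    # indices of body vertices in contact

frame = BehaveFrame(
    smpl_pose=np.zeros(72),
    smpl_betas=np.zeros(10),
    obj_rotation=np.eye(3),
    obj_translation=np.array([0.0, 0.0, 1.0]),
    contact_verts=np.array([101, 102, 250]),
)
print(frame.contact_verts.size, "contact vertices")
```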
The Watch-n-Patch dataset was created with a focus on modeling human activities comprising multiple actions in a completely unsupervised setting. It was collected with a Microsoft Kinect One sensor, for a total length of about 230 minutes divided into 458 videos. 7 subjects perform daily activities in 8 offices and 5 kitchens with complex backgrounds. Skeleton data are also provided as ground-truth annotations.
12 PAPERS • NO BENCHMARKS YET
COUCH is a large human-chair interaction dataset with clean annotations. It consists of 3 hours and over 500 sequences of motion capture (MoCap) of human-chair interactions.
7 PAPERS • NO BENCHMARKS YET
Ambiguous-HOI is a challenging dataset of ambiguous human-object interaction images for HOI detection, built on top of HICO-DET.
2 PAPERS • NO BENCHMARKS YET
The Human-to-Human-or-Object Interaction (H2O) dataset is a dataset for Human-Object Interaction (HOI) detection. The task consists of determining and localizing the list of triplets <subject, verb, target> that describe all the simultaneous interactions in an image.
1 PAPER • NO BENCHMARKS YET
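Because the target of an H2O interaction may be either a person or an object, a <subject, verb, target> triplet needs a target type alongside the indices. A hedged sketch of one way to represent this; the type names and the example verb are illustrative, not H2O's actual label set.

```python
from dataclasses import dataclass
from enum import Enum

class TargetType(Enum):
    HUMAN = "human"
    OBJECT = "object"

@dataclass
class InteractionTriplet:
    """<subject, verb, target>: the subject is always a person; the target
    may be another person or an object (hence Human-to-Human-or-Object)."""
    subject_id: int          # index of the acting person in the image
    verb: str                # e.g. "talk to"
    target_id: int           # index of the target person or object
    target_type: TargetType

t = InteractionTriplet(subject_id=0, verb="talk to", target_id=1,
                       target_type=TargetType.HUMAN)
print(t)
```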
H²O is an image dataset annotated for human-to-human-or-object interaction detection. It is composed of images from the V-COCO dataset, augmented with images that mostly contain interactions between people. The dataset was introduced in: Orcesi, A., Audigier, R., Toukam, F. P., & Luvison, B. (2021, December). Detecting Human-to-Human-or-Object (H2O) Interactions with DIABOLO. In 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021) (pp. 1-8). IEEE. The annotations were made with Pixano, an open-source smart annotation tool for computer vision applications: https://pixano.cea.fr/
0 PAPERS • NO BENCHMARKS YET