2 dataset results for Audio-Visual Active Speaker Detection AND Videos

AVA

AVA is a project that provides audiovisual annotations of video for improving our understanding of human activity. Each of the video clips has been exhaustively annotated by human annotators, and together they represent a rich variety of scenes, recording conditions, and expressions of human activity.

98 PAPERS • 7 BENCHMARKS

VPCD (Video Person-Clustering)

VPCD contains multi-modal annotations (face, body and voice) for all primary and secondary characters from a diverse range of TV shows and movies. It is used for evaluating multi-modal person-clustering. It contains body-tracks for each annotated character, face-tracks when the face is visible, and voice-tracks when the character is speaking, together with their associated features.

7 PAPERS • 1 BENCHMARK
