2 dataset results for Speaker Recognition AND Videos

VGG-Sound

VGG-Sound consists of more than 210k videos covering 310 audio classes.

155 PAPERS • 3 BENCHMARKS

MAVS

MAVS (Multilingual Audio-Visual Smartphone dataset)

MAVS is an audio-visual dataset captured with five different recent smartphones. It contains 103 subjects recorded in three different sessions covering different real-world scenarios. The recordings span three languages, so that the language dependency of speaker recognition systems can be studied.

1 PAPER • NO BENCHMARKS YET
