Consists of more than 210k videos for 310 audio classes.
155 PAPERS • 3 BENCHMARKS
MAVS is an audio-visual smartphone dataset captured with five different recent smartphones. This new dataset contains 103 subjects recorded in three different sessions covering different real-world scenarios. Recordings are made in three languages so that the dataset can be used to study the language dependency of speaker recognition systems.
1 PAPER • NO BENCHMARKS YET