WSJ0-2mix is a speech recognition corpus of speech mixtures using utterances from the Wall Street Journal (WSJ0) corpus.
144 PAPERS • 2 BENCHMARKS
LibriMix is an open-source alternative to wsj0-2mix. Based on LibriSpeech, LibriMix consists of two- or three-speaker mixtures combined with ambient noise samples from WHAM!.
82 PAPERS • 1 BENCHMARK
The WSJ0 Hipster Ambient Mixtures (WHAM!) dataset pairs each two-speaker mixture in the wsj0-2mix dataset with a unique noise background scene. It has an extension called WHAMR! that adds artificial reverberation to the speech signals in addition to the background noise.
81 PAPERS • 5 BENCHMARKS
WHAMR! is a dataset for noisy and reverberant speech separation. It extends WHAM! by introducing synthetic reverberation to the speech sources in addition to the existing noise. Room impulse responses were generated and convolved using pyroomacoustics. Reverberation times were chosen to approximate domestic and classroom environments (expected to be similar to the restaurants and coffee shops where the WHAM! noise was collected), and further classified as high, medium, and low reverberation based on a qualitative assessment of the mixture’s noise recording.
46 PAPERS • 3 BENCHMARKS
AVSpeech is a large-scale audio-visual dataset comprising speech clips with no interfering background signals. The segments are of varying length, between 3 and 10 seconds long, and in each clip the only visible face in the video and audible sound in the soundtrack belong to a single speaking person. In total, the dataset contains roughly 4700 hours of video segments with approximately 150,000 distinct speakers, spanning a wide variety of people, languages and face poses.
37 PAPERS • NO BENCHMARKS YET
Kinect-WSJ is a multichannel, multispeaker, reverberated, noisy dataset which extends the WSJ0-2mix singlechannel, non-reverberated, noiseless dataset to the strong reverberation and noise conditions and the Kinect-like microphone array geometry used in CHiME-5.
1 PAPER • NO BENCHMARKS YET