CVSS is a massively multilingual-to-English speech-to-speech translation (S2ST) corpus, covering sentence-level parallel S2ST pairs from 21 languages into English. CVSS is derived from the Common Voice speech corpus and the CoVoST 2 speech-to-text translation (ST) corpus by synthesizing the translation text from CoVoST 2 into speech using state-of-the-art TTS systems.
20 PAPERS • 1 BENCHMARK
Each episode directory contains word-level and segment-level information for the whole episode, along with parallel samples extracted under the segments_eng and segments_spa subdirectories. Each sample is stored as a WAV audio file, a text file, and a CSV file containing word timing information and word-level paralinguistic and prosodic features.
1 PAPER • NO BENCHMARKS YET
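As a rough illustration of the episode layout described above, the following Python sketch pairs English and Spanish samples from one episode directory. Only the segments_eng and segments_spa subdirectory names come from the description; the assumption that parallel samples share a file stem, the .wav/.txt/.csv extensions, and the helper names pair_samples and sample_files are hypothetical and may not match the actual release.

```python
from pathlib import Path


def sample_files(wav_path):
    """Return the three files assumed to make up one sample.

    Assumption: the text and CSV files sit next to the WAV file and
    share its stem, differing only in extension.
    """
    return {
        "wav": wav_path,
        "txt": wav_path.with_suffix(".txt"),
        "csv": wav_path.with_suffix(".csv"),
    }


def pair_samples(episode_dir):
    """Pair English and Spanish samples found in one episode directory.

    Assumption: a parallel pair is identified by an identical file stem
    in segments_eng and segments_spa. Samples present on only one side
    are skipped.
    """
    episode = Path(episode_dir)
    eng = {p.stem: p for p in (episode / "segments_eng").glob("*.wav")}
    spa = {p.stem: p for p in (episode / "segments_spa").glob("*.wav")}

    pairs = []
    for stem in sorted(eng.keys() & spa.keys()):
        pairs.append(
            {
                "id": stem,
                "eng": sample_files(eng[stem]),
                "spa": sample_files(spa[stem]),
            }
        )
    return pairs
```

Calling pair_samples on an episode directory returns one dictionary per parallel pair, which can then be passed to an audio loader or a CSV reader of choice. If the real corpus uses a different naming scheme, only the stem-matching step should need to change.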
LibriS2S is a Speech-to-Speech Translation (S2ST) dataset built on existing resources. The dataset provides English-German speech and text quadruplets amounting to just over 50 hours for both languages.