A Brazilian Portuguese TTS dataset featuring a female voice recorded with high quality in a controlled environment, with neutral emotion and more than 20 hours of recordings. with neutral emotion and more than 20 hours of recordings. Our dataset aims to facilitate transfer learning for researchers and developers working on TTS applications: a highly professional neutral female voice can serve as a good warm-up stage for learning language-specific structures, pronunciation and other non-individual characteristics of speech, leaving to further training procedures only to learn the specific adaptations needed (e.g. timbre, emotion and prosody). This can surely help enabling the accommodation of a more diverse range of female voices in Brazilian Portuguese. By doing so, we also hope to contribute to the development of accessible and high-quality TTS systems for several use cases such as virtual assistants, audiobooks, language learning tools and accessibility solutions.
1 PAPER • NO BENCHMARKS YET
A database containing high sampling rate recordings of a single speaker reading sentences in Brazilian Portuguese with neutral voice, along with the corresponding text corpus. Intended for speech synthesis and automatic speech recognition applications, the dataset contains text extracted from a popular Brazilian news TV program, totalling roughly 20 h of audio spoken by a trained individual in a controlled environment. The text was normalized in the recording process and special textual occurrences (e.g. acronyms, numbers, foreign names etc.) were replaced by their phonetic translation to a readable text in Portuguese. There are no noticeable accidental sounds and background noise has been kept to a minimum in all audio samples.