The SWC is a corpus of aligned Spoken Wikipedia articles from the English, German, and Dutch Wikipedias. The corpus has several notable characteristics:
- hundreds of hours of aligned audio
- from a diverse set of readers
- about a diverse set of topics
- in a well-researched textual genre
- released under a free license (CC BY-SA 4.0)
- annotations that can be mapped back to the original HTML
- phoneme-level alignments