Extracted from the Tashkeela Corpus, the dataset consists of 55K lines containing about 2.3M words.