A natural language dataset assembled by combining more than 150 existing monolingual and multilingual datasets and by crawling known multilingual websites. The dataset focuses on 500 extremely low-resource languages.
GitHub: https://github.com/cisnlp/Glot500