CoWeSe is a Spanish biomedical corpus consisting of 4.5GB (about 750M tokens) of clean plain text. CoWeSe is the result of a massive crawler on 3000 Spanish domains executed in 2020.
3 PAPERS • NO BENCHMARKS YET