2 dataset results for Word Embeddings AND Chechen

WikiMatrix is a dataset of parallel sentences mined from the textual content of Wikipedia for all possible language pairs. The mined data consists of:

87 PAPERS • NO BENCHMARKS YET

WikiANN (PAN-X)

WikiANN, also known as PAN-X, is a multilingual named entity recognition dataset. It consists of Wikipedia articles annotated with LOC (location), PER (person), and ORG (organization) tags in the IOB2 format. It can be used to train and evaluate named entity recognition models across many languages.

60 PAPERS • 3 BENCHMARKS
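
The IOB2 format mentioned in the WikiANN description marks the first token of each entity with a B- prefix, continuation tokens with an I- prefix, and all other tokens with O. The sketch below shows one way such tags could be grouped back into entity spans. The function name `iob2_to_spans` and the example sentence are illustrative assumptions and are not taken from the dataset or its tooling.

```python
def iob2_to_spans(tokens, tags):
    """Group IOB2-tagged tokens into (entity_type, text) pairs.

    Hypothetical helper for illustration; not part of any WikiANN tooling.
    """
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Emit the entity collected so far, if there is one.
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # B- always opens a new entity, closing any open one.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # I- continues the open entity of the same type.
            current_tokens.append(token)
        else:
            # "O", or an I- tag with no matching open entity, is treated as outside.
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


# Made-up example sentence, tagged by hand for illustration.
tokens = ["Marie", "Curie", "worked", "in", "Paris", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]
print(iob2_to_spans(tokens, tags))
```

Because every entity starts with an explicit B- tag, two adjacent entities of the same type remain distinguishable, which is the main practical difference between IOB2 and the older IOB1 scheme.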
