xSID, a new evaluation benchmark for cross-lingual (X) Slot and Intent Detection in 13 languages from 6 language families, including a very low-resource dialect, covering Arabic (ar), Chinese (zh), Danish (da), Dutch (nl), English (en), German (de), Indonesian (id), Italian (it), Japanese (ja), Kazakh (kk), Serbian (sr), Turkish (tr) and an Austro-Bavarian German dialect, South Tyrolean (de-st).
13 PAPERS • NO BENCHMARKS YET
MultiSubs is a dataset of multilingual subtitles gathered from the OPUS OpenSubtitles dataset, which in turn was sourced from opensubtitles.org. We have supplemented some text fragments (visually salient nouns in this release) within the subtitles with web images, where the word sense of the fragment has been disambiguated using a cross-lingual approach. We have introduced a fill-in-the-blank task and a lexical translation task to demonstrate the utility of the dataset. Please refer to our paper for a more detailed description of the dataset and tasks. Multisubs will benefit research on visual grounding of words especially in the context of free-form sentence.
4 PAPERS • 5 BENCHMARKS
The CareerCoach 2022 gold standard is available for download in the NIF and JSON format, and draws upon documents from a corpus of over 99,000 education courses which have been retrieved from 488 different education providers.
1 PAPER • NO BENCHMARKS YET