4 dataset results for Intent Classification AND English

SLURP (Spoken Language Understanding Resource Package)

SLURP is a challenging English spoken language understanding dataset spanning 18 domains. It is substantially larger and linguistically more diverse than earlier datasets.

84 PAPERS • 2 BENCHMARKS

xSID (Cross-lingual Slot and Intent Detection)

xSID is an evaluation benchmark for cross-lingual (X) Slot and Intent Detection in 13 languages from 6 language families, including a very low-resource dialect. It covers Arabic (ar), Chinese (zh), Danish (da), Dutch (nl), English (en), German (de), Indonesian (id), Italian (it), Japanese (ja), Kazakh (kk), Serbian (sr), Turkish (tr), and South Tyrolean (de-st), an Austro-Bavarian German dialect.

13 PAPERS • NO BENCHMARKS YET

ORCAS-I (Queries Annotated with Intent using Weak Supervision)

A labelled version of the ORCAS click-based dataset of Web queries, which provides 18 million connections to 10 million distinct queries.

1 PAPER • 1 BENCHMARK

Skit-S2I (An Indian Accented Speech to Intent dataset)

This dataset for intent classification from human speech covers 14 coarse-grained intents from the banking domain. The work is inspired by a similar release, the MINDS-14 dataset; here, we restrict ourselves to Indian English but provide a much larger training set. The data was generated by 11 Indian English speakers and recorded over a telephony line. We also provide anonymized speaker information, such as gender, languages spoken, and native language, to allow more structured discussions around robustness and bias in the models you train.

1 PAPER • 1 BENCHMARK
