The General Language Understanding Evaluation (GLUE) benchmark is a collection of nine natural language understanding tasks: the single-sentence tasks CoLA and SST-2, the similarity and paraphrasing tasks MRPC, STS-B, and QQP, and the natural language inference tasks MNLI, QNLI, RTE, and WNLI.
2,776 PAPERS • 25 BENCHMARKS
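As a rough illustration of the task grouping described above, the sketch below encodes the nine tasks as a dictionary and checks that each can be loaded by name. It assumes the Hugging Face `datasets` library and its "glue" configuration names (e.g. `sst2`, `stsb`); these are tooling assumptions, not part of the benchmark definition itself.

```python
# A minimal sketch of the GLUE task grouping, assuming the Hugging Face
# `datasets` library and its "glue" configuration names.
from datasets import load_dataset

GLUE_TASKS = {
    "single-sentence": ["cola", "sst2"],
    "similarity and paraphrase": ["mrpc", "stsb", "qqp"],
    "natural language inference": ["mnli", "qnli", "rte", "wnli"],
}

for group, tasks in GLUE_TASKS.items():
    for task in tasks:
        # Each task downloads as a DatasetDict with its own splits.
        dataset = load_dataset("glue", task)
        print(f"{group:>30} | {task:<5} | splits: {list(dataset)}")
```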
The Corpus of Linguistic Acceptability (CoLA) consists of 10,657 sentences from 23 linguistics publications, expertly annotated for acceptability (grammaticality) by their original authors. The public version contains the 9,594 sentences of the training and development sets and excludes the 1,063 sentences of the held-out test set.
627 PAPERS • 3 BENCHMARKS
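A minimal sketch of inspecting the public CoLA splits follows, again assuming CoLA is accessed through the Hugging Face `datasets` entry for GLUE; the split names and the meaning of the binary label in that distribution (1 for acceptable, 0 for unacceptable) are assumptions about that release rather than part of the corpus description above.

```python
# A minimal sketch of loading the public CoLA splits via the Hugging Face
# `datasets` library (a tooling assumption, not part of CoLA itself).
from datasets import load_dataset

cola = load_dataset("glue", "cola")

# The public release covers the training and development (validation) sets;
# the held-out test split ships without gold labels.
print({split: len(cola[split]) for split in cola})

example = cola["train"][0]
# Each example pairs a sentence with a binary acceptability label
# (1 = acceptable, 0 = unacceptable in this distribution).
print(example["sentence"], example["label"])
```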
DaLAJ 1.0 is a dataset of linguistic acceptability judgments for Swedish, comprising 9,596 sentences in its first version, together with an initial experiment using it for the binary classification task. DaLAJ is based on the SweLL second-language learner data, consisting of essays at different levels of proficiency.
5 PAPERS • 1 BENCHMARK
ItaCoLA (Italian Corpus of Linguistic Acceptability) is a corpus for monolingual and cross-lingual acceptability studies, containing almost 10,000 Italian sentences annotated with acceptability judgments.
The Russian Corpus of Linguistic Acceptability (RuCoLA) is built from the ground up under the well-established binary linguistic acceptability (LA) approach. RuCoLA consists of 9.8k in-domain sentences from linguistic publications and 3.6k out-of-domain sentences produced by generative models.
4 PAPERS • 1 BENCHMARK