WinoBias contains 3,160 sentences, split equally between development and test sets, created by researchers familiar with the project. Sentences follow two prototypical templates, but annotators were encouraged to come up with scenarios in which the entities could plausibly interact. The templates were chosen to be challenging and to cover cases requiring semantics and cases requiring syntax separately.
109 PAPERS • NO BENCHMARKS YET
HELP is an automatically created natural language inference (NLI) dataset that combines lexical and logical inferences, focusing on monotonicity (i.e., phrase replacement-based reasoning). HELP (Ver. 1.0) contains 36K inference pairs covering upward monotone, downward monotone, non-monotone, conjunction, and disjunction cases.
28 PAPERS • 1 BENCHMARK
CI-MNIST (Correlated and Imbalanced MNIST) is a variant of the MNIST dataset that introduces different types of correlations between attributes, dataset features, and an artificial eligibility criterion. For an input image $x$, the label $y \in \{1, 0\}$ indicates eligibility or ineligibility, respectively, depending on whether the digit shown in $x$ is even or odd. The dataset defines the background color as the protected or sensitive attribute $s \in \{0, 1\}$, where blue denotes the unprivileged group and red denotes the privileged group. It was designed to evaluate bias-mitigation approaches in challenging setups and to allow control over different dataset configurations.
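The labeling rule described for CI-MNIST can be illustrated with a minimal sketch. This is not the dataset authors' code: the function names, the RGB values, the background threshold, and the mapping of $s = 0$ to blue and $s = 1$ to red are all assumptions made for illustration.

```python
# Hypothetical sketch of the CI-MNIST labeling idea; not the official pipeline.

# Assumed mapping: s = 0 -> blue (unprivileged), s = 1 -> red (privileged).
BACKGROUND = {0: (0, 0, 255), 1: (255, 0, 0)}


def eligibility_label(digit: int) -> int:
    """Return y = 1 (eligible) for even digits and y = 0 (ineligible) for odd ones."""
    return 1 if digit % 2 == 0 else 0


def colorize(image, s, threshold=0):
    """Turn a grayscale image (rows of 0-255 ints) into RGB tuples.

    Pixels at or below `threshold` are treated as background and painted
    with the color assigned to the sensitive attribute `s`; all other
    pixels keep their gray intensity.
    """
    bg = BACKGROUND[s]
    return [[bg if px <= threshold else (px, px, px) for px in row]
            for row in image]


if __name__ == "__main__":
    tiny_image = [[0, 200], [0, 0]]   # stand-in for a 28x28 MNIST digit
    print(eligibility_label(4))       # even digit
    print(colorize(tiny_image, s=1))  # red background
```

Controlling how often each background color co-occurs with even or odd digits is what would produce the correlated and imbalanced configurations the description mentions; that sampling step is left out of this sketch.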
4 PAPERS • NO BENCHMARKS YET
The Dialogue Fairness dataset is used to evaluate and understand fairness in dialogue models, focusing on gender and racial biases.
2 PAPERS • NO BENCHMARKS YET
A maintained database tracks ICLR submissions and reviews, augmented with author profiles and higher-level textual features.
1 PAPER • NO BENCHMARKS YET
ec-darkpattern is a dataset for dark pattern detection, released together with baseline detection performance obtained with state-of-the-art machine learning methods. The original dataset was obtained from Mathur et al.'s 2019 study [11kScale] and consists of 1,818 dark pattern texts from shopping sites. Negative samples, i.e., non-dark pattern texts, were collected by retrieving texts from the same websites as Mathur et al.'s dataset.