3 dataset results for Formal Logic AND Texts

BIG-bench (Beyond the Imitation Game Benchmark)

The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their future capabilities. BIG-bench includes more than 200 tasks.

238 PAPERS • 134 BENCHMARKS

Mindgames

We generate epistemic reasoning problems using modal logic to target theory of mind (ToM) in natural language processing models.

3 PAPERS • 1 BENCHMARK

probability_words_nli

This dataset tests the ability of language models to correctly capture the meaning of words of estimative probability (WEP, also called verbal probabilities), e.g. words like "probably", "maybe", "surely", "impossible".

1 PAPER • NO BENCHMARKS YET
