MathVista is a consolidated Mathematical reasoning benchmark within Visual contexts. It consists of three newly created datasets, IQTest, FunctionQA, and PaperQA, which address the missing visual domains and are tailored to evaluate logical reasoning on puzzle test figures, algebraic reasoning over functional plots, and scientific reasoning with academic paper figures, respectively. It also incorporates 9 MathQA datasets and 19 VQA datasets from the literature, which significantly enrich the diversity and complexity of visual perception and mathematical reasoning challenges within our benchmark. In total, MathVista includes 6,141 examples collected from 31 different datasets.
36 PAPERS • NO BENCHMARKS YET
MathBench is an all-in-one math dataset for language model evaluation.
3 PAPERS • NO BENCHMARKS YET
Conic10K is an open-ended math problem dataset on conic sections in Chinese senior high school education. The dataset contains 10,861 carefully annotated problems, each with a formal representation, the corresponding text spans, the answer, and natural language rationales. The questions require long reasoning steps, while the topic is limited to conic sections. It can be used to evaluate models on two tasks: semantic parsing and mathematical question answering (mathQA).
1 PAPER • NO BENCHMARKS YET
MGSM8KInstruct is a multilingual math reasoning instruction dataset encompassing ten distinct languages, which addresses the scarcity of training data in multilingual math reasoning.