Elementary Mathematics

5 papers with code • 1 benchmark • 3 datasets

Benchmarks

These leaderboards are used to track progress in Elementary Mathematics.

Dataset	Best Model
BIG-bench	Chinchilla (few-shot, k=5)

Datasets

Most implemented papers

Measuring Massive Multitask Language Understanding

hendrycks/test • 7 Sep 2020

By comprehensively evaluating the breadth and depth of a model's academic and professional understanding, our test can be used to analyze models across many tasks and to identify important shortcomings.

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

allenai/dolma • NA 2021

Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world.

Mathematical Capabilities of ChatGPT

snfrieder/ghosts • NeurIPS 2023

We investigate the mathematical capabilities of two iterations of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology.

The NCTE Transcripts: A Dataset of Elementary Math Classroom Transcripts

ddemszky/classroom-transcript-analysis • 21 Nov 2022

Classroom discourse is a core medium of instruction - analyzing it can provide a window into teaching and learning as well as driving the development of new tools for improving instruction.

An Empirical Study on Challenging Math Problem Solving with GPT-4

microsoft/autogen • 2 Jun 2023

Employing Large Language Models (LLMs) to address mathematical problems is an intriguing research endeavor, considering the abundance of math problems expressed in natural language across numerous science and engineering fields.
