StrategyQA
14 papers with code • 0 benchmarks • 0 datasets
StrategyQA aims to measure the ability of models to answer yes/no questions that require multi-step implicit reasoning: for example, answering "Did Aristotle use a laptop?" requires inferring when Aristotle lived and when laptops were invented, even though the question states neither step.
Source: BIG-bench
Most implemented papers
PaLM: Scaling Language Modeling with Pathways
To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM).
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world.
Training Compute-Optimal Large Language Models
We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget.
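The paper's headline finding (the "Chinchilla" result) is that parameters and training tokens should be scaled roughly in proportion, at about 20 tokens per parameter, with training compute approximately C ≈ 6·N·D FLOPs. A minimal sketch of that allocation rule, under those stated approximations:

```python
def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
    """Estimate a compute-optimal model size N (parameters) and data size D
    (tokens) from a FLOPs budget, using the approximations C ~= 6*N*D and
    D ~= 20*N reported by the paper."""
    # C = 6 * N * (20 * N) = 120 * N^2  =>  N = sqrt(C / 120)
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Chinchilla's budget of ~5.76e23 FLOPs yields roughly 70B params, 1.4T tokens.
n, d = chinchilla_optimal(5.76e23)
```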
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
A key limitation in current datasets for multi-hop reasoning is that the required steps for answering the question are mentioned in it explicitly.
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks.
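Self-consistency replaces greedy decoding with sampling several reasoning paths and keeping the answer most paths agree on. A minimal sketch, assuming a hypothetical `sample_cot_answer` callable that returns the final answer of one sampled chain-of-thought completion:

```python
from collections import Counter

def self_consistency(sample_cot_answer, question, n_samples=10):
    """Sample several chain-of-thought completions for the same question and
    return the final answer chosen by majority vote over the samples."""
    answers = [sample_cot_answer(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```

The vote marginalizes over individual reasoning paths, so one flawed chain is outvoted by the paths that converge on the same answer.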
Distilling Reasoning Capabilities into Smaller Language Models
In this work, we propose an alternative reasoning scheme, Socratic CoT, that learns a decomposition of the original problem into a sequence of subproblems and uses it to guide the intermediate reasoning steps.
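The decomposition idea can be sketched as a small control loop; the `decompose`, `answer_sub`, and `aggregate` callables below are hypothetical stand-ins for the paper's learned models, not its actual API:

```python
def socratic_answer(decompose, answer_sub, aggregate, question):
    """Sketch of decomposition-guided reasoning: split the question into
    subquestions, answer each in turn while conditioning on the earlier
    steps, then combine the intermediate answers into a final answer."""
    steps = []
    for sub_q in decompose(question):
        steps.append((sub_q, answer_sub(sub_q, steps)))
    return aggregate(question, steps)
```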
Visconde: Multi-document QA with GPT-3 and Neural Reranking
This paper proposes a question-answering system that can answer questions whose supporting evidence is spread over multiple (potentially long) documents.
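A multi-document pipeline of this shape is typically retrieve, rerank, then read. A minimal sketch with hypothetical `retrieve`, `rerank_score`, and `read` callables standing in for the system's retriever, neural reranker, and reader model:

```python
def answer_multidoc(retrieve, rerank_score, read, question, k=3):
    """Sketch of a retrieve-rerank-read QA pipeline: fetch candidate
    passages, keep the k passages the reranker scores highest for this
    question, and let a reader model produce the answer from them."""
    passages = retrieve(question)
    top = sorted(passages, key=lambda p: rerank_score(question, p),
                 reverse=True)[:k]
    return read(question, top)
```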
Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks
Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge.
Teaching Smaller Language Models To Generalise To Unseen Compositional Questions
We equip a smaller Language Model to generalise to answering challenging compositional questions that have not been seen in training.
Tailoring Self-Rationalizers with Multi-Reward Distillation
Results on five difficult question-answering datasets (StrategyQA, QuaRel, OpenBookQA, NumerSense and QASC) show that MaRio not only improves task accuracy but also improves the self-rationalization quality of small LMs along the aforementioned axes more than a supervised fine-tuning (SFT) baseline.