no code implementations • EMNLP 2020 • Vered Shwartz, Rachel Rudinger, Oyvind Tafjord
Pre-trained language models (LMs) may perpetuate biases originating in their training corpus to downstream models.
no code implementations • Findings (EMNLP) 2021 • Faeze Brahman, Meng Huang, Oyvind Tafjord, Chao Zhao, Mrinmaya Sachan, Snigdha Chaturvedi
When reading a literary piece, readers often make inferences about various characters’ roles, personalities, relationships, intents, actions, etc.
no code implementations • 22 Feb 2024 • Nathaniel Weir, Kate Sanders, Orion Weller, Shreya Sharma, Dongwei Jiang, Zhengping Jiang, Bhavana Dalvi Mishra, Oyvind Tafjord, Peter Jansen, Peter Clark, Benjamin Van Durme
Contemporary language models enable new opportunities for structured reasoning with text, such as the construction and evaluation of intuitive, proof-like textual entailment trees without relying on brittle formal logic.
2 code implementations • 1 Feb 2024 • Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi
Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs.
1 code implementation • 31 Jan 2024 • Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, Emma Strubell, Nishant Subramani, Oyvind Tafjord, Pete Walsh, Luke Zettlemoyer, Noah A. Smith, Hannaneh Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, Kyle Lo
As a result, it is challenging to conduct and advance scientific research on language modeling, such as understanding how training data impacts model capabilities and limitations.
no code implementations • 16 Dec 2023 • Ian Magnusson, Akshita Bhagia, Valentin Hofmann, Luca Soldaini, Ananya Harsh Jha, Oyvind Tafjord, Dustin Schwenk, Evan Pete Walsh, Yanai Elazar, Kyle Lo, Dirk Groeneveld, Iz Beltagy, Hannaneh Hajishirzi, Noah A. Smith, Kyle Richardson, Jesse Dodge
We invite submissions to our benchmark and organize results by comparability based on compliance with guidelines such as removal of benchmark contamination from pretraining.
1 code implementation • 15 Dec 2023 • Dirk Groeneveld, Anas Awadalla, Iz Beltagy, Akshita Bhagia, Ian Magnusson, Hao Peng, Oyvind Tafjord, Pete Walsh, Kyle Richardson, Jesse Dodge
The success of large language models has shifted the evaluation paradigms in natural language processing (NLP).
no code implementations • 12 Dec 2023 • Peter Clark, Bhavana Dalvi Mishra, Oyvind Tafjord
This shows the clear progression of models towards improved factual accuracy and entailment reasoning, and the dataset provides a new benchmark that more cleanly separates and quantifies these two notions.
no code implementations • 16 Nov 2023 • Yuling Gu, Oyvind Tafjord, Peter Clark
While LLMs can provide reasoned explanations along with their answers, the nature and quality of those explanations are still poorly understood.
no code implementations • 16 Oct 2023 • Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Peter Jansen, Oyvind Tafjord, Niket Tandon, Li Zhang, Chris Callison-Burch, Peter Clark
Language agents have shown some ability to interact with an external environment, e. g., a virtual world such as ScienceWorld, to perform complex tasks, e. g., growing a plant, without the startup costs of reinforcement learning.
1 code implementation • 24 May 2023 • Sarah Wiegreffe, Matthew Finlayson, Oyvind Tafjord, Peter Clark, Ashish Sabharwal
For example, both normalization and prompting methods for reducing SFC can be ineffective or even detrimental to task performance for some LMs.
no code implementations • 23 May 2023 • Nora Kassner, Oyvind Tafjord, Ashish Sabharwal, Kyle Richardson, Hinrich Schuetze, Peter Clark
To address this, our goals are to make model beliefs and their inferential relationships explicit, and to resolve inconsistencies that may exist, so that answers are supported by interpretable chains of reasoning drawn from a consistent network of beliefs.
1 code implementation • 31 Oct 2022 • Swaroop Mishra, Matthew Finlayson, Pan Lu, Leonard Tang, Sean Welleck, Chitta Baral, Tanmay Rajpurohit, Oyvind Tafjord, Ashish Sabharwal, Peter Clark, Ashwin Kalyan
Mathematical reasoning skills are essential for general-purpose intelligent systems to perform tasks from grocery shopping to climate modeling.
Ranked #1 on Mathematical Reasoning on Lila (OOD)
no code implementations • 21 Oct 2022 • Oyvind Tafjord, Bhavana Dalvi Mishra, Peter Clark
Our goal is a question-answering (QA) system that can show how its answers are implied by its own internal beliefs via a systematic chain of reasoning.
1 code implementation • 20 Sep 2022 • Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, Ashwin Kalyan
We further design language models to learn to generate lectures and explanations as the chain of thought (CoT) to mimic the multi-hop reasoning process when answering ScienceQA questions.
Ranked #6 on Science Question Answering on ScienceQA
no code implementations • 27 Apr 2022 • Bhavana Dalvi Mishra, Oyvind Tafjord, Peter Clark
Our goal is a teachable reasoning system for question-answering (QA), where a user can interact with faithful answer explanations, and correct its errors so that the system improves over time.
no code implementations • EMNLP 2021 • Nora Kassner, Oyvind Tafjord, Hinrich Schütze, Peter Clark
We show that, in a controlled experimental setting, these two mechanisms result in more consistent beliefs in the overall system, improving both the accuracy and consistency of its answers over time.
no code implementations • 12 Sep 2021 • Faeze Brahman, Meng Huang, Oyvind Tafjord, Chao Zhao, Mrinmaya Sachan, Snigdha Chaturvedi
When reading a literary piece, readers often make inferences about various characters' roles, personalities, relationships, intents, actions, etc.
2 code implementations • 6 Sep 2021 • Oyvind Tafjord, Peter Clark
Despite the successes of pretrained language models, there are still few high-quality, general-purpose QA systems that are freely available.
1 code implementation • EMNLP 2021 • Bhavana Dalvi, Peter Jansen, Oyvind Tafjord, Zhengnan Xie, Hannah Smith, Leighanna Pipatanangkura, Peter Clark
Our approach is to generate explanations in the form of entailment trees, namely a tree of multipremise entailment steps from facts that are known, through intermediate conclusions, to the hypothesis of interest (namely the question + answer).
no code implementations • 16 Apr 2021 • Nora Kassner, Oyvind Tafjord, Hinrich Schutze, Peter Clark
(This is an old and now obsolete draft.
no code implementations • 5 Feb 2021 • Sumithra Bhakthavatsalam, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Peter Clark
We present the ARC-DA dataset, a direct-answer ("open response", "freeform") version of the ARC (AI2 Reasoning Challenge) multiple-choice dataset.
no code implementations • Findings (ACL) 2021 • Oyvind Tafjord, Bhavana Dalvi Mishra, Peter Clark
In this work we show that a generative model, called ProofWriter, can reliably generate both implications of a theory and the natural language proof(s) that support them.
1 code implementation • NeurIPS 2020 • Alon Talmor, Oyvind Tafjord, Peter Clark, Yoav Goldberg, Jonathan Berant
In this work, we provide a first demonstration that LMs can be trained to reliably perform systematic reasoning combining both implicit, pre-trained knowledge and explicit natural language statements.
2 code implementations • Findings of the Association for Computational Linguistics 2020 • Daniel Khashabi, Sewon Min, Tushar Khot, Ashish Sabharwal, Oyvind Tafjord, Peter Clark, Hannaneh Hajishirzi
As evidence, we use the latest advances in language modeling to build a single pre-trained QA model, UnifiedQA, that performs surprisingly well across 17 QA datasets spanning 4 diverse formats.
Ranked #5 on Common Sense Reasoning on WinoGrande
1 code implementation • 6 Apr 2020 • Vered Shwartz, Rachel Rudinger, Oyvind Tafjord
Pre-trained language models (LMs) may perpetuate biases originating in their training corpus to downstream models.
2 code implementations • 14 Feb 2020 • Peter Clark, Oyvind Tafjord, Kyle Richardson
However, expressing the knowledge in a formal (logical or probabilistic) representation has been a major obstacle to this research.
1 code implementation • ACL 2020 • Lucy Lu Wang, Oyvind Tafjord, Arman Cohan, Sarthak Jain, Sam Skjonsberg, Carissa Schoenick, Nick Botner, Waleed Ammar
We fine-tune the contextualized word representations of the RoBERTa language model using labeled DDI data, and apply the fine-tuned model to identify supplement interactions.
no code implementations • IJCNLP 2019 • Oyvind Tafjord, Matt Gardner, Kevin Lin, Peter Clark
QuaRTz contains general qualitative statements, e. g., "A sunscreen with a higher SPF protects the skin longer.
no code implementations • 4 Sep 2019 • Peter Clark, Oren Etzioni, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Niket Tandon, Sumithra Bhakthavatsalam, Dirk Groeneveld, Michal Guerquin, Michael Schmitz
This paper reports unprecedented success on the Grade 8 New York Regents Science Exam, where for the first time a system scores more than 90% on the exam's non-diagram, multiple choice (NDMC) questions.
no code implementations • WS 2019 • Kevin Lin, Oyvind Tafjord, Peter Clark, Matt Gardner
A system is presented a background passage containing at least one of these relations, a novel situation that uses this background, and questions that require reasoning about effects of the relationships in the background passage in the context of the situation.
1 code implementation • LREC 2020 • Dongfang Xu, Peter Jansen, Jaycie Martin, Zhengnan Xie, Vikas Yadav, Harish Tayyar Madabushi, Oyvind Tafjord, Peter Clark
Prior work has demonstrated that question classification (QC), recognizing the problem domain of a question, can help answer it more accurately.
1 code implementation • 1 May 2019 • Arindam Mitra, Peter Clark, Oyvind Tafjord, Chitta Baral
While in recent years machine learning (ML) based approaches have been the popular approach in developing end-to-end question answering systems, such systems often struggle when additional knowledge is needed to correctly answer the questions.
no code implementations • 20 Nov 2018 • Oyvind Tafjord, Peter Clark, Matt Gardner, Wen-tau Yih, Ashish Sabharwal
Many natural language questions require recognizing and reasoning with qualitative relationships (e. g., in science, economics, and medicine), but are challenging to answer with corpus-based methods.
1 code implementation • WS 2018 • Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep Dasigi, Nelson Liu, Matthew Peters, Michael Schmitz, Luke Zettlemoyer
This paper describes AllenNLP, a platform for research on deep learning methods in natural language understanding.
1 code implementation • 14 Mar 2018 • Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord
We present a new question set, text corpus, and baselines assembled to encourage AI research in advanced question answering.
no code implementations • EMNLP 2016 • Jayant Krishnamurthy, Oyvind Tafjord, Aniruddha Kembhavi
Situated question answering is the problem of answering questions about an environment such as an image or diagram.
3 code implementations • 14 Apr 2016 • Carissa Schoenick, Peter Clark, Oyvind Tafjord, Peter Turney, Oren Etzioni
Given recent successes in AI (e. g., AlphaGo's victory against Lee Sedol in the game of GO), it's become increasingly important to assess: how close are AI systems to human-level intelligence?