12 dataset results for Decision Making AND Texts

Charades-STA

Charades-STA is a dataset built on top of Charades by adding sentence-level temporal annotations.

187 PAPERS • 4 BENCHMARKS

LogiQA

LogiQA consists of 8,678 QA instances covering multiple types of deductive reasoning. Results show that state-of-the-art neural models perform far worse than the human ceiling. The dataset can also serve as a benchmark for re-examining logical AI in the deep learning NLP setting.

76 PAPERS • NO BENCHMARKS YET

ShARC (Shaping Answers with Rules through Conversation)

ShARC is a conversational question answering dataset focused on answering questions from texts that contain rules.

40 PAPERS • NO BENCHMARKS YET

PeerRead

PeerRead is a dataset of scientific peer reviews. It consists of over 14K paper drafts with the corresponding accept/reject decisions from top-tier venues including ACL, NIPS and ICLR, as well as over 10K textual peer reviews written by experts for a subset of the papers.

33 PAPERS • NO BENCHMARKS YET

Evidence Inference

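Evidence Inference is a corpus for inferring the reported effects of medical interventions from clinical trial reports. It comprises 10,000+ prompts paired with full-text articles describing randomized controlled trials (RCTs).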

26 PAPERS • NO BENCHMARKS YET

SentiCap

The SentiCap dataset contains several thousand images paired with captions expressing positive or negative sentiment. The authors constructed these sentimental captions by rewriting factual descriptions. In total, there are 2,000+ sentimental captions.

26 PAPERS • NO BENCHMARKS YET

ROPES (Reasoning Over Paragraph Effects in Situations)

ROPES is a QA dataset that tests a system's ability to apply knowledge from a passage of text to a new situation. A system is presented with a background passage containing one or more causal or qualitative relations, a novel situation that uses this background, and questions that require reasoning about the effects of the relationships in the background passage in the context of the situation.

23 PAPERS • NO BENCHMARKS YET

CHALET (Cornell House Agent Learning Environment)

CHALET is a 3D house simulator with support for navigation and manipulation. Unlike existing systems, CHALET supports both a wide range of object manipulations and complex environment layouts consisting of multiple rooms. The supported manipulations include picking up and placing objects, toggling the state of objects such as taps or televisions, opening or closing containers, and inserting or removing objects from these containers. In addition, the simulator comes with 58 rooms that can be combined to create houses, including 10 default house layouts. CHALET is therefore suitable for setting up challenging environments for AI tasks that require complex language understanding and planning, such as navigation, manipulation, instruction following, and interactive question answering.

9 PAPERS • NO BENCHMARKS YET

UKP (UKP Argument Annotated Essays)

The UKP Argument Annotated Essays corpus consists of argument annotated persuasive essays including annotations of argument components and argumentative relations.

8 PAPERS • NO BENCHMARKS YET

OMICS (Open Mind Indoor Common Sense)

OMICS is an extensive collection of knowledge for indoor service robots gathered from internet users. Currently, it contains 48 tables capturing different sorts of knowledge. Each tuple of the Help table maps a user desire to a task that may meet the desire (e.g., ⟨ “feel thirsty”, “by offering drink” ⟩). Each tuple of the Tasks/Steps table decomposes a task into several steps (e.g., ⟨ “serve a drink”, 0. “get a glass”, 1. “get a bottle”, 2. “fill glass from bottle”, 3. “give glass to person” ⟩). OMICS therefore captures the hierarchical structure of naturalistic instructions, in which a high-level user request (e.g., “serve a drink”) can be reduced to lower-level tasks (e.g., “get a glass”, ⋯). Another feature of OMICS is that the elements of any tuple in an OMICS table are semantically related according to a predefined template, which facilitates the semantic interpretation of OMICS tuples.
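The desire-to-task and task-to-steps structure described above can be illustrated with a minimal sketch. It assumes the Help and Tasks/Steps tables have been loaded into plain Python dicts; the names `HELP`, `TASK_STEPS` and `expand` are hypothetical and are not part of any OMICS release or loader, and only the example tuples quoted above are used as data.

```python
# Hypothetical in-memory stand-ins for two OMICS tables. The real
# dataset's file format and column names are NOT assumed here.
HELP: dict[str, str] = {
    # user desire -> task that may meet the desire
    "feel thirsty": "by offering drink",
}

TASK_STEPS: dict[str, list[str]] = {
    # task -> ordered list of lower-level steps
    "serve a drink": [
        "get a glass",
        "get a bottle",
        "fill glass from bottle",
        "give glass to person",
    ],
}


def expand(task: str, steps: dict[str, list[str]]) -> list[str]:
    """Recursively reduce a task to the steps that have no further decomposition.

    A step without its own entry in `steps` is treated as primitive.
    Cyclic decompositions are not detected in this sketch.
    """
    if task not in steps:
        return [task]
    plan: list[str] = []
    for step in steps[task]:
        plan.extend(expand(step, steps))
    return plan


if __name__ == "__main__":
    print(HELP["feel thirsty"])
    for i, step in enumerate(expand("serve a drink", TASK_STEPS)):
        print(i, step)
```

Recursion is used so that, if a step such as "get a glass" had its own Tasks/Steps entry, it would be expanded further in the same pass; a real pipeline over the full tables would likely also need a guard against cycles.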

6 PAPERS • NO BENCHMARKS YET

Covid-HeRA

Covid-HeRA is a dataset for health risk assessment and severity-informed decision making in the presence of COVID-19 misinformation. It is a benchmark dataset for risk-aware health misinformation detection related to the COVID-19 pandemic. Social media posts from Twitter are annotated according to the perceived likelihood of health behavioural changes and the perceived risks of following unreliable advice found online.

2 PAPERS • NO BENCHMARKS YET

PubMed PICO Element Detection Dataset

PICO is a framework for formulating a well-defined, focused clinical question. Its four components are Participants/Problem (P), Intervention (I), Comparison (C) and Outcome (O). The PubMed PICO Element Detection dataset is a dataset for evaluating models that automatically identify the sentences in a given medical text that belong to each of these components.

2 PAPERS • NO BENCHMARKS YET
