The Reddit dataset is a graph dataset from Reddit posts made in the month of September, 2014. The node label in this case is the community, or “subreddit”, that a post belongs to. 50 large communities have been sampled to build a post-to-post graph, connecting posts if the same user comments on both. In total this dataset contains 232,965 posts with an average degree of 492. The first 20 days are used for training and the remaining days for testing (with 30% used for validation). For features, off-the-shelf 300-dimensional GloVe CommonCrawl word vectors are used.
602 PAPERS • 13 BENCHMARKS
We release E-commerce Dialogue Corpus, comprising a training data set, a development set and a test set for retrieval based chatbot. The statistics of E-commerical Conversation Corpus are shown in the following table.
37 PAPERS • 1 BENCHMARK
The DSTC7 Task 1 dataset is a dataset and task for goal-oriented dialogue. The data originates from human-human conversations, which is built from online resources, specifically the Ubuntu Internet Relay Chat (IRC) channel and an Advising dataset from the University of Michigan.
11 PAPERS • 1 BENCHMARK
Reddit Corpus is part of a repository of conversational datasets consisting of hundreds of millions of examples, and a standardised evaluation procedure for conversational response selection models using '1-of-100 accuracy'. The Reddit Corpus contains 726 million multi-turn dialogues from the Reddit board.
7 PAPERS • 1 BENCHMARK
Advising Corpus is a dataset based on an entirely new collection of dialogues in which university students are being advised which classes to take. These were collected at the University of Michigan with IRB approval. They were released as part of DSTC 7 track 1 and used again in DSTC 8 track 2.
4 PAPERS • 1 BENCHMARK
This dataset is for evaluating the task of Black-box Multi-agent Integration which focuses on combining the capabilities of multiple black-box conversational agents at scale. It provides data to explore two main frameworks of exploration: question agent pairing and question response pairing.
1 PAPER • 1 BENCHMARK