UDC (Ubuntu Dialogue Corpus)

Introduced by Lowe et al. in The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems

The Ubuntu Dialogue Corpus (UDC) is a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. It provides a unique resource for research into building dialogue managers based on neural language models that can make use of large amounts of unlabeled data. The dataset combines the multi-turn property of conversations in the Dialog State Tracking Challenge datasets with the unstructured nature of interactions from microblog services such as Twitter.

Source: The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems
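
To give a feel for the scale figures in the description, the short sketch below derives rough per-dialogue and per-utterance averages. The constants are the rounded numbers quoted above ("almost 1 million", "over 7 million", "100 million"), not exact corpus counts, so the results are order-of-magnitude estimates only; the helper name `per_unit` is purely illustrative.

```python
# Rounded figures taken from the dataset description; the real counts
# differ slightly ("almost" 1 million dialogues, "over" 7 million utterances).
DIALOGUES = 1_000_000
UTTERANCES = 7_000_000
WORDS = 100_000_000


def per_unit(total: int, units: int) -> float:
    """Return the average amount of `total` per one of `units`."""
    return total / units


# Approximate averages implied by the rounded figures.
utterances_per_dialogue = per_unit(UTTERANCES, DIALOGUES)
words_per_utterance = per_unit(WORDS, UTTERANCES)

print(f"~{utterances_per_dialogue:.0f} utterances per dialogue")
print(f"~{words_per_utterance:.0f} words per utterance")
```

Under these rounded inputs, a typical dialogue has on the order of seven utterances and a typical utterance on the order of fourteen words.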

Benchmarks

Task	Dataset Variant	Best Model
Conversational Response Selection	Ubuntu Dialogue (v1, Ranking)	Dial-MAE
Answer Selection	Ubuntu Dialogue (v2, Ranking)	BERT + Keep Learning
Dialogue Generation	Ubuntu Dialogue (Tense)	MrRNN Act.-Ent.
Dialogue Generation	Ubuntu Dialogue (Cmd)	MrRNN Act.-Ent.
Answer Selection	Ubuntu Dialogue (v1, Ranking)	SA-BERT
Dialogue Generation	Ubuntu Dialogue (Activity)	MrRNN Act.-Ent.
Conversational Response Selection	Ubuntu Dialogue (v2, Ranking)	Uni-Encoder
Dialogue Generation	Ubuntu Dialogue (Entity)	MrRNN Act.-Ent.