This is a dataset for multi-document summarization in Portuguese, which means that each example pairs multiple documents (the input) with a human-written summary (the output). In particular, each entry consists of multiple related texts from Brazilian websites about a subject, and the summary is the lead section of the Portuguese Wikipedia article on the same subject (the lead is the first section of any Wikipedia article and serves as its summary). Input texts were extracted from the BrWaC corpus, and the output summaries from the Portuguese Wikipedia dumps.
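To make the input/output pairing concrete, here is a minimal sketch of how one entry could be represented and turned into a single model input. The class name `SummarizationExample`, its field names, the helper `build_model_input`, and the placeholder strings are all hypothetical and chosen for illustration; they are not the dataset's actual schema or contents.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SummarizationExample:
    """One hypothetical entry: several web texts plus one Wikipedia lead."""
    title: str                   # subject shared by all documents (assumed field)
    source_documents: List[str]  # related texts from Brazilian websites (input)
    summary: str                 # Portuguese Wikipedia lead section (output)


def build_model_input(example: SummarizationExample, separator: str = "\n\n") -> str:
    """Concatenate the non-empty source documents into one input string."""
    cleaned = [doc.strip() for doc in example.source_documents if doc.strip()]
    return separator.join(cleaned)


# Placeholder content only; real entries hold Portuguese text.
example = SummarizationExample(
    title="<subject>",
    source_documents=["<text of web document 1>", "  <text of web document 2>  ", ""],
    summary="<Wikipedia lead section>",
)
model_input = build_model_input(example)
```

Plain concatenation is only one way to feed several documents to a summarizer; it is used here because it is the simplest baseline, not because the dataset prescribes it.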