OpenWebText is an open-source recreation of the WebText corpus. The text is web content extracted from URLs shared on Reddit with at least three upvotes. (38GB).
141 PAPERS • NO BENCHMARKS YET