4 code implementations • NeurIPS 2021 • 17 Sep 2021 • David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam Shazeer, Quoc V. Le
For example, at a 500M parameter size, Primer improves the original T5 architecture on C4 auto-regressive language modeling, reducing the training cost by 4X.
Ranked #1 on Language Modelling on C4
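The listing above gives only the headline result; the central architectural change the Primer paper reports is replacing the ReLU activation in the Transformer feedforward block with squared ReLU. A minimal NumPy sketch of that activation, with the helper names here being illustrative assumptions rather than the paper's code:

```python
import numpy as np

def squared_relu(x):
    # Squared ReLU: max(x, 0)^2 -- Primer's replacement for plain ReLU
    return np.maximum(x, 0.0) ** 2

def ffn_block(x, w_in, w_out):
    # Transformer feedforward block using squared ReLU
    # (function and parameter names are illustrative, not from the paper)
    return squared_relu(x @ w_in) @ w_out

x = np.array([[-1.0, 2.0]])
print(squared_relu(x))  # negatives are zeroed, positives are squared
```

Because squaring steepens the activation for large positive inputs while leaving the zero region untouched, it is a drop-in change to an existing feedforward block with no new parameters.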