PAR Transformer is a Transformer model that uses 63% fewer self-attention blocks, replacing them with feed-forward blocks, while retaining test accuracy. It is based on the Transformer-XL architecture and uses neural architecture search to find an efficient pattern of blocks in the transformer architecture.
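The core idea, interleaving a small number of self-attention blocks with many feed-forward blocks according to a searched pattern, can be sketched in NumPy. This is a minimal illustration only: the pattern string, weight shapes, and the omission of layer normalization and attention projections are simplifying assumptions, not the searched architecture from the paper.

```python
import numpy as np

def feed_forward(x, W1, W2):
    # Position-wise feed-forward block: a two-layer ReLU MLP
    # applied to each token independently.
    return np.maximum(x @ W1, 0.0) @ W2

def self_attention(x):
    # Single-head scaled dot-product self-attention
    # (query/key/value projections omitted for brevity).
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def par_stack(x, pattern, ffn_weights):
    # Apply blocks in the order given by a pattern string,
    # e.g. "sffsff" where 's' = self-attention, 'f' = feed-forward.
    # A PAR-style pattern uses far fewer 's' than 'f' blocks.
    for i, kind in enumerate(pattern):
        if kind == "s":
            x = x + self_attention(x)           # residual connection
        else:
            W1, W2 = ffn_weights[i]
            x = x + feed_forward(x, W1, W2)     # residual connection
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))                 # 4 tokens, model dim 8
pattern = "sffsff"                              # illustrative pattern only
ffn_weights = {
    i: (0.1 * rng.standard_normal((8, 16)), 0.1 * rng.standard_normal((16, 8)))
    for i, kind in enumerate(pattern) if kind == "f"
}
y = par_stack(x, pattern, ffn_weights)
```

The output `y` keeps the input shape, so blocks can be stacked in any searched order; the search in the paper chooses where the few attention blocks pay off most.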
Source: Pay Attention when Required
Task | Papers | Share |
---|---|---|
Language Modelling | 1 | 25.00% |
Paraphrase Identification | 1 | 25.00% |
Question Answering | 1 | 25.00% |
Sentiment Analysis | 1 | 25.00% |