The Synthesizer is a model that learns synthetic attention weights without token-token interactions. Unlike the Transformer, it eschews not only dot-product self-attention but content-based self-attention altogether: rather than computing pairwise dot products, Synthesizer learns to synthesize the self-alignment matrix directly. It is transformation-based, relies only on simple feed-forward layers, and completely dispenses with dot products and explicit token-token interactions.
The new module employed by the Synthesizer is called "Synthetic Attention": a way of learning to attend without explicitly attending (i.e., without dot-product or content-based attention). Instead, Synthesizer generates the alignment matrix independently of token-token dependencies.
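As a concrete illustration, the dense variant of Synthetic Attention can be sketched with plain numpy: each token's row of the alignment matrix is produced by a two-layer feed-forward network applied to that token alone, so no dot products between token pairs are ever computed. All weight names and shapes below are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dense_synthesizer_attention(X, W1, b1, W2, b2, Wv):
    # Synthesize the (L, L) alignment logits from each token on its own:
    # B_i = W2 @ relu(W1 @ x_i + b1) + b2 -- no pairwise dot products.
    H = np.maximum(X @ W1 + b1, 0.0)   # (L, d_hidden) per-token features
    B = H @ W2 + b2                    # (L, L) synthetic attention logits
    A = softmax(B, axis=-1)            # row-wise attention weights
    V = X @ Wv                         # values, as in a standard Transformer
    return A @ V

# Toy dimensions: sequence length L, model width d (illustrative only).
rng = np.random.default_rng(0)
L, d = 4, 8
X = rng.standard_normal((L, d))
W1 = rng.standard_normal((d, 16)); b1 = np.zeros(16)
W2 = rng.standard_normal((16, L)); b2 = np.zeros(L)
Wv = rng.standard_normal((d, d))
out = dense_synthesizer_attention(X, W1, b1, W2, b2, Wv)
print(out.shape)  # (4, 8)
```

Note that because the second projection maps to length `L`, this sketch ties the synthesized matrix to a fixed maximum sequence length, a trade-off the dense variant makes in exchange for avoiding token-token interactions.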
Source: Synthesizer: Rethinking Self-Attention in Transformer Models
Task | Papers | Share |
---|---|---|
Pose Estimation | 3 | 7.14% |
Zero-Shot Object Detection | 2 | 4.76% |
Voice Cloning | 2 | 4.76% |
Object Detection | 2 | 4.76% |
Face Reenactment | 2 | 4.76% |
Texture Synthesis | 2 | 4.76% |
Language Modelling | 2 | 4.76% |
Machine Translation | 2 | 4.76% |
Text Generation | 2 | 4.76% |
Component | Type |
---|---|
Dense Synthesized Attention | Attention Mechanisms |
Factorized Dense Synthesized Attention | Attention Mechanisms |
Factorized Random Synthesized Attention | Attention Mechanisms |
Multi-Head Attention | Attention Modules |
Random Synthesized Attention | Attention Mechanisms |
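The Random Synthesized Attention component listed above takes the idea one step further: the alignment matrix is a learned parameter that depends on neither the tokens nor their pairwise interactions. The following numpy sketch illustrates this under assumed shapes; the factorized variant (noted in a comment) is likewise an illustrative decomposition, not the paper's exact code.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def random_synthesizer_attention(X, R, Wv):
    # R is a trainable (L, L) parameter: the attention pattern is fixed
    # across inputs and conditioned on no token content at all.
    A = softmax(R, axis=-1)
    return A @ (X @ Wv)

rng = np.random.default_rng(1)
L, d = 4, 8
X = rng.standard_normal((L, d))
R = rng.standard_normal((L, L))   # learned directly during training
# Factorized variant (fewer parameters): R = R1 @ R2.T with R1, R2 of
# shape (L, k) for some small rank k.
Wv = rng.standard_normal((d, d))
out = random_synthesizer_attention(X, R, Wv)
print(out.shape)  # (4, 8)
```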