MODERN, or Modulated Residual Network, is an architecture for visual question answering (VQA). It employs conditional batch normalization so that a linguistic embedding from an LSTM can modulate the batch normalization parameters of a ResNet. This lets the linguistic embedding manipulate entire feature maps, for example by scaling them up or down, negating them, or shutting them off.
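The modulation described above can be illustrated with a minimal, dependency-free sketch. It assumes the embedding predicts per-channel offsets (`delta_gamma`, `delta_beta`) that are added to the batch normalization scale and shift; the function names, the linear predictor `predict_deltas`, and the toy weights are hypothetical and are not taken from the authors' code.

```python
import math


def predict_deltas(embedding, weights, bias):
    """Hypothetical linear predictor: one offset per channel from an embedding."""
    return [sum(w * e for w, e in zip(row, embedding)) + b
            for row, b in zip(weights, bias)]


def conditional_batch_norm(features, gamma, beta, delta_gamma, delta_beta, eps=1e-5):
    """Batch-normalize each channel, then apply a per-sample scale and shift.

    features:    [batch][channel][spatial] nested lists of floats
    gamma, beta: [channel] base batch normalization parameters
    delta_*:     [batch][channel] offsets predicted from each sample's embedding
    """
    num_channels = len(gamma)
    out = [[None] * num_channels for _ in features]
    for c in range(num_channels):
        # Statistics are shared across the batch and all spatial positions.
        values = [v for sample in features for v in sample[c]]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        std = math.sqrt(var + eps)
        for n, sample in enumerate(features):
            # The language-dependent offsets modulate the whole feature map.
            scale = gamma[c] + delta_gamma[n][c]
            shift = beta[c] + delta_beta[n][c]
            out[n][c] = [scale * (v - mean) / std + shift for v in sample[c]]
    return out


# Toy input: 2 samples, 3 channels, 2 spatial positions per channel.
features = [
    [[1.0, 3.0], [0.0, 2.0], [1.0, 2.0]],
    [[5.0, 7.0], [4.0, 6.0], [3.0, 4.0]],
]
gamma, beta = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]

# Stand-ins for LSTM question embeddings (assumed values, dimension 2).
embeddings = [[1.0, 0.0], [0.0, 1.0]]
gamma_weights = [[0.0, 1.0], [-1.0, 0.0], [-2.0, 0.0]]  # assumed toy weights
zero_weights = [[0.0, 0.0]] * 3
zero_bias = [0.0, 0.0, 0.0]

delta_gamma = [predict_deltas(e, gamma_weights, zero_bias) for e in embeddings]
delta_beta = [predict_deltas(e, zero_weights, zero_bias) for e in embeddings]
no_delta = [[0.0] * 3 for _ in embeddings]

plain = conditional_batch_norm(features, gamma, beta, no_delta, no_delta)
modulated = conditional_batch_norm(features, gamma, beta, delta_gamma, delta_beta)
```

With these toy weights, the first sample's embedding leaves channel 0 unchanged, shuts channel 1 off (scale 0), and negates channel 2 (scale -1), while the second sample's embedding doubles channel 0. The statistics are shared across the batch, but the scale and shift differ per sample because each image is paired with its own question.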
Source: Modulating early visual processing by language
Task | Papers | Share |
---|---|---|
Question Answering | 1 | 33.33% |
Visual Question Answering | 1 | 33.33% |
Visual Question Answering (VQA) | 1 | 33.33% |
Component | Type |
---|---|
Conditional Batch Normalization | Normalization |
LSTM | Recurrent Neural Networks |
ResNet | Convolutional Neural Networks |
Tanh Activation | Activation Functions |