Modulated Residual Network

Introduced by de Vries et al. in Modulating early visual processing by language

MODERN, or Modulated Residual Network, is an architecture for visual question answering (VQA). It uses conditional batch normalization (CBN) to let a linguistic embedding from an LSTM modulate the batch normalization parameters of a ResNet. Because these parameters scale and shift each normalized feature map, the linguistic embedding can manipulate entire feature maps: scaling them up or down, negating them, or shutting them off.
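
The modulation described above can be illustrated with a minimal sketch in plain Python. It assumes, as in conditional batch normalization, that the question embedding predicts per-channel offsets that are added to the ResNet's own batch normalization scale and shift. The function names, the flattened per-channel layout, and the single linear layer used to predict the offsets are assumptions made for illustration, not the authors' implementation.

```python
import math


def predict_deltas(embedding, weights, bias):
    """Hypothetical predictor: one linear layer from the question embedding
    to one offset per channel (a stand-in for whatever network is used)."""
    return [
        sum(w * e for w, e in zip(row, embedding)) + b
        for row, b in zip(weights, bias)
    ]


def conditional_batch_norm(channels, gamma, beta, delta_gamma, delta_beta, eps=1e-5):
    """Normalize each channel, then scale and shift it with
    language-conditioned parameters.

    channels:    one list of activations per channel (batch and spatial
                 positions flattened together; an assumed layout).
    gamma, beta: the ResNet's own batch norm scale and shift, per channel.
    delta_*:     per-channel offsets predicted from the question embedding.
    """
    out = []
    for c, acts in enumerate(channels):
        mean = sum(acts) / len(acts)
        var = sum((a - mean) ** 2 for a in acts) / len(acts)
        scale = gamma[c] + delta_gamma[c]  # modulated scale
        shift = beta[c] + delta_beta[c]    # modulated shift
        out.append(
            [scale * (a - mean) / math.sqrt(var + eps) + shift for a in acts]
        )
    return out


# Three identical channels, modulated three different ways.
feature_maps = [[1.0, 2.0, 3.0, 4.0] for _ in range(3)]
gamma = [1.0, 1.0, 1.0]
beta = [0.0, 0.0, 0.0]
delta_gamma = [1.0, -2.0, -1.0]  # effective scales: 2 (amplify), -1 (negate), 0 (shut off)
delta_beta = [0.0, 0.0, 0.0]

modulated = conditional_batch_norm(feature_maps, gamma, beta, delta_gamma, delta_beta)
for scale, fmap in zip((2.0, -1.0, 0.0), modulated):
    print(scale, [round(v, 3) for v in fmap])
```

With an effective scale of 0 and no shift, a channel is zeroed out entirely. A negative scale flips its sign, and a scale above 1 amplifies it. These are the three kinds of manipulation the description lists.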

Source: Modulating early visual processing by language
