Gated Linear Unit

Introduced by Dauphin et al. in Language Modeling with Gated Convolutional Networks

A Gated Linear Unit, or GLU, computes:

$$ \text{GLU}\left(a, b\right) = a\otimes \sigma\left(b\right) $$

It is used in natural language processing architectures, for example the Gated CNN. Here $b$ is the gate that controls what information from $a$ is passed up to the following layer: $\sigma\left(b\right)$ lies between 0 and 1 and scales each element of $a$ through the element-wise product $\otimes$. Intuitively, for a language modeling task, the gating mechanism allows selection of words or features that are important for predicting the next word. The GLU is non-linear, yet it provides a linear path for the gradient, which reduces the vanishing gradient problem.
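As a rough illustration of the formula above, the following minimal sketch computes a GLU on plain Python lists. The function names `sigmoid` and `glu` and the sample inputs are assumptions chosen for this example; they are not taken from the paper or from any particular library.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def glu(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Element-wise a * sigmoid(b): a carries the values, b is the gate."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    return [a_i * sigmoid(b_i) for a_i, b_i in zip(a, b)]


# Hypothetical inputs: the gate is open, half-open, and nearly closed.
values = [2.0, -1.0, 0.5]
gates = [10.0, 0.0, -10.0]
output = glu(values, gates)
print(output)  # approximately [2.0, -0.5, 0.00002]
```

A large positive gate passes the value through almost unchanged, a gate of zero halves it, and a large negative gate suppresses it. The derivative of each output with respect to $a_i$ is simply $\sigma\left(b_i\right)$, with no extra activation derivative applied to $a_i$, which is the linear gradient path mentioned above.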

Source: Language Modeling with Gated Convolutional Networks


Tasks

Task	Papers	Share
Language Modelling	93	9.05%
Question Answering	57	5.54%
Decoder	46	4.47%
Sentence	40	3.89%
Text Generation	39	3.79%
Retrieval	33	3.21%
Translation	25	2.43%
Natural Language Understanding	21	2.04%
Machine Translation	21	2.04%


Categories

Activation Functions