FcaNet contains a novel multi-spectral channel attention module. Given an input feature map $X \in \mathbb{R}^{C \times H \times W}$, multi-spectral channel attention first splits $X$ along the channel dimension into $n$ parts $x^{i} \in \mathbb{R}^{C' \times H \times W}$, where $C' = C/n$. It then applies a 2D discrete cosine transform (DCT) to each part $x^{i}$, keeping one pre-assigned frequency component per part; since the DCT basis depends only on the spatial size, it can be precomputed, reducing each transform to an element-wise multiplication and a summation. The per-part results are concatenated into a single vector, and, as in an SE block, two fully connected layers with a ReLU in between followed by a sigmoid produce the attention vector. This can be formulated as: \begin{align} s = F_\text{fca}(X, \theta) &= \sigma (W_{2}\, \delta (W_{1}\,[\text{DCT}(\text{Group}(X))])) \\ Y &= s X \end{align} where $\text{Group}(\cdot)$ denotes splitting the input into groups, $\text{DCT}(\cdot)$ is the 2D discrete cosine transform, $\delta$ is the ReLU and $\sigma$ the sigmoid function.
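As a concrete illustration, the grouping, precomputed-DCT pooling, and sigmoid gating can be sketched in plain Python. This is a simplified sketch, not the reference implementation: the frequency assignment is left to the caller, and the learned FC layers $W_1, W_2$ and ReLU are omitted (treated as identity), so each channel's attention weight is just the sigmoid of its DCT coefficient.

```python
import math

def dct_basis(u, v, H, W):
    """Precompute the 2D DCT basis for frequency (u, v) on an H x W grid.
    The basis depends only on the spatial size, so it can be built once
    and reused, reducing each transform to a multiply-and-sum."""
    return [[math.cos(math.pi * u * (h + 0.5) / H) *
             math.cos(math.pi * v * (w + 0.5) / W)
             for w in range(W)] for h in range(H)]

def dct_component(x, basis):
    """Single 2D DCT coefficient of feature map x under a precomputed basis.
    For (u, v) = (0, 0) the basis is all ones, so this equals the plain sum
    of x, i.e. global average pooling up to a constant scale."""
    return sum(x[h][w] * basis[h][w]
               for h in range(len(x)) for w in range(len(x[0])))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def multispectral_attention(X, freqs):
    """Heavily simplified multi-spectral channel attention.

    X     : list of C feature maps, each an H x W nested list.
    freqs : n (u, v) frequency pairs; channels are split into n equal
            groups and each group is pooled with its assigned frequency.

    The learned FC layers and ReLU of the real module are omitted here,
    so the attention weight is sigmoid(DCT coefficient).
    Returns the reweighted maps Y with Y[c] = s[c] * X[c].
    """
    C, H, W = len(X), len(X[0]), len(X[0][0])
    group = C // len(freqs)
    bases = [dct_basis(u, v, H, W) for (u, v) in freqs]
    s = [sigmoid(dct_component(X[c], bases[c // group])) for c in range(C)]
    return [[[s[c] * X[c][h][w] for w in range(W)] for h in range(H)]
            for c in range(C)]
```

A channel assigned the $(0, 0)$ frequency recovers SE-style global average pooling up to scale, which is the observation motivating FcaNet's frequency-domain generalization of the SE block.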
Grounded in information compression via the discrete cosine transform, this work achieves strong performance on image classification and related vision tasks.
Source: FcaNet: Frequency Channel Attention Networks
| Task | Papers | Share |
|---|---|---|
| Image Classification | 1 | 25.00% |
| Instance Segmentation | 1 | 25.00% |
| Object Detection | 1 | 25.00% |
| Semantic Segmentation | 1 | 25.00% |