Spatial Token Mixer

4 papers with code • 0 benchmarks • 0 datasets

A Spatial Token Mixer (STM) is the module in a vision backbone that mixes information across token positions, that is, along the spatial dimension of the tokens. The variant described here is a depthwise convolution over the token grid, intended as a more efficient drop-in replacement for the token-mixing layers in vision transformers. The papers listed below compare several different STMs.
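As a rough illustration of depthwise convolution acting as a token mixer, the sketch below filters each channel of a token grid with its own small kernel, so information moves between neighbouring positions but never between channels. It is a minimal dependency-free sketch, not the implementation from any of the listed papers: the function name `depthwise_token_mixer`, the H x W x C nested-list layout, zero padding, and the example kernels are all assumptions made for illustration.

```python
def depthwise_token_mixer(tokens, kernels):
    """Mix tokens spatially with one k x k kernel per channel.

    tokens:  H x W x C nested lists (a grid of C-dimensional tokens).
    kernels: C x k x k nested lists with k odd (hypothetical weights).
    Uses zero padding, so the output grid has the same H x W x C shape.
    """
    height, width, channels = len(tokens), len(tokens[0]), len(tokens[0][0])
    k = len(kernels[0])
    pad = k // 2
    out = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for c in range(channels):
                acc = 0.0
                # Weighted sum over the spatial neighbourhood of (y, x),
                # restricted to channel c (no cross-channel mixing).
                for dy in range(k):
                    for dx in range(k):
                        yy, xx = y + dy - pad, x + dx - pad
                        if 0 <= yy < height and 0 <= xx < width:
                            acc += kernels[c][dy][dx] * tokens[yy][xx][c]
                out[y][x][c] = acc
    return out


# Usage: a 3 x 3 grid of 2-channel tokens.
# Channel 0 holds the position index, channel 1 is constant 1.0.
grid = [[[float(y * 3 + x), 1.0] for x in range(3)] for y in range(3)]
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
box = [[1.0] * 3 for _ in range(3)]
mixed = depthwise_token_mixer(grid, [identity, box])
```

With the identity kernel, channel 0 passes through unchanged, while the box kernel on channel 1 sums whichever neighbours fall inside the grid. In a deep learning framework the same operation is usually expressed as a grouped 2D convolution with one group per channel.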

Benchmarks

No evaluation results yet.

Most implemented papers

WaveMix: A Resource-efficient Neural Network for Image Analysis

pranavphoenix/WaveMix • 28 May 2022

The whole architecture is a stack of self-similar and resolution-preserving WaveMix blocks, which allows architectural flexibility for various tasks and levels of resource availability.

Demystify Transformers & Convolutions in Modern Image Deep Networks

opengvlab/stm-evaluation • 10 Nov 2022

Our experiments on various tasks and an analysis of inductive bias show a significant performance boost due to advanced network-level and block-level designs, but performance differences persist among different STMs.

CARD: Semantic Segmentation with Efficient Class-Aware Regularized Decoder

edwardyehuang/CAR • 11 Jan 2023

Extensive experiments and ablation studies conducted on multiple benchmark datasets demonstrate that the proposed CAR can boost the accuracy of all baseline models by up to 2.23% mIoU with superior generalization ability.

UniNeXt: Exploring A Unified Architecture for Vision Recognition

jianlong-yuan/uninext • 26 Apr 2023

Interestingly, the ranking of these spatial token mixers also changes under our UniNeXt, suggesting that an excellent spatial token mixer may be stifled due to a suboptimal general architecture, which further shows the importance of the study on the general architecture of vision backbone.
