All six BiT models in this collection share the same training techniques and architecture components; only the depth/width variant differs:

| Field | Value |
|---|---|
| Training Techniques | SGD with Momentum, Weight Decay, Mixup |
| Architecture | 1x1 Convolution, Bottleneck Residual Block, Group Normalization, Weight Standardization, Convolution, Global Average Pooling, Residual Block, Residual Connection, ReLU, Max Pooling, Softmax |
| IDs | resnetv2_50x1_bitm, resnetv2_50x3_bitm, resnetv2_101x1_bitm, resnetv2_101x3_bitm, resnetv2_152x2_bitm, resnetv2_152x4_bitm |
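The Mixup entry among the training techniques blends random pairs of training examples and their labels. A minimal sketch of the idea (illustrative only, not the exact BiT training code; the `alpha` value here is an assumption):

```python
import torch

def mixup(x, y, alpha=0.2):
    """Blend random pairs of examples; return mixed inputs and both label sets.

    The training loss is then lam * loss(pred, y_a) + (1 - lam) * loss(pred, y_b).
    """
    # Mixing coefficient sampled from Beta(alpha, alpha), so lam is in (0, 1)
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    idx = torch.randperm(x.size(0))          # random pairing of examples in the batch
    x_mixed = lam * x + (1 - lam) * x[idx]   # convex combination of inputs
    return x_mixed, y, y[idx], lam
```
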
Big Transfer (BiT) is a pretraining recipe that pre-trains on a large supervised source dataset and then fine-tunes the weights on the target task. Models are trained on the JFT-300M dataset. The fine-tuned models in this collection are fine-tuned on ImageNet.
To load a pretrained model:

```python
import timm
m = timm.create_model('resnetv2_50x1_bitm', pretrained=True)
m.eval()
```

Replace the model name with the variant you want to use, e.g. `resnetv2_50x1_bitm`. You can find the IDs in the model summaries at the top of this page.
You can follow the timm recipe scripts for training a new model from scratch.
```bibtex
@misc{kolesnikov2020big,
  title={Big Transfer (BiT): General Visual Representation Learning},
  author={Alexander Kolesnikov and Lucas Beyer and Xiaohua Zhai and Joan Puigcerver and Jessica Yung and Sylvain Gelly and Neil Houlsby},
  year={2020},
  eprint={1912.11370},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
| Model | Top 1 Accuracy | Top 5 Accuracy |
|---|---|---|
| resnetv2_152x4_bitm | 84.95% | 97.45% |
| resnetv2_152x2_bitm | 84.4% | 97.43% |
| resnetv2_101x3_bitm | 84.38% | 97.37% |
| resnetv2_50x3_bitm | 83.75% | 97.12% |
| resnetv2_101x1_bitm | 82.21% | 96.47% |
| resnetv2_50x1_bitm | 80.19% | 95.63% |