Access code for baidu is `swin`.
ImageNet-1K and ImageNet-22K Pretrained Swin-V1 Models
ImageNet-1K and ImageNet-22K Pretrained Swin-V2 Models
Note:
- SwinV2-B* (SwinV2-L*) with input resolutions of 256x256 and 384x384 are both fine-tuned from the same pre-trained model, which uses a smaller input resolution of 192x192; see the loading sketch below.
- SwinV2-B* (384x384) achieves 78.08 acc@1 on ImageNet-1K-V2, while SwinV2-L* (384x384) achieves 78.31.
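For reference, a minimal sketch of the usual pattern for reusing a lower-resolution pre-trained checkpoint at a higher fine-tuning resolution: load it with `strict=False` and re-initialize the classification head. This is not the repo's own fine-tuning entry point, and the checkpoint layout assumed here (a `model` entry, a `head.` prefix) may differ from the released files.

```python
# Minimal sketch, not the repo's loader: reuse a 192x192 pre-trained checkpoint
# for fine-tuning at a larger input resolution. Key names ("model", "head.")
# are assumptions about the checkpoint layout.
import torch
from torch import nn


def load_backbone_for_finetune(model: nn.Module, ckpt_path: str) -> None:
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state_dict = ckpt.get("model", ckpt)                 # checkpoints often nest weights under "model"
    state_dict = {k: v for k, v in state_dict.items()
                  if not k.startswith("head.")}          # train the classifier head from scratch
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```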
ImageNet-1K Pretrained Swin MLP Models
Note: C24 means each head has 24 channels.
ImageNet-22K Pretrained Swin-MoE Models
| name | #experts | k | router | resolution | window | IN-22K acc@1 | IN-1K/ft acc@1 | IN-1K/5-shot acc@1 | 22K model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Swin-MoE-S | 1 (dense) | - | - | 192x192 | 8x8 | 35.5 | 83.5 | 70.3 | github/baidu/config |
| Swin-MoE-S | 8 | 1 | Linear | 192x192 | 8x8 | 36.8 | 84.5 | 75.2 | github/baidu/config |
| Swin-MoE-S | 16 | 1 | Linear | 192x192 | 8x8 | 37.6 | 84.9 | 76.5 | github/baidu/config |
| Swin-MoE-S | 32 | 1 | Linear | 192x192 | 8x8 | 37.4 | 84.7 | 75.9 | github/baidu/config |
| Swin-MoE-S | 32 | 1 | Cosine | 192x192 | 8x8 | 37.2 | 84.3 | 75.2 | github/baidu/config |
| Swin-MoE-S | 64 | 1 | Linear | 192x192 | 8x8 | 37.8 | 84.7 | 75.7 | - |
| Swin-MoE-S | 128 | 1 | Linear | 192x192 | 8x8 | 37.4 | 84.5 | 75.4 | - |
| Swin-MoE-B | 1 (dense) | - | - | 192x192 | 8x8 | 37.3 | 85.1 | 75.9 | config |
| Swin-MoE-B | 8 | 1 | Linear | 192x192 | 8x8 | 38.1 | 85.3 | 77.2 | config |
| Swin-MoE-B | 16 | 1 | Linear | 192x192 | 8x8 | 38.7 | 85.5 | 78.2 | config |
| Swin-MoE-B | 32 | 1 | Linear | 192x192 | 8x8 | 38.6 | 85.5 | 77.9 | config |
| Swin-MoE-B | 32 | 1 | Cosine | 192x192 | 8x8 | 38.5 | 85.3 | 77.3 | config |
| Swin-MoE-B | 32 | 2 | Linear | 192x192 | 8x8 | 38.6 | 85.5 | 78.7 | - |
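The `#experts`, `k`, and `router` columns describe the MoE gating: each token is scored against every expert, by either a linear layer or a cosine-similarity router, and dispatched to its top-k experts. Below is a minimal, illustrative sketch of that idea only; it is not the Swin-MoE / Tutel implementation, and it omits capacity limits, load-balancing losses, and distributed dispatch.

```python
# Illustrative top-k MoE layer: a gate ("Linear" or "Cosine") scores tokens
# against experts, and each token is processed by its top-k experts with the
# gate weights renormalized over the selected experts.
import torch
from torch import nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int, k: int = 1, router: str = "linear"):
        super().__init__()
        self.k = k
        self.router = router
        self.gate = nn.Linear(dim, num_experts, bias=False)            # "Linear" router
        self.expert_emb = nn.Parameter(torch.randn(num_experts, dim))  # used by the "Cosine" router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        if self.router == "linear":
            logits = self.gate(x)                                           # (tokens, experts)
        else:  # cosine similarity between normalized tokens and expert embeddings
            logits = F.normalize(x, dim=-1) @ F.normalize(self.expert_emb, dim=-1).t()
        weights, idx = logits.softmax(dim=-1).topk(self.k, dim=-1)          # top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)               # renormalize over the k picked
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```

In the table above, k=1 means each token is routed to a single expert, and the "1 (dense)" rows correspond to a standard (non-MoE) MLP baseline.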
SimMIM Pretrained Swin-V2 Models
- model size counts only the backbone weights and does not include the weights in the decoders / classification heads (see the counting sketch below).
- the batch size of all models is set to 2048.
- the validation loss is computed on the ImageNet-1K validation set.
- fine-tuned acc@1 is the top-1 accuracy on the ImageNet-1K validation set after fine-tuning.
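A minimal sketch of how the model size column can be reproduced: count only backbone parameters and skip decoder / classification-head weights. The `decoder.` / `head.` name prefixes are assumptions about the parameter naming and may not match the released checkpoints exactly.

```python
# Count backbone-only parameters; the "decoder." / "head." prefixes are assumed names.
from torch import nn


def backbone_param_count(model: nn.Module) -> str:
    n = sum(p.numel() for name, p in model.named_parameters()
            if not name.startswith(("decoder.", "head.")))
    return f"{n / 1e6:.0f}M"
```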
| name | model size | pre-train dataset | pre-train iterations | validation loss | fine-tuned acc@1 | pre-trained model | fine-tuned model |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| SwinV2-Small | 49M | ImageNet-1K 10% | 125k | 0.4820 | 82.69 | azure | azure |
| SwinV2-Small | 49M | ImageNet-1K 10% | 250k | 0.4961 | 83.11 | azure | azure |
| SwinV2-Small | 49M | ImageNet-1K 10% | 500k | 0.5115 | 83.17 | azure | azure |
| SwinV2-Small | 49M | ImageNet-1K 20% | 125k | 0.4751 | 83.05 | azure | azure |
| SwinV2-Small | 49M | ImageNet-1K 20% | 250k | 0.4722 | 83.56 | azure | azure |
| SwinV2-Small | 49M | ImageNet-1K 20% | 500k | 0.4734 | 83.75 | azure | azure |
| SwinV2-Small | 49M | ImageNet-1K 50% | 125k | 0.4732 | 83.04 | azure | azure |
| SwinV2-Small | 49M | ImageNet-1K 50% | 250k | 0.4681 | 83.67 | azure | azure |
| SwinV2-Small | 49M | ImageNet-1K 50% | 500k | 0.4646 | 83.96 | azure | azure |
| SwinV2-Small | 49M | ImageNet-1K | 125k | 0.4728 | 82.92 | azure | azure |
| SwinV2-Small | 49M | ImageNet-1K | 250k | 0.4674 | 83.66 | azure | azure |
| SwinV2-Small | 49M | ImageNet-1K | 500k | 0.4641 | 84.08 | azure | azure |
| SwinV2-Base | 87M | ImageNet-1K 10% | 125k | 0.4822 | 83.33 | azure | azure |
| SwinV2-Base | 87M | ImageNet-1K 10% | 250k | 0.4997 | 83.60 | azure | azure |
| SwinV2-Base | 87M | ImageNet-1K 10% | 500k | 0.5112 | 83.41 | azure | azure |
| SwinV2-Base | 87M | ImageNet-1K 20% | 125k | 0.4703 | 83.86 | azure | azure |
| SwinV2-Base | 87M | ImageNet-1K 20% | 250k | 0.4679 | 84.37 | azure | azure |
| SwinV2-Base | 87M | ImageNet-1K 20% | 500k | 0.4711 | 84.61 | azure | azure |
| SwinV2-Base | 87M | ImageNet-1K 50% | 125k | 0.4683 | 84.04 | azure | azure |
| SwinV2-Base | 87M | ImageNet-1K 50% | 250k | 0.4633 | 84.57 | azure | azure |
| SwinV2-Base | 87M | ImageNet-1K 50% | 500k | 0.4598 | 84.95 | azure | azure |
| SwinV2-Base | 87M | ImageNet-1K | 125k | 0.4680 | 84.13 | azure | azure |
| SwinV2-Base | 87M | ImageNet-1K | 250k | 0.4626 | 84.65 | azure | azure |
| SwinV2-Base | 87M | ImageNet-1K | 500k | 0.4588 | 85.04 | azure | azure |
| SwinV2-Base | 87M | ImageNet-22K | 125k | 0.4695 | 84.11 | azure | azure |
| SwinV2-Base | 87M | ImageNet-22K | 250k | 0.4649 | 84.57 | azure | azure |
| SwinV2-Base | 87M | ImageNet-22K | 500k | 0.4614 | 85.11 | azure | azure |
| SwinV2-Large | 195M | ImageNet-1K 10% | 125k | 0.4995 | 83.69 | azure | azure |
| SwinV2-Large | 195M | ImageNet-1K 10% | 250k | 0.5140 | 83.66 | azure | azure |
| SwinV2-Large | 195M | ImageNet-1K 10% | 500k | 0.5150 | 83.50 | azure | azure |
| SwinV2-Large | 195M | ImageNet-1K 20% | 125k | 0.4675 | 84.38 | azure | azure |
| SwinV2-Large | 195M | ImageNet-1K 20% | 250k | 0.4746 | 84.71 | azure | azure |
| SwinV2-Large | 195M | ImageNet-1K 20% | 500k | 0.4960 | 84.59 | azure | azure |
| SwinV2-Large | 195M | ImageNet-1K 50% | 125k | 0.4622 | 84.78 | azure | azure |
| SwinV2-Large | 195M | ImageNet-1K 50% | 250k | 0.4566 | 85.38 | azure | azure |
| SwinV2-Large | 195M | ImageNet-1K 50% | 500k | 0.4530 | 85.80 | azure | azure |
| SwinV2-Large | 195M | ImageNet-1K | 125k | 0.4611 | 84.98 | azure | azure |
| SwinV2-Large | 195M | ImageNet-1K | 250k | 0.4552 | 85.45 | azure | azure |
| SwinV2-Large | 195M | ImageNet-1K | 500k | 0.4507 | 85.91 | azure | azure |
| SwinV2-Large | 195M | ImageNet-22K | 125k | 0.4649 | 84.61 | azure | azure |
| SwinV2-Large | 195M | ImageNet-22K | 250k | 0.4586 | 85.39 | azure | azure |
| SwinV2-Large | 195M | ImageNet-22K | 500k | 0.4536 | 85.81 | azure | azure |
| SwinV2-Huge | 655M | ImageNet-1K 20% | 125k | 0.4789 | 84.35 | azure | azure |
| SwinV2-Huge | 655M | ImageNet-1K 20% | 250k | 0.5038 | 84.16 | azure | azure |
| SwinV2-Huge | 655M | ImageNet-1K 20% | 500k | 0.5071 | 83.44 | azure | azure |
| SwinV2-Huge | 655M | ImageNet-1K 50% | 125k | 0.4549 | 85.09 | azure | azure |
| SwinV2-Huge | 655M | ImageNet-1K 50% | 250k | 0.4511 | 85.64 | azure | azure |
| SwinV2-Huge | 655M | ImageNet-1K 50% | 500k | 0.4559 | 85.69 | azure | azure |
| SwinV2-Huge | 655M | ImageNet-1K | 125k | 0.4531 | 85.23 | azure | azure |
| SwinV2-Huge | 655M | ImageNet-1K | 250k | 0.4464 | 85.90 | azure | azure |
| SwinV2-Huge | 655M | ImageNet-1K | 500k | 0.4416 | 86.34 | azure | azure |
| SwinV2-Huge | 655M | ImageNet-22K | 125k | 0.4564 | 85.14 | azure | azure |
| SwinV2-Huge | 655M | ImageNet-22K | 250k | 0.4499 | 85.86 | azure | azure |
| SwinV2-Huge | 655M | ImageNet-22K | 500k | 0.4444 | 86.27 | azure | azure |
| SwinV2-giant | 1.06B | ImageNet-1K 50% | 125k | 0.4534 | 85.44 | azure | azure |
| SwinV2-giant | 1.06B | ImageNet-1K 50% | 250k | 0.4515 | 85.76 | azure | azure |
| SwinV2-giant | 1.06B | ImageNet-1K 50% | 500k | 0.4719 | 85.51 | azure | azure |
| SwinV2-giant | 1.06B | ImageNet-1K | 125k | 0.4513 | 85.57 | azure | azure |
| SwinV2-giant | 1.06B | ImageNet-1K | 250k | 0.4442 | 86.12 | azure | azure |
| SwinV2-giant | 1.06B | ImageNet-1K | 500k | 0.4395 | 86.46 | azure | azure |
| SwinV2-giant | 1.06B | ImageNet-22K | 125k | 0.4544 | 85.39 | azure | azure |
| SwinV2-giant | 1.06B | ImageNet-22K | 250k | 0.4475 | 85.96 | azure | azure |
| SwinV2-giant | 1.06B | ImageNet-22K | 500k | 0.4416 | 86.53 | azure | azure |
SimMIM Pretrained Swin-V1 Models
ImageNet-1K Pre-trained and Fine-tuned Models