Table 2 Comparison of LadderMoE with representative PEFT and MoE-based methods
| Method | Category | Inserted modules | Inserted location | Routing | Trainable parameters | FLOPs (G) | Training memory usage (GB) | Training time |
|---|---|---|---|---|---|---|---|---|
| Switch Transformer45 | MoE | ✗ | ✗ | ✓ | 7B+ | – | – | – |
| Ours (LoRA38) | PEFT | Low-rank linear | Attention output and MLP linears | ✗ | 75.49 M | 103.2 | 11.9 | 34 h |
| Ours (CLIP-adapter37) | PEFT | Linear layer | Top of the image encoder | ✗ | 0.59 M | 52.2 | 3.5 | 8 h |
| Ours (LadderMoE) | Hybrid | Attention layer | Side of the image encoder | ✓ | 250 M | 151.9 | 39.4 | 26 h |
| Full fine-tuning | – | ✗ | ✗ | ✗ | 427 M | 52.1 | 15.8 | 28 h |
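
To make the "Inserted modules", "Inserted location", and "Routing" columns concrete, the sketch below illustrates two of the compared insertion strategies: a LoRA-style low-rank update added to a frozen linear layer, and a ladder-side block in which a router selects among attention experts running beside a frozen image encoder. This is a minimal sketch, not the authors' implementation; the class names, hidden dimension, expert count, and top-1 routing rule are assumptions made for illustration.

```python
# Minimal sketch (not the authors' released code) of two insertion strategies
# compared above: a LoRA-style low-rank insert into a frozen linear layer, and
# a ladder-side MoE block whose router picks one of several attention
# "experts" running beside (not inside) the frozen image encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRALinear(nn.Module):
    """LoRA: add a trainable low-rank update to a frozen linear layer."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # backbone weights stay frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # start as a zero update

    def forward(self, x):
        return self.base(x) + self.lora_b(self.lora_a(x))


class LadderSideMoEBlock(nn.Module):
    """One ladder-side block: a router selects among attention experts that
    process backbone features on the side of the frozen encoder."""

    def __init__(self, dim: int = 768, num_experts: int = 4, heads: int = 8):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            [nn.MultiheadAttention(dim, heads, batch_first=True)
             for _ in range(num_experts)]
        )

    def forward(self, side, backbone_feat):
        # Top-1 routing on the mean-pooled backbone feature (an assumption).
        weights = F.softmax(self.router(backbone_feat.mean(dim=1)), dim=-1)
        top1 = weights.argmax(dim=-1)                       # (B,)
        out = torch.zeros_like(side)
        for e, expert in enumerate(self.experts):
            mask = top1 == e
            if mask.any():
                attn, _ = expert(side[mask], backbone_feat[mask],
                                 backbone_feat[mask])
                out[mask] = weights[mask, e].view(-1, 1, 1) * attn
        return side + out                                   # residual merge


# Relating to the "Trainable parameters" column: only the side blocks (and
# the LoRA factors) are updated; the frozen backbone contributes nothing.
block = LadderSideMoEBlock()
n_trainable = sum(p.numel() for p in block.parameters() if p.requires_grad)
print(f"{n_trainable / 1e6:.2f} M trainable parameters per side block")
```

Under these assumptions, the parameter count printed above covers a single side block only; the table's figures refer to the complete methods as configured by the authors.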