Extended Data Table 5 Parameters used for the scaling laws
From: Efficient scaling of large language models with mixture of experts and 3D analog in-memory computing
| Model | a | α | b | β | g | γ | c |
|---|---|---|---|---|---|---|---|
| MoE | 18.1 | 0.115 | 30.8 | 0.147 | 2.1 | 0.58 | 0.47 |
| Dense | 16.3 | 0.126 | 26.7 | 0.127 | - | - | 0.47 |
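
The table gives only fitted constants, so a short sketch may help show how they could be evaluated. The functional form below is an assumption, not something stated in this table: it follows the form commonly used for fine-grained MoE scaling laws, L(N, D, G) = c + (g / G^γ + a) / N^α + b / D^β, with the granularity term dropped for the dense model. The names `ScalingParams`, `predicted_loss`, `n_params`, `n_tokens`, and `granularity` are hypothetical, and the exact definitions of N, D, and G should be taken from the paper's Methods.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScalingParams:
    """Fitted constants from one row of the table."""
    a: float
    alpha: float
    b: float
    beta: float
    c: float
    g: Optional[float] = None      # "-" in the table (dense model)
    gamma: Optional[float] = None  # "-" in the table (dense model)


# Values copied from the table rows.
MOE = ScalingParams(a=18.1, alpha=0.115, b=30.8, beta=0.147, c=0.47,
                    g=2.1, gamma=0.58)
DENSE = ScalingParams(a=16.3, alpha=0.126, b=26.7, beta=0.127, c=0.47)


def predicted_loss(p: ScalingParams, n_params: float, n_tokens: float,
                   granularity: float = 1.0) -> float:
    """Evaluate the ASSUMED form c + (g / G**gamma + a) / N**alpha + b / D**beta.

    The granularity term is skipped when g and gamma are absent (dense row).
    """
    model_coeff = p.a
    if p.g is not None and p.gamma is not None:
        model_coeff += p.g / granularity ** p.gamma
    return p.c + model_coeff / n_params ** p.alpha + p.b / n_tokens ** p.beta


if __name__ == "__main__":
    # Illustrative inputs only; these are not figures from the paper.
    n, d = 1e9, 2e10
    print("dense:", predicted_loss(DENSE, n, d))
    print("MoE, G=8:", predicted_loss(MOE, n, d, granularity=8.0))
```

Under this assumed form, c acts as the irreducible loss floor shared by both rows, while α and β control how quickly the loss falls with model size and training tokens.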