Extended Data Table 5 Parameters used for the scaling laws

From: Efficient scaling of large language models with mixture of experts and 3D analog in-memory computing

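| Model | a | α | b | β | g | γ | c |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MoE | 18.1 | 0.115 | 30.8 | 0.147 | 2.1 | 0.58 | 0.47 |
| Dense | 16.3 | 0.126 | 26.7 | 0.127 | - | - | 0.47 |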

  1. Fitted parameters used for the scaling law study presented in Fig. 5.