Fig. 4: Scaling laws of Emu3 across multimodal tasks. | Nature

From: Multimodal learning with next-token prediction for large multimodal models

a, Validation loss surfaces for three tasks, text-to-image (T2I), image-to-text (I2T) and text-to-video (T2V), shown as functions of model size and number of training tokens. All three tasks demonstrated clear power-law behaviour with respect to scale. b, Predicted versus observed validation loss for the 7B Emu3 model on the T2I, I2T and T2V tasks, with predictions taken from the fitted scaling laws. The predictions were closely aligned with the measured losses, which validated the extrapolation capability of the fitted scaling relationships. MAE, mean absolute error; MAPE, mean absolute percentage error.
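
As a rough illustration of the quantities named in this caption, the sketch below fits a power law to a handful of loss values and scores a held-out prediction with MAE and MAPE. The caption does not state the functional form of the fitted scaling laws, so the single-variable form loss ≈ a · x^(−alpha) is an assumption made here for brevity (the surfaces in panel a depend on both model size and training tokens). The function names and all numbers are hypothetical and are not taken from the paper.

```python
import math


def fit_power_law(x, loss):
    """Fit loss ~ a * x**(-alpha) by ordinary least squares in log-log space.

    Assumed form for illustration only; returns (a, alpha).
    """
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in loss]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mape(observed, predicted):
    """Mean absolute percentage error, in percent."""
    ratios = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(observed)


# Synthetic, made-up data: an exact power law with a = 50 and alpha = 0.1.
tokens = [1e9, 2e9, 4e9, 8e9, 1.6e10]
loss = [50.0 * t ** -0.1 for t in tokens]

# Fit on the four smaller budgets, then extrapolate to the largest one.
a, alpha = fit_power_law(tokens[:4], loss[:4])
predicted = [a * t ** (-alpha) for t in tokens[4:]]

print(f"alpha = {alpha:.3f}")
print(f"MAE   = {mae(loss[4:], predicted):.2e}")
print(f"MAPE  = {mape(loss[4:], predicted):.2e} %")
```

Fitting in log-log space turns the power law into a straight line, which keeps the sketch dependency-free. A fit over both model size and token count, or one with an irreducible-loss term, would need a nonlinear optimizer instead.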
