Extended Data Fig. 2: Structural diagram of human-knowledge encoding in the pretraining and application phases of EyeFM.
From: An eyecare foundation model for clinical assistance: a randomized controlled trial

a) Diagram of the pretraining process for EyeFM. The image encoder was first pretrained on images from five modalities, followed by joint vision-language pretraining. The image module consists of one shared encoder and five decoders. The encoder comprises 24 Transformer blocks, and each decoder comprises two Transformer blocks. The linear projection layer of the image module is implemented as a single convolutional layer. In the vision-language module, the projection is a single linear layer that connects the image encoder to the language module. The language module is based on the LLaMA 2 architecture with 7 billion parameters. b) Diagram of the human-in-the-loop process for EyeFM. The human-in-the-loop process used direct preference optimization (DPO) and federated learning for distributed knowledge evolution.
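
The components named in this caption can be summarized in a minimal Python sketch. This is not the authors' implementation: the class names ImageModuleSpec and EyeFMSpec, the function dpo_pair_loss, its log-probability arguments, and the default beta of 0.1 are all hypothetical and chosen for illustration. Only the block counts, the number of modalities, the projection types, and the language model size come from the caption. The loss shown is the standard per-pair DPO objective, which may differ in detail from what was used for EyeFM.

```python
import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ImageModuleSpec:
    """Panel a, image module (names are hypothetical; numbers are from the caption)."""
    encoder_blocks: int = 24        # one shared encoder with 24 Transformer blocks
    decoder_blocks: int = 2         # each decoder has two Transformer blocks
    num_modalities: int = 5         # five image modalities, one decoder each
    patch_projection: str = "conv"  # projection is a single convolutional layer

    def total_transformer_blocks(self) -> int:
        # Shared encoder blocks plus the blocks of all modality decoders.
        return self.encoder_blocks + self.num_modalities * self.decoder_blocks


@dataclass(frozen=True)
class EyeFMSpec:
    """Panel a, full model: image module, connector, language module."""
    image: ImageModuleSpec = field(default_factory=ImageModuleSpec)
    connector: str = "linear"           # single linear layer, image encoder -> language module
    language_model: str = "LLaMA 2"     # architecture named in the caption
    language_params_billion: int = 7    # 7 billion parameters


def dpo_pair_loss(policy_chosen: float, policy_rejected: float,
                  ref_chosen: float, ref_rejected: float,
                  beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (panel b).

    Arguments are sequence log-probabilities under the trained policy and a
    frozen reference model. beta = 0.1 is an assumed value, not from the source.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = -beta * margin
    # -log(sigmoid(beta * margin)) == softplus(x), computed in a stable form.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


spec = EyeFMSpec()
print(spec.image.total_transformer_blocks())
print(dpo_pair_loss(-10.0, -12.0, -11.0, -11.0))
```

With these numbers the image module holds 24 + 5 × 2 = 34 Transformer blocks in total. The loss falls below log 2 once the policy prefers the clinician-chosen response more strongly than the reference model does.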