Extended Data Fig. 4: Schematics of the seven downstream tasks used to evaluate OVFM.

The four spatiotemporal-level tasks are surgical step recognition, tool presence recognition, complication detection, and surgical skill assessment; for these, OVFM is connected to a linear layer for classification. The three spatial-level tasks are surgical scene segmentation, limbus boundary segmentation, and nucleus localization; for these, OVFM is connected to a decoder.
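
The two head types described in this legend can be illustrated with a minimal shape-level sketch. It is not the authors' implementation: the encoder is replaced by a random-feature stub, and all sizes, the pooling step, the per-location projection used as a "decoder", and every function name are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
T, H, W, D = 8, 14, 14, 64      # frames, feature-map height/width, embedding dim
NUM_CLASSES = 5                 # e.g. number of surgical steps (assumed)
NUM_MASK_CLASSES = 3            # e.g. number of segmentation classes (assumed)


def encoder_stub(num_frames):
    """Stand-in for the pretrained backbone: random features of shape (T, H, W, D)."""
    return rng.standard_normal((num_frames, H, W, D))


def linear_head(features, weight, bias):
    """Spatiotemporal-level head: pool over time and space, then one linear layer."""
    pooled = features.mean(axis=(0, 1, 2))           # (D,)
    return pooled @ weight + bias                     # (NUM_CLASSES,)


def decoder_head(frame_features, weight, bias, scale=4):
    """Spatial-level head (assumed design): per-location projection, then
    nearest-neighbour upsampling to a denser label map."""
    logits = frame_features @ weight + bias           # (H, W, NUM_MASK_CLASSES)
    logits = logits.repeat(scale, axis=0).repeat(scale, axis=1)
    return logits.argmax(axis=-1)                     # (H * scale, W * scale)


features = encoder_stub(T)
clip_logits = linear_head(
    features, rng.standard_normal((D, NUM_CLASSES)), np.zeros(NUM_CLASSES)
)
mask = decoder_head(
    features[0], rng.standard_normal((D, NUM_MASK_CLASSES)), np.zeros(NUM_MASK_CLASSES)
)
print(clip_logits.shape, mask.shape)
```

In this sketch the classification head yields one logit vector per clip, while the decoder head yields one label per spatial location of a single frame. That difference in output shape is what separates the spatiotemporal-level tasks from the spatial-level ones.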