Abstract
The performance and scope of computer vision methods applied to ophthalmic images are highly dependent on the availability of labelled training data. While there are a number of colour fundus photography datasets, FOVEA is to the best of our knowledge the first dataset that matches high-quality annotations in the intraoperative domain with those in the preoperative one. It comprises data from 40 patients collected at Moorfields Eye Hospital (London, UK) and includes preoperative and intraoperative retinal vessel and optic disc annotations performed by two independent clinical research fellows, as well as short video clips showing the retinal fundus through biomicroscopy imaging in the intraoperative setting. The annotations were validated and converted into binary segmentation masks, with the processing code available on GitHub. We expect this data to be useful for deep learning applications aimed at supporting surgeons during vitreoretinal surgery procedures, e.g. by localising points of interest or registering additional imaging modalities.
Background
Preoperative as well as intraoperative imaging in ophthalmology has long been an integral part of the process of identifying, localising, and monitoring disease in advance of any surgical treatment, as well as for orientation during procedures1. Traditionally, assessment of preoperative images relied exclusively on clinicians and grading centres. The emergence of capable deep learning algorithms in recent years, however, has shifted this focus towards the efficiency and accuracy of modern computer vision methods. These have the potential to alleviate workload, improve surgical outcomes, increase patient satisfaction, and leverage data from a multiplicity of sources for tasks intractable for clinical professionals2.
This development both feeds off the availability of clinical data from a variety of sources and in turn necessitates its further collection, as the benefit of large manually labelled datasets to the performance of deep learning models has been well established. However, current developments such as extensive and powerful foundation models – a term coined by Stanford University Human-Centered Artificial Intelligence for “models [...] trained on broad data [...] that can be adapted to a wide range of downstream tasks”3 – are making the use of few-shot methods more viable4. Other current approaches include the use of contrastive learning, or even extracting knowledge inherent in a diffusion model trained on unlabelled data5. While not eliminating the need for manual labelling in the target domain, this does underline the importance of high-quality annotations over sheer quantity.
There exist a number of public datasets such as STARE6, DRIVE7, CHASE_DB18, and more recently FIVES9 for retinal vessel segmentation, or ORIGA10, G102011, and PAPILA12 for optic disc segmentation. However, all of these are exclusively based on preoperative colour fundus photography – and the availability of such data has facilitated a large and varied body of work e.g. on preoperative retinal vessel segmentation13 or tasks like the detection and grading of spontaneous retinal venous pulsations14.
Unfortunately, in the much more variable and therefore challenging intraoperative domain much less has been achieved. In tasks such as intraoperatively building a vessel map15,16, noise in the vessel detection obtained from a more traditional exploratory algorithm is explicitly noted as a limiting factor. In-domain labels therefore have great potential to help intraoperative localisation and segmentation by unlocking the use of deep learning methods and serving as a validation benchmark.
It is in this context that we present FOVEA, a dataset for retinal Fundus images with Optic disc and retinal VEssel Annotations, consisting of high-resolution preoperative fundus photography and intraoperative microscopy frames from 40 patients, each annotated by two independent clinical research fellows at Moorfields Eye Hospital (London, UK) and then processed and validated to ensure consistency and quality. To our knowledge, FOVEA is the first dataset to include high-quality annotations of combined preoperative and intraoperative retinal fundus images.
The provision of such labels with the matching annotations of preoperative images from the same patients also provides an opportunity to register retinal features in the preoperative domain with features detected during surgery. This can further improve orientation during procedures, especially relevant in the context of novel robotic tools17 which may partially rely on computer vision for navigation. Additionally, this could allow augmented reality to move past applications in training18 to providing clinicians with an enhanced surgical environment during actual procedures offering up information inherent in additional imaging modalities such as optical coherence tomography.
Methods
The FOVEA dataset19 has been carefully processed to obtain a high-quality, validated benchmark. The overall method can be divided into five steps which will be explained in full detail in the following sections. Figure 1 shows an overview of the process.
Overview of the method. In a first step, data in the preoperative (blue path) and intraoperative (red path) domain was acquired from consenting adult patients at Moorfields Eye Hospital (London, UK). Then, data not satisfying quality conditions or belonging to patients with certain conditions was removed, resulting in a total of 40 patients in the final dataset. In the intraoperative case, a specific frame was selected from the 30 s clips obtained from the beginning of vitreoretinal surgeries. Annotation was performed according to a set protocol, by two independently acting clinical experts. The raw annotations were then processed to obtain clean binary masks. Finally, the data was fully anonymised and provided with a new sequential identifier from 1 to 40, resulting in a final FOVEA dataset containing a total of 601 files.
Acquisition
All data was collected at Moorfields Eye Hospital between 2017 and 2021, from adult patients undergoing pars plana vitrectomy in the course of standard of care treatment. A number of different conditions were present in this initial patient population, most commonly epiretinal membrane and macular hole, but also optic disc pit maculopathy, posterior intraocular lens dislocation, and different types of floaters.
Informed consent for the use of this data in scientific journals as well as its publication was obtained from the patients, with the national data opt-out service being applied. All data was acquired in accordance with the 1964 Helsinki declaration and its later amendments or comparable ethical standards, and the study was approved by the Medical Ethics Committee of Moorfields Eye Hospital (IRAS #261512). The dataset is also fully anonymised. Each record was assigned a sequential identifier – FOVEA001 to FOVEA040 – not linked to the patient’s hospital number. It has been verified that no data contains any burned-in patient identifier, or identifying metadata.
The colour fundus photography (CFP) images were acquired using a Topcon 3D OCT-1000 fundus camera after pupillary pharmacological dilation with 1.0% tropicamide and 2.5% phenylephrine during the preoperative review. Intraoperative footage was acquired using a Zeiss Lumera 700 ophthalmic microscope with an integrated RESIGHT fundus viewing system and a wide-angle aspheric, achromatic 128D lens. Once switched on, the fundus viewing system inverted the image on the X and Y axes and deactivated the microscope illumination system. For this reason, videos were recorded with a fibre optic endoilluminator with a xenon straight light source.
Selection
To ensure the quality of the dataset, the imaging data collected at Moorfields Eye Hospital was pared down to 40 records by removing patients whose preoperative or intraoperative fundus images were not in line with the typical image quality, in terms of sharpness and illumination. Additionally, patients with pathologies rendering the fundus images unsuitable for annotation or non-representative for the larger patient population were also removed. Such pathologies include mainly choroideremia, with its characteristic absence of pigmentation and clearly visible choroidal vessels20, and also myopic foveoschisis, similarly altering the appearance of the fundus due to thinning of the retinal pigment epithelium and the separation of retinal layers21.
For the intraoperative vessel annotation, a frame had to be selected from the ophthalmic microscope recordings. To this end, the first 750 frames – 30 seconds of video at a frame rate of 25 Hz, included in this dataset – were clipped from the start of the procedure, showing an exploration of the retinal fundus prior to the planned surgical intervention itself. Then, the most suitable frame was selected by visual inspection, with the intent being to maximise the visible retinal fundus area as well as the vessel tree, while minimising any unwanted phenomena such as motion blur, tools or their shadows, and uneven lighting.
Annotation
Two clinical research fellows independently annotated the optic disc and the retinal vessels in the preoperative as well as the intraoperative retinal fundus image. They both followed a simple annotation protocol:
- Optic disc annotation: The optic disc was identified as the round or ovoid structure nasal to the macula, from which the retinal vessels emerge. The entire surface of the optic disc was annotated as a contiguous segment, starting by delineation of the edges but disregarding any peripapillary features, such as peripapillary atrophy or scleral crescent.
- Retinal vessel annotation: The standard for retinal vessel annotation was to begin at the vessel emergence from the optic nerve, following the vascular path continuously along the arcades including bifurcations into secondary and tertiary branches. Isolated vessels were excluded unless the discontinuity was clearly caused by the image field of view. The entire calibre of each vessel was annotated following its natural curvature. If there was ambiguity about whether a vessel derived from the retinal or choroidal plexus, it was disregarded.
- Annotation sessions were not to last more than 1 continuous hour, with at least 30 minutes rest afterwards. The maximum duration of annotation per day was 4 hours.
- Once an annotation had been completed, it was saved and not reviewed again or edited by the annotator.
In order to preserve the independence of their respective work, no communication on specific annotations and no final review or consensus processing took place between the annotators.
A small pilot study was performed to ascertain the best-suited image presentation approach for manual annotations. This trial saw both clinicians annotate a number of frames twice for three image options: full RGB, grayscale, and green channel only. Based on intra-annotator concordance and qualitative feedback, it was decided to move ahead with the use of the green colour image channel. This matches the expectation that red-free images would aid in the task of highlighting retinal but not choroidal vessels, due to green light being absorbed by blood but reflected by the retinal pigment epithelium1.
Both optic disc and vessel annotations were performed on an iPad Pro, using a pressure-sensitive Apple Pencil (2nd gen) and Procreate, a raster graphics editor. This process took an average of one hour per clinician for each of the 80 images annotated.
Processing
The annotations were exported from Procreate as psd files and subsequently converted to png files. Each is a grayscale image representing drawn annotations of the optic disc and retinal vessels. However, these are not binary masks: they contain smooth vessel outlines with a gradation from 0 to 1. This is due to the annotation method and represents the most natural-looking segmentation of the vessels, matching the more gradual vessel edges found in most images. This ‘raw’ annotation is included in the dataset along with the green channel image used for the annotation, but it has two issues:
- A visual inspection revealed a small number of areas of annotated pixels seemingly uncorrelated to any visible vessel structure. These are most likely the result of accidental pen touches, and should be removed in the interest of a higher-quality annotation mask.
- For most applications, such as deep learning segmentation models, a binary mask is required.
The latter can easily be achieved by thresholding the raw annotation, the most balanced choice of threshold value being 0.5. The former is addressed by finding contiguous regions of annotation in the binary mask, establishing their respective pixel counts, and then deleting all with a surface area of less than 20 px (empirically set). However, it was found that in some images these two steps also led to the unwanted deletion of small vessel branches. This is due to the light touch required while annotating very thin vessels with the pressure-sensitive pencil, resulting in parts of the annotation falling below the binarisation threshold value. These pixels are eliminated from the annotation, causing gaps in the vessel tree; the subsequent removal of small contiguous regions, originally aimed at eliminating accidental pencil touches, then deletes such fragmented branches entirely.
Rather than manually fine-tuning the threshold values to find the right trade-off, we opted for a solution that does not introduce further bias during data processing: we apply a pixelwise OR operation to the binary mask as derived above and the skeletonised binary mask obtained by using a threshold value of 0.1:

$$m={\mathrm{bin}}_{0.5}(a)\vee \mathrm{skel}\left({\mathrm{bin}}_{0.1}(a)\right)$$

where:

a = raw annotation with values from 0 to 1

bin_t(⋅) = binarisation operation with threshold t, outputting 0 or 1

skel(⋅) = skeletonisation operation running on a binary mask
Combining the binarised mask with the skeleton closes any new gaps in the vessel tree, and using the binarisation threshold of 0.1 prevents very light accidental touches close to thin vessels from being taken into account during the skeletonisation process. The ultimate result is a contiguous binary mask free of annotation mistakes such as accidental pencil touches.
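A minimal sketch of this processing step using scikit-image, assuming the raw annotation is loaded as a float array in [0, 1]; the function name is illustrative and not part of the released code:

```python
import numpy as np
from skimage.morphology import skeletonize, remove_small_objects

def clean_mask(raw: np.ndarray) -> np.ndarray:
    """Convert a raw [0, 1] annotation into a clean binary vessel mask."""
    binary = raw >= 0.5                    # balanced binarisation threshold
    # Skeletonise a more permissive binarisation (threshold 0.1) so that
    # faintly drawn thin vessels survive as one-pixel-wide centrelines.
    skeleton = skeletonize(raw >= 0.1)
    # The pixelwise OR closes gaps the 0.5 threshold opened in thin vessels.
    combined = binary | skeleton
    # Finally, drop isolated specks (accidental pencil touches) below 20 px.
    return remove_small_objects(combined, min_size=20)
```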
Data Records
The FOVEA dataset can be downloaded from figshare19. Inside the FOVEA.zip archive there are 601 files: 560 images, 40 video clips, and one json file. More specifically, FOVEA comprises:
- 40 preoperative RGB colour fundus images
- 40 intraoperative RGB colour video clips
- 40 intraoperative RGB colour fundus images, one frame selected from each video clip
- 160 raw annotations, one for each of the two annotators, for each of the 80 colour fundus images in total. Each contains, in the order of the RGB channels, the vessel annotation, the optic disc annotation, and the green channel of the image used for annotation
- 320 binary annotation masks, one for the vessel and one for the optic disc segmentation in each raw annotation
- A metadata file containing information about the video frame rate and resolution, the colour fundus image dimensions, the main condition present in each dataset record, and the frame index (zero-indexed) at which the intraoperative colour fundus image was selected from the respective video clip
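As an illustration, the metadata can be used to re-extract the annotated frame from an intraoperative clip with OpenCV. In this sketch the metadata file name, its keys, and the video file name are assumptions to be checked against the archive contents:

```python
import json
import cv2

# File and key names are hypothetical -- inspect the json file in FOVEA.zip.
with open("metadata.json") as f:
    meta = json.load(f)
frame_idx = meta["FOVEA012"]["frame_index"]  # zero-indexed, per the metadata description

cap = cv2.VideoCapture("FOVEA12-i-vid.mp4")  # intraoperative clip of record 12
cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)  # seek to the annotated frame
ok, frame = cap.read()
assert ok, "frame index outside the 750-frame clip"
cv2.imwrite("FOVEA12-i-img.png", frame)
cap.release()
```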
File names conform to the following scheme: FOVEA[1–40]-[p|i]-[img|vid|raw|ve|od]-[1|2], with the four choices corresponding to:

- the sequential identifier: 1 through 40
- the domain: preoperative or intraoperative
- the type: image, video (only intraoperative), raw annotation, vessel map, optic disc map
- the annotator: 1 or 2
Table 1 contains all relevant technical information for each file type, and Fig. 2 shows an example of a partial data record, with corresponding file names.
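A small helper to enumerate all files belonging to one record might look as follows; the glob pattern assumes the hyphen-separated scheme above, so verify the separators and any zero-padding against the archive before use:

```python
from pathlib import Path

def record_files(root: Path, record_id: int) -> list[Path]:
    """List all files of one FOVEA record (1-40), across domains and types."""
    # The trailing hyphen keeps e.g. FOVEA1 from also matching FOVEA12.
    return sorted(root.glob(f"FOVEA{record_id}-*"))

for path in record_files(Path("FOVEA"), 12):
    print(path.name)  # e.g. image, video, raw annotations, binary masks
```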
A partial record from the FOVEA dataset, showing the image, the raw annotation and both binary masks in the preoperative (a) to (d) as well as the intraoperative case (e) to (h), for one of the two annotators. Note that due to the imaging setup during surgery, the intraoperative microscopy is rotated by 180° compared to the preoperative colour fundus photography. Filenames are given in the sub-captions.
Technical Validation
The entire dataset was extensively analysed in order to validate the final output and ensure the consistency and quality of the annotations. This section will justify the choices made in the processing step, analyse the result, and illustrate a use of the dataset in the context of current deep learning methods.
Processing
The approach of combining the skeleton of the vessel annotation with the mask obtained by thresholding the raw annotation can be validated both visually and statistically. Figure 3 illustrates the issue: very fine retinal vessels were annotated with very little pressure on the Apple Pencil to obtain a narrow pencil stroke, but this also leads to low values in the raw annotation which can easily fall below the chosen binarisation threshold, fragmenting the vessel.
A crop of the vessel annotation of FOVEA003_p by annotator 1, demonstrating the post-processing of raw annotations. (a) The raw vessel annotation in red, showing different intensities due to the pressure-sensitive nature of the Apple Pencil used. (b) The raw annotation thresholded using a value of 0.5. While perfectly viable for the larger vessels, it leaves the thin vessel in the middle fragmented – eliminating small isolated areas of annotation would delete this entire vessel branch. (c) The skeleton of the mask obtained by thresholding the raw annotation with a value of 0.1, overlaid onto the middle image. The skeleton pixels serve to close the gaps and preserve the continuity of the vessel tree.
Simply reducing the threshold value is an option, but will also widen all vessel segmentations, which may be undesirable. This is avoided by using the skeleton to fill the gaps instead. Figure 3 shows it to be well-suited to reconnecting the full vessel tree. At the same time, this will not affect isolated single areas of annotations caused by accidentally brushing the iPad screen with the pencil: they can still be eliminated by simply removing areas below a size limit.
This is also confirmed by evaluating the masks statistically: across the entire dataset of 160 vessel annotations, combining with the skeleton reduces the count of isolated annotated areas with a size below 100 px from 1093 to 163, and their mean size from 12 px (95th percentile: 51 px) to 3 px (95th percentile: 6 px). This is a very good indication that all remaining unconnected areas smaller than e.g. 20 px are accidental pencil touches and can be removed without losing any part of the actual vessel tree.
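These counts can be reproduced with a few lines of scikit-image; a sketch, assuming the masks are loaded as boolean arrays:

```python
import numpy as np
from skimage.measure import label, regionprops

def small_component_stats(mask: np.ndarray, limit: int = 100):
    """Count and size of isolated annotated areas below `limit` pixels."""
    labelled = label(mask, connectivity=2)  # 8-connected components
    sizes = np.array([r.area for r in regionprops(labelled)])
    small = sizes[sizes < limit]
    return len(small), small.mean(), np.percentile(small, 95)
```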
Statistical Analysis
Figure 4 shows the coverage of the annotations as a percentage of the total number of pixels per image across the dataset. It is clear that significantly fewer pixels were annotated for intraoperative data. This is to be expected, as the intraoperative field of view is larger and a lower zoom level is in use – as evidenced by optic disc areas being reduced 7- to 8-fold in images taken during surgery – both reducing the vessel widths as measured in pixels. Additionally, parts of the visible fundus are often occluded, and image clarity is generally much reduced, with many smaller vessels not being visible.
Mean values across the dataset can be gleaned from Table 2. Annotator 1 (henceforth “A1”) consistently annotated fewer pixels than annotator 2 (henceforth “A2”) – an observation borne out by the mean across the dataset: 5.04% vs 5.71% for the preoperative and 0.92% vs 1.04% for the intraoperative fundus. A similar trend holds true for the optic disc annotations, with 1.51% vs 1.60% and 0.25% vs 0.29%, respectively.
However, extending the analysis to the total length of the skeleton of the retinal vessel segmentation shows a different picture: in this context, the numbers are no longer significantly different: 0.85% for both annotators in the preoperative domain, and 0.37% vs 0.38% in the intraoperative domain. This means that both annotators put the same effort into following the vessel tree – and though finer branches often do not coincide between annotators (see Fig. 5), both end up with a similarly complex structure. However, they do tend to disagree on the width of vessels: A2 appears to interpret more of the often blurry vessel edges as belonging to the vessels than A1. Another explanation is a systematic difference in annotation technique between the two annotators, caused for example by the length of individual pencil strokes (see the Methods section).
A comparison of annotations in the preoperative (left, FOVEA006) and intraoperative (right, FOVEA012) domain. (a), (b) Partially transparent overlay of annotation 1 (blue) and annotation 2 (green) over a crop of the original image, highlighting disagreements between the annotators. (c), (d) The same crop, with white pixels as the intersection between the two annotators, and blue and red as the skeletons of the pixels marked by annotators 1 and 2, respectively. (e), (f) Smaller original image patches contained in the above crops, illustrating the difficulties of selecting a definitive vessel path.
In fact, these numbers allow us to calculate a mean annotated retinal vessel width. For preoperative images, the results are 5.9 px and 6.7 px, for intraoperative images 2.5 px and 2.8 px for A1 and A2, respectively. As a rough estimation and neglecting possible differences in vessel width distributions across the vessel tree, this results in an absolute scale difference between preoperative and intraoperative images in the dataset of about 2.4, at the given resolutions. This is a good match with the average ratio of optic disc areas, the square root of which can be interpreted as the scale difference: 2.4 for A1 and 2.5 for A2 – demonstrating the consistency of the annotations across domains and classes.
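The mean annotated vessel width above follows directly from dividing the annotated area by the skeleton length; for A1 in the preoperative domain, for example:

$$\bar{w}=\frac{\text{annotated pixels}}{\text{skeleton pixels}}=\frac{5.04 \% }{0.85 \% }\approx 5.9\,\mathrm{px}$$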
Table 2 also contains data on the Dice score, formulated as:

$$\mathrm{Dice}=\frac{2A}{2A+B+C}$$

where:

A = number of pixels annotated by both A1 and A2

B = number of pixels annotated by A2 but not A1

C = number of pixels annotated by A1 but not A2

D = number of pixels annotated by neither A1 nor A2
It is very high for the optic discs, and reaches 0.68 and 0.52 for the preoperative and intraoperative vessels, respectively. As a comparison, the retinal vessel annotations in the 28 records of the (preoperative) CHASE_DB1 dataset achieve a value of 0.78. However, it is worth noting that all subjects of the CHASE_DB1 dataset were healthy, and the individual images are at a significantly lower resolution (4 times fewer pixels overall), which hides many of the ambiguous thin vessels where the highest inter-annotator disagreement would be expected.
Additionally, we also calculated the extended Cohen’s Kappa coefficient22, a newer attempt at assessing the agreement between two annotators in the field of semantic segmentation:

$$\kappa =\frac{{p}_{o}-{p}_{e}}{1-{p}_{e}},\quad {p}_{o}=\frac{A+D}{N},\quad {p}_{e}=\frac{(A+B)(A+C)+(C+D)(B+D)}{{N}^{2}},\quad N=A+B+C+D$$
Despite a different formulation, the values are extremely similar to the Dice score. Values above 0.4 can be seen as moderate agreement, above 0.6 as substantial agreement, and above 0.8 as almost perfect agreement23. Again, the lower value of agreement in the intraoperative domain reflects the much increased difficulty of annotations already mentioned at the start of this section.
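Both agreement measures follow directly from the four pixel counts defined above. A minimal sketch, with the standard Cohen’s Kappa standing in for the extended variant of ref. 22:

```python
import numpy as np

def agreement(m1: np.ndarray, m2: np.ndarray):
    """Dice score and Cohen's Kappa between two boolean masks."""
    a = np.sum(m1 & m2)    # annotated by both A1 and A2
    b = np.sum(~m1 & m2)   # annotated by A2 only
    c = np.sum(m1 & ~m2)   # annotated by A1 only
    d = np.sum(~m1 & ~m2)  # annotated by neither
    n = a + b + c + d
    dice = 2 * a / (2 * a + b + c)
    p_o = (a + d) / n                                     # observed agreement
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2  # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    return dice, kappa
```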
Visual Analysis
Figure 5 shows one annotation comparison in each domain. This underscores the difficulty of the task, and illustrates how A1 annotated some vessels that A2 did not, and vice versa. The intersection of their annotations is a viable ground truth for thicker vessels, but quickly becomes fragmented when progressing to the thinnest ones, where an offset of just two pixels in the annotation can mean the binary masks no longer overlap fully. Note that we do not make a qualitative judgement, as both annotations should be seen as individual and independent interpretations of the ground truth made by subject experts.
Usage Notes
Smaller retinal vessel branches are often difficult to distinguish clearly. This is especially true for intraoperative imaging modalities with their increased incidence of blur, inadequate illumination, or artefacts. Different annotators will often disagree, not only on the annotation of thin vessel branches, but also on other points such as the exact interpretation of vessel widths in the presence of blur. Some datasets such as FIVES9 avoid this by using a more complex annotation workflow aimed at finding a common ground truth amongst a group of annotators. In this case, the goal was to provide two independent ground-truth annotations by subject experts, each with their own interpretation of the data.
It is therefore up to the dataset user to decide which annotation to use, or how to combine them. To facilitate this, we not only supply the raw annotations along with the binary masks derived from them (see Methods), but also a number of Python methods which will speed up the process. With their help, the dataset can be easily reprocessed e.g. using different threshold values for the conversion into binary masks, or by removing all but the main vessel tree, i.e. the largest contiguous area of annotated pixels.
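For instance, keeping only the main vessel tree amounts to retaining the largest connected component; a sketch of that operation (the released helpers may be implemented differently):

```python
import numpy as np
from skimage.measure import label

def main_vessel_tree(mask: np.ndarray) -> np.ndarray:
    """Keep only the largest contiguous area of annotated pixels."""
    labelled = label(mask, connectivity=2)
    if labelled.max() == 0:  # empty mask: nothing to keep
        return mask
    counts = np.bincount(labelled.ravel())
    counts[0] = 0            # ignore the background label
    return labelled == counts.argmax()
```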
As with any data, users should be mindful of potential biases and limitations. This dataset originates from a specific hospital and a relatively small patient group. Some degree of variability across institutions is expected due to local demographic factors such as age and gender, differing patient conditions or clinical practices, as well as variations in imaging equipment. This is likely to cause subtle differences in the data distribution which may impact generalisation to that domain.
Segmentation Application
To demonstrate the value of the data especially in the intraoperative domain, we fine-tuned a Segment Anything Model (SAM)24 using the Learnable Prompt SAM repository4 on the FOVEA dataset to predict retinal vessel segmentations. With this, we also establish a performance baseline on this new dataset.
We used the ViT-Base version of the SAM and the default SGD optimiser. To determine the effect of the loss function, we tested both the standard cross-entropy as well as Lovász-Softmax25. Data train/test splits are 30/10, our suggested standard for the dataset, and 1/39, with FOVEA011 serving as the single training image. The 10 records used as the test set for the 30/10 split are: 7, 12, 26, 30, 32, 33, 37, 38, 39, and 40. They were manually chosen to balance retinal conditions, fundus appearance, and image quality between the training and test set. Code to automatically split the FOVEA dataset according to our suggestion as well as any other desired split is given in our code repository (see the Code Availability section). The learning rate was set to 0.1 and the batch size to 5. All experiments were run on a single Quadro RTX 8000 GPU with 48 GB of memory.
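The suggested 30/10 split can be reproduced in a few lines; the test record identifiers are those listed above, while the function itself is merely illustrative:

```python
TEST_RECORDS = {7, 12, 26, 30, 32, 33, 37, 38, 39, 40}  # suggested test set

def suggested_split() -> tuple[list[int], list[int]]:
    """Return the suggested 30/10 train/test FOVEA record identifiers."""
    train = [i for i in range(1, 41) if i not in TEST_RECORDS]
    return train, sorted(TEST_RECORDS)
```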
An important detail is the way the images are made to fit the model, which expects input dimensions of 1024 × 1024 px:

- Intraoperative data (1080 × 1920 px) was cropped randomly during training, and cropped at the centre during testing. Alternatively, five crops – one at each corner and one in the centre – were extracted and run separately, with the resulting 1024 × 1024 px outputs combined into a single 1080 × 1920 px prediction. This was performed by letting each patch vote on every pixel it covers, and setting a majority threshold on the fraction of votes above which a pixel would be accepted as containing a vessel segmentation. Setting the threshold at 0 is equivalent to obtaining the union of all five patch outputs, while a value of 0.5 requires, for example, both patches to agree in areas covered by two patches (see the sketch after this list).
- Preoperative data (1934 × 1960 px) was either cropped to 1024 × 1024 px – randomly during training, from the centre during testing – or resized to fit. While technically disregarding the aspect ratio, the almost square nature of the preoperative images means no noticeable distortion was introduced. This resizing operation also brings the mean size of optic disc and vessel features closer in line with the intraoperative data (see the Statistical Analysis section).
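A sketch of the five-crop voting scheme, assuming each crop yields a binary prediction; the crop offsets follow from fitting 1024 × 1024 px windows into a 1080 × 1920 px frame:

```python
import numpy as np

H, W, S = 1080, 1920, 1024
# Top-left corners of the four corner crops and the centre crop.
OFFSETS = [(0, 0), (0, W - S), (H - S, 0), (H - S, W - S),
           ((H - S) // 2, (W - S) // 2)]

def merge_crops(preds: list[np.ndarray], vote_threshold: float = 0.5) -> np.ndarray:
    """Combine five 1024 x 1024 binary predictions into one 1080 x 1920 mask.

    A pixel is accepted as vessel if the fraction of covering crops voting
    for it exceeds `vote_threshold`; 0 yields the union of all five crops.
    """
    votes = np.zeros((H, W))  # positive predictions per pixel
    cover = np.zeros((H, W))  # number of crops covering each pixel
    for (y, x), p in zip(OFFSETS, preds):
        votes[y:y + S, x:x + S] += p
        cover[y:y + S, x:x + S] += 1
    return votes / cover > vote_threshold
```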
Results are listed in Table 3. Some straightforward observations can be made: as usual, training and testing in the same domain results in better performance. Intraoperative Dice scores are worse than preoperative ones, matching the differences in inter-annotator agreement listed in Table 2. The Lovász-Softmax loss performance confirms previous observations showing it to achieve better results than cross-entropy on some semantic segmentation tasks26.
Statistics calculated for each test frame show that the model trained with cross-entropy is more hesitant to predict a vessel, meaning that there are fewer true positives but also fewer false positives – therefore, polling patches with a threshold of 0 improves results noticeably, as it does not accumulate as many disparate false positives around the image. Training with Lovász-Softmax increases both true and false positives, so majority polling with a threshold of 0.5 is preferred, yielding a slight increase in performance. This effect is also visible in Fig. 6.
A comparison of the effect of using the cross-entropy (a) and the Lovász-Softmax loss (b) on a cropped view of the segmentation prediction. The image is FOVEA007_i, and the model was trained with i-30/10-1 (see Table 3). Green are true positives, red false positives, white false negatives, black true negatives.
The best-performing model was also evaluated on four of the videos in the FOVEA dataset, all belonging to the test split and therefore representing unseen data. Results can be viewed on the Robotics and Vision in Medicine YouTube channel as one continuous clip with a black screen inserted for 1 s between the individual videos, showing an overlay of the segmentation prediction on the original image. Notable failure moments include a zoomed-in section at 00:11 as well as a number of strongly blurred frames at 00:40.
In all experiments, testing on resized instead of merely cropped preoperative data leads to better performance, as this matches the scale of features present in the training data better. The only exception is, unsurprisingly, when the training data itself is made from preoperative image crops. While not unexpected, this result stresses the importance of considering relative feature scales when moving across domains.
Interestingly, even training on a single intraoperative frame is sufficient to obtain a useful output on the 39 remaining images, though training for too many steps tends to degrade the result. This demonstrates the capabilities inherent to extensively pre-trained foundation models despite their training images being unrelated to the target domain, and reinforces the point that even a small number of high-quality annotations in a new domain can be powerfully leveraged with modern deep learning methods.
Making use of both annotations available for each image in the dataset by randomly loading either one or the other mask during training increases performance. We presume the increased diversity in the annotation reduces overfitting on the training set, resulting in better generalisation to the test set. This is supported by the following observation: the overfitting when going from 10000 to 30000 steps using the i-1/39-1 split, indicated by a decrease in performance across the board, is absent for i-1/39-12 (Table 3), i.e. when using both annotations. The single additional ground truth, though of the same image, adds enough diversity for the model to generalise better to the unseen test set of 39 images.
It is important to note that these results were obtained without fine-tuning hyperparameters. We also did not develop a more sophisticated training data pipeline, which we would expect to be especially beneficial given the low amount of data. For example, the Albumentations27 augmentation (https://albumentations.ai/docs/getting_started/transforms_and_targets) “FancyPCA” would provide more realistic colour variations, while “ElasticTransform” would increase the geometric variety – and applying more scaling transforms and some amount of random blur might well avoid the failure modes observed in the videos. For appropriate applications, the larger amount of labelled data already available in the preoperative domain could be leveraged as an additional pre-training step before fine-tuning on the intraoperative data. Using a larger SAM, e.g. ViT-Huge, is likely to improve performance as well. Finally, it should be noted that techniques such as k-fold cross validation would provide more robust results, though this goes beyond the scope of this work.
Code availability
Code used in the Methods and Technical Validation sections as well as additional functions to process, explore and visualise the dataset are available from https://github.com/rvimlab/FOVEA under the MIT license. It runs in a current Python 3.12 environment, and the dependencies are available in the same repository. The README file contains information on the correct usage, and documentation is present for all code. A second repository, https://github.com/rvimlab/LearnablePromptSAM, was cloned and adapted from https://github.com/Qsingle/LearnablePromptSAM. Again, the README file contains information about the usage of the added files, and the code itself is documented.
References
Bernardes, R., Serranho, P. & Lobo, C. Digital ocular fundus imaging: A review. Ophthalmologica 226, 161–181 (2011).
Tong, Y., Lu, W., Yu, Y. & Shen, Y. Application of machine learning in ophthalmic imaging modalities. Eye and Vision 7, 1–15 (2020).
Bommasani, R. et al. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258 (2021).
Qiu, Z., Hu, Y., Li, H. & Liu, J. Learnable ophthalmology SAM. arXiv preprint arXiv:2304.13425 (2023).
Baranchuk, D., Voynov, A., Rubachev, I., Khrulkov, V. & Babenko, A. Label-efficient semantic segmentation with diffusion models. In Proceedings of the International Conference on Learning Representations (2022).
Hoover, A., Kouznetsova, V. & Goldbaum, M. Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE Transactions on Medical Imaging 19, 203–210 (2000).
Staal, J., Abràmoff, M. D., Niemeijer, M., Viergever, M. A. & Van Ginneken, B. Ridge-based vessel segmentation in color images of the retina. IEEE Transactions on Medical Imaging 23, 501–509 (2004).
Owen, C. G. et al. Measuring retinal vessel tortuosity in 10-year-old children: Validation of the computer-assisted image analysis of the retina (CAIAR) program. Investigative Ophthalmology & Visual Science 50, 2004–2010 (2009).
Jin, K. et al. FIVES: A fundus image dataset for artificial intelligence based vessel segmentation. Scientific Data 9, 475, https://doi.org/10.1038/s41597-022-01564-3 (2022).
Zhang, Z. et al. ORIGA-light: An online retinal fundus image database for glaucoma analysis and research. In 2010 Annual international conference of the IEEE engineering in medicine and biology, 3065–3068 (IEEE, 2010).
Bajwa, M. N. et al. G1020: A benchmark retinal fundus image dataset for computer-aided glaucoma detection. In 2020 International Joint Conference on Neural Networks (IJCNN), 1–7 (IEEE, 2020).
Kovalyk, O. et al. PAPILA: Dataset with fundus images and clinical data of both eyes of the same patient for glaucoma assessment. Scientific Data 9, 291, https://doi.org/10.1038/s41597-022-01388-1 (2022).
Chen, C., Chuah, J. H., Ali, R. & Wang, Y. Retinal vessel segmentation using deep learning: A review. IEEE Access 9, 111985–112004 (2021).
Panahi, A. et al. Autonomous assessment of spontaneous retinal venous pulsations in fundus videos using a deep learning framework. Scientific Reports 13, 14445 (2023).
Braun, D., Yang, S., Martel, J. N., Riviere, C. N. & Becker, B. C. EyeSLAM: Real-time simultaneous localization and mapping of retinal vessels during intraocular microsurgery. The International Journal of Medical Robotics and Computer Assisted Surgery 14, e1848 (2018).
Mukherjee, S., Kaess, M., Martel, J. N. & Riviere, C. N. EyeSAM: Graph-based localization and mapping of retinal vasculature during intraocular microsurgery. International Journal of Computer Assisted Radiology and Surgery 14, 819–828 (2019).
Vander Poorten, E. et al. Robotic retinal surgery. In Handbook of robotic and image-guided surgery, 627–672 (Elsevier, 2020).
Ropelato, S., Menozzi, M., Michel, D. & Siegrist, M. Augmented reality microsurgery: A tool for training micromanipulations in ophthalmic surgery using augmented reality. Simulation in Healthcare 15, 122–127 (2020).
Ravasio, C. S., Flores-Sanchez, B., Bloch, E., Bergeles, C. & Da Cruz, L. FOVEA: Preoperative and intraoperative retinal fundus images with optic disc and retinal vessel annotations. figshare https://doi.org/10.6084/m9.figshare.28329338 (2025).
Pennesi, M. E., Birch, D. G., Duncan, J. L., Bennett, J. & Girach, A. Choroideremia: Retinal degeneration with an unmet need. Retina (Philadelphia, Pa.) 39, 2059 (2019).
Ang, M. et al. Imaging in myopia: Potential biomarkers, current challenges and future developments. British Journal of Ophthalmology 103, 855–862 (2019).
Yang, F. et al. Assessing inter-annotator agreement for medical image segmentation. IEEE Access 11, 21300–21312 (2023).
Landis, J. & Koch, G. The measurement of observer agreement for categorical data. Biometrics 33, 159–174 (1977).
Kirillov, A. et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision (2023).
Berman, M., Triki, A. R. & Blaschko, M. B. The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4413–4421 (2018).
Pissas, T., Ravasio, C. S., Da Cruz, L. & Bergeles, C. Effective semantic segmentation in cataract surgery: What matters most? In Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part IV 24, 509–518 (Springer, 2021).
Buslaev, A. et al. Albumentations: Fast and flexible image augmentations. Information 11 (2020).
Acknowledgements
This work was supported by The Michael Uren Foundation, and by core funding from the Wellcome/EPSRC Centre for Medical Engineering [WT203148/Z/16/Z]. For the purpose of Open Access, the Author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission. We would like to thank Dr Sepehr Jalali for advice on the annotation protocol and the pilot annotation study.
Author information
Contributions
C.S.R. performed the data processing, analysis, and validation, conducted the deep learning experiments, wrote all the code, and wrote the manuscript. B.F.S. annotated the data. EB developed the annotation protocol, carried out the data selection, and annotated the data. L.D.C. and C.B. initiated, guided and funded the research and helped develop the annotation protocol. All authors advised on and reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ravasio, C.S., Flores-Sánchez, B., Bloch, E. et al. FOVEA: Preoperative and intraoperative retinal fundus images with optic disc and retinal vessel annotations. Sci Data 12, 703 (2025). https://doi.org/10.1038/s41597-025-04965-2