Introduction

Over the last few decades, in vitro fertilization (IVF) has revolutionized reproductive therapy for infertile patients and has provided several approaches to achieve a successful pregnancy1,2. Previously, this practice was based on the transfer of more than one embryo, sometimes leading to multiple pregnancy and, consequently, resulting in higher maternal and neonatal risks than those in singleton pregnancies3. With the technological development of IVF laboratories, single embryo transfer (SET) is now recommended to achieve a healthy pregnancy, but maintaining success rates is challenging1,4. The selection of the embryo with the highest probability of implantation still relies largely on the subjective judgment of embryologists5.

Historically, the selection of embryos has been based on the evaluation of their morphology. For embryos in the blastocyst stage, three parameters are usually evaluated: the degree of embryo expansion, the homogeneity of the inner cell mass (ICM), and the trophectoderm (TE). The Gardner classification5,6,7 is one of the most widely used criteria in clinical practice. Although well established and used worldwide, these parameters are not sufficiently precise to accurately predict potential for success, as they depend on the visual observation of the embryologist. Therefore, subjectivity and the consequent divergences (inter- and intra-embryologist variation) are intrinsic to this evaluation system5,8.

The introduction of the time lapse system (TLS), with the acquisition of multiple images at different times of embryonic growth, quickly became popular because of its non-invasive nature and maintenance of ideal culture conditions9. The TLS allows the monitoring of embryonic development, from the zygote stage to full blastocyst expansion, without the need to remove the culture dish from its ideal conditions for morphological evaluation. The images obtained from embryos in various planes and at regular intervals can be used in digital processing programs to determine quality, thus enhancing the prediction of clinical pregnancy7,10,11.

Initially, some studies extracted images obtained with the TLS and analysed them with mathematical variables representative of known morphological attributes of the blastocyst12,13,14,15. Chéles et al.16 developed an automated processing protocol based on images from TLS in EmbryoScope (Vitrolife, Sweden) and Geri (Genea Biomedx, Australia) incubators, which generated 33 variables categorized as texture, mean grey level, grey level standard deviation, modal value, ICM area and diameter, TE thickness, and light level, among others. These variables have the potential to be used as inputs for the development of an artificial intelligence (AI) program to predict the viability of the embryo produced by IVF17,18.

Artificial intelligence (AI) aims to replicate human cognitive processes in order to address complex problem-solving tasks. It comprises a broad spectrum of computational approaches, including multilayer perceptron artificial neural networks (MLP ANNs), genetic algorithms (GAs), deep learning (DL), convolutional neural networks (CNNs), fuzzy logic systems, and machine learning (ML) techniques19.

In studies related to IVF, AI has wide application potential for sperm selection20, oocyte selection21, embryo classification12,22,23, pregnancy prediction24, live birth13,14,23 and embryonic ploidy25.

Although these examples are based on different methods, they demonstrate the potential of AI methods to automate the embryonic evaluation while considering the complexity of the embryological variables26 and as a complementary system to the current performance of manual and subjective selection27.

Although there is still some resistance to full confidence in AI programs28, there are currently commercially available AI-based software programs that play important roles in the clinical analysis of embryos in a retrospective/prospective manner29. Examples developed for application in reproductive medicine include iDAScore (Vitrolife), implemented in the EmbryoScope incubator30, Chloe from the Fairtility group31 and EMA from the AIVF group32, among other existing software. Thus, there is a clear technological demand in assisted reproduction for an automated and objective methodology for embryonic evaluation, aiming ultimately to increase the probability of healthy live births and to decrease the complications of multiple pregnancy in couples undergoing treatment by assisted reproduction technology (ART)33.

Designing an AI model specifically for a Brazilian population enables results that more accurately reflect the distinctive genetic diversity of the country, which has the largest population in Latin America34. Disparities in health outcomes across ethnic groups are particularly evident in reproductive and endocrine health, with infertility showing some of the most marked differences. For instance, ovarian reserve has been observed to differ among populations; women of Latina and Chinese descent between the ages of 40 and 45 may have lower anti-Müllerian hormone (AMH) levels than African American women. In this regard, the steepest rate of decline was observed among Chinese women (10.5%), whereas African American women showed the lowest decrease (6.3%)35. Moreover, data from the Society for Assisted Reproductive Technology (SART) database indicate that clinical pregnancy and live birth rates are lower among Black, Asian, and Hispanic women than among White women36. These considerations highlight the critical importance of accounting for population diversity in the development of a new reproductive technology.

The objective of this study was to use AI methods, particularly MLP ANNs associated with GAs, to predict gestational success (presence of a gestational sac and foetal heartbeat) from morphological variables automatically extracted from digital processing of images of blastocysts produced by IVF. Furthermore, the establishment of a technological domain in this theme, an innovation in Brazil and South America, represents the acquisition of competence, with the differential of AI training on a customized image bank, i.e., one specific to the patients treated by the collaborating clinic in São Paulo, Brazil, who have demographic and ethnic characteristics typical of the country37,38. As a secondary objective, multicentre clinical tests were conducted as part of a prospective observational study (i.e., the model evaluation). These tests aimed to evaluate the performance of the MAIA algorithm in a real clinical setting, using embryo images and associated clinical outcomes obtained in routine care.

Results

Single and mode accuracy of the MLP ANNs

MAIA (an acronym for Morphological Artificial Intelligence Assistance) was developed based on the five best-performing multilayer perceptron artificial neural networks. During the learning process, the models were trained and validated using a dataset of morphological images, with the aim of predicting clinical pregnancy (CP) outcomes. The data were divided into two distinct subsets: training and validation, as detailed in Table 1.

Table 1 Results of the training and validation data of the 5 best MLP ANNs obtained by GAs.

To further assess the models’ generalization capability, internal validation was performed. In this evaluation, the MLP ANNs exhibited more consistent performance, achieving accuracies of 60.6% or higher (Table 2).

Table 2 Performance of MLP ANNs in the internal validation.

The areas under the curve (AUCs) and the receiver operating characteristic (ROC) curves for the training and internal validation results are presented in Table 3 and Fig. 1a,b, respectively.

Table 3 AUC values for the training and internal validation results for clinical pregnancy positive (CP +) and clinical pregnancy negative (CP −) responses.

Fig. 1 (a) ROC curves for the training results for the positive and negative clinical pregnancy cases; (b) ROC curves for the internal validation results for the positive and negative clinical pregnancy cases.
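
As an aid to reading the AUC values, here is a minimal sketch of how an AUC can be computed from binary outcomes and continuous prediction scores. It relies on the rank interpretation of the AUC (the probability that a randomly chosen CP + case receives a higher score than a randomly chosen CP − case) and is not the MATLAB routine used in the study; the function name `roc_auc` and the toy labels and scores are illustrative assumptions, not study data.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: P(score of a positive > score of a negative); ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical values): 1 = CP +, 0 = CP -
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the values in Table 3 should be read.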

Fig. 2 Confusion matrix.

When all the results presented by the MLP ANNs were normalized (Supplementary Text 1, Supplementary Tables 1–8) and the mode between the ANNs was applied, the MAIA software result was 77.5% correct in the prediction of clinical pregnancy positive and 75.5% correct for the prediction of clinical pregnancy negative. The confusion matrix39 is presented in Fig. 2. The AUC for this case is presented in Table 4 and ROC curves in Fig. 3.
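
The mode step can be pictured as a majority vote among the five networks. The sketch below assumes that each network's normalized output has already been converted to a binary call (1 = CP +, 0 = CP −); the function names and the toy votes are hypothetical, and the normalization itself (Supplementary Text 1) is not reproduced here.

```python
from collections import Counter


def mode_vote(votes):
    """Most frequent call among the networks (no ties with 5 binary voters)."""
    return Counter(votes).most_common(1)[0][0]


def ensemble_predict(per_network_calls):
    """per_network_calls: one list of calls per network, all of equal length."""
    return [mode_vote(votes) for votes in zip(*per_network_calls)]


def confusion_counts(truth, predicted):
    """Counts of true/false positives and negatives for binary calls."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for t, p in zip(truth, predicted):
        if t == 1 and p == 1:
            counts["TP"] += 1
        elif t == 0 and p == 0:
            counts["TN"] += 1
        elif t == 0 and p == 1:
            counts["FP"] += 1
        else:
            counts["FN"] += 1
    return counts


# Toy calls from 5 hypothetical networks for 4 embryos (not study data)
toy_calls = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 1, 1],
]
toy_truth = [1, 0, 1, 0]
toy_ensemble = ensemble_predict(toy_calls)
toy_confusion = confusion_counts(toy_truth, toy_ensemble)
```

Using an odd number of voters guarantees that a binary mode always exists, which is one practical reason to combine five rather than four networks.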

Table 4 AUC after normalization and mode application.

Fig. 3 ROC curves for the normalized data and after applying the mode for the 5 MLP ANNs for the internal validation.

A graphical user interface was developed for MAIA and designed to facilitate its use in the daily routine of assisted reproduction clinics. This interface is shown in Supplementary Text 2, Supplementary Fig. 1a-1b, Supplementary Fig. 2–3 and Supplementary Text 3. A video demonstrating the entire operation of the graphical interface is presented in Supplementary Video 1.

MAIA performance in evaluation of the prediction model

In model evaluation (i.e., tests performed under a real multicentre clinical routine), using the MAIA, version 4.0, graphical interface, the clinical pregnancy rate of all patients who underwent embryo transfer was 53% (n = 106/200). In centres A, B and C, the clinical pregnancy rates were 51.8%, 61.3% and 40.9%, respectively. Detailed patient and cycle characteristics are summarized in Table 5.

Table 5 Patient and cycle characteristics used for MAIA clinical model evaluation.

MAIA scores (Supplementary Fig. 1b) between 0.1 and 5.9 were considered negative predictors of clinical pregnancy, and scores between 6.0 and 10.0 were considered positive predictors (Supplementary Text 4 and Supplementary Fig. 4). The AUC, considering all cases, was 0.65. Overall, the accuracy of MAIA in discriminating positive from negative clinical pregnancies was 66.5%. Among the centres that independently evaluated the embryos (n = 200) with the aid of MAIA, centres A, B and C obtained accuracies of 67.9% (n = 81, 40.5% of all analyses), 69.3% (n = 75, 37.5%) and 59.1% (n = 44, 22.0%), respectively. Linear regression analysis showed that MAIA’s predictions (both correct and incorrect) were strongly correlated with clinical pregnancy outcomes (CP + and CP−), with R values ranging from 0.65 to 1.0 (P < 0.001). In contrast, embryologists’ selections across the three centres yielded R values ranging from 0.053 to 0.685, with P values varying from non-significant (P = 0.792) to statistically significant (P = 0.001) (Table 6). The graphs comparing the R and P values between centres A, B and C and MAIA are presented in Supplementary Text 5 and Supplementary Fig. 5–8.
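
The score cut-off described above can be expressed as a simple decision rule. The sketch below is an illustration only: the function names are hypothetical, the toy scores and outcomes are not study data, and it assumes that any score of 6.0 or more counts as a positive prediction (the text lists scores to one decimal place, so values between 5.9 and 6.0 are not expected).

```python
def maia_call(score, threshold=6.0):
    """Map a MAIA score (0.1 to 10.0) to a predicted outcome label."""
    if not 0.1 <= score <= 10.0:
        raise ValueError("MAIA scores are reported between 0.1 and 10.0")
    return "CP+" if score >= threshold else "CP-"


def accuracy(scores, outcomes):
    """Fraction of embryos whose predicted label matches the observed outcome."""
    calls = [maia_call(s) for s in scores]
    hits = sum(call == outcome for call, outcome in zip(calls, outcomes))
    return hits / len(outcomes)


# Toy example (hypothetical scores and outcomes)
toy_scores = [8.2, 5.9, 6.0, 3.4, 7.1]
toy_outcomes = ["CP+", "CP-", "CP-", "CP+", "CP+"]
toy_accuracy = accuracy(toy_scores, toy_outcomes)
print(f"accuracy = {toy_accuracy:.1%}")
```

Accuracy computed this way counts correct positive and correct negative predictions together, which is how the 66.5% figure is described in the text.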

Table 6 Comparison of R and P values between centres A, B and C (performed by embryologists) and MAIA.

Among the patients, 93 had only one embryo to be transferred (nonselective cases), and considering only those cases, the AUC was 0.65, with an accuracy of 62.4%.

In elective cases, 107 patients had more than one embryo eligible for transfer (284 embryos in total); for the 107 transferred embryos, the AUC was 0.60 and the accuracy for the clinical pregnancy result was 70.1%. When the embryo was chosen on the basis of MAIA although the embryologist would have chosen another embryo (n = 44), the accuracy was 81.8%, and the clinical pregnancy rate was 75%. When the embryologist made the choice and disagreed with MAIA (n = 38), the clinical pregnancy rate was 47.4%, and the accuracy of MAIA for the single embryo selected for transfer was 60.5%. In the remaining elective cases (n = 25), the choices of the embryologists and MAIA were in agreement, and both the clinical pregnancy rate and the MAIA accuracy were 64.0%.

Most of the analysed embryos reached the blastocyst stage on the 5th day of development (n = 163/200, 81.5%), and MAIA correctly predicted the outcome for 68.1% of these embryos. For the embryos that were analysed on the 6th day (n = 36/200, 18%), the accuracy of MAIA was 61.1%. Only one embryo was analysed on Day 4; the MAIA prediction was negative, and the embryo resulted in a clinical pregnancy. The combined AUC for Days 5 and 6 was 0.64.

For euploid embryos (n = 122/200, 61% of the transferred embryos), the accuracy of MAIA was 69.7%, with an AUC of 0.67, whereas for non-biopsied embryos (n = 78/200, 39%), the accuracy was 61.5%, and the AUC was 0.62.
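
As a consistency check on the subgroup results, the sketch below converts each reported subgroup accuracy back into a count of correct predictions and pools the counts. The counts are reconstructed by rounding and are therefore an assumption rather than reported data; the single Day 4 embryo is entered with 0% accuracy because its prediction was incorrect. The variable and function names are illustrative.

```python
# (n, reported accuracy in %) for each subgroup, taken from the text
strata = {
    "centre": [(81, 67.9), (75, 69.3), (44, 59.1)],
    "elective status": [(93, 62.4), (107, 70.1)],
    "day of development": [(163, 68.1), (36, 61.1), (1, 0.0)],
    "biopsy status": [(122, 69.7), (78, 61.5)],
}


def pooled(groups):
    """Total embryos and reconstructed number of correct predictions."""
    n_total = sum(n for n, _ in groups)
    n_correct = sum(round(n * acc / 100) for n, acc in groups)
    return n_total, n_correct


summary = {name: pooled(groups) for name, groups in strata.items()}
for name, (n_total, n_correct) in summary.items():
    print(f"{name}: {n_correct}/{n_total} = {100 * n_correct / n_total:.1f}%")
# Each stratification pools to 133/200 = 66.5%
```

Every stratification recombines to the same 133 correct predictions out of 200, in agreement with the overall accuracy of 66.5%.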

Discussion

In summary, in this work, we propose an AI-based platform (named MAIA) to aid embryologists in their clinical routine for embryo assessment. MAIA was developed entirely by a Brazilian partnership (university–private clinic) and involved a database from three reference centres in the city of São Paulo that provide assistance to patients from all over the country.

Our results indicate that the use of morphological parameters from time-lapse images of embryos7,11 strongly correlates with embryonic morphology and clinical pregnancy. In addition, the time-lapse system provides a predictive improvement in terms of morphological and morphokinetic information compared with standard incubators40,41,42. In the present study, the standardization of image quality (pixels, lighting, etc.) obtained at different centres using the same time-lapse system43 enabled the use of a multicentre approach. Corroborating the aforementioned studies, the present study used static images of blastocysts obtained from time-lapse technology for the digital processing described by Chéles et al.16 to obtain mathematical variables predictive of the morphological quality of the blastocyst. The results of the software developed in the internal validation showed predictions of 77.5% (CP +) and 75.5% (CP -), which indicate high potential for continuing its daily use in assisted reproductive care clinics. This fact is supported by the prospective analysis performed in three assisted reproductive care clinics, where MAIA was able to achieve an accuracy for clinical pregnancy of 70.1% in elective embryo transfers and 62.4% in nonelective embryo transfers.

As in the present study, the quantitative analysis of embryo characteristics on the basis of digital images has already been tested44,45 and has become an active area of research for application in AI. The acquisition of variables representative of the embryo by image processing was also used by Chavez-Badiola et al.46, who developed an algorithm to predict biochemical pregnancy (β-hCG positive). These authors used machine learning to extract data on morphometric characteristics from embryo images. Unlike our study, these authors did not use time-lapse images but rather conventional microscopy. On the basis of staining of the blastocyst image performed by the embryologist, the algorithm was able to extract 24 image attributes that were related to pixel intensity, area and perimeter, resulting in accuracies of 0.75 and 0.62 for the support vector machine and random forest methods, respectively46. The method can be considered semiautomated and has been previously proposed for murine embryos47. In our study, all processing (i.e., digital image processing, mode prediction of the MLP ANNs, the MAIA score and the fold change) occurs automatically from the moment the blastocyst image is selected for analysis by MAIA.

Images in multiple focal planes of the embryo have been used in the development of automated models for the evaluation of embryonic quality. Wang et al.48 applied a deep learning model to automate the classification of blastocysts as “good-quality” or “bad-quality” and used more than 10,000 images of embryos from TLS in 11 focal planes. The training of the model was based on the classification by Gardner and Schoolcraft6 and was performed by 5 experienced embryologists. They obtained an accuracy of 0.91 and an AUC of 0.93. Similar to what was performed in this study, Wang et al.48 removed from the database images in which the blastocyst was incomplete or the image was blurred. These data are not yet related to implantation or live birth data and were derived from only one reproductive care centre48. Depending on the image processing method used, the use of multiple focal planes may be more useful than the use of a single image in a focal plane for automating the morphological evaluation of the blastocyst. However, it can lead to inefficient AI, as there may be focal planes in which the embryo is blurred, leading to misinterpretation by AI. In our study, 33 input variables derived from processing16 were obtained from a single image (i.e., a single focal plane) and are directly correlated with the clinical outcome of the embryos.

Evaluating the entire video of embryonic development captured in time-lapse culture using EmbryoScope, Tran et al.24 applied an AI model based on deep learning, called IVY. In this proposal, more than 8,000 embryos from 8 clinics were used. The model obtained an average AUC of 0.93 in a 5-fold stratified cross-validation for the prediction of a successful pregnancy. Interestingly, although the dataset used for training is considered unbalanced (mostly negative CP data compared with positive CP data), the best result in the experiments used to define this stage was still achieved, possibly because training involved the entire embryonic development process, from zygote to blastocyst. The AUC was the only measure described in the study. Additionally, although multicentre data were used, the study was limited to a retrospective analysis24.

In a subsequent study in which the database used for publication by Tran et al.24 was expanded, Berntsen et al.49 applied deep learning, called iDAScore v1.0, to predict a foetal heartbeat. They used 115,832 image sequences from EmbryoScope Plus, in partnership with 18 clinics. Unlike the present study, the authors considered not only transferred embryos but also nontransferred embryos (termed discarded) in the model training process, and these embryos were pseudo-labelled as negative predictors in an attempt to make the evaluation more automated considering all the embryos in a cohort. The model obtained an AUC of 0.95 considering the entire cohort of embryos and 0.67 in the test set for embryos with known implantation49.

With the implantation outcome known, Fruchter-Goldmeier et al.50 performed a retrospective study that included 608 embryos also incubated in an EmbryoScope. Unlike our study, they did not use embryos transferred after cryopreservation; embryos were analysed by preimplantation genetic testing, and only autologous cycles were considered. For training of the model, experienced embryologists manually marked the parts corresponding to the ICM, trophectoderm and blastocyst perimeter. The segmentation model applied to the manual markings of the blastocysts was Chloe™ by Fairtility LTD, and a result of 0.70 was obtained. The authors concluded that the embryos that resulted in implantation had a lower ICM size-to-blastocyst size ratio than did those that did not result in implantation. This segmentation model is similar to our study in that it determines quantitative measurements of parts of the blastocyst, as well as its entirety, for application in AI50, although our study is fully automated in embryo segmentation.

Although our study presents an AI model developed on the basis of retrospective data and, in addition, reports the performance of MAIA in a multicentre routine clinical test, a prospective randomized double-blind noninferiority trial has not yet been performed. A study with these characteristics was performed by Illingworth et al.51, who demonstrated that deep learning is not inferior to standard morphology in terms of clinical pregnancy.

In this regard, our study has several limitations. Despite the high training AUC results and the lower performance in the internal validation (Table 3), there is no clear evidence of model overfitting. This is further supported by the 66.5% accuracy observed in multicentre clinical tests, which, although still subject to optimization, demonstrates the model’s ability to generalize and perform consistently in real-world clinical settings. Furthermore, prospective double-blind randomized analysis of any AI model is an important step after retrospective analysis, and its absence can be considered one of the main limitations of the present study. The morphokinetic parameters of embryonic development and the clinical data of patients constitute additional information to be applied as inputs in our algorithm in the future.

In recent years, the use of AI in human reproduction has increased46,52,53, and there are commercially available options for its routine clinical use54. An example is iDAScore, a deep learning algorithm developed for Vitrolife’s EmbryoScope Plus, which uses 3D convolutional neural networks to automatically identify both spatial (morphological) and temporal (morphokinetic) patterns from raw time-lapse image sequences in order to rank embryos and predict clinical pregnancy outcomes55. In a validation study of iDAScore v2.0, Lassen et al. (2023)55 evaluated the algorithm’s performance using data from over 100,000 embryos at different days of development. For embryos assessed on day 5 or later, the model achieved a test AUC of 0.694. In our study, MAIA demonstrated a comparable performance in the multicentre test, reaching an AUC of 0.64 for embryos on days 5 and 6.

In contrast, the Chloe system, from the Fairtility group, uses AI algorithms to analyse time-lapse images of embryos during IVF and automatically annotates and classifies morphokinetic and morphological events, providing information for embryo selection and clinical research. Evidence regarding the application of Chloe in routine clinical practice remains limited. In a retrospective study, the model reported an AUC of 0.64 for predicting clinical pregnancy following SET; however, it is unclear whether this value refers to a training or test dataset, which limits the interpretability and generalizability of the finding56.

Developed, trained and validated entirely in Brazil, the MAIA software has the potential to become part of the daily routine of assisted reproductive care clinics, given its prospective predictive performance in 3 different IVF clinics. MAIA may provide support for the appropriate selection of embryos to be transferred on the basis of the automatic evaluation of a single image of the blastocyst obtained without interrupting its culture, improving the gestational success rate and reducing the number of cycles required for a blastocyst to yield a healthy pregnancy.

To our knowledge, no other study has proposed the use of predictive variables derived from morphological quality for application in AI software (using MLP ANNs and GAs) and use in clinical practice with a user-friendly interface. Additionally, the methodology proposed in our study (through image processing prior to AI analysis) is original to our group16. Because it is fully automated, together with the application of AI methods, it makes the process less dependent on the subjectivity and experience of the embryologist in the evaluation and annotation of morphokinetic or morphological variables, which are normally included in traditional embryonic analysis. This fact was observed in a study using a methodology that was extremely similar to that of the present study45, with in vitro-produced bovine blastocysts, in which the agreement (Cohen’s kappa statistic) between the 3 best-trained embryologists was lower than the agreement of the 3 best MLP ANNs when the same digital image was analysed.

In conclusion, we developed a fully automated AI-based software, MAIA, capable of ranking blastocyst images uploaded by the user and providing a robust, objective assessment to complement the embryologist’s expertise. In this study, MAIA demonstrated a strong correlation with clinical pregnancy outcomes and, in some metrics, performed comparably to or better than embryologists’ selections. These findings suggest that MAIA can serve as a valuable decision-support tool, enhancing consistency and objectivity in embryo selection while preserving the clinical judgment of experienced professionals.

Methods

Database

In this retrospective study, data from 1,015 embryos from 1,015 in vitro fertilization cycles of 891 patients who underwent single embryo transfer (fresh and frozen) between November 2017 and June 2022 at three different assisted reproduction centres were used. Detailed patient and cycle characteristics are summarized in Table 7. The study was approved by the Brazilian National Council for Research Ethics (CONEP), through the Research Ethics Committee (CEP) of Hospital Heliópolis – UGA I, São Paulo/Brazil, under number CAAE 06081218.4.0000.5449. In addition, all patients signed the Free and Informed Consent Form. This study follows the guidelines stipulated in TRIPOD – AI in the development and evaluation of the prediction model57.

Table 7 Patient and cycle characteristics used for MAIA model training.

The mean age of all patients was 38.8 ± 4.5 years, and the mean BMI was 23.0 ± 3.2. Patients in autologous cycles (mean age 37.4 ± 3.7 years and BMI 22.8 ± 3.2) constituted 74.2% (n = 753) of the patients, and among them, 75.0% (n = 565) had undergone preimplantation genetic testing for aneuploidy (PGT-A). A total of 25.8% (n = 262) of the patients used donated eggs (mean age 42.8 ± 4.0 years and BMI 23.7 ± 3.3), of whom 20.6% (n = 54) underwent PGT-A. The inclusion criteria were a single embryo transfer with clinical pregnancy results confirmed by ultrasound (positive cases) and willingness to sign the informed consent form at the Huntington clinic (Huntington Reproductive Medicine, São Paulo, Brazil).

The embryos were cultured to the blastocyst stage in incubators fitted with an EmbryoScope+ time-lapse system (Vitrolife) that acquires images every 10 min in 11 focal planes at 2048 × 1088 pixels (2.2 MP) with a 12-bit monochrome CMOS camera (EmbryoScope™+ incubator user manual, 2024)43.

Using the focal planes provided by the EmbryoScope, the embryologist selected and exported a single image of the expanded blastocyst — captured at the focal plane that offered the best visualization of the inner cell mass and trophectoderm — with a resolution of 500 × 500 pixels for analysis by MAIA. Of the 1,015 blastocyst images from the retrospective cohort, 755 images (74.4%) were used for the effective learning of AI, 174 were used for the internal validation (17.1%), and 86 were excluded from the study (8.5%). The reason for exclusion was incomplete or suboptimal visualization of the blastocyst, which could compromise accurate evaluation. Specifically, embryos were excluded due to incomplete visualization of the blastocyst (n = 30), oval-shaped appearance that hindered full structural assessment (n = 11), being out of focus (n = 23), not having reached the blastocyst stage (n = 15), or incomplete data in the database (n = 7). These exclusions were necessary to ensure that only high-quality, standardized images were used for reliable analysis and consistent AI assessment, as poor image quality or incomplete development could bias the results.

Standardization of blastocyst image processing

Previously developed digital processing software was used with the MATLAB® platform, which automatically extracts 33 mathematical variables representing the morphological characteristics of the embryo from the digital image of the blastocyst16.

Algorithm for artificial intelligence

For the prediction of clinical pregnancy positive or negative (CP + or CP -), an artificial intelligence algorithm was developed on the MATLAB® platform, which uses the method for multilayer perceptron artificial neural networks associated with the genetic algorithm (GA) method.

The AI algorithm included the 33 variables derived from the previous digital processing of the blastocyst images as inputs. For the training and validation phases, an MLP ANN with 1 to 3 intermediate layers was used, where the number of neurons varied between 20 and 500 in each layer. The output was a numerical CP prediction vector (between the highest probability of CP + and the highest probability of CP-). The stopping criterion for the MLP ANNs was the number of epochs, which was between 50 and 700.

The learning algorithm employed was backpropagation, which minimizes error by comparing predicted and actual outputs and adjusting the connection weights accordingly. To train the MLP ANN, the dataset (755 images) was split into 70% for training and 30% for validation. Several hyperparameters were tuned during training, including the number of hidden layers, the number of neurons per layer, the learning rate, and the transfer functions. The transfer functions — applied randomly during the learning process — included tansig, logsig, purelin, hardlim, tribas, radbas, and satlin58,59.
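
For readers unfamiliar with the MATLAB transfer function names, the sketch below gives their standard definitions in Python together with a forward pass through a small MLP. It is an illustration under stated assumptions, not the study's implementation: the layer sizes (33 inputs, 20 hidden neurons, 2 outputs), the random weights and the helper names `forward` and `random_layer` are hypothetical, and training by backpropagation is omitted.

```python
import math
import random


def logsig(x):
    """Logistic sigmoid, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Standard definitions of the MATLAB transfer functions named in the text
TRANSFER = {
    "tansig": math.tanh,                           # hyperbolic tangent sigmoid
    "logsig": logsig,                              # logistic sigmoid
    "purelin": lambda x: x,                        # linear
    "hardlim": lambda x: 1.0 if x >= 0 else 0.0,   # hard limit (step)
    "tribas": lambda x: max(0.0, 1.0 - abs(x)),    # triangular basis
    "radbas": lambda x: math.exp(-x * x),          # radial basis
    "satlin": lambda x: min(1.0, max(0.0, x)),     # saturating linear
}


def forward(inputs, layers):
    """layers: list of (weights, biases, transfer_name); weights[j] feeds neuron j."""
    activations = inputs
    for weights, biases, name in layers:
        f = TRANSFER[name]
        activations = [
            f(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


def random_layer(n_in, n_out, name, rng):
    """Hypothetical untrained layer with small random weights and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out, name


rng = random.Random(0)
network = [random_layer(33, 20, "tansig", rng), random_layer(20, 2, "logsig", rng)]
features = [rng.random() for _ in range(33)]  # stand-in for the 33 image variables
output = forward(features, network)
```

In this sketch the two output values play the role of the CP + and CP − prediction vector; how the study encodes that vector is not specified beyond the description above.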

The GA method was used to determine the most accurate MLP ANN architecture for the prediction of CP + and CP-. Initially, a random population with different MLP ANN architectures was built according to the aforementioned specifications. This population ranged from 100 to 1,000 individuals (i.e., each individual being a specific MLP ANN architecture). After the initial generation, the following generations were constructed considering 20 to 30% of the most accurate individuals of the previous population; 50 to 60% of the individuals were introduced by recombination (crossing over) of the individuals selected as the most accurate; 15% came from the migration of newly created MLP ANN architectures, and 5% came from mutation (i.e., previous architectures with random point modifications). Thus, the aim was to ensure that the populations (after the initial population) had constant variability and potential for the detection of the best individuals (more accurate MLP ANNs), a process called elitism60. As the stopping criterion for the GA method, 100 to 500 generations (i.e., epochs) were adopted (illustrative flowchart in Supplementary Text 6 and Supplementary Fig. 9). As an illustration, the complete process of applying the AI method, from digital processing to the MLP ANN and the GA, is shown in Supplementary Text 7 and Supplementary Fig. 10.
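
The generation scheme described above can be sketched as follows. This is a simplified illustration, not the MATLAB implementation used in the study: the fractions (25% elite, 55% recombination, 15% migration, 5% mutation) are one choice within the stated ranges, the crossover and mutation operators are assumptions, and `toy_fitness` is a hypothetical stand-in for the validation accuracy of a trained MLP ANN.

```python
import random


def random_architecture(rng):
    """Random individual within the stated ranges (1-3 layers, 20-500 neurons, 50-700 epochs)."""
    n_layers = rng.randint(1, 3)
    return {"layers": [rng.randint(20, 500) for _ in range(n_layers)],
            "epochs": rng.randint(50, 700)}


def crossover(a, b, rng):
    """Assumed operator: layer sizes from one parent, epoch count from the other."""
    first, second = (a, b) if rng.random() < 0.5 else (b, a)
    return {"layers": list(first["layers"]), "epochs": second["epochs"]}


def mutate(arch, rng):
    """Assumed operator: random point modification of one layer size."""
    child = {"layers": list(arch["layers"]), "epochs": arch["epochs"]}
    child["layers"][rng.randrange(len(child["layers"]))] = rng.randint(20, 500)
    return child


def next_generation(population, fitness, rng,
                    elite_frac=0.25, cross_frac=0.55, migr_frac=0.15):
    size = len(population)
    ranked = sorted(population, key=fitness, reverse=True)
    n_elite = round(elite_frac * size)
    n_cross = round(cross_frac * size)
    n_migr = round(migr_frac * size)
    n_mut = size - n_elite - n_cross - n_migr  # remaining share (5% here)
    elite = ranked[:n_elite]                   # most accurate individuals are kept
    children = [crossover(rng.choice(elite), rng.choice(elite), rng)
                for _ in range(n_cross)]
    migrants = [random_architecture(rng) for _ in range(n_migr)]
    mutants = [mutate(rng.choice(population), rng) for _ in range(n_mut)]
    return elite + children + migrants + mutants


def toy_fitness(arch):
    """Hypothetical fitness; in the study this would be the accuracy of the trained MLP ANN."""
    return -abs(sum(arch["layers"]) - 300)


rng = random.Random(42)
population = [random_architecture(rng) for _ in range(100)]
best_before = max(toy_fitness(a) for a in population)
for _ in range(10):  # the study used 100 to 500 generations
    population = next_generation(population, toy_fitness, rng)
best_after = max(toy_fitness(a) for a in population)
```

Because the most accurate individuals are copied unchanged into each new generation, the best fitness in the population can never decrease from one generation to the next.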

Multicentric routine clinical tests

The multicentric clinical tests were conducted as part of a prospective observational study carried out across three IVF centres of the Huntington Group (named A, B and C). There was no intervention applied to modify clinical or laboratory practice; instead, the study aimed to evaluate the performance of the MAIA algorithm in a real-world clinical setting, using embryo images and associated clinical outcomes obtained under routine care. At each centre, 3 previously trained embryologists were assigned to this test. Thus, a total of 9 embryologists participated in the evaluation. The tests were performed between October 2023 and August 2024, totalling 200 single embryo transfers.

The test was performed concurrently at the 3 centres, and the following data were recorded: the date of the single embryo transfer; the patient’s ID; the transferred embryo number; whether it was an elective case (more than one embryo available for transfer) or a nonelective case (only one embryo available for transfer); whether the embryo was biopsied (when biopsied, only euploid embryos were considered – PGT-A threshold < 30% aneuploidy61); the day of embryonic development (day 4, day 5 or day 6); the total number of embryos analysed; the numbers of the first to fifth embryos (in descending order of MAIA score and depending on the number of embryos available for each patient); the tiebreaker in the choice of the embryo to be transferred (choice based on MAIA, on the embryologist, or on agreement between both); observations on the result generated by the “show process” button, such as an incorrectly segmented blastocyst image; whether there was an early pregnancy (β-hCG positive or negative); and, if the result was positive, whether it progressed to a clinical pregnancy (i.e., the presence of a gestational sac and foetal heartbeat). The statistical analyses of the clinical trial data were subsequently performed separately and together for all 3 centres.