Introduction

Valvular heart disease (VHD) is an increasingly significant health issue in the United States, particularly as the population ages. Each year, over 5 million new VHD diagnoses are made, and deaths due to VHD are rising by 2.8% annually1,2. VHD encompasses a range of disorders that lead to valvular dysfunction, including calcific aortic valve disease, degenerative mitral valve disease, and both primary and secondary tricuspid valve disease. Progression of VHD carries a worsening prognosis, with the development of heart failure, sudden cardiac arrest, and heightened overall mortality3.

Historically, advances in diagnostic techniques, notably echocardiography, have improved patient outcomes. However, current methods for classifying VHD severity depend largely on expert-driven heuristics derived from clinical studies that assessed the incremental value of single variables for defining VHD severity and stratifying risk. Recent studies highlight variations in VHD care under these conventional approaches, which may limit clinicians’ ability to recognize and refer complex VHD phenotypes in a timely manner, resulting in significant undertreatment of VHD4,5.

With improved capabilities to analyze multiple factors from echocardiographic data, there is a need for enhanced classification tools. Unsupervised machine learning (ML) offers promise here, enabling more personalized approaches by integrating diverse clinical, imaging, and laboratory variables. Research is underway to apply ML methods to redefine patient phenotypes and uncover new treatment options, moving beyond traditional classification systems6,7,8,9,10,11,12,13. This review explores current and future applications of unsupervised ML in the diagnosis and prognosis of VHD, focusing specifically on aortic stenosis, mitral regurgitation, and tricuspid regurgitation.

Results

Aortic Stenosis

Aortic stenosis (AS), the most prevalent type of VHD, accounts for one third of VHD cases in Europe14. AS was estimated to have caused 127,000 deaths globally in 2019 and a loss of 1.8 million disability-adjusted life years3,15. Globally, rheumatic heart disease is the predominant etiology of AS, whereas in developed regions such as Europe and North America, calcific AS is more common and is linked to aging and chronic cardiovascular conditions3. Accurate assessment of AS severity is vital for monitoring disease progression and for optimally timing aortic valve intervention to improve patient outcomes16. ML provides a more comprehensive risk assessment by integrating ventricular remodeling and phenotypic data with the traditional taxonomy determined by echocardiographic measures, such as aortic valve area and transvalvular gradients1,17,18,19,20. Incorporating this additional complexity helps us better understand the continuous nature of AS4.

Predicting disease severity

Algorithms can identify high-risk AS phenotypes by incorporating measures of ventricular involvement and other potentially important variables (e.g., pulmonary vascular function), with the aim of improving mortality and morbidity after aortic valve replacement (AVR). Casaclang-Verzosa et al. used topological data analysis (TDA) on echocardiographic data from 284 patients with mild to severe AS to define a Reeb graph12. The model showed that progression from moderate to severe AS does not follow a single consistent pattern; instead, it derived multiple groups based on left ventricular (LV) function. Moderate AS patients with low LV ejection fraction (LVEF) and higher LV mass progressed more quickly to severe AS than those with high LVEF and low LV mass, even when they had a higher incidence of coronary artery disease. A validation mouse model confirmed the impact of LV mass and LVEF on AS progression.

Sengupta et al. explored a similar question, applying TDA to an expanded dataset of 1052 patients with AS of varying severity: mild, moderate, severe, and discordant13. Five key echo variables were used as input: aortic valve (AV) area indexed to body surface area (BSA), LVEF, AV mean gradient, stroke volume (SV) indexed to BSA, and AV peak velocity. The algorithm classified 99% of severe AS patients as such. Notably, in the discordant group, TDA identified 64% as high severity, suggesting potential gaps in the current diagnostic system. The model’s groupings could be reproduced by a supervised classifier, which achieved an area under the curve (AUC) of 0.988 on the new classes. The high-severity group identified by this model showed a 15-fold increase in progression to transcatheter AVR (TAVR) and a 3-fold increase in mortality, a trend that held even among patients conventionally classified as non-severe or discordant. This study expanded on the work of Casaclang-Verzosa et al. by showing how incorporating new parameters can identify previously discordant patients at high risk of rapid disease progression.

Aside from echo parameters, Kwak et al. emphasized the role of clinical variables in classifying patients with moderate to severe AS (n = 398). Using Pearson coefficients and Bayesian criteria, they reduced 32 variables to 11, comprising six clinical and five imaging parameters10. Three distinct patient clusters were recognized in a validation set using agglomerative clustering and Gaussian finite mixture models. Cluster 1, the least healthy, was defined by reduced LV systolic function; Cluster 2 was elderly with more comorbidities; and Cluster 3 was considered “Healthy AS.” When these cluster assignments were integrated into a 3-year mortality prediction model, a significant net reclassification improvement of 0.294 (P = 0.032) was observed. Unlike the previous papers, Kwak et al. included clinical parameters to improve mortality prediction. Like Casaclang-Verzosa et al. and Sengupta et al., they also reemphasized the importance of monitoring cardiac dysfunction when assessing disease severity.
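The reclassification statistic quoted above (commonly called the net reclassification improvement, NRI) can be made concrete with a short calculation. The following is a minimal Python sketch of the categorical form of the NRI: the proportion of patients with an event who move to a higher risk category minus those who move lower, plus the proportion of patients without an event who move lower minus those who move higher. The function name, the risk categories, and the toy data are all hypothetical, and the form of NRI used by Kwak et al. (categorical or continuous) is not specified in this review.

```python
from typing import Sequence


def net_reclassification_improvement(
    old_cat: Sequence[int],
    new_cat: Sequence[int],
    event: Sequence[bool],
) -> float:
    """Categorical NRI of a new risk model relative to an old one.

    old_cat / new_cat: ordinal risk category per patient (higher = riskier).
    event: True if the patient experienced the outcome (e.g., death).
    """
    if not (len(old_cat) == len(new_cat) == len(event)):
        raise ValueError("inputs must have the same length")

    up_event = down_event = n_event = 0
    up_nonevent = down_nonevent = n_nonevent = 0
    for old, new, had_event in zip(old_cat, new_cat, event):
        if had_event:
            n_event += 1
            up_event += new > old      # correctly moved to higher risk
            down_event += new < old    # wrongly moved to lower risk
        else:
            n_nonevent += 1
            up_nonevent += new > old   # wrongly moved to higher risk
            down_nonevent += new < old  # correctly moved to lower risk

    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one non-event")

    nri_events = (up_event - down_event) / n_event
    nri_nonevents = (down_nonevent - up_nonevent) / n_nonevent
    return nri_events + nri_nonevents


# Toy data invented for illustration only (not taken from any cited study).
old = [0, 0, 1, 1, 0, 1, 1, 0]
new = [1, 0, 1, 2, 0, 0, 1, 0]
evt = [True, True, True, True, False, False, False, False]
nri = net_reclassification_improvement(old, new, evt)
print(f"NRI = {nri:.3f}")  # NRI = 0.750
```

In this toy example, two of four patients with an event move up a category (0.5) and one of four patients without an event moves down (0.25), giving an NRI of 0.75. A positive value indicates that the added information, here the cluster assignment, moves patients in the right direction on balance.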

Sen et al. analyzed a large cohort of moderate AS patients (n = 2469) and evaluated five different clustering algorithms across 54 demographic, clinical, and echo parameters21. They compared the algorithms using the Davies-Bouldin index, which scores a clustering by the compactness of each cluster and the separation between clusters, and it favored partitioning around medoids (PAM). PAM produced four clusters (low-risk, calcified-valve, low-flow, and cardiovascular-comorbid), which were externally validated in an independent Australian registry (n = 1358). The primary outcome was a composite of cardiac death, heart failure (HF) hospitalization, or AVR. Only two groups showed a significantly higher rate of the composite outcome than the low-risk group: the cardiovascular-comorbid group (hazard ratio [HR] 1.60, 95% CI [1.18–2.16]) and the low-flow group (HR 1.50, 95% CI [1.15–1.96]). In a secondary analysis, early AVR (≤1 year) most benefited the calcified-valve and cardiovascular-comorbid phenotypes, highlighting how integrating comorbidity with valve and ventricular metrics refines risk stratification and guides intervention timing in moderate AS.
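To illustrate how a compactness-and-separation score of this kind ranks candidate clusterings, the sketch below computes the Davies-Bouldin index in plain Python. It uses the common centroid-based definition (for each cluster, the worst ratio of summed within-cluster scatter to between-centroid distance, averaged over clusters; lower is better). This is an illustrative assumption, not a reproduction of the pipeline of Sen et al.: the function names and the four toy points are hypothetical.

```python
from math import dist
from typing import Dict, List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def davies_bouldin(points: List[Point], labels: List[int]) -> float:
    """Davies-Bouldin index of a labelled clustering (lower = better)."""
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    centers = {lab: centroid(pts) for lab, pts in clusters.items()}
    # Scatter: mean distance of a cluster's members to its centroid.
    scatter = {
        lab: sum(dist(p, centers[lab]) for p in pts) / len(pts)
        for lab, pts in clusters.items()
    }

    total = 0.0
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster.
        # Note: coincident centroids would cause a division by zero.
        total += max(
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy 2-D data invented for illustration: two tight, well-separated pairs.
pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = [0, 0, 1, 1]  # groups the nearby points together
bad = [0, 1, 0, 1]   # groups the distant points together
print(davies_bouldin(pts, good), davies_bouldin(pts, bad))
```

The sensible grouping scores far lower than the poor one, which is the property that lets the index be used to choose among clustering algorithms or cluster counts when no ground-truth labels exist.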

Lachmann et al. proposed a different approach to data interpretation that further reduces human intervention8. They applied a pre-trained convolutional neural network (CNN) to 366 patients with severe AS. The CNN had been pre-trained on the large, unrelated ImageNet dataset and was then used to compress echo Doppler recordings into a one-dimensional vector representation. K-means clustering was applied to the 112 patients with high-quality Doppler data. Using an artificial neural network (ANN), the identified clusters could then be assigned to the remaining 265 patients with suboptimal Doppler data, with a prediction accuracy of 97.5%. By embracing transfer learning, this study enabled the algorithm to capture relevant inputs that might be overlooked in human-driven data simplification, offering a way to leverage limited medical data effectively.

These studies expand the set of variables used to characterize AS in order to improve diagnostic accuracy and understand disease progression. The first three articles all emphasize measures of LV dysfunction, combined with differing sets of other echo and clinical variables.

Assessing recovery post-AV intervention

The next question is which patients will respond to therapy, so that treatment can be provided early, before the disease progresses. In AS, the primary intervention is aortic valve replacement (AVR), either surgical or transcatheter. Lachmann et al. used agglomerative clustering on 366 severe AS patients who underwent transcatheter AVR (TAVR), with 12 combined echocardiographic and right heart (RH) catheterization variables22. The patients had statistically similar demographic characteristics, including age, gender, and BMI, yet showed different outcomes post-TAVR. Notably, the healthiest cluster had preserved LVEF and no pulmonary hypertension, whereas the most vulnerable cluster demonstrated both reduced LVEF and pulmonary hypertension; 2-year survival post-TAVR was 90.6% and 74.9%, respectively. In the same group of patients, Lachmann et al. examined the recovery of different echo parameters after TAVR7. They observed that TAVR improved left heart (LH) abnormalities, whereas RH function showed little improvement. This explained why their initial algorithm identified patients with RH problems as the highest-risk group.

Bohbot et al. also explored post-AVR survival in severe AS using a multicenter study design. Their cohort included a training group (n = 613) and a validation group (n = 1303), with data collected pre-AVR and survival followed for 5 years after AVR11. They applied hierarchical clustering to define clusters and K-nearest neighbors (KNN) to assign clusters in the validation set. The algorithm proposed four clusters that represented a continuous spectrum of disease progression. They defined this spectrum through myocardial changes causing systolic and diastolic dysfunction, which begin in the LH and extend to the RH. This further re-emphasizes that ventricular remodeling, and RH function in particular, matters more to post-AVR recovery than valvular afterload. This “myocardial continuum” concept highlighted the disease’s progression and its effect on outcomes: it tracked with 5-year all-cause mortality, with a hazard ratio (HR) of 2.18 (1.46–3.26) when comparing the cluster with RH involvement (i.e., the least healthy cluster) to the healthiest phenotype.
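The assignment step described here, in which clusters derived in a training cohort are carried over to new patients with KNN, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors’ implementation: the two features are imagined to be already standardized, the labels 0 and 1 stand in for a “healthier” and a “less healthy” cluster, and all names and values are hypothetical.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

Labelled = Tuple[Sequence[float], int]  # (feature vector, cluster label)


def knn_assign(train: List[Labelled], patient: Sequence[float], k: int = 3) -> int:
    """Assign a new patient to the majority cluster of the k nearest
    training patients (Euclidean distance on standardized features)."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the training set size")
    nearest = sorted(train, key=lambda item: dist(item[0], patient))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tied vote, Counter keeps insertion order, so the label of the
    # closest tied neighbor is returned.
    return votes.most_common(1)[0][0]


# Hypothetical training cohort: (LVEF z-score, RV function z-score) -> cluster.
train = [
    ((1.0, 1.0), 0), ((0.8, 1.2), 0), ((1.1, 0.9), 0),
    ((-1.0, -1.2), 1), ((-0.9, -0.8), 1), ((-1.2, -1.0), 1),
]

print(knn_assign(train, (0.9, 1.0)))    # 0
print(knn_assign(train, (-1.0, -1.0)))  # 1
```

Because the validation patients are never re-clustered, any difference in outcomes between the assigned groups tests whether the phenotypes found in the training cohort carry over, rather than whether a fresh clustering happens to find similar structure.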

Kusunose et al. looked at a similar cohort of 1365 severe AS patients undergoing TAVR across 17 centers and identified 3 clusters. Cluster 1 was associated with elderly patients with high aortic valve gradients and LV hypertrophy, indicating more advanced disease stages and remodeling. Cluster 2, in contrast, consisted of patients with preserved LVEF, larger AVA, and higher blood pressure exhibiting better post-TAVR outcomes. Cluster 3 represented the most vulnerable phenotype with low-flow/low-gradient AS, tachycardia, and inferior vena cava (IVC) dilation, demonstrating the highest mortality and major adverse cardiovascular events with an HR of 4.18 (1.76–9.94). This study put more importance on LH findings than the previous two studies but still included measures of RH impact, such as increased IVC volume.

These studies show that once ventricular dysfunction occurs, outcomes worsen despite intervention. To improve outcomes, we must develop algorithms that identify high-risk patients early on the remodeling continuum, particularly before RH remodeling occurs. This will give patients the best chance of recovery after valve intervention (Table 1) (Fig. 1).

Fig. 1: Unsupervised ML and Aortic valve stenosis.

By incorporating multiple clinical and echocardiographic parameters into ML models, high-risk phenotypes were identified, characterized by features such as low LV ejection fraction, LV mass, increased age, the low-flow, low-gradient variant, and a high burden of comorbidities, as well as by characteristics of echo Doppler data. Biventricular remodeling presenting as pulmonary hypertension, IVC dilatation, and reduced LV systolic function was associated with poor outcomes post-TAVR.

Table 1 Overview of algorithms applied to Aortic valve stenosis

Mitral regurgitation

The mitral valve (MV) requires synchronized coordination of multiple components to function properly23. Disruption of this framework can introduce abnormalities in MV function, leading to mitral regurgitation (MR), which has an estimated prevalence close to 10% in patients older than 75 years in the United States24. Without proper intervention, MR can set off a vicious cycle leading to heart failure and eventual death, with 1-year mortality approaching 60% (ref. 25). MR is broadly categorized as primary MR, caused by structural abnormalities of the mitral valve apparatus, or secondary MR, which results from left ventricular dysfunction without intrinsic valve disease. The advent of percutaneous intervention options has significantly improved the prognosis of both primary and secondary MR26.

Primary mitral valve regurgitation and optimal timing of mitral valve surgery

In primary MR, several studies have used unsupervised ML to refine surgical risk stratification beyond conventional guideline metrics. The optimal timing of MV surgery is poorly understood because of the heterogeneous nature of MR pathology27. Current guidelines have an oversimplified approach that limits analysis to a few clinical or echocardiographic parameters28. ML algorithms can incorporate additional parameters compared to contemporary guidelines to provide an improved disease understanding29.

Huttin et al. developed an ML model that grouped 429 patients with mitral valve prolapse (MVP) and no history of cardiac surgery into four homogeneous echocardiography-based clusters, in order to assess their progression to myocardial fibrosis on cardiac MRI and to predict adverse events30. The three most important variables for predicting cardiovascular events were LV systolic strain <21%, indexed left atrial (LA) volume <42 ml/m2, and severity of MR. Clusters 3 and 4 had the highest burden of myocardial fibrosis, with LV dysfunction and LA remodeling. The main difference between them was that cluster 4 had a global longitudinal strain (GLS) below 21%. The authors emphasized that low GLS predicted adverse events better than LVEF. Similarly, Bernard et al. combined unsupervised and supervised ML algorithms in 400 patients with primary MR to define high-risk and low-risk phenotypes and to determine which patients may benefit from surgical intervention31. The most significant factors affecting MR severity identified by their algorithm were LV end-diastolic volume, LV end-systolic volume, E/e’, MR volume by the proximal isovelocity surface area (PISA) method, and interventricular septal diameter, all of which were elevated in the high-risk group. The high-severity phenotypes had improved survival following surgical intervention (P = 0.047), while the low-severity phenotypes showed no improvement (P = 0.7). These findings were not significant in the external validation cohort (P = 0.20 and P = 0.5), although its sample size was smaller.

Risk stratification and the optimal timing of MR intervention are challenging because of the heterogeneity of the disease. Symptoms at presentation do not consistently correlate with MR severity or surgical outcomes. Current guidelines focus on a limited set of echocardiographic markers, such as LV end-systolic diameter and LVEF, and do not account for LV or LA remodeling. The lack of response to surgery in some patient groups could be linked to an incomplete understanding of the underlying cardiac pathology. ML algorithms offer the potential to refine patient selection, identify those likely to benefit from surgery, and support clinical decision-making by flagging high-risk cases.

Pimor et al. applied phenomapping to 122 patients with primary MR, creating clusters to identify factors affecting postoperative atrial fibrillation (AF) and complications following MV surgery32. Of the three phenogroups identified, phenogroup 3 had a high incidence of postoperative AF (HR = 4.75, P < 0.001) and cardiac events (HR = 3.57, P < 0.001). Phenogroup 3 was defined by echocardiographic changes such as increased LA volume, increased E/e’ ratio, and abnormal LA peak systolic strain. Abnormal LA peak systolic strain reflects LA dilation, which could explain the higher incidence of atrial fibrillation. Like Huttin et al. and Bernard et al., Pimor et al. found LA peak systolic strain and LV diastolic dysfunction to be better predictors of adverse events following MV surgery than LVEF. LVEF may remain normal despite subclinical myocardial dysfunction and may decrease only once myocardial damage is irreversible, contributing to a higher incidence of postoperative AF. By combining multiple echocardiographic parameters with ML algorithms, clinicians could intervene selectively and earlier, before irreversible changes develop.

Choi et al. applied TDA on 850 patients with moderate to severe primary MR, identifying three distinct phenotypes33. Group B consisted of older patients with diastolic dysfunction, while Group C included those with both significant systolic and diastolic dysfunction alongside advanced LV remodeling. Group C showed higher long-term mortality (P < 0.001). The origin of diastolic dysfunction remains unclear, potentially stemming from other comorbidities, chronic MR, or a combination. They highlighted that diastolic dysfunction may be as crucial as LV dilation in the progression of primary MR. Still, there is a need for further research to optimize the timing of MV interventions.

Secondary Mitral valve regurgitation and response to intervention

Secondary MR is caused by LV dysfunction in the absence of any structural valve defect. Multiple mechanisms, depending on the degree and proportion of enlargement of the left-sided cardiac chambers, contribute to the severity of MR. The natural evolution of secondary MR is poorly understood. ML algorithms have the potential to identify patients who will benefit from percutaneous MV intervention.

Bartko et al. used hierarchical clustering and principal component analysis (PCA) to delineate morphological features in 381 patients with MR secondary to congestive heart failure with reduced ejection fraction (HFrEF)34. Clusters 1 and 2 comprised patients with small and minimally increased LA/LV size, respectively. Cluster 3 corresponded to discordant remodeling with a small LA and a large LV, and Cluster 4 to uniformly increased LA/LV size. Severe MR was more prevalent in Clusters 3 and 4. They found that atrial indices disproportionate to LV indices greatly affected mortality, as seen in Cluster 3 compared with the other clusters (HR = 2.18, P = 0.002). The benefit of valve intervention likely depends on the severity of MR, the underlying myocardial impairment, and the interaction of these indices, so appropriate phenotyping of patients is crucial.

Layoun et al. applied hierarchical clustering to 257 patients to evaluate the natural progression of secondary MR, identifying two clusters35. LA volume, LV end-diastolic volume (LVEDV), LV end-systolic volume, EF, RV size and function, systolic pulmonary artery pressure, left atrial strain, and LV volume strain were used to derive the clusters. Cluster 1 had lower EF and LA strain and higher LV and LA volumes than Cluster 2, and progressed more slowly to severe MR. At the onset of severe MR, Cluster 2 had similar LA volume and strain and less proportionate secondary MR. Despite these differences, mortality did not differ significantly between the clusters. Cluster 2 had a higher incidence of MV intervention (P = 0.046), while Cluster 1 received more ventricular-directed therapies (P = 0.01, P = 0.08). Both clusters showed similar cardiomyopathy distributions, suggesting that etiology did not influence the progression pattern. This indicates that the mechanism, rather than the origin, drives the development of secondary MR patterns. The primary difference between the clusters was MR severity relative to ventricular volume, aligning with the concept of proportionate versus disproportionate MR. Cluster 2, characterized by rapid progression, had smaller ventricles, higher ejection fraction, and a significantly higher ratio of effective regurgitant orifice area (EROA) to LVEDV (0.3 vs. 0.2 mm/ml, P < 0.001), indicating disproportionate MR and potentially explaining the higher rate of MV interventions. In contrast, Cluster 1 exhibited slower progression with proportionate MR, responding better to ventricular-based therapies.

Trenkwalder et al. combined hierarchical clustering and artificial neural networks (ANN) to uncover distinct phenotypes among 609 patients with severe MR undergoing transcatheter edge-to-edge repair (TEER), and the findings were externally validated in 817 patients from 2 separate centers36. Their clustering drew on underlying comorbidities and eight echocardiographic variables: left ventricular end-systolic diameter (LVESD), right mid-ventricular diameter, left atrial volume, right atrial area, tricuspid annular plane systolic excursion (TAPSE), MV effective regurgitant orifice area (EROA), LVEF, and pulmonary artery pressure (PAP). They isolated 4 clusters, characterized by isolated mitral valve disease (cluster 1), developing pulmonary hypertension (cluster 2), biventricular failure with functional impairment (cluster 3), and biatrial dilation (cluster 4). MV TEER significantly reduced PAP and improved survival in cluster 1 but did not improve outcomes in cluster 4 because of significant diastolic dysfunction. Similar clustering and survival outcomes were observed in the external validation cohorts. Unlike Bartko et al. and Layoun et al., Trenkwalder et al. showed that secondary MR requires a broader perspective that goes beyond lesion-based mechanisms and incorporates other aspects of cardiopulmonary function as well as underlying comorbidities. The high mortality seen in clusters 3 and 4 was likely exacerbated by worsening renal function and anemia. Current risk scores such as EuroSCORE II perform suboptimally in predicting long-term mortality, as seen in cluster 4. Future ML pipelines can improve risk stratification by analyzing the intricate relationships among echocardiographic variables, underlying comorbidities, and MV structure and function (Table 2) (Fig. 2).

Fig. 2: Unsupervised ML and Mitral valve regurgitation.

Multi-component unsupervised machine learning algorithms have been applied to patients with primary and secondary mitral valve regurgitation (MR). In primary MR, in addition to the severity of regurgitation, left atrial size and strain and the presence of LV diastolic dysfunction identified patients progressing to myocardial fibrosis and LV systolic dysfunction and predicted worse outcomes. In patients with secondary MR, ML algorithms have identified patients who would benefit from transcatheter treatment.

Table 2 Overview of algorithms applied to Mitral valve regurgitation

Tricuspid Regurgitation

Tricuspid regurgitation (TR) is becoming more prevalent as the population ages, with nearly 3% of individuals over 65 diagnosed with TR37. Currently, TR is categorized into four distinct types: “primary,” which results from a dysfunctional tricuspid valve; “ventricular functional,” arising from an enlarged right ventricle (RV) due to left-sided heart failure or RV disease; “atrial functional,” stemming from right atrial dilation; and “cardiac implantable electronic device-related”38,39. There is growing interest in refining patient stratification because early intervention appears most beneficial, especially with the increasing use of transcatheter tricuspid valve (TV) treatments, and because untreated TR is correlated with a higher risk of mortality40,41. As more data on tricuspid valve therapies emerge, a better understanding of TR pathogenesis through ML models is critical.

In a retrospective study, Anand et al. examined 13,611 patients with moderate to severe TR, regardless of etiology, followed over 6.5 years42. Applying a hierarchical clustering algorithm to clinical, echocardiographic, and laboratory data (including liver function enzymes, creatinine, and BNP), they condensed 38 variables to 26 and derived five patient pheno-clusters. Pheno-clusters 1 and 2 were mainly related to primary causes of TR, with cluster 1 patients showing better survival owing to less RV enlargement and fewer comorbidities. This cluster included 8.8% of severe TR cases, revealing a potential misclassification. Cluster 2, characterized by significant RV enlargement, had an HR of 2.22. Clusters 3 to 5 were characterized by secondary TR associated with pulmonary disease, coronary artery disease, and chronic kidney disease, respectively, with corresponding hazard ratios of 2.45, 2.19, and 3.48. While clusters 3 to 5 had alternative causes of mortality, clusters 1 and 2 showed the importance of RV enlargement for understanding TR pathophysiology and its impact on mortality.

Vely et al. focused on secondary TR to understand how RH involvement affects disease severity43. They studied patients with TR of severity grade ≥3, classified as severe (grade 3), massive (grade 4), or torrential (grade 5)44. They trained a hierarchical clustering algorithm on 92 patients, using 23 echo parameters as inputs, which identified three clusters based on right atrium (RA) and RV function. Cluster 1 consisted of patients with normal RV function despite high TR severity, including 32.1% with grade 5 TR. In contrast, clusters 2 and 3 included patients with progressively impaired RV function and combined RA and RV dilation, respectively. Notably, no grade 3 TR patients were classified into the impaired RV or RA/RV function clusters, indicating a potential overestimation of TR severity in the new TR gradation scheme. Although mortality was 39.6% in Cluster 1, 40.0% in Cluster 2, and 71.4% in Cluster 3, these differences did not reach statistical significance because of the small cohort size. A 149-patient validation cohort confirmed the three clusters and their association with RA and RV dysfunction. The importance of RH dysfunction aligned with the outcomes in Anand et al.42.

Building on these findings, Badano et al. applied a K-means clustering algorithm to a larger cohort of 558 patients with moderate to severe secondary TR, incorporating both demographic and echocardiographic data45. They identified three distinct clusters, differentiated by the degree of RV dysfunction, with significantly different 2-year event-free survival. The low-risk group had preserved RV size and function; the intermediate-risk group included older patients with dilated RVs but preserved EF (HR 2.20; 95% CI, 1.44–3.37); and the high-risk group consisted of younger patients with dilated, dysfunctional RVs (HR 4.67; 95% CI, 3.20–6.82). These findings reinforce the prognostic value of RH remodeling and complement Vely et al.’s earlier observations.
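For readers less familiar with K-means, the following minimal Python sketch shows the two alternating steps of the standard (Lloyd) algorithm: assign each patient to the nearest cluster center, then move each center to the mean of its members. It is an illustration only, not the configuration used by Badano et al. The starting centers are passed in explicitly to keep the example deterministic (practical implementations usually choose them at random and repeat several times), and the two standardized features and all values are hypothetical.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Sequence[float]


def k_means(
    points: List[Point],
    init_centers: List[Point],
    max_iter: int = 100,
) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm; returns (label per point, final centers)."""
    centers = [list(c) for c in init_centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        new_labels = [
            min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical standardized features: (RV size z-score, RV function z-score).
patients = [
    (-1.0, 1.0), (-0.8, 1.2), (-1.2, 0.9),  # small RV, good function
    (1.0, -1.0), (1.2, -0.8), (0.9, -1.1),  # dilated RV, poor function
]
labels, centers = k_means(patients, init_centers=[patients[0], patients[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The number of clusters is fixed in advance by the number of starting centers, which is why studies of this kind typically pair K-means with an internal validity score or a clinical rationale for the chosen cluster count.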

However, unsupervised ML cannot always discern meaningful relationships. In their retrospective study of 2379 patients with a first-time diagnosis of severe TR, Rao et al. explored both unsupervised and supervised machine learning techniques9. Unlike the studies above, their unsupervised approaches failed to identify distinct clusters, so they relied instead on a supervised survival tree method. This approach used 38 echo and clinical input variables and was trained on death and heart failure hospitalization (HFH) outcomes to find four pheno-clusters. Cluster 1, characterized by compromised RV/LV function, had a 91.7% (82.3, 93.6) event rate. Cluster 2, with advanced kidney/liver conditions, had an 84.0% (81.0, 86.5) event rate. Cluster 3, composed primarily of female patients with LV/RV dysfunction, showed a 67.7% (61.3, 73.3) event rate. Cluster 4, the healthiest cluster, had preserved RV function and a 44.4% (37.8, 50.8) event rate. Notably, liver and kidney function, along with LV/RV dysfunction, are crucial determinants of outcomes in TR (Table 3) (Fig. 3).

Fig. 3: Unsupervised ML and Tricuspid valve regurgitation.

Unsupervised machine learning models in patients with severe tricuspid regurgitation identified a continuum of cardiac involvement with the progression of disease, leading to higher mortality. Multiple factors, including RV enlargement, pulmonary disease, liver impairment, and CKD, defined the high-risk phenotype.

Table 3 Overview of Algorithms Applied to Tricuspid Valve regurgitation

Discussion

Although unsupervised ML provides broad opportunities for expanding our insight within cardiology, it is not without limitations46. Unlike supervised learning, unsupervised algorithms decipher patterns in data without labeled input. From a practical standpoint, they are better suited to generating hypotheses than to proving them. Thus, each algorithm’s results must be carefully assessed for clinical relevance and utility, which demands greater clinician involvement to comprehend and evaluate the output46.

The principles underlying clustering and data presentation can also be tricky in unsupervised ML. Practical issues compound the interpretability problem: distance-based models weigh variables by their numerical magnitude, so the choice of units (e.g., centimeters vs. meters) can lead to vastly different results. Centering and standardizing the variables is therefore required to improve model function and uncover better correlations between variables47. Processing is also computationally expensive compared with other algorithms, because exploring many possible partitions without labels increases run-time and inflates the risk of false discoveries. Reducing that risk requires larger, better-curated datasets or continuous data “feeding,” resources that remain out of reach for many smaller institutions48.
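The effect of units on distance-based clustering, and the way standardization removes it, can be shown with a small Python sketch. The example below is illustrative only: the four “patients,” their LV diameters and LVEF values, and the function name are invented, and z-scoring (subtracting each variable’s mean and dividing by its standard deviation) is used as one common way of centering and standardizing.

```python
from math import dist
from statistics import mean, pstdev
from typing import List, Sequence


def standardize(rows: List[Sequence[float]]) -> List[List[float]]:
    """Z-score each column: subtract its mean, divide by its std. dev."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    if any(sd == 0 for sd in sds):
        raise ValueError("a constant column cannot be standardized")
    return [[(x - mu) / sd for x, mu, sd in zip(row, mus, sds)] for row in rows]


# Hypothetical patients: (LV diameter, LVEF in %). Values are invented.
in_cm = [(4.5, 60.0), (6.5, 58.0), (4.6, 35.0), (5.5, 50.0)]
in_m = [(d / 100.0, ef) for d, ef in in_cm]  # same patients, diameter in meters

# Raw distance between patients 0 and 1 depends on the unit chosen:
# in meters the diameter difference all but vanishes next to LVEF.
print(dist(in_cm[0], in_cm[1]), dist(in_m[0], in_m[1]))

# After standardization the two unit choices give the same geometry.
z_cm, z_m = standardize(in_cm), standardize(in_m)
print(dist(z_cm[0], z_cm[1]), dist(z_m[0], z_m[1]))
```

Without standardization, whichever variable happens to have the largest numerical range dominates every distance, so a clustering could change simply because a measurement was recorded in different units. After z-scoring, each variable contributes on a comparable scale.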

Data sharing between centers can be complex because it requires multiple institutional review board approvals and continued HIPAA compliance49. Furthermore, cardiovascular imaging information is stored across multiple systems and formats, including picture archiving and communication systems (PACS), Digital Imaging and Communications in Medicine (DICOM), and other imaging formats. Each institution has its own protocols for storing information from the various cardiovascular imaging modalities. More universal or standardized approaches to sharing and storing information46 can foster greater growth and progress in unsupervised learning.

To accelerate progress in the field despite these obstacles, we propose a standardized methodology for the future development of unsupervised ML models in VHD. Models should be developed on heterogeneous multicenter datasets so that they generalize. After training, external validation cohorts should be analyzed using supervised learning that predicts the previously identified clusters in a new dataset. Finally, randomized controlled trials (RCTs) should be organized to test outcomes against current state-of-the-art heuristics (Fig. 4). Because ML models improve as data accumulate, future stratification systems should be built for continual updating as new images, laboratory results, and outcomes enter the registry.

Fig. 4: Schematic of the appropriate workflow for future studies in VHD to develop unsupervised learning algorithms.

Schematic of the appropriate workflow for future studies in valvular heart disease, starting with unsupervised machine learning, followed by internal validation as the second step, then supervised machine learning, and finally randomized controlled trials.

Many stratification tools explored in this paper have been trained on small, homogeneous datasets. To establish a holistic stratification system, we must apply these techniques to larger datasets that represent our diverse population, aiming for a system capable of risk stratification in the general population. VHD is more prevalent among individuals from lower socioeconomic backgrounds40. Our goal should be an inclusive development trajectory for the stratification system that ensures equitable representation. Studies indicate discrepancies in prevalence between medical records and echocardiographic data for different populations41,50.

Our review found no RCTs that analyzed the benefits of these algorithms. As improved ML approaches are applied to larger datasets, we need to use RCTs to provide a better understanding of the efficacy of these ML-derived groupings. Many studies referenced in our paper leveraged retrospective datasets for algorithm training, and the transition to prospective study populations will showcase how these systems compare to existing heuristics.

Generative artificial intelligence (AI) has opened another option for exploring complex disease stratification. Generative AI models are trained on sequential datasets (most frequently language) and are reinforced by predicting the next data point (i.e., word) in a sequence. Given the immense amount of data they are trained on, they can process imaging, clinical, and laboratory data and integrate them with information from their training data. Applications in the medical field have been limited by a tendency to confabulate and by limited interpretability compared with the unsupervised models discussed in this paper. That said, recent advancements in VHD, such as EchoCLIP, have shown promising potential51. We anticipate that such models will continue to improve and increasingly support risk stratification across a range of diseases52,53.

Currently, the primary treatment for VHD is valve replacement. However, significant advancements are being made in pharmacotherapy and earlier transcatheter interventions. For example, in MV disease, blocking angiotensin II receptors may limit aortic dilation and the progression of MV prolapse54. Additionally, serotonin receptor signaling is a target for secondary MR, among others55. Other advances are being made to engineer heart valves that can grow and remodel with the human heart, enabling earlier interventions not constrained by valve life expectancy56,57,58. ML-driven phenotyping has proven useful for disease stratification and for identifying patients who may benefit from early intervention. As more treatment options become available, unsupervised ML has the potential to interpret and optimize complex therapeutic pathways tailored to each patient’s unique profile. Integrating these ML algorithms with molecular insights from multiomics offers an enhanced understanding of VHD and could accelerate the discovery of pharmacotherapy targets in the early stages of disease progression, leading to personalized, preventative care strategies.

In summary, VHD remains a common cause of mortality and morbidity. Unsupervised ML represents a potential advancement in the management of VHD by offering new, individualized avenues for patient stratification and early intervention, and its value should grow as new data become available. By analyzing diverse data inputs such as clinical, imaging, and laboratory parameters, these algorithms move beyond traditional human-driven heuristics to provide a more comprehensive understanding of disease progression. They can identify high-risk patients earlier, enabling timely and targeted therapies that could prevent irreversible cardiac damage before symptoms begin. Because ML can scale with the availability of additional data, it will also be able to incorporate future therapeutic advancements, allowing for more personalized treatment plans. To maximize clinical impact, algorithms must be trained and validated on large, multicenter cohorts and tested prospectively against current standards. Continuous refinement of these models will improve predictive accuracy, driving a shift toward precision medicine in VHD management and offering a promising future for improved patient outcomes.

Method

This scoping review was compiled in accordance with the PRISMA Scoping Reviews checklist (Supplementary Table and Supplementary Fig.)59. It is intended to map current knowledge and identify gaps in the use of unsupervised ML for management of VHD to lay the groundwork for future multicenter prospective trials.

A comprehensive PubMed search was conducted from database inception through May 2025. Three separate MeSH strings were designed for the valves of interest:

  1. Aortic stenosis: (Aortic Stenosis) AND (Cluster OR Network OR TPD OR Unsupervised Learning) AND (Echocardiography OR heart catheterization OR Computed Tomography). A total of 9/262 articles were identified that met the criteria for inclusion in the review.

  2. Mitral regurgitation: (Mitral Regurgitation) AND (Cluster OR Network OR TPD OR Unsupervised Learning) AND (Echocardiography OR heart catheterization OR Computed Tomography). A total of 7/187 articles were identified that met the criteria for inclusion in the review.

  3. Tricuspid regurgitation: (Tricuspid regurgitation) AND (Cluster OR Network OR TPD OR Unsupervised Learning) AND (Echocardiography OR heart catheterization OR Computed Tomography). A total of 5/95 articles were identified that met the criteria for inclusion in the review.
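The three search strings share the same method and modality clauses and differ only in the condition term, so they can be assembled programmatically. The sketch below is illustrative only: the helper name `build_query` and the `screening` dictionary are hypothetical, the counts are copied from the list above, and the pooled totals assume that the three searches returned no overlapping records, which the text does not state.

```python
# Shared clauses, copied verbatim from the search strings above.
METHODS = "(Cluster OR Network OR TPD OR Unsupervised Learning)"
MODALITIES = "(Echocardiography OR heart catheterization OR Computed Tomography)"


def build_query(condition):
    """Assemble the Boolean search string for one valve condition (hypothetical helper)."""
    return f"({condition}) AND {METHODS} AND {MODALITIES}"


# (included, retrieved) counts as reported for each search.
screening = {
    "Aortic Stenosis": (9, 262),
    "Mitral Regurgitation": (7, 187),
    "Tricuspid regurgitation": (5, 95),
}

for condition, (included, retrieved) in screening.items():
    print(build_query(condition))
    print(f"  included {included} of {retrieved} ({included / retrieved:.1%})")

# Pooled totals, assuming no record appears in more than one search.
total_included = sum(included for included, _ in screening.values())
total_retrieved = sum(retrieved for _, retrieved in screening.values())
print(f"overall: {total_included} of {total_retrieved}")
```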

We included original research that employed an unsupervised ML method and focused on AS, MR, or TR. We excluded supervised ML approaches, as well as manuscripts whose primary focus was heart failure or atrial fibrillation, even if they used unsupervised learning and referenced VHD, because they did not primarily explore the topic.

Two reviewers independently screened study titles and abstracts. The full text of relevant studies was then reviewed in detail with a senior author for final inclusion. For each study, we extracted data on the study population, algorithmic input parameters, ML algorithm, goals, and outcome of the study. Extracted data were tabulated to facilitate narrative synthesis.