Artificial intelligence improves detection and classification of pulmonary venous hypertension related to left ventricular diastolic dysfunction by chest radiography

White, Richard D.; Demirer, Mutlu; Sebro, Ronnie A.; Cortopassi, Isabel O.; Stowell, Justin T.; McCann, Matthew R.; Barry, Timothy; Appleton, Christopher P.; Helgeson, Scott A.; Erdal, Barbaros S.

doi:10.1038/s41598-025-22026-x

Open access
Published: 31 October 2025

Richard D. White^1,2,3,
Mutlu Demirer¹,
Ronnie A. Sebro¹,
Isabel O. Cortopassi²,
Justin T. Stowell²,
Matthew R. McCann²,
Timothy Barry⁴,
Christopher P. Appleton⁴,
Scott A. Helgeson⁵ &
…
Barbaros S. Erdal¹

Scientific Reports volume 15, Article number: 38181 (2025)

Abstract

Isolated-Left Ventricular Diastolic Dysfunction [LVDD] ranges (and may progress) from preclinical asymptomatic, symptomatic-LVDD, to LVDD-predominate Heart Failure [HF] presentations; if recognized early, LVDD progression might be preventable. Current early-HF screening remains limited, providing opportunities for insights from a standard Chest X-Ray [CXR]. While CXR assessment for “pulmonary congestion” supports suspected-HF evaluation in evidence-based guidelines, the potential for systematic Pulmonary Venous Hypertension [PVH]-Staging to contribute to initial detection and scaling of LVDD is unclear. This study compared CXR-based PVH-Staging to Doppler Echocardiography [DEcho]-based LVDD-Grading in the absence of systolic dysfunction. Questions included: (1) With PVH-Staging performed by cardiothoracic radiologists, what intra-/inter-reader variabilities remain? (2) Does PVH-Staging track LVDD-Grading? and (3) Can AI-assisted PVH prediction of LVDD-Grade match human performance? CXR examinations of 1,682 (including 750 asymptomatic/healthy) subjects, without: (1) Anatomical/physiological confounders of DEcho or CXR examinations (≤ 24 h apart), and (2) AI model-training confounders, were independently assigned 1 of 11 (9 PVH-related) Pulmonary Vasculature Patterns [PVPs] by 4 cardiothoracic radiologists and repeated for reliability evaluation. Expert-consensus Human Ground Truth [HGT] PVH PVPs were correlated with LVDD Grades (0 to 3–4), as were PVH-Rank predictions by a transformer-based AI model [“PVPI”]. Despite experience-dependent intra-/inter-reader reliability in PVP assignment, there was significant (p < 0.001) overall consistency. With increasing HGT PVH Stage, a significant (p < 0.001) trend towards increasing LVDD Grade was found; while PVH-Staging achieved confidence backing Grade 0/No LVDD, confident LVDD Grade recognition was not achieved until Grades 3–4/Restrictive Filling. 
However, a significantly (p < 0.001) stronger incrementally positive trend in PVPI PVH-Ranking with LVDD-Grading was demonstrated. Although validated, PVH-Staging for LVDD-Grading is limited by reader variabilities. AI-assisted PVH-Ranking may facilitate earlier and widespread objective CXR screening for LVDD which is ubiquitous in HF.

Introduction

Dysfunction of the left ventricle during the typically longer diastolic filling phase of the cardiac cycle (from impaired muscle relaxation and/or chamber distention) is a pathophysiological manifestation of multiple conditions^1,2. Left Ventricular Diastolic Dysfunction [LVDD] is ubiquitous in Heart Failure [HF] regardless of the LV Ejection Fraction [LVEF] during the systolic contraction phase^1,2,3, and it plays a greater role than systolic dysfunction in determining baseline clinical capabilities and exercise-intolerance^1,4.

From both functional and clinical standpoints, isolated-LVDD can precede HF. Its severity ranges from: (1) Preclinical asymptomatic, but at-risk (Stage A⁵) or mildly dysfunctional (Stage B⁵, found in 20–30% of the general population^6,7,8,9,10), to (2) Symptomatic-LVDD (Stage C⁵, especially exercise-related^6,11), to (3) LVDD-predominate HF (Stage D⁵, with morbidity and mortality resembling systolic HF^6,12). The potential for progression along this pathophysiological and prognostic spectrum is well-recognized^{6,7,8,10,11,13,14,15}.

Preclinical LVDD is associated with aging¹⁶ (especially in individuals ≥ 50 years old and/or of female sex^7,14,17,18). These initially subclinical changes (affecting almost 40% of the elderly population¹⁹) and the prospects for symptom development (e.g., dyspnea on exertion)^6,7 are amplified in persons with modifiable predisposing factors (e.g., hypertension⁷, obesity², diabetes⁷, renal failure²⁰).

At the other extreme, LVDD is an important component of HF with preserved LVEF ≥ 50% [HFpEF]¹², an entity accounting for more than half of newly diagnosed HF cases. HFpEF diagnosis relies on evidence of: (1) HF-related signs and symptoms from high mean Left Atrial Pressure [mLAP] with “pulmonary congestion”^6,21; (2) high LV End-Diastolic Pressure [LVEDP] (at rest or only during exercise^6,22) from invasive Right Heart Catheterization [RHCath]^12,23 or non-invasive transthoracic Doppler Echocardiography [DEcho]^24,25,26; and/or (3) elevated serum B-type Natriuretic Peptide [BNP] levels^12,21. Due to LVDD, the following may result: (1) LA enlargement with dysfunction; (2) Pulmonary Venous Hypertension [PVH], potentially achieving isolated post-capillary Pulmonary Hypertension [PH], due to LA-PV continuity; and (3) Combined pre-/post-capillary PH [CpcPH], causing vascular remodeling and eventually failure of the Right Ventricle [RV]^27,28.

RHCath-based increased mean Pulmonary Capillary Wedge Pressure [mPCWP] (a surrogate for directly measured high LVEDP or mLAP^1,4,29,30) indicates the cumulative hemodynamic burden of LVDD on both LA operating compliance^12,31 and atrioventricular coupling^23,32 causing mLAP elevation. Thus, while LVDD-related increases in LVEDP primarily reflect high LV preload (input volume or pressure) and reduced LV diastolic operating compliance (myocardial relaxation and inherent viscoelastic stretching)^1,4,23,33, increased mLAP and mPCWP signify its net hemodynamic impact directed retrograde into the pulmonary circulation, including the tendency for PVH^4,24,34,35. DEcho now dominates LVDD evaluations, with various parameters correlating closely with elevated mLAP predisposing to PVH²⁶; although a multi-parameter DEcho examination delineates LVDD severity, no single parameter reliably indicates elevated LVEDP or mLAP levels per individual^24,26.

Standard screening algorithms for the initial recognition of HF-related LVDD have relied on clinical history, physical findings, electrocardiographic changes, and BNP levels, but to a much lesser extent Chest X-Ray [CXR] results^21,35. Nevertheless, despite very limited objective supportive evidence, a CXR has been consistently included in initial diagnostic approaches to suspected-HF based on expert consensus, with appropriateness justified by perceived value in: (1) Identifying other causes (e.g., emphysema) of HF-like symptoms; and (2) Recognizing undefined “pulmonary congestion”^{21,35,36,37,38}.

CXR “pulmonary congestion” has been evaluated using systematic PVH-Staging (based on active superior redistribution or central distension of pulmonary vasculature, progressing to superimposed interstitial followed by alveolar pulmonary edema)^39,40,41. PVH-Staging has been validated against catheter-determined increased mPCWP (or infrequently LVEDP) within a range of pathophysiological conditions (varying LV-preload, LA-operating, and/or LV-operating conditions or with confounding factors) with inconsistent corroborative results and without definitively establishing a direct relationship with LVDD^{40,42,43,44,45,46,47,48,49}. Due to considerable CXR-interpretation variability, especially for characteristics of mild PVH^50,51,52,53, consensus readings have been proposed^43,44,53.

Thus, CXR-based PVH-Staging has yet to: (1) Be confirmed as directly indicating LVDD, the universal trait of HF^1,2,3, especially in isolated-LVDD where ventricular size is relatively normal⁵⁴ and unlikely a HF marker⁵⁵; or (2) Address the appearance of PVH with secondary PH^27,56,57. In addition, reported PVH-Staging results have not accounted for co-existent confounding causes of LVDD (e.g., conduction delay)^{55,58,59,60,61}, on one hand, or lung conditions (e.g., fibrosis) altering pulmonary vasculature⁶², on the other. Last, PVH-Staging has relied on subjective CXR evaluations leading to intra-/inter-reader variability without delineation of expectations for interpreter qualifications influencing diagnostic success^43,63,64.

Hence, the potential for CXR-based PVH-Staging to make a significant contribution to current diagnostic algorithms^35,36 or scoring systems^21,24,65 for the initial exclusion, versus detection with scaling, of LVDD has remained unanswered; ongoing limitations in screening approaches^36,66 indicate persistent enhancement opportunities, possibly using PVH-Staging. It is particularly opportune, considering that a CXR remains: (1) The most widely accessible and frequently performed imaging examination worldwide; and (2) Inexpensive; these two attributes combined could facilitate early and widespread community-based application of PVH-Staging in suspected-HF, and possibly HF at-risk, cases. Also, despite regional differences in Cardiothoracic Radiology expertise, digital CXR formats and Artificial Intelligence [AI] processing now support more uniform and less human-dependent PVH-Staging^67,68. While CXR-dependent AI models may help predict an elevated BNP level in CXR-identified congestive HF⁶⁹ or help detect an elevated mPCWP value⁷⁰, AI prediction of LVDD absence, versus its presence and/or severity independent of co-existent pathophysiological confounders^{55,58,59,60,61}, has not been described.

Therefore, considering the worldwide epidemic of pre-HF and HF cases^8,9,71, especially those emphasizing LVDD^{6,7,8,10,11,13,14,15} that, if recognized early (along with associated risk factors), might have its progression prevented or reversed^2,28,72, optimized timely and widespread LVDD recognition is paramount. To that end, this study was designed to elucidate the potential for CXR-based PVH-Staging to identify levels of isolated-LVDD. Prevailing questions included: (1) When PVH-Staging is performed only by cardiothoracic radiologists, what intra-/inter-reader reliabilities remain and do they relate to reader experience?; (2) Does expert PVH-Staging confirm LVDD absence or, when detected, correctly track LVDD severity?; and (3) Can AI-assisted PVH predictions of LVDD degree at least match human performance, while eliminating human interpretation-reliability issues?

Results

Study population

The Study Population consisted of 1,682 subjects with or without suspected LVDD [Table 1] in the absence of possible anatomic or physiological imaging confounders or possible confounders of AI-model training (described in detail as exclusion criteria under Methods).

Table 1 Study population.


Suspected-LVDD subjects

The 846 subjects with characteristic symptoms (e.g., dyspnea on exertion) had undergone within the same 24-hour period: (1) a DEcho examination for LVDD which confirmed preserved LVEF ≥ 50%; and (2) a CXR examination [Table 1]. DEcho LVDD-Grading had categorized each subject as one of the following^25,26,73:

Grade 0 (aka Normal Filling): N = 250.
Grade 1 (aka Delayed Relaxation): N = 253.
Grade 2 (aka Pseudo-Normal Filling): N = 252.
Grade 3–4 (aka Restrictive Filling, either Grade 3/Reversible or Grade 4/Fixed): N = 91.

Unsuspected-LVDD subjects

A group of 750 asymptomatic and largely healthy subjects [Healthy] (without DEcho examinations), all free of LVDD predisposing factors (e.g., renal failure^2,7,20), was added [Table 1]. In addition, 86 subjects with RHCath-confirmed pre-capillary PH [Group 1 PH]^74,75 were included.

Human-based CXR PVH-staging vs. DEcho LVDD-grading

Reviewer-based assignment of PVP

Description

The digital CXR examination of each study subject was independently reviewed twice (separated in time) by four cardiothoracic-trained radiologists (varying in experience level) for the assignment of one of the following 11 possible Pulmonary Vascular Patterns [PVPs]:

1: Normal.
2–5: PVH-Stages 1, 2-Early, 2-Late, or 3 alone^39,53,62,76.
6–9: PVH-Stages (above) + CpcPH^56,57,76.
10: Group 1 PH^74,75.
11: Uncertain.

Human Ground Truth [HGT] PVP assignments (by consensus between the two most-experienced Expert Reviewers: Reviewer 3 and Reviewer 4) of PVH Stage were used in determining the relationship between PVH-Staging and LVDD-Grading.

Intra-reviewer reliabilities

Intra-Reviewer variability in assigning PVPs by Reviewers 1–4 ranged from 14.0 to 19.4%. Nevertheless, there was significant (p < 0.001) overall intra-Reviewer reliability by good agreement (Intra-Class Correlation [ICC] 0.78–0.81) with PVP-order ignored, and by strong agreement (Reviewers 1–3: Kendall rank correlation coefficient [Ktau] 0.55–0.67) or very strong agreement (Reviewer 4: Ktau 0.74) with PVP-order considered [Table 2].
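As a minimal sketch of how repeat readings can be summarized, the snippet below computes the share of examinations whose assigned pattern changed between two sessions and a Kendall rank correlation over ordered pattern codes. It assumes the tau-b variant (which corrects for ties) and integer-coded PVPs; the text does not state which Kendall variant or coding was used, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def percent_discordant(first, second):
    """Percentage of examinations whose label differs between two readings."""
    changed = sum(a != b for a, b in zip(first, second))
    return 100.0 * changed / len(first)


def kendall_tau_b(first, second):
    """Kendall tau-b for two ordinal label sequences (assumed variant)."""
    concordant = discordant = ties_first = ties_second = 0
    for (a1, b1), (a2, b2) in combinations(zip(first, second), 2):
        da, db = a1 - a2, b1 - b2
        if da == 0 and db == 0:
            continue  # pair tied in both readings: excluded from all totals
        if da == 0:
            ties_first += 1  # tied only in the first reading
        elif db == 0:
            ties_second += 1  # tied only in the second reading
        elif da * db > 0:
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + ties_first)
                 * (concordant + discordant + ties_second))
    return (concordant - discordant) / denom if denom else float("nan")
```

Treating the pattern codes as ordered is what distinguishes this measure from an order-agnostic agreement statistic such as the ICC values reported alongside it.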

Table 2 Intra-reviewer reliability and inter-expert reviewer concordance in PVP assignment.


For intra-Reviewer reliability related exclusively to a pulmonary vascular abnormality [Table 2], significant (p < 0.001) values, increasing with Reviewer experience, applied to Group 1 PH detection (versus Normal) and its differentiation from PVH Stage 1; respectively, the two and three most-experienced Reviewers demonstrated progressing good-excellent consistency (ICC 0.77–0.95). While significant (p < 0.001), poor-moderate agreement (ICC 0.29–0.56) applied to PVH Stage 1 detection (versus Normal) across Reviewers 1–4.

For intra-Reviewer reliability related to pulmonary edema, a positive experience-to-consistency trend for edema detection (versus Normal, PVH Stage 1, or Group 1 PH) was noted [Table 2]. While significant (p < 0.001), the agreements were moderate (ICC 0.55–0.72) across Reviewers 1–4.

Regarding CpcPH detection, only the most-experienced Reviewer achieved significant (p < 0.001) moderate consistency (ICC 0.64) in the setting of PVH Stage 2 [Table 2].

Inter-expert reviewer concordance

While 18.3% of final PVP assignments by the two Expert Reviewers (i.e., Reviewer 3 and Reviewer 4) were discordant, there was significant (p < 0.001) overall agreement between them at good (ICC 0.76) to strong (Ktau 0.62) levels [Table 2].

Regarding inter-Expert Reviewer concordance related exclusively to a pulmonary vascular abnormality in the absence of edema, significant (p < 0.001) agreement at good (approaching excellent) levels applied to the detection of Group 1 PH (vs. Normal) (ICC 0.88) and its differentiation from PVH Stage 1 (ICC 0.86). In contrast, poor agreement (ICC 0.09) applied to the detection of PVH Stage 1 (vs. Normal).

However, regarding inter-Expert Reviewer concordance related to pulmonary edema, a significant (p < 0.001) agreement at a good level (ICC 0.78) was demonstrated for rating edema, once detected.

Final PVP assignments

Reviewers 1–4 finally assigned PVPs to be abnormal in 14.6–17.9% of Study Population CXR examinations, while 40.5% (682/1,682) of subjects had pathophysiology confirmed by DEcho or RHCath [Table 3].

Table 3 Final PVP assignments.


By consensus HGT assignment [Table 3], the prevalence of abnormal PVPs was 15.5%. Remarkably, HGT assignments of PVH Stage 1 (3.2%) were less common than those individually by Reviewers 1–4 (3.4–9.2%). On the other hand, HGT assignments of PVH Stage 2-Early without superimposed CpcPH (4.9%) were more common than individually by both Expert Reviewers (3: 2.0%, 4: 4.8%), while less common than by the other two Reviewers (1: 7.6%, 2: 5.1%). Similarly, HGT assignments of Group 1 PH (3.2%) were more common than those individually by Reviewers 1–4 (1.0–2.9%).

Correlating PVH-staging with LVDD-grading

Suspected absence of LVDD

When LVDD was not suspected (by lack of symptoms and predisposing factors) in Healthy subjects, HGT-assigned PVPs included: Normal 98.0%, PVH 1.7%, and Group 1 PH 0.3% [Fig. 1]. The designation of PVH applied to 13 subjects aged 46–83 (median 51) years old.

Increasing LVDD grade

With progression from DEcho LVDD Grade 0 to Grade 3–4, a significant (p < 0.001) positive trend towards increasing HGT-assigned PVH Stage 1 to PVH Stage 3 PVPs was found in symptomatic subjects with suspected LVDD [Fig. 1].

With confirmed absence of LVDD based on Grade 0/Normal Filling, HGT-assigned PVPs indicated Normal in 94.0% and PVH in 5.6% [Fig. 1]. Despite presumed normal mLAP levels, a significantly (p < 0.050) higher PVH prevalence was found in symptomatic subjects with Grade 0 than in unsuspected-LVDD Healthy subjects. Despite this apparent PVH background, the assessment of likelihood of excluding DEcho Grade 1 to Grade 3–4 produced an HGT -LR value of 0.18 (Reviewers 1–4: 0.14–0.38), approximating a strong level of confidence (i.e., < 0.10) in favor of PVH-Staging to rule-out LVDD at rest, even in the presence of characteristic symptoms [Table 4].

Table 4 Likelihoods of identifying DEcho LVDD grade per reviewer or HGT.


Although a Normal PVP was expected to be highly prevalent in Grade 1/Delayed Relaxation due to presumed absence of any mLAP elevation, HGT-assigned PVPs indicated Normal in 87.3% of affected subjects [Fig. 1]; varying PVH PVPs, especially PVH Stage 2 in 9.5%, were assigned in the remaining 12.7%. This relatively higher PVH prevalence in Grade 1, compared to Healthy and/or Grade 0 conditions, was statistically significant (p < 0.050). However, neither -LR nor +LR values provided strong support to PVH-Staging to rule-out or rule-in Grade 1, respectively [Table 4].

With further progression of LVDD to Grade 2/Pseudo-Normal Filling with its anticipated mildly elevated mLAP levels, the prevalence of a HGT-assigned Normal PVP continued to significantly (p < 0.050) decrease to 64.7% with a corresponding additional significant increase in prevalence of PVH PVPs to 34.5%. This rise in PVH prevalence included increasing representation of PVH Stage 2 in 26.6%, with superimposed CpcPH accounting for 4.0%. Again, neither -LR nor +LR values provided strong evidence for PVH-Staging in ruling-out or ruling-in Grade 2, respectively [Table 4].

Last, the progression to Grade 3–4/Restrictive Filling, with its anticipated moderately elevated mLAP levels, was associated with another significant (p < 0.050) increase in PVH prevalence to 55.0% of affected subjects. Once more, the rise in PVH was characterized by increasing representation of PVH Stage 2 in 39.6% (superimposed CpcPH accounting for 15.4%) and PVH Stage 3 in 6.6% (including 2.2% with CpcPH). The assessment of likelihood of detecting the presence of Grade 3–4 produced an HGT +LR value of 55.5 (high individual Reviewer values of 10.1–64.6), indicating strong evidence (i.e., > 10.0) in favor of PVH-Staging to rule-in Grade 3–4 at rest [Table 4].
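The likelihood ratios quoted in this subsection follow from sensitivity and specificity, and the thresholds cited (-LR < 0.10, +LR > 10.0) are conventional markers of strong rule-out and rule-in evidence. The sketch below uses the standard definitions with hypothetical 2×2 counts; it is not the study's analysis code, and the function names are illustrative only.

```python
def likelihood_ratios(tp, fp, fn, tn):
    """Return (+LR, -LR) from the cells of a 2x2 diagnostic table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # +LR: how much more often a positive reading occurs with disease present.
    positive_lr = (sensitivity / (1.0 - specificity)
                   if specificity < 1.0 else float("inf"))
    # -LR: how much more often a negative reading occurs with disease present.
    negative_lr = (1.0 - sensitivity) / specificity
    return positive_lr, negative_lr


def post_test_probability(pretest, lr):
    """Update a pretest probability with a likelihood ratio via odds."""
    odds = pretest / (1.0 - pretest) * lr
    return odds / (1.0 + odds)


# Hypothetical counts (not study data): 90 true positives, 10 false negatives,
# 20 false positives, 80 true negatives.
plus_lr, minus_lr = likelihood_ratios(tp=90, fp=20, fn=10, tn=80)
```

Converting to odds and back shows why a given +LR or -LR moves a low pretest probability far less than an intermediate one.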

HGT results in group 1 PH

Correct HGT PVP assignments of Group 1 PH were made in 57.0% (mPAP 26–77/mean 49 mmHg), versus assignments of Normal in 31.4% (mPAP 27–70/mean 45 mmHg), of subjects with previously RHCath-confirmed Group 1 PH [Fig. 1]. In the remaining 11.6% of subjects, assignments of PVH Stage 2 were made (despite mPCWP 6–15/mean 10 mmHg), 4.6% represented by Stage 2 alone (mPAP 28–77/mean 60 mmHg) and 7.0% characterized by superimposed CpcPH (mPAP 46–82/mean 59 mmHg). Thus, a HGT assignment of a form of pre-capillary PH was made in 64% of Group 1 PH subjects. The high +LR values by both HGT (160.7) and individual-Reviewers (162.1–423.0) indicated very strong confidence in identifying Group 1 PH by CXR; based on -LR values (0.43–0.80), its exclusion confidence was found to be equivocal [Table 4].

AI-based CXR PVH-ranking vs. DEcho LVDD-grading

AI-based assignment of “PVP”

Description

Our cascading two-component “PVP Identifier” [PVPI] AI model consisted of the: (1) Thoracic-Content Segmentator (for automatic thoracic-cavity segmentation); and (2) PVP Multi-Classifier to differentiate between image-data characteristics based on a four-class physiological pattern, as follows:

Normal.
PVH Stage 1 (Vascular redistribution without edema +/- CpcPH).
PVH “Stage 2+” (Vascular redistribution/congestion with predominantly interstitial edema +/- CpcPH).
Group 1 PH.

The PVPI was used in processing the same 1,682 CXR examinations to predict probabilities for these four classes. PVPI-derived CXR-based PVH-rank predictions were correlated with LVDD-Grading.

AI performance compared to reviewer performance

PVPI activity maps [Fig. 2] demonstrated an increasingly redistributed balance of activity (green) from lung bases to upper-lung regions with worsening HGT-assigned PVH Stage; for each example, the highest PVPI PVH-Ranking prediction (PVH Predict) corresponded well with HGT PVH-Staging assignment. However, with both the pulmonary vasculature and cardiac silhouette reflecting relative inactivity (blue shades), changing physiological “States” rather than PVPs were apparently recognized by the PVPI.

For PVH detection, the PVPI achieved an overall accuracy of 0.89 and balanced accuracy of 0.86 by a 3-class output (i.e., Normal versus PVH Stages 1–2+ versus Group 1 PH), with near-equally strong normalized class accuracies for both Normal at 0.91 and PVH at 0.90 [Table 5].

Table 5 Performance of PVPI relative to HGT assignments of PVP.


On the other hand, for basic PVH-Ranking, the PVPI by a 4-class output (i.e., Normal versus PVH Stage 1 versus PVH Stage 2+ versus Group 1 PH) reached a lower overall accuracy of 0.79, and balanced accuracy of 0.72, with normalized class accuracies still strongest for Normal at 0.91, followed by PVH Stage 2+ at 0.84 [Table 5]. The intermediary PVH Stage 1 was predicted with a low normalized accuracy of 0.35, near-equivalent to PVH Stage 2+ misclassification at 0.38.
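For readers comparing the overall and balanced accuracies above, the following sketch shows how both can be derived from a confusion matrix. It assumes that "normalized class accuracy" means the row-normalized diagonal (per-class recall) and that balanced accuracy is the unweighted mean of those values; the matrix shown is hypothetical and not taken from Table 5.

```python
def overall_accuracy(matrix):
    """Fraction of all cases on the diagonal (rows = truth, columns = prediction)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """Row-normalized diagonal: share of each true class predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def balanced_accuracy(matrix):
    """Unweighted mean of per-class recalls, insensitive to class imbalance."""
    recalls = per_class_recall(matrix)
    return sum(recalls) / len(recalls)


# Hypothetical 3-class counts (Normal, PVH, Group 1 PH), not study data.
example = [
    [90, 8, 2],
    [5, 40, 5],
    [2, 3, 15],
]
```

Because the Normal class dominates a screening population, overall accuracy is weighted toward it, whereas balanced accuracy gives each class equal weight; that is one reason the two figures can diverge.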

Correlating PVH-ranking with LVDD-grading

As with HGT PVH-Staging assignments, a significant (p < 0.001) strongly positive trend towards increasing PVH-Ranking by the PVPI with progression from Grade 0 to Grade 3–4 was found in symptomatic subjects [Fig. 3]; compared to HGT PVH-Staging, the PVPI-related trend was significantly (p < 0.001) stronger both overall and at each LVDD Grade.

Suspected absence of LVDD

In the absence of suspected LVDD (i.e., in Healthy), the PVPI predicted Normal in 91.6% of subjects [Fig. 3] (vs. HGT 98.0%). Although still representing a very small proportion of this group, 4.8% (vs. HGT: 1.7%) had PVPI-recognized PVH.

Increasing LVDD grade

In contrast, with Grade 0/Normal Filling, the PVPI predicted Normal in only 30.4% (vs. HGT 98.2%) but PVH in 47.6% (including Stage 1: 29.6%, Stage 2+: 18%) [Fig. 3]. This relatively higher PVH prediction in suspected LVDD but Grade 0, compared to unsuspected LVDD (i.e., Healthy), was statistically significant (p < 0.001).

With Grade 1/Delayed Relaxation, PVPI prediction of PVH predominated at 70.7% overall (including Stage 2+: 39.1%, Stage 1: 31.6%), while Normal was predicted in 10.7% [Fig. 3]. Again, PVH prediction by the PVPI was significantly (p < 0.001) higher than that for Grade 0.

Advances to Grade 2/Pseudo-Normal Filling and then Grade 3–4/Restrictive Filling were tracked by rapid and significant (p < 0.001) additional stepwise increases in PVH predictions to 79.4% and 83.5% overall (including Stage 2+: 55.2% and 68.1%), respectively [Fig. 3].

Trialing in confounded test group

To initially gauge its robustness, the PVPI was trialed in a Confounded Test Group of 40 previously excluded suspected-LVDD subjects, each demonstrating on CXR an example of implanted cardiovascular material, while otherwise meeting the inclusion criteria of Study Population subjects.

By Expert Reviewer adjudication, initial auto-segmentations of Frontal-view CXRs produced by the Thoracic-Content Segmentator were deemed adequate in 38 of 40 subjects, resulting in an overall Dice coefficient approaching 1.00 (i.e., 0.9995). For the remaining two auto-segmentations (truncated left hemithorax apex at clavicle containing orthopedic plate-screw device; blunted left costophrenic angle by moderate-sized pleural effusion) in which minor manual modifications were thought to be potentially needed, the Dice coefficients were initially 0.99.
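The Dice coefficient reported above measures the overlap between two segmentation masks. The sketch below gives the standard definition for binary masks stored as nested lists; it assumes the comparison is between an automatic mask and an expert-adjusted reference, which the text implies but does not spell out, and the tiny masks are hypothetical.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap 2|A∩B| / (|A| + |B|) for two same-shaped binary masks."""
    a = [bool(v) for row in mask_a for v in row]
    b = [bool(v) for row in mask_b for v in row]
    overlap = sum(x and y for x, y in zip(a, b))
    size = sum(a) + sum(b)
    # Two empty masks are treated as a perfect match by convention.
    return 1.0 if size == 0 else 2.0 * overlap / size


# Hypothetical 2x3 masks (1 = inside the thoracic cavity), not study data.
auto_mask = [[0, 1, 1],
             [0, 1, 1]]
reference_mask = [[0, 1, 1],
                  [0, 0, 1]]
score = dice_coefficient(auto_mask, reference_mask)
```

A value of 1.00 indicates identical masks, so small boundary edits over a large thoracic region change the coefficient only slightly.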

Evaluation of the subsequent 3-class and 4-class output performances by the PVP Multi-Classifier (using the unmodified auto-segmentation set) indicated a negative impact with overall accuracies declining to 0.73 and 0.58, and balanced accuracies to 0.68 and 0.58, respectively [Table 6]. The decrease was primarily attributable to recognition of a Normal “State” at 0.40 and PVH Stage 2+ “State” at 0.70; no association with the represented cardiovascular material types was identified.

Table 6 Performance of PVP multi-classifier in confounded test group.


Despite evidence of anatomical-confounder impairment of physiological-“State” differentiation, the PVP Multi-Classifier predictions of LVDD severity remained strong, with CXR PVH-Ranking of DEcho LVDD-Grading demonstrating a significantly positive (p < 0.001) relationship [Fig. 4]. Above Grade 1/Delayed Relaxation, only PVH “States” were detected, including a purely PVH Stage 2+ “State” for Grade 3–4/Restrictive Filling.

Discussion

This work provides overdue insights into the implications of CXR “pulmonary congestion” in suspected HF. We believe it is the first to describe the: (1) Potential for systematic CXR-based PVH-Staging to make evidence-based contributions to current diagnostic algorithms^35,36 or scoring systems^21,24,66 in the initial exclusion versus detection with scaling of LVDD in asymptomatic pre-HF^6,7,8,9,10, symptomatic pre/early HF^6,11, or HFpEF^6,12 conditions; (2) Relationship between PVH-Staging (presumably indicating mLAP reflecting both LA-operating compliance^12,31 and atrioventricular coupling^22,32) and LVDD-Grading (indicating LVEDP reflecting both LV preload and LV-operating compliance^1,4,23); (3) Interpretation reliability of PVH-Staging by cardiothoracic radiologists with differing experience; and (4) Performance of completely AI-assisted CXR-based PVH-Ranking in predicting DEcho-determined LVDD Grade.

Despite the fact that, unlike prior studies^{39,40,41,42,43,44,45,46,47,48,49}, we restricted CXR interpretations to cardiothoracic radiologists, intra-/inter-interpreter variabilities were demonstrated during Reviewer assignments of one of 11 possible PVPs, including 9 PVH-related. Nevertheless, there was significant overall intra-Reviewer reliability. In addition, for Group 1 PH detection and its differentiation from PVH Stage 1, as well as for pulmonary edema detection, at least moderate intra-Reviewer reliability was achieved, with enhancement by Reviewer experience, as previously recognized by others⁶³. Comparable inter-Expert Reviewer concordance for rating edema (i.e., early versus late interstitial) was also shown.

Like related studies^50,51,52,53, we found intra-Reviewer and inter-Expert Reviewer identification of pulmonary vascular redistribution without edema (i.e., PVH Stage 1) to be unreliable; only the most-experienced Reviewer achieved moderate consistency. Thus, the application of PVH-Staging, especially in early phases preceding identifiable edema, would be limited by such dependency on experience; prior proposals for consensus interpretations^43,44,51 constitute impractical solutions.

Difficulties recognizing a PVH Stage 1 PVP characterized by redistribution (with PCWP or mLAP 13–17 mmHg), representing a narrow LV-preload transition between a Normal PVP (with PCWP or mLAP 4–12 mmHg) and “pulmonary congestion” plus interstitial edema (with PCWP or mLAP ≥ 18–24 mmHg)^39,53,62,76, have been previously reported^{43,45,47,51,52,55,77}. Considering the diverse prior experience, the absence of a PVH Stage 1 PVP might only help exclude increased LV preload for HF pretest probability < 9%, whereas it might help confirm HF with pretest probability > 91%⁵³; the prospects for PVH Stage 1 “State” recognition in isolated-LVDD with AI assistance are uniquely addressed in this work.

Based on HGT-assigned PVH-related PVPs, significant correlation between CXR PVH-Staging and LVDD-Grading in the setting of preserved LVEF was found.

Remarkably, low levels of PVH were recognized even in asymptomatic unsuspected-LVDD (i.e., Healthy) subjects. With their median age of 51 years old, detected PVH in half could have represented age-related (i.e., ≥ 50-year-old) LVDD^7,17,18, but no DEcho data were available to test this conjecture.

Nevertheless, significantly increased PVH prevalence in symptomatic DEcho-confirmed Grade 0/Normal Filling suggested influences by fluctuating hemodynamics from episodic LVDD or atrial fibrillation^6,18,22,78. While false-positive PVH recognition in Healthy and Grade 0 conditions is a possible explanation, generally low Reviewer sensitivities to PVH, especially to PVH Stage 1 (3.3% overall, 8.6% Stage 2), were observed in this study.

Despite presumably normal mLAP levels^39,53,62,76, significantly higher PVH assignments were found in Grade 1/Delayed Relaxation, implicating hemodynamic lability with intermittent mLAP elevations or LVDD exacerbations^6,18,22,78 and more prolonged or relatively fixed pulmonary vascular manifestations (e.g., distention)⁷⁹. Thus, combined DEcho Grades 0–1 and CXR-indicated PVH might serve as a marker for unstable and/or worsening mild or early LVDD^10,78,80,81.

As expected from mild mLAP elevation, Grade 2/Pseudo-Normal Filling was associated with significantly higher PVH prevalence, reflecting background PVH accentuated by chronically fluctuating hemodynamics^22,78,80. Consequently, PVH on CXR might serve as an adjunct LVDD indicator for Grade 2 at rest, thereby obviating total dependency on stressing for its differentiation from Grade 0^82,83. Last, Grade 3–4/Restrictive Filling, with expected moderate mLAP elevation, demonstrated another significant increase in PVH prevalence involving slightly more than half of affected subjects.

However, while LR-based assessment of confidence in excluding DEcho-confirmed LVDD was approximated at Grade 0, adequate confidence in LVDD recognition was not achieved until Grade 3–4. Therefore, despite significant direct correlation between PVH-Staging and LVDD-Grading, human-based PVH assignments did not definitely predict incremental increases in LVDD severity. Accordingly, we saw justification for investigating the potential of AI to support CXR-based PVH evaluation in facilitating LVDD assessment or complementing DEcho-based LVDD examinations.

Our cascading two-component PVPI model includes the PVP Multi-Classifier, which was created to freely identify distinguishing anatomical and/or functional cardiopulmonary differences, hence our reference to a physiological “State”, rather than a PVP, of PVH. Considering that increasing HGT-assigned PVH PVPs were accompanied by progressing upper-lung redistributions of AI-model inference activity (without correspondence to cardiac or PVP silhouettes), PVP Multi-Classifier-identified parameter(s) likely reflected: (1) Subtle or complex variations in radiodensity not humanly detectable⁸⁴; or (2) Physiological indicators of lung blood flow and/or water content^85,86.

Nonetheless, compared to HGT-assigned PVH, the PVPI demonstrated significantly greater: (1) Sensitivity to PVH-related changes, starting in symptomatic proven-absent LVDD (Grade 0); and (2) Correlation between increases in ranked CXR PVH-“State” (especially PVH Stage 2+) and LVDD-Grading. Both should facilitate contributions by CXR-based PVH evaluation to initial LVDD recognition in suspected-HF individuals or those at-risk of HF, as well as help realize opportunities for complementing DEcho LVDD examinations, while eliminating issues in reliability of human-based CXR interpretation.

In Group 1 PH, PVPI performance exceeded that by HGT-assigned PVPs. Low-level PVH prevalences detected by both human-based and AI-based CXR evaluations are at least partly explained by: (1) An inclusion allowance of RHCath-confirmed mPCWP 13–14 mmHg, shared with PVH Stage 1^{39,53,62,74,75,76}; (2) Interdependence of RV systolic function and LV filling⁸⁷; and (3) The prolonged RHCath-CXR periods, allowing interval PVH development. However, unlike HGT assignments, the PVPI exhibited a tendency towards Group 1 PH detection in symptomatic subjects, especially those with Grade 0. While, with few exceptions, affected subjects received HGT-assigned Normal PVP, this PVPI predilection may have reflected recognition of Group 1 PH with normal baseline spatial lung-perfusion heterogeneity, differentiable after vasodilatation⁸⁸.

Limitations in this study are recognized. First, the relatively small numbers of subjects representing PVH Stages and LVDD Grades limited human-/AI-based evaluations, resulting from the purposeful exclusion of over 14,000 potential subjects to avoid possible: (1) anatomical/physiological confounders of DEcho-LVDD or CXR-PVP examinations^{55,58,59,60,61,62,89,90}; or (2) AI model-training confounders^91,92,93. Nevertheless, our stringent inclusion restrictions ensured the: (1) Needed assessment of a relationship between PVH-Staging and LVDD-Grading, and (2) Development of an AI model which focused only on that relationship. Consequently, our Subject Population uniquely reflected suspected HF with isolated-LVDD in the absence of systolic dysfunction. Thus, while a complete understanding of the robustness of our PVPI for AI-assisted objective CXR “pulmonary congestion” assessment awaits real-world deployment allowing for confounding factors, we performed initial trialing in the Confounded Test Group and showed that PVPI prediction remained strong.

In addition, semi-erect positioning during CXR examination was allowed since: (1) Basilar preferential lung flow is maintained even when supine⁹⁴; and (2) Prior reports demonstrated no significant negative impact on PVH assessment with supine positioning⁵⁰.

Last, the DEcho reports reviewed from the greater than 20-year span did not uniformly indicate use of all now-possible LVDD-Grading methodology^25,26,73. However, the fundamental components⁹⁵ were considered adequate for the needed basic LVDD-Grading.

In conclusion, this work reveals that: (1) There is, in fact, a previously unverified significant direct relationship between CXR PVH-Staging and LVDD-Grading; (2) CXR PVH-Staging remains, however, clinically limited by experience-dependent intra-/inter-interpreter variabilities; (3) LVDD exclusion by PVH-Staging is adequate but its inclusion is not confidently reached until Restrictive Filling is present, thereby inhibiting its impact on recognition of milder or earlier phases of HF; although (4) Our AI-assisted CXR PVH-Ranking appears to exceed the sensitivity of human performance, which may facilitate earlier and widespread pathophysiological assessments in persons with suspected HF or at-risk of HF, potentially further enhanced when blended with other proposed screening indicators (e.g., AI-enabled ECG⁹⁶) in a multi-modal generative AI model⁹⁷ advancing the diagnostic needs of cardiometabolic care⁹⁸.

Methods

All methods were performed in accordance with the relevant guidelines and regulations.

Selection of study population

With approval by the Mayo Clinic Institutional Review Board, enterprise-wide data-mining of consecutive patients (spanning: 02/22/2003–08/15/2023) was completed using its shared Electronic Medical Record [EMR] system (Epic Systems, Verona, WI) to retrospectively identify potential candidates for the Study Population. Due to the retrospective nature of the study, the Mayo Clinic Institutional Review Board waived the need for obtaining informed consent from patients.

Screened personal EMRs reflected the highly multi-racial/cultural population of individuals drawn (locally, regionally, nationally, or internationally) to our integrated healthcare enterprise which consists of a multi-state network with three geographically dispersed quaternary-referral medical centers (located in Upper-Midwestern, Southeastern, and Southwestern regions of the United States), as well as > 70 Midwestern satellite hospitals or ambulatory clinics.

Suspected-LVDD subject identification and characterization

By electronic EMR searching, potential study subjects with suspected isolated-LVDD were first identified by having undergone within the same 24-hour period: (1) a DEcho examination confirming preserved LVEF ≥ 50% during a comprehensive LVDD evaluation^24,25,73 for characteristic symptoms (e.g., dyspnea on exertion)^2,6; and (2) a CXR examination. Preliminary filtering also included early exclusion of individuals with concurrent Chronic Kidney Disease at either Stage-5 or hemodialysis-dependent Stage-4 levels to avoid potentially confounding influences of cardiopulmonary volume-overloading and/or rapid dialysis-related fluctuations in LV-preload volume^89,90. From this initial filtered search, 15,880 potential subjects remained.

Next, the individual EMR of each of these potential subjects was manually screened (Author 1) for the presence of possible anatomical or physiological confounders of either the DEcho evaluation for isolated-LVDD or the CXR assessment of PVPs [Table 7]^{55,58,59,60,61,62,89,90}; those potential subjects affected by any of such possible confounders were excluded from further consideration. Last, the CXR images of still-eligible potential subjects were visually reviewed (Author 1) for: (1) Supine positioning (reducing gravitational contribution to normal pulmonary blood distribution)⁹⁴; or (2) Possible AI model-training confounders [Table 8] leading to spurious associations between characteristic cardiovascular materials and a disease type or severity during model training^91,92,93; the presence of either type of condition caused those affected to also be excluded. Ultimately, this extensive filtering resulted in marked reductions in the number of eligible subjects to 846 study subjects [Table 1], negatively impacting the representation of those with more-severe LVDD Grades, in which exclusion criteria (e.g., atrial fibrillation, indwelling infusion catheters) were more often met.

Table 7 Exclusion criteria - possible imaging anatomical or physiological confounders.

Table 8 Exclusion criteria - possible AI model-training confounders.

DEcho-examination results evaluation

All comprehensive DEcho-based LVDD evaluations represented were selected from examinations performed enterprise-wide by its Intersocietal Accreditation Commission (Ellicott City, MD)-accredited Echocardiology Laboratories to support clinical standard-of-care over the greater than 20-year span. During this period, varying generations of different DEcho system manufacturers and/or versions were used.

LVDD-Grading based on the final report of each DEcho examination (≤ 1 day from CXR), was confirmed or corrected (Authors 1,7,8) for each of the remaining 846 study subjects with suspected LVDD in the absence of systolic dysfunction. Using well-established criteria for Grading of LVDD, including standard E/A-ratio parameters^24,25,73, each subject was categorized as one of the following [Table 9]:

Table 9 Representative criteria for DEcho grading of LVDD.

Grade 0 (aka Normal Filling).
Grade 1 (aka Delayed Relaxation).
Grade 2 (aka Pseudo-Normal Filling).
Grade 3–4 (aka Restrictive Filling): Grade 3 (Reversible E/A ratio with patient Valsalva) and Grade 4 (Fixed E/A ratio despite patient Valsalva) were combined due to independently small subject numbers.

In addition, on an individual subject basis, DEcho determination (often corroborated by RHCath) of the severity of resulting PVH confirmed the concurrence of one of the following [Table 10]^26,74,99: (1) No PVH; (2) Insignificant PVH; (3) Significant PVH (i.e., isolated post-capillary PH); or (4) CpcPH.

Table 10 DEcho and RHCath measures of PVH severity.

Unsuspected-LVDD subject identification

After the exclusion of initially considered subjects based on the presence of LVDD-predisposing clinical factors (e.g., history of myocardial infarction), or for possible anatomical or physiological confounders of imaging (DEcho or CXR) [Table 7]^{55,58,59,60,61,62,89,90} or possible confounders of AI model-training [Table 8]^91,92,93, the following groups of subjects without suspected LVDD were added.

Healthy subjects

To support the possibility that a CXR-based PVP depicting baseline normal LV diastolic function was differentiable from a PVP associated with symptomatic suspected-LVDD but with DEcho Grade 0, a group of 750 asymptomatic and relatively Healthy subjects (no DEcho performed) was added (Author 1) [Table 1].

Group 1 PH subjects

In addition, to help guarantee that PVPs reflecting CpcPH were distinguishable from the Group 1 PH^74,75 (the other form of cardiopulmonary-derived pre-capillary PH), a group of 86 subjects with previously RHCath-confirmed (frequently corroborated by interval DEcho) Group 1 PH was also included (Authors 1,9) [Table 1]; in these patients, simultaneous PCWP measurements proved the absence of isolated post-capillary PH^26,74,99.

Supporting human-based CXR PVH-staging

CXR examinations and CXR-image reviewing

All CXR examinations of the 1,682 subjects in the Study Population [Table 1] were originally performed for clinical standard-of-care by direct-digital or computed radiography using varying generations of fixed and/or portable CXR systems from nine manufacturers operating enterprise-wide over the greater than 20-year span. Each digital CXR examination consisted of an upright or semi-upright Frontal view (postero-anterior or antero-posterior); 91% (1,524/1,682) were accompanied by a Lateral view.

The total of 3,206 CXR images were downloaded from the enterprise deconstructed Picture Archiving and Communication System, consisting of the: (1) Radiology Information System (Radiant from Epic Systems, Verona, WI); (2) Vendor Neutral Archive system (Synapse from TeraMedica/Fujifilm Medical Systems USA, Inc., Wauwatosa, WI); and (3) Viewer (Visage Imaging from Pro Medicus Ltd, Richmond, Australia) to a secure shared-drive. The shared-drive supported locally developed: (1) Graphical User Interface [GUI]¹⁰⁰ which allowed modifications of the underlying commercial software (MeVisLab from MeVis Medical Solutions AG, Bremen, Germany) for either bulk CXR image-reviewing or image-segmentation prior to AI-model training; and (2) Zero-footprint viewer (“CAII Viewer”)¹⁰¹ functioning on a backend database manager for CXR-image reviews to establish individual-Reviewer or consensus-HGT PVP assignments by the two Expert Reviewers, on one hand, or AI-model inference display for adjudication, on the other.

PVP assigning and PVH-staging

Initially, the de-identified CXR examinations were presented in random order for independent assessments (while blinded to all information regarding corresponding clinical status or DEcho findings) by four cardiothoracic-trained radiologists (Authors 1,4–6) to establish individual-Reviewer PVP assignments, and eventually consensus-HGT PVP assignments by the two most-experienced “Expert Reviewers”. CXR Reviewers 1–4 had different subspecialty experiences related to: (1) Training in CXR-based PVH-Staging during Radiology residency and/or Cardiothoracic Radiology fellowship; as well as (2) Years of post-training Cardiothoracic Radiology practice. These Reviewers included:

Reviewer 1 (Fellowship-level).
- Fellowship: 0.5-year.
- PVH-Staging training: Moderate residency/strong fellowship.
- Practice years: None.
Reviewer 2 (Early-Career).
- Fellowship: 1-year.
- PVH-Staging training: Mild residency/mild fellowship.
- Practice years: 4.
Reviewer 3/HGT Expert Reviewer (Mid-Career).
- Fellowship: 1-year.
- PVH-Staging training: Moderate residency/strong fellowship.
- Practice years: 14.
Reviewer 4/HGT Expert Reviewer (Late-Career).
- Fellowship: 2-year.
- PVH-Staging training: Strong residency/strong fellowship.
- Practice years: 37.

While blinded to associated clinical and DEcho information, each Reviewer independently assessed the CXR examinations to assign each examination one of the following 11 possible PVPs:

1: Normal.
2–5: PVH-Stages 1, 2-Early, 2-Late, or 3 alone^39,53,62,76 [Table 11] [Fig. 5].
6–9: PVH-Stages (above) + CpcPH (with superimposed main & central pulmonary artery dilatation)^56,57,76 [Fig. 6].
10: Group 1 PH (Main & central pulmonary artery dilatation without PVH, lung disease, etc.)^74,75 [Fig. 6].
11: Uncertain.

Table 11 CXR PVH-staging characteristics.

Evaluating intra-reviewer reliabilities and inter-expert reviewer concordance

After a four-week “washout” period (with interval re-randomization of examinations), independent CXR assessments were repeated by Reviewers 1–4; this supported the determination of intra-Reviewer reliability per subspecialty experience (training and practice). During a subsequent (≥ 2 weeks later) adjudication by each Reviewer of any personal-assignment inconsistencies between assessments (previous assignments provided for consideration, without restriction to their re-use), Reviewers 1–4 individually committed to their final PVP assignments, thereby facilitating the determination of inter-Reviewer reliability.

Determining final HGT PVP assignments

Last, to achieve HGT assignments of PVPs for the 1,682 CXR examinations, consensus between the Expert Reviewers was reached per examination by: (1) Initial concordance between final assignments; or (2) Subsequent final concordance during “face-to-face” review of discordant final assignments.

Correlating human-based CXR PVH-staging with DEcho LVDD-grading

HGT-based PVH-severity determinations by PVP assignments were used in determining the relationship between CXR-based PVH-Staging and DEcho-based LVDD-Grading.

Supporting AI-based CXR PVH-ranking

Creation of AI model for CXR assessment

Technical infrastructure

AI-model development utilized secure on-site and remote Graphics Processing Unit [GPU] (Nvidia, Santa Clara, CA)-dependent systems. Data curation and initial model development relied on: (1) One dual-GPU workstation (2 RTX 8000) with 96 GB video memory/128 GB system memory/12 TB disk storage/2 TB SSD drive for operating-system support (Windows 10); and (2) Two single-GPU workstations (each containing 1 RTX 8000) with 48 GB video memory/128 GB system memory/18 TB disk storage/2 TB SSD drive for operating-system support (Windows 10). For base-model training, a node (4xA100 80GB GPUs) from a 32-GPU high-performance cluster was utilized. For subsequent image-classification tasks, involving training, validation, and testing, a DGX A100 System (8xA100 40GB GPUs) was used.

Data curation/annotation

Initially hypothesizing that increasing DEcho LVDD-Grade would be tracked closely by increasing PVH Stage on CXR, relatively equal-sized numbers of symptomatic subjects with Grades 0–4 were anticipated during formation of the Study Population. Accordingly, relatively equivalent numbers of study subjects with DEcho Grades 0–2 were compiled, expecting corresponding incremental increases in milder PVH Stages. However, despite Grade 3 (Reversible) and Grade 4 (Fixed) subjects being combined, the cumulative number of study subjects representing Restrictive Filling remained relatively small, attributable to the greater frequency of confounding conditions causing subject exclusion.

While maintaining clinical significance, subject bundling across PVPs was eventually needed for optimization of multi-classifier creation. This bundling was justified by the following:

Unexpected imbalances in representations of the original 11 possible PVPs, including the segregation of those affected by CpcPH; nevertheless, the prevalence of CpcPH interpretations was recorded.
Very few representations of PVH Stage 3 (n = 7), insufficient for creation of a separate class for an already readily identifiable PVP (i.e., alveolar edema largely obscuring pulmonary vasculature); while included in multi-classifier training and validation, these subjects were omitted from model testing.

PVP identifier model components

Our cascading “PVP Identifier” [PVPI] incorporated the automatic consecutive application of the following two components (Authors 2,10).

“Thoracic-content segmentator” component

To avoid spurious associations related to extra-pulmonary characteristics (e.g., patient identification labels)^91,92,93, the 3,206 (1,682 Frontal + 1,524 Lateral) CXR images were manually segmented by an Expert Reviewer (Author 1) using a modification of the GUI¹⁰⁰. These segmentations followed the inner surface of the ribs and diaphragm to delineate the thoracic-cavity contents (e.g., lungs) from the apex to the posterolateral costophrenic sulci, using the L1-L2 disc space as a reference whenever they were obscured, such as by a pleural effusion¹⁰².

The resulting segmentations of the thoracic contents were used to create by supervised learning the Thoracic-Content Segmentator for automatic thoracic segmentation as the first PVPI component. The Thoracic-Content Segmentator was developed based on the DeepLabV3 semantic segmentation architecture¹⁰³, with ResNet-50 as a backbone¹⁰⁴. The final Dice coefficients^105,106 achieved by the Thoracic-Content Segmentator were 0.98 for Frontal views and 0.97 for Lateral views [Fig. 7].
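For readers less familiar with the overlap metric cited above, the Dice coefficient compares an automatic mask A with a reference mask B as 2|A∩B| / (|A| + |B|). The following is a minimal sketch of that calculation only; the function name `dice_coefficient` and the toy masks are illustrative assumptions and are not taken from the study pipeline.

```python
def dice_coefficient(pred, ref):
    """Dice overlap between two binary masks given as nested lists of 0/1."""
    pred_flat = [p for row in pred for p in row]
    ref_flat = [r for row in ref for r in row]
    if len(pred_flat) != len(ref_flat):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, r in zip(pred_flat, ref_flat) if p and r)
    total = sum(1 for p in pred_flat if p) + sum(1 for r in ref_flat if r)
    # Two empty masks are treated as perfect agreement by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 3x3 masks (hypothetical data, not from the study).
auto_mask = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
manual_mask = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
score = dice_coefficient(auto_mask, manual_mask)  # 2*3 / (4+3)
```

A value of 1.0 indicates identical masks, so coefficients of 0.97 to 0.98 correspond to near-complete overlap between automatic and manual segmentations.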

“PVP multi-classifier” component

The PVP Multi-Classifier, the second PVPI component, was developed using the DINO transformer architecture to: (1) Subdivide CXR-image data into patches of pixels; (2) Flatten patches into a sequence of visual vector tokens by linear transformation; and (3) Pass the token sequence into a transformer¹⁰⁷. During the model creation, a base model was initially developed using the publicly available MIMIC-CXR dataset comprised of 337,100 images¹⁰⁸, where a training loss of 1.7 was achieved with 200 epochs. The base model weights were then utilized to train an image-classification task^107,109 for final PVP Multi-Classifier creation using Thoracic-Content Segmentator output from the 3,206 CXR images.
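The three tokenization steps listed above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function name `image_to_patch_tokens`, the toy 4x4 image, and the fixed projection matrix are all hypothetical, and details of a real vision transformer (learned weights, positional embeddings, class token) are omitted.

```python
def image_to_patch_tokens(image, patch_size, weights):
    """Split a 2D image into non-overlapping patches and project each to a token.

    image: list of rows (H x W), with H and W divisible by patch_size.
    weights: projection matrix with patch_size**2 rows and D columns.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Step 1: cut out one patch and flatten it row by row.
            flat = [image[top + dy][left + dx]
                    for dy in range(patch_size)
                    for dx in range(patch_size)]
            # Step 2: linear transformation of the flat patch into a D-dim token.
            token = [sum(v * weights[i][d] for i, v in enumerate(flat))
                     for d in range(len(weights[0]))]
            tokens.append(token)
    # Step 3 (not shown): the token sequence is passed into the transformer.
    return tokens


# Toy 4x4 "image" with pixel values 0..15 and a hand-picked 4x2 projection
# that sums the top and bottom rows of each 2x2 patch (illustrative only).
toy_image = [[r * 4 + c for c in range(4)] for r in range(4)]
toy_weights = [[1, 0], [1, 0], [0, 1], [0, 1]]
tokens = image_to_patch_tokens(toy_image, 2, toy_weights)
```

In this toy case the 4x4 image yields four 2-dimensional tokens, one per patch, in row-major patch order.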

For creation of the PVP Multi-Classifier, a 5-fold stratified cross-validation approach (i.e., K-fold folded test set)¹¹⁰ was employed to rotate through training (60%), validation (20%), and testing (20%) subsets of input data after balancing of sample size per class via undersampling or oversampling. While supervised from the standpoint of PVP categorization, the PVP Multi-Classifier was developed in a relatively unsupervised fashion, free of any ground-truth expert delineation of the pulmonary vasculature itself. Despite the execution of multiple iterations based on use of non-bisected vs. bisected Frontal-view data and/or Lateral-view data, the non-bisected complete/full-resolution Frontal auto-segmentations alone provided the highest-yield input into the transformer.
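One way to picture the rotation through 60%/20%/20% subsets is to deal the samples of each class evenly into five folds and, on each rotation, use three folds for training, one for validation, and one for testing. The sketch below assumes exactly that scheme; the helper names `stratified_folds` and `rotate_splits`, the seed, and the toy labels are hypothetical, and the class balancing by undersampling or oversampling described above is not shown.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)  # deal class members round-robin
    return folds


def rotate_splits(folds):
    """Yield (train, val, test) index lists: 3 folds train, 1 validation, 1 test."""
    k = len(folds)
    for i in range(k):
        test = folds[i]
        val = folds[(i + 1) % k]
        train = [idx for j in range(k) if j not in (i, (i + 1) % k)
                 for idx in folds[j]]
        yield train, val, test


# Toy label set: 10 samples for each of four classes (hypothetical counts).
class_names = ["Normal", "PVH Stage 1", "PVH Stage 2+", "Group 1 PH"]
labels = [name for name in class_names for _ in range(10)]
folds = stratified_folds(labels, k=5, seed=0)
splits = list(rotate_splits(folds))
```

With this arrangement every sample appears in a test subset exactly once across the five rotations, which is the property that allows performance to be reported over the full data set.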

PVP-“state” assigning and PVH-ranking

The PVP Multi-Classifier was designed to differentiate between image-data characteristics based on a more basic physiological classification, as follows:

Normal.
PVH Stage 1 (Vascular redistribution without edema +/- CpcPH).
PVH “Stage 2+” (Vascular redistribution/congestion with predominantly interstitial edema +/- CpcPH).
Group 1 PH.

The PVPI was ultimately used to process the Frontal views from the same 1,682 CXR examinations to predict probabilities for these classes with either a 3-class output for PVH detection or a 4-class output for basic PVH-Ranking.
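The relationship between the two output formats can be illustrated with a small sketch. It assumes, purely for illustration, that a 3-class view is obtained by summing the two PVH probabilities of a 4-class prediction; the source does not state how the 3-class output was actually produced, and the names `collapse_to_three` and `top_class` and the probability values are hypothetical.

```python
CLASSES_4 = ["Normal", "PVH Stage 1", "PVH Stage 2+", "Group 1 PH"]


def collapse_to_three(probs_4):
    """Merge the two PVH classes of a 4-class probability vector into one.

    Assumption for illustration only: the 3-class view is a sum of the
    PVH Stage 1 and PVH Stage 2+ probabilities.
    """
    normal, stage1, stage2_plus, group1 = probs_4
    return {"Normal": normal, "PVH": stage1 + stage2_plus, "Group 1 PH": group1}


def top_class(prob_by_class):
    """Return the label with the highest predicted probability."""
    return max(prob_by_class, key=prob_by_class.get)


# Hypothetical prediction for a single Frontal view.
probs = [0.40, 0.35, 0.20, 0.05]
rank_label = top_class(dict(zip(CLASSES_4, probs)))   # 4-class PVH-Ranking
detect_label = top_class(collapse_to_three(probs))    # 3-class PVH detection
```

In this toy case the 4-class output favors Normal while the merged output favors PVH, which shows why detection and severity ranking can be reported separately.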

Correlating AI-based CXR PVH-ranking with DEcho LVDD-grading

Re-evaluation of unconfounded study population

As with the HGT PVP assignments, PVPI-derived predictions of PVP “State” were used in determining the relationship between CXR-based PVH-Ranking and DEcho-based LVDD-Grading.

Trialing in confounded test group

The PVPI was purposely designed without representation of implanted cardiovascular materials commonly found in the setting of HF (e.g., surgical items) [Table 8]. Therefore, to preliminarily gauge PVPI robustness, it was trialed in a Confounded Test Group of previously excluded subjects, each demonstrating on CXR an example of implanted cardiovascular material while otherwise meeting the same Study Population inclusion criteria.

Based on HGT analysis of CXR examinations, the 40 subjects (23 males and 17 females) in the Confounded Test Group included 10 examples each of Normal, PVH Stage 1, PVH Stage 2+, and Group 1 PH PVPs. Simultaneously, 30 examples of DEcho Grade 1 (N = 13), Grade 2 (N = 15), or Grade 3–4 (N = 2) LVDD, and 10 examples of RHCath-confirmed Group 1 PH, were represented.

For the assessment of overall PVPI performance in the Confounded Test Group, the following component evaluations were completed:

Thoracic-Content Segmentator: The proportions of satisfactory Frontal-view auto-segmentations without versus with perceived-needed expert modification were determined. Overall Dice coefficients for all 40 segmentations, as well as for perceived deficient segmentations, were calculated.
PVP Multi-Classifier: The accuracy of correct (i.e., highest probability) physiological assignment relative to HGT-assigned PVP was established. PVP Multi-Classifier-generated PVH predictions were again correlated with LVDD-Grading.

Evaluations, comparisons, and statistical analyses

No large language models were used in any portion of this research, including the preparation of this report. Statistical analyses were supervised internally (Author 3).

Using the ICC statistic with PVP-order ignored¹¹¹, both intra-Reviewer reliability (between first and second individual assignments) and inter-Expert Reviewer concordance (prior to HGT consensus) in the assignment of the original 11 possible PVPs by Reviewers 1–4 were evaluated. These included calculations of both overall agreement and agreement per PVP-characteristic; strength of agreement was indicated by the following ICC values: < 0.50 (Poor), 0.50–0.74 (Moderate), 0.75–0.89 (Good), or 0.90–1.00 (Excellent)¹¹¹.

The same evaluations of overall agreement were also performed with PVP-order considered using the Ktau¹¹². The Ktau expresses the following values for strength of agreement: 0.26–0.48 (Moderate), 0.49–0.70 (Strong) or 0.71–1.00 (Very Strong)¹¹³.
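A rank correlation of this kind can be sketched as follows. The example assumes the tau-b variant of Kendall's statistic, which adjusts for ties and therefore suits ordinal categories with many repeated values; the source does not specify which variant was used. The names `kendall_tau_b` and `ktau_strength`, the use of the absolute value when labeling strength, and the toy rankings are illustrative assumptions.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences of ordinal values."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: counted in neither term
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x) *
                      (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


def ktau_strength(tau):
    """Label strength using the bands quoted in the text (absolute value assumed)."""
    magnitude = abs(tau)
    if magnitude >= 0.71:
        return "Very Strong"
    if magnitude >= 0.49:
        return "Strong"
    if magnitude >= 0.26:
        return "Moderate"
    return "below listed bands"


# Hypothetical ordinal ranks for six subjects (not study data).
stage = [0, 0, 1, 1, 2, 2]
grade = [0, 1, 1, 2, 2, 3]
tau = kendall_tau_b(stage, grade)
```

For these toy ranks there are 10 concordant pairs, no discordant pairs, and five pairs tied on one variable, giving a tau-b of about 0.80.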

HGT assignments were used in evaluating the relationship between PVH-Staging and LVDD-Grading; for this purpose, Ktau methodology was again applied. In addition, to assess the impact of Reviewer-to-Reviewer PVH-Staging on this relationship, a Positive Likelihood Ratio [+LR] and a Negative Likelihood Ratio [-LR] were calculated for Reviewers 1–4, as well as for HGT; an LR is the ratio of the probability of the specific test result in cases with disease vs. the probability in cases without disease¹¹⁴, where LRs above 10 and below 0.10 are considered to provide strong evidence to rule in or rule out diagnoses, respectively¹¹⁵.
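The standard definitions are +LR = sensitivity / (1 − specificity) and −LR = (1 − sensitivity) / specificity. The sketch below applies them to a 2x2 table of counts; the function name `likelihood_ratios` and the counts are hypothetical and serve only to illustrate the thresholds of 10 and 0.10 mentioned above.

```python
def likelihood_ratios(tp, fp, fn, tn):
    """Positive and negative likelihood ratios from a 2x2 table of counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    false_positive_rate = fp / (tn + fp)   # equals 1 - specificity
    false_negative_rate = fn / (tp + fn)   # equals 1 - sensitivity
    # +LR: how much more often a positive result occurs with disease than without.
    pos_lr = (sensitivity / false_positive_rate
              if false_positive_rate else float("inf"))
    # -LR: how much more often a negative result occurs with disease than without.
    neg_lr = false_negative_rate / specificity if specificity else float("inf")
    return pos_lr, neg_lr


# Hypothetical counts: 50 subjects with disease, 100 without (not study data).
pos_lr, neg_lr = likelihood_ratios(tp=45, fp=4, fn=5, tn=96)
```

Here sensitivity is 0.90 and specificity is 0.96, so the +LR of 22.5 would count as strong rule-in evidence, whereas the −LR of about 0.104 falls just short of the 0.10 rule-out threshold.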

The overall performance of the PVP Multi-Classifier was expressed as both unbalanced and balanced average accuracies of the model^116,117 with an output of 3 classes (i.e., Normal vs. combined PVH Stages 1 & 2+ vs. Group 1 PH) to emphasize PVH detection, as well as an output of 4 classes (i.e., Normal vs. PVH Stage 1 vs. PVH Stage 2+ vs. Group 1 PH) to concentrate on PVH-severity ranking. For both outputs, normalized accuracies^117,118 were calculated to construct confusion matrices¹¹⁹.
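The distinction between unbalanced and balanced accuracy matters when classes differ in size. As a minimal sketch, the code below takes unbalanced accuracy to be the overall fraction correct, balanced accuracy to be the mean of per-class recalls, and a "normalized" confusion matrix to be one whose rows are divided by the number of true cases in each class; these are common conventions assumed here, not confirmed details of the study's analysis. All names and labels are hypothetical.

```python
def confusion_matrix(true_labels, pred_labels, classes):
    """Count matrix with rows = true class and columns = predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, pred_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def row_normalize(matrix):
    """Divide each row by its total so each row sums to 1 (per-class rates)."""
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in matrix]


def accuracies(matrix):
    """Return (unbalanced accuracy, balanced accuracy) from a count matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    recalls = [matrix[i][i] / sum(row) for i, row in enumerate(matrix) if sum(row)]
    return correct / total, sum(recalls) / len(recalls)


# Hypothetical 3-class example with unequal class sizes (not study data).
classes = ["Normal", "PVH", "Group 1 PH"]
true = ["Normal"] * 8 + ["PVH"] * 4 + ["Group 1 PH"] * 2
pred = (["Normal"] * 7 + ["PVH"]             # Normal: 7 correct, 1 called PVH
        + ["PVH"] * 2 + ["Normal"] * 2       # PVH: 2 correct, 2 called Normal
        + ["Group 1 PH"] + ["Normal"])       # Group 1 PH: 1 correct, 1 called Normal
counts = confusion_matrix(true, pred, classes)
normalized = row_normalize(counts)
unbalanced_acc, balanced_acc = accuracies(counts)
```

In this toy case overall accuracy is 10/14 (about 0.71) while balanced accuracy is 0.625, because the largest class is also the best-recognized one.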

PVP Multi-Classifier-derived PVH prediction (highest probability classification) and DEcho Grades were correlated using Ktau methodology. A comparison of this relationship with the corresponding relationship based on HGT assignments was then made.

A Chi-squared association test was used to confirm statistical significance [p < 0.050] of correspondence between described measures of CXR-based PVH and DEcho-based LVDD¹²⁰. On the other hand, a Fisher’s exact test was used to gauge the statistical significance of differences between the proportions of categories in HGT-derived vs. PVP Multi-Classifier-derived measures of CXR-based PVH and DEcho-based LVDD¹²⁰.
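To show how an exact test compares proportions, the sketch below computes a two-sided Fisher's exact p-value for the simplest case, a 2x2 table, by summing the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one. This is a simplification: the comparisons in this study may involve more than two categories, and the function name `fisher_exact_two_sided` and the example counts are illustrative assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is at least as unlikely as the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= observed * (1 + 1e-9))


# Hypothetical table: rows = two evaluation methods, columns = PVH vs. no PVH.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

For this small table the five possible top-left counts have probabilities 1/70, 16/70, 36/70, 16/70, and 1/70, so the two-sided p-value is 34/70 (about 0.49), well above the 0.050 significance threshold.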

Data availability

Data sets containing representative images and labels (re-sized and de-identified for downloading) used in this study are available in the supplementary zip file. While the full data sets are not publicly available due to institutional restrictions, researchers interested in accessing high-resolution images may contact the Center for Augmented Intelligence in Imaging of the Mayo Clinic Florida (email: erdal.barbaros@mayo.edu) for further information.

References

Angeja, B. G. & Grossman, W. Evaluation and management of diastolic heart failure. Circulation 107, 659–663 (2003).
AlJaroudi, W. A., Thomas, J. D., Rodriguez, L. L. & Jaber, W. A. Prognostic value of diastolic dysfunction: state of the art review. Cardiol. Rev. 22, 79–90 (2014).
Brucks, S. et al. Contribution of left ventricular diastolic dysfunction to heart failure regardless of ejection fraction. Am. J. Cardiol. 95, 603–606 (2005).
Fukuta, H. & Little, W. C. The cardiac cycle and the physiologic basis of left ventricular contraction, ejection, relaxation, and filling. Heart Fail. Clin. 4, 1–11 (2008).
Yancy, C. W. et al. 2013 ACCF/AHA guideline for the management of heart failure: A report of the American college of cardiology Foundation/American heart association task force on practice guidelines. J. Am. Coll. Cardiol. 62, e147–239 (2013).
Gaasch, W. H. & Zile, M. R. Left ventricular diastolic dysfunction and diastolic heart failure. Annu. Rev. Med. 55, 373–394 (2004).
Wan, S. H., Vogel, M. W. & Chen, H. H. Pre-clinical diastolic dysfunction. J. Am. Coll. Cardiol. 63, 407–416 (2014).
Nayor, M. et al. Left ventricular diastolic dysfunction in the community: impact of diagnostic criteria on the burden, correlates, and prognosis. J. Am. Heart Assoc. 7, e008291 (2018).
Redfield, M. M. et al. Burden of systolic and diastolic ventricular dysfunction in the community: appreciating the scope of the heart failure epidemic. JAMA 289, 194–202 (2003).
Kosmala, W. & Marwick, T. H. Asymptomatic left ventricular diastolic dysfunction: predicting progression to symptomatic heart failure. JACC Cardiovasc. Imaging. 13, 215–227 (2020).
Zile, M. R. & Brutsaert, D. L. New concepts in diastolic dysfunction and diastolic heart failure: part I: diagnosis, prognosis, and measurements of diastolic function. Circulation 105, 1387–1393 (2002).
Borlaug, B. A., Sharma, K., Shah, S. J. & Ho, J. E. Heart failure with preserved ejection fraction: JACC scientific statement. JACC 81, 1810–1834 (2023).
Young, K. A., Scott, C. G., Rodeheffer, R. J. & Chen, H. H. Progression of preclinical heart failure: A description of stage A and B heart failure in a community population. Circ. Cardiovasc. Qual. Outcomes. 14, 622–632 (2021).
Young, K. A. et al. Association of impaired relaxation mitral inflow pattern (grade 1 diastolic function) with long-term noncardiovascular and cardiovascular mortality. J. Am. Soc. Echocardiogr. 38, 367–377 (2025).
Kane, G. C. et al. Progression of left ventricular diastolic dysfunction and risk of heart failure. JAMA 306, 856–863 (2011).
Bello, H. et al. Hemodynamic determinants of age versus left ventricular diastolic function relations across the full adult age range. Hypertension 75, 1574–1583 (2020).
Okura, H. et al. Age- and gender-specific changes in the left ventricular relaxation: A doppler echocardiographic study in healthy individuals. Circ. Cardiovasc. Imaging. 2, 41–46 (2009).
Klein, A. L. et al. Effects of age on left ventricular dimensions and filling dynamics in 117 normal persons. Mayo Clin. Proc. 69, 212–224 (1994).
Lam, C. S. et al. Cardiac dysfunction and noncardiac dysfunction as precursors of heart failure with reduced and preserved ejection fraction in the community. Circulation 124, 24–30 (2011).
Vogel, M. W., Slusser, J. P., Hodge, D. O. & Chen, H. H. The natural history of preclinical diastolic dysfunction: A population-based study. Circ. Heart Fail. 5, 144–151 (2012).
Pieske, B. et al. How to diagnose heart failure with preserved ejection fraction: the HFA-PEFF diagnostic algorithm: A consensus recommendation from the heart failure association (HFA) of the European society of cardiology (ESC). Eur. Heart J. 40, 3297–3317 (2019).
Omote, K. et al. Central haemodynamic abnormalities and outcome in patients with unexplained dyspnoea. Eur. J. Heart Fail. 25, 185–196 (2023).
Reddy, Y. N. V., El-Sabbagh, A. & Nishimura, R. A. Comparing pulmonary arterial wedge pressure and left ventricular end diastolic pressure for assessment of left-sided filling pressures. JAMA Cardiol. 3, 453–454 (2018).
Reddy, Y. N. V., Carter, R. E., Obokata, M., Redfield, M. M. & Borlaug, B. A. A simple, evidence-based approach to help guide diagnosis of heart failure with preserved ejection fraction. Circulation 138, 861–870 (2018).
Smiseth, O. A. et al. Multimodality imaging in patients with heart failure and preserved ejection fraction: an expert consensus document of the European association of cardiovascular imaging. Eur. Heart J. Cardiovasc. Imaging. 23, e34–e61 (2022).
Nagueh, S. F. et al. Recommendations for the evaluation of left ventricular diastolic function by echocardiography: an update from the American society of echocardiography and the European association of cardiovascular imaging. J. Am. Soc. Echocardiogr. 29, 277–314 (2016).
Humbert, M. et al. 2022 ESC/ERS guidelines for the diagnosis and treatment of pulmonary hypertension. Eur. Heart J. 43, 3618–3731 (2022).
Obokata, M., Reddy, Y. N. V. & Borlaug, B. A. Diastolic dysfunction and heart failure with preserved ejection fraction: Understanding mechanisms by using noninvasive methods. JACC Cardiovasc. Imaging. 13, 245–257 (2020).
Lundqvist, C. B., Olsson, S. B. & Varnauskas, E. Transseptal left heart catheterization: A review of 278 studies. Clin. Cardiol. 9, 21–26 (1986).
Connolly, D. C., Kirklin, J. W. & Wood, E. H. The relationship between pulmonary artery wedge pressure and left atrial pressure in man. Circ. Res. 2, 434–440 (1954).
Thomas, L., Marwick, T. H., Popescu, B. A., Donal, E. & Badano, L. P. Left atrial structure and function, and left ventricular diastolic dysfunction: JACC state-of-the-art review. JACC 73, 1961–1977 (2019).
Terlizzi, V. D. et al. The atrioventricular coupling in heart failure: pathophysiological and therapeutic aspects. Rev. Cardiovasc. Med. 25, 169–181 (2024).
Nishimura, R. A. & Carabello, B. A. Hemodynamics in the cardiac catheterization laboratory of the 21st century. Circulation 125, 2138–2150 (2012).
Hemnes, A. R. et al. Features associated with discordance between pulmonary arterial wedge pressure and left ventricular end diastolic pressure in clinical practice: implications for pulmonary hypertension classification. Chest 154, 1099–1107 (2018).
Article PubMed PubMed Central Google Scholar
Heidenreich, P. A. et al. 2022 AHA/ACC/HFSA guideline for the management of heart failure: A report of the American college of Cardiology/American heart association joint committee on clinical practice guidelines. Circulation 145, e895–e1032 (2022).
PubMed Google Scholar
Bozkurt, B. et al. Universal definition and classification of heart failure: A report of the heart failure society of America, heart failure association of the European society of Cardiology, Japanese heart failure society and writing committee of the universal definition of heart failure: endorsed by the Canadian heart failure society, heart failure association of India, cardiac society of Australia and new Zealand, and Chinese heart failure association. Eur. J. Heart Fail. 23, 352–380 (2021).
Article PubMed Google Scholar
Patel, M. R. et al. 2013 ACCF/ACR/ASE/ASNC/SCCT/SCMR appropriate utilization of cardiovascular imaging in heart failure: A joint report of the American college of radiology appropriateness criteria committee and the American college of cardiology foundation appropriate use criteria task force. JACC 61, 2207–2231 (2013).
Article PubMed Google Scholar
McDonagh, T. A. et al. 2021 ESC guidelines for the diagnosis and treatment of acute and chronic heart failure. Eur. Heart J. 42, 3599–3726 (2021).
Article CAS PubMed Google Scholar
Elliott, L. P. Pulmonary vascularity on the plain chest radiograph. Cardiol. Clin. 1, 545–564 (1983).
CAS PubMed Google Scholar
Milne, E. N. Physiological interpretation of the plain radiograph in mitral stenosis, including a review of criteria for the radiological Estimation of pulmonary arterial and venous pressures. Br. J. Radiol. 36, 902–913 (1963).
Article CAS PubMed Google Scholar
Turner, A. F., Lau, F. Y. & Jacobson, G. A method for the Estimation of pulmonary venous and arterial pressures from the routine chest roentgenogram. Am. J. Roentgenol. Radium Ther. Nucl. Med. 116, 97–106 (1972).
Article CAS PubMed Google Scholar
Kostuk, W., Barr, J. W., Simon, A. L. & Ross, J. Jr. Correlations between the chest film and hemodynamics in acute myocardial infarction. Circulation 48, 624–632 (1973).
Article CAS PubMed Google Scholar
Baumstark, A. et al. Evaluating the radiographic assessment of pulmonary venous hypertension in chronic heart disease. Am. J. Roentgenol. 142, 877–884 (1984).
Article CAS Google Scholar
Herman, P. G. et al. Limited correlation of left ventricular end-diastolic pressure with radiographic assessment of pulmonary hemodynamics. Radiology 174, 721–724 (1990).
Article CAS PubMed Google Scholar
Chakko, S. et al. Clinical, radiographic, and hemodynamic correlations in chronic congestive heart failure: conflicting results May lead to inappropriate care. Am. J. Med. 90, 353–359 (1991).
Article CAS PubMed Google Scholar
Sharma, S., Bhargava, A., Krishnakumar, R. & Rajani, M. Can pulmonary venous hypertension be graded by the chest radiograph? Clin. Radiol. 53, 899–902 (1998).
Article CAS PubMed Google Scholar
Dash, H., Lipton, M. J., Chatterjee, K. & Parmley, W. W. Estimation of pulmonary artery wedge pressure from chest radiograph in patients with chronic congestive cardiomyopathy and ischaemic cardiomyopathy. Br. Heart J. 44, 322–329 (1980).
Article CAS PubMed PubMed Central Google Scholar
Costanzo, W. E. & Fein, S. A. The role of the chest X-ray in the evaluation of chronic severe heart failure: things are not always as they appear. Clin. Cardiol. 11, 486–488 (1988).
Article CAS PubMed Google Scholar
Jögi, J., Al-Mashat, M., Rådegran, G., Bajc, M. & Arheden, H. Diagnosing and grading heart failure with tomographic perfusion lung scintigraphy: validation with right heart catheterization. ESC Heart Fail. 5, 902–910 (2018).
Article PubMed PubMed Central Google Scholar
Studler, U. et al. Accuracy of chest radiographs in the emergency diagnosis of heart failure. Eur. Radiol. 18, 1644–1652 (2008).
Article PubMed Google Scholar
Nørgaard, H., Gjørup, T., Brems-Dalgaard, E., Hartelius, H. & Brun, B. Interobserver variation in the detection of pulmonary venous hypertension in chest radiographs. Eur. J. Radiol. 11, 203–206 (1990).
Article PubMed Google Scholar
Henriksson, L., Sundin, A., Smedby, O. & Albrektsson, P. Assessment of congestive heart failure in chest radiographs. Observer performance with two common film-screen systems. Acta Radiol. 31, 469–471 (1990).
Article CAS PubMed Google Scholar
Badgett, R. G., Mulrow, C. D., Otto, P. M. & Ramírez, G. How well can the chest radiograph diagnose left ventricular dysfunction? J. Gen. Intern. Med. 11, 625–634 (1996).
Article CAS PubMed Google Scholar
Borlaug, B. A. & Redfield, M. M. Diastolic and systolic heart failure are distinct phenotypes within the heart failure spectrum. Circulation 123, 2006–2013 (2011).
Article PubMed PubMed Central Google Scholar
Fonseca, C. et al. The value of the electrocardiogram and chest X-ray for confirming or refuting a suspected diagnosis of heart failure in the community. Eur. J. Heart Fail. 6, 807–812 (2004).
Article PubMed Google Scholar
Oudiz, R. J. Pulmonary hypertension associated with left-sided heart disease. Clin. Chest Med. 28, 233–241 (2007).
Article PubMed Google Scholar
Milne, E. N. Forgotten gold in diagnosing pulmonary hypertension: the plain chest radiograph. Radiographics 32, 1085–1087 (2012).
Article PubMed Google Scholar
Vucic, E., Chakhtoura, E. Y., Sohal, S. & Waxman, S. Pathophysiological concepts of constrictive pericarditis in cardiac imaging: back to basics. Circ. Cardiovasc. Imaging. 14, e012136 (2021).
Article PubMed Google Scholar
Grodecki, P. V. & Klein, A. L. Pitfalls in the echo-Doppler assessment of diastolic dysfunction. Echocardiography 10, 213–234 (1993).
Article CAS PubMed Google Scholar
Xiao, H. B., Lee, C. H. & Gibson, D. G. Effect of left bundle branch block on diastolic function in dilated cardiomyopathy. Br. Heart J. 66, 443–447 (1991).
Article CAS PubMed PubMed Central Google Scholar
Dickinson, M. G. et al. Atrial fibrillation modifies the association between pulmonary artery wedge pressure and left ventricular end-diastolic pressure. Eur. J. Heart Fail. 19, 1483–1490 (2017).
Article PubMed Google Scholar
Milne, E. N. Correlation of physiologic findings with chest roentgenology. Radiol. Clin. North. Am. 11, 17–47 (1973).
Article CAS PubMed Google Scholar
Kennedy, S., Simon, B., Alter, H. J. & Cheung, P. Ability of physicians to diagnose congestive heart failure based on chest X-ray. J. Emerg. Med. 40, 47–52 (2011).
Article PubMed Google Scholar
Feldmann, E. J., Jain, V. R., Rakoff, S. & Haramati, L. B. Radiology residents’ on-call interpretation of chest radiographs for congestive heart failure. Acad. Radiol. 14, 1264–1270 (2007).
Article PubMed Google Scholar
Przewlocka-Kosmala, M., Butler, J., Donal, E., Ponikowski, P. & Kosmala, W. Prognostic value of the MAGGIC Score, H₂FPEF Score, and HFA-PEFF algorithm in patients with exertional dyspnea and the incremental value of exercise echocardiography. J. Am. Soc. Echocardiogr. 35, 966–975 (2022).
Article PubMed Google Scholar
van de Bovenkamp, A. A. et al. Validation of the 2016 ASE/EACVI guideline for diastolic dysfunction in patients with unexplained dyspnea and a preserved left ventricular ejection fraction. J. Am. Heart Assoc. 10, e021165 (2021).
Article PubMed PubMed Central Google Scholar
Lavista Ferres, J. M., Fishman, E. K., Rowe, S. P. & Chu, L. C. Lugo-Fagundo, E. Artificial intelligence as a public service. JACR 20, 919–921 (2023).
PubMed Google Scholar
Farina, J. M. et al. Artificial intelligence-based prediction of cardiovascular diseases from chest radiography. J. Imaging. 9, 236–247 (2023).
Article PubMed PubMed Central Google Scholar
Seah, J. C. Y., Tang, J. S. N., Kitchen, A., Gaillard, F. & Dixon, A. F. Chest radiographs in congestive heart failure: visualizing neural network learning. Radiology 290, 514–522 (2019).
Article PubMed Google Scholar
Hirata, Y. et al. Deep learning for detection of elevated pulmonary artery wedge pressure using standard chest x-ray. Can. J. Cardiol. 37, 1198–1206 (2021).
PubMed Google Scholar
Bozkurt, B. et al. HF STATS 2024: heart failure epidemiology and outcomes statistics - An updated 2024 report from the heart failure society of America. J. Cardiac Fail. 31, 66–116 (2025).
Article Google Scholar
Beladan, C. C., Botezatu, S. & Popescu, B. A. Reversible left ventricular diastolic dysfunction-Overview and clinical implications. Echocardiography 37, 1957–1966 (2020).
Article PubMed Google Scholar
Xu, B. & Klein, A. L. Utility of echocardiography in heart failure with preserved ejection fraction. J. Card Fail. 24, 397–403 (2018).
Article PubMed Google Scholar
Galiè, N. et al. 2015 ESC/ERS guidelines for the diagnosis and treatment of pulmonary hypertension: the joint task force for the diagnosis and treatment of pulmonary hypertension of the European society of cardiology (ESC) and the European respiratory society (ERS). Eur. Respir J. 46, 903–975 (2015).
Article PubMed Google Scholar
Hassoun, P. M. Pulmonary arterial hypertension. N Engl. J. Med. 385, 2361–2376 (2021).
Article CAS PubMed Google Scholar
Downey, R., White, R. D. & Krasuski, R. A. Chest (ed ) Chest radiography: What the cardiologist needs to know. Chapter 36, 403–420. In: Griffin, B.P., Rimmerman, C.M., E.J., The Cleveland Clinic Cardiology Board Review. (Philadelphia: Lippincott Williams & Wilkins), (2007).
Cardinale, L., Priola, A. M., Moretti, F. & Volpicelli, G. Effectiveness of chest radiography, lung ultrasound and thoracic computed tomography in the diagnosis of congestive heart failure. World J. Radiol. 6, 230–237 (2014).
Article PubMed PubMed Central Google Scholar
Weil, B. R., Techiryan, G., Suzuki, G., Konecny, F. & Canty, J. M. Jr. Adaptive reductions in left ventricular diastolic compliance protect the heart from stretch-induced stunning. JACC Basic. Transl Sci. 4, 527–541 (2019).
Article PubMed PubMed Central Google Scholar
Fayyaz, A. U. et al. Global pulmonary vascular remodeling in pulmonary hypertension associated with heart failure and preserved or reduced ejection fraction. Circulation 137, 1796–1810 (2018).
Article PubMed Google Scholar
Rosenberg, M. A. & Manning, W. J. Diastolic dysfunction and risk of atrial fibrillation: A mechanistic appraisal. Circulation 126, 2353–2362 (2012).
Article PubMed Google Scholar
Gong, F. F., Campbell, D. J. & Prior, D. L. Noninvasive cardiac imaging and the prediction of heart failure progression in preclinical stage A/B subjects. JACC Cardiovasc. Imaging. 10, 1504–1519 (2017).
Article PubMed Google Scholar
Xie, G. Y. & Smith, M. D. Pseudonormal or intermediate pattern? JACC 39, 1796–1798 (2002).
Article PubMed Google Scholar
Grewal, J., McCully, R. B., Kane, G. C., Lam, C. & Pellikka, P. A. Left ventricular function and exercise capacity. JAMA 301, 286–294 (2009).
Article CAS PubMed PubMed Central Google Scholar
Chen, Z. et al. Exploring explainable AI features in the vocal biomarkers of lung disease. Comput. Biol. Med. 179, 108844. https://doi.org/10.1016/j.compbiomed.2024.108844 (2024).
Article CAS PubMed Google Scholar
Seemann, F. et al. Imaging gravity-induced lung water redistribution with automated inline processing at 0.55 T cardiovascular magnetic resonance. J. Cardiovasc. Magn. Reson. 24, 35–47 (2022).
Article PubMed PubMed Central Google Scholar
James, A. E. Jr., Cooper, M., White, R. I. & Wagner, H. N. Jr. Perfusion changes on lung scans in patients with congestive heart failure. Radiology 100, 99–106 (1971).
Article PubMed Google Scholar
Motoji, Y. et al. Interdependence of right ventricular systolic function and left ventricular filling and its association with outcome for patients with pulmonary hypertension. Int. J. Cardiovasc. Imaging. 31, 691–698 (2015).
Article PubMed Google Scholar
Winkler, T. et al. Perfusion imaging heterogeneity during NO inhalation distinguishes pulmonary arterial hypertension (PAH) from healthy subjects and has potential as an imaging biomarker. Respir Res. 23, 325–340 (2022).
Article CAS PubMed PubMed Central Google Scholar
Lim, K., McGregor, G., Coggan, A. R., Lewis, G. D. & Moe, S. M. Cardiovascular functional changes in chronic kidney disease: integrative physiology, pathophysiology and applications of cardiopulmonary exercise testing. Front. Physiol. 11, 572355, 1–14 (2020).
Article Google Scholar
Loutradis, C., Sarafidis, P. A., Papadopoulos, C. E., Papagianni, A. & Zoccali, C. The ebb and flow of echocardiographic cardiac function parameters in relationship to Hemodialysis treatment in patients with ESRD. J. Am. Soc. Nephrol. 29, 1372–1381 (2018).
Article CAS PubMed PubMed Central Google Scholar
Adeli, E. et al. Representation learning with statistical independence to mitigate bias. IEEE Winter Conf Appl Comput Vis. 2512–2522 (2021). (2021).
Zhao, Q., Adeli, E. & Pohl, K. M. Training confounder-free deep learning models for medical applications. Nat. Commun. 11, 6010, 1–9 (2020).
Article Google Scholar
DeGrave, A. J., Janizek, J. D. & Lee, S. I. AI for radiographic COVID-19 detection selects shortcuts over signal. MedRxiv Preprint Published with. https://doi.org/10.1038/s42256-021-0018-7 (2020). PMID: 32995822; PMCID: PMC7523163.
Article Google Scholar
Galvin, I., Drummond, G. B. & Nirmalan, M. Distribution of blood flow and ventilation in the lung: gravity is not the only factor. Br. J. Anaesth. 98, 420–428 (2007).
Article CAS PubMed Google Scholar
Andersen, O. S. et al. Estimating left ventricular filling pressure by echocardiography. JACC 69, 1937–1948 (2017).
Article PubMed Google Scholar
Lee, E. et al. Artificial intelligence-enabled ECG for left ventricular diastolic function and filling pressure. NPJ Digit. Med. 7 https://doi.org/10.1038/s41746-023-00993-7 (2024). PMID: 38182738; PMCID: PMC10770308.
Rao, V. M. et al. Multimodal generative AI for medical image interpretation. Nature 639, 888–896 (2025).
Article CAS PubMed ADS Google Scholar
Theodorakis, N., Nikolaou, M. & Krentz, A. cardiovascular-endocrine-metabolic medicine: proposing a new clinical sub-specialty amid the cardiometabolic pandemic. Biomolecules 15, 373–399 (2025).
Article CAS PubMed PubMed Central Google Scholar
Abbas, A. E. et al. A simple method for noninvasive Estimation of pulmonary vascular resistance. JACC 41, 1021–1027 (2003).
Article PubMed Google Scholar
Demirer, M. et al. A user interface for optimizing radiologist engagement in image data curation for artificial intelligence. Radiol. Artif. Intell. 1 (e180095), 1–7 (2019).
Google Scholar
White, R. D. et al. Pre-deployment assessment of an AI model to assist radiologists in chest X-ray detection and identification of lead-less implanted electronic devices for pre-MRI safety screening: realized implementation needs and proposed operational solutions. J. Med. Imaging (Bellingham). 9 (054504), 1–34 (2022).
Google Scholar
Restrepo, C. S. et al. The diaphragmatic Crura and retrocrural space: normal imaging appearance, variants, and pathologic conditions. Radiographics 28, 1289–1305 (2008).
Article PubMed Google Scholar
Chen, L. C., Papandreou, G., Schroff, F. & Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv. (2017). https://arxiv.org/abs/1706.05587
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. (2015). https://arxiv.org/abs/1512.03385 (2015).
Zou, K. H. et al. Statistical validation of image segmentation quality based on a Spatial overlap index. Acad. Radiol. 11, 178–189 (2004).
Article PubMed PubMed Central Google Scholar
Wilson, S. M., Bautista, A., Yen, M., Lauderdale, S. & Eriksson, D. K. Validity and reliability of four Language mapping paradigms. Neuroimage Clin. 16, 399–408 (2016).
Article PubMed PubMed Central Google Scholar
Caron, M. et al. Emerging properties in self-supervised vision Transformers. ArXiv https://doi.org/10.48550/ArXiv.2104.14294 (2021).
Article Google Scholar
Johnson, A., Pollard, T., Mark, R., Berkowitz, S. & Horng, S. MIMIC-CXR database. mimic.mit. (2021). https://mimic.mit.edu/docs/iv/modules/cxr/
Fernando Pérez-García, F. et al. RAD-DINO: Exploring scalable medical image encoders beyond text supervision. arXiv. (2024). https://doi.org/10.48550/arXiv.2401.10815
Bradshaw, T. J., Huemann, Z., Hu, J. & Rahmim, A. A guide to cross-validation for artificial intelligence in medical imaging. Radiol. Artif. Intell. 5, e220232 (2023).
Article PubMed PubMed Central Google Scholar
Koo, T. K. & Li, M. Y. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J. Chiropr. Med. 15, 155–163 (2016).
Article PubMed PubMed Central Google Scholar
Kendall, M. G. A new measure of rank correlation. Biometrika 30, 81–93. https://doi.org/10.2307/2332226 (1938). JSTOR.
Article Google Scholar
Wicklin, R. Weak or strong? How to interpret a Spearman or Kendall correlation. SAS. (2023). https://blogs.sas.com/content/iml/2023/04/05/interpret-spearman-kendall-corr.html
Sackett, D. L., Straus, S., Richardson, W. S., Rosenberg, W. & Haynes, R. B. (eds) Evidence-Based Medicine. How To Practise and Teach EBM 2nd edn 67–93 (Churchill Livingstone), 2000).
Jaeschke, R., Guyatt, G., Lijmer, J. & Diagnostic tests., 121–140. In: Guyatt, G. & Rennie, D., eds. Users’ Guides to the Medical Literature. (Chicago: AMA Press), (2002).
Brodersen, K. H., Ong, C. S., Stephan, K. E. & Buhmann, J. M. The balanced accuracy and its posterior distribution. IEEE Xplore. 20th International Conference on Pattern Recognition. (2010). https://ieeexplore.ieee.org/abstract/document/5597285 (2010).
Pedregosa, F. et al. Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
MathSciNet Google Scholar
Meyer-Baese, A. & Schmid, V. J. Pattern Recognition and Signal Analysis in Medical Imaging (Elsevier, 2014).
Erickson, B. J. & Kitamura, F. Magician’s corner: 9. Performance metrics for machine learning models. Radiol. Artif. Intell. 3 (e200126), 1–7 (2021).
Google Scholar
Kim, H. Y. Statistical notes for clinical researchers: Chi-squared test and fisher’s exact test. Restor. Dent. Endod. 42, 152–155 (2017).
Article PubMed PubMed Central Google Scholar

Download references

Author information

Authors and Affiliations

Division of Augmented Intelligence in Imaging, Mayo Clinic Florida, Jacksonville, FL, USA
Richard D. White, Mutlu Demirer, Ronnie A. Sebro & Barbaros S. Erdal
Division of Cardiothoracic Imaging, Department of Radiology, Mayo Clinic Florida, Jacksonville, FL, USA
Richard D. White, Isabel O. Cortopassi, Justin T. Stowell & Matthew R. McCann
Department of Radiology, Mayo Clinic Florida, 4456 San Pablo Road, Jacksonville, FL, 32224, USA
Richard D. White
Department of Cardiovascular Medicine, Mayo Clinic Arizona, Phoenix, AZ, USA
Timothy Barry & Christopher P. Appleton
Division of Pulmonary Medicine, Department of Medicine, Mayo Clinic Florida, Jacksonville, FL, USA
Scott A. Helgeson

Contributions

All authors read and contributed to the creation of this manuscript. Individual contributions follow:
RW: Project leader (e.g., idea development, study execution, results analysis, manuscript writing)/Primary image reviewer
MD: Data manager/Image-review interface developer/AI model processor/Technical text for manuscript
RS: Supervisor and performer of statistical analyses/Statistical text for manuscript
IC: Independent and Ground-Truth image reviewer/Manuscript reviewer
JS: Independent early-career image reviewer/Manuscript reviewer
MM: Independent trainee image reviewer/Manuscript reviewer
TB: Set standards for Echo/LVDD subject ID/Confirmed Echo Results per subject/Echo text for manuscript
CA: Set standards for Echo/LVDD subject ID/Confirmed Echo Results per subject/Echo text for manuscript
SH: Set standards for Right Heart Cath/PH subject ID/Confirmed Right Heart Cath results per subject/PH text for manuscript
BE: Primary Radiology-Informatics-AI Lead/Data mining/AI model training supervision/Technical text for manuscript

Corresponding author

Correspondence to Richard D. White.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1 (download ZIP)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

White, R.D., Demirer, M., Sebro, R.A. et al. Artificial intelligence improves detection and classification of pulmonary venous hypertension related to left ventricular diastolic dysfunction by chest radiography. Sci Rep 15, 38181 (2025). https://doi.org/10.1038/s41598-025-22026-x

Received: 19 June 2025
Accepted: 25 September 2025
Published: 31 October 2025
Version of record: 31 October 2025
DOI: https://doi.org/10.1038/s41598-025-22026-x
