Introduction

Dysfunction of the left ventricle during the typically longer diastolic filling phase of the cardiac cycle (from impaired muscle relaxation and/or chamber distention) is a pathophysiological manifestation of multiple conditions1,2. Left Ventricular Diastolic Dysfunction [LVDD] is ubiquitous in Heart Failure [HF] regardless of the LV Ejection Fraction [LVEF] during the systolic contraction phase1,2,3, and it plays a greater role than systolic dysfunction in determining baseline clinical capabilities and exercise-intolerance1,4.

From both functional and clinical standpoints, isolated-LVDD can precede HF. Its severity ranges from: (1) Preclinical asymptomatic, but at-risk (Stage A5) or mildly dysfunctional (Stage B5, found in 20–30% of the general population6,7,8,9,10), to (2) Symptomatic-LVDD (Stage C5, especially exercise-related6,11), to (3) LVDD-predominant HF (Stage D5, with morbidity and mortality resembling systolic HF6,12). The potential for progression along this pathophysiological and prognostic spectrum is well-recognized6,7,8,10,11,13,14,15.

Preclinical LVDD is associated with aging16 (especially in individuals ≥ 50 years old and/or of female sex7,14,17,18). These initially subclinical changes (affecting almost 40% of the elderly population19) and the prospects for symptom development (e.g., dyspnea on exertion)6,7 are amplified in persons with modifiable predisposing factors (e.g., hypertension7, obesity2, diabetes7, renal failure20).

At the other extreme, LVDD is an important component of HF with preserved LVEF ≥ 50% [HFpEF]12, an entity accounting for more than half of newly diagnosed HF cases. HFpEF diagnosis relies on evidence of: (1) HF-related signs and symptoms from high mean Left Atrial Pressure [mLAP] with “pulmonary congestion”6,21; (2) high LV End-Diastolic Pressure [LVEDP] (at rest or only during exercise6,22) from invasive Right Heart Catheterization [RHCath]12,23 or non-invasive transthoracic Doppler Echocardiography [DEcho]24,25,26; and/or (3) elevated serum B-type Natriuretic Peptide [BNP] levels12,21. Due to LVDD, the following may result: (1) LA enlargement with dysfunction; (2) Pulmonary Venous Hypertension [PVH], potentially achieving isolated post-capillary Pulmonary Hypertension [PH], due to LA-PV continuity; and (3) Combined pre-/post-capillary PH [CpcPH], causing vascular remodeling and eventually failure of the Right Ventricle [RV]27,28.

RHCath-based increased mean Pulmonary Capillary Wedge Pressure [mPCWP] (a surrogate for directly measured high LVEDP or mLAP1,4,29,30) indicates the cumulative hemodynamic burden of LVDD on both LA operating compliance12,31 and atrioventricular coupling23,32 causing mLAP elevation. Thus, while LVDD-related increases in LVEDP primarily reflect high LV preload (input volume or pressure) and reduced LV diastolic operating compliance (myocardial relaxation and inherent viscoelastic stretching)1,4,23,33, increased mLAP and mPCWP signify its net hemodynamic impact directed retrograde into the pulmonary circulation, including the tendency for PVH4,24,34,35. DEcho now dominates LVDD evaluations, with various parameters correlating closely with elevated mLAP predisposing to PVH26; although a multi-parameter DEcho examination delineates LVDD severity, no single parameter reliably indicates elevated LVEDP or mLAP levels per individual24,26.

Standard screening algorithms for the initial recognition of HF-related LVDD have relied on clinical history, physical findings, electrocardiographic changes, and BNP levels, but to a much lesser extent Chest X-Ray [CXR] results21,35. Nevertheless, despite very limited objective supportive evidence, a CXR has been consistently included in initial diagnostic approaches to suspected-HF based on expert consensus, with appropriateness justified by perceived value in: (1) Identifying other causes (e.g., emphysema) of HF-like symptoms; and (2) Recognizing undefined “pulmonary congestion”21,35,36,37,38.

CXR “pulmonary congestion” has been evaluated using systematic PVH-Staging (based on active superior redistribution or central distension of pulmonary vasculature, progressing to superimposed interstitial followed by alveolar pulmonary edema)39,40,41. PVH-Staging has been validated against catheter-determined increased mPCWP (or infrequently LVEDP) within a range of pathophysiological conditions (varying LV-preload, LA-operating, and/or LV-operating conditions or with confounding factors) with inconsistent corroborative results and without definitively establishing a direct relationship with LVDD40,42,43,44,45,46,47,48,49. Due to considerable CXR-interpretation variability, especially for characteristics of mild PVH50,51,52,53, consensus readings have been proposed43,44,53.

Thus, CXR-based PVH-Staging has yet to: (1) Be confirmed as directly indicating LVDD, the universal trait of HF1,2,3, especially in isolated-LVDD where ventricular size is relatively normal54 and unlikely a HF marker55; or (2) Address the appearance of PVH with secondary PH27,56,57. In addition, reported PVH-Staging results have not accounted for co-existent confounding causes of LVDD (e.g., conduction delay)55,58,59,60,61, on one hand, or lung conditions (e.g., fibrosis) altering pulmonary vasculature62, on the other. Last, PVH-Staging has relied on subjective CXR evaluations leading to intra-/inter-reader variability without delineation of expectations for interpreter qualifications influencing diagnostic success43,63,64.

Hence, the potential for CXR-based PVH-Staging to make a significant contribution to current diagnostic algorithms35,36 or scoring systems21,24,65 for the initial exclusion, versus detection with scaling, of LVDD has remained unanswered; ongoing limitations in screening approaches36,66 indicate persistent enhancement opportunities, possibly using PVH-Staging. It is particularly opportune, considering that a CXR remains: (1) The most widely accessible and frequently performed imaging examination worldwide; and (2) Inexpensive; these two attributes combined could facilitate early and widespread community-based application of PVH-Staging in suspected-HF, and possibly HF at-risk, cases. Also, despite regional differences in Cardiothoracic Radiology expertise, digital CXR formats and Artificial Intelligence [AI] processing now support more uniform and less human-dependent PVH-Staging67,68. While CXR-dependent AI models may help predict an elevated BNP level in CXR-identified congestive HF69 or help detect an elevated mPCWP value70, AI prediction of LVDD absence, versus its presence and/or severity independent of co-existent pathophysiological confounders55,58,59,60,61, has not been described.

Therefore, considering the worldwide epidemic of pre-HF and HF cases8,9,71, especially those emphasizing LVDD6,7,8,10,11,13,14,15 that, if recognized early (along with associated risk factors), might have its progression prevented or reversed2,28,72, optimized timely and widespread LVDD recognition is paramount. To that end, this study was designed to elucidate the potential for CXR-based PVH-Staging to identify levels of isolated-LVDD. Prevailing questions included: (1) When PVH-Staging is performed only by cardiothoracic radiologists, what intra-/inter-reader reliabilities remain and do they relate to reader experience?; (2) Does expert PVH-Staging confirm LVDD absence or, when detected, correctly track LVDD severity?; and (3) Can AI-assisted PVH predictions of LVDD degree at least match human performance, while eliminating human interpretation-reliability issues?

Results

Study population

The Study Population consisted of 1,682 subjects with or without suspected LVDD [Table 1] in the absence of possible anatomic or physiological imaging confounders or possible confounders of AI-model training (described in detail as exclusion criteria under Methods).

Table 1 Study population.

Suspected-LVDD subjects

The 846 subjects with characteristic symptoms (e.g., dyspnea on exertion) had undergone within the same 24-hour period: (1) a DEcho examination for LVDD which confirmed preserved LVEF ≥ 50%; and (2) a CXR examination [Table 1]. DEcho LVDD-Grading had categorized each subject as one of the following25,26,73:

  • Grade 0 (aka Normal Filling): N = 250.

  • Grade 1 (aka Delayed Relaxation): N = 253.

  • Grade 2 (aka Pseudo-Normal Filling): N = 252.

  • Grade 3–4 (aka Restrictive Filling, either Grade 3/Reversible or Grade 4/Fixed): N = 91.

Unsuspected-LVDD subjects

A group of 750 asymptomatic and largely healthy subjects [Healthy] (without DEcho examinations), all free of LVDD predisposing factors (e.g., renal failure2,7,20), was added [Table 1]. In addition, 86 subjects with RHCath-confirmed pre-capillary PH [Group 1 PH]74,75 were also included.

Human-based CXR PVH-staging vs. DEcho LVDD-grading

Reviewer-based assignment of PVP

Description

The digital CXR examination of each study subject was independently reviewed twice (separated in time) by four cardiothoracic-trained radiologists (varying in experience level) for the assignment of one of the following 11 possible Pulmonary Vascular Patterns [PVPs]:

  • 1: Normal.

  • 2–5: PVH-Stages 1, 2-Early, 2-Late, or 3 alone39,53,62,76.

  • 6–9: PVH-Stages (above) + CpcPH56,57,76.

  • 10: Group 1 PH74,75.

  • 11: Uncertain.

Human Ground Truth [HGT] PVP assignments (by consensus between the two most-experienced Expert Reviewers: Reviewer 3 and Reviewer 4) of PVH Stage were used in determining the relationship between PVH-Staging and LVDD-Grading.

Intra-reviewer reliabilities

Intra-Reviewer variability in assigning PVPs by Reviewers 1–4 ranged from 14.0 to 19.4%. Nevertheless, there was significant (p < 0.001) overall intra-Reviewer reliability by good agreement (Intra-Class Correlation [ICC] 0.78–0.81) with PVP-order ignored, and by strong agreement (Reviewers 1–3: Kendall rank correlation coefficient [Ktau] 0.55–0.67) or very strong agreement (Reviewer 4: Ktau 0.74) with PVP-order considered [Table 2].
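The two intra-Reviewer summaries reported here (the percentage of changed PVP assignments between reads, and an order-aware rank correlation) can be illustrated with a minimal sketch. It assumes the Ktau statistic is the tie-corrected Kendall tau-b, which this section does not state; the function names and the five example labels are hypothetical and are not study data.

```python
from math import sqrt


def percent_disagreement(read1, read2):
    """Fraction of examinations whose assigned label differs between two reads."""
    if len(read1) != len(read2) or not read1:
        raise ValueError("reads must be non-empty and of equal length")
    changed = sum(a != b for a, b in zip(read1, read2))
    return changed / len(read1)


def kendall_tau_b(x, y):
    """Kendall tau-b (tie-corrected) between two ordinal label sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                ties_x += 1  # pair tied in the first read
            if dy == 0:
                ties_y += 1  # pair tied in the second read
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n_pairs = n * (n - 1) // 2
    denom = sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one sequence is constant")
    return (concordant - discordant) / denom


# Hypothetical ordinal labels from two reads of the same five examinations.
first_read = [1, 2, 3, 4, 5]
second_read = [1, 3, 2, 4, 5]
disagreement = percent_disagreement(first_read, second_read)
tau = kendall_tau_b(first_read, second_read)
```

The pairwise loop is quadratic in the number of examinations, which is acceptable for a sketch; a library routine would be preferable for a cohort of this size.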

Table 2 Intra-reviewer reliability and inter-expert reviewer concordance in PVP assignment.

For intra-Reviewer reliability related exclusively to a pulmonary vascular abnormality [Table 2], significant (p < 0.001) values, increasing with Reviewer experience, applied to Group 1 PH detection (versus Normal) and its differentiation from PVH Stage 1; respectively, the two and three most-experienced Reviewers demonstrated progressing good-excellent consistency (ICC 0.77–0.95). While significant (p < 0.001), poor-moderate agreement (ICC 0.29–0.56) applied to PVH Stage 1 detection (versus Normal) across Reviewers 1–4.

For intra-Reviewer reliability related to pulmonary edema, a positive experience-to-consistency trend for edema detection (versus Normal, PVH Stage 1, or Group 1 PH) was noted [Table 2]. While significant (p < 0.001), the agreements were moderate (ICC 0.55–0.72) across Reviewers 1–4.

Regarding CpcPH detection, only the most-experienced Reviewer achieved significant (p < 0.001) moderate consistency (ICC 0.64) in the setting of PVH Stage 2 [Table 2].

Inter-expert reviewer concordance

While 18.3% of final PVP assignments by the two Expert Reviewers (i.e., Reviewer 3 and Reviewer 4) were discordant, there was significant (p < 0.001) overall agreement between them at good (ICC 0.76) to strong (Ktau 0.62) levels [Table 2].

Regarding inter-Expert Reviewer concordance related exclusively to a pulmonary vascular abnormality in the absence of edema, significant (p < 0.001) agreement at good (approaching excellent) levels applied to the detection of Group 1 PH (vs. Normal) (ICC 0.88) and its differentiation from PVH Stage 1 (ICC 0.86). In contrast, poor agreement (ICC 0.09) applied to the detection of PVH Stage 1 (vs. Normal).

However, regarding inter-Expert Reviewer concordance related to pulmonary edema, a significant (p < 0.001) agreement at a good level (ICC 0.78) was demonstrated for rating edema, once detected.

Final PVP assignments

Reviewers 1–4 finally assigned PVPs to be abnormal in 14.6–17.9% of Study Population CXR examinations, while 40.5% (682/1,682) of subjects had pathophysiology confirmed by DEcho or RHCath [Table 3].

Table 3 Final PVP assignments.

By consensus HGT assignment [Table 3], the prevalence of abnormal PVPs was 15.5%. Remarkably, HGT assignments of PVH Stage 1 (3.2%) were less common than those individually by Reviewers 1–4 (3.4–9.2%). On the other hand, HGT assignments of PVH Stage 2-Early without superimposed CpcPH (4.9%) were more common than individually by both Expert Reviewers (3: 2.0%, 4: 4.8%), while less common than by the other two Reviewers (1: 7.6%, 2: 5.1%). Similarly, HGT assignments of Group 1 PH (3.2%) were more common than those individually by Reviewers 1–4 (1.0–2.9%).

Correlating PVH-staging with LVDD-grading

Suspected absence of LVDD

When LVDD was not suspected (by lack of symptoms and predisposing factors) in Healthy subjects, HGT-assigned PVPs included: Normal 98.0%, PVH 1.7%, and Group 1 PH 0.3% [Fig. 1]. The designation of PVH applied to 13 subjects aged 46–83 (median 51) years old.

Fig. 1

HGT-Assigned CXR-Based PVH Stage vs. DEcho-Based LVDD Grade (or Confirmed Group 1 PH). For LVDD severity ranging from no suspected LVDD in Healthy subjects through DEcho LVDD Grade 0 to Grade 3–4, the prevalence of an HGT-assigned PVH PVP [dash-outlined box], at Stage 1 [light pink], Stage 2 [medium pink], and/or Stage 3 [red] alone or with superimposed CpcPH [purple], significantly increased while the assignment of a Normal PVP [green] correspondingly declined. For previously confirmed Group 1 PH, the correct PVP assignment [blue] predominated. CXR: Chest X-Ray, CpcPH: Combined pre-/post-capillary Pulmonary Hypertension, DEcho: Doppler Echocardiography, HGT: Human Ground Truth, LVDD: Left Ventricular Diastolic Dysfunction, mmHg: millimeters of mercury, N: Number of subjects, PCWP: Pulmonary Capillary Wedge Pressure, PH: Pulmonary Hypertension, PVH: Pulmonary Venous Hypertension, PVP: Pulmonary Vasculature Pattern.

Increasing LVDD grade

With progression from DEcho LVDD Grade 0 to Grade 3–4, a significant (p < 0.001) positive trend towards increasing HGT-assigned PVH Stage 1 to PVH Stage 3 PVPs was found in symptomatic subjects with suspected LVDD [Fig. 1].

With confirmed absence of LVDD based on Grade 0/Normal Filling, HGT-assigned PVPs indicated Normal in 94.0% and PVH in 5.6% [Fig. 1]. Despite presumed normal mLAP levels, a significantly (p < 0.050) higher PVH prevalence was found in symptomatic subjects with Grade 0 than in unsuspected-LVDD Healthy subjects. Despite this apparent PVH background, the assessment of likelihood of excluding DEcho Grade 1 to Grade 3–4 produced an HGT -LR value of 0.18 (Reviewers 1–4: 0.14–0.38), approximating a strong level of confidence (i.e., < 0.10) in favor of PVH-Staging to rule-out LVDD at rest, even in the presence of characteristic symptoms [Table 4].
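For readers less familiar with likelihood ratios, the following minimal sketch shows the standard dichotomous definitions, +LR = sensitivity / (1 − specificity) and -LR = (1 − sensitivity) / specificity. It is not necessarily the computation used for Table 4, which this section does not detail; the function name and the 2 × 2 counts are hypothetical and are not study data.

```python
def likelihood_ratios(tp, fp, fn, tn):
    """Return (+LR, -LR) from the four cells of a 2 x 2 diagnostic table."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased subject")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # +LR: how much more often a positive test occurs with disease than without.
    positive_lr = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    # -LR: how much more often a negative test occurs with disease than without.
    negative_lr = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return positive_lr, negative_lr


# Hypothetical counts: 90 true positives, 20 false positives,
# 10 false negatives, 80 true negatives.
positive_lr, negative_lr = likelihood_ratios(tp=90, fp=20, fn=10, tn=80)
```

With these hypothetical counts, sensitivity is 0.90 and specificity is 0.80, so the +LR is about 4.5 and the -LR about 0.125; by the thresholds quoted in the text, neither would count as strong evidence.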

Table 4 Likelihoods of identifying DEcho LVDD grade per reviewer or HGT.

Although a Normal PVP was expected to be highly prevalent in Grade 1/Delayed Relaxation due to presumed absence of any mLAP elevation, HGT-assigned PVPs indicated Normal in 87.3% of affected subjects [Fig. 1]; varying PVH PVPs, especially PVH Stage 2 in 9.5%, were assigned in the remaining 12.7%. This relatively higher PVH prevalence in Grade 1, compared to Healthy and/or Grade 0 conditions, was statistically significant (p < 0.050). However, neither -LR nor +LR values provided strong support to PVH-Staging to rule-out or rule-in Grade 1, respectively [Table 4].

With further progression of LVDD to Grade 2/Pseudo-Normal Filling with its anticipated mildly elevated mLAP levels, the prevalence of an HGT-assigned Normal PVP continued to significantly (p < 0.050) decrease to 64.7% with a corresponding additional significant increase in prevalence of PVH PVPs to 34.5%. This rise in PVH prevalence included increasing representation of PVH Stage 2 in 26.6%, with superimposed CpcPH accounting for 4.0%. Again, neither -LR nor +LR values provided strong evidence for PVH-Staging in ruling-out or ruling-in Grade 2, respectively [Table 4].

Last, the progression to Grade 3–4/Restrictive Filling, with its anticipated moderately elevated mLAP levels, was associated with another significant (p < 0.050) increase in PVH prevalence to 55.0% of affected subjects. Once more, the rise in PVH was characterized by increasing representation of PVH Stage 2 in 39.6% (superimposed CpcPH accounting for 15.4%) and PVH Stage 3 in 6.6% (including 2.2% with CpcPH). The assessment of likelihood of detecting the presence of Grade 3–4 produced an HGT +LR value of 55.5 (high individual Reviewer values of 10.1–64.6), indicating strong evidence (i.e., > 10.0) in favor of PVH-Staging to rule-in Grade 3–4 at rest [Table 4].
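To illustrate what LR values of this size imply for an individual, the sketch below applies the odds form of Bayes' theorem (post-test odds = pretest odds × LR). The two LR values are the HGT figures reported above; the pretest probability of 0.50 and the function name are assumptions chosen purely for illustration and are not estimates from this study.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a post-test probability via odds."""
    if not 0 < pretest_probability < 1:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0:
        raise ValueError("likelihood ratio must be non-negative")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


ASSUMED_PRETEST = 0.50  # hypothetical pretest probability, not a study value

# HGT -LR of 0.18 (exclusion of Grade 1 to Grade 3-4), as reported in the text.
after_normal_pvp = post_test_probability(ASSUMED_PRETEST, 0.18)
# HGT +LR of 55.5 (detection of Grade 3-4), as reported in the text.
after_pvh_pvp = post_test_probability(ASSUMED_PRETEST, 55.5)
```

Under this assumed pretest probability, the reported -LR lowers the probability to roughly 0.15, and the reported +LR raises it to roughly 0.98; different pretest probabilities would shift both figures.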

HGT results in group 1 PH

Correct HGT PVP assignments of Group 1 PH were made in 57.0% (mPAP 26–77/mean 49 mmHg), versus assignments of Normal in 31.4% (mPAP 27–70/mean 45 mmHg), of subjects with previously RHCath-confirmed Group 1 PH [Fig. 1]. In the remaining 11.6% of subjects, assignments of PVH Stage 2 were made (despite mPCWP 6–15/mean 10 mmHg), 4.6% represented by Stage 2 alone (mPAP 28–77/mean 60 mmHg) and 7.0% characterized by superimposed CpcPH (mPAP 46–82/mean 59 mmHg). Thus, an HGT assignment of a form of pre-capillary PH was made in 64% of Group 1 PH subjects. The high +LR values by both HGT (160.7) and individual Reviewers (162.1–423.0) indicated very strong confidence in identifying Group 1 PH by CXR; based on -LR values (0.43–0.80), its exclusion confidence was found to be equivocal [Table 4].

AI-based CXR PVH-ranking vs. DEcho LVDD-grading

AI-based assignment of “PVP”

Description

Our cascading two-component “PVP Identifier” [PVPI] AI model consisted of the: (1) Thoracic-Content Segmentator (for automatic thoracic-cavity segmentation); and (2) PVP Multi-Classifier to differentiate between image-data characteristics based on a four-class physiological pattern, as follows:

  • Normal.

  • PVH Stage 1 (Vascular redistribution without edema +/- CpcPH).

  • PVH “Stage 2+” (Vascular redistribution/congestion with predominantly interstitial edema +/- CpcPH).

  • Group 1 PH.

The PVPI was used in processing the same 1,682 CXR examinations to predict probabilities for these four classes. PVPI-derived CXR-based PVH-rank predictions were correlated with LVDD-Grading.

AI performance compared to reviewer performance

PVPI activity maps [Fig. 2] demonstrated an increasingly redistributed balance of activity (green) from lung bases to upper-lung regions with worsening HGT-assigned PVH Stage; for each example, the highest PVPI PVH-Ranking prediction (PVH Predict) corresponded well with HGT PVH-Staging assignment. However, with both the pulmonary vasculature and cardiac silhouette reflecting relative inactivity (blue shades), changing physiological “States” rather than PVPs were apparently recognized by the PVPI.

Fig. 2

PVPI activity maps.

For PVH detection, the PVPI achieved an overall accuracy of 0.89 and balanced accuracy of 0.86 by a 3-class output (i.e., Normal versus PVH Stages 1–2+ versus Group 1 PH), with near-equally strong normalized class accuracies for both Normal at 0.91 and PVH at 0.90 [Table 5].

Table 5 Performance of PVPI relative to HGT assignments of PVP.

On the other hand, for basic PVH-Ranking, the PVPI by a 4-class output (i.e., Normal versus PVH Stage 1 versus PVH Stage 2+ versus Group 1 PH) reached a lower overall accuracy of 0.79, and balanced accuracy of 0.72, with normalized class accuracies still strongest for Normal at 0.91, followed by PVH Stage 2+ at 0.84 [Table 5]. The intermediary PVH Stage 1 was predicted with a low normalized accuracy of 0.35, near-equivalent to PVH Stage 2+ misclassification at 0.38.
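The difference between overall and balanced accuracy can be made concrete with a short sketch. It assumes that "normalized class accuracy" means per-class recall (correct predictions divided by the number of reference cases in that class) and that balanced accuracy is the unweighted mean of those recalls; the function name and the confusion matrix are hypothetical and are not the Table 5 data.

```python
def classification_summary(confusion):
    """Summarize a square confusion matrix (rows: reference class, columns: predicted).

    Returns (overall accuracy, per-class recalls, balanced accuracy).
    """
    n_classes = len(confusion)
    if any(len(row) != n_classes for row in confusion):
        raise ValueError("confusion matrix must be square")
    row_totals = [sum(row) for row in confusion]
    if any(total == 0 for total in row_totals):
        raise ValueError("every reference class needs at least one case")
    correct = sum(confusion[i][i] for i in range(n_classes))
    overall_accuracy = correct / sum(row_totals)
    recalls = [confusion[i][i] / row_totals[i] for i in range(n_classes)]
    balanced_accuracy = sum(recalls) / n_classes
    return overall_accuracy, recalls, balanced_accuracy


# Hypothetical 3-class matrix with one small class (20 reference cases).
example_confusion = [
    [90, 8, 2],
    [10, 85, 5],
    [3, 4, 13],
]
overall, per_class, balanced = classification_summary(example_confusion)
```

In this hypothetical matrix the small third class is recognized least well, so balanced accuracy (about 0.80) falls below overall accuracy (about 0.85). Class imbalance of this kind is one reason the two figures can diverge.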

Correlating PVH-ranking with LVDD-grading

As with HGT PVH-Staging assignments, a significant (p < 0.001) and strongly positive trend towards increasing PVH-Ranking by the PVPI with progression from Grade 0 to Grade 3–4 was found in symptomatic subjects [Fig. 3]; compared to HGT PVH-Staging, the PVPI-related trend was significantly (p < 0.001) stronger both overall and at each LVDD Grade.

Fig. 3

PVPI CXR-Based PVH Rank vs. DEcho-Based LVDD Grade (or Confirmed Group 1 PH). For LVDD severity, ranging from no suspected LVDD in Healthy subjects through DEcho LVDD Grade 0 to Grade 3–4, the prevalence of a PVPI ranking of PVH level [dash-outlined box] as Stage 1 [light pink] or Stage 2 [medium pink] (but to a lesser degree incorrectly as Group 1 PH [blue]), quickly increased while the ranking as Normal [green] correspondingly declined rapidly. In the setting of previously confirmed Group 1 PH, the correct assignment predominated. CXR: Chest X-Ray, DEcho: Doppler Echocardiography, LVDD: Left Ventricular Diastolic Dysfunction, PH: Pulmonary Hypertension, PVH: Pulmonary Venous Hypertension, PVPI: Pulmonary Vasculature Pattern Identifier.

Suspected absence of LVDD

In the absence of suspected LVDD (i.e., in Healthy), the PVPI predicted Normal in 91.6% of subjects [Fig. 3] (vs. HGT 98.0%). Although still representing a very small proportion of this group, 4.8% (vs. HGT: 1.7%) had PVPI-recognized PVH.

Increasing LVDD grade

In contrast, with Grade 0/Normal Filling, the PVPI predicted Normal in only 30.4% (vs. HGT 98.2%) but PVH in 47.6% (including Stage 1: 29.6%, Stage 2+: 18%) [Fig. 3]. This relatively higher PVH prediction in suspected LVDD but Grade 0, compared to unsuspected LVDD (i.e., Healthy), was statistically significant (p < 0.001).

With Grade 1/Delayed Relaxation, PVPI prediction of PVH predominated at 70.7% overall (including Stage 2+: 39.1%, Stage 1: 31.6%), while Normal was predicted in 10.7% [Fig. 3]. Again, PVH prediction by the PVPI was significantly (p < 0.001) higher than that for Grade 0.

Advances to Grade 2/Pseudo-Normal Filling and then Grade 3–4/Restrictive Filling were tracked by rapid and significant (p < 0.001) additional stepwise increases in PVH predictions to 79.4% and 83.5% overall (including Stage 2+: 55.2% and 68.1%), respectively [Fig. 3].

Trialing in confounded test group

To initially gauge PVPI robustness, it was trialed in a Confounded Test Group of 40 previously excluded suspected-LVDD subjects, each demonstrating on CXR an example of implanted cardiovascular material, while otherwise meeting the inclusion criteria of Study Population subjects.

By Expert Reviewer adjudication, initial auto-segmentations of Frontal-view CXRs produced by the Thoracic-Content Segmentator were deemed adequate in 38 of 40 subjects, resulting in an overall Dice coefficient approaching 1.00 (i.e., 0.9995). For the remaining two auto-segmentations (truncated left hemithorax apex at clavicle containing orthopedic plate-screw device; blunted left costophrenic angle by moderate-sized pleural effusion) in which minor manual modifications were thought to be potentially needed, the Dice coefficients were initially 0.99.
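As a reminder of how a Dice coefficient compares an auto-segmentation with a reference mask, a minimal sketch follows: Dice = 2|A ∩ B| / (|A| + |B|). The function name, the tiny binary masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration; the study's segmentation pipeline is not reproduced here.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice overlap between two binary masks given as nested lists of 0/1."""
    flat_a = [bool(value) for row in mask_a for value in row]
    flat_b = [bool(value) for row in mask_b for value in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(a and b for a, b in zip(flat_a, flat_b))
    total = sum(flat_a) + sum(flat_b)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * intersection / total


# Hypothetical 2 x 3 masks: three foreground pixels each, two shared.
auto_mask = [[1, 1, 0],
             [0, 1, 0]]
reference_mask = [[1, 0, 0],
                  [0, 1, 1]]
dice = dice_coefficient(auto_mask, reference_mask)
```

Here the two masks share two of their three foreground pixels each, giving a Dice coefficient of 4/6, or about 0.67. Values near 1.00, as reported above, indicate that the auto-segmentation and the reference are almost identical.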

Evaluation of the subsequent 3-class and 4-class output performances by the PVP Multi-Classifier (using the unmodified auto-segmentation set) indicated a negative impact with overall accuracies declining to 0.73 and 0.58, and balanced accuracies to 0.68 and 0.58, respectively [Table 6]. The decrease was primarily attributable to recognition of a Normal “State” at 0.40 and PVH Stage 2+ “State” at 0.70; no association with the represented cardiovascular material types was identified.

Table 6 Performance of PVP multi-classifier in confounded test group.

Despite evidence of anatomical-confounder impairment of physiological-“State” differentiation, the PVP Multi-Classifier predictions of LVDD severity remained strong, with CXR PVH-Ranking of DEcho LVDD-Grading demonstrating a significantly positive (p < 0.001) relationship [Fig. 4]. Above Grade 1/Delayed Relaxation, only PVH “States” were detected, including a purely PVH Stage 2+ “State” for Grade 3–4/Restrictive Filling.

Fig. 4

PVPI CXR-Based PVH Rank vs. DEcho-Based LVDD Grade (or Confirmed Group 1 PH) in Confounded Test Group. For LVDD severity in the Confounded Test Group, ranging from DEcho LVDD Grade 0 to Grade 3–4, the prevalence of a PVPI ranking of PVH level [dash-outlined box] as Stage 1 [light pink] or Stage 2 [medium pink] (but to a lesser degree incorrectly as Group 1 PH [blue]), quickly increased while the ranking as Normal [green] correspondingly declined rapidly. In the setting of previously confirmed Group 1 PH, the correct assignment predominated. CXR: Chest X-Ray, DEcho: Doppler Echocardiography, LVDD: Left Ventricular Diastolic Dysfunction, PH: Pulmonary Hypertension, PVH: Pulmonary Venous Hypertension, PVPI: Pulmonary Vasculature Pattern Identifier.

Discussion

This work provides overdue insights into the implications of CXR “pulmonary congestion” in suspected HF. We believe it is the first to describe the: (1) Potential for systematic CXR-based PVH-Staging to make evidence-based contributions to current diagnostic algorithms35,36 or scoring systems21,24,66 in the initial exclusion versus detection with scaling of LVDD in asymptomatic pre-HF6,7,8,9,10, symptomatic pre/early HF6,11, or HFpEF6,12 conditions; (2) Relationship between PVH-Staging (presumably indicating mLAP reflecting both LA-operating compliance12,31 and atrioventricular coupling22,32) and LVDD-Grading (indicating LVEDP reflecting both LV preload and LV-operating compliance1,4,23); (3) Interpretation reliability of PVH-Staging by cardiothoracic radiologists with differing experience; and (4) Performance of completely AI-assisted CXR-based PVH-Ranking in predicting DEcho-determined LVDD Grade.

Although, unlike prior studies39,40,41,42,43,44,45,46,47,48,49, we restricted CXR interpretations to cardiothoracic radiologists, intra-/inter-interpreter variabilities were demonstrated during Reviewer assignments of one of 11 possible PVPs, including 9 PVH-related. Nevertheless, there was significant overall intra-Reviewer reliability. In addition, for Group 1 PH detection and its differentiation from PVH Stage 1, as well as for pulmonary edema detection, at least moderate intra-Reviewer reliability was achieved, with enhancement by Reviewer experience, as previously recognized by others63. Comparable inter-Expert Reviewer concordance for rating edema (i.e., early versus late interstitial) was also shown.

Like related studies50,51,52,53, we found intra-Reviewer and inter-Expert Reviewer identification of pulmonary vascular redistribution without edema (i.e., PVH Stage 1) to be unreliable; only the most-experienced Reviewer achieved moderate consistency. Thus, the application of PVH-Staging, especially in early phases preceding identifiable edema, would be limited by such dependency on experience; prior proposals for consensus interpretations43,44,51 constitute impractical solutions.

Difficulties recognizing a PVH Stage 1 PVP characterized by redistribution (with PCWP or mLAP 13–17 mmHg), representing a narrow LV-preload transition between a Normal PVP (with PCWP or mLAP 4–12 mmHg) and “pulmonary congestion” plus interstitial edema (with PCWP or mLAP 18–24 mmHg)39,53,62,76, have been previously reported43,45,47,51,52,55,77. Considering the diverse prior experience, the absence of a PVH Stage 1 PVP might only help exclude increased LV preload for HF pretest probability < 9%, whereas it might help confirm HF with pretest probability > 91%53; the prospects for PVH Stage 1 “State” recognition in isolated-LVDD with AI assistance are uniquely addressed in this work.

Based on HGT-assigned PVH-related PVPs, significant correlation between CXR PVH-Staging and LVDD-Grading in the setting of preserved LVEF was found.

Remarkably, low levels of PVH were recognized even in asymptomatic unsuspected-LVDD (i.e., Healthy) subjects. With their median age of 51 years old, detected PVH in half could have represented age-related (i.e., ≥ 50-year-old) LVDD7,17,18, but no DEcho data were available to test this conjecture.

Nevertheless, significantly increased PVH prevalence in symptomatic DEcho-confirmed Grade 0/Normal Filling suggested influences by fluctuating hemodynamics from episodic LVDD or atrial fibrillation6,18,22,78. While false-positive PVH recognition in Healthy and Grade 0 conditions is a possible explanation, generally low Reviewer sensitivities to PVH, especially to PVH Stage 1 (3.3% overall, 8.6% Stage 2), were observed in this study.

Despite presumably normal mLAP levels39,53,62,76, significantly higher PVH assignments were found in Grade 1/Delayed Relaxation, implicating hemodynamic lability with intermittent mLAP elevations or LVDD exacerbations6,18,22,78 and more prolonged or relatively fixed pulmonary vascular manifestations (e.g., distention)79. Thus, combined DEcho Grades 0–1 and CXR-indicated PVH might serve as a marker for unstable and/or worsening mild or early LVDD10,78,80,81.

As expected from mild mLAP elevation, Grade 2/Pseudo-Normal Filling was associated with significantly higher PVH prevalence, reflecting background PVH accentuated by chronically fluctuating hemodynamics22,78,80. Consequently, PVH on CXR might serve as an adjunct LVDD indicator for Grade 2 at rest, thereby obviating total dependency on stressing for its differentiation from Grade 082,83. Last, Grade 3–4/Restrictive Filling, with expected moderate mLAP elevation, demonstrated another significant increase in PVH prevalence involving slightly more than half of affected subjects.

However, while LR-based assessment of confidence in excluding DEcho-confirmed LVDD was approximated at Grade 0, adequate confidence in LVDD recognition was not achieved until Grade 3–4. Therefore, despite significant direct correlation between PVH-Staging and LVDD-Grading, human-based PVH assignments did not definitely predict incremental increases in LVDD severity. Accordingly, we saw justification for investigating the potential of AI to support CXR-based PVH evaluation in facilitating LVDD assessment or complementing DEcho-based LVDD examinations.

Our cascading two-component PVPI model includes the PVP Multi-Classifier, which was created to freely identify distinguishing anatomical and/or functional cardiopulmonary differences, hence our reference to a physiological “State”, rather than a PVP, of PVH. Considering that increasing HGT-assigned PVH PVPs were accompanied by progressing upper-lung redistributions of AI-model inference activity (without correspondence to cardiac or PVP silhouettes), PVP Multi-Classifier-identified parameter(s) likely reflected: (1) Subtle or complex variations in radiodensity not humanly detectable84; or (2) Physiological indicators of lung blood flow and/or water content85,86.

Nonetheless, compared to HGT-assigned PVH, the PVPI demonstrated significantly greater: (1) Sensitivity to PVH-related changes, starting in symptomatic proven-absent LVDD (Grade 0); and (2) Correlation between increases in ranked CXR PVH-“State” (especially PVH Stage 2+) and LVDD-Grading. Both should facilitate contributions by CXR-based PVH evaluation to initial LVDD recognition in suspected-HF individuals or those at-risk of HF, as well as help realize opportunities for complementing DEcho LVDD examinations, while eliminating issues in reliability of human-based CXR interpretation.

In Group 1 PH, PVPI performance exceeded that by HGT-assigned PVPs. Low-level PVH prevalences detected by both human-based and AI-based CXR evaluations are at least partly explained by: (1) An inclusion allowance of RHCath-confirmed mPCWP 13–14 mmHg, shared with PVH Stage 139,53,62,74,75,76; (2) Interdependence of RV systolic function and LV filling87; and (3) The prolonged RHCath-CXR periods, allowing interval PVH development. However, unlike HGT assignments, the PVPI exhibited a tendency towards Group 1 PH detection in symptomatic subjects, especially those with Grade 0. While, with few exceptions, affected subjects received HGT-assigned Normal PVP, this PVPI predilection may have reflected recognition of Group 1 PH with normal baseline spatial lung-perfusion heterogeneity, differentiable after vasodilatation88.

Limitations in this study are recognized. First, the relatively small numbers of subjects representing PVH Stages and LVDD Grades limited human-/AI-based evaluations, resulting from the purposeful exclusion of over 14,000 potential subjects to avoid possible: (1) anatomical/physiological confounders of DEcho-LVDD or CXR-PVP examinations55,58,59,60,61,62,89,90; or (2) AI model-training confounders91,92,93. Nevertheless, our stringent inclusion restrictions ensured the: (1) Needed assessment of a relationship between PVH-Staging and LVDD-Grading, and (2) Development of an AI model which focused only on that relationship. Consequently, our Subject Population uniquely reflected suspected HF with isolated-LVDD in the absence of systolic dysfunction. Thus, while a complete understanding of the robustness of our PVPI for AI-assisted objective CXR “pulmonary congestion” assessment awaits real-world deployment allowing for confounding factors, we performed initial trialing in the Confounded Test Group and showed that PVPI prediction remained strong.

In addition, semi-erect positioning during CXR examination was allowed since: (1) Basilar preferential lung flow is maintained even when supine94; and (2) Prior reports demonstrated no significant negative impact on PVH assessment with supine positioning50.

Last, the DEcho reports reviewed from the greater than 20-year span did not uniformly indicate use of all now-possible LVDD-Grading methodology25,26,73. However, the fundamental components95 were considered adequate for the needed basic LVDD-Grading.

In conclusion, this work reveals that: (1) There is, in fact, a previously unverified significant direct relationship between CXR PVH-Staging and LVDD-Grading; (2) CXR PVH-Staging remains, however, clinically limited by experience-dependent intra-/inter-interpreter variabilities; (3) LVDD exclusion by PVH-Staging is adequate, but its inclusion is not confidently reached until Restrictive Filling is present, thereby inhibiting its impact on recognition of milder or earlier phases of HF; although (4) Our AI-assisted CXR PVH-Ranking appears to exceed the sensitivity of human performance, which may facilitate earlier and widespread pathophysiological assessments in persons with suspected HF or at-risk of HF, potentially further enhanced when blended with other proposed screening indicators (e.g., AI-enabled ECG96) in a multi-modal generative AI model97 advancing the diagnostic needs of cardiometabolic care98.

Methods

All methods were performed in accordance with the relevant guidelines and regulations.

Selection of study population

With approval by the Mayo Clinic Institutional Review Board, enterprise-wide data-mining of consecutive patients (spanning: 02/22/2003–08/15/2023) was completed using its shared Electronic Medical Record [EMR] system (Epic Systems, Verona, WI) to retrospectively identify potential candidates for the Study Population. Due to the retrospective nature of the study, the Mayo Clinic Institutional Review Board waived the need for obtaining informed consent from patients.

Screened personal EMRs reflected the highly multi-racial/cultural population of individuals drawn (locally, regionally, nationally, or internationally) to our integrated healthcare enterprise which consists of a multi-state network with three geographically dispersed quaternary-referral medical centers (located in Upper-Midwestern, Southeastern, and Southwestern regions of the United States), as well as > 70 Midwestern satellite hospitals or ambulatory clinics.

Suspected-LVDD subject identification and characterization

By electronic EMR searching, potential study subjects with suspected isolated-LVDD were first identified by having undergone within the same 24-hour period: (1) a DEcho examination confirming preserved LVEF ≥ 50% during a comprehensive LVDD evaluation24,25,73 for characteristic symptoms (e.g., dyspnea on exertion)2,6; and (2) a CXR examination. Preliminary filtering also included early exclusion of individuals with concurrent Chronic Kidney Disease at either Stage-5 or hemodialysis-dependent Stage-4 levels to avoid potentially confounding influences of cardiopulmonary volume-overloading and/or rapid dialysis-related fluctuations in LV-preload volume89,90. From this initial filtered search, 15,880 potential subjects remained.

Next, the individual EMR of each of these potential subjects was manually screened (Author 1) for the presence of possible anatomical or physiological confounders of either the DEcho evaluation for isolated-LVDD or the CXR assessment of PVPs [Table 7]55,58,59,60,61,62,89,90; those potential subjects affected by any of such possible confounders were excluded from further consideration. Last, the CXR images of still-eligible potential subjects were visually reviewed (Author 1) for: (1) Supine positioning (reducing gravitational contribution to normal pulmonary blood distribution)94; or (2) Possible AI model-training confounders [Table 8] leading to spurious associations between characteristic cardiovascular materials and a disease type or severity during model training91,92,93; the presence of either type of condition caused those affected to also be excluded. Ultimately, this extensive filtering resulted in marked reductions in the number of eligible subjects to 846 study subjects [Table 1], negatively impacting the representation of those with more-severe LVDD Grades, in which exclusion criteria (e.g., atrial fibrillation, indwelling infusion catheters) were more often met.

Table 7 Exclusion criteria - possible imaging anatomical or physiological confounders.
Table 8 Exclusion criteria - possible AI model-training confounders.

DEcho-examination results evaluation

All comprehensive DEcho-based LVDD evaluations represented were selected from examinations performed enterprise-wide by its Intersocietal Accreditation Commission (Ellicott City, MD)-accredited Echocardiology Laboratories to support clinical standard-of-care over the greater than 20-year span. During this period, varying generations of different DEcho system manufacturers and/or versions were used.

LVDD-Grading based on the final report of each DEcho examination (within 1 day of CXR) was confirmed or corrected (Authors 1,7,8) for each of the remaining 846 study subjects with suspected LVDD in the absence of systolic dysfunction. Using well-established criteria for Grading of LVDD, including standard E/A-ratio parameters24,25,73, each subject was categorized as one of the following [Table 9]:

Table 9 Representative criteria for DEcho grading of LVDD.
  • Grade 0 (aka Normal Filling).

  • Grade 1 (aka Delayed Relaxation).

  • Grade 2 (aka Pseudo-Normal Filling).

  • Grade 3–4 (aka Restrictive Filling): Grade 3 (Reversible E/A ratio with patient Valsalva) and Grade 4 (Fixed E/A ratio despite patient Valsalva) were combined due to independently small subject numbers.

In addition, on an individual subject basis, DEcho determination (often corroborated by RHCath) of the severity of resulting PVH confirmed the concurrence of one of the following [Table 10]26,74,99: (1) No PVH; (2) Insignificant PVH; (3) Significant PVH (i.e., isolated post-capillary PH); or (4) CpcPH.

Table 10 DEcho and RHCath measures of PVH severity.

Unsuspected-LVDD subject identification

After the exclusion of initially considered subjects based on the presence of LVDD predisposing clinical factors (e.g., history of myocardial infarction), or for possible anatomical or physiological confounders of imaging (DEcho or CXR) [Table 7]55,58,59,60,61,62,89,90 or possible confounders of AI model-training [Table 8]91,92,93, the following groups of subjects without suspected LVDD were added.

Healthy subjects

To support the possibility that a CXR-based PVP depicting baseline normal LV diastolic function was differentiable from a PVP associated with symptomatic suspected-LVDD but with DEcho Grade 0, a group of 750 asymptomatic and relatively Healthy subjects (no DEcho performed) was added (Author 1) [Table 1].

Group 1 PH subjects

In addition, to help guarantee that PVPs reflecting CpcPH were distinguishable from the Group 1 PH74,75 (the other form of cardiopulmonary-derived pre-capillary PH), a group of 86 subjects with previously RHCath-confirmed (frequently corroborated by interval DEcho) Group 1 PH was also included (Authors 1,9) [Table 1]; in these patients, simultaneous PCWP measurements proved the absence of isolated post-capillary PH26,74,99.

Supporting human-based CXR PVH-staging

CXR examinations and CXR-image reviewing

All CXR examinations of the 1,682 subjects in the Study Population [Table 1] originally promoted clinical standard-of-care by direct-digital or computed radiography using varying generations of nine manufacturers of fixed and/or portable CXR systems operating enterprise-wide over the greater than 20-year span. Each digital CXR examination consisted of an upright or semi-upright Frontal view (postero-anterior or antero-posterior); 91% (1,524/1,682) were accompanied by a Lateral view.

The total of 3,206 CXR images were downloaded from the enterprise deconstructed Picture Archiving and Communication System, consisting of the: (1) Radiology Information System (Radiant from Epic Systems, Verona, WI); (2) Vendor Neutral Archive system (Synapse from TeraMedica/Fujifilm Medical Systems USA, Inc., Wauwatosa, WI); and (3) Viewer (Visage Imaging from Pro Medicus Ltd, Richmond, Australia) to a secure shared-drive. The shared-drive supported locally developed: (1) Graphical User Interface [GUI]100 which allowed modifications of the underlying commercial software (MeVisLab from MeVis Medical Solutions AG, Bremen, Germany) for either bulk CXR image-reviewing or image-segmentation prior to AI-model training; and (2) Zero-footprint viewer (“CAII Viewer”)101 functioning on a backend database manager for CXR-image reviews to establish individual-Reviewer or consensus-HGT PVP assignments by the two Expert Reviewers, on one hand, or AI-model inference display for adjudication, on the other.

PVP assigning and PVH-staging

Initially, the de-identified CXR examinations were presented in random order for independent assessments (while blinded to all information regarding corresponding clinical status or DEcho findings) by four cardiothoracic-trained radiologists (Authors 1,4–6) to establish individual-Reviewer PVP assignments, and eventually consensus-HGT PVP assignments by the two most-experienced “Expert Reviewers”. CXR Reviewers 1–4 had different subspecialty experiences related to: (1) Training in CXR-based PVH-Staging during Radiology residency and/or Cardiothoracic Radiology fellowship; as well as (2) Years of post-training Cardiothoracic Radiology practice. These Reviewers included:

  • Reviewer 1 (Fellowship-level).

    • Fellowship: 0.5-year.

    • PVH-Staging training: Moderate residency/strong fellowship.

    • Practice years: None.

  • Reviewer 2 (Early-Career).

    • Fellowship: 1-year.

    • PVH-Staging training: Mild residency/mild fellowship.

    • Practice years: 4.

  • Reviewer 3/HGT Expert Reviewer (Mid-Career).

    • Fellowship: 1-year.

    • PVH-Staging training: Moderate residency/strong fellowship.

    • Practice years: 14.

  • Reviewer 4/HGT Expert Reviewer (Late-Career).

    • Fellowship: 2-year.

    • PVH-Staging training: Strong residency/strong fellowship.

    • Practice years: 37.

While blinded to associated clinical and DEcho information, each Reviewer independently assessed the CXR examinations to assign each with one of the following 11 possible PVPs, including:

  • 1: Normal.

  • 2–5: PVH-Stages 1, 2-Early, 2-Late, or 3 alone39,53,62,76 [Table 11] [Fig. 5].

Table 11 CXR PVH-staging characteristics.
Fig. 5

Frontal Chest X-Ray images from 5 Study Population subjects represent the pulmonary vasculature patterns of Normal physiology and increasing Pulmonary Venous Hypertension [PVH]. They include [Table 11]: Normal: Mid-to-lower lung vascular predominance (A). PVH Stage 1: Vascular redistribution (aka “cephalization”) to upper lungs without pulmonary edema (B). PVH Stage 2-Early: Vascular redistribution/congestion with mild edema (perihilar peribronchial cuffing/haziness) (C). PVH Stage 2-Late: Central vascular congestion with moderate edema (perihilar haziness plus peripheral interlobular septal thickening (i.e., Kerley B Lines)) (D). PVH Stage 3: Central vascular congestion with severe edema (perihilar and lower-lung alveolar opacification (aka “batwing” pattern)) (E).

  • 6–9: PVH-Stages (above) + CpcPH (with superimposed main & central pulmonary artery dilatation)56,57,76 [Fig. 6].

Fig. 6

Frontal Chest X-Ray images from 2 Study Population subjects represent the PVPs of both forms of cardiopulmonary-derived Pre-Capillary PH, including: CpcPH: Main and central pulmonary artery dilatation superimposed on PVH (Stage 2-Late in this case) (A). Group 1 PH: Main and central pulmonary artery dilatation without PVH, lung disease, etc. (B). CpcPH: Combined pre-/post-capillary Pulmonary Hypertension, PH: Pulmonary Hypertension, PVH: Pulmonary Venous Hypertension, PVP: Pulmonary Vasculature Pattern.

  • 10: Group 1 PH (Main & central pulmonary artery dilatation without PVH, lung disease, etc.)74,75 [Fig. 6].

  • 11: Uncertain.

Evaluating intra-reviewer reliabilities and inter-expert reviewer concordance

After a four-week “washout” period (with interval re-randomization of examinations), independent CXR assessments were repeated by Reviewers 1–4; this supported the determination of intra-Reviewer reliability per subspecialty experience (training and practice). During a subsequent ( 2 weeks later) adjudication by each Reviewer of any personal-assignment inconsistencies between assessments (previous assignments provided for consideration, without restriction to their re-use), Reviewers 1–4 individually committed to their final PVP assignments, thereby facilitating the determination of inter-Reviewer reliability.

Determining final HGT PVP assignments

Last, to achieve HGT assignments of PVPs for the 1,682 CXR examinations, consensus between the Expert Reviewers was reached per examination by: (1) Initial concordance between final assignments; or (2) Subsequent final concordance during “face-to-face” review of discordant final assignments.

Correlating human-based CXR PVH-staging with DEcho LVDD-grading

HGT-based PVH-severity determinations by PVP assignments were used in determining the relationship between CXR-based PVH-Staging and DEcho-based LVDD-Grading.

Supporting AI-based CXR PVH-ranking

Creation of AI model for CXR assessment

Technical infrastructure

AI-model development utilized secure on-site and remote Graphics Processing Unit [GPU] (Nvidia, Santa Clara, CA)-dependent systems. Data curation and initial model development relied on: (1) One dual-GPU workstation (2 RTX 8000) with 96 GB video memory/128 GB system memory/12 TB disk storage/2 TB SSD drive for operating-system support (Windows 10); and (2) Two single-GPU workstations (RTX 8000), each with 48 GB video memory/128 GB system memory/18 TB disk storage/2 TB SSD drive for operating-system support (Windows 10). For base-model training, a node (4xA100 80GB GPUs) from a 32-GPU high-performance cluster was utilized. For subsequent image-classification tasks, involving training, validation, and testing, a DGX A100 System (8xA100 40GB GPUs) was used.

Data curation/annotation

Initially hypothesizing that increasing DEcho LVDD-Grade would be tracked closely by increasing PVH Stage on CXR, relatively equal-sized numbers of symptomatic subjects with Grades 0–4 were anticipated during formation of the Study Population. Accordingly, relatively equivalent numbers of study subjects with DEcho Grades 0–2 were compiled, expecting corresponding incremental increases in milder PVH Stages. However, despite Grade 3 (Reversible) and Grade 4 (Fixed) subjects being combined, the cumulative number of study subjects representing Restrictive Filling remained relatively small, attributable to the greater frequency of confounding conditions causing subject exclusion.

While maintaining clinical significance, subject bundling across PVPs was eventually needed for optimization of multi-classifier creation. This bundling was justified by the following:

  • Unexpected imbalances in representations of the original 11 possible PVPs, including the segregation of those affected by CpcPH; nevertheless, the prevalence of CpcPH interpretations was recorded.

  • Very few representations of PVH Stage 3 (n = 7), insufficient for creation of a separate class for an already readily identifiable PVP (i.e., alveolar edema largely obscuring pulmonary vasculature); while included in multi-classifier training and validation, these subjects were omitted from model testing.

PVP identifier model components

Our cascading “PVP Identifier” [PVPI] incorporated the automatic consecutive application of the following two components (Authors 2,10).

“Thoracic-content segmentator” component

To avoid spurious associations related to extra-pulmonary characteristics (e.g., patient identification labels)91,92,93, the 3,206 (1,682 Frontal + 1,524 Lateral) CXR images were manually segmented by an Expert Reviewer (Author 1) using a modification of the GUI100. These segmentations followed the inner surface of the ribs and diaphragm to delineate the thoracic-cavity contents (e.g., lungs) from the apex to the posterolateral costophrenic sulci, using the L1-L2 disc space as a reference whenever they were obscured, such as by a pleural effusion102.

The resulting segmentations of the thoracic contents were used to create by supervised learning the Thoracic-Content Segmentator for automatic thoracic segmentation as the first PVPI component. The Thoracic-Content Segmentator was developed based on the DeepLabV3 semantic segmentation architecture103, with ResNet-50 as a backbone104. The final Dice coefficients105,106 achieved by the Thoracic-Content Segmentator were 0.98 for Frontal views and 0.97 for Lateral views [Fig. 7].

Fig. 7

Automatic thoracic-content segmentation on the CXR Frontal view (Left) and the Lateral view (Right). The original expert ground-truth segmentations (yellow) and the model inference segmentations produced by the Thoracic-Content Segmentator (red) corresponded very closely.
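
As a reminder of what the reported Dice coefficients measure, the overlap between an expert mask and a model-inferred mask can be sketched as below. This is a minimal illustration only: the function name and the tiny toy masks are hypothetical and are not taken from the study, and returning 1.0 for two empty masks is a convention assumed here.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice = 2*|A and B| / (|A| + |B|) for two binary masks of equal shape."""
    if len(mask_a) != len(mask_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(mask_a, mask_b)
    ):
        raise ValueError("masks must have the same shape")
    intersection = size_a = size_b = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            size_a += 1 if a else 0
            size_b += 1 if b else 0
            intersection += 1 if (a and b) else 0
    if size_a + size_b == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / (size_a + size_b)


# Toy masks (hypothetical): 1 = pixel inside the thoracic-content segmentation.
expert = [[0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0]]
model = [[0, 1, 1, 0],
         [0, 1, 1, 1],
         [0, 1, 0, 0]]
print(round(dice_coefficient(expert, model), 3))
```

A value of 1.0 indicates identical masks and 0.0 indicates no overlap, so values of 0.97–0.98 correspond to near-complete agreement between the two outlines.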

“PVP multi-classifier” component

The PVP Multi-Classifier, the second PVPI component, was developed using the DINO transformer architecture to: (1) Subdivide CXR-image data into patches of pixels; (2) Flatten patches into a sequence of visual vector tokens by linear transformation; and (3) Pass the token sequence into a transformer107. During the model creation, a base model was initially developed using the publicly available MIMIC-CXR dataset comprised of 337,100 images108, where a training loss of 1.7 was achieved with 200 epochs. The base model weights were then utilized to train an image-classification task107,109 for final PVP Multi-Classifier creation using Thoracic-Content Segmentator output from the 3,206 CXR images.
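
The three tokenization steps listed above (patching, flattening, linear projection) can be illustrated with a small dependency-free sketch. The patch size, embedding width, and random projection weights below are assumptions chosen for readability, not the values used in the PVP Multi-Classifier; positional embeddings, the class token, and the transformer itself are omitted.

```python
import random


def patchify(image, patch):
    """Split a 2-D image (list of rows) into non-overlapping patch x patch
    blocks, each flattened row-major into a 1-D list."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([image[top + r][left + c]
                            for r in range(patch) for c in range(patch)])
    return patches


def linear_embed(patches, weights):
    """Project each flattened patch to a token vector (token = patch @ weights).
    `weights` has one row per patch pixel and one column per embedding dim."""
    columns = list(zip(*weights))
    return [[sum(p * w for p, w in zip(vector, column)) for column in columns]
            for vector in patches]


# Toy 4x4 "image" with pixel values 0..15 (hypothetical data).
image = [[4 * r + c for c in range(4)] for r in range(4)]
patches = patchify(image, patch=2)            # 4 patches of 4 pixels each

rng = random.Random(0)
embed_dim = 3                                 # assumed toy embedding width
weights = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(4)]
tokens = linear_embed(patches, weights)       # token sequence for a transformer
print(len(tokens), len(tokens[0]))
```

In a trained model the projection weights are learned rather than random; the sketch only shows how an image becomes a sequence of fixed-length vectors.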

For creation of the PVP Multi-Classifier, a 5-fold stratified cross-validation approach (i.e., K-fold folded test set)110 was employed to rotate through training (60%), validation (20%), and testing (20%) subsets of input data after balancing of sample size per class via undersampling or oversampling. While supervised from the standpoint of PVP categorization, the PVP Multi-Classifier was developed in a relatively unsupervised fashion, free of any ground-truth expert delineation of the pulmonary vasculature itself. Despite the execution of multiple iterations based on use of non-bisected vs. bisected Frontal-view data and/or Lateral-view data, the non-bisected complete/full-resolution Frontal auto-segmentations alone provided the highest-yield input into the transformer.
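
One way such a rotation could be organized is sketched below: samples are dealt into five class-stratified folds, and each rotation uses one fold for testing (20%), the next for validation (20%), and the remaining three for training (60%). The specific rotation rule, function names, and toy labels are assumptions for illustration; the class balancing by undersampling or oversampling described above is not shown.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Deal sample indices into n_folds folds, class by class, so that each
    fold keeps roughly the overall class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


def rotations(folds):
    """Yield (train, validation, test) index lists for each rotation.
    Assumed rule: fold k tests, fold k+1 validates, the rest train."""
    n = len(folds)
    for k in range(n):
        held_out = (k, (k + 1) % n)
        train = [i for j, fold in enumerate(folds) if j not in held_out
                 for i in fold]
        yield train, folds[held_out[1]], folds[held_out[0]]


# Toy labels (hypothetical): 4 classes with 10 samples each.
labels = [c for c in range(4) for _ in range(10)]
for train, validation, test in rotations(stratified_folds(labels)):
    print(len(train), len(validation), len(test))
```

Under this rule every sample appears in the test subset exactly once across the five rotations.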

PVP-“state” assigning and PVH-ranking

The PVP Multi-Classifier was designed to differentiate between image-data characteristics based on a more basic physiological classification, as follows:

  • Normal.

  • PVH Stage 1 (Vascular redistribution without edema +/- CpcPH).

  • PVH “Stage 2+” (Vascular redistribution/congestion with predominantly interstitial edema +/- CpcPH).

  • Group 1 PH.

The PVPI was ultimately used to process the Frontal views from the same 1,682 CXR examinations to predict probabilities for these classes with either a 3-class output for PVH detection or a 4-class output for basic PVH-Ranking.
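
The relationship between the two outputs can be pictured with the sketch below, which selects the highest-probability class and, as one possible construction, merges the two PVH classes by summing their probabilities. The text does not state how the 3-class output was actually produced, so `collapse_to_three` is an assumption, and the probabilities are invented for illustration.

```python
CLASSES_4 = ("Normal", "PVH Stage 1", "PVH Stage 2+", "Group 1 PH")


def predicted_state(probabilities):
    """Return the class with the highest predicted probability."""
    return max(probabilities, key=probabilities.get)


def collapse_to_three(probabilities):
    """Assumed mapping: merge PVH Stage 1 and Stage 2+ into one PVH class."""
    return {
        "Normal": probabilities["Normal"],
        "PVH": probabilities["PVH Stage 1"] + probabilities["PVH Stage 2+"],
        "Group 1 PH": probabilities["Group 1 PH"],
    }


# Hypothetical 4-class output for a single Frontal view.
example = {"Normal": 0.40, "PVH Stage 1": 0.30,
           "PVH Stage 2+": 0.25, "Group 1 PH": 0.05}
print(predicted_state(example))
print(predicted_state(collapse_to_three(example)))
```

With these invented numbers the 4-class ranking favors Normal while the merged 3-class view favors PVH, which is why detection and ranking are reported as separate outputs.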

Correlating AI-based CXR PVH-ranking with DEcho LVDD-grading

Re-evaluation of unconfounded study population

As with the HGT PVP assignments, PVPI-derived predictions of PVP “State” were used in determining the relationship between CXR-based PVH-Ranking and DEcho-based LVDD-Grading.

Trialing in confounded test group

The PVPI was purposely designed without representation of implanted cardiovascular materials commonly found in the setting of HF (e.g., surgical items) [Table 8]. Therefore, to preliminarily gauge PVPI robustness, it was trialed in a Confounded Test Group of previously excluded subjects, each demonstrating on CXR an example of implanted cardiovascular material while otherwise meeting the same Study Population inclusion criteria.

Based on HGT analysis of CXR examinations, the 40 subjects (23 males and 17 females) in the Confounded Test Group included 10 examples each of Normal, PVH Stage 1, PVH Stage 2+, and Group 1 PH PVPs. Simultaneously, 30 examples of DEcho Grade 1 (N = 13), Grade 2 (N = 15), or Grade 3–4 (N = 2) LVDD, and 10 examples of RHCath-confirmed Group 1 PH, were represented.

For the assessment of overall PVPI performance in the Confounded Test Group, the following component evaluations were completed:

  • Thoracic-Content Segmentator: The proportions of satisfactory Frontal-view auto-segmentations without versus with perceived-needed expert modification were determined. Overall Dice coefficients for all 40 segmentations, as well as for perceived deficient segmentations, were calculated.

  • PVP Multi-Classifier: The accuracy of correct (i.e., highest probability) physiological assignment relative to HGT-assigned PVP was established. PVP Multi-Classifier-generated PVH predictions were again correlated with LVDD-Grading.

Evaluations, comparisons, and statistical analyses

No large language models were used in any portion of this research, including the preparation of this report. Statistical analyses were supervised internally (Author 3).

Using the ICC statistic with PVP-order ignored111, both intra-Reviewer reliability (between first and second individual assignments) and inter-Expert Reviewer concordance (prior to HGT consensus) in the assignment of the original 11 possible PVPs by Reviewers 1–4 were evaluated. These included calculations of both overall agreement and agreement per PVP-characteristic; strength of agreement was indicated by the following ICC values: < 0.50 (Poor), 0.50–0.74 (Moderate), 0.75–0.89 (Good), or 0.90–1.00 (Excellent)111.

The same evaluations of overall agreement were also performed with PVP-order considered using the Ktau112. The Ktau expresses the following values for strength of agreement: 0.26–0.48 (Moderate), 0.49–0.70 (Strong) or 0.71–1.00 (Very Strong)113.
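
For readers unfamiliar with the statistic, a rank correlation of this kind can be computed as sketched below. The tau-b variant is assumed here because ordinal PVP or Grade assignments contain many ties; the text does not specify which variant was used, and the function names are hypothetical. The strength labels follow the ranges quoted above, with values below 0.26 left unlabelled because no label is given for them.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty)), where
    C/D count concordant/discordant pairs and Tx/Ty count pairs tied only
    in x or only in y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue              # tied in both rankings: ignored
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    denominator = sqrt((pairs + ties_x) * (pairs + ties_y))
    return (concordant - discordant) / denominator if denominator else float("nan")


def strength_of_agreement(tau):
    """Map a tau value to the labels quoted in the text (None if unlabelled)."""
    if tau >= 0.71:
        return "Very Strong"
    if tau >= 0.49:
        return "Strong"
    if tau >= 0.26:
        return "Moderate"
    return None


# Hypothetical ordinal assignments from two readings of the same examinations.
first_reading = [0, 0, 1, 1, 2, 2, 3]
second_reading = [0, 1, 1, 1, 2, 1, 2]
tau = kendall_tau_b(first_reading, second_reading)
print(round(tau, 3), strength_of_agreement(tau))
```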

HGT assignments were used in evaluating the relationship between PVH-Staging and LVDD-Grading; for this purpose, Ktau methodology was again applied. In addition, to assess the impact of Reviewer-to-Reviewer PVH-Staging on this relationship, a Positive Likelihood Ratio [+LR] and a Negative Likelihood Ratio [-LR] were calculated for Reviewers 1–4, as well as for HGT; an LR is the ratio of the probability of a specific test result in cases with disease vs. the probability of that result in cases without disease114, where LRs above 10 and below 0.10 are considered to provide strong evidence to rule in or rule out diagnoses, respectively115.
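
The two likelihood ratios follow directly from a 2 × 2 table of test result against disease status, as in the minimal sketch below. The counts are invented for illustration and are not study data; the function name is hypothetical.

```python
def likelihood_ratios(tp, fp, fn, tn):
    """Return (+LR, -LR) from 2x2 counts.
    +LR = sensitivity / (1 - specificity)
    -LR = (1 - sensitivity) / specificity"""
    sensitivity = tp / (tp + fn)            # P(test positive | disease)
    false_positive_rate = fp / (fp + tn)    # P(test positive | no disease)
    false_negative_rate = fn / (tp + fn)    # P(test negative | disease)
    specificity = tn / (fp + tn)            # P(test negative | no disease)
    positive_lr = (sensitivity / false_positive_rate
                   if false_positive_rate else float("inf"))
    negative_lr = (false_negative_rate / specificity
                   if specificity else float("inf"))
    return positive_lr, negative_lr


# Hypothetical counts: 100 subjects with disease, 100 without.
positive_lr, negative_lr = likelihood_ratios(tp=90, fp=5, fn=10, tn=95)
print(round(positive_lr, 2), round(negative_lr, 3))
```

In this invented example the +LR exceeds 10 (strong evidence to rule in), whereas the -LR sits just above 0.10 and so falls short of the threshold for strong evidence to rule out.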

The overall performance of the PVP Multi-Classifier was expressed as both unbalanced and balanced average accuracies of the model116,117 with an output of 3 classes (i.e., Normal vs. combined PVH Stages 1 & 2+ vs. Group 1 PH) to emphasize PVH detection, as well as an output of 4 classes (i.e., Normal vs. PVH Stage 1 vs. PVH Stage 2+ vs. Group 1 PH) to concentrate on PVH-severity ranking. For both outputs, normalized accuracies117,118 were calculated to construct confusion matrices119.
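
The distinction between unbalanced and balanced accuracy, and the construction of a normalized confusion matrix, can be sketched as follows. Row-wise normalization (per true class) is assumed here, and the labels are invented for illustration rather than drawn from the study.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows = true class, columns = predicted class (raw counts)."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, prediction in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[prediction]] += 1
    return matrix


def row_normalize(matrix):
    """Divide each row by its total so the diagonal holds per-class recall."""
    return [[count / sum(row) if sum(row) else 0.0 for count in row]
            for row in matrix]


def accuracies(matrix):
    """Return (unbalanced, balanced) accuracy.
    Unbalanced: correct / total. Balanced: mean of per-class recalls."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    recalls = [matrix[i][i] / sum(row) for i, row in enumerate(matrix) if sum(row)]
    return correct / total, sum(recalls) / len(recalls)


# Hypothetical, deliberately imbalanced 3-class example.
classes = ("Normal", "PVH", "Group 1 PH")
truth = ["Normal"] * 6 + ["PVH"] * 3 + ["Group 1 PH"]
predicted = ["Normal"] * 6 + ["PVH", "PVH", "Normal"] + ["Normal"]
matrix = confusion_matrix(truth, predicted, classes)
unbalanced, balanced = accuracies(matrix)
print(round(unbalanced, 3), round(balanced, 3))
```

Because the majority class dominates the unbalanced figure, the balanced accuracy is the more informative summary when class sizes differ.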

PVP Multi-Classifier-derived PVH prediction (highest probability classification) and DEcho Grades were correlated using Ktau methodology. A comparison of this relationship with the corresponding relationship based on HGT assignments was then made.

A Chi-squared association test was used to confirm statistical significance [p < 0.050] of correspondence between described measures of CXR-based PVH and DEcho-based LVDD120. On the other hand, a Fisher’s exact test was used to gauge the statistical significance of differences between the proportions of categories in HGT-derived vs. PVP Multi-Classifier-derived measures of CXR-based PVH and DEcho-based LVDD120.
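
Both tests operate on contingency tables of category counts. The sketch below computes the Chi-squared statistic for a general table and a two-sided Fisher's exact p-value for the simplified 2 × 2 case; the tables are invented, the function names are hypothetical, and converting the Chi-squared statistic to a p-value (which requires the Chi-squared distribution, normally supplied by a statistics package) is not shown.

```python
from math import comb


def chi_squared_statistic(table):
    """Sum over cells of (observed - expected)^2 / expected, with
    expected = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]:
    sum of hypergeometric probabilities no larger than the observed one."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def probability(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = probability(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    return sum(probability(x) for x in range(low, high + 1)
               if probability(x) <= observed * (1 + 1e-9))


# Hypothetical tables for illustration only.
print(round(chi_squared_statistic([[10, 20], [20, 10]]), 3))
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))
```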