Introduction

Pulmonary hypertension (PH) is a life-threatening disease characterized by progressive remodeling of the pulmonary vasculature, leading to right-sided heart failure and ultimately, death1. It affects approximately 1% of the global population and up to 10% of individuals aged >65 years, with a poor prognosis and low 5-year survival rate, underscoring the critical need for early detection2. However, PH diagnosis remains challenging due to its complex etiologies and insidious early symptoms, leading to diagnostic delays of 2.5 to nearly 4 years from symptom onset to confirmed diagnosis3,4.

Conventionally, the diagnosis of PH requires right heart catheterization (RHC) to measure the mean pulmonary artery pressure (mPAP) with a diagnostic threshold of greater than 20 mmHg1,5. While RHC is the gold standard, it is invasive, complex, and expensive, making it unsuitable for large-scale screening, especially in primary healthcare setting. In China, for instance, only 6.9% of hospitals are equipped to perform RHC, further limiting accessibility to definitive diagnosis6. Transthoracic echocardiography (TTE) is the primary noninvasive screening tool, providing estimations of pulmonary artery systolic pressure (PASP)7. However, TTE has limited sensitivity (0.88) and specificity (0.56), leading to potential false negatives and false positives8, which may delay diagnosis or prompt unnecessary RHCs. Its accuracy is further compromised by factors such as the absence of tricuspid regurgitation jet velocity, Doppler beam misalignment, severe tricuspid regurgitation, pulmonic stenosis, high cardiac output states, and arrhythmias9,10,11.

To overcome these challenges, a non-invasive and highly accurate screening method is urgently needed. Efforts have been made to develop such methods. For example, Schreiber et al.12 set 25 mmHg as the mPAP threshold and proposed a model utilizing oximetry and pulmonary function tests, which showed a sensitivity of 0.90 and specificity of 0.29 for PH. The DETECT algorithm, integrating clinical, pulmonary function, electrocardiogram (ECG), and TTE data, improved PH detection in systemic sclerosis patients, recommending RHC in 62% of cases and missing only 4% of PH diagnoses13.

Current approaches rely on single-modality data, which fail to integrate various types of data such as imaging and textual information. The accuracy of these approaches is often insufficient to meet clinical requirements and they lack extensive validation across diverse populations. Research has shown that combining multiple sources of clinical information can enhance diagnostic accuracy. Recent advancements in artificial intelligence (AI) have demonstrated significant potential in medical applications, particularly in the integration of multimodal data14,15,16,17,18. These AI technologies offer promising new approaches for improving PH detection and addressing its associated challenges.

In this study, we propose a multimodal fusion model for PH screening (MMF-PH) that incorporates the elements of non-invasiveness, high accuracy, and multimodality. By leveraging and integrating textual, tabular, and imaging data (Fig. 1, Supplementary Fig. 1), this model has the potential to be a more comprehensive detection tool for PH. In this study, we retrospectively analyzed 2,451 patients who underwent RHC. Training, validation, and testing of the model included comparison with a prospective dataset of 477 patients and an external dataset. In addition, the performance of the conventional TTE screening method was compared to evaluate its robustness across different subgroups and multiple medical centers. We hypothesized that the multimodal deep leaning approach could outperform these methods, particularly TTE, with respect to the identification of PH.

Fig. 1: Schematic illustration of the MMF-PH algorithmic framework for PH detection.
figure 1

The figure outlines the MMF-PH framework, tailored for detecting PH from data collected across multiple healthcare centers. The top tier (a) shows the data collection phase, with datasets compiled from three centers treating patients with suspected PH who underwent RHC, the gold standard for PH evaluation. These datasets comprise tabular data (demographics, clinical features, medical histories, laboratory results), textual data (notes from CXRs, electrocardiography, and echocardiography), and imaging data from CXRs. The middle tier (b) delineates the data preprocessing and feature extraction phase. Here, tabular data is transformed by a transformer encoder, textual data by BERT, and imaging data by ViT, each extracting pertinent features to create embeddings—a standardized format for subsequent analysis. The final tier (c) presents the fusion and model construction phase. Embeddings from various sources are merged using a self-attention mechanism, central to the multimodal fusion process of the MMF-PH. This comprehensive integration leads to the formulation of the MMF-PH. AP anteroposterior, BERT Bidirectional Encoder Representations from Transformers, CXR chest X-ray, MMF-PH multimodal fusion model for pulmonary hypertension screening, MLP multilayer perceptron, RHC right heart catheterization, ViT Vision Transformer.

Results

Study population

In this multicenter study, the test datasets comprised 1004 participants, distributed across retrospective (n = 491), prospective (n = 477), and external (n = 36) test datasets. The clinical characteristics of the study population are summarized in Table 1. Males constituted 27.1–41.7% of the total sample, and the mean age was 43–52 years. Smoking and alcohol use rates were lowest in the prospective dataset at 11.9% and 8.8%, respectively. In the external dataset, a higher proportion of individuals had hypertension (27.8%), while the average N-terminal pro B-type natriuretic peptide (NT-proBNP) level was lower.

Table 1 Clinical characteristics of all datasets

Model performance

Figure 2 summarizes the performance of MMF-PH across different test datasets. As shown in Fig. 2a, b, the area under the receiver operating characteristic curve (AUROC) values were 0.965 (95% confidence interval [CI]: 0.940–0.983) in the retrospective test dataset, 0.939 (95% CI: 0.894–0.971) in the prospective test dataset, and 0.814 (95% CI: 0.669–0.947) in the external dataset. The area under the precision-recall curve (AUPRC) values were 0.997 (95% CI: 0.995–0.999) in the retrospective test dataset, 0.991 (95% CI: 0.981–0.997) in the prospective test dataset, and 0.867 (95% CI: 0.716–0.960) in the external dataset (Table 2).

Fig. 2: Performance of the MMF-PH and TTE across different datasets.
figure 2

This figure presents the a AUROC curves and b AUPRC curves with 95% CIs in the retrospective, prospective, and external test datasets; and c, d the NPV and specificity of the MMF-PH and TTE in the three datasets, respectively. AUROC area under the receiver operating characteristic curve, AUPRC area under the precision-recall curve, CI confidence interval, MMF-PH multimodal fusion model for pulmonary hypertension screening, NPV negative predictive value, PH pulmonary hypertension, TTE transthoracic echocardiography.

Table 2 Performance of the MMF-PH

Comparison with TTE

For the comparison with TTE, we adjusted the model threshold to align its sensitivity with that of TTE and subsequently compared the specificity and negative predictive value (NPV). Across all three test datasets, MMF-PH demonstrated higher specificity and NPV compared to TTE, with the most substantial improvement observed in the prospective dataset (Fig. 2c, d). In the prospective test dataset, MMF-PH achieved a specificity of 0.848 compared to 0.652 for TTE, with an absolute improvement of 19.6% (95% CI: 2.3–34.1%). Similarly, MMF-PH exhibited a higher NPV of 0.557 vs. 0.492 for TTE, with an NPV improvement of 6.5% (95% CI: 0.6–11.9%). Detailed comparisons for all test datasets are provided in Supplementary Table 1. To ensure a fair comparison, we also aligned specificity between MMF-PH and TTE and assessed other metrics. In the prospective dataset, MMF-PH exhibited a 3.2 percentage point higher sensitivity (96.0% vs. 92.8%) and a 14.6 percentage point increase in NPV (63.8% vs. 49.2%) (details in Supplementary Table 2).

Additionally, to quantify overall diagnostic accuracy, MMF-PH demonstrated an odds ratio (OR) of 2.971 (95% CI: 1.134–8.444, p < 0.001) in the prospective dataset, indicating a significantly higher likelihood of correct classification compared to TTE. The ORs for the retrospective and external datasets were 1.615 (95% CI: 0.686–12.221, p < 0.001) and 1.630 (95% CI: 0.374–16.197, p < 0.001), respectively (Supplementary Table 3).

Ablation study

To verify the necessity of each module in the proposed MMF-PH (Denoising Autoencoder-Bidirectional Encoder Representations from Transformers-Vision Transformer [DAE-BERT-ViT]), we conducted an ablation study using an external test dataset, as shown in Fig. 3. We compared the performance of the MMF-PH with those of three baselines: DNN-BERT-ViT, DAE-LSTM-ViT, and DAE-BERT-EfficientNet. Each baseline replaced a specific component of the MMF-PH: DNN replaced DAE, LSTM replaced BERT, and EfficientNet replaced ViT.

Fig. 3: Ablation study of the MMF-PH, replacing individual components.
figure 3

This figure presents the results of the ablation study on the MMF-PH by replacing individual components in external test datasets. The performance metrics shown include the AUROC and the AUPRC with 95% CIs. AUPRC area under the precision-recall curve, AUROC area under the receiver operating characteristic curve, CI confidence interval, MMF-PH multimodal fusion model for pulmonary hypertension screening.

The results clearly show that the proposed MMF-PH achieved a larger AUROC and AUPRC than those of the three baselines. Specifically, the MMF-PH achieved an AUROC of 0.814 and an AUPRC of 0.867, significantly outperforming DNN-BERT-ViT (AUROC: 0.690, AUPRC: 0.746), DAE-LSTM-ViT (AUROC: 0.793, AUPRC: 0.842), and DAE-BERT- EfficientNet (AUROC: 0.777, AUPRC: 0.837). These results underscore the critical contribution of each module to the MMF-PH. Notably, the DAE module played a particularly important role, as reflected by the poor performance of the DNN-BERT-ViT model when the DAE was replaced by the DNN. The superior performance of the MMF-PH highlights the unique contributions of the DAE, BERT, and ViT modules to the overall model performance. Additional performance metrics and results for the other datasets are provided in Supplementary Table 4, which confirms the substantial advantages of the proposed MMF-PH.

Examples of model interpretability

MMF-PH demonstrated lower misdiagnosis rates than TTE across all test datasets, with fewer false positives and false negatives (Supplementary Table 5). We illustrate how MMF-PH processes and integrates multimodal data to arrive at its predictions on two selected patients (Fig. 4).

Fig. 4: Examples of model interpretability.
figure 4

a A patient misdiagnosed by ultrasound (TTE positive, RHC negative). b A patient in whom ultrasound resulted in misdiagnosis (TTE negative, RHC positive). CXR chest X-ray, ECG electrocardiography, MMF-PH multimodal fusion model for pulmonary hypertension screening, NT-proBNP N-terminal pro B-type natriuretic peptide, PA pulmonary artery, RHC right heart catheterization, RA right atrium, RV right ventricle, TTE transthoracic echocardiography.

The first case involved a patient misdiagnosed by ultrasound (i.e., TTE positive, RHC negative). The TTE report indicated mild tricuspid regurgitation and an elevated estimated PASP. However, the MMF-PH predicted a negative result for PH. Figure 4a illustrates the data aggregation granularity, highlighting the contribution of different data modalities to the model predictions. In this case, tabular data made the highest contribution (52%), suggesting that structural information contained the most relevant cues for PH prediction. In specific, the patient’s NT-proBNP level was 110.9 pg/mL. TTE revealed no signs of right-sided heart enlargement or failure. Chest X-ray (CXR) revealed a cardiothoracic ratio of 0.45 and no indications of pulmonary artery prominence, central pulmonary artery dilation, or right ventricular enlargement. These structured features, which are highly indicative of non-PH, provided critical information for the model. The final RHC test confirmed a normal mean PAP. This case demonstrates the ability of the MMF-PH to integrate multiple data sources to provide a more accurate diagnosis and effectively address the limitations of TTE in certain clinical scenarios.

The second case (Fig. 4b) involved a patient in whom ultrasonography failed to diagnose PH, presenting as TTE-negative but RHC-positive. TTE did not detect elevated PASP, but the MMF-PH predicted a high probability of PH. The pie chart in Fig. 4b illustrates the contribution of different data modalities to the model’s prediction. In addition to the significant contributions of textual data, structured and imaging data contributed 19% and 11%, respectively. Regarding textual data, the model focused on descriptions of the patient’s symptoms such as chest tightness, palpitations, and dyspnea. In the TTE report, the model focused on terms such as right heart enlargement and right ventricular wall thickening, which are key clinical indicators of PH. CXR revealed significant right heart enlargement, central pulmonary artery dilation, and pulmonary artery prominence. The structured data, including ECG results, corroborated these findings, providing clear indications of PH. These findings demonstrate the model’s ability to integrate multiple modalities of patient information, mirroring the reasoning process of clinical experts in actual diagnostic scenarios, thus offering a more accurate conclusion than TTE alone.

Subgroup analysis and comparisons with TTE

The performance of the MMF-PH was evaluated for three binary partitions by sex (female, male), age (<65, ≥65 years), and BMI (<25, ≥25 kg/m2), as well as different PH subtypes, to assess its robustness and reliability (Fig. 5). Regarding demographic subgroups (Fig. 5a–c), the performance of the MMF-PH was compared with that of TTE across three categories: sex, age, and body mass index (BMI) classification. The MMF-PH showed a higher specificity across most subgroups when the sensitivities were aligned. For instance, the model maintained robust performance with a specificity improvement of 5–35% compared to TTE in women and patients aged <65 years. In the disease-specific subgroups (Fig. 5d–f), the sensitivity of the MMF-PH was compared to that of TTE across five categories: pre-capillary PH, post-capillary PH, pulmonary arterial hypertension (PAH), PAH associated with congenital heart disease (CHD-PAH), and atypical PAH (i.e., PAH with cardiopulmonary comorbidities). The MMF-PH demonstrated superior sensitivity in detecting PH. For example, in the retrospective test dataset, the model achieved a sensitivity of 97.2% for pre-capillary PH and 94.3% for post-capillary PH, outperforming TTE in both groups. This trend continued in the prospective and external test datasets, with the MMF-PH exhibiting high sensitivity across most subgroups and clinical scenarios, including PAH, CHD-PAH, and atypical PAH. In the prospective dataset, MMF-PH demonstrated a stronger improvement in PH detection in females (OR = 4.304, 95% CI: 1.476–21.177, p < 0.001) and younger patients (<65 years, OR = 2.799, 95% CI: 1.000–9.344, p < 0.001) compared to their respective counterparts (Supplementary Table 3). The detailed performance metrics are provided in Supplementary Tables 6 and 7.

Fig. 5: Subgroup analysis of the MMF-PH and TTE across different datasets.
figure 5

ac Specificity on age, BMI, and sex in retrospective, prospective, and external test datasets. df Sensitivity on pre-capillary PH, post-capillary PH, PAH, CHD-PAH, and PAH with cardiopulmonary comorbidities (CPC-PAH) in retrospective, prospective, and external test datasets. BMI body mass index, CHD-PAH pulmonary arterial hypertension associated with congenital heart disease, CPC-PAH pulmonary arterial hypertension with cardiopulmonary comorbidities, MMF-PH multimodal fusion model for pulmonary hypertension screening, PAH pulmonary arterial hypertension, PH pulmonary hypertension, TTE transthoracic echocardiography.

Prognostic value for long-term outcomes

MMF-PH identifies patients at higher risk for adverse outcomes. In the prospective dataset, rehospitalization or mortality occurred in 40.6% of MMF-PH-positive patients vs. 12.1% of MMF-PH-negative patients at 1 year, increasing to 53.9% vs. 15.5% at 2 years (Supplementary Table 8). Kaplan–Meier analysis confirmed significantly higher risk in MMF-PH-positive patients (hazard ratio [HR] = 3.99 [95% CI: 1.87–8.5] at 1 year; HR = 4.54 [95% CI: 2.33–8.84] at 2 years; both log-rank P < 0.001, Supplementary Fig. 2).

Discussion

The present study highlights the promising accuracy of MMF-PH, a multimodal deep-learning algorithm, in detecting PH by integrating CXR, ECG, TTE, and other electronic health record (EHR) data. MMF-PH outperformed TTE alone, highlighting the advantage of utilizing multimodal data for PH screening. The integration of diverse data sources improves diagnostic accuracy, making MMF-PH a potentially valuable tool for clinical decision-making.

PH remains a diagnostic challenge due to its complex etiology, insidious onset, and high morbidity and mortality. Conventional screening methods, including ECG, CXR, and TTE, have limitations in sensitivity and specificity, contributing to frequent misdiagnosis and delays in diagnosis. Given the invasive nature of RHC, an artificial intelligence (AI)-assisted screening approach provides a promising alternative for early, noninvasive PH detection. Recent studies19,20 have applied machine learning to detect PH, with Kogan et al. 19 and Schuler et al. 20 using XGBoost and Random Forest, achieving AUROCs of 0.92 and 0.94, respectively. In contrast, MMF-PH leverages a transformer-based architecture integrating multimodal data through a DAE for tabular data, BERT for textual data, and ViT for imaging data. BERT enhances text comprehension via bidirectional sequence processing, while ViT captures global image relationships through self-attention, outperforming prior NLP and imaging models21,22,23,24. Ablation studies (Fig. 3, Supplementary Table 4) confirmed the necessity of each component in PH identification. Furthermore, the self-attention fusion mechanism effectively integrates multimodal features, providing comprehensive interpretability. By visualizing the contributions of different modalities, MMF-PH enhances clinical trust and decision-making.

The MMF-PH demonstrated improved performance over TTE in detecting PH. One key advantage is its ability to reduce false positives when sensitivity is aligned. Traditional TTE often produces higher false positives, leading to unnecessary RHC, which is invasive, high-risk, and resource-intensive. By improving specificity, MMF-PH minimizes these unnecessary procedures and enhance clinical efficiency. Furthermore, when specificity is aligned, MMF-PH achieves higher sensitivity compared to TTE. This demonstrates that MMF-PH improves PH detection without increasing false positive rates, reinforcing its diagnostic reliability and reducing missed PH cases. MMF-PH also predicts long-term outcomes, with higher rehospitalization and mortality rates in MMF-PH-positive patients (Supplementary Table 8 and Supplementary Fig. 2). This highlights its potential for both diagnosis and risk stratification.

As PH progresses, characteristic changes appear on CXR25, ECG26, TTE27, CTPA28,29, and cMRI30. ECG may show right axis deviation, right ventricular hypertrophy, and right atrial enlargement, while TTE often reveals increased PASP and right ventricular dilation. Previous models25,26,27,28,29,30 using single modalities achieved AUROCs of 0.70–0.90. By integrating multimodal data, MMF-PH surpasses these models, achieving AUROCs of 0.965, 0.939, and 0.814 in the retrospective, prospective, and external test sets, respectively. This underscores the importance of multimodal integration in enhancing diagnostic performance.

The MMF-PH performed well across various subgroups. The model’s performance across demographic groups (sex, age, and BMI) further underscores its advantages. Higher specificity in these groups suggests that MMF-PH more reliably identifies non-PH patients, reducing unnecessary invasive procedures. However, performance improvements were not uniform across all subgroups. As shown in Fig. 5, sensitivity gains in some subgroups (e.g., panel d) were modest, while others (e.g., panel f) showed more substantial improvement. These variations may be influenced by differences in data distribution, sample sizes, and diagnostic complexity among PH subtypes. Additionally, wider confidence intervals in smaller subgroups, such as male patients, suggest sampling variation, indicating that while the magnitude of improvement should be interpreted with caution. Further validation in larger, more diverse cohorts is needed to confirm these findings.

Despite these promising results, our study has several limitations. First, external validation was based on a relatively small sample, limited to patients who underwent RHC. Expanding validation across diverse medical institutions and populations will help confirm the generalizability of MMF-PH. Notably, its performance remained robust across datasets, even when including TTE-negative cases (Supplementary Table 9). Second, while selection bias may exist in the control group, sensitivity analysis with alternative control definitions confirmed stable diagnostic performance (Supplementary Table 10). While MMF-PH consistently outperformed TTE, some performance metrics had wide confidence intervals, particularly in the external dataset, necessitating cautious interpretation. To complement this, ORs were reported to further support MMF-PH’s diagnostic advantage (Supplementary Table 3). Furthermore, MMF-PH employs an attention-based mechanism for modality contributions instead of SHAP or Grad-CAM, ensuring alignment with model architecture while maintaining computational efficiency. Lastly, the model’s complexity may limit its use in primary hospitals. Simplifying it for broader accessibility while preserving predictive accuracy will be essential. Future research should focus on expanding validation datasets, refining application scenarios, and improving interpretability to optimize clinical integration.

In summary, the MMF-PH algorithm demonstrated high accuracy in detecting PH, surpassing traditional TTE methods, by effectively integrating data from CXR, ECG, TTE, and other EHR information. It excels at reducing false positives and negatives, thereby potentially improving healthcare resource use and patient outcomes. Future work should explore new modalities and fusion techniques, conduct external validation, and evaluate the generalizability of the algorithm to solidify its standing in diverse clinical contexts.

Methods

Study design and population

To train and validate the MMF-PH for PH screening, we constructed a multicenter dataset using the EHRs of patients hospitalized at the Department of Cardiology, Fuwai Hospital, Shenzhen Fuwai, and Yunnan Fuwai Hospitals in China.

For the Fuwai Hospital dataset, our study targeted patients who underwent suspicion-based PH screening between 2019 and 2021. From a starting pool of 3453 adult inpatients, we excluded those with follow-up RHC records but lacked records from their initial procedure (n = 50); RHCs conducted for reasons other than PH diagnosis, such as surgical or interventional procedures (n = 906); and those lacking complete CXR, ECG, TTE, or clinical data (n = 46), leaving 2451 patients. PH diagnosis was based on invasive RHC at rest and the latest criteria (mean pulmonary arterial pressure [mPAP] of >20 mmHg). Based on this criterion, 2288 patients were diagnosed with PH, of whom 163 did not meet the PH criteria. We stratified patients with RHC into training, validation, and testing sets at a ratio of 7:1:2. Considering that most patients undergoing RHC were diagnosed with PH at our hospital, we constructed a control group, which was added to the training and validation sets. Specifically, this control group included 108,043 patients who did not undergo RHC, exhibited no PH symptoms or signs, and had negative routine TTE results for PH. After applying identical exclusion criteria, we randomly selected 1700 patients. Among these, 1487 were included in the training set, and the remaining 213 patients were included in the validation set. We then obtained 3202, 458, and 491 patients for training, validation, and testing, respectively (Fig. 6).

Fig. 6: Flowchart of patient inclusion and dataset distribution.
figure 6

The flowchart depicts the division of patients into training, validation, retrospective test, prospective test, and external test datasets, including exclusion criteria and random selection processes. “C” indicates the control group, which consisted of 1700 randomly selected patients who did not undergo RHC, exhibited no PH symptoms or signs, and had routine TTE results negative for PH. Among these, 1487 patients were added to the training set, and 213 patients were added to the validation set. CXR chest X-ray, EHR electronic health records, MMF-PH Multi-Modal Fusion model for Pulmonary Hypertension screening, PH pulmonary hypertension, RHC right heart catheterization, TTE transthoracic echocardiography.

Furthermore, we incorporated a prospective testing dataset of 477 inpatients who were hospitalized in 2022 at the Department of Cardiology, Fuwai Hospital. These patients were meticulously selected following the same stringent criteria as the initial cohort, to verify the generalizability and effectiveness of the model in detecting PH.

For external validation, we utilized data from 36 patients who underwent RHC at Shenzhen and Yunnan Fuwai Hospitals to ensure consistency in the selection standards. None of the patients appeared in more than one of the training, validation, retrospective testing, prospective testing, or external validation datasets from the participating centers.

The study protocol was registered in the Clinical Trials Registry (ClinicalTrials.gov, NCT05566002, registered on 09/30/2022) and approved by the Ethics Committee of Fuwai Hospital (Approval No. 2022-1739). All patients involved in the study provided written informed consent prior to inclusion. To uphold privacy standards, all data were thoroughly anonymized prior to analysis.

Data collection and multimodal feature extraction

Comprehensive patient information was gathered, including demographics; medical history; laboratory test results; and diagnostic imaging findings from CXR, ECG, and TTE. This dataset was systematically organized into three main categories: tabular, textual, and imaging. The tabular data included patient demographics; medical histories; and quantitative findings from laboratory tests, CXR, ECG, and TTE. The textual data included detailed symptom narratives, ECG interpretations, and CXR and TTE reports. The imaging data consisted solely of CXR images.

In handling the tabular data, we tackled missing categorical variables by allocating a unique token to represent the absence of data, specifically assigning ‘0’ for missing medical history entries. For continuous variables, missing values were imputed using median values. Subsequently, categorical variables were transformed via one-hot encoding, whereas continuous variables underwent min-max normalization to standardize their range.

For textual data processing, we integrated all textual inputs and employed the Jieba package (https://pypi.org/project/jieba/) to segment the text into word sequences, including the removal of stop words from words to enhance the clarity and relevance of the data for subsequent analysis.

In the image data domain, specifically for the CXR images, we normalized the pixel intensity to a scale of 0–255 and resized the images to a uniform dimension of 1024 × 1024 pixels. To augment the training dataset, approximately 50% of the images were subjected to either horizontal or vertical flipping, which enhanced the robustness and variability of the image data for more effective model training.

Development of the MMF-PH

MMF-PH (Fig. 1) is a transformer-based representation learning model. Specifically, the MMF-PH employs various data sources to enhance PH detection accuracy, integrating transformers into a DAE31 for tabular clinical data, BERT for processing textual reports, and ViT for analyzing image data.

To augment the PH classification efficacy, we first pre-trained the DAE, BERT, and ViT. Subsequently, these pretrained models were integrated into the MMF-PH framework. Three distinct networks — tabular data, textual reports, and images — were integrated using a self-attention mechanism, as detailed in Eq. (1):

$$\left\{\begin{array}{c}Q=E\times {W}^{Q},\\ K=E\times {W}^{K},\\ V=E\times {W}^{V},\\ Z={softmax}\left(\frac{Q\times {K}^{T}}{\sqrt{{d}_{k}}}\right)\times V\end{array}\right.$$
(1)

where \(E\in {{\mathbb{R}}}^{(3+d)\times h}\) is a latent representation, which is concatenated by the representation of tabular (denoted as \({e}_{{tab}}\in {{\mathbb{R}}}^{1* h}\)), textual (denoted as \({e}_{{txt}}\in {{\mathbb{R}}}^{d* h}\)), imaging data (denoted as \({e}_{{img}}\in {{\mathbb{R}}}^{1* h}\)), and a representation of a special token (denoted as \({e}_{{CLS}}\in {{\mathbb{R}}}^{1* h}\)). \({W}^{Q}\in {{\mathbb{R}}}^{h* h}\), \({W}^{K}\in {{\mathbb{R}}}^{(3+d)* (3+d)}\), and \({W}^{V}\in {{\mathbb{R}}}^{(3+d)* (3+d)}\) were parameters in the self-attention mechanism. In addition, \({d}_{k}\) was the scaling coefficient set to (\(3+d)\) here. Finally, an output layer with a sigmoid function was added to perform the PH disease classification.

The binary cross-entropy loss function of the MMF-PH was minimized using a stochastic gradient descent with an initial learning rate of 0.0003, maximum epochs of 30, and a batch size of 64. Early stopping was used to prevent overfitting of the validation dataset. The model was developed in Python 3.8.0 with the PyTorch framework (version 2.0.1)32. The specific algorithmic process and framework of the model are provided in the Supplementary Information. The notations, encoder and classifier parameters, detailed input variables, and feature missing rates are summarized in Supplementary Tables 11, 12, 13, and 14, respectively.

Evaluation of model performance

The performance of the MMF-PH was assessed using receiver operating characteristics and precision-recall curves, with the areas under these curves (AUROC and AUPRC) used to quantify diagnostic accuracy. The F1 score, which is a balanced measure of precision and recall, was used to gauge the overall predictive power of the model. The optimal cutoff value for PH prediction was determined by maximizing the F1 score in the validation set. Diagnostic metrics, such as accuracy, sensitivity (recall), specificity, positive predictive value (PPV), and NPV were calculated with 95% CIs established via bootstrapping to account for variations in sampling33.

A robustness analysis was performed, testing the model across various patient demographics and clinical characteristics, including gender (male vs. female), age (<65 vs. ≥65 years), BMI (<25 vs. ≥25 kg/m2), and World Health Organization/New York Heart Association functional classes (I-II vs. III-IV). To demonstrate the advantage of multimodal data integration, we compared the MMF-PH to singular-data models (tabular, textual, and imaging) to highlight the performance gains from multidata type fusion. Additionally, we compared MMF-PH outcomes with those of conventional TTE screening in patients who also underwent RHC to evaluate the model’s diagnostic enhancement over traditional methods. This comparison included sensitivity- and specificity-aligned analyses to ensure a fair assessment of each method’s capabilities. To further assess MMF-PH’s diagnostic advantage, we computed ORs to compare its performance with TTE across the overall datasets and key demographic subgroups. By aligning these approaches, this study aimed to provide a comprehensive understanding of how multimodal data fusion in the MMF-PH enhances PH detection relative to that achieved with TTE. We also assessed MMF-PH’s prognostic value in the prospective dataset by analyzing rehospitalization and mortality over 2 years using Kaplan–Meier survival analysis and HRs.

Statistical analyses

Continuous variables are reported as the mean ± standard deviation or median with interquartile range. Categorical variables are summarized as frequency counts and percentages. We employed a two-sample t-test or Mann–Whitney U test, depending on data normality, to compare continuous variables across the training, validation, and test datasets. Categorical variables were compared using the Chi-squared test. Statistical significance was set at P < 0.05. 3. Python (version 3.8.0) was used for all statistical analyses.