Introduction

Neonatal cerebral lesions refer to a range of neurological abnormalities, including intraventricular hemorrhage (IVH), periventricular leukomalacia (PVL), and ventriculomegaly. These conditions are particularly common in preterm and low birth weight (BW) infants, as well as in neonates who experience hypoxia-ischemia during delivery1,2. Among them, IVH is the most common and extensively studied form of brain injury3. The severity of IVH is typically categorized into four grades using the Papile grading system, with grades III and IV considered severe4,5. Severe IVH often occurs alongside complications such as hydrocephalus and PVL, significantly increasing neonatal mortality and the risk of adverse neurodevelopmental outcomes6. For example, grade III–IV IVH is associated with nearly a five-fold increase in the risk of cerebral palsy (odds ratio: 4.98, 95% CI: 4.13–6.00)7. Long-term survivors may also experience intellectual disabilities, cognitive impairments, visual disturbances, or social difficulties8. Consequently, the timely identification and treatment of cerebral lesions in high-risk neonates are crucial for their healthy development.

Cranial ultrasound (CUS) is the primary modality for diagnosing and monitoring neonatal cerebral lesions due to its portability, non-invasive nature, and lower cost compared to MRI9,10,11. The standard anterior fontanelle approach in CUS provides clear visualization of critical brain regions, including the lateral ventricles, thalamus, and periventricular white matter—areas where severe cerebral lesions are commonly found12,13. For preterm infants or neonates suspected of having IVH, CUS examinations are recommended on days 1, 3, 7, 14, and 28 of life, and at term-equivalent age14. Moreover, repeating CUS around the 28th day of life can detect parenchymal abnormalities and ventriculomegaly and provide prognostic information for further management15. Conducting high-quality CUS examinations requires experienced radiologists who can not only perform thorough scans but also interpret and diagnose CUS images or videos accurately. Unfortunately, a global shortage of skilled radiologists limits access to reliable screening and examination. This shortage affects high-income countries such as the United States and the United Kingdom16 and presents even greater challenges in low- and middle-income regions, where the prevalence of severe brain injuries in infants is often higher2,17. Therefore, there is an urgent need for an automated, reliable, and scalable tool that can assist in identifying high-risk cases.

Recent advancements in deep learning (DL) offer a transformative opportunity to address these challenges. Studies have demonstrated that DL models can detect diseases from medical images with accuracy comparable to that of human experts18,19. Unlike human radiologists, who rely on subjective experience, DL models analyze images by identifying deep and complex patterns, allowing for faster, more consistent, and objective diagnoses. This capability could help alleviate the shortage of trained radiologists. In recent years, the development of DL for fetal cranial imaging20,21 has refined prenatal diagnosis, laying the foundation for early intervention in neonatal brain disorders. For instance, a recent study focused on binary classification of CUS images (normal vs. abnormal) in very preterm infants22, but the clinical value of such models is limited because they overlook the distinction between mild and severe neonatal cerebral lesions. Mild abnormalities, such as low-grade IVH and small cysts, often resolve spontaneously, whereas severe cerebral lesions require urgent attention due to their potential for long-term damage23. Therefore, a more detailed severity classification is essential to accurately identify and prioritize the most critical cases.

To address this gap, we proposed a neonatal cerebral lesions screening system (NCLS) to identify neonates at high risk for severe cerebral lesions. The system includes a view extraction module, which detects key anatomical structures and extracts standard views from CUS videos, and a diagnostic module, which integrates multiple views from the same neonate to predict lesion severity. These modules were trained and developed using an internal development set of 8757 neonatal CUS images from 1518 cases, retrospectively collected from a single hospital. To evaluate the performance of NCLS, we employed two distinct test datasets. The first was an internal video test set comprising 199 cases prospectively collected from the same hospital. The second was an external video dataset of 356 cases prospectively collected from three other centers, enabling us to assess the clinical potential and generalizability of NCLS.

Results

Dataset and readers

As shown in Fig. 1a, we collected 1518 cases (gestational age (GA) 35.00 (95% confidence interval, CI: 25.14–40.86) weeks; from January 2021 to December 2022) from Shenzhen Children’s Hospital (SZCH) to form the development dataset, which was further split into training and test sets in an 8:2 ratio. Additionally, we collected 199 cases (GA 36.14 (95% CI: 27.23–41.00) weeks; from January 2023 to June 2023) from the same hospital to form the internal test dataset. The external test dataset was collected from three centers (Guangzhou Panyu District Maternal and Child Health Care Hospital, GZMCH; Sichuan Provincial Maternity and Child Health Care Hospital, SCMCH; Changsha Hospital for Maternal and Child Health Care, CSMCH) and included 356 cases (GA 37.00 (95% CI: 26.57–40.86) weeks; from October 2023 to April 2024). Each case contained four to six CUS standard views and two CUS videos. The CUS standard views included the anterior horn view (AHV), third ventricle view (TVV), and body view (BV) in the coronal plane, as well as the midsagittal view (MSV), right parasagittal view (RPSV), and left parasagittal view (LPSV) in the sagittal plane. The CUS videos included both coronal and sagittal sweeping videos. Table 1 provides a summary of the demographic information of included infants, and Supplementary Fig. 1 shows the distribution of different diseases, GA groups, and probe types in the datasets. Supplementary Tables 1 and 2 provide more information regarding imaging details, including transducer frequency, manufacturer, model name, depth, and more.

Fig. 1: Dataset, NCLS workflow, and study design.
figure 1

a Dataset division and sources, as well as the images, videos, and scanning methods used to acquire imaging data for each case. b The workflow of the NCLS inference. The system inputs CUS videos and outputs extracted standard views and severity diagnosis results. c The workflow of the comparison of NCLS and radiologists. d Comparison of junior radiologists with and without AI assistance. e Blind and randomized trial of junior radiologists versus AI. f Evaluation of NCLS on blind-sweep data. NCLS, neonatal cerebral lesions screening. AHV anterior horn view, TVV third ventricle view, BV body view of the lateral ventricle, MSV midsagittal view, RPSV right parasagittal view, LPSV left parasagittal view. Icons by Icons8.

Table 1 Baseline characteristics of the development dataset, internal and external test datasets

As shown in Fig. 1b, the workflow of NCLS consisted of two stages. Stage 1 focused on the anatomical structure detection task, with the goal of extracting standard views from CUS videos based on the detection results. Stage 2 focused on the diagnostic task, aiming to classify whether a case has severe cerebral lesions based on the multiple standard views. We trained and validated our system using five-fold cross-validation in the development dataset and evaluated its performance in the internal and external test datasets. Moreover, twenty-five radiologists with varying levels of experience in CUS diagnosis were recruited for the study. Among them, nine junior radiologists had about 1–2 years of experience in clinical CUS diagnosis, eleven mid-level radiologists had 3–7 years of CUS diagnosis experience, and five senior radiologists had more than 8 years of CUS diagnosis experience, including two authoritative senior radiologists with over 15 years of experience. These five senior radiologists contributed to the annotation of the development dataset, and the two authoritative senior radiologists were responsible for annotating the internal and external test datasets. Note that each CUS image in the development set was independently diagnosed as normal, IVH, ependymal cyst, ventricular dilation, hydrocephalus, or PVL, based solely on the content presented in the image9. These individual image diagnoses were then combined to form a case diagnosis. In contrast, the diagnostic labels for cases in the internal and external test sets were determined based on CUS videos, which aligns with the clinical diagnostic workflow. Cases with grade III or IV IVH, PVL, or hydrocephalus were classified as severe abnormalities and labeled as 1 (refs. 5,23,24). Cases with grade I or II IVH, ependymal cysts, or minor ventricular dilatation, along with normal cases, were categorized as mild abnormalities and labeled as 0.
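
To make this labeling rule concrete, a minimal Python sketch is given below; the condition names are illustrative shorthand of ours, not identifiers from the study code.

```python
# A minimal sketch of the case-level severity labeling rule described above.
# Condition names are illustrative shorthand, not identifiers from the study code.
SEVERE_FINDINGS = {"IVH_III", "IVH_IV", "PVL", "hydrocephalus"}
MILD_FINDINGS = {"normal", "IVH_I", "IVH_II", "ependymal_cyst", "minor_ventricular_dilation"}

def case_label(image_diagnoses: list[str]) -> int:
    """Return 1 (severe) if any image-level finding is severe, else 0 (mild/normal)."""
    return int(any(d in SEVERE_FINDINGS for d in image_diagnoses))

assert case_label(["normal", "IVH_III"]) == 1
assert case_label(["IVH_I", "ependymal_cyst"]) == 0
```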

Comparison of NCLS and radiologists

To assess the diagnostic performance of the NCLS, we compared it with nine junior radiologists and eleven mid-level radiologists on the internal and external test datasets (Fig. 1c). Table 2 shows the comparison results. In the internal test dataset, the NCLS achieved a sensitivity of 0.875 (95% CI: 0.687–1.000), specificity of 0.934 (95% CI: 0.897–0.967), AUC of 0.982 (95% CI: 0.958–0.997), and an F1-score of 0.667. In comparison, the average performance of the junior radiologists was as follows: sensitivity of 0.875 (95% CI: 0.810–0.924), specificity of 0.851 (95% CI: 0.833–0.868), AUC of 0.863, and F1-score of 0.489. The Fleiss’ Kappa value was 0.3005 (P value < 0.001), indicating low agreement among the junior radiologists. For the mid-level radiologists, the average sensitivity was 0.813 (95% CI: 0.747–0.867), specificity was 0.986 (95% CI: 0.980–0.991), AUC was 0.900, and F1-score was 0.827. The Fleiss’ Kappa value was 0.7615 (P value < 0.001), indicating strong agreement within the group. Figure 2a displays the AUC for each radiologist and the ROC curve of the NCLS. The sensitivity of NCLS was no less than that of the radiologist group, showing a comparable diagnostic performance to that of radiologists in severe cases. The specificity of the junior radiologists was significantly lower than that of the NCLS, indicating that junior radiologists tended to diagnose an excessive number of false-positive cases, which could lead to unnecessary additional work and increased patient burden. In terms of AUC and F1-score, the diagnostic performance of the NCLS overall surpassed that of the junior radiologists.

Fig. 2: Performance of the NCLS and radiologists.
figure 2

a ROC curve of the NCLS and the AUC of each radiologist. b The diagnostic performance of NCLS and radiologists on specific conditions. For cases with normal findings, ependymal cyst, ventricular dilation, or grade I or II IVH, predicting the outcome as non-severe is considered a correct prediction. For other conditions, the reverse applies. The left column corresponds to the internal test set, while the right column corresponds to the external test set. In the confusion matrix, the specific numbers represent the count of the diagnostic outcomes of the model or radiologists for a specific condition. Source data are provided as a Source data file. AUC area under the curve, NCLS neonatal cerebral lesions screening, PVL periventricular leukomalacia.

Table 2 Comparison of NCLS and radiologists

On the external test set, the NCLS achieved a sensitivity of 0.962 (95% CI: 0.869–1.000), specificity of 0.927 (95% CI: 0.899–0.953), AUC of 0.944 (95% CI: 0.894–0.973), and an F1-score of 0.667. In comparison, the average performance of the junior radiologists included a sensitivity of 0.718 (95% CI: 0.656–0.775), specificity of 0.924 (95% CI: 0.913–0.933), AUC of 0.821 (95% CI: 0.792–0.850), and an F1-score of 0.534. The Fleiss’ Kappa value was 0.3950 (P value < 0.001), indicating low agreement within the group. For the mid-level radiologists, the average sensitivity was 0.717 (95% CI: 0.661–0.768), specificity was 0.976 (95% CI: 0.970–0.980), AUC was 0.846 (95% CI: 0.830–0.872), and the F1-score was 0.707. The Fleiss’ Kappa value was 0.6072 (P value < 0.001), indicating strong agreement within the group. The NCLS showed significantly higher sensitivity than both the junior and mid-level radiologists, while maintaining a high level of specificity. Moreover, while the performance of the radiologists varied considerably between the two datasets, the NCLS maintained consistently high diagnostic performance and reliability. The performance of each radiologist is shown in Supplementary Tables 3–5. For diagnostic speed (including extracting standard views and diagnosis), NCLS took an average of 28.97 s per case, junior radiologists took an average of 60.13 s per case, and mid-level radiologists took an average of 63.42 s per case. The majority of time spent by NCLS was on detecting anatomical structures in each frame of the video, and the number of frames varied across CUS videos.

We further analyzed cases of missed diagnoses and false positives, as shown in Fig. 2b. The NCLS missed four cases, all of which were PVL. This can likely be attributed to data imbalance, as PVL cases represented only 1.5% of the training set. Figure 3 shows the standard views extracted by the system for these missed cases. These views reveal the cystic white matter injury, allowing radiologists to easily identify the abnormalities. Moreover, the majority of false positives occurred in grade I or II IVH cases, likely due to their visual similarity to grade III IVH on imaging. For the radiologists, the distribution of diagnostic errors was more scattered than for the NCLS, indicating greater variability among radiologists. Figure 4 shows the heatmaps generated from multiple views using Finer-CAM25, revealing that the NCLS focuses on areas around the anterior horn and body of the lateral ventricles. Supplementary Fig. 2 shows the regions of interest (ROIs) annotated by the radiologist and the heatmaps, demonstrating a high degree of overlap between the areas of focus identified by the NCLS and those identified by the radiologist. We also assessed the classification performance of the NCLS for both binary classification (normal vs. abnormal) and three-tier classification (normal vs. mild vs. severe) across all test sets, with detailed information provided in Supplementary Tables 6–8. Additionally, we conducted subgroup analyses based on probe types and different GA groups, and the corresponding results can be found in Supplementary Tables 9 and 10. Furthermore, we also conducted an analysis based on the Papile and Volpe grading systems for IVH. Supplementary Fig. 3 shows the distribution changes of the original Papile-graded IVH data after being reclassified according to the Volpe grading system. Supplementary Table 11 provides the evaluation results of the NCLS using both grading systems. Additionally, Supplementary Fig. 4 illustrates the prognosis of IVH cases under both grading systems.

Fig. 3: Extracted standard views of misdiagnosed cases in the internal and external test sets.
figure 3

a–d All four cases were PVL, with red arrows highlighting the presence of cystic periventricular white matter. Images that appear entirely black indicate the absence of standard views. In (d), the cystic periventricular white matter was not discernible because the lesion was located between the frames of different standard views, compounded by its small size.

Fig. 4: Heatmap for certain cases.
figure 4

a Normal cases. b Case with ventricle dilatation. c, d Cases with hydrocephalus and grade III IVH. The highlighted areas (red and yellow) represent the regions of significant attention by the model.

Performance of junior radiologists with AI assistance

From the analysis in the previous section, we concluded that the NCLS outperformed the junior radiologist group. We further evaluated the role of the system in assisting diagnostic decision-making. Specifically, the junior radiologists were instructed to re-evaluate the internal and external test sets with the assistance of the NCLS 1 month after their initial evaluation. During this process, the NCLS provided not only the diagnosis result but also the standard views of each case (Fig. 1d). Table 2 presents the diagnostic results of junior radiologists with and without AI assistance. With AI assistance, in the internal test set, the average sensitivity of the junior radiologist group improved by 9.72% (95% CI: 2.08%–18.06%), and the average specificity improved by 4.25% (95% CI: −5.35% to 13.30%). In the external test set, the average sensitivity improved by 19.24% (95% CI: 3.85%–35.05%), and the average specificity showed a mean difference of −1.11% (95% CI: −5.87% to 4.24%). These results demonstrate that junior radiologists could significantly improve their accuracy in diagnosing severe cases with AI assistance, without compromising specificity. Figure 5a, b illustrates the changes in each radiologist’s metrics. Sensitivity improved for seven of the nine junior radiologists in the internal test set and for all junior radiologists in the external test set, and the AUC values of all junior radiologists increased as well. These findings indicate that our system can effectively assist radiologists in making more efficient and reliable diagnoses.

Fig. 5: Comparison of junior radiologists without and with AI assistance.
figure 5

a, b Improvement of metrics (sensitivity, specificity, AUC) with AI assistance on internal test set and external test set. Green color means improvement, and red color means decline. c The diagnostic performance of junior radiologists with AI assistance on specific conditions. d The consistency of junior radiologists without and with AI assistance. Box plots show Cohen’s kappa values between AI and junior radiologists (n = 9 radiologists, biological replicates). Box plots display median (center line), 25th–75th percentiles (box bounds), whiskers extending to the minimum and maximum values within 1.5 × IQR from the quartiles, and individual data points overlaid. Each colored point represents one physician’s diagnostic agreement with AI. The left column represents consistency without AI assistance, while the right column represents consistency with AI assistance. Source data are provided as a Source data file. R radiologist, PVL periventricular leukomalacia.

We further analyzed the impact of AI assistance on junior radiologists. A total of 91.18% (62 out of 68) of the cases initially misdiagnosed by junior radiologists were corrected. Among the 17 cases where junior radiologists made correct independent diagnoses but the NCLS provided incorrect diagnoses, none were modified. Before incorporating AI assistance, the nine junior radiologists collectively misdiagnosed 33 cases of grade III IVH and 24 cases of PVL; additionally, six radiologists misdiagnosed 17 cases of grade IV IVH, and one radiologist misdiagnosed 3 cases of hydrocephalus. With AI assistance, only two junior radiologists misdiagnosed 2 cases of grade III IVH, two radiologists misdiagnosed 2 cases of grade IV IVH, one radiologist misdiagnosed 1 case of hydrocephalus, and nine radiologists misdiagnosed 20 cases of PVL. These results indicate that, with the assistance of our system, the diagnostic abilities of junior radiologists in severe IVH cases improved, though the support provided in diagnosing PVL cases remains limited. Figure 5c shows the number of severe and non-severe diagnoses made by the junior radiologists with AI assistance for each specific condition. Figure 5d shows the Cohen’s kappa values and Fleiss’ kappa value within the group. The Fleiss’ kappa values of the junior radiologist group with AI assistance were 0.7034 and 0.7526 on the two test sets (P value < 0.001), a significant improvement over the values without AI assistance, indicating that AI assistance enhanced inter-rater agreement among the junior radiologists.

Blind, randomized trial of junior radiologists versus AI

To evaluate the impact of the NCLS on the diagnostic decisions of radiologists and to objectively assess its diagnostic accuracy while minimizing potential biases introduced by the system, we implemented a blind, randomized trial. A total of nine junior radiologists were randomly and evenly divided into three groups, and each group was assigned to one of three senior radiologists. For each case in both the internal and external test datasets, diagnostic materials prepared by either a junior radiologist or the NCLS were randomly selected and presented to the senior radiologists under blind review. The senior radiologists were first tasked with determining whether the materials were AI-generated and then performed secondary diagnoses based on the provided information (Fig. 1e). Figure 6 illustrates the evaluation of the identification of the source of diagnostic materials. The senior radiologists correctly identified the source in 407 cases (26.8%), made incorrect identifications in 378 cases (24.9%), and were uncertain in 734 cases (48.3%). The degree of blinding in the trial was assessed using Bang’s Blinding Index26, a metric where 0 represents perfect blinding and values of −1 or 1 indicate complete unblinding. Typically, a blinding index between −0.2 and 0.2 is considered to reflect good blinding. In this study, the blinding index was 0.007 for the AI group and −0.031 for the junior radiologist group. Additionally, the bootstrapped confidence interval remained consistently within the range of −0.2 to 0.2, with no statistical significance (P value > 0.49), further supporting the adequacy of blinding in the trial.
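
For reference, the per-arm form of Bang’s Blinding Index reduces to (correct guesses − incorrect guesses) divided by all assessments, with “don’t know” responses counted in the denominator. The sketch below illustrates this with hypothetical counts; the bootstrap procedure used for the CI is omitted.

```python
# A minimal sketch of Bang's blinding index for one study arm.
# BI = (correct guesses - incorrect guesses) / all assessments (including "don't know").
# BI = 0 indicates perfect blinding; BI = +/-1 indicates complete unblinding.
# The counts below are hypothetical, and the bootstrap CI used in the study is omitted.
def bang_blinding_index(n_correct: int, n_incorrect: int, n_dont_know: int) -> float:
    n_total = n_correct + n_incorrect + n_dont_know
    return (n_correct - n_incorrect) / n_total

print(bang_blinding_index(n_correct=200, n_incorrect=195, n_dont_know=370))  # ~0.007
```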

Fig. 6: The metric of source identification.
figure 6

a–c Source identification by each of the three senior radiologists; d the combined source identification of the senior radiologist group. “Identification” refers to the radiologists’ determination of whether the source is AI or junior doctors, while “Ground truth” indicates the actual source of the data. “DK” denotes cases where the source could not be determined. Source data are provided as a Source data file.

Table 3 shows the outcome of secondary diagnoses. The senior radiologists revised the initial diagnoses for 42 cases (5.5%) in the AI group compared with 158 cases (20.7%) in the junior radiologist group (difference −15.2%, 95% CI: −18.5% to −11.9%, P value < 0.001 for superiority). This indicated that under blinded and randomized conditions, the AI group demonstrated better diagnostic accuracy. Moreover, the secondary diagnoses aligned with the original gold standard in 738 cases (97.5%) in the AI group compared with 746 cases (97.9%) in the junior radiologist group (difference −0.4%, 95% CI: −2.1% to 1.2%, P value = 0.595 for superiority). The three senior radiologists took an average of 6.00 s (IQR: 5.20–7.05) to make secondary diagnoses based on the materials provided by the AI, and 8.00 s (IQR: 6.45–11.55) for those based on the materials from the junior radiologist group. The mean difference in time between the two groups was −2.79 s (95% CI: −3.87 to −1.70 s, P value < 0.001). The shorter time required for diagnoses based on the materials provided by the AI group can be attributed to the fact that NCLS is able to provide more precise diagnostic materials.

Table 3 Outcome of secondary diagnoses in a blind, randomized trial

Evaluation of an AI system on blind sweeping data

To simulate a clinical environment with a shortage of experienced radiologists, we further validated our system using blind sweeping data. We prospectively collected 111 cases of blind sweeping data, in which operators performed blind scans using a pre-set US probe intensity and were not allowed to view the US screen to adjust their sweeping technique during the process (Fig. 1f). All blind sweeping CUS videos were independently reviewed by two senior radiologists, and their diagnostic results were used as the gold standard; this set included 7 severe cases and 4 cases excluded as unsuitable for diagnosis. Figure 7a, b shows examples of the extracted standard views. Blind sweeping CUS videos were input to the NCLS, which extracted multiple standard views. Specifically, 84 cases had six standard views extracted, 24 cases had five standard views, 2 cases had four standard views, and 1 case had three standard views; this last case was also among the 4 cases excluded from diagnosis. Two senior radiologists were also recruited to provide qualitative assessments of the extracted standard views. The scoring criteria are outlined in Supplementary Table 12 and include view correctness, detection accuracy, anatomical recognition, and clinical value. For Radiologist 1, the average scores for these criteria were 3.8, 4.1, 4.1, and 4.0, respectively. For Radiologist 2, the corresponding average scores were 3.2, 3.3, 3.5, and 3.4. Statistically significant differences were observed for each score (P value < 0.001). Figure 7c shows the qualitative scores given by the senior radiologists, and these findings suggest that the standard views extracted by the system from blind sweeping data generally meet the requirements for clinical diagnosis.

Fig. 7: Performance of the NCLS on blind sweeping data.
figure 7

a, b The extracted standard views of NCLS on blind sweeping CUS videos. c Qualitative assessment of the extracted standard views by two senior radiologists, including view correctness, detection correctness, anatomical recognition, and clinical value. d ROC curve of NCLS. e Confusion matrix. Source data are provided as a Source data file.

Figure 7d shows the ROC curve, and Fig. 7e presents the confusion matrix for the NCLS on the blind sweeping data. Our system predicted the cases deemed unsuitable for diagnosis in the gold standard as non-severe, and achieved a sensitivity of 1.000 (8 out of 8), specificity of 0.961 (99 out of 103), F1-score of 0.800 and an AUC of 0.991 in the remaining blind sweeping data, indicating its potential for clinical applications, particularly in low-resource settings.

Discussion

In this study, we presented NCLS, a pioneering system for identifying severe neonatal cerebral lesions that may require further intervention among neonates at high risk of brain injury, based on dynamic CUS images. The key findings of our study are as follows: first, the NCLS can automatically extract standard views of neonatal brain images and detect severe cerebral lesions with high accuracy from CUS videos; second, the system helps junior radiologists improve their diagnostic performance while reducing time consumption; third, senior radiologists are not able to differentiate the standard views of CUS images obtained by the NCLS from those obtained by sonographers; finally, the NCLS can potentially determine the standard views of CUS images and make correct diagnoses from blind sweeping US videos. These findings demonstrate that this tailored system can potentially streamline the screening process for detecting severe cerebral lesions among neonates at high risk of brain injury and reduce the workload of radiologists. Thus, with the help of the NCLS, the healthcare of neonates at high risk of brain injury might be improved, especially in low- and middle-income countries or regions with limited medical resources.

Previous studies have confirmed the effectiveness of DL models for automated diagnosis of CUS images. Dongwoon et al.27 employed simple models such as AlexNet and VGG16 for binary classification (normal vs. abnormal) of CUS coronal and sagittal images. Kim et al.28 utilized DL algorithms to perform a binary classification task for IVH detection based on MSV images. Tahani et al.22 used DL models for binary classification of normal and abnormal CUS images across multiple coronal views of extremely premature infants at different developmental stages. However, these studies have some limitations. First, they focused on classifying individual images as normal or abnormal, yet many mild abnormalities resolve on their own24; the focus should therefore be on identifying severe abnormalities, such as high-grade IVH. Moreover, a single image provides limited information for a comprehensive diagnosis of neonatal cerebral lesions. Second, the performance of these DL models relies on high-quality CUS images and lacks prospective validation. In contrast, we propose the NCLS, which uses a multi-view combined diagnostic strategy, combining four to six standard coronal and sagittal views from the same neonate for a comprehensive diagnosis. Moreover, we introduce a classification scheme that further distinguishes between severe and non-severe cases based on the presence or absence of brain injury and its severity5. The NCLS has also been validated with prospective clinical data and blind-sweep data, demonstrating its reliability and effectiveness.

Several advantages of the NCLS should be emphasized. First, the NCLS showed robust diagnostic performance in detecting severe cerebral lesions across diverse test sets. This might be because we did not restrict the scanners and probes used to acquire the development set. Nor did we restrict which sonographers performed the CUS, which might further increase the generalizability of our system. Notably, the NCLS identified all grade III and IV IVH in both the internal and external test sets. In contrast, both junior and mid-level radiologists misdiagnosed or missed several grade II and grade III IVH cases, suggesting a lack of clear diagnostic standards among them. As a screening tool, it is critical that the NCLS not miss any severe IVH, which is strongly associated with long-term neurodevelopmental disorders7,29. In addition, the performance of the NCLS in detecting severe cerebral lesions was superior to that of all junior radiologists in the external test set. Furthermore, with the assistance of the NCLS, the diagnostic sensitivities of all nine junior radiologists were greatly elevated in both test sets. This is critical because higher sensitivity might benefit both neonates suspected of having severe cerebral lesions and their parents, enabling timely intervention and early preparation for potential adverse outcomes. All these findings underscore the potential usefulness of the NCLS in screening for severe cerebral lesions among neonates at high risk of brain injury in real clinical settings.

Second, the NCLS is interpretable because it not only provides standard views of CUS images with the diagnosis but also generates heatmaps for all standard views. The NCLS showed a powerful capacity to automatically extract standard views from CUS videos, even those obtained through blind sweeping. These standard views contain sufficient information for radiologists to make accurate diagnoses. Furthermore, the heatmaps consistently highlighted the anterior horn and body of the lateral ventricles across various disease types, aligning with critical regions for clinical diagnosis. The heatmaps also show good alignment with the areas marked by the radiologist, particularly in the MSV and coronal BV. With this capacity, even for cases that the NCLS missed, radiologists could make the right diagnosis by reviewing the standard views provided by the NCLS. Therefore, as our results showed, the NCLS can not only help junior radiologists improve their diagnostic accuracy but also increase agreement among them (Fleiss’ kappa values from 0.3005 and 0.3950 to 0.7034 and 0.7526, respectively).

Our study also showed that the standard views provided by the NCLS are quite similar to those provided by junior radiologists, and the senior radiologists could not distinguish between them. Thus, the NCLS might replace junior radiologists or sonographers in providing standard views from CUS videos for further review, with much less time consumption. Given the considerable time and effort that junior radiologists and sonographers spend on the tedious manual processing of CUS videos in their routine work, the results achieved by our system are encouraging.

Our study also demonstrated that the NCLS is able to process blind sweeping videos effectively, even under challenging conditions such as blurriness, shaking, anatomical distortion, and fast probe movement. Moreover, the system was capable of providing accurate diagnoses based on these standard views. There is a well-documented lack of sonographers in low- and middle-income countries where medical resources are limited, such as those in Africa or the Middle East30, and a severe shortage of sonographers has been reported even in the United States16. The NCLS provides a potential pathway to address this problem in such settings. With the application of NCLS, novices can be employed to obtain CUS videos by blind sweeping, which are then given to the NCLS for further automatic analysis. A similar study has confirmed the usefulness of DL algorithms in estimating gestational age from blind US sweeps in low-resource settings31,32.

In this study, we used the Papile grading system rather than the more recent Volpe grading system, primarily because of the widespread use of the Papile grading system in China and the need for consistent multi-center validation standards. However, we also performed validation on data graded with the Volpe system to assess the robustness of NCLS. The outcomes showed that prognostic patterns under both grading systems were consistent with clinical prognosis.

Several limitations exist in this study. First, the NCLS still missed some severe cases, especially PVL. To address this issue, we plan to expand data collection and explore additional methods to improve screening effectiveness from a multimodal perspective, incorporating demographic metadata and Doppler US images. Existing work has demonstrated that metadata used in traditional machine learning can yield good results in predicting the severity of IVH and mortality rates33,34. Second, the system is currently limited to binary classification, distinguishing only between intervention and non-intervention cases; it is not yet capable of diagnosing more specific conditions. Future work will focus on identifying the specific diseases of newborns flagged for intervention. Additionally, since the training set was exclusively sourced from SZCH, we plan to collect a more diverse and representative training set from multiple centers in the next phase. Moreover, due to resource constraints, pathological gold standards and MRI confirmations were not consistently accessible; therefore, consensus interpretation by pediatric radiologists was used as a practical, albeit less ideal, ground truth. In the future, we will continue to iterate and improve this CUS diagnostic system to enhance its performance, with plans to embed it into portable US devices for testing. Finally, this study did not record detailed information about the individuals who acquired the CUS images; variations in operator experience could affect the generalizability of the NCLS.

In conclusion, we developed a robust deep learning system (NCLS) that can automatically extract standard views and make diagnoses from CUS videos. We demonstrated that this system could assist junior radiologists to elevate their diagnostic performance in screening severe cerebral lesions among neonates at high risk of brain injury. This system also showed potential capacity to replace junior radiologists to provide standard views from even blind sweeping CUS videos. Our proposed system might be useful to deploy in multiple application scenarios, such as intelligently detecting critical cerebral lesions in hospitals where medical resources are limited, and replacing sonographers to extract standard views of CUS images in hospitals where there is a shortage of sonographers. Future work will focus on addressing the identified limitations and conducting well-designed prospective studies to evaluate the effectiveness of our proposed system in screening neonatal cerebral lesions in real clinical settings.

Methods

Data collection and preprocessing

This study was conducted as a retrospective and prospective analysis, approved by the institutional review boards of each participating institution. The retrospective development set and prospective internal test set were obtained from SZCH (committee number 202312702), while the prospective external test set was collected from Guangzhou Panyu District Maternal and Child Health Care Hospital (IIT2023-17-02), Sichuan Provincial Maternity and Child Health Care Hospital (20240205-013), and Changsha Hospital for Maternal and Child Health Care (EC-20240102-09). The prospective section was registered at the Chinese Clinical Trial Registry (Registration number: ChiCTR2400079819). Consent was obtained from the guardians of all participants.

All CUS data were acquired by sonographers, junior radiologists, or trainees with 1–3 years of experience in pediatric ultrasound and stored in Digital Imaging and Communications in Medicine format. We first converted the images and videos to PNG and MP4 formats, respectively. Next, we applied a maximum connected domain image processing technique to extract the ROI from the CUS data, removing any information bars and other unrelated identifiers. Prior to training and validation, all data were anonymized.
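
As an illustration of the maximum-connected-domain step, the following sketch shows one plausible OpenCV implementation; the binarization threshold and the assumption of an 8-bit grayscale input are ours, not values from the study code.

```python
# A sketch of maximum-connected-component ROI extraction for a CUS frame,
# as one plausible implementation of the preprocessing described above.
# Assumes an 8-bit grayscale frame; the binarization threshold is illustrative.
import cv2
import numpy as np

def extract_roi(gray: np.ndarray, thresh: int = 10) -> np.ndarray:
    """Crop the frame to the bounding box of its largest bright connected region."""
    _, binary = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    if n < 2:  # no foreground component found; return the frame unchanged
        return gray
    # stats row 0 is the background; pick the largest foreground component.
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    x, y, w, h = stats[largest, :4]  # CC_STAT_LEFT, TOP, WIDTH, HEIGHT
    return gray[y:y + h, x:x + w]
```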

We have collected neonatal CUS data since 2021, covering conditions such as normal, IVH (Grades I–IV), ependymal cyst, ventricular dilation, PVL, and hydrocephalus. The exclusion criteria for the CUS data were as follows: (1) incomplete CUS examinations; (2) extremely low-quality or unusable CUS data; (3) unclear diagnostic results; and (4) data from other ultrasound modalities, such as Color Doppler images. We also recorded key information for each examination, including the case ID, examination date, sex, age at examination, GA, BW, Apgar score, and the ultrasound equipment and probe. The ultrasound equipment used for the CUS data acquisition included GE HealthCare, EDAN, Mindray, Wisonic, Voluson, Philips Medical Systems, Toshiba, Canon, and Siemens. All this information is summarized in Table 1, and Supplementary Tables 1 and 2.

Development of NCLS

The development of the NCLS was carried out in two stages. Stage 1 focuses on building the view extraction module, which includes a detection model and a candidate frame scoring algorithm. Stage 2 involves developing the diagnostic module, which includes a model capable of processing multi-view data.

Stage 1: Building the view extraction module

We established the view extraction module that employs rotated bounding box object detection to identify anatomical structures within each frame of CUS videos. Based on the detection results, we implemented the candidate frame scoring algorithm to facilitate the extraction of standard views.

Development of the object detection model

We used an advanced end-to-end DL model, known as the Real-Time Detection Transformer (RT-DETR)35, as shown in Supplementary Fig. 4a. This model utilizes ResNet5036 as its backbone and incorporates a hybrid encoder architecture that effectively separates intra-scale interactions from cross-scale features, thereby enabling rapid and precise real-time detection. To better detect anatomical structures that may appear rotated in the CUS images, we incorporated a rotational angle dimension into the bounding box representation. This representation consists of five parameters: center coordinates, width, height, and rotation angle (which ranges from −90° to 90°). The output from the encoder underwent a top-k selection process, and the resulting memory was used as the initialized anchors and contents, which were then passed into the decoder. The decoder consisted of a matching part and a denoising part. The matching part ensured that the predicted detection boxes corresponded to the anatomical structures, and the denoising part enhanced the robustness of the model. We used Varifocal Loss for the classification loss to better align the classification and regression tasks, and L1 Loss along with the DIoU Loss for bounding box regression to improve localization precision.
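
To illustrate the five-parameter rotated-box representation, the sketch below converts (cx, cy, w, h, angle) to the four corner points; this geometric conversion underlies rotated-IoU computation, though it is not the RT-DETR implementation itself.

```python
# A sketch of the five-parameter rotated-box representation used for detection:
# (cx, cy, w, h, angle), with angle in degrees in [-90, 90].
import numpy as np

def rotated_box_corners(cx, cy, w, h, angle_deg):
    """Return the 4 x 2 array of corner coordinates of a rotated box."""
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    half = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    return half @ rot.T + np.array([cx, cy])

print(rotated_box_corners(320, 240, 100, 40, 30.0))
```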

Following the officially recommended RT-DETR hyperparameter settings, all CUS images were resized to 640 × 640. The training strategy—including learning rates, learning rate schedule, and most data augmentations—adhered to these defaults to ensure training stability and optimal performance. To further enhance data diversity and model robustness, we incorporated Mosaic and Mixup augmentations. Training was performed using PyTorch 2.1 on four Nvidia A6000 GPUs, with a batch size of 16 (4 samples per GPU).

Candidate frame scoring algorithm

We also developed the candidate frame scoring algorithm specifically to extract standard views from CUS videos based on the detected anatomical structures. Supplementary Fig. 4b shows the process of the candidate frame scoring algorithm, and Supplementary Table 13 shows the candidate frame selection criteria and the base score of each structure. Specifically, each frame could either be sent to the candidate queue of the relevant standard view or left without further processing. Subsequently, we calculated the score of each frame in the queue based on all anatomical structures detected in that frame, formulated as follows:

$$\mathrm{Score}=\sum_{i\,\in\,\mathrm{detected\ structures}}\mathrm{base\ score}_{i}\times \mathrm{confidence}_{i}$$

Moreover, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN)37 algorithm was applied to ascertain the scanning direction of the CUS videos and to eliminate candidate frames that did not satisfy the established criteria, as shown in Supplementary Fig. 5. We clustered candidate frames based on the Euclidean distance between score and frame index and identified the scanning direction of the video based on clusters in the different standard planes. After filtering the frames, we sorted the frames in each candidate queue by score and selected the frame with the highest score as the standard view frame for the CUS video. Supplementary Fig. 6 displays the detection results and the candidate frame scoring process.
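
A minimal sketch of the scoring and DBSCAN-based filtering steps is shown below; the structure names, base scores, and clustering parameters are illustrative assumptions rather than the values in Supplementary Table 13.

```python
# A sketch of the candidate frame scoring rule (Score = sum of base score x confidence)
# and DBSCAN-based filtering over (frame index, score) pairs, as described above.
# Structure names, base scores, eps, and min_samples are illustrative assumptions.
import numpy as np
from sklearn.cluster import DBSCAN

BASE_SCORE = {"lateral_ventricle": 3.0, "thalamus": 2.0, "choroid_plexus": 1.0}

def frame_score(detections):
    """detections: list of (structure_name, confidence) pairs for one frame."""
    return sum(BASE_SCORE.get(name, 0.0) * conf for name, conf in detections)

def best_frame(candidates):
    """candidates: list of (frame_index, score). Cluster, drop outliers, take the peak."""
    pts = np.array(candidates, dtype=float)
    labels = DBSCAN(eps=5.0, min_samples=3).fit_predict(pts)
    kept = pts[labels != -1] if (labels != -1).any() else pts
    return int(kept[np.argmax(kept[:, 1]), 0])
```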

Stage 2: Developing the diagnostic module

We further developed the multi-view combined diagnostic module to screen for cerebral lesions.

Diagnostic model development

The diagnostic model framework is shown in Supplementary Fig. 7. We used ConvNext38 as the feature extraction backbone. Multiple standard views from the same neonate were combined and fed into the backbone. The extracted features were, on the one hand, input into the image diagnostic classification head to perform multi-label classification for single-image prediction; on the other hand, these features were padded and concatenated with a class token and fed into a multi-head self-attention block for feature fusion. The first position of the fused output was then fed into the case diagnostic classification head to obtain the diagnostic result of whether cases were severe or not.
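
The following simplified PyTorch sketch captures the architecture described above, assuming a ConvNeXt-Tiny backbone, eight attention heads, and six image-level labels; these dimensions are illustrative and may differ from the actual configuration.

```python
# A simplified sketch of the multi-view diagnostic architecture described above:
# a ConvNeXt backbone, per-image multi-label logits, and class-token
# self-attention fusion for the case-level binary head. Dimensions are illustrative.
import torch
import torch.nn as nn
from torchvision.models import convnext_tiny

class MultiViewDiagnoser(nn.Module):
    def __init__(self, n_image_labels: int = 6, dim: int = 768):
        super().__init__()
        backbone = convnext_tiny(weights=None)
        self.features = nn.Sequential(backbone.features,
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten(1))
        self.image_head = nn.Linear(dim, n_image_labels)   # multi-label, per image
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.case_head = nn.Linear(dim, 2)                 # severe vs. non-severe

    def forward(self, views):                              # views: (B, V, 3, H, W)
        b, v = views.shape[:2]
        feats = self.features(views.flatten(0, 1)).view(b, v, -1)
        image_logits = self.image_head(feats)              # (B, V, n_image_labels)
        tokens = torch.cat([self.cls_token.expand(b, -1, -1), feats], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)
        case_logits = self.case_head(fused[:, 0])          # class-token position
        return image_logits, case_logits
```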

We implemented an ensemble diagnostic strategy to integrate outcomes from both the multi-label and binary classification heads. The multi-label head identified a neonate as severe if at least one image exhibited PVL, hydrocephalus, or a combination of IVH and ventriculomegaly. Conversely, the binary head provided a direct case-level prediction of severity. When the binary head predicted non-severe but the multi-label head suggested severe findings, we applied an additional rule-based refinement. Specifically, we counted the number of abnormal images in the case—defined as any image not predicted to be exclusively normal. If the multi-label predictions included severe findings and the number of abnormal images was two or more, the case was ultimately classified as severe.
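
This rule-based refinement can be summarized in a short sketch; the function and argument names are our own.

```python
# A sketch of the rule-based ensemble described above. "severe_by_multilabel" means
# at least one image shows PVL, hydrocephalus, or IVH together with ventriculomegaly;
# "n_abnormal" counts images not predicted to be exclusively normal.
def ensemble_decision(binary_severe: bool, severe_by_multilabel: bool,
                      n_abnormal: int) -> bool:
    if binary_severe:
        return True
    # Override a non-severe binary prediction only with corroborating evidence.
    return severe_by_multilabel and n_abnormal >= 2
```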

For the multi-label classification of image predictions, we used binary cross-entropy loss. For the binary classification task of patient diagnosis, we adopted a class-balanced cross-entropy loss39 to address the extreme data imbalance through re-weighting. Additionally, we utilized a regularization technique called RegMixup40 to mitigate data uncertainty during training. We also implemented Test-Time Augmentation to enhance prediction robustness by averaging results from multiple augmented versions of the input images during inference.
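
A minimal sketch of the class-balanced re-weighting39 applied to the case-level cross-entropy loss is given below; the value of β and the class counts are illustrative assumptions.

```python
# A sketch of class-balanced re-weighting (effective-number weights, ref. 39)
# for the case-level cross-entropy loss; beta and class counts are illustrative.
import torch
import torch.nn.functional as F

def class_balanced_ce(logits, targets, samples_per_class, beta=0.9999):
    eff_num = 1.0 - torch.pow(beta, torch.as_tensor(samples_per_class, dtype=torch.float))
    weights = (1.0 - beta) / eff_num
    weights = weights / weights.sum() * len(samples_per_class)  # normalize
    return F.cross_entropy(logits, targets, weight=weights)

loss = class_balanced_ce(torch.randn(8, 2), torch.randint(0, 2, (8,)), [1400, 118])
```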

We performed a grid search on the first fold of the validation set to identify the optimal combination of hyperparameters, including image size, number of epochs, learning rate, and batch size. The top 10 configurations are presented in Supplementary Table 14. Based on the best-performing setup, all CUS images were resized to 256 × 256, and various augmentations were applied during training, including flipping, grayscale conversion, rotation, and Gaussian blur. The model was trained for 45 epochs with a batch size of 16 using the AdamW optimizer, combined with a warm-up strategy and cosine learning rate decay. Training was conducted on a single Nvidia A6000 GPU using PyTorch version 2.1.
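
The warm-up plus cosine decay schedule might be implemented as follows; the warm-up length shown is an illustrative assumption.

```python
# A sketch of the warm-up plus cosine learning-rate decay described above,
# built with a standard PyTorch LambdaLR; warm-up length is illustrative.
import math
import torch

model = torch.nn.Linear(10, 2)  # placeholder model for the sketch
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
warmup, total = 5, 45           # epochs

def lr_lambda(epoch):
    if epoch < warmup:          # linear warm-up
        return (epoch + 1) / warmup
    progress = (epoch - warmup) / max(1, total - warmup)
    return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
```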

Model validation

Ablation study

We conducted an ablation study on the development test set to evaluate the efficacy of our proposed methods. The baseline model is a ConvNext architecture with multi-view fusion followed by binary classification. As shown in Supplementary Table 15, the integration of the multi-label classification head resulted in a significant improvement in diagnostic accuracy. This gain is attributed to the enhanced ability of the model to discern fine-grained diagnostic information during training. In contrast, the diagnostic fusion strategy provided limited improvements. Nevertheless, this module was retained in the final model to mitigate the potential for missed diagnoses.

SOTA comparison

We compared several classic and advanced classification models, employing the hyperparameters recommended in the original publications. As shown in Supplementary Table 15, our proposed model incorporating ensemble diagnosis achieved superior performance, yielding the highest AUC of 0.9725, sensitivity of 0.9317, and NPV of 0.9885. These findings demonstrate its strong ability to distinguish cases of severe brain injury.

Evaluation metrics

For stage 1, the detection performance of NCLS was evaluated by assessing mAP, FPS, and the qualitative score assessment of extracted standard views, where mAP represents the average precision across different recall levels. For stage 2, the performance of NCLS and that of all radiologists were evaluated by assessing sensitivity, specificity, PPV, NPV, F1-score, and AUC with two-sided 95% CIs. The formulas are as follows:

$$\begin{aligned}
\mathrm{Sensitivity}&=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}\\
\mathrm{Specificity}&=\frac{\mathrm{TN}}{\mathrm{FP}+\mathrm{TN}}\\
\mathrm{PPV}&=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}\\
\mathrm{NPV}&=\frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FN}}\\
\text{F1-score}&=2\cdot \frac{\mathrm{Precision}\times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}
\end{aligned}$$
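
For completeness, a small sketch computing these metrics from case-level predictions is shown below; the helper function is our own illustrative implementation, with sklearn's roc_auc_score used for the AUC.

```python
# A sketch computing the screening metrics above from binary case-level
# predictions and continuous scores; inputs are assumed to be array-like.
import numpy as np
from sklearn.metrics import roc_auc_score

def screening_metrics(y_true, y_pred, y_score):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(((y_true == 1) & (y_pred == 1)).sum())
    tn = int(((y_true == 0) & (y_pred == 0)).sum())
    fp = int(((y_true == 0) & (y_pred == 1)).sum())
    fn = int(((y_true == 1) & (y_pred == 0)).sum())
    sens = tp / (tp + fn)            # recall
    spec = tn / (fp + tn)
    ppv = tp / (tp + fp)             # precision
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sens / (ppv + sens)
    return dict(sensitivity=sens, specificity=spec, ppv=ppv, npv=npv,
                f1=f1, auc=roc_auc_score(y_true, y_score))
```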

We also recorded the time taken by the NCLS for the entire screening process and compared it with the time required by radiologists.

Statistical analyses

In this study, categorical variables were expressed as counts and percentages, and continuous variables were represented as median (95% CI), with the 95% CIs of all continuous variables calculated using the Clopper-Pearson method. For comparisons between different groups in the development dataset, the chi-square test was used for the significance testing of categorical variables, and the Mann–Whitney U test was used for continuous variables. For comparisons of AUC performance between AI and radiologists, as well as between junior radiologists with and without AI assistance, a paired, one-sided Wilcoxon signed-rank test was used based on reader-level ΔAUC values. In the blind and randomized trial, Bang’s blinding index was used to assess the degree of blinding. The superiority of the correction rate and gold standard consistency was tested using two-sided Pearson chi-square tests, and secondary diagnostic time was compared using a one-sided Welch t-test under the directional hypothesis that AI assistance reduces diagnostic time. For subgroup analyses, differences in AUC between probe types were tested using the independent-sample DeLong test (two-sided), and pairwise comparisons across GA groups were conducted using the independent-sample DeLong test with Benjamini–Hochberg correction for multiple comparisons. Additionally, in the assessment under Papile and Volpe grading systems, AUC differences across GA groups were evaluated using two-sided independent-sample DeLong tests, also corrected using the Benjamini–Hochberg method. Agreement between AI and different radiologists was assessed using Cohen’s kappa, and inter-rater consistency within the radiologist group was evaluated using Fleiss’ kappa. In the qualitative scoring assessment, a paired t-test was used to evaluate the significance of differences between two radiologists across four scoring items.
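
As an illustration of the agreement statistics, the sketch below computes Cohen's and Fleiss' kappa with standard library functions; the rating matrix is a toy example, not study data.

```python
# A sketch of the agreement statistics named above, using sklearn and statsmodels.
# The rating matrix (cases x raters, binary labels) is an illustrative toy input.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

ratings = np.random.default_rng(0).integers(0, 2, size=(50, 9))  # 50 cases, 9 raters
counts, _ = aggregate_raters(ratings)   # cases x categories count table
print("Fleiss' kappa:", fleiss_kappa(counts))
print("Cohen's kappa (rater 0 vs 1):", cohen_kappa_score(ratings[:, 0], ratings[:, 1]))
```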

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.