Introduction

Steatotic liver disease (SLD) is a chronic condition associated with an increased risk for cancer-related mortality, mainly for hepatocellular carcinoma1,2,3. SLD is defined as abnormal triglyceride accumulation within the hepatocyte, known as hepatic steatosis2. The term SLD, implemented in June 2023, encompasses a variety of disease subcategories, including metabolic-associated steatotic liver disease (MASLD), metabolic and alcohol-associated liver disease (MetALD), alcohol-associated liver disease (ALD), and several rarer subtypes with specific or cryptogenic etiologies3. Previously known as non-alcoholic fatty liver disease (NAFLD), MASLD is the most prevalent form of SLD4.

Over 25%-30% of the global population has MASLD5,6. Chronic conditions such as obesity, type 2 diabetes, hypertension, and cardiometabolic conditions are associated with MASLD7. Given the increasing rates of obesity and type 2 diabetes in Latin America, the prevalence of MASLD is believed to be underestimated and underreported8. High rates of MASLD have been reported in Chile, with prevalence estimates ranging from 23% in 2000 among those over 18 years old 9 to 47.5% in 2019 among those 38–74 years old10.

The gold standard for diagnosing hepatic steatosis is liver biopsy11. However, its invasive nature and high cost make it less viable for primary healthcare12. Current guidelines in the United States, Europe, and Latin America recommend using alternative imaging technology, such as elastography, to diagnose liver steatosis6,13,14. Unfortunately, this technology is not routinely available worldwide due to its high cost14. Ultrasound is the most commonly recommended alternative to elastography due to its practicality, low cost, safety, and wide availability13,14,15. Even though ultrasound is a feasible alternative, this technique has limitations. Ultrasound is highly operator-dependent, explaining the variability in sensitivity (between 60 and 94%) and specificity (between 84 and 95%), which are also influenced by the degree of steatosis in the liver12. Additionally, the detection of steatosis by ultrasound is affected by visceral fat, hindering the accuracy of this technology among obese patients12,14. Training and quality controls are necessary to achieve a precise technique, as the interpretation of the image can affect the diagnosis of steatosis.

Chile has a high prevalence of MASLD10 and high rates of gallstone disease16, which has been associated with MASLD17,18. A previous report in a Chilean population of adults aged 38 to 74 found the prevalence of gallstones to be twice as high in women compared to men (40% vs 20%)10. Additionally, studies have observed that women with MASLD over the age of 50 have a higher risk of developing advanced fibrosis than men19. The Chile Biliary Longitudinal Study (Chile BiLS), a prospective cohort of women with gallstones, collected ultrasound images that can be used to assess the prevalence of hepatic steatosis in this population18. Chile BiLS used radiology technicians, rather than radiologists, for ultrasound interpretation, following the recommendations of most governments and the World Health Organization to involve highly qualified health workers, such as radiology technicians, in clinical diagnosis, given the lack of specialized physicians in low- and middle-income countries20. Despite the implementation of this approach in diverse countries, concerns about the precision of the exams remain20. Consequently, we reviewed a subset of baseline ultrasound images to ascertain the presence/absence of SLD and to evaluate interobserver and intraobserver agreement in image assessments between the radiology technicians from the Chile BiLS cohort and a university-based radiologist with experience in liver images in the United States.

Methods

Study population and design

This cross-sectional analysis used data from Chile BiLS, an ongoing prospective cohort initiated in 201618. The cohort includes 4338 women aged 50–74 residing in the Cautín Province, Araucanía region. Participants underwent baseline, year-2, and year-4 visits, during which demographic, history of type 2 diabetes or use of medication, blood pressure, history of hypertension or use of medication, physical examination (height, weight, and waist circumference), blood samples, and ultrasound data were collected18. Chile BiLS radiology technicians conducted ultrasound examinations; these technicians had a specialty in radiology and underwent formal ultrasonography training conducted by a radiologist who worked in the cohort. This training was based on the standardized radiological guidelines for ultrasound interpretation21,22. All ultrasound reports were reviewed and signed by a radiologist.

Using computer-generated randomization, we selected a random subset of 425 (~ 10%) participants from the 4032 participants with baseline images available in 2019 (Fig. 1) to assess agreement in identifying liver steatosis. The evaluation involved comparing the findings for hepatic steatosis from the original ultrasound examinations done by eight Chile BiLS radiology technicians (original readings) with the interpretation of a radiologist from Baylor College of Medicine (observer 1), who reviewed still digital ultrasound images taken during the examination. Blinded duplicate images from 34 randomly selected participants (10% of the original subset, which was slightly lower than the 425 finally included) were included to assess reliability. This duplicated set of images was re-evaluated by observer 1 and by two Chile BiLS radiology technicians (observers 2 and 3). Observers 2 and 3 were part of the eight radiology technicians who generated the original readings and contributed four of the readings (three from observer 2 and one from observer 3) to the random set of 425. However, they did not conduct any of the examinations for the participants included in the set of 34 with blinded duplicate images. Finally, observer 2 re-evaluated the duplicated set of 34 images to assess intraobserver agreement (Fig. 2).

Fig. 1
Fig. 1
Full size image

(a) Still ultrasound image (static photograph) of the liver demonstraing that the liver and the kidney were captured on the same plane. (b) A Chile BiLS technician performing an ultrasound on a participant.

Fig. 2
Fig. 2
Full size image

Patient selection for inter and intra-observer comparison flowchart.

Diagnosis of liver steatosis

Liver steatosis was ascertained through hepatobiliary ultrasonography using Siemens ACUSON P500™ and P300™ portable ultrasound machines. During the examination, when the original images were taken, the radiology technician documented and recorded both biliary and liver findings. The radiology technicians were available to explore different angles to determine biliary and liver findings, obtaining a set of still images for each patient. When a radiology technician had concerns (e.g., ultrasound could not be performed due to abdominal adiposity or other reasons, or radiology technicians were not certain of the observed features), findings were discussed with a Chilean radiologist. In the current analysis, the radiology technicians and radiologist identified liver steatosis by comparing the echogenicity of the liver related to the kidney. Observers reviewed the liver parenchyma, and when it was more echogenic than the renal cortex (“brighter”), then steatosis was confirmed23. To determine the degree of steatosis, observers compared whether the right hemidiaphragm could be resolved separately from the liver dome. If the structures were resolvable, they were classified as “mild,” and if they could not be distinguished, they were classified as “moderate/severe”23. None of the images had abnormally echogenic kidneys, and participants had no history of renal disease. In the original readings, hepatic steatosis was categorized into one of four categories: absent, mild, moderate, or severe. Upon re-review, the radiologist and both Chilean radiology technicians categorized the degree of hepatic steatosis into three categories: none, mild, moderate/severe.

Statistical analysis

Intra- and interobserver agreement were evaluated according to the presence versus absence of steatosis (categorized as yes/no) and the severity of steatosis (none, mild, moderate/severe). Interobserver agreement for observer 1 (radiologist) was performed by comparing the review of the 425 randomly selected images (407 in the final review, as described below and in Fig. 2) versus the original readings. For observers 2 and 3 (radiology technicians), review of the duplicate set of 34 participants against the original readings. Concordance between the original readings and observers 1, 2, and 3 was quantified by both percent agreement and kappa values, using Cicchetti-Allison weights24. Kappa was interpreted as 0, “no agreement”, 0.10–0.20, “slight”, 0.21–0.40, “fair”, 0.41–0.60, “moderate”, 0.61–0.80, “substantial”, and 0.81–0.99, “almost perfect agreement”25. A secondary outcome assessed the intraobserver concordance for observers 1 and 2, who re-read the subset of 34 participants. All analyses were performed using “vcd” and “grid” packages in R Version 1.4.1106 © 2009–2021 RStudio, PBC26.

Results

Comparing the randomly selected subset of 425 participants with the whole cohort (4,338), we observed that the health characteristics of the subset were similar to those of the cohort (Table 1). Of note, we observed high rates of all SLD-related chronic diseases for both groups: overweight/obesity (61.5% in the cohort and 65.2% in the subset), severe obesity (28.5% and 25.6%, respectively), diabetes (25.7% and 25.1%, respectively), hypertension (81.3% and 79.7%, respectively), and an elevated percentage of high waist circumferences (94.9% and 95.7%, respectively, with ≥ 80 cm waist circumference), as expected since women with gallstones are a high-risk population. In particular, we found high rates of moderate/severe ultrasound-detected liver steatosis (Table 1).

Table 1 Health profile of the Chile Bils participants and the re-evaluated subset.

Of 425 selected participants, 18 (4.2%) were excluded by observer 1 because the kidney and liver were not on the same plane in the ultrasound image. Among the 407 participants compared in the classification of presence versus absence of steatosis, the agreement between observer 1 and the original reading was slight, with a kappa of 0.12 (95% CI 0.08–0.16, p < 0.001) and a percent agreement of 41.0% (95% CI 36.2–46%) (Fig. 3a). Significant discrepancies were identified in 239 (58.7%) individuals whom observer 1 classified as having “absence” of steatosis, while the original readings indicated “presence” of steatosis (Fig. 3a).

Fig. 3
Fig. 3
Full size image

Inter-observer agreement for presence/absence of liver steatosis between the observers. (a) Agreement of observer 1 (radiologist) against 407 original readings (observer 1 excluded 18 images). (b) Agreement between observer 2 (first Chilean observer) and the subset of 34 images. (c) Comparison of the observer 3 (second Chilean technician) and the subset of 33 images (observer 3 excluded one image).

For the duplicate set of 34 participants, we compared the inter-observer agreement for the presence versus absence of steatosis between the two Chilean radiology technicians (observers 2 and 3) and the original readings. Observer 2 had a fair agreement (kappa: 0.29 (95% CI 0.11–0.58, p < 0.05), with 64.7% (95% CI 46.5–80.3%) agreement (Fig. 3b). Observer 3, who excluded 1 (2.9%) image because the kidney and liver were not on the same plane, had a fair agreement (kappa: 0.32, 95% CI 0.06–0.58, p < 0.05), with 63.6% (95% CI 45.1–79.6%) agreement (Fig. 3c). Both Chilean radiology technicians had discrepancies compared to the original readings, categorizing 10 (29.4%) and 11 (33.3%) individuals, respectively, as having no liver steatosis, while the original readings classified them as having liver steatosis (Figs. 3b–c).

The findings for steatosis level (none, mild, moderate/severe) were similar to those for presence/absence: observer 1 had a slight agreement (weighted kappa: 0.09, 95% CI 0.06–0.11, p < 0.001; percent agreement: 27.6%, 95% CI 23.5–32.4%, Fig. 4a) compared with the original readings. For observers 2 and 3, agreement was fair (weighted kappa: 0.28, 95% CI 0.07–0.49, p < 0.05; percent agreement: 44.1% (95 CI 27.19–62.11%); and weighted kappa: 0.25, 95% CI 0.04–0.47, p < 0.05; percent agreement: 42.4%, 95% CI 25.48–60.78%, respectively) (Fig. 4b–c). Observer 1 classified 100 (24.6%) participants as “none” who were initially classified as “mild” and 139 (32.9%) participants as “none” who were initially classified as “moderate/severe” (Fig. 4a). For observers 2 and 3, discrepancies were related to the classification of “none” versus “mild” [5 (14.7%) and 5 (15.2%), respectively] and “none” versus “moderate/severe” [5 (14.7%) and 6 (18.2%), respectively] (Fig. 4b–c).

Fig. 4
Fig. 4
Full size image

Inter-observer agreement of steatosis severity (none, mild, moderate/severe) between the observers. (a) The agreement of observer 1 (radiologist) against the 407 original readings (observer 1 excluded 18 images). (b) Agreement between observer 2 (first Chilean observer) and the 34 duplicated subsets. (c) Comparison of the observer 3 (second Chilean technician) and the original readings (observer 3 excluded one image).

Regarding the intra-observer agreement, observer 1 excluded 2 (5.9%) images of the 34 participants included for re-review. For the 32 participants re-reviewed, agreement for presence vs. absence of liver steatosis was moderate, with a kappa of 0.45 (95% CI 0.08–0.82, p < 0.05) and a percent agreement of 81.3% (95% CI 63.56–92.79%) (Fig. 5a). Observer 2 excluded 1 (2.9%) image. Of the 33 participants re-reviewed, intra-observer agreement was substantial, with a kappa of 0.64 (95% CI 0.37–0.90, p < 0.001) and a percent agreement of 81.8% (95 CI 64.54–93.02%) (Fig. 5b). Discrepancies for both observers 1 and 2 were related to the categories of “none” versus “mild” (Fig. 6a–b).

Fig. 5
Fig. 5
Full size image

Intra-observer agreement for presence/absence of liver steatosis for radiologist (Observer 1) and Chilean technician (Observer 2). (a) Agreement between the first interpretation of the subset and the second interpretation of observer 1 (radiologist, excluded 2 images). (b) Comparison of the first and second interpretations of the subset ultrasound of the observer 2 (Chilean technician, excluded 1 image).

Fig. 6
Fig. 6
Full size image

Intra-observer agreement of steatosis severity (none, mild, moderate/severe) of radiologist (observer 1) and Chilean technician (observer 2). (a) Agreement between the first interpretaion of the subset and the second interpretaion of observer 1 (radiologist, excluded 2 images). (b) Comparison of the first and second interpretaions of the subset ultrasound of the observer 2 (Chilean technician, excluded 1 image).

Discussion

The increase in SLD rates worldwide and their complications, such as fibrosis, cirrhosis, and cancer27, raises the importance of early detection and management. While elastography (e.g., FibroScan) is recommended, it is not available in many countries or regions due to its high costs28, highlighting that ultrasonography is one of the most accessible and recognized screening techniques for SLD diagnoses12,29, particularly in low and middle-income countries such as Chile.

Early detection of hepatic steatosis is clinically crucial for timely risk stratification, preventive measures, and appropriate clinical interventions.

However, ultrasound has limitations in its interpretation, mainly because it is highly operator-dependent; comparing two or more interpreters is essential to ensure a correct diagnosis. Our findings indicate poor inter-observer agreement, especially between the radiologist (observer 1) and the original readings, particularly in differentiating between “absence” versus “presence” of steatosis. This variability is clinically relevant as accurate early detection impacts timing and clinical interventions. In the early stages of SLD, especially in MASLD, where patients have mild steatosis, management is less invasive; lifestyle modifications, such as diet and exercise, can improve hepatic steatosis in most cases15. As the disease progresses, not only does steatosis progress, but also fibrosis appears, identifying it as metabolic dysfunction-associated steatohepatitis (MASH). MASH has less effective treatments. Studies have shown that improvement in MASH can be achieved if patients lose ≥ 10% of their body mass15,30,31, and due to the difficulties of losing that amount of weight, only 10–20% of the patients can achieve it15,31. On the other hand, more aggressive treatment, such as medication, has not yet proven to be highly effective in MASH32. This highlights the importance of a correct and early diagnosis; since no effective intervention has been demonstrated for patients with MASH, we need to identify the high-risk patients who can significantly benefit from an early, effective, non-invasive intervention.

Across the three observers, discrepancies were predominantly noted between the “none” and “mild” classifications. Such findings align with previous studies reporting lower ultrasound sensitivity in discriminating between none and mild steatosis12, with low sensitivity and specificity (60% and 84%, respectively). This reduced accuracy can be attributed to lower liver fat levels, making it difficult to determine if liver steatosis is present29,33. Recognizing this limitation is clinically significant as mild steatosis may often be underestimated, thereby delaying necessary preventive interventions.

Multiple factors likely contribute to discrepancies between observer 1 and the original findings. First, the cohort had a high prevalence of obesity (25.6% of the participants in the subset have a BMI of over 35 kg/m2, and 96% have a high waist circumference), which can interfere with the image quality. Studies have shown that obese patients have a more significant amount of abdominal adiposity, affecting ultrasound performance12,34. Heinitz et al. 2023 showed that ultrasound has a worse image quality in patients with a BMI above 35 kg/m2, which, in consequence, could affect image interpretation35. Secondly, observer 1 reviewed still images the radiology technicians took in the field instead of videos or conducting a real-time examination. Stored images limited the reviewer to examining one single plane, preventing further findings and exploration that would help the disease diagnosis. This limitation is highlighted by a study of interobserver agreement between three radiologists with over eight years of experience in ultrasonography. In this study, the radiologists reviewed still ultrasound images and did not produce high interobserver agreement, but only fair to moderate agreement36. Thus, the accuracy of the diagnosis could be affected not only by the degree of steatosis, which is a known limitation of ultrasound, but also by the material available for review, since real-time ultrasound gives a better perspective of the steatosis in the liver. A study comparing liver steatosis diagnosis in real-time ultrasound vs. liver biopsy showed that when liver steatosis exceeded 20%, the sensitivity and specificity of ultrasound in real-time increased to 100% and 90%, respectively33.

On the other hand, we hypothesize that the Chilean observers had fewer discrepancies with the original readings because all Chile BiLS radiology technicians, including those who conducted the original ultrasound examinations, received the same standardized training. Using the same ultrasound instruments, regularly performing the examinations in the same environments, following the same protocol, and capturing the images using established procedures could affect image interpretation. Although the intra-observer agreement for observers 1 and 2 was higher than the inter-observer agreement, observer 1’s (radiologist) readings showed moderate agreement between the first and second interpretations. In contrast, observer 2 (Chilean radiology technician) obtained substantial agreement. This result is not surprising since, as shown in a previous study, the chances of discrepancies in interpretation are higher between individuals than with oneself36.

Even though interobserver agreements for the three observers were not higher than moderate, the health profile of the Chile BiLS participants suggests a notably high prevalence of steatotic liver disease (SLD) in the cohort. Epidemiological studies have shown that subcategories of SLD, such as MASLD and MetALD, are the most prevalent and are strongly related to metabolic syndrome3,27. Main risk factors associated with MASLD and MetALD are obesity, diabetes, and hypertension37,38. Studies have shown that 65% of obese patients have MASLD, while 70% of patients with a diagnosis of type 2 diabetes have MASLD39. In Chile, previous reports from the 2017 Chilean National Health Survey showed that women over age 50 had high rates of overweight (43.6%) and obesity (41.7%), high hypertension (27.7%), and type 2 diabetes (14.0%)40. These data are similar to what we found in the Chile BiLS cohort. Given these substantial burdens of obesity, diabetes, and related metabolic conditions, the high prevalence of SLD identified by Chile BiLS radiology technicians seems reasonable. It underscores the importance of targeted screening and monitoring strategies in high-risk populations. Studies have shown the importance of SLD screening in populations with a high burden of obesity and diabetes, because the coexistence of this condition with SLD has a higher risk of disease progression41. Although the Chile BiLS cohort was initially designed to study gallbladder disease and cancer, its detailed characterization of metabolic risk factors and comprehensive ultrasound assessments provide an invaluable opportunity to explore SLD screening and its clinical implications within a high-risk group, thereby contributing important insights that may inform preventive strategies in similarly vulnerable populations worldwide.

Finally, we acknowledge several limitations of our study. First, the primary purpose of the Chile BiLS cohort was to study gallbladder disease and cancer, not SLD. Although SLD was a secondary outcome, the radiology technicians were trained to fully assess the hepatobiliary system. Still, it is possible that some ultrasound images may have focused more on gallbladder visualization than on optimal liver imaging. Second, our intra-observer (and for observers 2 and 3, and inter-observer) agreement analyses were based on a relatively small sample, leading to imprecision in the estimates. However, as shown in other studies and guidelines42, this sample size is sufficient to perform a robust statistical analysis and offers useful insight into the challenges of observer agreement42. Despite these limitations, our findings offer meaningful insight into the consistency and challenges of ultrasound-based diagnosis of SLD in a high-risk population and reinforce the importance of standardized training and quality control in implementing imaging-based screening strategies, particularly in resource-limited settings.

Conclusion

In summary, our study emphasizes the importance of evaluating inter- and intra-observer agreement to optimize the reliability of ultrasound-based diagnosis of SLD, particularly given the critical role of ultrasound in early detection of SLD in resource-limited settings. The notably high prevalence of SLD identified by radiology technicians in the Chile BiLS cohort aligns closely with the participants’ high-risk metabolic profile, supporting the validity of ultrasound as an effective screening tool when standardized training is applied. Our findings reinforce the necessity of implementing targeted SLD screening programs with rigorous quality control and suggest that broader integration of trained ultrasound operators could substantially improve early detection efforts.