Quantitative analysis of the comprehensiveness and granularity of biomedical terminology systems

Chang, Eunsuk; Sung, Sumi

doi:10.1038/s41598-025-17737-0

Published: 03 October 2025

Scientific Reports volume 15, Article number: 34525 (2025)

Abstract

Modern healthcare interoperability demands objective methods for quantitatively evaluating the coverage and granularity of biomedical terminology systems to support evidence-based selection and integration decisions. We introduce three novel metrics: structural size (an integrated measure of width and depth), mapping burden ratio (a measure of relative granularity between systems), and content overlap. We apply them to quantitatively evaluate the semantic integration potential of five major terminology systems: SNOMED CT; Logical Observation Identifiers Names and Codes (LOINC); International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM); Gene Ontology (GO); and Current Procedural Terminology (CPT). The Unified Medical Language System Metathesaurus was employed to establish semantic equivalency between concepts from different systems. SNOMED CT exhibited superior granularity across most clinical domains, with some exceptions (ICD-10-CM in “Qualifier value,” GO in “Observable entity,” and LOINC in “Staging and scales”). These findings address the challenge of semantic degradation in health information exchange by quantifying the degree to which meaning might be lost when translating between terminology systems. The proposed metrics empower healthcare organizations to develop targeted extensions or integration strategies that maintain semantic consistency across systems, providing objective tools for terminology system selection, integration planning, and semantic interoperability assessment.

Introduction

Interoperability—the ability of disparate information systems, devices, or applications to meaningfully connect, exchange, and interpret shared data—has emerged as a cornerstone of modern healthcare systems, particularly in the context of rapidly advancing digital health technologies¹. Interoperability facilitates the efficient exchange and meaningful integration of health data across diverse platforms and institutions, ensuring that healthcare providers can access relevant information at the point of need². It plays a pivotal role in improving clinical outcomes by addressing data silos, minimizing redundancies, and supporting data-driven decision-making³.

In the context of terminology systems, interoperability implies that clinical ideas represented by one system should be interpretable by another without loss of meaning. The recent emergence of health information exchange and common data models has highlighted a critical challenge: the heterogeneity of healthcare information technology systems, diverse data standards, and the complexities of terminology harmonization⁴. This is exemplified by Fast Healthcare Interoperability Resources (FHIR)⁵ and Observational Health Data Sciences and Informatics (OHDSI)⁶, which designate specific terminology systems for different domains. For example, FHIR recommends Logical Observation Identifiers Names and Codes (LOINC)⁷ as the terminology system for its Observation resource to express patient observations⁸, whereas OHDSI has designated SNOMED CT⁹ as the standardized concept system for multiple domains, including Condition, Procedure, Device, Observation, and Measurement¹⁰. When migrating clinical observation-related data from FHIR to OHDSI, LOINC-coded terms must be translated into SNOMED CT.

Moreover, terminology systems continuously evolve, sometimes incorporating other systems to expand their coverage and enhance their expressiveness. A notable example is the ongoing collaboration between SNOMED International and LOINC, which aims to enhance interoperability and reduce redundancy in healthcare terminology¹¹. Its primary goal is to integrate LOINC’s laboratory and pathology content with SNOMED CT’s comprehensive clinical terminology, enabling easier exchange and collection of data across systems.

Data migration and system incorporation involve complex terminology integration processes. Success depends on understanding the relative coverage and granularity of terminology systems—knowledge that currently relies on subjective assessment rather than objective measurement. To date, no systematic evaluation has compared the relative strengths in granularity or content coverage of each international standard terminology such as RxNorm¹², Current Procedural Terminology (CPT)¹³, International Classification of Diseases (ICD)¹⁴, and Gene Ontology (GO)¹⁵. This limitation creates significant challenges for healthcare organizations seeking to select the best data migration strategy or to understand the implications of mapping between different systems. Without quantitative metrics, healthcare organizations will face potential semantic degradation—the loss of meaning that occurs when clinical concepts are translated between terminology systems with different levels of granularity or coverage. This semantic degradation can lead to ambiguity, reduced clinical accuracy, and compromised data quality in health information exchange, ultimately affecting patient care and clinical decision-making.

Addressing heterogeneity challenges requires establishing frameworks that eliminate ambiguity and reduce errors in data exchange, creating a more accurate and secure healthcare ecosystem. Quantitative metrics for the relative granularity and content coverage of terminology systems can help identify conceptual gaps, predict potential semantic preservation or degradation during cross-terminology mapping, and reveal opportunities for strategic terminology expansion.

Therefore, this study aimed to develop and demonstrate novel quantitative metrics to measure terminology system characteristics and apply them to evaluate the content coverage and granularity of major biomedical terminologies across clinical domains. We introduce three key metrics: structural size, mapping burden ratio (MBR), and content overlap. These metrics provide objective, quantitative tools for evaluating terminology systems, supporting evidence-based decisions in healthcare interoperability initiatives, and addressing semantic challenges in health information exchange and integration.

Related work

Previous works have emphasized the need to evaluate the quality of terminology systems across various dimensions. Cimino¹⁶ suggested the following requirements for standard terminology: vocabulary content, concept orientation, concept permanence, non-semantic concept identifiers, poly-hierarchy, formal definitions, rejection of “not elsewhere classified” terms, multiple granularities, multiple consistent views, context representation, graceful evolution, and recognized redundancy. Similarly, Rosenbloom et al.¹⁷ proposed an evaluation model for interface terminologies, which includes parameters such as concept accuracy, term expressivity, degree of semantic consistency for term construction and selection, adequacy of assertional knowledge supporting concepts, degree of complexity of pre-coordinated concepts, and the terminology’s human readability.

In another work, Rosenbloom et al.¹⁸ demonstrated that SNOMED CT had excellent content coverage for MEDCIN and Categorical Health Information Structured Lexicon, while highlighting that concepts alone are not sufficient if the relationships between them cannot be properly represented. They introduced “degrees of freedom” as a quantitative measurement of terminology complexity, providing a standardized method to assess the representational burden of different interfaces. They also identified cases where concepts exist but terminological synonymy prevents proper mapping, suggesting that concept-level semantic matching yields more meaningful results than term-level exact matching for effective comprehensiveness evaluation.

He et al.¹⁹ measured the density (the amount of concept information represented, even in the same sub-domains) of SNOMED CT conceptual content by comparing trapezoid topological patterns between SNOMED CT and reference terminologies. They defined reference terminologies as a subset of English terminologies in the Unified Medical Language System (UMLS)²⁰ that had ‘PAR’ relationships labeled with ‘IS_A’ and over 10% overlap with one or more SNOMED CT hierarchies. Their methodology involved identifying specific topological patterns between terminology pairs and quantifying density differences based on path length ratios. These “trapezoids” occurred when the same concepts (identified by UMLS Concept Unique Identifiers) appeared in both SNOMED CT and a reference terminology but had different numbers of intermediate concepts. The researchers exhaustively analyzed 1:k and k:1 trapezoids (where k ranged from 1 to 9), revealing significant variations in conceptual density across SNOMED CT hierarchies.

Methods

Measuring terminology structural size

In this study, we propose metrics that provide quantitative descriptions of integration or expansion potential that are universally applicable to any terminology system. Before defining these metrics, we formally describe the hierarchical structures of terminology systems.

A knowledge graph is a directed graph in which nodes represent entities and edges represent relationships between them²¹. A terminology system, which is interchangeably referred to as “ontology” in this paper, is a specialized knowledge graph that describes concepts within a specific domain and their relationships. Our analysis focuses exclusively on vertical relations (i.e., is_a relations) rather than horizontal relations (e.g., attribute relations) to avoid dealing with cyclic structures in the knowledge graph.

With this approach, a terminology system takes a triangular shape, with its root concept at the apex and leaf concepts at the bottom, which we term a terminology system triangle (Fig. 1). We correlate the number of leaf concepts to the width of the triangle base, representing the breadth of the terminology system. This measurement captures the terminology’s horizontal coverage—how comprehensively it spans diverse medical concepts at its most granular level. A wider base indicates a terminology system that addresses more distinct medical concepts, similar to how a broader foundation supports a more extensive structure.

$$\text{Width of terminology system } (w) \approx \text{Total number of leaf concepts}$$

(1)

The depth dimension (representing the average number of hierarchical levels) complements this by measuring the vertical elaboration of concepts—how specifically the terminology defines relationships between general and detailed concepts.

$$\text{Depth of terminology system } (d) \approx \text{Average of depth levels to each leaf concept}$$

(2)

When a concept has multiple parent concepts (multi-hierarchical structure), we calculate its depth level as the average number of nodes along all possible distinct routes from that concept to the root concept (Fig. 2). Formally,

$$\text{Depth of terminology system } (d) \approx \frac{1}{w}\sum_{j=1}^{w}\left(\frac{1}{n}\sum_{i=1}^{n} L_{j}^{i}\right)$$

(3)

Here, w is the number of leaf concepts, n is the number of possible routes from the root to the j^th leaf concept, and $L_{j}^{i}$ is the number of concepts on the i^th route from the root concept to the j^th leaf concept.
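As an illustration of Eqs. (1) to (3), the following is a minimal Python sketch on a toy is_a hierarchy. The concept names, the `PARENTS` map, and both function names are hypothetical. The sketch also assumes that a route's length counts both the root and the leaf, which the text does not state explicitly.

```python
# Toy is_a hierarchy: each concept maps to its direct parents.
# All concept names are hypothetical placeholders.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "X": ["C", "B"],  # multi-parent leaf: routes root-A-C-X and root-B-X
    "Y": ["C"],
    "Z": ["B"],
}


def route_lengths(concept, parents):
    """Return the node count of every distinct is_a route from the root to `concept`.

    Assumption: both endpoints (root and concept) are counted.
    """
    if not parents[concept]:  # the root has no parents
        return [1]
    lengths = []
    for parent in parents[concept]:
        lengths.extend(n + 1 for n in route_lengths(parent, parents))
    return lengths


def width_and_depth(parents):
    """Return (w, d): the leaf count and the mean of per-leaf average route lengths."""
    non_leaves = {p for ps in parents.values() for p in ps}
    leaves = [c for c in parents if c not in non_leaves]
    per_leaf_depth = []
    for leaf in leaves:
        lengths = route_lengths(leaf, parents)
        per_leaf_depth.append(sum(lengths) / len(lengths))  # inner average of Eq. (3)
    return len(leaves), sum(per_leaf_depth) / len(per_leaf_depth)


w, d = width_and_depth(PARENTS)
print(w, d)
```

In this toy hierarchy, leaf X has two routes (four and three concepts long), so its depth level is 3.5; leaves Y and Z have single routes of length 4 and 3. Recursive route enumeration is adequate for small examples, but a large multi-hierarchy would likely need memoization.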

The area of the triangle corresponds to the structural size of the terminology, representing an integrated measure of a terminology system’s width (w) and depth (d; Fig. 1):

$$\text{Structural size of terminology system } (A) \approx \frac{1}{2} \times w \times d$$

(4)

The structural size of a terminology system quantifies how extensively it maps the conceptual space of a domain in both breadth and specificity. This triangular model effectively visualizes how terminology systems structurally expand across the medical knowledge domain, with wider systems covering more distinct concepts and deeper systems providing more nuanced hierarchical relationships between those concepts.
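As a worked illustration of Eq. (4) with made-up numbers (not taken from the study), consider a toy terminology system with w = 3 leaf concepts and an average depth of d = 3.5:

$$A \approx \frac{1}{2} \times w \times d = \frac{1}{2} \times 3 \times 3.5 = 5.25$$

Doubling either the number of leaf concepts or the average depth doubles the structural size, so the measure rewards breadth and hierarchical elaboration equally.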

Comparing granularity among terminology systems

Some terminology systems provide more granular concepts than others. For example, the ICD-10-CM leaf concept I49.01 Ventricular fibrillation has no child concepts but corresponds semantically to the SNOMED CT concept 71908006 Ventricular fibrillation (disorder), which has four child concepts (Fig. 3). Therefore, SNOMED CT provides more granularity than ICD-10-CM in defining the concept of ventricular fibrillation.

We adopt the idea of mapping burden to quantitatively measure a terminology’s granularity. The concept mapping burden is defined as the number of concepts from other terminology systems that map to that concept. For example, as shown in Fig. 3, when mapping ICD-10-CM concepts to SNOMED CT concepts, four SNOMED CT concepts (Idiopathic ventricular fibrillation not Brugada, Paroxysmal familial ventricular fibrillation, Sustained ventricular fibrillation, and Ventricular fibrillation and flutter) and Ventricular fibrillation itself map to the ICD-10-CM concept Ventricular fibrillation, because ICD-10-CM’s Ventricular fibrillation is the most granular concept available to accommodate the semantics of these five SNOMED CT concepts. In this example, the mapping burden of the ICD-10-CM concept I49.01 Ventricular fibrillation to SNOMED CT is 5.

Formally,

$$\text{ConceptMappingBurden}(\widehat{r}) = \text{NumberOfDescendants}(s) + 1, \quad \widehat{r} \equiv s$$

(5)

where $\widehat{r}$ is a leaf concept of terminology system R (receiving concept), s is a non-leaf concept of terminology system S (source concept), and $\widehat{r}$ and s are semantically equivalent (synonymous).

We use the UMLS to define the semantic equivalency between concepts from varying terminology systems. The UMLS provides the Metathesaurus, a semantic network from nearly 200 biomedical vocabulary systems²². The UMLS’s Concept Unique Identifiers (CUIs) link all synonymous concepts from several source terminology systems²². Therefore, we consider two concepts from different terminology systems to be semantically equivalent if they share the same CUI.
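The concept mapping burden of Eq. (5) can be sketched in Python using the ventricular fibrillation example from the text. The CUI string, the dictionary layout, and the function names are placeholders chosen for illustration; they are not UMLS content or the authors' implementation.

```python
# Source system S (here SNOMED CT): concept -> direct is_a children.
# Child names follow the ventricular fibrillation example in the text.
S_CHILDREN = {
    "71908006": [
        "Idiopathic ventricular fibrillation not Brugada",
        "Paroxysmal familial ventricular fibrillation",
        "Sustained ventricular fibrillation",
        "Ventricular fibrillation and flutter",
    ],
}

# Placeholder CUI linking the two synonymous concepts (not a real UMLS CUI).
CUI_OF_R = {"I49.01": "CUI-VF"}            # receiving system R (ICD-10-CM) leaf -> CUI
CONCEPT_OF_CUI_IN_S = {"CUI-VF": "71908006"}  # CUI -> concept in S


def descendants(concept, children):
    """Collect all is_a descendants of `concept`, visiting each node once."""
    seen = set()
    stack = list(children.get(concept, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, []))
    return seen


def concept_mapping_burden(r_leaf, cui_of_r, concept_of_cui_in_s, s_children):
    """Eq. (5): descendants of the equivalent source concept, plus one.

    Returns None when the leaf has no equivalent in S or the equivalent is itself a leaf.
    """
    s = concept_of_cui_in_s.get(cui_of_r.get(r_leaf))
    if s is None or not s_children.get(s):
        return None
    return len(descendants(s, s_children)) + 1


burden = concept_mapping_burden("I49.01", CUI_OF_R, CONCEPT_OF_CUI_IN_S, S_CHILDREN)
print(burden)
```

The visited set in `descendants` matters in a multi-hierarchy, where the same descendant can be reached through several parents and should be counted once.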

Notably, the mapping burden can be measured only for leaf concepts. We cannot measure the mapping burden of non-leaf concepts because different terminology systems may divide their children in individual, yet equally valid, ways. For example, SNOMED CT provides 24 child concepts for “Malignant neoplasm of breast,” whereas ICD-10-CM provides only nine children for the semantically corresponding concept. This variation in classification across terminology systems led us to examine mapping burden only for leaf concepts.

The relative mapping burden of a terminology system compared to another can be measured by domain to identify which terminology system provides superior granularity in each domain. We can calculate the mapping burden relative to a source terminology system by averaging the concept mapping burdens of all leaf concepts in the receiving terminology system that have a semantically equivalent non-leaf concept in the source system. Formally,

$$\text{MappingBurden}(R \leftarrow S) = \frac{1}{n}\sum_{i=1}^{n} \text{ConceptMappingBurden}\left(\widehat{r}_{S}^{i}\right)$$

(6)

where $\widehat{r}_{S}^{i}$ is the i^th leaf concept of the receiving terminology system R that has a semantically equivalent non-leaf concept from the source terminology system S, and n is the number of such leaf concepts in the receiving terminology system R. Therefore, the mapping burden measures how much detail might be lost when translating between two terminology systems. When mapping from a more detailed system to a less detailed one, multiple specific concepts must sometimes be combined into a single broader concept, creating a “burden” of lost specificity.

The crude MBR is the ratio of the mapping burden of the receiving terminology system to that of the source terminology system:

$$\text{CrudeMBR}(R:S) = \frac{\text{MappingBurden}(R \leftarrow S)}{\text{MappingBurden}(S \leftarrow R)}$$

(7)

To moderate extreme values and set the middle point at 0, we take the log10 of the crude MBR to yield the MBR:

$$\text{MBR}(R:S) = \log_{10}\left(\frac{\text{MappingBurden}(R \leftarrow S)}{\text{MappingBurden}(S \leftarrow R)}\right)$$

(8)

We can only calculate the relative mapping burden between two terminology systems, since there is no gold standard biomedical terminology system that perfectly covers all necessary concepts in either the real or theoretical world to provide the basis for measuring other terminology systems’ mapping burdens. Instead, the terminology system with the greatest structural size based on Eq. (4) could serve as a “pseudo-gold standard” to measure the mapping burden of other terminology systems.

$$\text{StandardMBR}(R:SS) = \log_{10}\left(\frac{\text{MappingBurden}(R \leftarrow SS)}{\text{MappingBurden}(SS \leftarrow R)}\right)$$

(9)

Here, SS is a pseudo-gold standard terminology system with the greatest structural size measurement. The negative values of the standard MBR indicate greater granularity in terminology system R than in SS, while positive values indicate that SS provides more granular concepts. While the standard MBR could be applied to compare entire terminology systems, its domain-specific application reveals critical granularity variations that directly impact semantic interoperability. Designating a pseudo-gold standard terminology system enables a standardized comparison of granularity among various terminology systems by measuring the mapping burden against one established reference terminology system.
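Eqs. (6) to (9) can be sketched in a few lines of Python. The burden lists below are invented toy values, and the function names are hypothetical; the sketch only shows how the two directional averages combine into a signed log ratio.

```python
import math


def mapping_burden(concept_burdens):
    """Eq. (6): mean concept mapping burden over the mapped leaf concepts.

    Undefined (raises ZeroDivisionError) when no leaf concept is mapped.
    """
    return sum(concept_burdens) / len(concept_burdens)


def mbr(burdens_r_from_s, burdens_s_from_r):
    """Eqs. (7) and (8): log10 of the ratio of the two directional mapping burdens."""
    crude = mapping_burden(burdens_r_from_s) / mapping_burden(burdens_s_from_r)
    return math.log10(crude)


# Toy values: leaves of R absorb many concepts of S, leaves of S absorb few of R.
burdens_r_from_s = [10, 30]  # mean 20
burdens_s_from_r = [2, 2]    # mean 2

value = mbr(burdens_r_from_s, burdens_s_from_r)
print(value)  # positive: S is the more granular system in this toy case
```

Swapping the two arguments flips the sign, which is why the log transform places equal granularity at 0. When S is the pseudo-gold standard SS, the same function gives the standard MBR of Eq. (9).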

Content overlap between two terminology systems

Not all concepts from one terminology system can be mapped to domains defined by another system. For instance, many GO concepts, being focused on genetic and molecular functions, do not align with SNOMED CT’s clinically oriented domains. This misalignment can lead to the underestimation of GO’s mapping burden when compared across SNOMED CT domains. Therefore, to enable fair comparison, it is important to quantify the degree of content overlap between terminology systems when interpreting domain-specific MBRs.

We define the proportion of terminology system S overlapping on R as the proportion of leaf concepts in R that are mapped from or map to S:

$$\text{Overlap}(S \text{ on } R) = \frac{\mathit{SDR} + \mathit{RDS}}{\mathit{NR}}$$

(10)

Here, SDR is the number of R’s leaf concepts that have descendants in S (R’s leaf concepts mapped to non-leaf concepts of S), RDS is the number of R’s leaf concepts whose ancestor is a leaf concept of S, and NR is the total number of leaf concepts in R. A leaf concept of R that falls into both groups is counted only once. A lower overlap proportion indicates less shared conceptual coverage between the two terminology systems.

Figure 4 illustrates this calculation. In the example, terminology systems R and S have 12 (labeled uppercase A-L) and 13 (labeled lowercase a-m) leaf concepts, respectively. For system S’s overlap on R, two leaf concepts of R (H, J) have descendants in S, and four leaf concepts (I, J, K, L) have ancestors that are leaf concepts in S. After removing duplicates (J appears in both sets), the total number of overlapping concepts is five. The proportion of S on R is therefore 5/12 = 0.42.
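The Figure 4 calculation can be reproduced with a short Python sketch. The function name and argument names are hypothetical; the concept labels and counts come from the example above. A set union handles the duplicate removal described in the text.

```python
def overlap(leaves_with_descendants_in_s, leaves_under_leaf_of_s, n_leaves_r):
    """Eq. (10): share of R's leaf concepts that map from or to S, duplicates counted once."""
    overlapping = set(leaves_with_descendants_in_s) | set(leaves_under_leaf_of_s)
    return len(overlapping) / n_leaves_r


# Figure 4 example: R has 12 leaf concepts (A to L).
sdr = {"H", "J"}            # R leaves that have descendants in S
rds = {"I", "J", "K", "L"}  # R leaves whose ancestor is a leaf concept of S

proportion = overlap(sdr, rds, 12)
print(round(proportion, 2))
```

Summing the two counts directly would give 6/12 here, so the deduplication step changes the result whenever a leaf concept belongs to both groups.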

Selection of terminology systems

To demonstrate a real-world application of our metrics, we tested our method on selected UMLS source terminologies. Bales et al.²³ classified UMLS source terminologies based on scale-free and small-world properties. Scale-free networks tend to follow a power-law distribution in average node degree (average number of links per node). In a scale-free terminology system, only a few nodes (hubs) are highly connected to neighbors, while most nodes have only a few neighbors. Small-world terminology systems have highly clustered edges, represented by attribute or horizontal relationships within the terminology system.

To demonstrate the generalizability of our method across terminology systems with distinct topological characteristics, we selected terminology systems that differ in terms of scale-free and small-world features (Table 1). SNOMED CT, GO, and the CPT represent scale-free systems, whereas ICD-10-CM and LOINC represent non-scale-free systems. Regarding clustering features, SNOMED CT, LOINC, and GO are small-world, whereas ICD-10-CM and CPT are non-small-world. We also selected terminology systems that differ by domain coverage. While SNOMED CT covers a broad spectrum of clinical medicine, LOINC, ICD-10-CM, and CPT have specific focus areas (i.e., laboratory measurements, disease names, and procedures, respectively). GO is highly specialized for genes.

Table 1 Scale-free and small-world characteristics of the terminology systems tested in the current study.

Execution of UMLS

We examined semantic equivalency between terminology systems using the UMLS Metathesaurus, a semantic network that links semantically similar concepts from nearly 200 terminology systems²⁴. We loaded the MRCONSO and MRHIER tables from the UMLS23AB Rich Release Format data files to a local MySQL database. When exploring the hierarchies of each terminology system, we used the following term types:

Designated Preferred Term (PT) for SNOMED CT and GO;
LOINC Parts Display Name (LPDN), LOINC Official Fully Specified Name (LN), Display Name (DN), and Hierarchical Class (HC) for LOINC;
PT and Hierarchical Term (HT) for ICD-10-CM;
PT, HT, Preferred Names of Modifiers (MP), Place of Service (POS), and Global Period (GLP) for CPT.

We examined synonymy between two terminology systems using CUIs. If two concepts shared a common CUI but only one had child concepts, we calculated the mapping burden of the other (a leaf concept) by counting the descendant concepts of its synonymous counterpart and adding one for the counterpart itself, per Eq. (5). Table 2 shows examples of synonym pairs mapped by the UMLS (with common CUIs) and the number of SNOMED CT descendant concepts for each pair. We calculated the mapping burden for every leaf concept of each terminology system against the other terminology system, and vice versa.

Table 2 Examples of leaf concepts from various terminology systems mapping to SNOMED CT concepts that have multiple descendants.

We inspected links between concepts using the MRHIER table of the UMLS Metathesaurus. We considered the hierarchical is_a links invalid if they did not exist in the MRHIER table. We did not consider relations other than is_a, such as occurs_in and component_of, in this study.
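The CUI-based synonym lookup can be illustrated with a small self-contained sketch. It uses Python's built-in `sqlite3` with an in-memory database in place of the MySQL instance described above, and it loads only a subset of MRCONSO columns (CUI, SAB, TTY, CODE, STR). The CUIs and the second SNOMED CT row are toy placeholders, and the query is one plausible formulation rather than the authors' actual SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Subset of MRCONSO columns; the real table has more.
conn.execute(
    "CREATE TABLE MRCONSO (CUI TEXT, SAB TEXT, TTY TEXT, CODE TEXT, STR TEXT)"
)
rows = [
    # Placeholder CUIs; codes for the first pair come from the example in the text.
    ("C_TOY_1", "ICD10CM", "PT", "I49.01", "Ventricular fibrillation"),
    ("C_TOY_1", "SNOMEDCT_US", "PT", "71908006", "Ventricular fibrillation"),
    ("C_TOY_2", "SNOMEDCT_US", "PT", "TOY-0002", "Unmatched toy concept"),
]
conn.executemany("INSERT INTO MRCONSO VALUES (?, ?, ?, ?, ?)", rows)

# Two atoms are treated as synonymous when they share a CUI.
# Term types follow the list above: PT and HT for ICD-10-CM, PT for SNOMED CT.
QUERY = """
SELECT r.CODE, s.CODE
FROM MRCONSO AS r
JOIN MRCONSO AS s ON r.CUI = s.CUI
WHERE r.SAB = ? AND s.SAB = ?
  AND r.TTY IN ('PT', 'HT') AND s.TTY = 'PT'
ORDER BY r.CODE, s.CODE
"""
pairs = conn.execute(QUERY, ("ICD10CM", "SNOMEDCT_US")).fetchall()
print(pairs)
```

Deciding which member of each pair is a leaf, and counting descendants of the other, would additionally require the hierarchy information in MRHIER, which this sketch does not model.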

Selection of a pseudo-gold standard terminology system and domain categorization

To enable standardized comparisons across terminology systems, we designated the terminology system with the greatest structural size as our pseudo-gold standard. Structural size, which by Eq. (4) is proportional to the product of a terminology’s width (comprehensive concept foundation) and depth (hierarchical specificity), provides a robust criterion for this selection, as it identifies systems with both a broad foundation structure and detailed hierarchical relationships. This approach ensured sufficient conceptual overlap with specialized systems while providing adequate hierarchical complexity for meaningful comparisons of granularity.

Using this pseudo-gold standard terminology system, we also adopted its top-level concepts as domain categories for cross-system analysis, excluding any system-specific organizational concepts. This approach ensured conceptual consistency between our reference standard and domain categorization framework, enabling more coherent comparative analysis. The reference system thus functioned as a standardized “coordinate system” against which other terminologies could be positioned according to their relative granularity within specific clinical domains, without implying inherent superiority for all purposes.

Results

Table 3 shows the basic distributional characteristics of the examined terminology systems. Among the five terminology systems tested, SNOMED CT excelled in the number of connected concepts and leaf concepts, average depth levels to leaf concepts, and structural size.

Table 3 Distributional characteristics of the examined terminology systems and MBR to SNOMED CT.

To calculate the standard MBR either at a system level or by domain, we designated SNOMED CT as the reference terminology system because it had the greatest structural size. We used the 17 topmost concepts of SNOMED CT as the domains for calculating the standard MBR: Body structure, Clinical finding, Environment or geographical location, Event, Observable entity, Organism, Pharmaceutical/biologic product, Physical force, Physical object, Procedure, Qualifier value, Situation with explicit context, Social context, Specimen, Staging and scales, Substance, and Record artifact. We did not consider Model component and Special concept as domains because they are specific to SNOMED CT.

Comparing system-level MBRs to SNOMED CT revealed substantial variations in granularity across terminology systems (Table 3). While most systems showed MBR values greater than 0, indicating that SNOMED CT provided greater granularity in shared conceptual spaces, GO stood out with an MBR of −0.87, substantially below 0. This finding demonstrated that GO has considerably more granular concepts than SNOMED CT in areas of conceptual overlap. On the other hand, the relatively high positive MBR values for LOINC (0.88) and CPT (0.57) indicated that SNOMED CT provides significantly more detailed concepts in their shared domains, suggesting potential semantic compression when mapping from SNOMED CT to these systems. ICD-10-CM’s more moderate MBR (0.14) indicated less dramatic granularity differences compared to SNOMED CT, although SNOMED CT still provided more detail overall.

Fig. 5 shows the standard MBR of the LOINC, ICD-10-CM, GO, and CPT against SNOMED CT by domain. The standard MBR of 1.57 of ICD-10-CM to SNOMED CT in the Event domain indicated that, on average, ICD-10-CM’s mapping burden was 37.15 (= 10^1.57) times greater than SNOMED CT’s in this domain. This demonstrated that SNOMED CT provides significantly more granular event-related concepts than ICD-10-CM. Consequently, when mapping from SNOMED CT to ICD-10-CM in this domain, substantial semantic loss is likely to occur, as multiple specific SNOMED CT concepts must be condensed into fewer, broader ICD-10-CM concepts. Conversely, in the Observable entity domain, GO’s mapping burden was 0.0019 (= 10^–2.72) times that of SNOMED CT, indicating that GO offers much greater granularity in this area. Thus, mapping from GO to SNOMED CT would result in significant semantic information loss as GO’s highly specific observable entity concepts would need to be generalized to fit SNOMED CT’s broader categories.
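Because the MBR is a base-10 logarithm, converting it back to a ratio of mapping burdens is a single exponentiation. The short Python sketch below (the function name is hypothetical) reproduces the two conversions quoted above.

```python
def mbr_to_burden_ratio(mbr):
    """Invert Eq. (8): recover the crude ratio of mapping burdens from an MBR value."""
    return 10 ** mbr


event_ratio = mbr_to_burden_ratio(1.57)        # ICD-10-CM vs. SNOMED CT, Event domain
observable_ratio = mbr_to_burden_ratio(-2.72)  # GO vs. SNOMED CT, Observable entity domain

print(round(event_ratio, 2), round(observable_ratio, 4))
```

Each unit of MBR therefore corresponds to a tenfold difference in average mapping burden, which is worth keeping in mind when comparing domains whose MBRs differ by only a few tenths.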

When analyzed by domain, LOINC and ICD-10-CM showed greater granularity than SNOMED CT in the Environment or geographical location (standard MBR = −5.96) and Qualifier value (standard MBR = −1.91) domains, respectively. Non-SNOMED CT concepts that had more granular descendants than their SNOMED CT synonymous counterparts can be found in the Supplementary Materials (filter the “SCT provides more granular descendants” column to the value “0”). However, SNOMED CT generally provided superior granularity across various domains. SNOMED CT concepts that had more granularity than their synonymous non-SNOMED CT counterparts are detailed in the Supplementary Materials (filter the “SCT provides more granular descendants” column to the value “1”).

Post-hoc analysis revealed that LOINC’s greater granularity in the Environment or geographical location domain resulted from an erroneous UMLS mapping. The LOINC concept 29693-6 Laboratory was incorrectly mapped to SNOMED CT’s 261904005 Laboratory (environment) rather than the semantically appropriate 15220000 Laboratory test (procedure). This misalignment became evident when examining the descendants of the LOINC concept, which represent laboratory test-related entities such as Coagulation and HNA. To validate this finding, we corrected the mapping by linking LOINC’s 29693-6 Laboratory to SNOMED CT’s 15220000 Laboratory test (procedure). This correction shifted the MBR of LOINC relative to SNOMED CT in the Procedure domain to −0.99 while eliminating the spurious granularity advantage in the Environment or geographical location domain.

Analysis of overlapping proportions revealed varying degrees of conceptual alignment between terminology systems (Table 4). ICD-10-CM and GO showed notably low overlap with SNOMED CT (13.5% and 17.7%, respectively), suggesting significant unique content in these systems. Thus, incorporating concepts from ICD-10-CM and GO could substantially expand SNOMED CT’s coverage into new domains. In contrast, CPT demonstrated high overlap with SNOMED CT (60.5%) despite sharing only three domains (Procedure, Qualifier value, and Substance). This high overlap suggests that SNOMED CT effectively covers most CPT concepts within these shared domains, and the contribution of CPT to the enrichment of SNOMED CT is minimal, making SNOMED CT a potential substitute for CPT.

Table 4 Content overlap of the examined terminology systems with SNOMED CT.

Discussion

In this study, we proposed and demonstrated novel quantitative metrics for evaluating terminology systems, providing a systematic approach for quantitative assessment of their comprehensiveness and granularity. The MBR quantifies the relative granularity between terminology systems, predicting potential semantic degradation during cross-system translation, while content overlap measures shared conceptual space between systems.

These metrics can be combined to assess both enrichment potential and semantic preservation risk during information exchange at the system level, which are key dimensions of semantic interoperability. The expansion potential of one terminology system using another system can be estimated by examining granularity alignment (i.e., MBR) alongside content overlap. For interoperability purposes, a system may be considered sufficiently granular if its MBR indicates adequate alignment with the target system’s level of detail. It may be considered comprehensive enough if there is sufficient content overlap to ensure conceptual coverage. High absolute MBR values combined with low content overlap indicate significant enrichment opportunities; however, they also suggest a greater risk of semantic degradation when mapping in the reverse direction. This creates an inverse relationship between enrichment potential and semantic preservation during bidirectional exchange, indicating limited semantic compatibility.

Importantly, MBR and content overlap are useful not only for evaluating existing terminology systems in theory but also for supporting interoperability between local and reference terminologies in practice. In real-world scenarios such as hospital system migrations, where differences in coding schemes necessitate large-scale mapping, these metrics can help estimate the scope of work, prioritize domains for mapping based on expected granularity, and guide the selection of appropriate reference terminologies, yielding actionable insights for system integration across heterogeneous healthcare environments. They also support system alignment by identifying domain-specific mismatches in granularity or scope and by highlighting areas where terminologies may need harmonization, manual curation, or extension. For instance, high-MBR/low-overlap domains can be flagged for custom modeling or focused validation, while low-MBR/high-overlap domains may support more automated or lossless integration.

The ongoing integration of SNOMED CT and LOINC exemplifies how domain-specific granularity and content overlap metrics can inform terminology enrichment and semantic interoperability efforts at scale¹⁵. A negative MBR for LOINC in specific domains signals areas where LOINC provides more granular concepts that could enrich SNOMED CT, particularly regarding laboratory findings and clinical measurements. Conversely, positive MBR values indicate domains where SNOMED CT offers superior granularity, suggesting potential areas for LOINC enhancement through SNOMED CT integration. At the semantic integration level, domain-specific and concept-level MBR reporting provides stakeholders with quantitative assessments of information preservation during cross-terminology mapping. Healthcare organizations can use these metrics to estimate semantic loss when migrating between coding systems, identify domains requiring manual review or supplementary coding, and develop quality assurance protocols for terminology mapping projects.
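The domain-level triage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cut-off values, the treatment of content overlap as a fraction between 0 and 1, the class and function names, and the sample figures are all hypothetical and are not taken from our results.

```python
from dataclasses import dataclass

# Hypothetical cut-offs for illustration; no thresholds are prescribed in the text.
MBR_HIGH = 1.0
OVERLAP_LOW = 0.2


@dataclass
class DomainMetrics:
    domain: str
    mbr: float      # signed mapping burden ratio for the domain
    overlap: float  # content overlap, assumed here to be a fraction in [0, 1]


def triage(m, mbr_high=MBR_HIGH, overlap_low=OVERLAP_LOW):
    """Assign a mapping strategy from the absolute MBR and the content overlap."""
    high_mbr = abs(m.mbr) >= mbr_high
    low_overlap = m.overlap < overlap_low
    if high_mbr and low_overlap:
        return "custom modeling / focused validation"
    if not high_mbr and not low_overlap:
        return "candidate for automated mapping"
    return "manual review"


# Invented values, used only to exercise the three branches.
examples = [
    DomainMetrics("domain_a", mbr=-2.5, overlap=0.05),
    DomainMetrics("domain_b", mbr=0.3, overlap=0.60),
    DomainMetrics("domain_c", mbr=1.8, overlap=0.45),
]
for m in examples:
    print(m.domain, "->", triage(m))
```

The sign of the MBR is deliberately ignored in the triage itself, because it indicates which system is more granular rather than how large the mismatch is; a caller could inspect the sign afterwards to decide the direction of enrichment.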

As new healthcare domains emerge (e.g., digital therapeutics, remote patient monitoring, and precision medicine), new terminology systems are likely to be developed by authorized entities or through social computing²⁵. Our content overlap and mapping burden metrics offer objective criteria for assessing these systems’ relative strengths and limitations. Applying them to such systems would not only aid healthcare organizations in making informed adoption decisions but also provide an opportunity to validate our proposed metrics and evaluation methodology.

While our findings demonstrated that SNOMED CT provides superior granularity to other terminology systems across many clinical domains, they should be interpreted within the context of each terminology system’s intended purpose and modeling approach. For example, unlike SNOMED CT, LOINC is not based on an ontology-driven model but follows a predefined tabular structure to describe clinical procedures and instruments in six parts (component, property, time aspect, system, scale, and method), making it more effective for representing the clinical concepts of laboratory measures and clinical instruments than SNOMED CT. In the same vein, the low overlap between SNOMED CT and terminologies like ICD-10-CM and GO reveals a significant limitation in using SNOMED CT as a gold standard for evaluating granularity. While SNOMED CT might demonstrate superior granularity in certain shared domains, it only captures a fraction of ICD-10-CM’s or GO’s full conceptual scope due to the fundamental differences in design purpose: ICD-10-CM prioritizes the statistical classification of diseases for epidemiological and billing purposes, while GO was developed specifically for annotating gene products across species. These systems likely have greater granularity in domains that lie outside SNOMED CT’s coverage, making domain-specific granularity comparisons potentially misleading.

Our study was constrained to using SNOMED CT’s topmost concepts as domain categories due to the absence of standardized domains for evaluating terminology systems. This limitation underscores a critical need in healthcare informatics: the establishment of universal domain standards that span the spectrum of biomedical knowledge, from molecular biology to social medicine. Such standardization would ideally come from an authorized body capable of defining comprehensive domains that can effectively categorize concepts across diverse terminology systems. While the creation of entirely new domain standards represents a desirable long-term goal, a more immediate and pragmatic approach may involve expanding and refining the domain structures within existing terminology systems, particularly those like SNOMED CT, which already offer a high degree of granularity and conceptual richness.

Our study had several limitations that should be acknowledged. Firstly, our analysis relied heavily on the UMLS Metathesaurus for determining semantic equivalency between concepts from different terminology systems. As illustrated by our observation that the LOINC concept 29693-6 Laboratory was incorrectly mapped to SNOMED CT’s 261904005 Laboratory (environment), UMLS mappings suffer from several well-documented issues that may affect our metrics. The UMLS often maps specific concepts from one terminology to more general concepts in another, potentially understating differences in granularity. In their systematic quality analysis of mappings between MedDRA and ICD through the UMLS, Zhang et al.²⁶ reported that about 49% of the mapped pairs were not exact matches due to differences in granularity and focus. In our study, we identified several granularity differences among concept pairs treated as synonymous in the UMLS, such as “Burn of first degree of left wrist (ICD-10-CM)”–“Epidermal burn of left wrist (SNOMED CT)” (CUI: C2231652), “Venous engorgement (ICD-10-CM)”–“Retinal venous engorgement (SNOMED CT)” (CUI: C0154843), and “CD25 blasts (LOINC)”–“Blast cell positive for CD25 antigen (SNOMED CT)” (CUI: C1979642). These differences in granularity have important implications for data interoperability and clinical information exchange: they cause information loss when mapping between terminologies and systematically influence our MBR calculations. Future studies should explore alternative methods for establishing concept equivalency or develop mechanisms to validate UMLS mappings.
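One conceivable starting point for validating UMLS mappings is to list every pair of labels that two sources share under one CUI and to flag pairs whose wording differs, so that they can be reviewed manually. The Python sketch below is not the method used in this study. It assumes the rows have already been reduced to (CUI, source abbreviation, label) triples, for example from the CUI, SAB, and STR columns of MRCONSO.RRF, and it uses token containment as a deliberately crude, hypothetical heuristic. The function names are illustrative, and the sample rows are two of the pairs quoted above.

```python
import re
from collections import defaultdict
from itertools import product


def tokens(label):
    """Lower-cased alphanumeric tokens of a concept label."""
    return set(re.findall(r"[a-z0-9]+", label.lower()))


def compare_labels(a, b):
    """Crude wording comparison; it does not establish semantic granularity."""
    ta, tb = tokens(a), tokens(b)
    if ta == tb:
        return "same tokens"
    if ta < tb or tb < ta:
        return "one label contains the other"
    return "different wording"


def cross_source_pairs(rows, sab_a, sab_b):
    """Yield (cui, label_a, label_b, verdict) for labels sharing a CUI."""
    by_cui = defaultdict(lambda: defaultdict(list))
    for cui, sab, label in rows:
        by_cui[cui][sab].append(label)
    for cui, sources in sorted(by_cui.items()):
        for a, b in product(sources.get(sab_a, []), sources.get(sab_b, [])):
            yield cui, a, b, compare_labels(a, b)


# Pairs quoted in the text; ICD10CM and SNOMEDCT_US are UMLS source abbreviations.
ROWS = [
    ("C2231652", "ICD10CM", "Burn of first degree of left wrist"),
    ("C2231652", "SNOMEDCT_US", "Epidermal burn of left wrist"),
    ("C0154843", "ICD10CM", "Venous engorgement"),
    ("C0154843", "SNOMEDCT_US", "Retinal venous engorgement"),
]

results = list(cross_source_pairs(ROWS, "ICD10CM", "SNOMEDCT_US"))
for row in results:
    print(row)
```

Only the “Venous engorgement” pair is caught by token containment; the other pair differs in wording without one label containing the other, which shows why a lexical screen of this kind can only shortlist candidates for expert review.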

Secondly, we considered only hierarchical (is_a) relationships in our analysis, excluding other types of semantic relationships that may be important for assessing terminology systems’ expressiveness. For example, our analysis did not include SNOMED CT’s attribute relationships and GO’s part_of relationships. Furthermore, by focusing solely on hierarchical relationships, we could not observe differences in mapping burden distribution between scale-free and non-scale-free or small-world and non-small-world networks. A more comprehensive analysis incorporating various relationship types may reveal important differences in quantitative features across these distinctive topological characteristics.
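The effect of restricting the analysis to is_a relationships can be illustrated with a toy graph. The sketch below is purely hypothetical: the concept identifiers and edges are invented, and the traversal is a generic ancestor search rather than the procedure used in this study.

```python
from collections import defaultdict


def ancestors(edges, start, allowed_types):
    """Collect all ancestors of `start`, following only the allowed relationship types."""
    parents = defaultdict(set)
    for child, rel, parent in edges:
        if rel in allowed_types:
            parents[child].add(parent)
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for p in parents[node]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


# Invented edges in (child, relationship, parent) form.
EDGES = [
    ("c", "is_a", "b"),
    ("b", "is_a", "root"),
    ("c", "part_of", "d"),
    ("d", "is_a", "root"),
]

is_a_only = ancestors(EDGES, "c", {"is_a"})
with_part_of = ancestors(EDGES, "c", {"is_a", "part_of"})
print(sorted(is_a_only))
print(sorted(with_part_of))
```

In this toy case the concept "d" is reachable only through a part_of edge, so any count based on the ancestors of "c" changes once non-hierarchical relationships are admitted.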

We acknowledge that the clinical utility of these metrics requires validation against real-world performance outcomes. The correlation between our proposed metrics and actual interoperability performance represents a critical area for future research. For example, the mapping burden of one terminology against another can be validated by measuring inter-rater agreement when clinical coders map synonymous concepts using both terminologies, with higher MBR values hypothesized to correlate with lower coding consistency and accuracy. Recall and precision rates for standardized clinical queries regarding specific concepts (e.g., genomic substances) across systems can be compared among different terminologies, with higher content overlap expected to correlate with more consistent retrieval results. Such validation would transform our theoretical metrics into evidence-based tools for terminology selection and optimization, significantly strengthening their practical value for healthcare interoperability initiatives.
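A validation of this kind could, for example, compute Cohen's kappa between two coders in each domain and then rank-correlate the kappa values with the absolute MBR. The following Python sketch uses standard textbook formulas only. The function names, labels, and numbers are invented for illustration and do not represent study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long lists of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:  # both raters used one identical label throughout
        return 1.0
    return (observed - expected) / (1.0 - expected)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented codes assigned by two coders to the same four items.
kappa = cohen_kappa(["x", "x", "y", "y"], ["x", "x", "y", "x"])

# Invented per-domain values: absolute MBR and the kappa observed in that domain.
abs_mbr = [0.5, 1.0, 2.0, 3.0]
domain_kappa = [0.9, 0.8, 0.6, 0.4]
rho = spearman(abs_mbr, domain_kappa)
print(kappa, rho)
```

Under the hypothesis stated above, a clearly negative rank correlation between absolute MBR and kappa would be the expected pattern; the invented numbers here are arranged to produce one and carry no evidential weight.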

Lastly, although our metrics provide quantitative measures of terminology systems’ characteristics, they do not capture qualitative features such as concept accuracy, computational efficiency, adaptability, and semantic consistency²⁷. Future research could explore ways to incorporate these qualitative dimensions into the evaluation framework, for example through crowdsourcing or abstraction networks²⁸.

Despite these limitations, our study constitutes an important step toward quantitatively evaluating the structural size, granularity, and overlap of terminology systems. The proposed metrics provide a foundation for the evidence-based selection of terminology systems in healthcare interoperability initiatives. However, they should be weighed alongside other factors, such as implementation costs, maintenance requirements, and specific use case requirements. Future studies should focus on validating these metrics across diverse healthcare settings and use cases, developing methods to incorporate non-hierarchical relationships into the evaluation framework, and exploring ways to assess qualitative aspects of terminology systems. Additionally, investigating the relationship between these metrics and actual clinical outcomes could provide valuable insights into the practical impact of terminology system selection on healthcare delivery.

Conclusion

Our study introduced novel quantitative metrics for evaluating healthcare terminology systems through MBR and content overlap. Although the SNOMED CT system demonstrated superior granularity across most clinical domains, other systems excelled in specific areas, suggesting the benefit of complementary system usage. These metrics provide practical tools for implementing interoperability platforms, enhancing existing systems, and evaluating new ones. Despite its reliance on the UMLS Metathesaurus and focus on hierarchical relationships, our study constitutes a significant step toward objectively evaluating potential semantic expansion and degradation during mapping between terminology systems. Future research should validate these metrics across different healthcare settings and explore the incorporation of qualitative parameters and non-hierarchical relationships.

Data availability

Data is provided within the supplementary information files.

References

IBM. What Is Interoperability in Healthcare? https://www.ibm.com/topics/interoperability-in-healthcare (2024).
Menachemi, N., Rahurkar, S., Harle, C. A. & Vest, J. R. The benefits of health information exchange: an updated systematic review. J. Am. Med. Inf. Assoc. 25, 1259–1265 (2018).
Adler-Milstein, J. & Jha, A. K. Health information exchange among U.S. Hospitals: who’s in, who’s out, and why? Healthc. (Amst). 2, 26–32 (2014).
Vorisek, C. N. et al. Towards an interoperability landscape for a National research data infrastructure for personal health data. Sci. Data. 11, 772 (2024).
Ayaz, M., Pasha, M. F., Alzahrani, M. Y., Budiarto, R. & Stiawan, D. The fast health interoperability resources (FHIR) standard: systematic literature review of implementations, applications, challenges and opportunities. JMIR Med. Inf. 9, e21929 (2021).
Hripcsak, G. et al. Observational health data sciences and informatics (OHDSI): opportunities for observational researchers. Stud. Health Technol. Inform. 216, 574–578 (2015).
McDonald, C. J. et al. LOINC, a universal standard for identifying laboratory observations: a 5-year update. Clin. Chem. 49, 624–633 (2003).
HL7. Observation - FHIR v5.0.0. https://www.hl7.org/fhir/observation.html (2023).
Chang, E. & Mostafa, J. The use of SNOMED CT, 2013–2020: A literature review. J. Am. Med. Inf. Assoc. 28, 2017–2026 (2021).
Observational Health Data Sciences and Informatics. Chapter 5 Standardized Vocabularies. https://ohdsi.github.io/TheBookOfOhdsi/StandardizedVocabularies.html (2021).
SNOMED International. SNOMED International and LOINC® announce renewed collaboration agreement and upcoming production release of the LOINC Ontology. https://www.snomed.org/news/snomed-international-and-loinc%C2%AE-announce-renewed-collaboration-agreement-and-upcoming-production-release-of-the-loinc-ontology (2025).
Nelson, S. J., Zeng, K., Kilbourne, J., Powell, T. & Moore, R. Normalized names for clinical drugs: RxNorm at 6 years. J. Am. Med. Inf. Assoc. 18, 441–448 (2011).
Hirsch, J. A. et al. Current procedural terminology; a primer. J. Neurointerv Surg. 7, 309–312 (2015).
Steindel, S. J. International classification of diseases, 10th edition, clinical modification and procedure coding system: descriptive overview of the next generation HIPAA code sets. J. Am. Med. Inform. Assoc. 17, 274–282 (2010).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
Cimino, J. J. Desiderata for controlled medical vocabularies in the twenty-first century. Methods Inf. Med. 37, 394–403 (1998).
Rosenbloom, S. T., Miller, R. A., Johnson, K. B., Elkin, P. L. & Brown, S. H. A model for evaluating interface terminologies. J. Am. Med. Inf. Assoc. 15, 65–76 (2008).
Rosenbloom, S. T. et al. Using SNOMED CT to represent two interface terminologies. J. Am. Med. Inf. Assoc. 16, 81–88 (2009).
He, Z., Geller, J. & Chen, Y. A comparative analysis of the density of the SNOMED CT conceptual content for semantic harmonization. Artif. Intell. Med. 64, 29–40 (2015).
Bodenreider, O. The unified medical Language system (UMLS): integrating biomedical terminology. Nucleic Acids Res. 32, D267–D270 (2004).
Rony, M. R. A. H., Chaudhuri, D., Usbeck, R. & Lehmann, J. Tree-KGQA: an unsupervised approach for question answering over knowledge graphs. IEEE Access. 10, 50467–50478 (2022).
National Library of Medicine. UMLS - Metathesaurus. https://www.nlm.nih.gov/research/umls/knowledge_sources/metathesaurus/index.html (2016).
Bales, M. E., Lussier, Y. A. & Johnson, S. B. Topological analysis of large-scale biomedical terminology structures. J. Am. Med. Inf. Assoc. 14, 788–797 (2007).
Schuyler, P. L., Hole, W. T., Tuttle, M. S. & Sherertz, D. D. The UMLS metathesaurus: representing different views of biomedical concepts. Bull. Med. Libr. Assoc. 81, 217–222 (1993).
Chute, C. G. Distributed biomedical terminology development: from experiments to open process. Yearb Med. Inform 58–63 (2010).
Zhang, X. et al. Evaluating MedDRA-to-ICD terminology mappings. BMC Med. Inf. Decis. Mak. 23, 299 (2024).
Vrandečić, D. Ontology evaluation. in Handbook on Ontologies (eds Staab, S. & Studer, R.) 293–313 (Springer Berlin Heidelberg, 2009). https://doi.org/10.1007/978-3-540-92673-3_13.
Amith, M., He, Z., Bian, J., Lossio-Ventura, J. A. & Tao, C. Assessing the practice of biomedical ontology evaluation: gaps and opportunities. J. Biomed. Inf. 80, 1–13 (2018).

Acknowledgements

This work was supported by Chungbuk National University NUDP program (2024) and the National Research Foundation of Korea grant funded by the Republic of Korea government (Ministry of Science and Information and Communication Technology; RS-2024-00354718).

Author information

Authors and Affiliations

Republic of Korea Air Force Aerospace Medicine Research Center, Cheongju-si, 28187, Chungcheongbuk-do, Republic of Korea
Eunsuk Chang
College of Nursing, Research Institute of Nursing Science, Chungbuk National University, Cheongju-si, 28644, Chungcheongbuk-do, Republic of Korea
Sumi Sung

Contributions

Eunsuk Chang: Conceptualization, Data curation, Formal analysis, Methodology, Software, Visualization, Writing – original draft. Sumi Sung: Conceptualization, Funding acquisition, Investigation, Project administration, Resources, Supervision, Validation, Writing – review and editing.

Corresponding author

Correspondence to Sumi Sung.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Chang, E., Sung, S. Quantitative analysis of the comprehensiveness and granularity of biomedical terminology systems. Sci Rep 15, 34525 (2025). https://doi.org/10.1038/s41598-025-17737-0

Received: 16 March 2025
Accepted: 26 August 2025
Published: 03 October 2025
DOI: https://doi.org/10.1038/s41598-025-17737-0