Fig. 4: The Extract-Transform-Load (ETL) process within the YCDL framework. | npj Digital Medicine


From: Continuous multimodal data supply chain and expandable clinical decision support for oncology


In the clinical data extraction stage, we developed an ETL process, incorporating natural language processing (NLP), for each feature. The DSC DB serves as a repository of raw medical text, semi-structured and unstructured data, imaging files, next-generation sequencing (NGS) results, and data in Extensible Markup Language (XML) format. In the initial phase of data processing, we tailored a corpus from the DSC DB to improve the extraction and handling of medical terminology, abbreviations, and recurrent misspellings (e.g., in pathology reports). The extracted data were then transformed by a specialized ETL algorithm designed to harmonize terminology based on assertions and the relationships between medical concepts. NLP also made it possible to use the number of CT and MRI interpretations from follow-up visits as criteria for selecting individuals.
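The caption does not specify how the YCDL ETL steps are implemented. As a rough illustration only, the Python sketch below shows three of the ideas it mentions: lexicon-based normalization of abbreviations and recurrent misspellings, a simplified NegEx-style assertion check combined with an is-a mapping for harmonizing concepts, and counting CT/MRI reports from follow-up visits as a selection criterion. The `LEXICON` entries, the `PARENT` relations, the negation cues, and the `min_imaging` threshold are all hypothetical and are not taken from the article.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicon mapping abbreviations and recurrent misspellings to
# canonical terms (illustrative entries only, not the YCDL corpus).
LEXICON = {
    "adenoca": "adenocarcinoma",
    "adenocarcinma": "adenocarcinoma",
    "mets": "metastasis",
    "ln": "lymph node",
}

# Hypothetical is-a relations used to harmonize a specific concept to a broader one.
PARENT = {"adenocarcinoma": "carcinoma"}

# Simplified NegEx-style negation cues standing in for assertion detection.
NEGATION_RE = re.compile(r"\b(?:no|without|negative for)\b")
# Matches a CT or MRI mention as a whole word, in any letter case.
IMAGING_RE = re.compile(r"\b(?:CT|MRI)\b", re.IGNORECASE)


@dataclass
class Finding:
    concept: str     # canonical term found in the text
    harmonized: str  # broader concept via PARENT, or the term itself
    asserted: bool   # False when a negation cue precedes the term in its sentence


def normalize(text: str) -> str:
    """Lowercase the text and replace lexicon variants on word boundaries."""
    text = text.lower()
    for variant, canonical in LEXICON.items():
        text = re.sub(rf"\b{re.escape(variant)}\b", canonical, text)
    return text


def extract_findings(text: str, concepts: list[str]) -> list[Finding]:
    """Find target concepts per sentence and tag each as asserted or negated."""
    findings = []
    for sentence in re.split(r"[.;]\s*", normalize(text)):
        for concept in concepts:
            idx = sentence.find(concept)
            if idx == -1:
                continue
            # Negated if any cue appears earlier in the same sentence.
            negated = NEGATION_RE.search(sentence[:idx]) is not None
            findings.append(Finding(concept, PARENT.get(concept, concept), not negated))
    return findings


def count_imaging_reports(reports: list[str]) -> int:
    """Count follow-up reports that mention a CT or MRI study."""
    return sum(1 for report in reports if IMAGING_RE.search(report))


def select_patient(followup_reports: list[str], min_imaging: int = 2) -> bool:
    """Hypothetical rule: keep individuals with at least min_imaging CT/MRI reports."""
    return count_imaging_reports(followup_reports) >= min_imaging
```

For example, `extract_findings("Adenoca of the lung. No mets to LN.", ["adenocarcinoma", "metastasis"])` first expands the abbreviations, then reports `adenocarcinoma` as asserted (harmonized to `carcinoma`) and `metastasis` as negated. A regular-expression cue list is only a stand-in here; a real pipeline would more likely rely on a curated clinical terminology and a dedicated assertion model.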
