Table 3 Summary of themes and challenges.

From: Evaluation of the portability of computable phenotypes with natural language processing in the eMERGE network

Theme

Challenges

Portability of algorithms

Algorithm performance varies by phenotype

Identifying the correct type(s) of notes across sites can be challenging, given differences in how notes are categorized

Well-known challenges in NLP and ML persist

Implementation environments

Use of different programming languages/NLP pipelines can cause delays in implementation when a site does not have local expertise

Sites run NLP and ML in different environments, which may have different requirements for the software that can be run

Local changes/customization were often needed for things like file paths and document input formats

Data preparation steps were the most time- and resource-intensive

Privacy

Given identifiers embedded in clinical notes, sites have different requirements and restrictions on their use of notes for NLP

Documentation

Scripts and software often lacked sufficient documentation on how to execute them and on the expected output

Phenotyping workflow/process

Communication delays between author and implementer could have compounding effects on overall time to completion

Sharing NLP/ML pipelines with other sites may be hindered by intellectual property concerns

Reconsider traditional workflows for phenotyping

Summary of the top themes found in our analysis and of the challenges reported by eMERGE sites within each theme. A full listing of themes is available in Supplementary Appendix B.