Table 1 Refinement process of lexicon for analyzing cosine similarity in clinical notes.

From: Lexical associations can characterize clinical documentation trends related to palliative care and metastatic cancer

Initial metastatic and PC terms chosen

Expanded lexicon of metastatic and PC terms from top 50 contextually similar terms within w2v models

Terms present in each year’s datasets

Final included list of terms after looking at most contextually similar terms

Metastatic terms

Initial terms: “metastatic”, “mets”

Expanded lexicon: “metastases”, “metastatic”, “widely_metastatic”, “metastasis”, “mets”, “widespread_metastatic”, “osseous_metastatic”, “bony_metastasis”, “metastasized”, “metastatic_deposit”, “metastatic_poorly_differentiated”, “hepatic_metastasis”, “brain_metastasis”, “bony_mets”, “metastatic_rcc”, “metastasize”, “metastatic_nsclc”, “oligometastatic”, “nodal_metastasis”, “ntrathoracic_metastatic”

Present in each year’s datasets: “metastatic”, “widely_metastatic”, “metastasis”, “mets”, “osseous_metastatic”, “metastasized”, “metastasize”, “oligometastatic”, “nodal_metastasis”

Final list: “metastatic”, “widely_metastatic”, “metastasis”, “mets”, “osseous_metastatic”, “metastasized”, “metastasize”, “oligometastatic”, “nodal_metastasis”

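Palliative care (PC) terms

Initial terms: “palliative”, “palliative care”, “pal care”

Expanded lexicon: “palliate”, “pall”, “palliation”, “pc”, “palliative care”, “pc consult”, “palliative”

Present in each year’s datasets: “palliate”, “pall”, “palliation”, “pc”, “palliative”

Final list: “palliate”, “pall”, “palliation”, “palliative”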

  1. Starting from initial key terms related to metastatic disease and palliative care, we expanded the lexicon using Word2Vec models to include the top 50 contextually similar terms, yielding a larger set of candidate terms that could represent metastatic disease and/or palliative care. We then queried the yearly datasets to confirm that each term was present in our Word2Vec models. Finally, we re-expanded each remaining term to its top 50 contextually similar terms to check whether authors were using it to convey metastatic disease or palliative care, respectively. This led to the final list of terms used in the analysis. The only term removed at this quality-control step was “pc”, which appeared to be a common acronym related to general patient care rather than to palliative care.
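The refinement steps described in this footnote can be sketched in code. The following is a minimal illustration under stated assumptions, not the authors' implementation: each yearly Word2Vec model is stood in for by a plain dictionary mapping a term to its vector, the function names (`top_similar`, `expand_lexicon`, `present_every_year`, `neighbours_for_review`) and the toy vectors are hypothetical, and the manual review that decides which candidates to keep (for example, dropping “pc”) is left to the reader.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_similar(model, term, topn=50):
    """Return up to `topn` terms closest to `term` in one model (dict: term -> vector)."""
    if term not in model:
        return []
    scores = [
        (other, cosine(model[term], vec))
        for other, vec in model.items()
        if other != term
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:topn]]


def expand_lexicon(seeds, models, topn=50):
    """Step 1: candidate lexicon = seeds plus their nearest neighbours in any year."""
    expanded = set(seeds)
    for model in models.values():
        for seed in seeds:
            expanded.update(top_similar(model, seed, topn))
    return expanded


def present_every_year(terms, models):
    """Step 2: keep only terms that occur in every yearly vocabulary."""
    return {t for t in terms if all(t in model for model in models.values())}


def neighbours_for_review(terms, models, topn=50):
    """Step 3: list each term's neighbours per year so a reviewer can judge its usage."""
    return {
        term: {year: top_similar(model, term, topn) for year, model in models.items()}
        for term in sorted(terms)
    }


# Toy two-dimensional "models" (hypothetical values, for illustration only).
toy_models = {
    2019: {
        "metastatic": [1.0, 0.0],
        "mets": [0.9, 0.1],
        "pc": [0.0, 1.0],
        "palliative": [0.1, 0.9],
    },
    2020: {
        "metastatic": [1.0, 0.1],
        "mets": [0.8, 0.2],
        "palliative": [0.2, 0.9],
    },
}

candidates = expand_lexicon(["metastatic"], toy_models, topn=3)
kept = present_every_year(candidates, toy_models)
review = neighbours_for_review(kept, toy_models, topn=1)
```

In this toy setup the presence filter drops any candidate that is missing from at least one year, and `review` gives the per-year neighbour lists a reader would inspect before fixing the final lexicon. With trained models, the dictionary lookups would be replaced by the model's own vocabulary and similarity queries.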