Fig. 2

Process for creating symptom-disease relations in the graph from classifiers trained on EHRs. A logistic regression model was trained for each disease to capture symptom-disease associations, predicting whether a patient had the disease from their symptoms. Each model was trained and evaluated using a train-test split, and only models with acceptable predictive performance (AUC greater than 0.7 on the test set) were retained. Symptom-disease relations were quantified using odds ratios (OR) derived from the model coefficients: for a given disease model, the odds ratio of symptom $i$ is $\mathrm{OR}_i = e^{\beta_i}$, where $\beta_i$ is the coefficient of symptom $i$. If $\mathrm{OR}_i > 1.9$, the corresponding symptom-disease relation was added to the graph unless it was already present.
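The model-selection and edge-creation step described in this caption can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the per-disease logistic regressions have already been fitted and evaluated on the test set (for example with scikit-learn's `LogisticRegression`, whose `coef_` attribute holds the coefficients $\beta_i$). The input format, the function name `add_symptom_disease_edges`, and the toy symptom, disease, and coefficient values are hypothetical and for illustration only.

```python
import math

# Thresholds stated in the caption.
AUC_THRESHOLD = 0.7  # keep only disease models with test-set AUC > 0.7
OR_THRESHOLD = 1.9   # keep symptom-disease pairs with OR_i = exp(beta_i) > 1.9


def add_symptom_disease_edges(models, edges):
    """Add symptom-disease edges derived from per-disease logistic regressions.

    models: dict mapping disease -> (test_auc, {symptom: beta_i}), where beta_i
        is the fitted coefficient of symptom i (hypothetical input format).
    edges: set of (symptom, disease) pairs already in the graph; updated in place.
    Returns the newly added (symptom, disease, odds_ratio) triples.
    """
    added = []
    for disease, (test_auc, coefficients) in models.items():
        # Discard models whose predictive performance on the test set is too low.
        if test_auc <= AUC_THRESHOLD:
            continue
        for symptom, beta in coefficients.items():
            odds_ratio = math.exp(beta)  # OR_i = e^{beta_i}
            # Add the relation only if it is strong enough and not already present.
            if odds_ratio > OR_THRESHOLD and (symptom, disease) not in edges:
                edges.add((symptom, disease))
                added.append((symptom, disease, odds_ratio))
    return added


# Toy example with made-up coefficients (hypothetical, not data from the study).
graph_edges = {("cough", "asthma")}
fitted_models = {
    "asthma": (0.82, {"wheezing": 1.2, "cough": 0.9, "fever": -0.3}),
    "influenza": (0.65, {"fever": 2.0}),  # AUC <= 0.7, so this model is skipped
}
new_edges = add_symptom_disease_edges(fitted_models, graph_edges)
print(new_edges)
```

In this toy run only ("wheezing", "asthma") is added: "cough" passes the OR threshold but is already in the graph, "fever" has an OR below 1, and the influenza model is discarded by the AUC filter. Because the exponential is monotonic, the condition $\mathrm{OR}_i > 1.9$ is equivalent to $\beta_i > \ln 1.9 \approx 0.642$, so the threshold could equally be applied directly to the coefficients.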