Fig. 1: Tinnitus presence model.

A We used the NIPALS (nonlinear iterative partial least squares) machine learning algorithm to predict the presence of tinnitus from 101 features grouped into eleven categories. The weights attributed to the features are shown in A. The model was trained on 166,119 participants and tested on 26,874 individuals. Panels B and C assess how well the model categorized participants at baseline in the test set, while F and G evaluate its predictions of how tinnitus evolved over time (9 years after baseline). B Variance explained by each category of the model. Only the hearing and demographic categories each explained more than 1% of the variance. C ROC curves, summarized by the AUC, testing whether the model could categorize participants by how often they experienced tinnitus. Performance ranged from good to excellent depending on the frequency level (some of the time, a lot of the time, all the time). D The model was retrained with each category of features removed in turn. Only removing the hearing health category significantly affected performance. E Evolution of tinnitus presence between the baseline visit (left) and the follow-up visit nine years later (right) for participants in the test set. F, G Evolution of the adjusted risk scores (F; boxplots show the median (center line), interquartile range (box), and whiskers extending to 1.5 × IQR) and of the model's performance (G; effect sizes calculated with Cohen's d and categorization efficacy assessed with the AUC-ROC) as a function of the change in tinnitus presence over time. For the effect size, the 95% CI estimated across 10,000 bootstrap samples is shown. The evolution is rated from −3 to +3, where −3 represents a change from tinnitus present all the time at baseline to no tinnitus at follow-up (full recovery) and +3 the opposite change (onset of constant tinnitus). Based on these panels, we concluded that the model could not predict the evolution of tinnitus presence over time.
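The summary statistics named in this caption can be illustrated with a minimal, standard-library-only Python sketch. It assumes that presence levels are coded 0 (absent) to 3 (all the time) and that the evolution score is simply the follow-up level minus the baseline level, which reproduces the −3 to +3 range described above. It also assumes a pooled-standard-deviation Cohen's d, a percentile bootstrap that resamples each group separately, and the pairwise (Mann–Whitney) interpretation of the AUC-ROC. All function names and level labels are hypothetical. The authors' exact pipeline, including the NIPALS model itself, is not reproduced here.

```python
import math
import random

# Hypothetical ordinal coding of tinnitus presence (assumption, not the authors' coding).
LEVELS = {"absent": 0, "some of the time": 1, "a lot of the time": 2, "all the time": 3}


def evolution_score(baseline, follow_up):
    """Change in presence level: -3 (constant -> absent) to +3 (absent -> constant)."""
    return LEVELS[follow_up] - LEVELS[baseline]


def _mean_var(x):
    """Sample mean and unbiased (n - 1) variance."""
    m = math.fsum(x) / len(x)
    v = math.fsum((xi - m) ** 2 for xi in x) / (len(x) - 1)
    return m, v


def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    (ma, va), (mb, vb) = _mean_var(a), _mean_var(b)
    na, nb = len(a), len(b)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled_sd


def bootstrap_ci(a, b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Cohen's d, resampling each group with replacement."""
    rng = random.Random(seed)
    boots = sorted(
        cohens_d(rng.choices(a, k=len(a)), rng.choices(b, k=len(b)))
        for _ in range(n_boot)
    )
    lo = boots[round(alpha / 2 * n_boot)]
    hi = boots[round((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def auc(pos_scores, neg_scores):
    """AUC-ROC as P(score of a positive > score of a negative), ties counted as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


# Usage with made-up numbers (not study data).
print(evolution_score("all the time", "absent"))  # -3
print(cohens_d([2, 3, 4], [1, 2, 3]))  # 1.0
print(auc([0.9, 0.7], [0.4, 0.7]))  # 0.875
```

The pairwise AUC loop is quadratic and is kept only for readability. For a test set the size of the one described here (26,874 individuals), a rank-based computation would be the practical choice.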