Extended Data Fig. 5: Results of the Weak-Robust approach.

The blue line shows the performance of the weak model, an SVM trained and evaluated on an increasing number of principal components (PCs), obtained by Principal Component Analysis (PCA) of openSMILE vector representations of the audio signal, for the Matched COVID-19 detection from audio task (‘weak-model-covid-matched’). Individuals in the Matched test set who are correctly classified by the weak model are hypothesized to harbour confounding signal, and are removed to create the curated Matched test set. The red line shows SSAST performance on this curated Matched test set (‘ssast-covid-matched-curated-removal’). For comparison, we also remove Matched test cases at random; these results are shown by the purple line (‘ssast-covid-matched-curated-removal’). The vertical green line marks the calibration threshold, that is, the number of PCs at which the weak model achieves a UAR greater than 80% on the calibration task. The green shaded area corresponds to the drop in SSAST performance that we attribute to the removal of confounded cases from the Matched test set. We hypothesize that performance drops below random classification because only the ‘tricky’ cases remain (for example, symptomatic COVID-negative cases). The 95% confidence intervals are calculated via the normal approximation method, centered on the observed outcome of the experiment, which is plotted as the center line.
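As an illustration of the confidence intervals described above, the following is a minimal sketch of a normal approximation interval. It assumes the metric (for example, UAR) can be treated like a binomial proportion over n test cases; the function name `normal_approx_ci` and its parameters are hypothetical and are not taken from the study's code.

```python
import math


def normal_approx_ci(metric: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal approximation (Wald) interval around an observed metric.

    Assumption: the metric behaves like a binomial proportion estimated
    from n independent test cases. z = 1.96 gives a 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of test cases")
    if not 0.0 <= metric <= 1.0:
        raise ValueError("metric must lie in [0, 1]")
    # Standard error of a proportion, scaled by the z-score.
    half_width = z * math.sqrt(metric * (1.0 - metric) / n)
    # Clip to the valid range of the metric.
    return max(0.0, metric - half_width), min(1.0, metric + half_width)


if __name__ == "__main__":
    # Hypothetical values chosen only to demonstrate the call.
    lower, upper = normal_approx_ci(0.5, 100)
    print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
```

The interval is symmetric about the observed value, which matches a plot in which the observed outcome is drawn as the center line. As test cases are removed, n shrinks and the interval widens.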