Fig. 1: Overview of the workflow for downstream analysis tasks and of the HeLa dataset.

a Single result tables from MS data analysis software (search and quantification) were used as input for downstream analysis. Here, we used MaxQuant to analyze raw MS data acquired by data-dependent acquisition. We compared three self-supervised DL approaches with 27 other methods, of which median imputation and KNN interpolation are shown as examples. Green and red not-available (NA) entries indicate simulated and real missing values, respectively. b Principal component one versus two of 539 selected HeLa runs, based on protein groups recorded on one instrument. c Same as (b), based on the 50 runs forming the small development dataset. Features had to be present in at least 25% of samples to be included in the workflow shown in (a). In a second step, samples were filtered by their completeness across the selected features (Supplementary Fig. 1).
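The two filtering steps and the median baseline named in the caption can be illustrated with a minimal sketch. It is not the authors' implementation. The nested-dictionary layout, the function names, the toy intensities, and the sample completeness threshold of 0.5 are hypothetical; only the 25% feature prevalence cutoff comes from the caption. Taking the median per feature across samples is also an assumption.

```python
from statistics import median

# Hypothetical toy table: sample -> {feature -> intensity or None (missing)}.
table = {
    "run1": {"P1": 28.0, "P2": 25.0, "P3": 21.0},
    "run2": {"P1": 29.0, "P2": None, "P3": None},
    "run3": {"P1": 30.0, "P2": 27.0, "P3": None},
    "run4": {"P1": None, "P2": None, "P3": None},
    "run5": {"P1": 31.0, "P2": 26.0, "P3": None},
}


def select_features(table, min_prevalence=0.25):
    """Keep features observed in at least `min_prevalence` of all samples."""
    names = sorted({f for row in table.values() for f in row})
    keep = []
    for f in names:
        observed = sum(1 for row in table.values() if row.get(f) is not None)
        if observed / len(table) >= min_prevalence:
            keep.append(f)
    return keep


def select_samples(table, features, min_completeness):
    """Keep samples whose share of observed selected features is high enough."""
    keep = []
    for sample, row in table.items():
        observed = sum(1 for f in features if row.get(f) is not None)
        if observed / len(features) >= min_completeness:
            keep.append(sample)
    return keep


def median_impute(table, samples, features):
    """Fill missing values with the per-feature median (assumed baseline)."""
    medians = {}
    for f in features:
        values = [table[s][f] for s in samples if table[s].get(f) is not None]
        medians[f] = median(values)
    imputed = {}
    for s in samples:
        imputed[s] = {}
        for f in features:
            value = table[s].get(f)
            imputed[s][f] = medians[f] if value is None else value
    return imputed


features = select_features(table, min_prevalence=0.25)          # cutoff from the caption
samples = select_samples(table, features, min_completeness=0.5)  # hypothetical threshold
imputed = median_impute(table, samples, features)
```

In this toy table, P3 is observed in one of five runs (20%) and is dropped by the prevalence cutoff, and run4 has no observed selected features and is dropped in the second step. The order matters: sample completeness is computed only over the features that survived the first filter.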