Fig. 6: Application and performance optimization of HarmonizR for large datasets based on different Brain Tumor entities, measured in 23 TMT batches23.

a Schematic representation of the experimental design. b Batch count distribution of 9156 proteins quantified. c Pearson correlation-based hierarchical clustering with Ward.D linkage prior to and after HarmonizR (ComBat) usage. d Sample specific CV and mean for uncorrected data and after HarmonizR (ComBat) execution. (All batches consisted of n = 11 biologically independent samples, details on the assignment of tumor types to batches can be obtained from Supplementary Table 5). In boxplots, 50% of the data points are inside the box (Q1 (Quartile 1) being the lower bound of the box (25%), Q3 being the upper bound of the box (75%)). Whiskers show all values beyond the box without outliers. Outliners were defined as Q3 + 1.5 * IQR (Interquartile range) (upper outlier) and Q1-1.5 * IQR (lower outlier). IQR being Q1–Q3. e Heatmap visualization of tumor type specific abundance distribution of proteins, associated with the tumor relevant gene sets “Hallmark-MYC Targets; Hallmark-E2FTargets and REACTOME -Signaling by WNT” after HarmonizR (ComBat) execution. f Performance analysis of multi-threaded HarmonizR algorithm. 1.: Visualization of the speedup of the HarmonizR implementation for ComBat (blue) and limma (orange) alongside Amdahl’s law (dashed lines) with respect to the number of processors. Tests have been made for 1, 2, 4, 8, 12, and 24 processors. Code corresponding to 78.48% of the sequential run time has been parallelized for ComBat and 66.82% for limma. Amdahl’s law has therefore been calculated using these percentages. 2.: Speedup visualization for the parallelized part of the HarmonizR implementation only with respect to the number of processors. Speedup while using the ComBat algorithm is shown in blue. Speedup while using the limma algorithm is shown in orange. Tests have been made for 1, 2, 4, 8, 12, and 24 processors. The potential maximum speedup is shown as a linear, proportional behavior. Computed on an Intel Xeon Gold 6226, 2.70 GHz, 2 × 12 compute cores, 96 GB RAM. Source data are provided as a Source Data file.