Extended Data Fig. 8: Parameter tuning for ligand activity ranking and interaction program discovery workflows.
From: Comparative analysis of cell–cell communication at single-cell resolution

a) Heatmaps depicting the Jaccard overlap index between DE testing results from CCIMs constructed with 217 different combinations of ligand activity ranking parameters. Three different datasets were used for testing: a pancreas islet dataset (refs. 21,33), a uterine decidua dataset (ref. 13), and a dataset of HNSCC (ref. 8). b-d) 217 different parameter combinations were used to analyze CCC between NK cells transfected with CD40L-encoding mRNA and B cells transfected with CD40-encoding mRNA. Ligand activity-weighted CCIMs were calculated from each of these combinations, and differential expression testing was performed to identify which parameter combinations returned CD40L-CD40 as a differential edge with the highest specificity. b) Box plot depicting the difference between the log(fold-change) for CD40L-CD40 and the mean log(fold-change) for all other ligand-receptor pairs, with and without application of ligand activity ranking. n = 1 for analysis without ligand activity ranking; n = 216 for analysis with ligand activity ranking. c) β coefficients and p-values from multiple regression analysis modeling the impact of each ligand ranking parameter on relative predictive power for the CD40L-CD40 edge. d) Scatter plots depicting relative predictive power for the CD40L-CD40 edge for all combinations of ligand ranking parameters. The mean for each parameter is shown within the plot. e) Example ligand activity distributions to aid in selection of the appropriate Pearson coefficient threshold. Generally, ligand activity coefficients form a right-skewed distribution, similar to the distributions shown here. The right tails of these distributions represent putative biological activity and contain the coefficients that should be used for CCIM weighting. We therefore encourage users to consider the number of ligands that are expected to display biological activity and the number of cells that are expected to have downstream signaling induced by those ligands.
If very few ligands are expected to be biologically active, and only a subset of cells is expected to respond to them, this threshold should be raised so that less of the right tail of the distribution is included. f) The interaction program discovery workflow was repeated on 35 random subsamples of the inDrop panc8 dataset (refs. 21,34), using 19 different R² thresholds to define the appropriate softPower parameter. Scatter plots depict the association between R² threshold and (clockwise from top left): recommended softPower, percentage of identified programs that failed significance testing, percentage of programs composed of only 1 ligand or receptor, and the average number of ligands and receptors composing a program. Shown are Pearson’s r and an exact two-sided P value.
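
The two summary quantities used in panels a and b can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `jaccard_index` and `relative_logfc`, the convention of returning 0.0 for two empty sets, and all edge names and values other than CD40L-CD40 are hypothetical and chosen only for illustration.

```python
from statistics import mean


def jaccard_index(edges_a, edges_b):
    """Jaccard overlap |A ∩ B| / |A ∪ B| between two sets of differential edges."""
    a, b = set(edges_a), set(edges_b)
    union = a | b
    if not union:
        # Assumed convention for this sketch: two empty result sets score 0.0.
        return 0.0
    return len(a & b) / len(union)


def relative_logfc(log_fc, target):
    """log(fold-change) of the target edge minus the mean over all other edges."""
    others = [value for edge, value in log_fc.items() if edge != target]
    if not others:
        raise ValueError("need at least one edge other than the target")
    return log_fc[target] - mean(others)


# Made-up DE results from two hypothetical parameter combinations.
edges_a = {"CD40L-CD40", "TNF-TNFRSF1A", "IL2-IL2RA"}
edges_b = {"CD40L-CD40", "TNF-TNFRSF1A", "IFNG-IFNGR1"}
overlap = jaccard_index(edges_a, edges_b)  # 2 shared edges out of 4 distinct

# Made-up log(fold-change) values for one hypothetical parameter combination.
log_fc = {"CD40L-CD40": 2.0, "TNF-TNFRSF1A": 0.5, "IL2-IL2RA": -0.5}
specificity = relative_logfc(log_fc, "CD40L-CD40")

print(overlap, specificity)
```

A larger `relative_logfc` value means the CD40L-CD40 edge stands out more clearly from the other ligand-receptor pairs under that parameter combination.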