Supplementary Figure 2: Determination of cell types by marker genes (related to Figs. 1, 2 and 3).

a) Molecule and gene counts of the remaining 1545 cells. Black squares and error bars within each cluster represent the mean and standard deviation. Clusters were hierarchically reordered to reflect the relationships among the 30 clusters (number of cells per cluster, from left to right: 32, 34, 34, 34, 24, 75, 105, 77, 34, 129, 61, 35, 38, 34, 22, 18, 87, 145, 11, 19, 51, 39, 67, 85, 66, 48, 56, 34, 41, 11). b) Heatmaps showing the contribution of each donor (donors = different animals) to each cluster. Donors are named DH followed by an identifying number, and the number of cells captured in each experiment is given in parentheses. There are three heatmaps: the first for VgattdTomato FACS-sorted cells (n = 3 animals), the second for Vglut2GFP FACS-sorted cells (n = 5 animals), and the third for cells collected after unbiased dissociation of the dorsal horn without FACS sorting (n = 20 animals). Cells from each experiment contribute to most clusters, and this result is reproducible across experiments. Reading a heatmap horizontally gives the distribution of cells from one experiment across the clusters. Some cell types are expected to be more abundant than others and therefore appear stronger in the heatmap. Reading a heatmap vertically shows that there is no strong bias in the contribution of cells between experiments. c) Number of donors (animals) contributing to each cluster (neuron type) versus cluster size. Large clusters are expected to have more donors than small clusters simply because the probability of capturing a cell belonging to a large cluster is greater. If batch effects were strong, cell types would appear in certain batches (donors) but not in others. Note that even the smallest cluster, with around 15 cells, is composed of cells from >7 donors (experiments).
This can be taken as an indicator of the biological reproducibility of the clusters and of the ability to identify all cell types across different experiments. d) Assessment of the predicted strength of clusters. A random forest classifier (max depth 30) was trained on 80% of the cells, and its performance was tested on the remaining 20%. The heatmap shows the predicted strength of the cluster identity. Rows are the “observed” cell types (i.e., the ones we assigned), and columns are the predictions made by the classifier. The color shows the probability of assigning each label, averaged across all test cases. The mean probability along the diagonal is 69.8%.
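
The averaging described for panel d can be illustrated with a minimal Python sketch. It assumes that per-cell class probabilities for the held-out test cells are already available (for example, from a random forest classifier). The function names and the toy data are hypothetical and are not the code used to produce the figure.

```python
from collections import defaultdict


def mean_probability_matrix(observed, probabilities, labels):
    """Average predicted class probabilities per observed label.

    observed      : observed (assigned) label of each test cell
    probabilities : per-cell probability lists, ordered like `labels`
    labels        : ordered list of all cluster labels
    Returns a dict mapping each observed label to one heatmap row.
    """
    sums = {lab: [0.0] * len(labels) for lab in labels}
    counts = defaultdict(int)
    for lab, probs in zip(observed, probabilities):
        counts[lab] += 1
        for j, p in enumerate(probs):
            sums[lab][j] += p
    # Only labels that occur in the test set get a row.
    return {
        lab: [s / counts[lab] for s in sums[lab]]
        for lab in labels
        if counts[lab] > 0
    }


def mean_diagonal(matrix, labels):
    """Mean probability of assigning each observed label to itself."""
    diag = [matrix[lab][labels.index(lab)] for lab in labels if lab in matrix]
    return sum(diag) / len(diag)


# Toy example (hypothetical values): two clusters, three test cells.
labels = ["A", "B"]
observed = ["A", "A", "B"]
probabilities = [[0.9, 0.1], [0.7, 0.3], [0.4, 0.6]]

matrix = mean_probability_matrix(observed, probabilities, labels)
diag_mean = mean_diagonal(matrix, labels)
print(matrix, diag_mean)
```

In this sketch each row corresponds to an observed cell type and each column to a predicted label, mirroring the layout described for the heatmap. The diagonal mean is the unweighted average over clusters, which is itself an assumption about how the reported value was summarized.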