Supplementary Figure 6: Validation for integrative modeling.
From: Architecture of Pol II(G) and molecular mechanism of transcription regulation by Gdown1

a, Convergence of the model score for the 1693 good-scoring models; the scores do not continue to improve as more models are computed essentially independently. The error bar represents the s.d. of the best scores, estimated by repeating the sampling of models 100 times. b, Distribution of model scores for model samples 1 (red) and 2 (blue), comprising the 1693 good-scoring structures. c, Three criteria for determining the sampling precision (y axis), evaluated as a function of the RMSD clustering threshold (x axis). First, the P value is computed using the chi-squared test for homogeneity of proportions (red dots). Second, the effect size for the chi-squared test is quantified by Cramér’s V (blue squares). Third, the population of models in sufficiently large clusters (containing at least ten models from each sample) is shown as green triangles. The vertical dotted black line indicates the RMSD clustering threshold at which all three conditions are satisfied (P > 0.05, Cramér’s V < 0.10, and population of clustered models > 0.80), thus defining the sampling precision of 18.6 Å. d, Populations of sample 1 and 2 models in the two clusters obtained by threshold-based clustering using the RMSD threshold of 23.6 Å. The cluster contains 96.8% of the models. Cluster precision is shown for the cluster. The precision of the cluster defines the model precision. e, Euclidean Cα–Cα distance statistics for each cross-link in the cluster. The cross-links are sorted by average distance (ordinate axis). The error bars represent the s.d. of the distance across all models in the clusters. Inset, Euclidean Cα–Cα distance distributions of all measured cross-links in the ensemble of solutions for the cluster. The y axis provides the normalized number of cross-links that were mapped onto the model. The dashed red line denotes the expected maximum reach of a cross-link.
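
The three criteria in panel c can be illustrated with a minimal sketch for a single RMSD clustering threshold. This is not the authors' code. The function names, the example cluster counts, and the choice to run the test only on the sufficiently large clusters are assumptions made for illustration; only the cutoffs (P > 0.05, Cramér’s V < 0.10, clustered population > 0.80, at least ten models from each sample) come from the caption.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of the chi-squared distribution (integer df >= 1)."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Closed form for even degrees of freedom.
        return math.exp(-h) * sum(h**i / math.factorial(i) for i in range(df // 2))
    # Closed form for odd degrees of freedom.
    extra = sum(h**(i - 0.5) / math.gamma(i + 0.5)
                for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * extra


def sampling_criteria(counts, min_per_sample=10):
    """Return (p_value, cramers_v, clustered_population).

    counts: list of (n_sample1, n_sample2) pairs, one per cluster
            (hypothetical input format).
    """
    total = sum(a + b for a, b in counts)
    # Assumption: only clusters with enough models from BOTH samples are tested.
    large = [(a, b) for a, b in counts
             if a >= min_per_sample and b >= min_per_sample]
    n = sum(a + b for a, b in large)
    population = n / total if total else 0.0
    if len(large) < 2:
        # A single column has no proportions to compare.
        return 1.0, 0.0, population
    row1 = sum(a for a, _ in large)
    row2 = sum(b for _, b in large)
    chi2 = 0.0
    for a, b in large:
        col = a + b
        for observed, row in ((a, row1), (b, row2)):
            expected = row * col / n
            chi2 += (observed - expected) ** 2 / expected
    p_value = chi2_sf(chi2, df=len(large) - 1)
    # For a 2 x k table, min(rows, cols) - 1 equals 1.
    cramers_v = math.sqrt(chi2 / n)
    return p_value, cramers_v, population


def criteria_satisfied(p_value, cramers_v, population):
    """Apply the three cutoffs stated in the caption."""
    return p_value > 0.05 and cramers_v < 0.10 and population > 0.80


# Made-up cluster counts, not data from the figure.
example_counts = [(400, 410), (90, 85), (5, 10)]
p, v, pop = sampling_criteria(example_counts)
print(criteria_satisfied(p, v, pop))
```

In this sketch, the sampling precision would correspond to the smallest RMSD threshold whose clustering satisfies `criteria_satisfied`; scanning thresholds and performing the clustering itself are left out.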