Extended Data Fig. 3: Multicollinearity analysis.
From: Repertoire analyses reveal T cell antigen receptor sequence features that influence T cell fate

(a)-(c) Maximum Pearson’s correlation observed between each pair of TCR features in the discovery dataset, taken across all possible combinations of amino acid-based TCR feature values (Methods). Heatmaps are separated by TCR region: (a) CDR3βmr, (b) TRBV-encoded (CDR1β loop, CDR2β loop, and the V-region of CDR3β), and (c) TRBJ-encoded. (d) Feature selection for the V-region model based on variance inflation in the estimated regression coefficients (Methods); each plot represents a candidate mixed effects logistic regression model that jointly models the effects of the TCR features shown on the x-axis. The black arrow denotes the improvement from the first model to the second model, achieved by reducing the variance inflation factor (VIF). The black horizontal line denotes the ideal VIF: no inflation relative to a model with uncorrelated features. (e) Same as (d), for candidate J-region models.
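
As an illustration of the idea behind panels (d) and (e), the following minimal sketch computes a textbook VIF for each feature in two candidate feature sets. It is an assumption-laden stand-in, not the procedure from the Methods: it uses the ordinary linear-regression definition VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse correlation matrix, rather than the mixed effects logistic regression models in the figure. The data are synthetic, and the names `vif`, `first`, and `second` are hypothetical. Under this textbook definition, uncorrelated features give a VIF of 1.

```python
import numpy as np


def vif(X):
    """Textbook VIF for each column of X (rows are observations)."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)
    # The j-th diagonal entry of the inverse correlation matrix
    # equals 1 / (1 - R_j^2), where R_j^2 comes from regressing
    # feature j on all the other features.
    return np.diag(np.linalg.inv(corr))


# Synthetic stand-in features (assumption: not real TCR data).
rng = np.random.default_rng(0)
n = 500
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + 0.1 * rng.normal(size=n)  # nearly collinear with a

# "First" candidate model keeps the redundant feature c;
# "second" candidate model drops it.
first = np.column_stack([a, b, c])
second = np.column_stack([a, b])

vif_first = vif(first)
vif_second = vif(second)

print("first model VIFs: ", np.round(vif_first, 2))
print("second model VIFs:", np.round(vif_second, 2))
```

Dropping the near-duplicate feature brings the remaining VIFs back close to the no-inflation baseline, which mirrors the kind of improvement the black arrow marks between the first and second candidate models.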