Fig. 2: Construction of the StocH clock.
From: Quantifying the stochastic component of epigenetic aging

a, Left: the age distribution of the 1,202 sorted monocyte samples from the MESA study. The shaded blue and red regions highlight the youngest and oldest samples used in the simulation, respectively. Middle: the average DNAm over the youngest (AvDNAm(Young)) and oldest (AvDNAm(Old)) samples for each of the 353 Horvath Clock CpGs. Right: the corresponding density of absolute effect sizes, defined as the magnitude of the DNAm difference between the youngest and oldest samples. b, The stochastic simulation of one CpG in one individual of a given age, which starts from the average DNAm in the youngest samples and adds a stochastic deviate at each unit time step. The probability per time step that a CpG is altered is given by a decaying exponential whose exponent is determined by the observed absolute effect size of the CpG and a CpG-independent parameter, γ, that controls the overall probability of CpGs changing. The direction of the DNAm change is dictated by the sign of the observed effect size, with the magnitude determined by the standard deviation, σ, of a signed Gaussian distribution, as indicated. Of note, the simulation model adds Gaussian deviates to the quantiles of an inverse normal distribution. The model is simulated to generate effect sizes for each of the 353 CpGs, which are then compared with the observed distribution to identify the optimal (γ and σ) parameters minimizing the MAE between simulated and observed values. c, To build the StocH clock, we then use the simulation model with the optimal (γ and σ) parameter values to generate three artificial cohorts of 195 samples each (a training set, a model selection set and a test set). There are 195 samples because we simulate five samples per age value, with ages ranging between 45 and 83 years; that is, a total of 39 distinct age values. The training cohort is used to train elastic net regression models with \(\alpha = 0.5\) and varying values of the penalty parameter, λ.
These models are then evaluated in the model selection set to select the model that minimizes the root mean square error (RMSE). This optimal model is then evaluated in the test set. Created with Biorender.com.
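
The simulation in panel b and the cohort design in panel c can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions that the caption does not specify: the change probability is taken to be exp(-γ/|effect size|), the magnitude of each deviate is drawn as |N(0, σ)|, one time step is passed in as a plain count, and the function names (`change_probability`, `simulate_cpg`, `cohort_ages`) are hypothetical.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the quantile transform


def change_probability(abs_effect, gamma):
    """Per-step probability that a CpG is altered.

    ASSUMPTION: the caption only says "decaying exponential"; the form
    exp(-gamma / |effect|) is chosen here purely for illustration.
    """
    if abs_effect <= 0:
        return 0.0
    return math.exp(-gamma / abs_effect)


def simulate_cpg(start_beta, effect, n_steps, gamma, sigma, rng):
    """Simulate the DNAm of one CpG in one individual over n_steps time steps."""
    # Work on the inverse-normal (quantile) scale so values stay within (0, 1).
    clipped = min(max(start_beta, 1e-6), 1 - 1e-6)
    z = _STD_NORMAL.inv_cdf(clipped)
    p = change_probability(abs(effect), gamma)
    direction = 1.0 if effect > 0 else -1.0
    for _ in range(n_steps):
        if rng.random() < p:
            # ASSUMPTION: signed Gaussian deviate = direction * |N(0, sigma)|.
            z += direction * abs(rng.gauss(0.0, sigma))
    return _STD_NORMAL.cdf(z)


def cohort_ages(min_age=45, max_age=83, per_age=5):
    """Ages of one artificial cohort: per_age samples for every integer age."""
    return [age for age in range(min_age, max_age + 1) for _ in range(per_age)]
```

With the defaults, `cohort_ages()` yields 39 distinct ages with five samples each, that is, 195 samples per cohort. In a full pipeline, γ and σ would be tuned so that the simulated effect sizes match the observed ones before the three cohorts are generated.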