Extended Data Fig. 3: PSID requires orders of magnitude fewer training samples to achieve the same performance as an NDM that uses a larger latent state dimension, and neither RM nor an NDM with the same latent state dimension as PSID achieves performance comparable to PSID, even with orders of magnitude more samples.
From: Modeling behaviorally relevant neural dynamics enabled by preferential subspace identification

(a) Normalized eigenvalue error is shown for 1000 random simulated models with 16-dimensional latent states, of which 4 are behaviorally relevant, when using RM, PSID, or NDM with a latent state dimension similar to or larger than that of PSID. Solid lines show the average and shaded areas show the s.e.m. (n = 1000 random models). For NDM, to learn the behaviorally relevant dynamics using a model with a high-dimensional latent state (nx = 16), we first identify this model, then sort the dimensions of the extracted latent state in order of their decoding accuracy, and then reduce the model to keep the 4 most behavior-predictive latent state dimensions (Methods). These operations provide the estimate of the 4 behaviorally relevant eigenvalues (Methods). For RM, the state dimension is the behavior dimension (here nz = 5). (b) Cross-validated behavior decoding CC for the models in (a). Figure convention and number of samples are the same as in (a). Note that unlike in (a), here we provide decoding results using the NDM with a 16-dimensional latent state both with and without any model reduction, because the two versions result in different decoding, whereas they do not differ in their most behavior-predictive dimensions and thus have the same eigenvalue error in (a). Optimal decoding using the true model is shown in black. For NDM with a 4-dimensional latent state (that is, in the dimension reduction regime) and for RM, eigenvalue identification in (a) and decoding accuracies in (b) almost plateaued at a final value below that of the true model, indicating that the asymptotic performance attainable with unlimited training samples has almost been reached.
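The NDM model-reduction step described for panel (a) (identify a full model, sort latent dimensions by decoding accuracy, keep the most behavior-predictive ones) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure from Methods: the function names `rank_dims_by_decoding` and `reduce_model` are hypothetical, the true simulated states stand in for states extracted from an identified model, the state transition matrix is assumed diagonal so that each latent dimension maps to one eigenvalue, and the score is an in-sample CC rather than a cross-validated one.

```python
import numpy as np

def rank_dims_by_decoding(X, Z):
    """Order latent dimensions from most to least behavior predictive.

    Each dimension is scored by the mean correlation coefficient (CC) between
    the behavior Z and its least-squares prediction from that dimension alone.
    """
    scores = []
    for i in range(X.shape[1]):
        design = np.column_stack([X[:, i], np.ones(len(X))])  # one state + intercept
        coef, *_ = np.linalg.lstsq(design, Z, rcond=None)
        Z_hat = design @ coef
        cc = [np.corrcoef(Z[:, j], Z_hat[:, j])[0, 1] for j in range(Z.shape[1])]
        scores.append(np.mean(cc))
    return np.argsort(scores)[::-1]

def reduce_model(A, order, n_keep):
    """Keep the n_keep top-ranked dimensions of A (assumes A is diagonal)."""
    keep = np.sort(order[:n_keep])
    return A[np.ix_(keep, keep)], keep

# Toy example (assumption): 6 latent states, of which dimensions 1 and 4 drive behavior.
rng = np.random.default_rng(0)
eigs = np.array([0.95, 0.9, 0.8, 0.7, 0.6, 0.5])
A = np.diag(eigs)
T = 20000
X = np.zeros((T, len(eigs)))
for t in range(1, T):
    X[t] = A @ X[t - 1] + rng.standard_normal(len(eigs))
Cz = np.array([[1.0, 0.5], [0.5, 1.0]])
Z = X[:, [1, 4]] @ Cz.T + 0.1 * rng.standard_normal((T, 2))

order = rank_dims_by_decoding(X, Z)
A_red, keep = reduce_model(A, order, n_keep=2)
relevant_eigs = np.linalg.eigvals(A_red)  # estimate of the behaviorally relevant eigenvalues
```

In this toy setting the two retained dimensions are the ones that drive behavior, so the eigenvalues of the reduced matrix are the behaviorally relevant ones. With a non-diagonal state transition matrix, a change of basis that separates the eigenvalues would be needed before dimensions can be dropped this way.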
In both (a) and (b), even for an NDM with a latent state dimension as large as that of the true model (that is, performing no dimension reduction and using nx = 16), (i) NDM performed worse than PSID with a latent state dimension of only 4 when using the same number of training samples, and (ii) NDM required orders of magnitude more training samples to reach the performance of PSID with the smaller latent state dimension, as shown by the magenta arrow. Parameters are randomized as in Methods except for the state noise (wt), which is about 30 times smaller (that is, −2.5 ≤ α1 ≤ −0.5), and the behavior signal-to-noise ratio, which is 2 times smaller (that is, −0.3 ≤ α3 ≤ +1.7), both adjusted to make the decoding performances more similar to the results in real neural data (Fig. 3).
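The relation between the quoted scale factors and the exponent ranges can be checked with a short calculation. The sketch below assumes that α1 and α3 are base-10 exponents of the corresponding scales, which is consistent with the "30 times" and "2 times" factors but is not stated in this legend. The variable names are illustrative.

```python
import math

# Assumption: each parameter scale is 10**alpha, so a multiplicative change
# by some factor shifts the alpha range by log10(factor).
state_noise_shift = math.log10(30)  # about 1.48, i.e. roughly a 1.5 shift in alpha1
snr_shift = math.log10(2)           # about 0.30, i.e. roughly a 0.3 shift in alpha3

# Conversely, the factors implied by shifts of exactly 1.5 and 0.3:
state_noise_factor = 10 ** 1.5      # about 31.6 ("about 30 times")
snr_factor = 10 ** 0.3              # about 2.0 ("2 times")
```

Under this assumption, both ranges in the legend span two decades, and only their offsets change relative to the ranges given in Methods.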