Introduction

Music is a powerful medium for memory, capable of eliciting vivid recollections and emotional responses even in the absence of conscious retrieval. Musical memory encompasses several cognitive components, including perceptual encoding, long-term semantic representations of musical structure, and autobiographical associations formed through repeated exposure. Humans show a remarkable capacity to recognize melodies even after long delays, often based on minimal acoustic information, demonstrating the strength and durability of musical memory traces1,2,3.

Unlike most auditory stimuli, music engages a broad network that spans auditory association areas, medial temporal memory structures, and reward-related regions such as the striatum. This large-scale circuitry supports the encoding, retrieval, and affective resonance of familiar tunes4,5,6. Building on this, neuroimaging work has shown that auditory cortices interact dynamically with the hippocampus, parahippocampal cortex, and medial prefrontal regions during the encoding, retrieval, and recognition of musical material7,8,9. These interactions support long-term memory retrieval, associative binding, and predictive processing of melodic structure. Moreover, highly familiar or emotionally salient music increases functional coupling between auditory areas and key nodes of the default mode network—most notably the posterior cingulate cortex and medial prefrontal cortex—reflecting the engagement of autobiographical and self-referential processes8,10,11. Together, these findings indicate that auditory and memory networks are tightly integrated, providing a strong neurocognitive basis for the rapid recognition and internal continuation of familiar melodies.

Among the dimensions of musical memory, familiarity plays a central role. Familiar melodies are recognized rapidly—often within a few hundred milliseconds—and elicit stronger predictive processing and heightened activity in memory-related and reward networks compared to unfamiliar melodies12,13,14. This suggests that musical familiarity relies not only on long-term storage, but also on the dynamic interaction between memory and perceptual prediction. A key manifestation of this predictive mechanism is musical imagery, the internal recreation of musical sequences in the absence of external auditory stimulation. Musical imagery occurs spontaneously in most listeners and shares neural substrates with actual perception, including activation of auditory cortex and associative brain areas15. Because it reflects automatic access to long-term musical knowledge, musical imagery provides a powerful window into memory processes in healthy individuals, particularly with respect to the presence and timing of internally generated representations.

Experimental paradigms that introduce brief, unexpected silent gaps into familiar and unfamiliar melodies make it possible to probe internal musical processes with high temporal precision. In familiar melodies, such gaps frequently trigger spontaneous musical imagery, reflecting the rapid reconstruction of the omitted segment from memory15,16. Crucially, because neural activity is measured during silence rather than during ongoing auditory stimulation, these paradigms isolate internally generated memory-driven responses from bottom-up sensory processing. Neuroimaging studies have shown that the introduction of silence engages a distributed memory network, including auditory association areas, the dorsolateral prefrontal cortex, and the supplementary motor area17. Using EEG, transient neural responses can be detected within the first 500 ms of silence, reflecting the rapid detection of and reaction to the unexpected gap. Moreover, spectral analyses reveal a reduction in theta (5–8 Hz) and alpha (8–12 Hz) band synchronization during familiar melodies, indicating dynamic interactions between the auditory cortex and medial temporal structures involved in memory retrieval16. Together, these findings suggest that silent-gap paradigms provide a powerful window into memory processes and musical imagery, allowing objective detection of familiarity-related neural signatures while minimizing confounds from ongoing auditory input.

While these studies provide robust group-level insights, they also average out individual differences that may be critical for understanding familiarity at the single-subject level. Group-level analyses increase statistical power and reduce noise, making them effective for identifying reliable neural patterns associated with memory and other cognitive functions. However, this averaging can mask subtle but meaningful variations in brain dynamics that are essential for characterizing familiarity processes in individuals18,19. In clinical and applied contexts—such as early diagnosis or personalized intervention—such individual-specific signatures may hold diagnostic or predictive value. This underscores the need to complement traditional group analyses with methods capable of capturing subject-level patterns, including machine learning approaches.

In recent years, a wide range of machine learning approaches have been applied in neuroscience to classify cognitive states from EEG activity, including support vector machines, regularized linear models, random forests, and, increasingly, deep neural architectures20,21. Their performance is strongly shaped by the feature extraction strategy, as EEG decoding critically depends on how neural activity is represented through time-domain descriptors, frequency and time–frequency features, connectivity measures, or nonlinear dynamical metrics20. These findings highlight the importance of aligning analytical choices with the targeted cognitive and neurophysiological processes. Building on this literature, our approach combines spectral representations of EEG activity, classified with SVMs, random forests, and logistic regression, with Riemannian geometry–based methods. While the latter are commonly used in brain–computer interface applications21,22, we extend their use here to evaluate whether neural signatures of musical familiarity can be reliably decoded at the individual level. Importantly, this work is not intended to propose a comprehensive or mechanistic model of musical prediction or imagery, but to establish the methodological feasibility of identifying familiarity-related neural signatures in EEG data.

Our central hypothesis is that machine learning models can reliably distinguish neural activity associated with imagery of familiar versus unfamiliar music. Success in this endeavor would establish a foundation for objective, individualized assessment of musical familiarity. By training models on EEG recordings acquired during both listening and imagery conditions, we aim to develop a system capable of predicting song familiarity based solely on neural responses. We prioritized high-density EEG in the initial phase to localize the most informative regions for classification. By characterizing individualized neural markers of musical memory, this work lays the methodological foundation for future development of objective, automated, and non-intrusive tools for assessing memory-related processes. However, given the novelty of the approach, the present study is exploratory, and the analytical pipeline should be regarded as hypothesis-generating; the findings will need replication in an independent sample to establish robustness.

Methods

The sections from Participants through Preprocessing describe experimental procedures originally developed and reported in16. The same dataset and protocol were subsequently used in23. In the present work, we provide a new analysis of these data, focusing on the objective assessment of musical familiarity through the classification of EEG responses to familiar versus unfamiliar music using machine learning, with an emphasis on Riemannian geometry-based methods.

Participants

Twenty volunteers (7 male, 13 female; mean age = 32 ± 5 years) participated in the experiment. All participants were right-handed nonmusicians (i.e., they had received no formal musical training, or fewer than 6 years of it), had no known neurological or psychiatric disease, reported normal hearing, and were native French speakers. None of the participants had memory deficits, as assessed by neuropsychological evaluation. All participants provided written informed consent prior to participation and received monetary compensation. The research methods were approved by the Committee for the Protection of Human Subjects at the Clinical Investigation Center of Besançon (number 14/458), where the experiment was performed. All data collection and analyses were carried out in accordance with the Declaration of Helsinki, and all ethical regulations relevant to human research participants were followed.

Experimental design

The experiment consisted of two main phases: selection visit and EEG recording.

Selection visit

One week prior to the first data recording, participants were asked to freely choose 5 familiar songs in their native language, each from a different singer. Five unfamiliar songs in the subjects’ native language were then selected by the experimenter from original soundtracks recorded by independent or relatively unknown singers. Each of the unknown songs was carefully matched to one of the 5 familiar songs with respect to musical genre and singer’s gender. Once familiar and unfamiliar songs were selected, all were normalized to the same loudness level with the ReplayGain algorithm.

EEG recording

At the beginning of this session, subjects were informed about the experimental procedure, which was divided into three parts: familiarity judgment of known and unknown songs, data recording, and evaluation of musical imagery, as shown in Fig. 1. After listening to a 10-s excerpt of each of the 10 selected songs (5 familiar and 5 unfamiliar), subjects rated each song on a visual analog scale ranging from 0 (unfamiliar) to 10 (very familiar). Knowledge of the lyrics was confirmed for familiar songs (8.7 ± 0.6), and absence of knowledge was confirmed for unfamiliar songs (0.0 ± 0.0). During the EEG recording, participants passively listened to the 5 familiar and the 5 unknown songs, each song being repeated twice in a random order. Participants were instructed to remain attentive throughout the entire trial, including during silent gaps, but were not given any explicit task or imagery instruction. Songs were 2 min long, and portions of each song were replaced with twenty 2-second silent sections. The duration of the silent intervals was selected to allow internally generated musical processes, such as imagery, expectation, or internal continuation, to emerge following stimulus offset; this temporal window is consistent with previous music interruption paradigms15,17. The silent gaps were randomly embedded between 10 and 110 s after song onset, with an interval ranging from 2.5 to 3.5 s between consecutive gaps. This resulted in 200 trials within familiar songs (20 gaps × 5 familiar songs × 2 repetitions) and 200 trials within unknown songs. Silence generation and song presentation were accomplished using the E-Prime software (Psychology Software Tools Inc., Sharpsburg, PA). Audio streams embedded with gaps were played through headphones, with the volume adjusted to a comfortable listening level for each subject.
After the EEG recording, subjects were required to rate whether they were mentally completing the gaps during the experiment (0 = not at all, 10 = very well).

Fig. 1

Timeline of the experimental procedure: familiarity evaluation, EEG acquisition, and final imagery assessment. The figure shows the three stages of the experiment: familiarity evaluation with music (scale 0–10), EEG acquisition during silences in familiar or unfamiliar music, and assessment of mental imagery completion of the silences (scale 0–10).

Data acquisition

Subjects were seated in a comfortable chair in a dark, quiet testing room. EEG signals were recorded using a 256-channel Geodesic Sensor Net (Electrical Geodesics Inc.; EGI, Eugene, OR). The net is constructed to cover the skull as well as the face and neck, with 20–25 mm inter-electrode distances, in order to measure electrical potentials arising from basal brain regions and thus maximize the spatial resolution of the EEG. All channels were referenced to the vertex (Cz) and collected via a high-impedance Net Amp 300 amplifier (Electrical Geodesics) and Net Station 4.5 software (Electrical Geodesics). Data were continuously recorded with a 1 Hz high-pass filter at a sampling rate of 1000 Hz. During the recording, subjects were instructed to close their eyes and remain still while passively listening to the songs, but they could blink and stretch as much as they wanted between songs. Subjects were not explicitly requested to complete the gaps, in order to ensure that musical imagery was produced spontaneously and effortlessly. The total recording time was approximately fifty minutes, with breaks provided after every five songs to ensure participant comfort and alertness. The length of each break was flexible, allowing participants to resume the experiment at their own pace, but never exceeded a few minutes.

Preprocessing

Electrophysiological data were analyzed using Cartool software (version 3.55; http://brainmapping.unige.ch/Cartool.php). Epochs of 2600 ms were extracted from the raw data, beginning 600 ms before silence onset and ending 2000 ms after silence onset, separately for the two conditions. A band-pass filter between 1 and 30 Hz and a 50 Hz notch filter were applied to remove unwanted frequency components. The choice of a baseline correction required careful consideration, the key issue being when to place the baseline period. The options were to apply it before silence onset (−600 to 0 ms), toward the end of the silent period (1500–2000 ms), or not at all, as advocated by some authors (see24 for a discussion on baseline correction). A baseline prior to the silence may introduce biases due to the ongoing music, whereas a baseline toward the end of the silent period may conceal long-lasting imagery-related cognitive activity. We ultimately adopted the baseline used in a prior magnetoencephalography (MEG) study with a similar design15, ranging from −600 to −100 ms relative to silence onset, in order to avoid any overlap between the baseline and the silent period. Periods with visually detectable artifacts (e.g. blinks, eye movements and gross movements) were removed from the analysis. The remaining data for each subject were averaged, and individual channels with artifacts were interpolated using a 3-dimensional spline algorithm (up to 7% interpolated electrodes per subject). After excluding trials contaminated by artifacts, the average number of trials retained per subject was 203 (standard deviation: 39). To ensure that trial rejection did not introduce class imbalance between familiar and unfamiliar conditions, we quantified the number of valid epochs per class and per participant; Table 1 summarizes the resulting distribution of retained trials for each subject.
The proportion of familiar trials remained close to 50% for most subjects (mean = 53.6%, SD = 5.9%), indicating that trial exclusion did not introduce substantial class imbalance.

Table 1 Class distribution per participant after artifact rejection.

Feature extraction and classification

The primary objective was to classify EEG epochs corresponding to silent intervals within familiar and unfamiliar songs using machine learning techniques. Two categories of models were compared: traditional machine learning models based on engineered spectral features, and a model using Tangent Space Mapping (TSM), which exploits Riemannian geometry25. All models were trained and evaluated separately for each subject to account for individual variability. Features were extracted from only 204 of the 256 channels, as electrodes located over the cheeks and lower neck were often poorly adhered and prone to motion artifacts, such as those caused by spontaneous jaw movements.

Spectral feature extraction

Each EEG epoch was transformed into the frequency domain using the Fast Fourier Transform (NumPy implementation, np.fft.fft). Prior to the transform, epochs were zero-padded to reach a spectral resolution of 0.25 Hz. No tapering window, FIR/IIR filtering, wavelet decomposition, or parametric spectral estimation methods were used.

The amplitude spectrum was obtained as the absolute magnitude of the FFT coefficients. Frequencies within the theta (4–8 Hz), alpha (8–12 Hz), low-beta (12–18 Hz), and high-beta (18–30 Hz) ranges were extracted by selecting the corresponding FFT bins, in accordance with standard EEG analysis practices and prior evidence that music familiarity modulates spectral power26. Importantly, these frequency bands have been repeatedly associated with internally generated auditory and memory-related processes. Theta and alpha oscillations, in particular, have been linked to musical imagery, predictive maintenance of auditory representations, and memory retrieval in the absence of sensory input15,16. Reduced alpha power in auditory cortices has been shown to facilitate internally driven perception of music during silence, reflecting the engagement of memory-based predictive mechanisms rather than bottom-up auditory processing15. Thus, spectral features in these bands plausibly capture oscillatory dynamics supporting the internal continuation of familiar melodies during silent gaps.

For each electrode and frequency band, four descriptors of the amplitude distribution—mean, median, standard deviation, and skewness—were computed. These statistical features are used in EEG signal analysis as basic descriptors of spectral characteristics27, and were also confirmed to be informative in preliminary analyses. This procedure yielded a total of 3264 features per trial (204 electrodes × 4 bands × 4 descriptors).

To reduce dimensionality and avoid overfitting, the resulting spectral feature matrix was submitted to a Principal Component Analysis (PCA). For each subject, components explaining up to 95% of the variance were retained, yielding on average around 131 ± 24 features per trial that were used as input to the classifiers.
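As an illustration of the steps above, the following sketch builds the 3264-dimensional spectral feature matrix from toy epochs and reduces it with PCA. The random data, the use of the real-input FFT variant (`np.fft.rfft`), and the concatenation order of bands and descriptors are assumptions made for the example, not the exact original implementation:

```python
import numpy as np
from scipy.stats import skew
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
fs = 1000                                  # sampling rate (Hz)
n_channels, n_samples = 204, 2000          # one 2-s silent epoch
epochs = rng.standard_normal((50, n_channels, n_samples))  # toy epochs

# Zero-pad so that the FFT bin spacing is fs / n_fft = 0.25 Hz
n_fft = int(fs / 0.25)                     # 4000 samples
freqs = np.fft.rfftfreq(n_fft, d=1 / fs)
bands = {"theta": (4, 8), "alpha": (8, 12),
         "low_beta": (12, 18), "high_beta": (18, 30)}

def spectral_features(epoch):
    """Mean/median/std/skewness of the amplitude spectrum per electrode and band."""
    amp = np.abs(np.fft.rfft(epoch, n=n_fft, axis=-1))     # amplitude spectrum
    feats = []
    for lo, hi in bands.values():
        sel = amp[:, (freqs >= lo) & (freqs < hi)]         # bins of this band
        feats.append(np.column_stack([sel.mean(axis=1), np.median(sel, axis=1),
                                      sel.std(axis=1), skew(sel, axis=1)]))
    return np.concatenate(feats, axis=1).ravel()           # 204 * 4 * 4 = 3264

X = np.array([spectral_features(e) for e in epochs])       # (50, 3264)
X_red = PCA(n_components=0.95).fit_transform(X)            # keep 95% variance
```

On real data, the epoch array would hold the silent intervals retained after artifact rejection, and the PCA would be fitted on training folds only, as described in the cross-validation section.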

Conventional machine learning models

Three conventional classifiers were considered: Logistic Regression (LogReg), Support Vector Machine (SVM) and Random Forest (RF) classifiers21. Hyperparameter tuning was performed through an exhaustive grid search, independently for each subject, using 5-fold cross-validation. The tested parameters with the grid search are presented in Table 2. Finally, feature scaling with z-score normalization was applied before training.

Table 2 Values of the grid search for each classifier.
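The per-subject tuning scheme can be sketched with scikit-learn's `GridSearchCV`; the grid values and toy data below are placeholders (the grids actually searched are those of Table 2), and z-score normalization is folded into a `Pipeline` so that it is refitted within each fold:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 20))   # toy PCA-reduced feature matrix
y = np.tile([0, 1], 50)              # toy labels: 0 = familiar, 1 = unfamiliar

# z-scoring + classifier, tuned by exhaustive 5-fold grid search
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])
grid = GridSearchCV(pipe, param_grid={"clf__C": [0.01, 0.1, 1, 10]},
                    cv=5, scoring="accuracy")
grid.fit(X, y)
best_C = grid.best_params_["clf__C"]
```

The SVM and Random Forest classifiers follow the same pattern with their own parameter grids.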

Tangent space mapping approach

To exploit spatial covariance patterns in EEG signals while preserving their intrinsic geometric structure, we also implemented a Riemannian geometry-based feature extraction method known as TSM. In practice, this method relies on the covariance matrices computed from multichannel EEG signals. These matrices capture how different brain regions (represented by EEG electrodes) co-vary in their activity over time. Importantly, such covariance matrices are Symmetric Positive Definite (SPD), which means they cannot be treated as ordinary vectors in flat Euclidean space. Instead, they naturally belong to a curved mathematical space called a Riemannian manifold22,25. By taking into account this curved geometry, the TSM approach allows for more faithful and robust characterization of EEG covariance patterns compared to standard Euclidean-based methods.

Covariance matrix computation

For each EEG epoch, the spatial covariance matrix \(C\in{\mathbb{R}}^{204\times204}\) was computed as:

$$C=\frac{1}{t}X{X}^{\top}$$

where \(X\in{\mathbb{R}}^{204\times t}\) is the matrix of EEG signals for one epoch (204 electrodes, \(t\) time samples).
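In NumPy this is a one-liner per epoch. The toy epoch below stands in for a real 2-s silent interval; note that with shorter epochs or more channels a shrinkage-regularized estimator is often preferred in practice (a general remark, not part of the original pipeline):

```python
import numpy as np

rng = np.random.default_rng(2)
n_channels, t = 204, 2000                  # 2 s at 1000 Hz
X = rng.standard_normal((n_channels, t))   # toy epoch (electrodes x samples)

C = (X @ X.T) / t                          # spatial covariance, 204 x 204, SPD
```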

Riemannian mean of covariance matrices

Let \(\{{C}_{1},{C}_{2},\dots,{C}_{N}\}\) be the set of covariance matrices from all epochs of a subject (N epochs). The Riemannian (geometric) mean \(\stackrel{-}{C}\) of these matrices is defined as the matrix that minimizes the sum of squared Riemannian distances to all matrices:

$$\stackrel{-}{C}=\underset{C}{\text{argmin}}\sum_{i=1}^{N}{\delta}^{2}(C,{C}_{i})$$

where \(\delta\left(C,{C}_{i}\right)\) is the affine-invariant Riemannian distance:

$$\delta\left(C,{C}_{i}\right)={\parallel {\text{log}(C}^{-\frac{1}{2}}{C}_{i}{C}^{-\frac{1}{2}})\parallel}_{F}$$

with \(\log(\cdot)\) denoting the matrix logarithm and \({\parallel\cdot\parallel}_{F}\) the Frobenius norm.

The computation of \(\stackrel{-}{C}\) does not have a closed-form solution but can be efficiently estimated using an iterative algorithm such as the Karcher flow (also known as the affine-invariant gradient descent method)28.
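A minimal sketch of the Karcher flow on small toy SPD matrices is shown below; the starting point (Euclidean mean), full-step update, and stopping rule are illustrative choices, and in practice a library routine such as PyRiemann's Riemannian mean would be used:

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm

def riemannian_mean(mats, n_iter=50, tol=1e-8):
    """Karcher flow: iteratively move the estimate along the mean log-direction."""
    C = np.mean(mats, axis=0)                    # Euclidean mean as initial guess
    for _ in range(n_iter):
        C_half = np.real(sqrtm(C))
        C_inv_half = np.linalg.inv(C_half)
        # Mean tangent direction: average matrix log of the whitened matrices
        T = np.mean([np.real(logm(C_inv_half @ M @ C_inv_half)) for M in mats],
                    axis=0)
        C = C_half @ np.real(expm(T)) @ C_half   # exponential-map update
        if np.linalg.norm(T, "fro") < tol:       # gradient ~ 0: converged
            break
    return C

rng = np.random.default_rng(3)
mats = []
for _ in range(10):
    A = rng.standard_normal((6, 6))
    mats.append(A @ A.T + 6 * np.eye(6))         # SPD by construction
mats = np.array(mats)
C_bar = riemannian_mean(mats)                    # geometric mean of the set
```

A useful sanity check is that the mean of identical matrices returns that matrix unchanged.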

Projection onto the tangent space

Once the Riemannian mean \(\stackrel{-}{C}\) is computed, each covariance matrix \({C}_{i}\) is projected onto the tangent space at \(\stackrel{-}{C}\) using the affine-invariant logarithmic map:

$${S}_{i}=Lo{g}_{\stackrel{-}{C}}\left({C}_{i}\right)={\stackrel{-}{C}}^{\frac{1}{2}}\text{log}({\stackrel{-}{C}}^{-\frac{1}{2}}{C}_{i}{\stackrel{-}{C}}^{-\frac{1}{2}}){\stackrel{-}{C}}^{\frac{1}{2}}$$

The resulting matrix \({S}_{i}\in{\mathbb{R}}^{204\times204}\) is symmetric and lies in the Euclidean tangent space.
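The projection can be sketched as follows, using the standard affine-invariant logarithmic map (whiten by the reference, take the matrix log, scale back). The 6 × 6 toy matrices keep the example fast, and SciPy's `sqrtm`/`logm` stand in for whatever matrix routines the original pipeline used:

```python
import numpy as np
from scipy.linalg import logm, sqrtm

def log_map(C_i, C_bar):
    """Map the SPD matrix C_i to the tangent space at the reference C_bar."""
    C_half = np.real(sqrtm(C_bar))
    C_inv_half = np.linalg.inv(C_half)
    # Whiten by the reference, take the matrix log, then scale back
    return C_half @ np.real(logm(C_inv_half @ C_i @ C_inv_half)) @ C_half

rng = np.random.default_rng(4)
A, B = rng.standard_normal((6, 6)), rng.standard_normal((6, 6))
C_bar = A @ A.T + 6 * np.eye(6)          # toy reference (Riemannian mean)
C_i = B @ B.T + 6 * np.eye(6)            # toy epoch covariance

S = log_map(C_i, C_bar)                  # symmetric tangent-space matrix
```

Projecting the reference onto itself yields the zero matrix, a quick sanity check of the mapping.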

Feature vector construction

Let \(vec\left({S}_{i}\right)\in{\mathbb{R}}^{D}\) be the feature vector corresponding to the projection of \({C}_{i}\), where:

$$D=\frac{204\times\left(204+1\right)}{2}=\text{20,910}$$

This feature vector is formed by flattening the upper triangular part of \({S}_{i}\) into a vector. Each entry of this vector corresponds to either:

  • the log-variance of an electrode (for diagonal elements coming from \({S}_{i}\)​).

  • the log-cross-covariance between pairs of electrodes (for off-diagonal elements coming from \({S}_{i}\)).

Therefore, each feature quantifies either the log-variance of a single electrode, representing its local power, or the log-cross-covariance between a pair of electrodes, reflecting their synchrony. Together, these covariance patterns provide a compact representation of large-scale neural coordination, which is consistent with evidence that musical familiarity engages distributed auditory, frontal, and memory-related networks rather than isolated local activations7,9.

By centering these measures at the subject-specific mean covariance and working in the log-domain, the features robustly capture deviations in spatial activity patterns and functional connectivity, while being invariant to global scaling and subject-specific variability22. Also, by working in the tangent space, we transform the manifold-valued SPD matrices into Euclidean vectors while retaining geometrically meaningful representations of covariance patterns.
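Vectorizing the upper triangle is straightforward, as sketched below on a toy symmetric matrix. Note that some implementations (e.g. PyRiemann's tangent space) additionally weight the off-diagonal entries by √2 so that the Euclidean norm of the vector matches the Frobenius norm of the matrix; that convention is not stated in the text:

```python
import numpy as np

n = 204
rng = np.random.default_rng(5)
M = rng.standard_normal((n, n))
S = (M + M.T) / 2                 # toy symmetric tangent-space matrix

iu = np.triu_indices(n)           # upper triangle, diagonal included
v = S[iu]                         # feature vector of length n(n+1)/2 = 20,910
```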

After projection and vectorization, the resulting tangent-space feature vectors (20,910 dimensions) were reduced using a subject-specific Principal Component Analysis (PCA) retaining 95% of the variance, yielding on average around 169 ± 33 features per trial. The resulting reduced feature vectors were then classified using the same classifiers and hyperparameter optimization scheme described in the “Conventional machine learning models” section. Covariance matrices were processed and projected using the PyRiemann library (Python). Conventional classification models were implemented using the scikit-learn library.

Subject-specific training and validation

To account for individual EEG variability, all models were trained and tested independently for each subject. Model evaluation was performed using a nested 5-fold cross-validation procedure. The outer loop (5 folds) was used to estimate performance generalizability, while the inner loop performed hyperparameter tuning through a grid search. In each outer fold, the training set (80% of the data) was further split into five inner folds to select the optimal hyperparameters. All preprocessing steps, including z-score normalization, PCA dimensionality reduction, and, for TSM features, computation of the Riemannian mean and tangent-space projection, were fitted exclusively on the training data of the inner folds and applied to validation and test sets using the fitted parameters. The final model was then evaluated on the held-out outer test fold. Reported performance corresponds to the average of the outer folds for each subject. Across subjects, the model reported in the Results section is the classifier yielding the highest average accuracy.
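The nested scheme maps directly onto scikit-learn: a `GridSearchCV` (inner loop) wrapped in `cross_val_score` (outer loop), with all preprocessing inside a `Pipeline` so that it is refitted on each training split only. The toy data and grid below are placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
X = rng.standard_normal((100, 40))   # toy feature matrix (epochs x features)
y = np.tile([0, 1], 50)              # toy familiar/unfamiliar labels

# Inner loop: grid search; scaling and PCA are refitted on each training split
inner = GridSearchCV(
    Pipeline([("scale", StandardScaler()),
              ("pca", PCA(n_components=0.95)),
              ("clf", LogisticRegression(max_iter=1000))]),
    param_grid={"clf__C": [0.1, 1, 10]},
    cv=StratifiedKFold(5))

# Outer loop: each of the 5 held-out folds scores a freshly tuned model
scores = cross_val_score(inner, X, y, cv=StratifiedKFold(5))
mean_acc = scores.mean()             # per-subject performance estimate
```

For the TSM features, the Riemannian mean and tangent-space projection would likewise be wrapped as a transformer (e.g. PyRiemann's `TangentSpace`) at the head of the pipeline, so that the reference mean is also estimated on training data only.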

In summary, both conventional and Riemannian-based feature sets were extracted and classified independently per subject to assess the ability to distinguish familiar and unfamiliar song conditions from silent EEG intervals.

Evaluation metrics

Model performance was assessed using standard classification metrics:

  • Accuracy: the proportion of correctly classified epochs.

  • Sensitivity (or True Positive Rate): the proportion of unfamiliar song epochs correctly classified as unfamiliar.

  • Specificity (or True Negative Rate): the proportion of familiar song epochs correctly classified as familiar.

All metrics were averaged across the 5 outer test folds for each subject. In addition, for the best-performing models, the distribution of performance across subjects was analyzed to assess inter-individual variability.
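With the paper's convention (unfamiliar epochs as the positive class), the three metrics follow directly from the confusion matrix; the toy labels below are illustrative:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels: 1 = unfamiliar (positive class), 0 = familiar
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0])

# confusion_matrix returns [[tn, fp], [fn, tp]] for binary labels
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)   # unfamiliar epochs correctly classified
specificity = tn / (tn + fp)   # familiar epochs correctly classified
```

For these toy labels, accuracy is 0.8, sensitivity 0.75, and specificity 5/6.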

Results

We aimed to evaluate whether EEG responses to silent intervals following familiar and unfamiliar songs could be classified above chance, and to compare the performance of conventional spectral features versus Riemannian features.

Overall classification performance

Table 3 reports the best average classification performance (mean ± standard deviation across subjects) for each classifier and feature type.

Table 3 Performances of the best classifiers.

Overall, classification accuracy was higher when using TSM features compared to spectral features, with Logistic Regression achieving the best performance in both cases. Logistic Regression with TSM features yielded the highest accuracy (76.5 ± 8.0%), sensitivity (73.6 ± 12.8%), and specificity (78.0 ± 10.0%) among all tested configurations.

Inter-subject performance distribution

To illustrate the variability of classification performance across subjects, Fig. 2 presents boxplots of accuracy, sensitivity, and specificity for Logistic Regression using spectral and TSM features. These plots highlight the distribution of individual subject performances. The complete subject-wise performance metrics are provided in the Supplementary Material.

Fig. 2

Boxplots of logistic regression performance with spectral vs. TSM features across subjects (n = 20). The figure shows the distribution of performance (accuracy, sensitivity and specificity) for logistic regression, the best-performing classifier, with both spectral and TSM features.

As shown in the boxplots, the TSM-based classifier shifted all three metrics towards higher values. Accuracy scores with TSM were centered around 76–77%, compared with approximately 68% for spectral features. Moreover, both the minimum and maximum accuracies increased (from approximately 52% to 59% and from 80% to 91%, respectively), indicating that TSM improved classification for both the worst- and best-performing subjects. Inter-subject variability in accuracy was similar for spectral and TSM features (SD = 6.9% vs. 8.0%), indicating comparable consistency across participants.

These findings suggest that TSM features improve overall classification performance while maintaining stable results across participants. To further explore potential sources of inter-individual variability, we tested whether participants’ self-reported levels of musical imagery and familiarity were associated with their individual classification performance. Spearman correlation analyses revealed no significant relationships between classification accuracy and imagery or familiarity scores (imagery: ρ = −0.18, p = 0.50; familiarity: ρ = −0.15, p = 0.55), indicating that subjective experience alone does not account for the observed variability in performance.

Feature importance and spatial patterns

Electrode importance in the classification was quantified from the absolute values of the regression coefficients associated with each spectral feature. For each subject, these values were normalized by the maximum coefficient across electrodes, yielding a relative importance score between 0 and 1. Scores were then averaged across the 20 training subjects to highlight electrodes consistently contributing to classification. The resulting heatmap (Fig. 3) shows the spatial distribution of these averaged normalized scores on the scalp, with higher values (warmer colors) indicating electrodes that had a stronger influence on the classifier’s decisions. Frontal electrodes are displayed at the top and occipital ones at the bottom of the map.
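The normalization and averaging can be sketched as follows. How the 3264 regression coefficients are mapped back to one score per electrode (e.g. summing absolute coefficients across bands and descriptors) is not specified in the text, so the toy array below simply assumes one score per electrode per subject:

```python
import numpy as np

rng = np.random.default_rng(8)
n_subjects, n_electrodes = 20, 204
# Toy stand-in for per-subject absolute regression coefficients per electrode
coefs = np.abs(rng.standard_normal((n_subjects, n_electrodes)))

# Normalize within each subject by the maximum across electrodes (range 0-1) ...
norm = coefs / coefs.max(axis=1, keepdims=True)
# ... then average across subjects to get the scalp map values
importance = norm.mean(axis=0)
```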

Fig. 3

Electrode importances for the classification based on spectral features with logistic regression. Electrodes are shown in a 2D flattened view of the EEG headset (occipital = bottom, frontal = top, left/right temporal = center sides). Background color indicates each electrode’s importance in distinguishing familiar vs. unfamiliar music (yellow = high importance, blue = low).

A similar procedure was applied for the TSM-based classifier. Absolute regression coefficients were normalized within each subject to obtain feature importance values between 0 and 1, then averaged across the 20 subjects. In this case, both diagonal (log-variances) and off-diagonal (log-covariances) features were considered. To avoid visual clutter, only the most influential features were displayed. The cutoff was determined from the distribution of averaged importances: when sorted in descending order, the curve exhibited a clear elbow, indicating a small subset of dominant features followed by a long tail of uniformly low contributions. The selected threshold (25%) corresponds to the point just before this elbow, ensuring that only meaningfully informative features were retained. Diagonal contributions appear as a scalp heatmap, while off-diagonal contributions—reflecting interactions between electrode pairs—are shown as weighted links whose thickness reflects their average importance.

Fig. 4

Electrode importances for the classification based on TSM features with logistic regression. Electrodes are displayed as in Fig. 3. The background heatmap shows the averaged normalized importance of variance-based (diagonal) features (yellow = high, dark blue = low). Links between electrodes represent covariance-based (off-diagonal) feature importances, with thicker lines indicating stronger contributions. Only features whose mean normalized importance exceeded 0.25 across the 20 training subjects are shown; all others were set to zero, resulting in dark-blue background values and invisible links.

In an exploratory analysis, we examined whether the importance of the most informative features (i.e., those exceeding the 25% threshold defined above) was related to participants’ imagery or familiarity ratings. For each selected feature, we computed correlations between its importance and the self-reported scores. Only a few correlations reached nominal significance, and none survived correction for multiple testing (Bonferroni-adjusted).

To assess this relationship at a multivariate level, we additionally ran a permutation-based analysis testing whether the full feature-importance vector could predict participants' imagery or familiarity ratings. No significant association emerged (familiarity: R² = −2.493, p = 0.460; imagery: R² = −0.356, p = 0.082). These results collectively suggest that the features most influential for classification do not systematically reflect individual differences in subjective imagery or familiarity.
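A permutation test of this kind can be sketched as follows. The model choice (ridge regression) and the simulated data are assumptions for illustration; the cross-validated R² can be negative, consistent with the negative values reported above, because a model that predicts worse than the mean yields R² < 0.

```python
# Permutation-based multivariate test: predict ratings from the importance
# vector with leave-one-out cross-validation, then compare the observed
# R-squared against a null distribution from shuffled ratings (simulated data).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import r2_score

def loo_r2(X, y):
    pred = cross_val_predict(Ridge(alpha=1.0), X, y, cv=LeaveOneOut())
    return r2_score(y, pred)

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 8))        # subjects x selected features
y = rng.uniform(1, 7, size=20)      # e.g. imagery ratings

observed = loo_r2(X, y)
null = np.array([loo_r2(X, rng.permutation(y)) for _ in range(100)])
p_perm = (np.sum(null >= observed) + 1) / (len(null) + 1)
print(f"R² = {observed:.3f}, permutation p = {p_perm:.3f}")
```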

Discussion

This study demonstrates that musical familiarity can be decoded from EEG signals recorded during short periods of silence inserted into songs. Classification was carried out solely on the signals during these silences, without any overt task on the part of the participants, showing that passive and automatic neural responses contain discriminative information about memory.

Among the tested models, the approach combining TSM with logistic regression achieved the best performance, reaching an average accuracy of 76.5%, compared with 68.1% for the model using spectral features. These results confirm that familiarity modulates both spectral power and spatial covariance patterns in the brain.
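The TSM pipeline can be sketched in a minimal form. For simplicity this sketch uses the arithmetic mean of the covariance matrices as the tangent-space reference point rather than the Riemannian mean typically used in practice (e.g. by pyRiemann), and the data are simulated, not the EEG of the study.

```python
# Minimal tangent-space mapping (TSM) followed by logistic regression.
import numpy as np
from scipy.linalg import fractional_matrix_power, logm
from sklearn.linear_model import LogisticRegression

def tangent_space(covs):
    ref = covs.mean(axis=0)                        # reference covariance
    ref_isqrt = fractional_matrix_power(ref, -0.5)
    iu = np.triu_indices(covs.shape[1])
    # sqrt(2) weighting of off-diagonal entries preserves the norm
    w = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))
    feats = []
    for c in covs:
        # whiten by the reference, then matrix-log to project onto the
        # tangent space; keep the upper triangle as the feature vector
        log_c = logm(ref_isqrt @ c @ ref_isqrt)
        feats.append((log_c[iu] * w).real)
    return np.asarray(feats)

rng = np.random.default_rng(3)
n_trials, n_channels, n_times = 40, 6, 200
X = rng.normal(size=(n_trials, n_channels, n_times))
y = np.repeat([0, 1], n_trials // 2)               # familiar vs unfamiliar
X[y == 1, 0] *= 2.0                                # inject a class difference

covs = np.array([x @ x.T / n_times for x in X])    # per-trial covariances
feats = tangent_space(covs)
clf = LogisticRegression(max_iter=1000).fit(feats, y)
print("training accuracy:", clf.score(feats, y))
```

In this representation the diagonal entries of the log-matrix correspond to the log-variance features and the off-diagonal entries to the log-covariance features discussed above.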

For both the spectral-based models (see Fig. 3) and the TSM-based models (see Fig. 4), the most informative electrodes were located above the auditory region, in temporal areas, especially in the right hemisphere. In these areas, the relevant features formed coherent spatial clusters of neighboring electrodes rather than isolated points, reinforcing the neurophysiological plausibility of the classifiers' decision pattern. Moreover, in the case of TSM, key connectivity features also involved links between temporal areas, suggesting that familiarity modulates not only local power but also functional interactions between these brain regions. In addition, both the spectral-based and TSM models relied on frontal electrodes in the left and right hemispheres, which contributed information complementary to that of the temporal areas.

While these feature patterns provide insight at the sensor level, source reconstruction can offer complementary spatial information, potentially clarifying the anatomical origin of the observed discriminative signals. We elected not to use source modeling here because our primary objective was to perform single-trial, subject-specific decoding, for which sensor-level features tend to provide greater robustness and require fewer modeling assumptions. Nevertheless, source reconstruction has been applied to data acquired with the same paradigm in previous work (see16), revealing increased engagement of frontal and temporal brain structures during familiar—but not unfamiliar—songs.

Temporal regions are well documented as supporting musical familiarity and imagery processes17,29,30, and frontal regions are known to be engaged during the perception of familiar melodies12 and even more strongly during musical imagery or illusory continuation17,31,32. An fMRI study using a similar protocol further showed higher activation of auditory cortices during embedded silences in familiar compared with unfamiliar songs17. Taken together, these findings suggest that the discriminative signals captured by our classifiers likely reflect a combination of memory- and imagery-related processes involving temporal and frontal networks, consistent with EEG and MEG evidence that familiar music reactivates long-term memory circuits30 and facilitates spontaneous imagery during unexpected silences29.

Although the observed patterns are consistent with memory- and imagery-related processes, it is important to consider that emotional engagement may also modulate neural responses during music listening. For example, listening to emotionally evocative classical music has been associated with early gamma-band increases in supramarginal, precentral, and inferior frontal gyri, followed by delta-band activity linked to relaxation effects33. Such findings highlight how emotional arousal and affective engagement can shape cortical dynamics. However, these stimulus-evoked patterns differ substantially from our silent-gap paradigm, in which no auditory input was present at the time of analysis. This reduces—though does not eliminate—the contribution of sensory-driven emotional responses, supporting the interpretation that the observed discriminative patterns more likely reflect internally driven mechanisms related to familiarity-based internal continuation and long-term memory retrieval. Still, because emotional factors can modulate imagery strength and familiarity judgments, their influence cannot be entirely ruled out.

To investigate the variability of performance across subjects, we examined whether participants’ subjective ratings could account for differences in classifier performance. Specifically, we tested whether reported levels of musical imagery and familiarity correlated with either (i) classification accuracy, or (ii) the importance of individual features in the TSM-based model. Neither familiarity nor imagery scores showed consistent correlations with model features or classification performance. Likewise, neither simple correlations nor a permutation-based multivariate analysis revealed any reliable association between subjective ratings and the feature importance in the classification. The absence of correlations with self-reported imagery suggests that other internally driven mechanisms, such as implicit familiarity, automatic internal continuation, or attentional processes, may underlie the discriminative neural patterns. It is also possible that post-session imagery ratings lacked the sensitivity to capture the precise neural representations driving classification. In other words, while musical imagery likely contributes to the observed neural responses, it may not be the sole or dominant factor explaining the discriminative signals extracted by the models.

Beyond the role of musical imagery, the task also engages memory-based mechanisms required to internally reconstruct the missing portions of familiar songs. These findings can be interpreted in terms of familiarity-related internal continuation processes, which posit that the brain continuously generates expectations about upcoming sensory input based on long-term regularities stored in memory. Familiar music provides a rich and well-structured internal model, enabling listeners to form precise predictions about the continuation of a melody—even during silence. Consequently, neural responses in familiar silent gaps may reflect reduced prediction error, because the internal model accurately “fills in” the missing auditory information. In contrast, unfamiliar melodies may yield less stable internal representations, leading to greater uncertainty and potentially larger prediction-error signals when the sound unexpectedly disappears.

The spectral differences we previously reported in the same dataset16, particularly in theta and alpha bands, are consistent with studies linking these oscillations to internally maintained representations and memory-related processing during the absence of sensory input. Taken together, these findings support a memory-based interpretation in which long-term musical knowledge shapes the strength and stability of internally generated representations during silence.

Recent EEG work also underscores the contribution of predictive mechanisms to musical pleasure, which is closely intertwined with familiarity and imagery. The study34 demonstrated that musical surprise—quantified via statistical models of melodic and harmonic prediction—elicits β- and γ-band increases in frontal regions, tightly coupled with subjective pleasure. These results suggest that prediction errors in music can recruit reward-related frontal networks, similarly to monetary rewards. Although no auditory input was present during our analysis windows, familiar melodies are more likely to elicit strong internal predictions during silent gaps, which may contribute to the discriminative neural patterns captured by our models. Thus, while our results primarily reflect internally driven memory and imagery processes, they may also incorporate reward-related frontal dynamics associated with predictive success or mismatch in familiar musical contexts.

Therefore, a key aspect of this experiment concerns identifying which memory systems are implicated. Listening to music involves multiple memory subsystems that operate across distinct temporal and cognitive scales, including sensory memory, short-term memory, and long-term memory35. By capturing early neural responses during silent gaps embedded in familiar and unfamiliar songs, we sought to probe participants’ ability to generate predictions based on prior musical knowledge. As familiar and unfamiliar songs were carefully matched in terms of musical genre and singer’s gender, we propose that the observed neural differences predominantly reflect the retrieval of information stored in long-term memory. Nevertheless, other factors—such as increased rhythmic predictability in familiar melodies—may also have contributed to the observed effects. Disentangling the specific role of long-term memory remains complex, particularly in light of previous findings showing that lyrics are more easily anticipated when paired with their original melodies than when presented in isolation, suggesting a synergistic effect on memory recall36. Further investigations are warranted to better isolate the contribution of long-term memory processes. In this regard, the use of generative artificial intelligence models to create personalized musical stimuli—such as lyrics tailored to the participants’ linguistic and cultural backgrounds—may offer valuable new methodological avenues.

While these findings offer insight into the neural dynamics of musical familiarity and imagery, several methodological and interpretative limitations should be considered. A key limitation concerns the cognitive specificity of the classifier and the selection of musical stimuli. Participants chose songs they knew well, ensuring a high level of personal familiarity and maximizing ecological validity. While factors such as tempo, rhythm complexity, and emotional valence can influence neural responses, we deliberately focused on broader characteristics such as musical genre and performer gender to avoid creating artificial stimuli and maintain real-life listening conditions. Selected unfamiliar songs were commercially available and freely accessible, which limited control over all acoustic and affective parameters. Importantly, neural activity was analyzed during brief silent gaps inserted into the melodies, when participants were not hearing the music. This design reduces the direct impact of low-level acoustic features and emphasizes internally generated responses associated with musical familiarity and imagery. For familiar vocal excerpts, internally generated activity during silent gaps may involve not only melodic or rhythmic representations, but also verbal or lyric-based components. The present decoding approach does not aim to dissociate these components, but rather captures familiarity-related neural activity at a global representational level.

Despite these precautions, the classifier may capture a mixture of memory-related, affective, and attentional processes rather than memory alone. Familiarity covaries with preference, emotional arousal, and attentional engagement, and empirical work shows a reciprocal relationship in which liked music is often perceived as more familiar even when objective exposure is controlled37. Neuroimaging evidence further demonstrates overlapping engagement of reward-related and limbic/paralimbic regions together with auditory and memory-related cortices such as the IFG and STG12,23. Fully disentangling these contributions remains challenging, but future studies could incorporate explicit ratings of arousal, valence, and preference, along with objective attentional measures or multivariate control analyses. Notably, this interplay may reflect ecologically valid listening conditions, in which memory, affect, and attention naturally interact during the experience of familiar music.

Another limitation concerns the relatively high proportion of trials discarded during preprocessing. This rejection rate reflects a deliberate methodological choice: for single-trial decoding, we applied conservative artifact-rejection criteria to ensure that only the cleanest EEG segments contributed to model training. High-density EEG systems increase the likelihood that transient noise contaminates individual trials, so stringent criteria disproportionately reduce retained epochs. Prioritizing signal quality over data quantity is consistent with recommendations in neural-decoding research, which emphasize the importance of high-quality training data over sheer sample size38,39. Importantly, our paradigm included a sufficiently large initial number of trials to allow such preprocessing without compromising classification performance. Future studies may explore adaptive artifact-correction or hybrid approaches to balance data cleaning and retention.

A further methodological limitation concerns the restricted exploration of alternative feature extraction strategies and classifier types. Although we compared spectral features with Riemannian geometry–based representations, both of which are well established in EEG decoding, recent work highlights that classification performance can vary substantially depending on how neural activity is represented and which algorithms are employed. EEG studies have demonstrated that different feature spaces can each emphasize distinct neurophysiological processes and lead to divergent decoding outcomes40,41. Similarly, classifier families differ in their sensitivity to noise, feature dimensionality, and subject-specific variability. Systematically evaluating a broader set of feature representations and model architectures would provide a more comprehensive understanding of the mechanisms driving classification accuracy. Beyond these methodological considerations, the analytical pipeline as a whole should be regarded as exploratory. Feature selection, model comparison, and post-hoc correlations with behavioral ratings were conducted without correction for multiple testing. Accordingly, the present results are hypothesis-generating rather than confirmatory. Future work should replicate the best-performing model in an independent sample, ideally with preregistered analytical decisions, and expand comparative analyses across feature spaces and classifier families to establish robustness and external validity. In addition, cross-subject generalization was not evaluated in the present study, as all models were trained and tested within subjects. Assessing whether the neural signatures of musical familiarity generalize across individuals—or whether individual calibration is required—will be essential for future work aimed at clinical or applied translation.

The study also lacked behavioral or psychometric measures capturing inter-individual differences in emotional regulation, mood, personality traits, or memory capacity. Although the design focused on familiar versus unfamiliar melodies, broader cognitive–affective factors may modulate musical imagery and corresponding EEG responses. For example, short (~ 2-s) EEG segments have been shown to contain connectivity patterns associated with emotional states42. We did not collect explicit ratings of arousal, valence, or preference per excerpt, so the classifier may partially reflect affective engagement in addition to familiarity. Future studies should incorporate standardized assessments of affective style, preference, personality traits, and musical sophistication, as well as excerpt-level emotional ratings, to better disentangle these contributions. Nevertheless, the consistency of the familiarity-related neural signatures across participants suggests that the observed effects primarily reflect shared cognitive mechanisms rather than idiosyncratic variability.

The generalizability of our findings is limited by sample size and population. Only 20 healthy young adults were included, which constrains statistical power and the applicability of results to broader populations. While this sample size is typical for EEG and single-subject decoding studies, larger and more diverse cohorts will be necessary to confirm robustness and reproducibility. The experiment was conducted under highly controlled laboratory conditions with a high-density EEG system, which maximizes data quality but limits applicability to real-world or clinical settings. In everyday environments, auditory processing occurs alongside multisensory input (e.g., visual attention, proprioception), which may reduce decoding reliability. To address this, ongoing work aims to develop simplified protocols and evaluate low-density or wearable EEG systems focused on temporo-frontal regions implicated in musical familiarity and imagery.

Finally, two methodological considerations warrant mention. First, participants listened to 10-second excerpts before EEG acquisition to provide familiarity ratings. Although this ensured accurate identification of familiar and unfamiliar items, it may have slightly reinforced memory traces. However, as familiar melodies were self-selected and already well-known, it is unlikely that this brief exposure substantially altered long-term representations. Second, subjective ratings of musical imagery were collected after the session, with separate scores for each song rather than a single global rating. This design captured song-specific variations while avoiding interruptions during listening. Given the large differences in familiarity between known and unknown songs, these per-song ratings likely reflect meaningful differences in imagery engagement. Although no correlations were observed between classifier performance and song-specific imagery ratings, this does not imply that musical imagery was unimportant; rather, it likely reflects the limited sensitivity of the post-session self-report measures.

Conclusion

In conclusion, we developed an objective and automatic system to assess musical familiarity based on neural responses to musical imagery. By combining passive EEG recordings with machine-learning decoding, the method captures subject-specific neural dynamics associated with familiarity without requiring explicit behavioral responses. The present validation was conducted in a small sample of healthy young adults using a high-density EEG system in controlled laboratory conditions, and the findings should therefore be interpreted as a methodological proof of concept rather than a clinically applicable tool.

Future work will need to replicate these results in larger and more diverse cohorts and evaluate the stability of the identified neural markers across sessions and environments. Testing this approach in populations with memory impairments will be essential to determine its clinical relevance, while ensuring that any potential diagnostic use undergoes formal prospective validation. Improving usability through electrode reduction, simplified preprocessing, and the development of a turnkey, user-friendly system will also be critical for eventual deployment in semi-ecological or clinical settings.