Fig. 1: Visual representation of the pre-processing of audio files and extraction of the linguistic measures.

Speech samples were collected through the semi-structured interviews of the APACS test (1) and transcribed with the CLAN software (2). Token-based values were then extracted automatically from the transcripts: RStudio was used to retrieve the lexical frequency of each token from the Corpus and Frequency Lexicon of Written Italian (CoLFIS) (3a); the Natural Language Toolkit (NLTK) was used to compute the Type-Token Ratio (3b); and the Linguistic Inquiry and Word Count (LIWC) software was used to obtain the frequency of affective words and of words indicating cognitive mechanisms (i.e., Psychological Lexicon), as well as of Personal Pronouns (3c). Finally, the speech samples were processed with the Praat software (4) to determine the number of utterances, used to compute the Mean Length of Utterance, and to extract pause and gap durations and the number of pauses, used to compute the Pause-to-word ratio.
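
As a rough illustration of the three ratio measures named in the caption, the sketch below computes them on a toy transcript. It is not the pipeline used in the study: the regex tokenizer stands in for NLTK, the utterance and pause counts are hard-coded rather than taken from Praat, and the formulas are the conventional definitions (types over tokens, words over utterances, pauses over words), which are assumed here rather than taken from the source. All function names and example values are hypothetical.

```python
import re


def tokenize(text):
    # Lowercased alphabetic tokens; a simple stand-in for a real tokenizer (assumption).
    return re.findall(r"[^\W\d_]+", text.lower())


def type_token_ratio(tokens):
    # Number of distinct word forms divided by the total number of tokens.
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mean_length_of_utterance(n_words, n_utterances):
    # Assumed definition: total words divided by the number of utterances.
    return n_words / n_utterances if n_utterances else 0.0


def pause_to_word_ratio(n_pauses, n_words):
    # Assumed definition: number of pauses divided by the number of words.
    return n_pauses / n_words if n_words else 0.0


# Toy data: two utterances and a made-up pause count (not from the study).
utterances = ["il gatto dorme", "il cane corre nel parco"]
n_pauses = 2

tokens = [tok for utt in utterances for tok in tokenize(utt)]
ttr = type_token_ratio(tokens)
mlu = mean_length_of_utterance(len(tokens), len(utterances))
ptw = pause_to_word_ratio(n_pauses, len(tokens))

print(f"TTR={ttr:.3f}  MLU={mlu:.2f}  pause-to-word={ptw:.2f}")
```

In practice the utterance count and pause count would come from the acoustic annotation rather than from the transcript text, which is why they are passed in as separate arguments here.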