Table 1 Description of the linguistic features in the multilevel semi-automated analysis.

From: Deconstructing heterogeneity in schizophrenia through language: a semi-automated linguistic analysis and data-driven clustering approach

Linguistic dimensions

Measures

Description

Lexical Richness

Type-token ratio

Number of unique words (types) divided by the total number of words (tokens) in the speech sample; this measure is considered an indicator of lexical variety and might reflect language or thought disorder [5,86].

Lexical frequency

Mean frequency of all words uttered by the participant, with frequency values taken from the Corpus and Frequency Lexicon of Written Italian (CoLFIS) [78]; it indicates whether the participant tended to use lower- or higher-frequency words.

Fluency

Mean length of utterance (in words)

Total number of words produced across all utterances divided by the total number of utterances; this measure might reflect poverty of speech.

Mean gap duration (in msec)

Total duration of the silences (gaps) between the interviewer’s questions and the participant’s answers divided by the total number of gaps; this value reflects average turn-planning time [87].

Mean silent and filled pause duration (in msec)

Total duration of silent pauses (defined as silences longer than 200 msec) and filled pauses (e.g., uhm, ehm) divided by the total number of pauses; pause duration reflects intra-turn planning and self-monitoring processes [29].

Pause-to-word ratio

Total number of pauses divided by the total number of words in the speech sample; this value can be considered an indicator of processing speed [29].

Frequency of Personal Pronouns

Percentage of personal pronouns relative to the total word count.

Psychological Lexicon

Frequency of affective words

Percentage of words conveying positive or negative emotional valence relative to the total word count.

Frequency of words related to cognitive mechanisms

Percentage of words expressing causality, insight, possibility, inhibition, or certainty (e.g., because, hence, think, know, consider, ought, should, exclude) relative to the total word count. This measure might reflect metacognitive processes [48].
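Every measure in Table 1 reduces to a simple ratio, mean, or percentage. The sketch below shows one way these could be computed, assuming the transcript is already tokenized and lowercased, gaps and pauses are annotated with durations in milliseconds, and word categories are supplied as plain sets. All names (`Pause`, `counted_pauses`, `category_percentage`, etc.), the toy word lists, and the frequency values are hypothetical illustrations, not the CoLFIS lexicon or the dictionaries and tools used by the authors.

```python
from dataclasses import dataclass

# Hypothetical helpers illustrating the Table 1 formulas; not the authors' pipeline.
SILENT_PAUSE_MIN_MS = 200  # silent pauses count only if longer than 200 ms


@dataclass
class Pause:
    duration_ms: float
    filled: bool  # True for filled pauses (e.g. "uhm"), False for silent pauses


def type_token_ratio(tokens):
    """Unique words (types) divided by total words (tokens)."""
    return len(set(tokens)) / len(tokens)


def mean_lexical_frequency(tokens, freq_lexicon):
    """Mean frequency of uttered words; words missing from the lexicon are skipped (assumption)."""
    values = [freq_lexicon[t] for t in tokens if t in freq_lexicon]
    return sum(values) / len(values)


def mean_length_of_utterance(utterances):
    """Total words across all utterances divided by the number of utterances."""
    return sum(len(u) for u in utterances) / len(utterances)


def mean_gap_duration(gaps_ms):
    """Total duration of question-answer gaps divided by the number of gaps."""
    return sum(gaps_ms) / len(gaps_ms)


def counted_pauses(pauses):
    """Keep all filled pauses, but only silent pauses longer than the threshold."""
    return [p for p in pauses if p.filled or p.duration_ms > SILENT_PAUSE_MIN_MS]


def mean_pause_duration(pauses):
    """Total duration of counted pauses divided by their number."""
    kept = counted_pauses(pauses)
    return sum(p.duration_ms for p in kept) / len(kept)


def pause_to_word_ratio(pauses, tokens):
    """Number of counted pauses divided by the number of words."""
    return len(counted_pauses(pauses)) / len(tokens)


def category_percentage(tokens, category_words):
    """Percentage of tokens in a word category (pronouns, affective, cognitive)."""
    hits = sum(1 for t in tokens if t in category_words)
    return 100 * hits / len(tokens)


if __name__ == "__main__":
    # Toy transcript: two tokenized, lowercased utterances (hypothetical data).
    utterances = [["io", "penso", "che", "io", "sia", "felice"],
                  ["non", "so", "perché", "io"]]
    tokens = [t for u in utterances for t in u]
    freq_lexicon = {"io": 5000, "penso": 300, "felice": 100}  # made-up values, not CoLFIS
    pauses = [Pause(150, False), Pause(300, False), Pause(500, True)]

    print(type_token_ratio(tokens))                          # 0.8
    print(mean_lexical_frequency(tokens, freq_lexicon))      # 3080.0
    print(mean_length_of_utterance(utterances))              # 5.0
    print(mean_gap_duration([400, 600, 800]))                # 600.0
    print(mean_pause_duration(pauses))                       # 400.0 (150 ms silence dropped)
    print(pause_to_word_ratio(pauses, tokens))               # 0.2
    print(category_percentage(tokens, {"io", "tu", "noi"}))  # 30.0
```

The three percentage measures (personal pronouns, affective words, cognitive-mechanism words) share one helper because they differ only in the word list. In practice, those lists and the frequency values would come from curated resources such as CoLFIS rather than the small hand-written sets used here to show the arithmetic.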