Schizophrenia

Table 1 Description of each feature extraction tool.

From: Leveraging computational linguistics and machine learning for detection of ultra-high risk of mental health disorders in youths

Tool	Number of Features	Description
EVA	42	EVA captures entropic scores derived from sentiment polarity and intensity¹⁵. It detects occurrences of polarized sentiments (i.e., positive and negative valence words) and effects from valence modifiers (e.g., amplifiers, de-amplifiers, negators, adversative conjunctions). EVA expresses sentiment variability using 21 unique features, including length, variance, frequency, and intensity of persistent sentiment states, flip frequencies, and moving averages. A filtered version of EVA, involving the removal of neutral valence words, produces another 21 variants of the original features.
TAACO	168	TAACO measures the degree of lexical and semantic overlap across a text¹³. Lexical overlap is measured by counting overlapping lemmas and part-of-speech tags across sentences and paragraphs¹³, while semantic overlap is measured using latent semantic analysis (LSA), latent Dirichlet allocation, and word2vec scores²³. Other features include type-token ratios, connectives, and givenness measures.
TAALES	485	TAALES measures lexical sophistication with n-gram frequencies, ranges, and strength-of-association scores that were calculated using various reference corpora²⁴. An n-gram’s frequency is the number of times it appears in the reference corpus, while its range is the number of documents in that corpus in which it appears. An n-gram’s strength-of-association score measures the probability of its components co-occurring as an n-gram. Additional features include psycholinguistic word information, word recognition scores, and word neighborhood information.
TAMMI	66	TAMMI extracts morphological information including basic morpheme counts, morphological variety and complexity, and morpheme type-token counts¹⁷. Basic morphemes include derivational and inflectional morphemes. Morphological variety and complexity are measured using scores derived from the Morphological Complexity Index⁴⁶. TAMMI also calculates morpheme type-token counts and integrates information from MorphoLex to compute morpheme frequencies, family sizes, and hapax counts⁴⁷.
TAASSC	355	TAASSC evaluates syntactic complexity and sophistication using classic complexity and verb argument construction (VAC) features⁴⁸. Classic complexity features measure the length and diversity of word structures such as sentences, T-units, and clauses⁴⁹, while VAC features measure verb, VAC, and verb-VAC frequencies with reference to the Corpus of Contemporary American English (COCA)⁵⁰.
TAALED	38	TAALED measures lexical diversity across three dimensions: volume, abundance, and variety²⁶. Volume refers to the total number of words, while abundance refers to the total number of unique lemmas. Lexical variety features include hypergeometric distribution scores, moving-average type-token ratios, and Measure of Textual Lexical Diversity (MTLD) scores.
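
Two of the definitions in the table lend themselves to a small worked illustration: the distinction between an n-gram's frequency and its range (TAALES row), and the moving-average type-token ratio (TAALED row). The sketch below is not the tools' own implementation. The function names, the toy three-document corpus, whitespace tokenization, the use of surface word forms instead of lemmas, and the window size are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_and_range(documents, n):
    """Compute n-gram frequency and range over a reference corpus.

    frequency: total number of occurrences across the corpus.
    range:     number of documents containing the n-gram at least once.
    """
    frequency = Counter()
    doc_range = Counter()
    for tokens in documents:
        grams = ngrams(tokens, n)
        frequency.update(grams)       # every occurrence counts
        doc_range.update(set(grams))  # each document counts at most once
    return frequency, doc_range


def moving_average_ttr(tokens, window):
    """Mean type-token ratio over all sliding windows of a fixed size.

    Falls back to the plain type-token ratio when the text is shorter
    than the window (an assumption of this sketch).
    """
    if not tokens:
        return 0.0
    if len(tokens) < window:
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


# Hypothetical toy reference corpus: one token list per document.
corpus = [
    "the cat sat on the mat".split(),
    "the cat ate the fish".split(),
    "a dog sat on the mat".split(),
]

unigram_freq, unigram_range = frequency_and_range(corpus, 1)
bigram_freq, bigram_range = frequency_and_range(corpus, 2)

print(unigram_freq[("the",)], unigram_range[("the",)])      # 5 occurrences, 3 documents
print(bigram_freq[("the", "cat")], bigram_range[("the", "cat")])  # 2 occurrences, 2 documents
print(moving_average_ttr(corpus[0], window=5))
```

In this toy corpus the unigram "the" has a frequency of 5 but a range of 3, because two of its occurrences fall in documents that already contain it. The moving-average ratio is less sensitive to text length than a plain type-token ratio because every window has the same number of tokens.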
