Table 1 List of all lyrical descriptors extracted for the two datasets, including a brief description.
From: Song lyrics have become simpler and more repetitive over the last five decades
Name | Description |
|---|---|
Lexical descriptors | |
 Line counts | Total number of lines, blank lines, unique lines, ratio of blank and repeated lines |
 Token counts | Number of tokens, characters, repeated token ratio, unique tokens per line, and avg. tokens per line |
 Character counts | Number of [!?.,:;”-()] and digits (total amount of these characters and individual counts per character), ratio of punctuation and digits |
 Token length | Average length of tokens |
 n-gram ratios | Ratio of unique bigrams and trigrams |
 Legomenon ratios | Ratio of hapax legomena, dis legomena and tris legomena |
 Parts of speech | Frequency of adjectives, adverbs, nouns, pronouns, verbs |
 Past tense | Percentage of verbs in past tense |
 Stop words | Number and ratio of stop words, stop words per line |
 Uncommon words | Number of uncommon words (i.e., words not contained in WordNet [60]) |
Diversity descriptors | |
 Compression ratio | Ratio of the size of zlib compressed lyrics vs. the original, uncompressed lyrics |
 Diversity measures | Measure of Textual Lexical Diversity (MTLD), Herdan’s C, Summer’s S, Dugast’s \(U^2\) and Maas’ \(a^2\) |
The diversity descriptors were extracted using the Python lexical_diversity and lexicalrichness libraries. | |
Readability descriptors | |
 Readability formulas | Flesch Reading Ease, Flesch Kincaid Grade, SMOG (Simple Measure of Gobbledygook), Automated Readability Index, Coleman Liau Index, Dale Chall Readability Score, Linsear Write Formula, Gunning Fog, Fernandez Huerta, Szigriszt Pazos and Gutierrez Polini |
 Difficult words | Number of difficult words (consisting of three or more syllables) |
The readability descriptors were extracted using the Python textstat library. | |
Rhyme descriptors | |
 Rhyme structures | Number of couplets, clerihews, alternating rhymes and nested rhymes |
 Rhyme words | Number of unique rhyming words, percentage of rhyming lines in the lyrics |
 Alliterations | Number of alliterations of length two, three, and four or more |
The rhyme descriptors were extracted using the Python pronouncing library, which provides an interface to the Carnegie Mellon University Pronouncing Dictionary. | |
Structural descriptors | |
 Element counts | Number of sections and verses |
 Distribution | Relation between the number of verses vs. sections and the number of choruses vs. sections |
 Title occurrences | Number of times the song’s title appears |
 Pattern | Verse and chorus alternating, two verses and at least one chorus, two choruses and at least one verse |
 Start | Starts with chorus (binary attribute) |
 Ending | Ends with two chorus repetitions (binary attribute) |
Emotional descriptors | |
 Sentiment scores | Positivity and negativity scores via AFINN [61], the sentiment lexicon by Bing Liu et al. [62], the MPQA opinion corpus [63], the sentiment140 dataset [64] and the SentiWordNet lexicon [65] |
 NRC | Emotion scores according to the NRC affect intensity lexicon [66] |
 LIWC | Descriptors provided by LIWC [39] |
 Happiness | Happiness score according to labMT [67] |
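
Several of the lexical and diversity descriptors listed above (compression ratio, legomenon ratios, n-gram ratios) can be illustrated with the Python standard library alone. The following is a minimal sketch, not the authors' extraction code: the tokenizer, the function names, and the choice of denominators (vocabulary size for legomena, total n-gram count for n-gram ratios) are all assumptions made for illustration.

```python
import zlib
from collections import Counter


def tokenize(lyrics: str) -> list[str]:
    """Lowercase whitespace tokenization with surrounding punctuation
    stripped. This tokenizer is an assumption, not the paper's."""
    stripped = (tok.strip("!?.,:;\"'()-") for tok in lyrics.lower().split())
    return [tok for tok in stripped if tok]


def compression_ratio(lyrics: str) -> float:
    """Size of the zlib-compressed lyrics divided by the size of the
    original UTF-8 bytes; lower values indicate more repetition."""
    raw = lyrics.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw) if raw else 0.0


def legomenon_ratio(tokens: list[str], occurrences: int) -> float:
    """Share of distinct tokens that occur exactly `occurrences` times
    (1 = hapax, 2 = dis, 3 = tris legomena).
    Normalising by vocabulary size is an assumption."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    matching = sum(1 for c in counts.values() if c == occurrences)
    return matching / len(counts)


def unique_ngram_ratio(tokens: list[str], n: int) -> float:
    """Number of distinct n-grams divided by the total number of n-grams.
    The denominator is an assumption."""
    ngrams = list(zip(*(tokens[i:] for i in range(n))))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


if __name__ == "__main__":
    sample = "la la la\nhey hey\nyou"
    toks = tokenize(sample)
    print("hapax ratio:", legomenon_ratio(toks, 1))
    print("unique bigram ratio:", unique_ngram_ratio(toks, 2))
    print("compression ratio:", compression_ratio(sample * 50))
```

Highly repetitive lyrics yield a low compression ratio and a low unique n-gram ratio, which is why these descriptors are grouped with measures of simplicity and repetitiveness.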