Table 5 Data dictionary for the participant dataset.

From: Polish multichannel audio-visual child speech dataset with double-expert sigmatism diagnosis

| Variable | Description | Format, options |
| --- | --- | --- |
| participant | Participant ID. | Integer; ranges from 30 to 237. |
| sex | Participant sex. | A single-character code: F – female, M – male. |
| age | Participant age at the time of the examination. | A char string “Yy Mm”, where Y, M – years and months, respectively. |
| unit | Randomized ID of the preschool unit. | Integer; ranges from 1 to 6. |
| deviceVer | Recording device (MDAD) version ID. | Integer: 1 – ver. 1, closed construction (Fig. 3a), 2 – ver. 2, open construction (Fig. 3b). |
| participantFolderName | Participant folder name in the repository (Fig. 5). | A five-character code 00XXX, where XXX is a three-digit representation of the participant field. |
| recording_1, recording_2 | Presence/completeness of the recording of part 1 or 2 of the examination. | Numerical: 1 (complete), 0.5 (incomplete; refers to missing audio data in channels 1–5 from the top recording arc), or 0 (missing). |
| nSegments_1, nSegments_2 | Total number of AV segments in the recording of part 1 or 2 of the examination. | Integer or empty if there is no recording. |
| nWords_1, nWords_2 | Total number of AV word segments in the recording of part 1 or 2 of the examination. | Integer or empty if there is no recording. |
| nPhones_1, nPhones_2 | Total number of AV phoneme segments in the recording of part 1 or 2 of the examination. | Integer or empty if there is no recording. |
| articulation | Initial overall classification of the participant’s articulation of sibilants. | A char string: typical or atypical. |
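
The field formats in this data dictionary can be decoded with a few small helpers. The following is a minimal Python sketch, not part of the dataset's tooling. All function names are hypothetical. It assumes that the age string uses literal lowercase "y" and "m" suffixes (for example "5y 3m"), which is one reading of the “Yy Mm” pattern, and that empty count fields arrive as empty strings.

```python
import re

# Assumed reading of the "Yy Mm" format: digits + "y", whitespace, digits + "m".
_AGE_PATTERN = re.compile(r"^\s*(\d+)\s*y\s*(\d+)\s*m\s*$")

# Codes used by the recording_1 / recording_2 fields.
RECORDING_STATUS = {1.0: "complete", 0.5: "incomplete", 0.0: "missing"}


def age_to_months(age: str) -> int:
    """Convert an age string such as '5y 3m' to a total number of months."""
    match = _AGE_PATTERN.match(age)
    if match is None:
        raise ValueError(f"unrecognized age string: {age!r}")
    years, months = int(match.group(1)), int(match.group(2))
    return 12 * years + months


def folder_name(participant: int) -> str:
    """Build the five-character 00XXX folder code from a participant ID."""
    if not 0 <= participant <= 999:
        raise ValueError("participant ID must fit in three digits")
    return f"00{participant:03d}"


def recording_status(value) -> str:
    """Map a recording_1 / recording_2 value (1, 0.5, or 0) to a label."""
    return RECORDING_STATUS[float(value)]


def parse_count(cell: str):
    """Parse an nSegments/nWords/nPhones cell; empty means no recording."""
    cell = cell.strip()
    return int(cell) if cell else None
```

For example, under these assumptions `folder_name(30)` gives `"00030"`, and `age_to_months("5y 3m")` gives `63`. Returning `None` for empty count cells keeps "no recording" distinct from a recording with zero segments.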