Table 8 Segment distribution in the dataset.

From: Polish multichannel audio-visual child speech dataset with double-expert sigmatism diagnosis

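| Segment | # | Validity 1.0 | Validity 0.9 | Validity 0.5 | Segment | # | Validity 1.0 | Validity 0.9 | Validity 0.5 | Segment | # | Validity 1.0 | Validity 0.9 | Validity 0.5 |
|---|---:|---:|---:|---:|---|---:|---:|---:|---:|---|---:|---:|---:|---:|
| Words (part 1) | | | | | Words (part 2) | | | | | Sibilants | | | | |
| bocian | 190 | 190 | 0 | 0 | bazie | 199 | 189 | 5 | 5 | c | 1,179 | 1,163 | 11 | 5 |
| cebula | 197 | 196 | 1 | 0 | biegacz | 199 | 193 | 5 | 1 | ci | 732 | 705 | 23 | 4 |
| ciastka | 146 | 133 | 13 | 0 | dzokej | 197 | 191 | 5 | 1 | cz | 784 | 761 | 18 | 5 |
| czapka | 196 | 196 | 0 | 0 | kasza | 198 | 192 | 5 | 1 | drz | 589 | 570 | 15 | 4 |
| dziadek | 191 | 191 | 0 | 0 | lodzie | 190 | 178 | 5 | 7 | dz | 592 | 538 | 49 | 5 |
| dzwonek | 199 | 157 | 42 | 0 | lokiec | 198 | 190 | 6 | 2 | dzi | 577 | 558 | 10 | 9 |
| jeze | 178 | 123 | 1 | 54 | pajac | 197 | 193 | 3 | 1 | rz | 1,639 | 1,562 | 9 | 68 |
| kaczka | 188 | 185 | 0 | 3 | paz | 199 | 191 | 6 | 2 | s | 1,642 | 1,563 | 39 | 40 |
| kalosze | 190 | 189 | 0 | 1 | radza | 198 | 192 | 5 | 1 | si | 951 | 926 | 16 | 9 |
| koszyk | 195 | 191 | 0 | 4 | sadzawka | 196 | 189 | 3 | 4 | sz | 2,364 | 2,311 | 41 | 12 |
| koza | 178 | 178 | 0 | 0 | taca | 193 | 188 | 3 | 2 | z | 934 | 922 | 10 | 2 |
| ksiazka | 190 | 190 | 0 | 0 | w pasie | 198 | 189 | 5 | 4 | zi | 593 | 568 | 16 | 9 |
| kucharz | 192 | 190 | 2 | 0 | ziarno | 198 | 190 | 6 | 2 | Other phonemes | | | | |
| las | 180 | 179 | 0 | 1 | Sibilant logotomes (part 2) | | | | | a | 13,643 | 13,325 | 239 | 79 |
| lekarz | 142 | 140 | 2 | 0 | ca | 199 | 194 | 4 | 1 | b | 965 | 951 | 9 | 5 |
| mazaki | 167 | 167 | 0 | 0 | cia | 197 | 192 | 4 | 1 | bi | 199 | 193 | 5 | 1 |
| noz | 196 | 195 | 0 | 1 | cza | 193 | 187 | 5 | 1 | d | 388 | 388 | 0 | 0 |
| owoce | 196 | 195 | 0 | 1 | drza | 194 | 187 | 5 | 2 | e | 3,795 | 3,639 | 55 | 101 |
| parasol | 196 | 177 | 19 | 0 | dza | 198 | 193 | 4 | 1 | f | 1,066 | 1,047 | 12 | 7 |
| pies | 182 | 160 | 0 | 22 | dzia | 196 | 189 | 5 | 2 | g | 398 | 385 | 12 | 1 |
| roza | 171 | 171 | 0 | 0 | rza | 193 | 185 | 5 | 3 | h | 192 | 190 | 2 | 0 |
| rzeka | 174 | 174 | 0 | 0 | sa | 198 | 194 | 3 | 1 | i | 1,128 | 1,119 | 7 | 2 |
| salata | 161 | 160 | 1 | 0 | sia | 197 | 191 | 4 | 2 | j | 572 | 507 | 9 | 56 |
| samolot | 196 | 196 | 0 | 0 | sza | 194 | 188 | 4 | 2 | k | 4,646 | 4,565 | 50 | 31 |
| siatka | 167 | 165 | 1 | 1 | za | 196 | 192 | 3 | 1 | ki | 198 | 190 | 6 | 2 |
| strazak | 186 | 175 | 0 | 11 | zia | 196 | 189 | 5 | 2 | l | 1,635 | 1,598 | 34 | 3 |
| szafa | 191 | 188 | 3 | 0 | Vowels (part 2) | | | | | l’ | 934 | 912 | 13 | 9 |
| szalik | 193 | 180 | 12 | 1 | a | 199 | 196 | 3 | 0 | m | 366 | 364 | 2 | 0 |
| sznurek | 142 | 131 | 11 | 0 | i | 195 | 192 | 3 | 0 | n | 931 | 864 | 63 | 4 |
| szufelka | 145 | 144 | 1 | 0 | ia | 196 | 193 | 3 | 0 | o | 3,103 | 2,997 | 86 | 20 |
| warzywa | 186 | 186 | 0 | 0 | iu | 195 | 193 | 2 | 0 | p | 993 | 948 | 37 | 8 |
| waz | 195 | 194 | 1 | 0 | u | 199 | 196 | 3 | 0 | pi | 182 | 160 | 0 | 22 |
| widelec | 197 | 197 | 0 | 0 | | | | | | r | 1,655 | 1,593 | 48 | 14 |
| zaba | 189 | 186 | 3 | 0 | | | | | | t | 1,049 | 1,018 | 18 | 13 |
| zabawki | 190 | 190 | 0 | 0 | | | | | | u | 1,411 | 1,390 | 18 | 3 |
| zarowka | 171 | 171 | 0 | 0 | | | | | | w | 1,345 | 1,301 | 43 | 1 |
| zegar | 199 | 192 | 7 | 0 | | | | | | y | 581 | 576 | 1 | 4 |
| zyrafa | 193 | 193 | 0 | 0 | | | | | | | | | | |
| Total | 6,935 | 6,715 | 120 | 100 | Total | 5,895 | 5,716 | 127 | 52 | Total | 53,951 | 52,367 | 1,026 | 558 |
| | | 96.8% | 1.7% | 1.4% | | | 97.0% | 2.2% | 0.9% | | | 97.1% | 1.9% | 1.0% |
| | | | | | Total (words & logotomes) | 12,830 | 12,431 | 247 | 152 | Total (dataset) | 66,781 | 64,798 | 1,273 | 710 |
| | | | | | | | 96.9% | 1.9% | 1.2% | | | 97.0% | 1.9% | 1.1% |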

  1. The left section contains words presented on the screen (part 1 of the examination), the central section covers words and logotomes repeated by the child following the SLP (part 2), and the right section lists phonemes (sibilants shown separately). Column “#” gives the total number of occurrences of each segment in the dataset. The three remaining columns show how those occurrences are distributed across the data validity levels (1.0, 0.9, and 0.5). The bottom middle section combines the word and logotome statistics from the left and central sections. The bottom right section summarizes the entire collection of segments.
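
The combined rows described in the note can be checked with a few lines of arithmetic. The following is a minimal sketch that assumes only the three "Total" rows of the table as input. The variable and function names (`words_part1`, `part2`, `phonemes`, `combine`, `shares`) are illustrative and do not come from the dataset or its tooling.

```python
# Totals copied from the "Total" rows of Table 8.
# Each tuple is (count, validity 1.0, validity 0.9, validity 0.5).
words_part1 = (6935, 6715, 120, 100)    # left section
part2 = (5895, 5716, 127, 52)           # central section (words, logotomes, vowels)
phonemes = (53951, 52367, 1026, 558)    # right section


def combine(*rows):
    """Add several total rows column by column."""
    return tuple(sum(col) for col in zip(*rows))


def shares(row):
    """Percentage of segments at each validity level, rounded to one decimal."""
    total, *levels = row
    return [round(100 * n / total, 1) for n in levels]


words_logotomes = combine(words_part1, part2)
dataset = combine(words_logotomes, phonemes)

print(words_logotomes, shares(words_logotomes))
print(dataset, shares(dataset))
```

Under these assumptions the sums and percentages match the bottom middle and bottom right sections of the table: 12,830 word and logotome segments and 66,781 segments overall.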