Table 19 Data processing steps and results.

Steps	Step description	#Videos	Video duration	#Frames	#Humans detected
1	Identification of YouTube Videos	1439	NA	NA	NA
2	Downloading and Frame Extraction	1359	49 h:33 min:36 s	178,413	NA
3	Object Detection Step: Dropping frames with no humans	1359	Model 2: 35 h:16 min:22 s (71.2%)	Model 2: 126,981 (71.2%)	Model 2: 500,709
			Model 4: 15 h:50 min:59 s (32.0%)	Model 4: 57,060 (32.0%)	Model 4: 110,769
			Model 5: 35 h:8 min:5 s (70.9%)	Model 5: 126,486 (70.9%)	Model 5: 489,349
4	Distance Estimation Step I: Dropping frames with < 3 human objects	1359	Model 2: 21 h:31 min:29 s (43.4%)	Model 2: 77,490 (43.4%)	Model 2: 427,777 (85.4%)
			Model 4: 3 h:45 min:28 s (7.6%)	Model 4: 13,527 (7.6%)	Model 4: 53,802 (48.6%)
			Model 5: 21 h:37 min:4 s (43.6%)	Model 5: 77,824 (43.6%)	Model 5: 417,487 (85.3%)
5	Distance Estimation Step II: Dropping frames with a single class	1359	Model 2: 13 h:50 min:52 s (27.9%)	Model 2: 49,852 (27.9%)	Model 2: 301,311 (60.2%)
			Model 4: 2 h:0 min:32 s (4.1%)	Model 4: 7,234 (4.1%)	Model 4: 30,257 (27.3%)
			Model 5: 4 h:40 min:33 s (9.4%)	Model 5: 16,834 (9.4%)	Model 5: 109,381 (22.4%)
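
The table does not say what the percentages in parentheses are measured against. The reported values appear consistent with two baselines: frame percentages relative to the 178,413 frames extracted in step 2, and human-detection percentages relative to each model's own step 3 count. The following Python sketch recomputes a few entries under that assumption; the function name `pct` and the dictionary layout are illustrative and do not come from the source.

```python
# Counts copied from Table 19. The choice of baselines is an assumption
# inferred from the reported percentages, not stated in the table.
TOTAL_FRAMES = 178_413  # frames extracted in step 2

# Humans detected in step 3, used as the per-model baseline (assumed).
HUMANS_STEP3 = {"Model 2": 500_709, "Model 4": 110_769, "Model 5": 489_349}

# Frames and human detections remaining after step 5.
FRAMES_STEP5 = {"Model 2": 49_852, "Model 4": 7_234, "Model 5": 16_834}
HUMANS_STEP5 = {"Model 2": 301_311, "Model 4": 30_257, "Model 5": 109_381}


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


for model in FRAMES_STEP5:
    frame_share = pct(FRAMES_STEP5[model], TOTAL_FRAMES)
    human_share = pct(HUMANS_STEP5[model], HUMANS_STEP3[model])
    print(f"{model}: {frame_share}% of frames, {human_share}% of detections")
```

Under these assumed baselines, the recomputed step 5 shares match the values printed in the table for all three models.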
