Table 19 Data processing steps and results.
| Step | Step description | #Videos | Model | Video duration | #Frames | #Humans detected |
|---|---|---|---|---|---|---|
| 1 | Identification of YouTube Videos | 1,439 | NA | NA | NA | NA |
| 2 | Downloading and Frame Extraction | 1,359 | NA | 49 h:33 min:36 s | 178,413 | NA |
| 3 | Object Detection Step: Dropping frames with no humans | 1,359 | Model 2 | 35 h:16 min:22 s (71.2%) | 126,981 (71.2%) | 500,709 |
| | | | Model 4 | 15 h:50 min:59 s (32.0%) | 57,060 (32.0%) | 110,769 |
| | | | Model 5 | 35 h:8 min:5 s (70.9%) | 126,486 (70.9%) | 489,349 |
| 4 | Distance Estimation Step I: Dropping frames with < 3 human objects | 1,359 | Model 2 | 21 h:31 min:29 s (43.4%) | 77,490 (43.4%) | 427,777 (85.4%) |
| | | | Model 4 | 3 h:45 min:28 s (7.6%) | 13,527 (7.6%) | 53,802 (48.6%) |
| | | | Model 5 | 21 h:37 min:4 s (43.6%) | 77,824 (43.6%) | 417,487 (85.3%) |
| 5 | Distance Estimation Step II: Dropping frames with a single class | 1,359 | Model 2 | 13 h:50 min:52 s (27.9%) | 49,852 (27.9%) | 301,311 (60.2%) |
| | | | Model 4 | 2 h:0 min:32 s (4.1%) | 7,234 (4.1%) | 30,257 (27.3%) |
| | | | Model 5 | 4 h:40 min:33 s (9.4%) | 16,834 (9.4%) | 109,381 (22.4%) |
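
The table does not state the denominators behind its percentages. A minimal sketch, assuming that frame percentages are relative to the 178,413 frames extracted in step 2 and that human-detection percentages are relative to each model's step 3 count, reproduces the reported values from the counts in Table 19. The names `FRAMES_EXTRACTED`, `frames`, `humans`, and `pct` are illustrative and do not come from the source.

```python
# Counts copied from Table 19; the choice of denominators is an assumption
# inferred from the reported percentages, not stated in the source.
FRAMES_EXTRACTED = 178_413  # step 2: frames extracted from 1,359 videos

# Frames retained per model after steps 3, 4 and 5.
frames = {
    "Model 2": {3: 126_981, 4: 77_490, 5: 49_852},
    "Model 4": {3: 57_060, 4: 13_527, 5: 7_234},
    "Model 5": {3: 126_486, 4: 77_824, 5: 16_834},
}

# Humans detected per model after steps 3, 4 and 5.
humans = {
    "Model 2": {3: 500_709, 4: 427_777, 5: 301_311},
    "Model 4": {3: 110_769, 4: 53_802, 5: 30_257},
    "Model 5": {3: 489_349, 4: 417_487, 5: 109_381},
}


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


for model in frames:
    for step in (3, 4, 5):
        # Frames: share of all extracted frames (assumed denominator).
        frame_share = pct(frames[model][step], FRAMES_EXTRACTED)
        # Humans: share of the model's own step 3 detections (assumed denominator).
        human_share = pct(humans[model][step], humans[model][3])
        print(f"{model}, step {step}: frames {frame_share}%, humans {human_share}%")
```

Under these assumptions every computed frame and human percentage matches the corresponding table entry, which suggests the video-duration percentages are simply the frame percentages applied to the total duration.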