Table 4 VisionTool’s detection accuracy on the MOCA dataset when used as an annotator. For each of the 60 videos in the dataset, a U-Net with an EfficientNet-B1 backbone is trained on 50 frames and used to predict the remaining ones. The values reported in this table are the average mAP computed across the whole set of videos.

From: A semi-automatic toolbox for markerless effective semantic feature extraction

| View point | mAP\(^{0.5}\) | mAP\(^{0.75}\) | mAP\(_{\text {index}}\) | mAP\(_{\text {little finger}}\) | mAP\(_{\text {hand}}\) | mAP\(_{\text {wrist}}\) | mAP\(_{\text {elbow}}\) | mAP |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| All together | 0.992 | 0.987 | 0.974 | 0.945 | 0.985 | 0.971 | 0.976 | 0.970 |
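
As a quick consistency check on the row above, the following minimal sketch compares the reported overall mAP with the unweighted mean of the five per-keypoint values. That the overall figure is computed this way is an assumption made here for illustration; the caption only states that values are averaged across videos. The variable names are hypothetical.

```python
# Per-keypoint mAP values copied from the "All together" row of Table 4.
per_keypoint_map = {
    "index": 0.974,
    "little finger": 0.945,
    "hand": 0.985,
    "wrist": 0.971,
    "elbow": 0.976,
}

# Assumption (not stated in the caption): the overall mAP is the
# unweighted mean of the per-keypoint mAP values.
mean_map = sum(per_keypoint_map.values()) / len(per_keypoint_map)

# Overall mAP as reported in the last column of the table.
reported_map = 0.970

# Values are reported to three decimals, so allow a rounding tolerance of 0.0005.
is_consistent = abs(mean_map - reported_map) < 0.0005
print(f"mean of per-keypoint mAP = {mean_map:.4f}, consistent = {is_consistent}")
```

Under this assumption the per-keypoint columns and the final column agree to within rounding, which suggests (but does not prove) that the last column summarizes the five keypoints rather than the two IoU-threshold columns.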