Figure 4

From: Multi-modal self-adaptation during object recognition in an artificial cognitive system

Visual classifier threshold and the Molyneux mechanism. (a) Using the synthesized video generated from human object manipulation, we trained a one-vs-all CNN model, producing an individual classifier for each class. To determine the output label of a classifier, we studied which accuracy threshold yields the best performance of the visual model. Setting the threshold to 0.5 gives the confusion matrices shown at the bottom after testing the model with normal-vision images (like those used for training) and blurred-vision images. (b) Classifier Module. Every video frame and its corresponding haptic state are classified by the visual and haptic object recognition systems, respectively. The visual recognition system has three possible outputs: (i) a single class, (ii) multiple classes (CF), or (iii) none (IG). The label from the haptic recognition system, by contrast, is always a single, unambiguous class. (c) Molyneux Module. Applying the Molyneux mechanism, we check whether the visual and haptic recognitions agree, which we call a match; a disagreement is called a mismatch. When the visual recognition result is CF, it counts as a match if one of the candidate classes agrees with the haptic classification. All mismatch situations are shown.
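For concreteness, the decision logic described in panels (a)-(c) can be sketched in Python as follows. This is a minimal illustration, not code from the paper: the function names, the score dictionary, and the per-class confidence values are assumptions, and the only facts taken from the caption are the 0.5 threshold, the single/CF/IG output cases, and the match rule for CF outputs.

    from typing import Dict, List

    THRESHOLD = 0.5  # accuracy threshold selected in panel (a)

    def visual_labels(scores: Dict[str, float],
                      threshold: float = THRESHOLD) -> List[str]:
        """Return every class whose one-vs-all score reaches the threshold.

        Per panel (b): len == 1 -> single class, len > 1 -> multiple
        classes (CF), len == 0 -> none (IG).
        """
        return [cls for cls, s in scores.items() if s >= threshold]

    def molyneux_match(visual: List[str], haptic: str) -> bool:
        """Panel (c): a match occurs when the (always unambiguous) haptic
        label agrees with the visual output; for a CF output, agreement
        with any of the candidate classes counts as a match."""
        return haptic in visual

    # Illustrative scores (hypothetical, not taken from the paper)
    scores = {"mug": 0.81, "ball": 0.55, "box": 0.12}
    vis = visual_labels(scores)          # ['mug', 'ball'] -> CF case
    print(molyneux_match(vis, "ball"))   # True: haptic resolves the CF

In this sketch the haptic label acts as the tie-breaker: a CF output from vision is accepted whenever the haptic class appears among the visual candidates, and any other combination is treated as a mismatch.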
