Fig. 2
From: Activity in perceptual classification networks as a basis for human subjective time perception

Simplified depiction of the time estimation model. Salient changes in network activation driven by video input are accumulated and transformed into standard units for comparison with human reports. The bottom left shows two consecutive frames of video input. The connected coloured nodes depict the network structure and activation patterns in each layer of the classification network for these inputs. L2 gives the Euclidean distance between the network activations to successive inputs for a given network layer (layers conv2, pool5, fc7, output). Neurons across the hierarchical layers of the classification network are differentially responsive to feature complexity in images, with higher layers more responsive to object-like archetypes and lower layers to primitive features such as edges or contours (e.g. see Fig. 4 in ref. 41). In the Change Detection stage, the value of L2 for a given network layer is compared to a dynamic threshold (red line). When L2 exceeds the threshold, a salient perceptual change is determined to have occurred and a unit of subjective time is determined to have passed; these units are accumulated to form the base estimate of time. Support vector regression is then applied to convert this abstract time estimate into standard units (seconds) for comparison with human reports.
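
The change detection and accumulation stage described in the caption can be illustrated with a minimal sketch for a single network layer. The caption does not specify how the dynamic threshold evolves, so the rule used here (reset to a maximum after each detected change, otherwise decay towards a minimum) is an assumption, as are the names `l2_distance` and `accumulate_changes` and the parameters `t_max`, `t_min` and `decay`. The activations are toy vectors rather than real layer outputs, and the final support vector regression step is not included.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two activation vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def accumulate_changes(activations, t_max=1.0, t_min=0.1, decay=0.5):
    """Count salient changes across successive activation vectors of one layer.

    Assumed threshold rule (hypothetical, not taken from the caption):
    the threshold starts at t_max, is reset to t_max whenever a change is
    detected, and otherwise moves towards t_min by the factor `decay`.
    """
    threshold = t_max
    count = 0
    # Compare each frame's activation with that of the preceding frame.
    for prev, curr in zip(activations, activations[1:]):
        if l2_distance(prev, curr) > threshold:
            count += 1            # one unit of subjective time has passed
            threshold = t_max     # reset the threshold after a detection
        else:
            # No salient change: the threshold decays towards t_min.
            threshold = t_min + (threshold - t_min) * decay
    return count


# Toy activations for six consecutive frames of a single layer.
frames = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0], [3.0, 4.0], [3.0, 4.3], [3.0, 4.7]]
units = accumulate_changes(frames)
print(units)
```

In this toy run the large jump between the second and third frames is detected immediately, whereas the later small change is only detected once the threshold has decayed far enough. In the model shown in the figure, one such accumulator would run per layer (conv2, pool5, fc7, output), and the resulting counts would then be mapped to seconds by the regression step.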