Fig. 5: Architecture of the 3D CNN for heritage building classification.
From: Binary damage classification of built heritage with a 3D neural network

The three-dimensional input layer takes voxel maps with a fixed size of I × J × K = 32 × 32 × 32 binary voxels. Then, (1) the first convolutional layer applies f = 32 filters of size d = 5 with a spatial stride s = 2, zero padding, and ReLU activation. Next, (2) the second convolutional layer applies f = 32 filters of size d = 3 with a spatial stride s = 1, zero padding, and ReLU activation. After that, (3) the pooling layer reduces the input size by a factor of m = 2. Later, (4) the first fully connected layer uses ReLU activation and provides N = 128 output neurons. Then, (5) the second fully connected layer uses sigmoid activation and provides one output neuron. Finally, the output layer delivers a probabilistic result between 0 (incomplete building) and 1 (complete building).
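To make the layer dimensions concrete, the following minimal Python sketch traces how the tensor shape might change through the layers listed in the caption. It is an illustration under stated assumptions, not the authors' implementation. The caption's "zero padding" can be read either as no padding or as zeros added around the borders, so the sketch defaults to no padding and exposes the hypothetical parameters `pad1` and `pad2` for the other reading. The single input channel, the non-overlapping pooling window, and the helper names `conv_out` and `trace_shapes` are also assumptions.

```python
def conv_out(n, d, s, p=0):
    """Output length along one axis of a convolution: floor((n + 2p - d) / s) + 1."""
    return (n + 2 * p - d) // s + 1


def trace_shapes(n=32, filters=32, pad1=0, pad2=0):
    """Return (layer name, shape) pairs; shapes are (channels, I, J, K)."""
    shapes = [("input", (1, n, n, n))]          # one binary occupancy channel (assumed)
    n = conv_out(n, d=5, s=2, p=pad1)           # (1) first convolution: d = 5, s = 2
    shapes.append(("conv1", (filters, n, n, n)))
    n = conv_out(n, d=3, s=1, p=pad2)           # (2) second convolution: d = 3, s = 1
    shapes.append(("conv2", (filters, n, n, n)))
    n = n // 2                                  # (3) pooling: m = 2, non-overlapping (assumed)
    shapes.append(("pool", (filters, n, n, n)))
    shapes.append(("flatten", (filters * n ** 3,)))
    shapes.append(("fc1", (128,)))              # (4) N = 128 neurons, ReLU
    shapes.append(("fc2", (1,)))                # (5) one neuron, sigmoid
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes():
        print(f"{name:8s} {shape}")
```

Under the no-padding assumption, the spatial size goes from 32 to 14 to 12 to 6, so the first fully connected layer would receive 32 × 6 × 6 × 6 = 6912 values. Calling `trace_shapes(pad1=2, pad2=1)` instead models zeros added around the borders and gives 16, 16, and 8, so the figures depend on which reading of "zero padding" applies.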