Figure 1
From: Deep Cytometry: Deep learning with Real-time Inference in Cell Sorting and Flow Cytometry

Data preparation and deep learning pipeline. (a) Creation of the dataset. The raw TS-QPI waveform files collected by the ultrafast ADC are used directly as input data, without conversion to images. Each waveform is divided into 100 waveform elements with an overlap ratio of 50%, which creates redundancy that improves training stability. Initially, these waveform elements are one-dimensional time series. To fit conventional convolutional neural network architectures, the waveform elements are reshaped into two dimensions: one dimension corresponds to the laser pulses in each element, and the other corresponds to the sampling points per pulse. To shorten the processing time, the digital resolution is then reduced by a factor of 40 along the first dimension of the reshaped waveform elements. The resulting dataset of reshaped and reduced waveform elements is fed into the deep learning model as input examples. The whole dataset is split into training, validation, and test subsets. Because the entire dataset is too large to fit in memory at once, only one batch of examples is loaded and used for training at each iteration. (b) Architecture of the learning model. The deep convolutional neural network model (inspired by VGGNet) consists of 16 convolutional layers, three max pooling layers, and three fully-connected layers. The convolutional layers extract and learn the features of the input examples with 3 × 3 kernels (m × Conv3 − p + ReLU stands for m convolutional layers with p output filters and ReLU activation functions). Max pooling is then performed to reduce the number of parameters and computations. The first two fully-connected layers are connected to all nodes in the previous layer, and each is followed by dropout regularization. The third fully-connected layer computes the logits, which are the unnormalized log probabilities of the three categories, namely SW-480 colorectal cancer cells, OT-II hybridoma white blood cells, and running buffer alone (blank examples). Finally, a softmax layer converts the logits into probabilities for the three categories, and the input example is classified.
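The data preparation in (a) and the final softmax step in (b) can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation. The function names, the toy waveform, the value of `SAMPLES_PER_PULSE`, and the class ordering are all hypothetical. The sketch also assumes that 100 half-overlapping windows span the whole waveform, that the first dimension indexes laser pulses, and that the reduction by 40 keeps every 40th row; the caption does not say how the reduction is performed.

```python
import math

N_ELEMENTS = 100        # number of waveform elements, from the caption
REDUCTION_FACTOR = 40   # reduction along the first dimension, from the caption
SAMPLES_PER_PULSE = 4   # toy value chosen for illustration (assumption)
CLASS_NAMES = ["SW-480", "OT-II", "blank"]  # ordering is an assumption


def split_half_overlap(waveform, n_elements, samples_per_pulse):
    """Cut a 1-D waveform into n_elements windows that overlap by 50%."""
    total_pulses = len(waveform) // samples_per_pulse
    # With 50% overlap each window is two hops long,
    # so n windows span (n + 1) hops in total.
    hop_pulses = total_pulses // (n_elements + 1)
    if hop_pulses == 0:
        raise ValueError("waveform too short for the requested number of elements")
    hop = hop_pulses * samples_per_pulse
    return [waveform[i * hop: i * hop + 2 * hop] for i in range(n_elements)]


def reshape_to_pulses(element, samples_per_pulse):
    """Reshape a 1-D element into rows of pulses and columns of samples per pulse."""
    return [element[start: start + samples_per_pulse]
            for start in range(0, len(element), samples_per_pulse)]


def reduce_first_dimension(matrix, factor):
    """Keep every factor-th row (simple decimation; an assumption, see lead-in)."""
    return matrix[::factor]


def softmax(logits):
    """Convert logits into probabilities that sum to 1 (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy waveform: 101 hops of 40 pulses each, 4 samples per pulse.
waveform = list(range(101 * 40 * SAMPLES_PER_PULSE))

elements = split_half_overlap(waveform, N_ELEMENTS, SAMPLES_PER_PULSE)
reshaped = [reshape_to_pulses(e, SAMPLES_PER_PULSE) for e in elements]
reduced = [reduce_first_dimension(m, REDUCTION_FACTOR) for m in reshaped]

# Hypothetical logits, standing in for the output of the third fully-connected layer.
example_logits = [2.0, 0.5, -1.0]
probabilities = softmax(example_logits)
predicted = CLASS_NAMES[probabilities.index(max(probabilities))]
```

In this toy setting each element holds 80 pulses of 4 samples, and decimation by 40 leaves 2 rows per example. Averaging groups of 40 rows instead of discarding them would be an equally plausible reading of the reduction step.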