Introduction

Fiber-reinforced ceramic-matrix composites are advanced materials used in aerospace gas-turbine engines1,2 and in nuclear fusion3, because they withstand temperatures 100–200 °C higher than the alloys used in the same applications.

Larson et al. investigated new manufacturing processes for curing preceramic polymer into unidirectional fiber beds, studying the microstructure evolution during matrix impregnation with the aim of reinforcing ceramic-matrix composites4,5. They used X-ray computed tomography (CT) to characterize the three-dimensional microstructure of their composites non-destructively, studying its evolution in situ while processing the materials at high temperatures4 and describing overall fiber-bed properties and microstructures of unidirectional composites5. The X-ray CT images acquired from these fiber beds are available at the Materials Data Facility6.

Larson et al.’s fiber beds are approximately 1.5 mm wide and contain 5000–6200 fibers per stack. Each fiber has an average radius of 6.4 ± 0.9 μm, with diameters ranging from 13 to 20 pixels in the micrographs5. They present semi-supervised techniques to separate the fibers within the fiber beds; their segmentation is available for five samples7. This motivated us to investigate whether their results could be improved using different techniques.

In this study we separate fibers in ex situ X-ray CT images of fiber beds from nine samples from Larson et al. Our paper makes the following contributions:

  • It annotates, explains, and expands Larson et al.’s dataset7 to facilitate reproducible research and benchmarking.

  • It provides open source tools to analyze such datasets, so that researchers may compare their results with ours and one another’s.

  • It shows that automated analysis can perform similarly to, or better than, human-steered fiber segmentation.

The samples we used in this study correspond to two general states: wet (obtained after pressure removal) and cured. These samples were imaged using microtomography instruments at the Advanced Light Source, Lawrence Berkeley National Laboratory, operated in a low-flux, two-bunch mode5. We used their reconstructions obtained without phase retrieval; Larson et al. provide segmentations for five of these samples7, which we compare to our results.

To separate the fibers in these samples, we tested four fully convolutional neural network (CNN) architectures, drawing on algorithms from computer vision and deep learning. Comparing our neural network predictions to Larson et al.’s results, we obtained Dice8 and Matthews9 coefficients greater than 92.28 ± 9.65%, reaching up to 98.42 ± 0.03%, showing that the network results are close to the human-supervised ones on these fiber beds; in some cases, the networks separated fibers that the algorithms created by Larson et al.5 could not find. All software and data generated in this study are available for download, along with instructions for their use. The code is open source, released under a permissive software license, and can be adapted easily for other domains.

Results

Larson et al. provide segmentations for their fibers (Fig. 1) in five of the wet and cured samples, obtained using the following pipeline5:

  1. Fiber detection using the circular Hough transform10,11 (illustrated in the sketch after this list);

  2. Correction of improperly identified pixels, using filters based on connected-region size and pixel value, and comparisons with the ten slices above and below the slice of interest;

  3. Separation of fibers using the watershed algorithm12.
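As an illustration of step 1, the sketch below detects candidate fibers in a single slice with scikit-image’s circular Hough transform. The radius range (fibers span 13–20 pixels in diameter5) and the peak-detection parameters are our assumptions, since Larson et al. do not report the values they used.

```python
# Minimal sketch of step 1 (circular Hough transform); the radius range and
# peak-detection parameters are our assumptions, not Larson et al.'s values.
import numpy as np
from skimage import feature, transform

def detect_fiber_centers(slice_2d, radii=np.arange(6, 11)):
    """Detect candidate fiber centers and radii in one CT slice."""
    edges = feature.canny(slice_2d, sigma=2)        # edge map fed to the transform
    hough = transform.hough_circle(edges, radii)    # one accumulator per radius
    _, cx, cy, rad = transform.hough_circle_peaks(
        hough, radii, min_xdistance=10, min_ydistance=10)
    return np.stack([cy, cx, rad], axis=1)          # (row, col, radius) per candidate
```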

Fig. 1

Slice number 1000 from the sample “232p3 wet”, provided in Larson et al.’s dataset7. The whole sample contains 2160 slices. This slice illustrates the structure of the samples we processed: each contains the fiber bed (large circular structure) and the fibers within it (small round elements).

Their paper gives a high-level overview of these steps but provides neither the parameters used nor the source code for computing their segmentation. We tried different approaches to reproduce their results, focusing on separating the fibers in the fiber-bed samples. Our first approach was a classic, unsupervised image processing pipeline: histogram equalization13, Chambolle’s total-variation denoising14,15, multi-Otsu thresholding16,17, and the WUSEM algorithm18 to separate each single fiber. The result is a labeled image containing the separated fibers (Fig. 2). The pipeline had limitations when processing fibers on the edges of fiber beds, where its labels differed from those produced by Larson et al. Restricting the segmentation region to the center of the beds gives satisfactory results (Fig. 2(e)), but reduces the total number of detected fibers.
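The sketch below condenses this pipeline using the parameter values quoted in Fig. 2. Since WUSEM18 is a separate published algorithm, we substitute a plain distance-transform watershed for it here; the marker choice is a simplification on our part.

```python
# Sketch of the classic pipeline (letters refer to the panels of Fig. 2).
# WUSEM is replaced by a plain distance-transform watershed for brevity.
import numpy as np
from scipy import ndimage as ndi
from skimage import exposure, filters, measure, restoration, segmentation

def classic_pipeline(slice_2d):
    equalized = exposure.equalize_hist(slice_2d)                    # (a) histogram equalization
    denoised = restoration.denoise_tv_chambolle(equalized, weight=0.3)
    thresholds = filters.threshold_multiotsu(denoised, classes=4)   # (b) multi-Otsu regions
    regions = np.digitize(denoised, bins=thresholds)
    fibers = regions == 3                                           # (c) fibers lie in the fourth region
    distance = ndi.distance_transform_edt(fibers)                   # stand-in for WUSEM:
    markers = measure.label(distance > 0.5 * distance.max())        # crude markers from the distance map
    return segmentation.watershed(-distance, markers, mask=fibers,  # (e) separated, labeled fibers
                                  watershed_line=True)
```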

Fig. 2

Rendering of fibers detected in the limited region of interest by the classic pipeline. We illustrate the classic image processing pipeline using Fig. 1 as the input image. This solution had limitations when processing fibers on the edges of fiber beds. (a) Histogram equalization and TV Chambolle filtering (parameter: weight = 0.3). (b) Regions resulting from multi-Otsu thresholding (parameter: classes = 4). Fibers are located within the fourth region (in yellow). (c) Binary image obtained by considering region four in (b) as the region of interest and the remaining regions as background. (d) The processed region from (c), as shown in Fig. 1. (e) Regions resulting from the application of WUSEM on the region shown in (d) (parameters: initial_radius = 0, delta_radius = 2, watershed_line = True). Colormaps: (a,c,d) gray, (b) viridis, (e) nipy_spectral.

To obtain more robust results, we evaluated four fully convolutional neural network architectures: Tiramisu19 and U-Net20, as well as their three-dimensional counterparts, 3D Tiramisu and 3D U-Net21. We also investigated whether three-dimensional networks generate better segmentation results, leveraging the structure of the material.

Fully convolutional neural networks (CNN) for fiber detection

We implemented four architectures of fully convolutional neural networks (CNNs) — Tiramisu, U-Net, 3D Tiramisu, and 3D U-Net — to reproduce the results provided by Larson et al. Labeled data, in our case, consists of fibers within fiber beds. To train the neural networks to recognize these fibers, we used slices from two different samples, “232p3 wet” and “232p3 cured”, registered according to the wet sample. Larson et al. provided the fiber segmentation for these samples7, which we used as labels during training. The training and validation datasets contained 250 and 50 images from each sample, respectively, for a total of 600 images. Each image from the original samples measures 2560 × 2560 pixels.

For all networks, we used a learning rate of 10⁻⁴ and binary cross-entropy22 as the loss function. During training, the networks reached accuracy higher than 0.9 and loss lower than 0.1 within the first epoch. The two-dimensional U-Net is the exception, presenting a loss of 0.23 at the end of the first epoch. Despite that, 2D U-Net reaches the lowest loss among the four architectures by the end of its training. 2D U-Net is also the fastest network to finish its training (7 h, 43 min), followed by Tiramisu (13 h, 10 min), 3D U-Net (24 h, 16 min), and 3D Tiramisu (95 h, 49 min, Fig. 3).

Fig. 3

Accuracy (a) and loss (b) over time for each training epoch. We attribute the subtle loss increase or accuracy decrease at the start of each epoch to the data augmentation process.

Examining convergence behavior in the first epoch, the 2D U-Net does not progress as smoothly as the other networks (Fig. 4). However, this does not impair U-Net’s accuracy (0.977 after one epoch). Accuracy and loss for the validation dataset also improve significantly: Tiramisu had a validation loss to validation accuracy ratio of 0.034, U-Net had 0.048, and both 3D architectures had ratios of 0.043. The large size of the training set and the similarities between slices in the input data are responsible for these high accuracies and low losses.

Fig. 4

Accuracy vs. loss on the first epoch. Accuracy surpasses 0.9 and loss is lower than 0.1 for all networks during the first epoch, except for 2D U-Net (loss of 0.23). Validation accuracy and validation loss on the first epoch are represented by diamonds.

We used the trained networks to predict fiber labelings for twelve datasets in total. These datasets were made available by Larson et al.7, and we keep the same file identifiers for easy cross-reference:

  • “232p1”: wet

  • “232p3”: wet, cured, cured registered

  • “235p1”: wet

  • “235p4”: wet, cured, cured registered

  • “244p1”: wet, cured, cured registered

  • “245p1”: wet

Here, the first three numeric characters identify a sample, and the last character corresponds to different extrinsic factors, e.g. deformation. Although they come from similar materials, the reconstructed files presented several differences, for example in the amount of ringing artifacts, intensity variation, and noise; we therefore treat them as different samples in this paper.

We calculated the average prediction time for each sample (Fig. 5). As with the training time results, 2D U-Net and 2D Tiramisu are the fastest architectures to process a sample, while 3D Tiramisu is the slowest.

Fig. 5

Mean and standard deviation of prediction times for each sample. As with the training times, 2D U-Net and 2D Tiramisu were the fastest architectures, processing a sample in about one hour on average. 3D Tiramisu, the slowest, takes more than a day on average to process one sample.

Evaluation of our results and comparison with Larson et al. (2019)

After processing all samples, we compared our predictions with the results that Larson et al. made available on their dataset7. They provided segmentations for five datasets from the twelve we processed: “232p1 wet”, “232p3 cured”, “232p3 wet”, “244p1 cured”, “244p1 wet”.

First, we compared our predictions to their results using receiver operating characteristic (ROC) curves and the area under the curve (AUC, Fig. 6). The AUC is larger than 98% for all comparisons; our predictions therefore agree closely with the semi-supervised method of Larson et al.5. The 2D versions of U-Net and Tiramisu have similar results, performing better than 3D U-Net and 3D Tiramisu.
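This per-slice comparison can be reproduced with scikit-learn; a minimal sketch, assuming the network output is a probability map and Larson et al.’s segmentation is available as a binary mask of the same shape:

```python
# Per-slice ROC/AUC between a network prediction (probabilities in [0, 1])
# and Larson et al.'s segmentation, used here as the gold standard.
import numpy as np
from sklearn.metrics import auc, roc_curve

def slice_roc(prediction, gold_standard):
    """ROC curve and AUC for one slice."""
    fpr, tpr, _ = roc_curve(gold_standard.ravel().astype(int),
                            prediction.ravel())
    return fpr, tpr, auc(fpr, tpr)

# Mean and standard deviation of the AUC over a stack, mirroring Fig. 6:
# aucs = [slice_roc(pred, gold)[2] for pred, gold in zip(pred_stack, gold_stack)]
# print(np.mean(aucs), np.std(aucs))
```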

Fig. 6

Receiver operating characteristic (ROC) and area under curve (AUC) obtained from the comparison between prediction and gold standard. We consider Larson et al.’s segmentation7 as the gold standard in this case. ROC curves were calculated for all slices in each dataset; their mean areas and standard deviation intervals are presented. AUC is larger than 98% in all comparisons.

We also examined the binary versions of our predictions and compared them with Larson et al.’s results. For each slice or cube in a dataset, we applied a hard threshold of 0.5: values above it are considered fiber, while values below it are treated as background. We used the Dice8 and Matthews9 coefficients for our comparison (Table 1). The comparison using U-Net yields the highest Dice and Matthews coefficients for three of the five datasets. Tiramisu had the highest Dice/Matthews coefficients for the “244p1 cured” dataset, and both networks have similar results for “232p1 wet”. 3D Tiramisu had the lowest Dice and Matthews coefficients in our comparison.

Discussion

The analysis of ceramic-matrix composites (CMCs) depends on the detection of their fibers. Semi-supervised algorithms, such as the one presented by Larson et al.5, can perform that task satisfactorily. The description of that specific algorithm, however, lacks the parameter information necessary for replication, and it includes steps that involve manual curation. As such, it was not possible for us to reimplement it fully.

Convolutional neural networks have been used successfully to segment different two- and three-dimensional scientific data23,24,25,26,27,28, including microtomographies. For example, fully convolutional neural networks were used to generate 3D tau inclusion density maps29, to segment the tidemark on osteochondral samples30, and to build 3D models of temporal-bone anatomy31.

Researchers have studied fiber detection for some time, using a variety of tools. Approaches include tracking, statistical methods, and classical image processing32,33,34,35,36,37,38,39. To the best of our knowledge, two deep learning approaches have been applied to this problem:

  • Yu et al.40 use an unsupervised learning approach based on Faster R-CNN41 and Kalman-filter-based tracking. They compare their results with Zhou et al.36, reaching a Dice coefficient of up to 99%.

  • Miramontes et al.42 reach an average accuracy of 93.75% using a 2D LeNet-5 CNN43 to detect fibers in a specific sample.

Our study builds upon previous work by using similar material samples, but it expands the tests to many more samples and includes the implementation and training of four architectures (2D U-Net, 2D Tiramisu, 3D U-Net, and 3D Tiramisu), which we used to process twelve large datasets (≈140 GB in total). We compared our results with the gold-standard labeling provided by Larson et al.7 for five of them. We used ROC curves and their area under the curve (AUC) to verify the quality of our predictions, obtaining AUCs larger than 98% (Fig. 6). In addition, Dice and Matthews coefficients were used to compare our results with Larson et al.’s solutions (Table 1), reaching coefficients of up to 98.42 ± 0.03%.

Table 1 Dice and Matthews coefficients for each sample, obtained from the comparison of our neural network results and data from Larson et al.7.

When processing a defective slice (a slice with severe artifacts), the 3D architectures perform better than the 2D ones since they are able to leverage information about the structure of the material (Fig. 7).

Fig. 7

A defective slice on the sample “232p3 wet” and the segmentation produced by each architecture. Segmentations computed by 2D architectures are impaired by defects in the input image, while 3D architectures leverage the sample structure to achieve better results. (a) Original defective image, (b) U-Net prediction, (c) 3D U-Net prediction, (d) Tiramisu prediction, (e) 3D Tiramisu prediction.

Based on the research presented, we recommend using the 2D U-Net to process microtomographies of CMC fibers. Both 2D networks lead to similar accuracy and loss values in our comparisons (Table 1); however, U-Nets converge more rapidly and are therefore computationally cheaper to train than Tiramisu. The 3D architectures, while performing better on defective samples (Fig. 7), do not generally achieve better results than the 2D architectures. In fact, the 3D architectures require more training to achieve comparable accuracy (Fig. 3) and are slower at prediction (Fig. 5), demanding considerable additional computation for marginal gains.

Our CNN architectures perform at the level of the human-curated gold standard (i.e., Larson et al.’s semi-supervised approach), sometimes even surpassing it. For instance, the 2D U-Net identified fibers that Larson et al.’s algorithm did not find (Fig. 8).

Fig. 8

(a) Visual comparison between 2D U-Net and Larson et al.’s results for sample “232p3 wet”. We divided the slices into 100 tiles and compared each tile from our U-Net prediction to Larson et al.’s corresponding labels. The tiles presented here are the ones that return the lowest Matthews comparison coefficients; labels give the Matthews coefficient for each tile. (b,c) Tiles showing fibers found only by U-Net (in red), while some well-defined structures close to the borders are found only by Larson et al. (in yellow). Tile size: 256 × 256. Colors are set according to the comparison: blue, true positives; red, false positives; yellow, false negatives; gray, true negatives.

Using labels predicted by the U-Net architecture, we render a three-dimensional visualization of the fibers (Fig. 9). Despite the absence of tracking, the U-Net segmentation clearly outlines fibers across the stack.

Fig. 9

Fibers on the sample “232p3 wet” processed using the U-Net architecture. As seen in the longitudinal cut, this pipeline identifies fibers across the sample height despite the absence of tracking.

In this paper, we presented neural networks for analyzing microtomographies of CMC fibers in fiber beds. The data used is publicly available7 and was acquired in a real materials-design experiment. The results are comparable to human-curated segmentations; yet, the networks can predict fiber locations in large stacks of microtomographies without any human intervention. Despite the encouraging results achieved in this study, there is room for improvement. For example, the training time, especially of the 3D networks, turned out to be prohibitive for a full hyperparameter sweep; a search for the optimal parameters of all networks could be conducted in a future study. We also aim to investigate whether an ensemble of networks performs better, and to explore how best to adjust thresholds at the last layer of the network. Here, we maintained a hard threshold of 0.5, which suited the sigmoid on the last layer of the implemented CNNs, but one could, e.g., use conditional random field networks instead.

Methods

Fully convolutional neural networks

We implemented four architectures — two-dimensional U-Net20 and Tiramisu19, and their three-dimensional versions — to attempt to improve on the results provided by Larson et al. These are supervised algorithms: they rely on labeled data to learn what the regions of interest are — in our case, fibers within microtomographies of fiber beds.

All CNN algorithms were implemented using TensorFlow44 and Keras45 on a computer with two Intel Xeon Gold 6134 processors and two Nvidia GeForce RTX 2080 graphics processing units (GPUs). Each GPU has 10 GB of RAM.

To train the neural networks to recognize fibers, we used slices from two different samples, “232p3 wet” and “232p3 cured”, registered according to the wet sample. Larson et al. provided the fiber segmentation for these samples, which we used as labels during training. The training and validation procedures processed 350 and 149 images from each sample, respectively, for a total of 998 images. Each image from the original samples measures 2560 × 2560 pixels.

To feed the two-dimensional networks, we padded the images with 16 pixels of value zero in each dimension. Then each image was cut into tiles of 288 × 288 pixels, one every 256 pixels, creating an overlap of 32 pixels between neighboring tiles. These overlapping regions, which are removed after processing, avoid artifacts on the borders of processed tiles. Each input slice therefore generated 100 tiles of 288 × 288 pixels, for a total of 50,000 images in the training set and 10,000 in the validation set.
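A minimal sketch of this tiling, assuming each slice is available as a NumPy array:

```python
# Tiling for the 2D networks: pad a 2560 x 2560 slice by 16 pixels, then
# cut 288 x 288 tiles every 256 pixels (32-pixel overlap), yielding 100 tiles.
import numpy as np

def tile_slice(slice_2d, tile=288, step=256, pad=16):
    padded = np.pad(slice_2d, pad_width=pad, mode='constant')  # zero padding
    return np.stack([padded[i:i + tile, j:j + tile]
                     for i in range(0, padded.shape[0] - tile + 1, step)
                     for j in range(0, padded.shape[1] - tile + 1, step)])
```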

We pre-processed the training images differently for the three-dimensional networks. We loaded the entire samples, each of size 2160 × 2560 × 2560 voxels, and padded their dimensions with 16 voxels of value zero. Then we cut cubes of 64 × 64 × 64 voxels, one every 32 voxels. Hence, the training and validation sets for the three-dimensional networks have 96,000 and 19,200 cubes, respectively.
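The corresponding cube extraction, sketched below under the assumption that the padded volume fits in memory (in practice it can be memory-mapped):

```python
# Cube extraction for the 3D networks: pad the volume by 16 voxels, then
# cut overlapping 64 x 64 x 64 cubes every 32 voxels along each axis.
import numpy as np

def cut_cubes(volume, cube=64, step=32, pad=16):
    padded = np.pad(volume, pad_width=pad, mode='constant')
    starts = [range(0, s - cube + 1, step) for s in padded.shape]
    return np.stack([padded[z:z + cube, y:y + cube, x:x + cube]
                     for z in starts[0] for y in starts[1] for x in starts[2]])
```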

We implemented data augmentation, aiming for networks capable of processing samples with varying characteristics. We augmented the images in the training sets using rotations, horizontal and vertical flips, width and height shifts, zoom, and shear transforms. For the two-dimensional networks, we used the tools embedded in Keras’s ImageDataGenerator module. Since ImageDataGenerator cannot yet process three-dimensional input, we adapted the module; the adapted version, named ChunkDataGenerator, is provided in the repository referenced in the Code Availability section, along with the other software produced in this study.
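For the two-dimensional networks, this augmentation can be configured directly in Keras; the transform ranges below are illustrative rather than the exact values we used:

```python
# 2D data augmentation with Keras; the ranges are illustrative. Images and
# label masks are fed to generators with the same seed so both receive
# identical transforms.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(rotation_range=90,
                               width_shift_range=0.1,
                               height_shift_range=0.1,
                               zoom_range=0.1,
                               shear_range=0.1,
                               horizontal_flip=True,
                               vertical_flip=True)

# image_flow = augmenter.flow(tiles, batch_size=4, seed=42)
# label_flow = augmenter.flow(masks, batch_size=4, seed=42)
```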

To reduce the possibility of overfitting, we implemented dropout regularization46. We followed the suggestions in the original papers for U-Net architectures: 2D U-Net received a dropout rate of 50% in the last analysis layer and in the bottleneck, while 3D U-Net21 did not receive any dropout. The Tiramisu structures received a dropout rate of 20%, as suggested by Jégou et al.19.

Hyperparameters

To better compare the networks, we maintained the same training hyperparameters whenever possible. Ideally, we would conduct a hyperparameter sweep — a search for the optimal hyperparameters of each network — but the training time turned out to be prohibitive, especially for the three-dimensional networks. Due to the large amount of training data and the similarities between training samples (2D tiles or 3D cubes), we decided to train all architectures for five epochs. The 2D architectures were trained with batches of four images, while the batches for the 3D architectures had two cubes each. The learning rate was 10⁻⁴, and the loss function was binary cross-entropy22. We followed the advice of the original papers with regard to optimization algorithms: we used the Adam optimizer47 for the U-Net architectures and RMSProp48 for Tiramisu. We implemented batch normalization49 in all architectures, including the 2D U-Net; while Ronneberger et al.20 do not discuss batch normalization explicitly, it has been shown to improve convergence49.
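A sketch of this training configuration for the 2D U-Net; build_unet_2d and the two generators stand in for the model definition and data pipelines available in our repository:

```python
# Training configuration under the hyperparameters above; build_unet_2d,
# train_generator, and validation_generator are placeholders for the model
# definition and data pipelines in our repository.
from tensorflow.keras.optimizers import Adam

model = build_unet_2d(input_shape=(288, 288, 1))
model.compile(optimizer=Adam(learning_rate=1e-4),   # Adam for U-Nets; RMSProp for Tiramisu
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(train_generator,                          # generators yield batches of four tiles
          validation_data=validation_generator,
          epochs=5)
```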

Evaluation

We used the Dice8 and Matthews9 coefficients (Eqs. 1, 2) to evaluate our results, assuming that the fiber detections from Larson et al.7 are a reasonable gold standard.

$$Dice=\frac{2\times TP}{2\times TP+FP+FN}$$
(1)
$$Matthews=\frac{TP\times TN-FP\times FN}{\sqrt{(TP+FN)(TP+FP)(TN+FN)(TN+FP)}}$$
(2)

The Dice and Matthews coefficients are computed from true positive (TP), false positive (FP), true negative (TN), and false negative (FN) pixel counts, defined as:

  • TP: pixels correctly labeled as being part of a fiber.

  • FP: pixels incorrectly labeled as being part of a fiber.

  • TN: pixels correctly labeled as background.

  • FN: pixels incorrectly labeled as background.

TP, FP, TN, and FN are obtained by comparing the prediction with the gold standard.
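Both coefficients follow directly from these counts; a minimal sketch for one prediction, applying the 0.5 threshold described earlier:

```python
# Dice (Eq. 1) and Matthews (Eq. 2) coefficients for one prediction/gold pair.
import numpy as np

def dice_matthews(prediction, gold_standard, threshold=0.5):
    pred = prediction > threshold            # hard threshold at 0.5
    gold = gold_standard.astype(bool)
    tp = float(np.sum(pred & gold))          # fiber pixels found in both
    fp = float(np.sum(pred & ~gold))         # fiber only in the prediction
    tn = float(np.sum(~pred & ~gold))        # background in both
    fn = float(np.sum(~pred & gold))         # fiber missed by the prediction
    dice = 2 * tp / (2 * tp + fp + fn)
    matthews = ((tp * tn - fp * fn) /
                np.sqrt((tp + fn) * (tp + fp) * (tn + fn) * (tn + fp)))
    return dice, matthews
```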

Dataset

The dataset accompanying Larson et al.5 includes raw images, segmentation results, and a brief description of the segmentation tools — the Hough transform, mathematical morphology, and statistical filters. Fully reproducing their work would have required further information, including metadata, the parameters used and, ideally, analysis code. To aid in reproducing segmentation results, we contribute a set of twelve processed fiber beds based on the Larson et al. data. We also include the weights of each neural network architecture we implemented and trained; these weights can be used to process fibers of similar structure in other datasets.

Visualization

Imaging CMC specimens at high resolution, as in the Larson et al. samples7, leads to large datasets. For example, each stack we used in this paper occupies around 14 GB after reconstruction, with the following exceptions: the registered versions of the cured samples 232p3, 235p4, and 244p1, at 11 GB each, and the sample 232p3 wet, at around 6 GB.

Often, specialists need software to visualize results during data collection. Yet, it can be challenging to produce meaningful figures without advanced image analysis and/or computational platforms with generous amounts of memory. We wanted to show that interactive exploration of large datasets is viable on a modest laptop computer. We therefore generated all figures in this paper with matplotlib50 and ITK51 (Fig. 9), on a standard laptop with 16 GB of RAM. This means that a scientist could use, e.g., Jupyter Notebooks52 to do quick, interactive probing of specimens during beamtime.
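As a sketch, a single slice can be inspected without loading the whole stack; the file name and the one-TIFF-per-slice layout below are assumptions for illustration:

```python
# Quick probing of one slice; the file name is hypothetical. A single
# 2560 x 2560 slice is on the order of 13 MB, so only a tiny fraction of
# the ~14 GB stack is read.
import matplotlib.pyplot as plt
from skimage import io

slice_1000 = io.imread('rec232p3_wet/slice_1000.tiff')
plt.imshow(slice_1000, cmap='gray')
plt.title('232p3 wet, slice 1000')
plt.show()
```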