Background & Summary

Evaluating the rate at which cementitious materials absorb water is essential for understanding their long-term performance, particularly when subjected to chloride-induced corrosion, freeze-thaw damage, and sulfate attack1,2,3. The transport characteristics of these materials arise from three main mechanisms: diffusion4, absorption5, and permeability6, each significantly shaped by the connectivity, tortuosity, size, and overall volume of pores in the matrix. Among these transport properties, absorption is particularly significant, reflecting how an initially dry cementitious surface takes in water. At early times, water uptake is driven primarily by rapid capillary suction, governed by Darcy’s law and the Laplace equation7. Over time, this swift absorption phase gives way to slower processes, including the dissolution of entrapped air governed by Henry’s law8 and liquid diffusion into calcium silicate hydrate (C-S-H) layers9. Sorptivity, as defined by Hall and Hoff, quantifies a material’s tendency to transmit or absorb liquid through capillary action10. While the initial, rapid stage of water uptake is well characterized, the slower secondary phase remains a subject of ongoing research and debate11. Conventional sorptivity measurement techniques, such as ASTM C158512, involve manually weighing samples at specific intervals over several days, rendering the test laborious, time-consuming, and incapable of continuous data acquisition. To simplify this labor-intensive testing, automated weight measurement has been proposed13; however, challenges such as submersion errors, buoyancy, surface tension effects, and inaccurate secondary sorptivity predictions hinder its practicality. Although more advanced imaging technologies like X-ray CT14, EIT + X-ray Tomography15, X-ray Transmission/Attenuation16, and Neutron Radiography17 can provide detailed insights into internal water movement, their high cost and complexity limit routine use.

To overcome these barriers, there is increasing interest in automated, cost-effective methods that utilize modern computational tools for real-time sorptivity evaluation. Recently, we introduced two distinct computer vision-based methods to predict sorptivity. The first, termed the “droplet” method, leverages rapid surface-wetting characteristics, including contact angle dynamics and drop-spreading rates, to estimate initial sorptivity within minutes, offering strong correlations (adjusted R2 ≥ 0.9) with traditional methods18,19. The second, known as the “waterfront” method, employs a custom computer vision model trained on thousands of real and synthetic images to monitor water absorption in real time20,21. The latter approach accurately predicts initial and secondary sorptivity values by detecting penetration dynamics with R2 > 0.9 in various cementitious systems, bridging the gap between traditional measurements and automated, low-cost durability assessments. In this Data Descriptor, we share SorpVision, a comprehensive training dataset of 7,384 images (5,000 real and 2,384 synthetic)22 designed to facilitate automated sorptivity analysis. Using data from paste samples with w/c ratios of 0.4–0.8 and curing durations of 1–7 days, our custom vision-based model trained on SorpVision achieves high predictive accuracy for both initial and secondary sorptivity (R2 ≥ 0.97 for pastes). This approach also generalizes successfully to mortar and concrete, with R2 values of 0.96 and 0.87 for initial sorptivity and 0.74 and 0.65 for secondary sorptivity, respectively. Consequently, SorpVision offers a scalable, low-cost resource to enhance durability assessments and improve the monitoring of water absorption in cementitious materials.

Methods

Real dataset generation

To generate our real dataset, 10 mm paste samples were positioned between orthogonally aligned USB microscope cameras (Fig. 1a), capturing a total of 1,250 images as the water levels rose in the samples (Fig. 1b). Each image was annotated by four independent annotators (Fig. 1c), resulting in 5,000 annotated samples. The USB cameras used for image capture were surrounded by LED lights, enabling image acquisition under two distinct lighting conditions: diffuse reflections, with LEDs turned off under natural room lighting (Fig. 1a, bottom row, and Fig. 1b, top-right columns), and specular reflections, with LEDs turned on in a darker room (Fig. 1a, top row, and Fig. 1b, top-left columns). Moreover, the acquired images (640 × 480 pixels) were downsampled to 448 × 448 pixels, balancing computational efficiency and accuracy20. To ensure transparency, in addition to the dimensions and mass changes of specimens, we have uploaded the entire real and synthetic dataset to Zenodo23, including original 640 × 480 pixels VGA videos, 640 × 480 pixels video frames with annotations, and the 448 × 448 pixels resized versions used for training of our vision-based model.
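As a concrete illustration of the downsampling step, the resize can be reproduced with a few lines of OpenCV; this is a minimal sketch with placeholder file names, not the exact script used for SorpVision:

```python
# Illustrative sketch: downsample one 640 x 480 VGA frame to the 448 x 448
# training resolution; file names are placeholders.
import cv2

frame = cv2.imread("frame_0001.jpg")  # 640 x 480 VGA frame
resized = cv2.resize(frame, (448, 448), interpolation=cv2.INTER_AREA)
cv2.imwrite("frame_0001_448.jpg", resized)
```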

Fig. 1
figure 1

SorpVision Dataset Creation for Automated Water Absorption Measurement in Cementitious Materials. (a) Orthogonally aligned USB microscope cameras capture liquid infiltration under LED-driven specular lighting (top) and ambient diffuse lighting (bottom) conditions. (b) Examples of real and synthetic images, along with their binary masks. (c) Four annotators created binary masks of the water levels in paste samples. (d) Error maps reveal increased variability at lower water levels. (e) More annotators reduce variability, with diminishing discrepancies beyond three annotators. (f) At lower water levels, annotators produce higher error rates and variability, making it harder to capture fine details accurately. The data, adapted from Kabir et al.20, is presented in a modified format.

We additionally present Fig. 2, which outlines the complete data preparation and annotation workflow. Initially, as illustrated in Fig. 2a, all original VGA videos (640 × 480 pixels in MP4 format) were captured from different specimens23. Each video was then converted into a sequence of individual images (640 × 480 pixels), as depicted in Fig. 2b. Because our annotators manually marked the sample boundary and waterfront on an iPad, it was necessary to convert these images (*.jpg) into PDF format. However, as shown in Fig. 2c, an improper “orientation settings” option could distort image dimensions. Therefore, each image was carefully resized to a 6.4 by 4.8 ratio in landscape orientation, using the “scale to fit” feature as indicated in Fig. 2d, prior to PDF creation for accurate annotation.
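The video-to-frame conversion in Fig. 2b can be sketched as a simple OpenCV loop; the paths below are hypothetical and the snippet is illustrative rather than the exact tooling used:

```python
# Illustrative sketch: split an MP4 absorption video into individual
# 640 x 480 frames for annotation; paths are placeholders.
import cv2

cap = cv2.VideoCapture("specimen_01.mp4")
idx = 0
while True:
    ok, frame = cap.read()  # frame is a 640 x 480 BGR image
    if not ok:
        break
    cv2.imwrite(f"frames/specimen_01_{idx:04d}.jpg", frame)
    idx += 1
cap.release()
```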

Fig. 2
figure 2

Data Preparation and Annotation Workflow. (a) Original VGA videos (640 × 480 pixels) were captured from different specimens. (b) Each video was converted into a sequence of individual image frames for annotation. (c) Incorrect PDF print settings distorted image dimensions due to improper orientation and scaling. (d) Correct settings included landscape orientation, scale-to-fit enabled, and paper size matching image dimensions. (e) Images were annotated on an iPad by marking the sample boundary and waterfront using a stylus pen. (f) Annotated PDFs were processed using the custom Jupyter Notebook flood_fill_helper.ipynb to generate final segmentation masks.

After generating the correctly scaled PDFs, our annotators imported them onto an iPad (Fig. 2e), where the waterfront and sample boundary were manually outlined using a stylus pen. This bidirectional data transfer between a personal computer (hosting the JPGs) and the iPad (hosting the resulting PDFs) facilitated streamlined annotation while maintaining image fidelity. Accurate capture of the boundary and waterfront was crucial, as these annotations serve as ground truth for the subsequent segmentation step. Finally, as shown in Fig. 2f, the annotated boundaries were processed through our custom Jupyter Notebook, “flood_fill_helper.ipynb”22, which automatically filled the region between the boundary and the waterfront to produce the final segmented masks (640 × 480 pixels).
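The filling step can be approximated with OpenCV’s floodFill, as in the minimal sketch below; it assumes a closed boundary drawn as dark strokes on a light page and a seed point known to lie inside the wetted region, and is not the notebook’s verbatim code:

```python
# Minimal sketch of the mask-filling idea behind flood_fill_helper.ipynb,
# assuming dark annotation strokes and a closed contour.
import cv2
import numpy as np

page = cv2.imread("annotated_frame.png", cv2.IMREAD_GRAYSCALE)
_, strokes = cv2.threshold(page, 128, 255, cv2.THRESH_BINARY_INV)  # strokes -> 255

h, w = strokes.shape
flood_mask = np.zeros((h + 2, w + 2), np.uint8)  # floodFill needs a padded mask
seed = (w // 2, h - 10)  # assumed point inside the region between boundary and waterfront
filled = strokes.copy()
cv2.floodFill(filled, flood_mask, seed, 255)  # fill the enclosed region

cv2.imwrite("segmentation_mask.png", filled)  # white ROI on black background
```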

Variations in image annotation, shown as an error map in Fig. 1d, highlight the discrepancies that can arise when multiple individuals manually mark regions of interest (ROIs). Increasing the number of annotators from one to two offers diverse perspectives, enhancing model generalization and improving agreement. However, beyond three annotators, inter-annotator agreement plateaus, minimizing additional discrepancies in semantic segmentation (Fig. 1e). It is crucial to recognize that lower water levels present more significant challenges for accurate annotation, as smaller areas increase the likelihood of errors. This is evident in the percentage error and coefficient of variation (COV) among the four annotators (Fig. 1f), where annotators often struggle to capture fine details accurately.
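As a simple, hypothetical way to quantify this variability, the coefficient of variation of the wetted-area fraction can be computed across the four annotators’ masks for a given frame (file names are placeholders):

```python
# Hypothetical sketch: coefficient of variation (COV) of the wetted-area
# fraction across four annotators' binary masks for a single frame.
import numpy as np

masks = [np.load(f"annotator_{k}_mask.npy") for k in range(1, 5)]  # 0/1 arrays
areas = np.array([m.mean() for m in masks])  # wetted-area fraction per annotator

cov = 100 * areas.std(ddof=1) / areas.mean()  # sample COV, in percent
print(f"area fractions: {areas}, COV: {cov:.1f}%")
```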

Synthetic dataset generation

To minimize subjectivity in mask annotation, we created a synthetic dataset of 2,384 images using Microsoft PowerPoint. Cementitious textures were enhanced through artistic effects, lighting adjustments, and contrast modifications, replicating real-world conditions (see Fig. 1b, bottom rows). These textures were created directly at 448 × 448 pixels. Water levels were simulated by overlaying 90% transparent grey layers to signify saturation. Segmented masks were generated automatically by converting the ROI/layers to white and the background to black, ensuring clarity and reproducibility. The dataset included diffuse and specular reflections and varied water level orientations, replicating complex surface interactions observed experimentally. This synthetic dataset23 provided a robust foundation for training computer vision models while reducing biases from manual annotations.

Hardware setup

The hardware configuration shown in Fig. 1a was meticulously designed to ensure an accurate and reproducible assessment of water absorption rates in cement paste cubes. For small paste samples, a transparent glass vial, measuring over 15 mm in width and 27 mm in height, was chosen to securely accommodate the samples while integrating seamlessly with the experimental setup. The vial features a polypropylene snap-top cap, allowing controlled water introduction via a syringe. The syringe dispenses water at a calibrated flow rate over approximately three seconds, ensuring uniform wetting of the sample surfaces. The paste cubes were placed within the vial on top of hexagonal metallic beads to maintain a consistent water submersion depth of 0.5–2 mm, ensuring consistent capillary suction during absorption. The cubes were positioned with their faces orthogonal to the cameras, optimizing visibility and minimizing variability. Minor tilts in cube orientation were permissible, as the computer vision algorithms accommodate slight rotations within the ROI. Image acquisition was conducted using two orthogonally aligned USB microscopy cameras, each priced at less than $30. These cameras offer Full HD (1920 × 1080 pixels) or VGA (640 × 480 pixels) resolution, a minimum spatial resolution of 10 µm, and magnification ranging from 50× to 1000×. Equipped with a 120-degree angle of view, the cameras were surrounded by LED lighting to support image capture under both diffuse and specular reflection conditions. The lighting configuration is adjustable to accommodate different environments, enhancing the visibility of wetted areas on the sample surfaces. Focused image capture was achieved by fine-tuning the distance between the camera lens and the ROI. This integrated setup enables the detection of water levels using advanced computer vision models, which analyze changes in the visible wetted area on the sample surfaces. The dual-camera configuration improves measurement accuracy by capturing complementary views of the sample, allowing a robust estimation of water penetration and sorptivity values20.

For larger mortar and concrete specimens, it is essential to demonstrate the scalability and reliability of our approach. As shown in Fig. 3a, the image-based sorptivity measurement technique developed for smaller paste samples can be effectively adapted to larger mortar and concrete specimens. This versatility is achieved by refining two critical components: (1) applying a geometry-based adjustment for specimen dimensions and aspect ratios and (2) training a feedforward neural network to model fluid penetration behavior across multiple material scales. Figure 3b and c highlight the role of cross-sectional dimensions in the time-normalization procedure. For square specimens (e.g., paste or mortar cubes), the time factor is multiplied by √(0.39/D), where D is the side length of the cross-section in inches (D > 0.39 in.), capturing variations in overall scale. By contrast, for rectangular cross-sections, which are more common for concrete specimens, with an aspect ratio (AR) outside the 0.7–1.4 range, the time factor is normalized by √(0.39/(W × AR)), where W is the width of the specimen in inches (W > 0.39 in.). For rectangular specimens with width-to-length ratios between 0.7 and 1.4, however, AR is assumed to be 1 for normalization purposes; hence, the time factor is normalized by √(0.39/W).
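These normalization rules can be summarized in a short helper; the function below is our paraphrase of the text, with an illustrative name and call signature:

```python
# Sketch of the geometry-based time normalization; the formulas follow the
# text, the function itself is illustrative.
import math

def normalize_time_factor(t, width_in, length_in=None):
    """Scale time factor t for cross-sections wider than the 0.39 in. reference."""
    if length_in is None:  # square section of side D = width_in
        return t * math.sqrt(0.39 / width_in)
    ar = width_in / length_in  # width-to-length aspect ratio
    if 0.7 <= ar <= 1.4:  # near-square: AR assumed to be 1
        return t * math.sqrt(0.39 / width_in)
    return t * math.sqrt(0.39 / (width_in * ar))
```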

Fig. 3
figure 3

Demonstration of Scalable Image-Based Sorptivity Measurements for Paste, Mortar, and Concrete. (a) Adaptation of the technique to various specimen sizes; (b) schematic side views of paste to concrete specimens with square to rectangular cross-sections; (c) aspect ratio-based correction for square and rectangular sections; (d) fixture arrangement for different specimen heights ensuring consistent lighting; (e) USB cameras imaging test specimens from two different viewing angles; and (f) feedforward neural network correlating adjusted time factors with wetted area ratios for reliable fluid penetration or sorptivity estimates of paste, mortar, and concrete specimens.

This method addresses the distinct waterfronts that emerge in different cross-sectional shapes, allowing our feedforward neural network to yield accurate fluid penetration and sorptivity predictions. Figure 3d and e illustrate the flexibility of the experimental setup and imaging system. By adjusting the sample height and fixture arrangement, specimens ranging from 10 mm-tall paste cubes to 2 in.-tall concrete prisms or cylinders can be accommodated. All samples are observed using USB digital microscopes complemented by external lighting to maintain consistent illumination. Finally, Fig. 3f demonstrates how our feedforward neural network correlates the adjusted time factor with observed wetted area ratios to produce reliable fluid penetration or sorptivity estimates. This advancement expands the approach originally tailored for paste samples to larger-scale mortar and concrete specimens.

Feature Pyramid Network (FPN) implementation

Figure 4a illustrates the FPN architecture, which uses a modified EfficientNet-B2 backbone for hierarchical feature extraction24. For optimized performance, this backbone employs compound scaling, depth-wise separable convolutions, and squeeze-and-excitation modules. The encoder downsamples input images (448 × 448) to feature maps with resolutions of 224 × 224, 112 × 112, 56 × 56, and 28 × 28, while the channel depth increases from 24 to 120. Skip connections link encoder features to the decoder, preserving spatial details. The decoder begins with the highest-resolution feature maps, sequentially upsampling and merging them with encoder features via skip connections to recover resolution and refine features. Using 1 × 1 convolutions, feature maps are reduced to 32 channels before being processed through additional convolutional layers to produce high-resolution segmentation masks.
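A minimal sketch of such a model, assuming the segmentation_models_pytorch package; only the backbone choice follows the text, and the decoder channel argument is our guess at how the 32-channel reduction maps onto the library’s API:

```python
# Illustrative FPN construction with segmentation_models_pytorch; settings
# other than the backbone are assumptions.
import segmentation_models_pytorch as smp

model = smp.FPN(
    encoder_name="efficientnet-b2",    # hierarchical feature extraction backbone
    encoder_weights="imagenet",        # assumed pretrained initialization
    decoder_segmentation_channels=32,  # assumed mapping of the 32-channel reduction
    in_channels=3,
    classes=1,                         # one water/no-water mask
)
```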

Fig. 4
figure 4

Feature Pyramid Network (FPN) Implementation for Water Level Detection in Cementitious Samples. (a) The FPN integrates an EfficientNet-B2 backbone for hierarchical feature extraction, using skip connections to preserve spatial details. The decoder refines up-sampled feature maps through convolutional layers, generating high-resolution segmentation masks. (b) Input images, probability masks, and overlays illustrate accurate segmentation of water absorption regions. The sigmoid-activated “predict_step” method ensures precise binary mask predictions under varying conditions. (c) The impact of real and synthetic image datasets on segmentation accuracy (DoM) shows stability improvements with synthetic data and practical training with limited real data. Outliers appear as points below the main distribution. The data, adapted from Kabir et al.20,21, is presented in a modified format.

The SMPModel class, built on the segmentation_models_pytorch library25, handles these operations efficiently. A 14 × 14 MLP ensured acceptable spatial accuracy for dense segmentation. The predict_step method, utilizing sigmoid activation, produced high-resolution probability masks. The model was trained using the Adam optimizer (torch.optim.Adam)26 with an initial learning rate of 0.001. ModelCheckpoint saved the best-performing model, while EarlyStopping, with a patience of 6 epochs, stopped training when performance stagnated to ensure efficient convergence. Setting an appropriate EarlyStopping threshold balances accuracy and training duration: too low a threshold may stop training prematurely, while too high a threshold can lead to unnecessarily long training times27. Figure 4b illustrates the segmentation performance of the proposed FPN model under both diffuse and specular reflection scenarios20. The input images, their corresponding probability masks, and overlays demonstrate the effectiveness of the sigmoid-activated predict_step method in producing accurate and consistent segmentation of wetted regions. The results emphasize the network’s capability to precisely delineate wet boundaries across different reflection conditions.
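The checkpointing and early-stopping configuration can be sketched with PyTorch Lightning callbacks; the patience (6 epochs) and Adam learning rate (0.001) follow the text, while the monitored metric and trainer settings are assumptions:

```python
# Illustrative training configuration with PyTorch Lightning.
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint, EarlyStopping

checkpoint = ModelCheckpoint(monitor="val_loss", mode="min", save_top_k=1)
early_stop = EarlyStopping(monitor="val_loss", mode="min", patience=6)

trainer = pl.Trainer(max_epochs=100, callbacks=[checkpoint, early_stop])
# trainer.fit(module, train_loader, val_loader)
# where module.configure_optimizers() would return
# torch.optim.Adam(module.parameters(), lr=0.001), per the text.
```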

Impact of dataset size on FPN model accuracy

Figure 4c demonstrates the relationship between the number of real images and segmentation accuracy, quantified by the Degree of Matching (DoM), which is analogous to the Intersection over Union metric28. A DoM value approaching 1 indicates near-perfect alignment between predicted and ground-truth masks. The data show that increasing the number of real images significantly improves segmentation accuracy, while synthetic images primarily stabilize performance, especially when there is limited availability of real images. Notably, the FPN model achieves effective training with only a few hundred real images20, a reflection of the low Kolmogorov complexity inherent to images captured by USB microscope cameras29,30,31. These cameras focus on ROIs with consistent backgrounds and minimal variability. Moreover, the computational efficiency of the FPN architecture and its capability to effectively employ small datasets support its success in attaining strong segmentation performance across diverse synthetic dataset sizes. This underscores the practicality of using cost-effective water absorption analysis setups in cementitious materials systems.
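For reference, DoM can be computed as an intersection-over-union-style score between predicted and ground-truth masks; the exact formulation below is our IoU-based reading of the analogy:

```python
# IoU-style Degree of Matching between two binary masks
# (1.0 indicates perfect agreement).
import numpy as np

def degree_of_matching(pred, truth):
    pred, truth = pred.astype(bool), truth.astype(bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:  # both masks empty: treat as a perfect match
        return 1.0
    return np.logical_and(pred, truth).sum() / union
```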

Traditional sorptivity analysis

Cement paste cubes measuring 10 mm were cast in silicone molds with varying w/c ratios ranging from 0.4 to 0.8. Mixing was performed using a vortex mixer at 3000 rpm for 2 minutes, followed by one day of moist curing. After demolding the samples, they were either cured in a sealed container at 93 ± 5% relative humidity or immersed in saturated lime water for up to 7 days. Hydration was stopped by immersing the specimens in isopropyl alcohol for two consecutive days, with periodic changes to the solution, and then placing them in a vacuum desiccator containing silica gel and soda-lime pellets for 14 days to stabilize their mass. For sorptivity measurements, following ASTM C158512, the samples were sealed with epoxy resin on five faces, leaving one face exposed to water. During testing, the mass change resulting from one-dimensional unsaturated water flow was recorded at regular intervals over 24 hours. The absorbed water was normalized using i = m / (a · ρ), where m represents the mass change, a denotes the exposed area in contact with water, and ρ signifies the water density. The normalized water absorption, referred to as penetration (i, mm), was plotted against the square root of time, and the initial (Si) and secondary (Ss) sorptivity values were calculated from the slopes of linear fits to the respective data phases using the least squares method. Two-inch mortar cubes were cast and subjected to a similar procedure; their hydration was stopped by soaking in isopropyl alcohol for one week, followed by drying in a 60 °C oven for over two weeks until no further mass change was observed. Likewise, 2 × 4 inch concrete cylinders were sliced and dried in a 40 °C oven for over two weeks, then placed in a 105 °C oven for an additional 24 hours to complete the drying process.
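The normalization and slope extraction can be sketched as follows; the phase boundary at 6 hours follows ASTM C1585 conventions and, together with the units and names, is an assumption of this illustration rather than the authors’ exact script:

```python
# Illustrative computation of penetration i = m / (a * rho) and of the
# initial (Si) and secondary (Ss) sorptivity slopes by least squares.
import numpy as np

def sorptivities(t_s, mass_g, area_mm2, rho_g_mm3=1e-3):
    i = mass_g / (area_mm2 * rho_g_mm3)  # penetration, mm
    sqrt_t = np.sqrt(t_s)                # s^0.5
    initial = t_s <= 6 * 3600            # early, capillary-dominated phase (assumed)
    secondary = ~initial                 # later, slower phase
    Si = np.polyfit(sqrt_t[initial], i[initial], 1)[0]
    Ss = np.polyfit(sqrt_t[secondary], i[secondary], 1)[0]
    return Si, Ss                        # mm / s^0.5
```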

Linking wetted area and absorption time with sorptivity

Following the refinement of our computer vision model, we observed water-level changes across various cementitious systems. In paste cubes (w/c ratios 0.4–0.8, cured for 1–7 days), penetration exhibited a linear relationship with the square root of time due to one-dimensional liquid movement, uniform drying, and the dominance of capillary forces over gravity in fine-pored matrices10,32. Initial sorptivity varies significantly with w/c ratio and curing age, while secondary sorption progresses more slowly than initial sorption as entrapped air diffuses through water-filled pores to the surface, driven by internal-external pressure differences33. Our machine learning model learns patterns within the data and establishes strong relationships by combining the wetted area ratio and absorption time as key input parameters. This approach allows the model to predict penetration values accurately, achieving R2 values of 0.96 and 0.93 for the training and test sets, respectively.

Machine learning implementation

To develop a robust machine learning model for predicting penetration and sorptivity, we implemented a feedforward artificial neural network (ANN) using TensorFlow22. The dataset included two input features (wetted area ratio and absorption time), and the target variable was a single node predicting either penetration or sorptivity. Data preprocessing involved converting all numerical values to the float32 format to ensure computational efficiency34. The training dataset consisted of 1,440 data points, while the testing dataset, which contained 144 points, was kept separate to ensure an unbiased performance evaluation on unseen data. The ANN architecture included three hidden layers with 64, 32, and 16 nodes, utilizing the ReLU activation function to capture non-linear relationships effectively. A 10% dropout layer was introduced to prevent overfitting and enhance generalization by randomly deactivating neurons during training. The output layer generated a single prediction for either penetration or sorptivity. With a learning rate of 0.00005, the Adam optimizer minimized the mean squared error (MSE) loss function during training. The model was trained for 500 epochs with a batch size of 16, balancing computational efficiency with convergence. Architectural experiments compared a flat model with a consistent neuron count across layers to a pyramidal structure with progressively fewer neurons (64, 32, 16)22. The pyramidal design demonstrated superior generalization, providing slightly higher yet more stable MSE values than the flat architecture20.
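A minimal Keras sketch of this pyramidal network follows; layer sizes, dropout, optimizer, and training settings are taken from the text, and everything else (variable names, data handling) is assumed:

```python
# Minimal sketch of the pyramidal ANN; hyperparameters follow the text.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),                 # wetted area ratio, absorption time
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dropout(0.10),              # 10% dropout for generalization
    tf.keras.layers.Dense(1),                   # penetration or sorptivity
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5), loss="mse")
# model.fit(X_train.astype("float32"), y_train.astype("float32"),
#           epochs=500, batch_size=16)
```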

Table 1 compares the current Data Descriptor with our earlier Nature Communications article20 and highlights several enhancements. Notably, we now provide time-lapse videos of paste absorption tests in VGA resolution (640 × 480 pixels) in MP4 format, along with the associated image sequences and segmented masks. All datasets are available via a permanent Zenodo link to ensure long-term access23. We also describe the manual annotation process for the real dataset, performed using a stylus pen, and include an evaluation of annotation errors. We additionally outline a detailed methodology for measuring sorptivity in larger mortar and concrete samples, an aspect previously unaddressed. Moreover, a more detailed architecture of our EfficientNet-based FPN is now documented. Finally, the training paste dataset is presented in greater depth, with information on sample sizes, mass variation over time, and the corresponding penetration measurements, provided in Dataset.xlsx23.

Table 1 Dataset Comparison Summary.

Data Records

The data supporting this study includes 7,384 images (5,000 real and 2,384 synthetic) of prismatic cementitious samples, penetration-time data, and wetted area ratios recorded during water absorption experiments. The datasets are organized as follows22,23:

  • Image Annotation: We developed a Jupyter Notebook, flood_fill_helper.ipynb, to streamline image annotations by maintaining fixed sample boundaries while multiple annotators enhance model generalization and mask agreement.

  • Train and Test Datasets: Images + Masks for model training and evaluation are provided in datasets.zip.

  • Penetration-Time Data: Water penetration measurements and the corresponding time values are provided in sorptivity_train_cv.csv and sorptivity_test_cv.csv.

Technical Validation

To ensure the reliability and reproducibility of SorpVision, several measures were taken throughout data collection, annotation, and curation. These steps collectively validate both the quality of the dataset and the robustness of the resulting computer vision models:

  • Robust Annotations and Dataset Generation (Figs. 1, 2)

We captured water absorption in cementitious samples using orthogonally aligned USB cameras under diffuse and specular lighting (Fig. 1a). Each frame was annotated by multiple users, helping mitigate subjective bias (Fig. 1c–f). To maintain consistency, a standardized workflow converted video frames into annotated PDF files and binary masks (Fig. 2). This process minimized labeling errors and ensured that both the real (5,000 images) and synthetic (2,384 images) datasets accurately captured the evolving waterfront.

  • Reproducible Experimental Design (Figs. 1, 3)

    All specimens were prepared following established curing and drying protocols, minimizing batch-to-batch variations. Small paste cubes and larger mortar or concrete specimens were tested with the same camera configuration, and an aspect-ratio–based correction was applied to ensure consistent interpretation of the waterfront (Fig. 3). By controlling lighting and alignment, we obtained high-quality images with minimal background noise, increasing both reliability and ease of segmentation.

  • Cross-Validation with Standard Methods (Figs. 3, 4)

    To confirm the accuracy of image-based sorptivity, we tracked water absorption with traditional mass-uptake measurements per ASTM C1585. Segmented “wetted area vs. time” predictions correlated well with gravimetric data (R2 > 0.9 for paste samples), indicating that the trained Feature Pyramid Network (Fig. 4) faithfully captured capillary absorption. All code and datasets are openly available22,23, allowing independent verification of these results and supporting future research on automated durability assessments of cementitious materials.

Usage Notes

We provide detailed guidance for implementing the automated sorptivity estimation pipeline to facilitate the reuse of our data and methodology22. This pipeline integrates machine learning models and computer vision algorithms to analyze water absorption dynamics in cementitious systems with high accuracy and minimal user intervention. The workflow is designed to operate seamlessly on platforms supporting Python 3.x and Jupyter Notebooks. It has been optimized for Google Colab environments with a CUDA-enabled GPU for accelerated processing performance. The repository22,23 includes:

  • Jupyter Notebooks: For model training, validation, and inference.

  • Custom scripts: Python files (models.py, dataset_loader.py, evaluation.py, util.py) for modular data preprocessing, visualization, and performance assessment.

  • Datasets: Train and test datasets, along with the pre-trained FPN model.

    Users can replicate the workflow by following these steps22:

  • Setup and Environment: Organize project folders as described (Colab Notebook, src, visualization, etc.) and ensure access to Google Drive with at least 1 GB of free space.

  • Execution: Open model_compile.ipynb in Google Colab, mount the Drive (see the snippet after this list), and run cells sequentially. The entire model training (on an NVIDIA T4 GPU) requires approximately 2 hours for datasets with thousands of images.

  • Visualization: The pipeline generates penetration-time graphs and binary masks of water-absorbed regions post-training.
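For convenience, the Drive mount referenced in the Execution step uses the standard Colab API:

```python
# Standard Google Colab Drive mount, as referenced in the Execution step.
from google.colab import drive

drive.mount('/content/drive')  # grants the notebook access to Google Drive
```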