Background & Summary

Evaluating the rate at which cementitious materials absorb water is essential for understanding their long-term performance, particularly when subjected to chloride-induced corrosion, freeze-thaw damage, and sulfate attack1,2,3. The transport characteristics of these materials arise from three main mechanisms: diffusion4, absorption5, and permeability6, each significantly shaped by the connectivity, tortuosity, size, and overall volume of pores in the matrix. Among these transport properties, absorption is particularly significant, reflecting how an initially dry cementitious surface takes in water. At early times, water uptake is driven primarily by rapid capillary suction, governed by Darcy’s law and the Laplace equation7. Over time, this swift absorption phase gives way to slower processes, including the dissolution of entrapped air governed by Henry’s law8 and liquid diffusion into calcium silicate hydrate (C-S-H) layers9. Sorptivity, as defined by Hall and Hoff, quantifies a material’s tendency to transmit or absorb liquid through capillary action10. While the initial, rapid stage of water uptake is well characterized, the slower secondary phase remains a subject of ongoing research and debate11. Conventional sorptivity measurement techniques, such as ASTM C158512, involve manually weighing samples at specific intervals over several days, rendering the test laborious, time-consuming, and incapable of continuous data acquisition. To simplify this labor-intensive testing, automated weight measurement has been proposed13; however, challenges such as submersion errors, buoyancy, surface tension effects, and inaccurate secondary sorptivity predictions hinder its practicality. Although more advanced imaging technologies like X-ray CT14, EIT + X-ray Tomography15, X-ray Transmission/Attenuation16, and Neutron Radiography17 can provide detailed insights into internal water movement, their high cost and complexity limit routine use.

To overcome these barriers, there is increasing interest in automated, cost-effective methods that utilize modern computational tools for real-time sorptivity evaluation. Recently, we introduced two distinct computer vision-based methods to predict sorptivity. The first, termed the “droplet” method, leverages rapid surface-wetting characteristics, including contact angle dynamics and drop-spreading rates, to estimate initial sorptivity within minutes, offering strong correlations (adjusted R2 ≥ 0.9) with traditional methods18,19. The second, known as the “waterfront” method, employs a custom computer vision model trained on thousands of real and synthetic images to monitor water absorption in real time20,21. The latter approach accurately predicts initial and secondary sorptivity values by detecting penetration dynamics with R2 > 0.9 in various cementitious systems, bridging the gap between traditional measurements and automated, low-cost durability assessments. In this Data Descriptor, we share SorpVision, a comprehensive training dataset of 7,384 images (5,000 real and 2,384 synthetic)22 designed to facilitate automated sorptivity analysis. Using data from paste samples with w/c ratios of 0.4–0.8 and curing durations of 1–7 days, our custom vision-based model trained on SorpVision achieves high predictive accuracy for both initial and secondary sorptivity (R2 ≥ 0.97 for pastes). This approach also generalizes successfully to mortar and concrete, with R2 values of 0.96 and 0.87 for initial sorptivity and 0.74 and 0.65 for secondary sorptivity, respectively. Consequently, SorpVision offers a scalable, low-cost resource to enhance durability assessments and improve the monitoring of water absorption in cementitious materials.

Methods

Real dataset generation

To generate our real dataset, 10 mm paste samples were positioned between orthogonally aligned USB microscope cameras (Fig. 1a), capturing a total of 1,250 images as the water levels rose in the samples (Fig. 1b). Each image was annotated by four independent annotators (Fig. 1c), resulting in 5,000 annotated samples. The USB cameras used for image capture were surrounded by LED lights, enabling image acquisition under two distinct lighting conditions: diffuse reflections, with LEDs turned off under natural room lighting (Fig. 1a, bottom row, and Fig. 1b, top-right columns), and specular reflections, with LEDs turned on in a darker room (Fig. 1a, top row, and Fig. 1b, top-left columns). Moreover, the acquired images (640 × 480 pixels) were downsampled to 448 × 448 pixels, balancing computational efficiency and accuracy20. To ensure transparency, in addition to the dimensions and mass changes of specimens, we have uploaded the entire real and synthetic dataset to Zenodo23, including original 640 × 480 pixels VGA videos, 640 × 480 pixels video frames with annotations, and the 448 × 448 pixels resized versions used for training of our vision-based model.
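As a concrete illustration of the downsampling step, the resize can be reproduced with a few lines of OpenCV; this is a minimal sketch with placeholder file names, not the exact script used for SorpVision:

```python
# Illustrative sketch: downsample one 640 x 480 VGA frame to the 448 x 448
# training resolution; file names are placeholders.
import cv2

frame = cv2.imread("frame_0001.jpg")  # 640 x 480 VGA frame
resized = cv2.resize(frame, (448, 448), interpolation=cv2.INTER_AREA)
cv2.imwrite("frame_0001_448.jpg", resized)
```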

Fig. 1
figure 1

SorpVision Dataset Creation for Automated Water Absorption Measurement in Cementitious Materials. (a) Orthogonally aligned USB microscope cameras capture liquid infiltration under LED-driven specular lighting (top) and ambient diffuse lighting (bottom) conditions. (b) Examples of real and synthetic images, along with their binary masks. (c) Four annotators created binary masks of the water levels in paste samples. (d) Error maps reveal increased variability at lower water levels. (e) More annotators reduce variability, with diminishing discrepancies beyond three annotators. (f) At lower water levels, annotators produce higher error rates and variability, making it harder to capture fine details accurately. The data, adapted from Kabir et al.20, is presented in a modified format.

We additionally present Fig. 2, which outlines the complete data preparation and annotation workflow. Initially, as illustrated in Fig. 2a, all original VGA videos (640 × 480 pixels in MP4 format) were captured from different specimens23. Each video was then converted into a sequence of individual images (640 × 480 pixels), as depicted in Fig. 2b. Because our annotators manually marked the sample boundary and waterfront on an iPad, it was necessary to convert these images (*.jpg) into PDF format. However, as shown in Fig. 2c, an improper “orientation settings” option could distort image dimensions. Therefore, each image was carefully resized to a 6.4 by 4.8 ratio in landscape orientation, using the “scale to fit” feature as indicated in Fig. 2d, prior to PDF creation for accurate annotation.
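The video-to-frame conversion in Fig. 2b can be sketched as a simple OpenCV loop; the paths below are hypothetical and the snippet is illustrative rather than the exact tooling used:

```python
# Illustrative sketch: split an MP4 absorption video into individual
# 640 x 480 frames for annotation; paths are placeholders.
import cv2

cap = cv2.VideoCapture("specimen_01.mp4")
idx = 0
while True:
    ok, frame = cap.read()  # frame is a 640 x 480 BGR image
    if not ok:
        break
    cv2.imwrite(f"frames/specimen_01_{idx:04d}.jpg", frame)
    idx += 1
cap.release()
```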

Fig. 2
figure 2

Data Preparation and Annotation Workflow. (a) Original VGA videos (640 × 480 pixels) were captured from different specimens. (b) Each video was converted into a sequence of individual image frames for annotation. (c) Incorrect PDF print settings distorted image dimensions due to improper orientation and scaling. (d) Correct settings included landscape orientation, scale-to-fit enabled, and paper size matching image dimensions. (e) Images were annotated on an iPad by marking the sample boundary and waterfront using a stylus pen. (f) Annotated PDFs were processed using the custom Jupyter Notebook flood_fill_helper.ipynb to generate final segmentation masks.

After generating the correctly scaled PDFs, our annotators imported them onto an iPad (Fig. 2e), where the waterfront and sample boundary were manually outlined using a stylus pen. This bidirectional data transfer between a personal computer (hosting the JPGs) and the iPad (hosting the resulting PDFs) facilitated streamlined annotation while maintaining image fidelity. Accurate capture of the boundary and waterfront was crucial, as these annotations serve as ground truth for the subsequent segmentation step. Finally, as shown in Fig. 2f, the annotated boundaries were processed through our custom Jupyter Notebook, “flood_fill_helper.ipynb”22, which automatically filled the region between the boundary and the waterfront to produce the final segmented masks (640 × 480 pixels).
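The filling step can be approximated with OpenCV’s floodFill, as in the minimal sketch below; it assumes a closed boundary drawn as dark strokes on a light page and a seed point known to lie inside the wetted region, and is not the notebook’s verbatim code:

```python
# Minimal sketch of the mask-filling idea behind flood_fill_helper.ipynb,
# assuming dark annotation strokes and a closed contour.
import cv2
import numpy as np

page = cv2.imread("annotated_frame.png", cv2.IMREAD_GRAYSCALE)
_, strokes = cv2.threshold(page, 128, 255, cv2.THRESH_BINARY_INV)  # strokes -> 255

h, w = strokes.shape
flood_mask = np.zeros((h + 2, w + 2), np.uint8)  # floodFill needs a padded mask
seed = (w // 2, h - 10)  # assumed point inside the region between boundary and waterfront
filled = strokes.copy()
cv2.floodFill(filled, flood_mask, seed, 255)  # fill the enclosed region

cv2.imwrite("segmentation_mask.png", filled)  # white ROI on black background
```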

Variations in image annotation, shown as an error map in Fig. 1d, highlight the discrepancies that can arise when multiple individuals manually mark regions of interest (ROIs). Increasing the number of annotators from one to two offers diverse perspectives, enhancing model generalization and improving agreement. However, beyond three annotators, inter-annotator agreement plateaus, minimizing additional discrepancies in semantic segmentation (Fig. 1e). It is crucial to recognize that lower water levels present more significant challenges for accurate annotation, as smaller areas increase the likelihood of errors. This is evident in the percentage error and coefficient of variation (COV) among the four annotators (Fig. 1f), where annotators often struggle to capture fine details accurately.
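As a simple, hypothetical way to quantify this variability, the coefficient of variation of the wetted-area fraction can be computed across the four annotators’ masks for a given frame (file names are placeholders):

```python
# Hypothetical sketch: coefficient of variation (COV) of the wetted-area
# fraction across four annotators' binary masks for a single frame.
import numpy as np

masks = [np.load(f"annotator_{k}_mask.npy") for k in range(1, 5)]  # 0/1 arrays
areas = np.array([m.mean() for m in masks])  # wetted-area fraction per annotator

cov = 100 * areas.std(ddof=1) / areas.mean()  # sample COV, in percent
print(f"area fractions: {areas}, COV: {cov:.1f}%")
```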

Synthetic dataset generation

To minimize subjectivity in mask annotation, we created a synthetic dataset of 2,384 images using Microsoft PowerPoint. Cementitious textures were enhanced through artistic effects, lighting adjustments, and contrast modifications, replicating real-world conditions (see Fig. 1b, bottom rows). These textures were created directly at 448 × 448 pixels. Water levels were simulated by overlaying 90% transparent grey layers to signify saturation. Segmented masks were generated automatically by converting the ROI/layers to white and the background to black, ensuring clarity and reproducibility. The dataset included diffuse and specular reflections and varied water level orientations, replicating complex surface interactions observed experimentally. This synthetic dataset23 provided a robust foundation for training computer vision models while reducing biases from manual annotations.

Hardware setup

The hardware configuration shown in Fig. 1a was meticulously designed to ensure an accurate and reproducible assessment of water absorption rates in cement paste cubes. For small paste samples, a transparent glass vial, measuring over 15 mm in width and 27 mm in height, was chosen to securely accommodate the samples while integrating seamlessly with the experimental setup. The vial features a polypropylene snap-top cap, allowing controlled water introduction via a syringe. The syringe dispenses water at a calibrated flow rate over approximately three seconds, ensuring uniform wetting of the sample surfaces. The paste cubes were placed within the vial on top of hexagonal metallic beads to maintain a consistent water submersion depth of 0.5–2 mm, ensuring consistent capillary suction during absorption. The cubes were positioned with their faces orthogonal to the cameras, optimizing visibility and minimizing variability. Minor tilts in cube orientation were permissible, as the computer vision algorithms accommodate slight rotations within the ROI. Image acquisition was conducted using two orthogonally aligned USB microscopy cameras, each priced at less than $30. These cameras offer Full HD (1920 × 1080 pixels) or VGA (640 × 480 pixels) resolution, a minimum spatial resolution of 10 µm, and magnification ranging from 50× to 1000×. Equipped with a 120-degree angle of view, the cameras were surrounded by LED lighting to support image capture under both diffuse and specular reflection conditions. The lighting configuration is adjustable to accommodate different environments, enhancing the visibility of wetted areas on the sample surfaces. Focused image capture was achieved by fine-tuning the distance between the camera lens and the ROI. This integrated setup enables the detection of water levels using advanced computer vision models, which analyze changes in the visible wetted area on the sample surfaces. The dual-camera configuration improves measurement accuracy by capturing complementary views of the sample, allowing a robust estimation of water penetration and sorptivity values20.

For larger mortar and concrete specimens, it is essential to demonstrate the scalability and reliability of our approach. As shown in Fig. 3a, the image-based sorptivity measurement technique developed for smaller paste samples can be effectively adapted to larger mortar and concrete specimens. This versatility is achieved by refining two critical components: (1) applying a geometry-based adjustment for specimen dimensions and aspect ratios and (2) training a feedforward neural network to model fluid penetration behavior across multiple material scales. Figure 3b and c highlight the role of cross-sectional dimensions in the time-normalization procedure. For square specimens (e.g., paste or mortar cubes), the time factor is multiplied by √(0.39/D), where D is the side length of the cross-section in inches (D > 0.39 in.), capturing variations in overall scale. By contrast, for rectangular cross-sections, which are more common for concrete specimens, with an aspect ratio (AR) outside the 0.7–1.4 range, the time factor is normalized by √(0.39/(W × AR)), where W is the width of the specimen in inches (W > 0.39 in.). For rectangular specimens with width-to-length ratios between 0.7 and 1.4, however, AR is assumed to be 1 for normalization purposes; hence, the time factor is normalized by √(0.39/W).
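These normalization rules can be summarized in a short helper; the function below is our paraphrase of the text, with an illustrative name and call signature:

```python
# Sketch of the geometry-based time normalization; the formulas follow the
# text, the function itself is illustrative.
import math

def normalize_time_factor(t, width_in, length_in=None):
    """Scale time factor t for cross-sections wider than the 0.39 in. reference."""
    if length_in is None:  # square section of side D = width_in
        return t * math.sqrt(0.39 / width_in)
    ar = width_in / length_in  # width-to-length aspect ratio
    if 0.7 <= ar <= 1.4:  # near-square: AR assumed to be 1
        return t * math.sqrt(0.39 / width_in)
    return t * math.sqrt(0.39 / (width_in * ar))
```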

Fig. 3
figure 3

Demonstration of Scalable Image-Based Sorptivity Measurements for Paste, Mortar, and Concrete. (a) Adaptation of the technique to various specimen sizes; (b) schematic side views of paste to concrete specimens with square to rectangular cross-sections; (c) aspect ratio-based correction for square and rectangular sections; (d) fixture arrangement for different specimen heights ensuring consistent lighting; (e) USB cameras imaging test specimens from two different viewing angles; and (f) feedforward neural network correlating adjusted time factors with wetted area ratios for reliable fluid penetration or sorptivity estimates of paste, mortar, and concrete specimens.

This method addresses the distinct waterfronts that emerge in different cross-sectional shapes, allowing our feedforward neural network to yield accurate fluid penetration and sorptivity predictions. Figure 3d and e illustrate the flexibility of the experimental setup and imaging system. By adjusting the sample height and fixture arrangement, specimens ranging from 10 mm-tall paste cubes to 2 in.-tall concrete prisms or cylinders can be accommodated. All samples are observed using USB digital microscopes complemented by external lighting to maintain consistent illumination. Finally, Fig. 3f demonstrates how our feedforward neural network correlates the adjusted time factor with observed wetted area ratios to produce reliable fluid penetration or sorptivity estimates. This advancement expands the approach originally tailored for paste samples to larger-scale mortar and concrete specimens.

Feature Pyramid Network (FPN) implementation

Figure 4a illustrates the FPN architecture, which uses a modified EfficientNet-B2 backbone for hierarchical feature extraction24. For optimized performance, this backbone employs compound scaling, depth-wise separable convolutions, and squeeze-and-excitation modules. The encoder downsamples input images (448 × 448) to feature maps with resolutions of 224 × 224, 112 × 112, 56 × 56, and 28 × 28, while the channel depth increases from 24 to 120. Skip connections link encoder features to the decoder, preserving spatial details. The decoder begins with the highest-resolution feature maps, sequentially upsampling and merging them with encoder features via skip connections to recover resolution and refine features. Using 1 × 1 convolutions, feature maps are reduced to 32 channels before being processed through additional convolutional layers to produce high-resolution segmentation masks.
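A minimal sketch of such a model, assuming the segmentation_models_pytorch package; only the backbone choice follows the text, and the decoder channel argument is our guess at how the 32-channel reduction maps onto the library’s API:

```python
# Illustrative FPN construction with segmentation_models_pytorch; settings
# other than the backbone are assumptions.
import segmentation_models_pytorch as smp

model = smp.FPN(
    encoder_name="efficientnet-b2",    # hierarchical feature extraction backbone
    encoder_weights="imagenet",        # assumed pretrained initialization
    decoder_segmentation_channels=32,  # assumed mapping of the 32-channel reduction
    in_channels=3,
    classes=1,                         # one water/no-water mask
)
```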

Fig. 4
figure 4

Feature Pyramid Network (FPN) Implementation for Water Level Detection in Cementitious Samples. (a) The FPN integrates an EfficientNet-B2 backbone for hierarchical feature extraction, using skip connections to preserve spatial details. The decoder refines up-sampled feature maps through convolutional layers, generating high-resolution segmentation masks. (b) Input images, probability masks, and overlays illustrate accurate segmentation of water absorption regions. The sigmoid-activated “predict_step” method ensures precise binary mask predictions under varying conditions. (c) The impact of real and synthetic image datasets on segmentation accuracy (DoM) shows stability improvements with synthetic data and practical training with limited real data. Outliers appear as points below the main distribution. The data, adapted from Kabir et al.20,21, is presented in a modified format.

The SMPModel class, built on the segmentation_models_pytorch library25, handles these operations efficiently. A 14 × 14 MLP ensured acceptable spatial accuracy for dense segmentation. The predict_step method, utilizing sigmoid activation, produced high-resolution probability masks. The model was trained using the Adam optimizer (torch.optim.Adam)26 with an initial learning rate of 0.001. ModelCheckpoint saved the best-performing model, while EarlyStopping, with a patience of 6 epochs, stopped training when performance stagnated to ensure efficient convergence. Setting an appropriate EarlyStopping threshold balances accuracy and training duration: too low a threshold may stop training prematurely, while too high a threshold can lead to unnecessarily long training times27. Figure 4b illustrates the segmentation performance of the proposed FPN model under both diffuse and specular reflection scenarios20. The input images, their corresponding probability masks, and overlays demonstrate the effectiveness of the sigmoid-activated predict_step method in producing accurate and consistent segmentation of wetted regions. The results emphasize the network’s capability to precisely delineate wet boundaries across different reflection conditions.
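The checkpointing and early-stopping configuration can be sketched with PyTorch Lightning callbacks; the patience (6 epochs) and Adam learning rate (0.001) follow the text, while the monitored metric and trainer settings are assumptions:

```python
# Illustrative training configuration with PyTorch Lightning.
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint, EarlyStopping

checkpoint = ModelCheckpoint(monitor="val_loss", mode="min", save_top_k=1)
early_stop = EarlyStopping(monitor="val_loss", mode="min", patience=6)

trainer = pl.Trainer(max_epochs=100, callbacks=[checkpoint, early_stop])
# trainer.fit(module, train_loader, val_loader)
# where module.configure_optimizers() would return
# torch.optim.Adam(module.parameters(), lr=0.001), per the text.
```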

Impact of dataset size on FPN model accuracy

Figure 4c demonstrates the relationship between the number of real images and segmentation accuracy, quantified by the Degree of Matching (DoM), which is analogous to the Intersection over Union metric28. A DoM value approaching 1 indicates near-perfect alignment between predicted and ground-truth masks. The data show that increasing the number of real images significantly improves segmentation accuracy, while synthetic images primarily stabilize performance, especially when there is limited availability of real images. Notably, the FPN model achieves effective training with only a few hundred real images20, a reflection of the low Kolmogorov complexity inherent to images captured by USB microscope cameras29,30,31. These cameras focus on ROIs with consistent backgrounds and minimal variability. Moreover, the computational efficiency of the FPN architecture and its capability to effectively employ small datasets support its success in attaining strong segmentation performance across diverse synthetic dataset sizes. This underscores the practicality of using cost-effective water absorption analysis setups in cementitious materials systems.
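For reference, DoM can be computed as an intersection-over-union-style score between predicted and ground-truth masks; the exact formulation below is our IoU-based reading of the analogy:

```python
# IoU-style Degree of Matching between two binary masks
# (1.0 indicates perfect agreement).
import numpy as np

def degree_of_matching(pred, truth):
    pred, truth = pred.astype(bool), truth.astype(bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:  # both masks empty: treat as a perfect match
        return 1.0
    return np.logical_and(pred, truth).sum() / union
```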

Traditional sorptivity analysis

Cement paste cubes measuring 10 mm were cast in silicone molds with varying w/c ratios ranging from 0.4 to 0.8. Mixing was performed using a vortex mixer at 3000 rpm for 2 minutes, followed by one day of moist curing. After demolding the samples, they were either cured in a sealed container at 93 ± 5% relative humidity or immersed in saturated lime water for up to 7 days. Hydration was stopped by immersing the specimens in isopropyl alcohol for two consecutive days, with periodic changes to the solution, and then placing them in a vacuum desiccator containing silica gel and soda-lime pellets for 14 days to stabilize their mass. For sorptivity measurements, following ASTM C158512, the samples were sealed with epoxy resin on five faces, leaving one face exposed to water. During testing, the mass change resulting from one-dimensional unsaturated water flow was recorded at regular intervals over 24 hours. The absorbed water was normalized using i = m / (a · ρ), where m represents the mass change, a denotes the exposed area in contact with water, and ρ signifies the water density. The normalized water absorption, referred to as penetration (i, mm), was plotted against the square root of time, and the initial (Si) and secondary (Ss) sorptivity values were calculated from the slopes of linear fits to the respective data phases using the least squares method. Two-inch mortar cubes were cast and subjected to a similar procedure; their hydration was stopped by soaking in isopropyl alcohol for one week, followed by drying in a 60 °C oven for over two weeks until no further mass change was observed. Likewise, 2 × 4 inch concrete cylinders were sliced and dried in a 40 °C oven for over two weeks, then placed in a 105 °C oven for an additional 24 hours to complete the drying process.
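The normalization and slope extraction can be sketched as follows; the phase boundary at 6 hours follows ASTM C1585 conventions and, together with the units and names, is an assumption of this illustration rather than the authors’ exact script:

```python
# Illustrative computation of penetration i = m / (a * rho) and of the
# initial (Si) and secondary (Ss) sorptivity slopes by least squares.
import numpy as np

def sorptivities(t_s, mass_g, area_mm2, rho_g_mm3=1e-3):
    i = mass_g / (area_mm2 * rho_g_mm3)  # penetration, mm
    sqrt_t = np.sqrt(t_s)                # s^0.5
    initial = t_s <= 6 * 3600            # early, capillary-dominated phase (assumed)
    secondary = ~initial                 # later, slower phase
    Si = np.polyfit(sqrt_t[initial], i[initial], 1)[0]
    Ss = np.polyfit(sqrt_t[secondary], i[secondary], 1)[0]
    return Si, Ss                        # mm / s^0.5
```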

Linking wetted area and absorption time with sorptivity

Following the refinement of our computer vision model, we observed water-level changes across various cementitious systems. In paste cubes (w/c ratios 0.4–0.8, cured for 1–7 days), penetration exhibited a linear relationship with the square root of time due to one-dimensional liquid movement, uniform drying, and the dominance of capillary forces over gravity in fine-pored matrices10,32. Initial sorptivity varies significantly with w/c ratio and curing age, while secondary sorption progresses more slowly than initial sorption as entrapped air diffuses through water-filled pores to the surface, driven by internal-external pressure differences33. Our machine learning model learns patterns within the data and establishes strong relationships by combining the wetted area ratio and absorption time as key input parameters. This approach allows the model to predict penetration values accurately, achieving R2 values of 0.96 and 0.93 for the training and test sets, respectively.

Machine learning implementation

To develop a robust machine learning model for predicting penetration and sorptivity, we implemented a feedforward artificial neural network (ANN) using TensorFlow22. The dataset included two input features (wetted area ratio and absorption time), and the target variable was a single node predicting either penetration or sorptivity. Data preprocessing involved converting all numerical values to the float32 format to ensure computational efficiency34. The training dataset consisted of 1,440 data points, while the testing dataset, which contained 144 points, was kept separate to ensure an unbiased performance evaluation on unseen data. The ANN architecture included three hidden layers with 64, 32, and 16 nodes, utilizing the ReLU activation function to capture non-linear relationships effectively. A 10% dropout layer was introduced to prevent overfitting and enhance generalization by randomly deactivating neurons during training. The output layer generated a single prediction for either penetration or sorptivity. With a learning rate of 0.00005, the Adam optimizer minimized the mean squared error (MSE) loss function during training. The model was trained for 500 epochs with a batch size of 16, balancing computational efficiency with convergence. Architectural experiments compared a flat model with a consistent neuron count across layers to a pyramidal structure with progressively fewer neurons (64, 32, 16)22. The pyramidal design demonstrated superior generalization, providing slightly higher yet more stable MSE values than the flat architecture20.
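A minimal Keras sketch of this pyramidal network follows; layer sizes, dropout, optimizer, and training settings are taken from the text, and everything else (variable names, data handling) is assumed:

```python
# Minimal sketch of the pyramidal ANN; hyperparameters follow the text.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),                 # wetted area ratio, absorption time
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dropout(0.10),              # 10% dropout for generalization
    tf.keras.layers.Dense(1),                   # penetration or sorptivity
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5), loss="mse")
# model.fit(X_train.astype("float32"), y_train.astype("float32"),
#           epochs=500, batch_size=16)
```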

Table 1 compares the current Data Descriptor with our earlier Nature Communications article20 and highlights several enhancements. Notably, we now provide time-lapse videos of paste absorption tests in VGA resolution (640 × 480 pixels) in MP4 format, along with the associated image sequences and segmented masks. All datasets are available via a permanent Zenodo link to ensure long-term access23. We also describe the manual annotation process for the real dataset, performed using a stylus pen, and include an evaluation of annotation errors. We additionally outline a detailed methodology for measuring sorptivity in larger mortar and concrete samples, an aspect previously unaddressed. Moreover, a more detailed architecture of our EfficientNet-based FPN is now documented. Finally, the training paste dataset is presented in greater depth, with information on sample sizes, mass variation over time, and the corresponding penetration measurements, provided in Dataset.xlsx23.

Table 1 Dataset Comparison Summary.

Data Records

The data supporting this study includes 7,384 images (5,000 real and 2,384 synthetic) of prismatic cementitious samples, penetration-time data, and wetted area ratios recorded during water absorption experiments. The datasets are organized as follows22,23:

  • Image Annotation: We developed a Jupyter Notebook, flood_fill_helper.ipynb, to streamline image annotations by maintaining fixed sample boundaries while multiple annotators enhance model generalization and mask agreement.

  • Train and Test Datasets: Images + Masks for model training and evaluation are provided in datasets.zip.

  • Penetration-Time Data: Water penetration measurements and the corresponding time values are provided in sorptivity_train_cv.csv and sorptivity_test_cv.csv.

Technical Validation

To ensure the reliability and reproducibility of SorpVision, several measures were taken throughout data collection, annotation, and curation. These steps collectively validate both the quality of the dataset and the robustness of the resulting computer vision models:

  • Robust Annotations and Dataset Generation (Figs. 1, 2)

We captured water absorption in cementitious samples using orthogonally aligned USB cameras under diffuse and specular lighting (Fig. 1a). Each frame was annotated by multiple users, helping mitigate subjective bias (Fig. 1c–f). To maintain consistency, a standardized workflow converted video frames into annotated PDF files and binary masks (Fig. 2). This process minimized labeling errors and ensured that both the real (5,000 images) and synthetic (2,384 images) datasets accurately captured the evolving waterfront.

  • Reproducible Experimental Design (Figs. 1, 3)

    All specimens were prepared following established curing and drying protocols, minimizing batch-to-batch variations. Small paste cubes and larger mortar or concrete specimens were tested with the same camera configuration, and an aspect-ratio–based correction was applied to ensure consistent interpretation of the waterfront (Fig. 3). By controlling lighting and alignment, we obtained high-quality images with minimal background noise, increasing both reliability and ease of segmentation.

  • Cross-Validation with Standard Methods (Figs. 3, 4)

    To confirm the accuracy of image-based sorptivity, we tracked water absorption with traditional mass-uptake measurements per ASTM C1585. Segmented “wetted area vs. time” predictions correlated well with gravimetric data (R2 > 0.9 for paste samples), indicating that the trained Feature Pyramid Network (Fig. 4) faithfully captured capillary absorption. All code and datasets are openly available22,23, allowing independent verification of these results and supporting future research on automated durability assessments of cementitious materials.

Usage Notes

We provide detailed guidance for implementing the automated sorptivity estimation pipeline to facilitate the reuse of our data and methodology22. This pipeline integrates machine learning models and computer vision algorithms to analyze water absorption dynamics in cementitious systems with high accuracy and minimal user intervention. The workflow is designed to operate seamlessly on platforms supporting Python 3.x and Jupyter Notebooks. It has been optimized for Google Colab environments with a CUDA-enabled GPU for accelerated processing performance. The repository22,23 includes:

  • Jupyter Notebooks: For model training, validation, and inference.

  • Custom scripts: Python files (models.py, dataset_loader.py, evaluation.py, util.py) for modular data preprocessing, visualization, and performance assessment.

  • Datasets: Train and test datasets, along with the pre-trained FPN model.

    Users can replicate the workflow by following these steps22:

  • Setup and Environment: Organize project folders as described (Colab Notebook, src, visualization, etc.) and ensure access to Google Drive with at least 1 GB of free space.

  • Execution: Open model_compile.ipynb in Google Colab, mount the Drive (see the snippet after this list), and run cells sequentially. The entire model training (on an NVIDIA T4 GPU) requires approximately 2 hours for datasets with thousands of images.

  • Visualization: The pipeline generates penetration-time graphs and binary masks of water-absorbed regions post-training.
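For convenience, the Drive mount referenced in the Execution step uses the standard Colab API:

```python
# Standard Google Colab Drive mount, as referenced in the Execution step.
from google.colab import drive

drive.mount('/content/drive')  # grants the notebook access to Google Drive
```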