A dataset for aqueous surfactant phase behavior as a function of temperature and composition

Rummel, Felix; Warren, Patrick B.; Bray, David J.; Sumer, Zeynep; Booth, Jonathan; Shkurti, Ardita; Anderson, Richard L.

doi:10.1038/s41597-025-06306-9

Data Descriptor
Open access
Published: 26 November 2025

Felix Rummel ORCID: orcid.org/0000-0002-8568-8128¹,
Patrick B. Warren¹,
David J. Bray ORCID: orcid.org/0000-0002-6107-0091¹,
Zeynep Sumer²,
Jonathan Booth¹,
Ardita Shkurti¹ &
…
Richard L. Anderson¹

Scientific Data volume 12, Article number: 2042 (2025)

Abstract

Presented here is a dataset (PhDat) discretizing the aqueous phase behavior of 143 surfactants as a function of temperature and composition. Across the complete dataset, we classify the discretized state points into 118 distinct possible phase states, comprising both single- and two-phase regions, taking a probabilistic approach to describe phase transitions and narrow biphasic gaps. We also outline the workflow adopted to obtain the digitized phase diagrams. We anticipate this dataset will be useful for machine learning or similar applications, in the practically important field of surfactant formulation. The dataset has been designed to be extensible such that it can accommodate a wider variety of surfactant mixtures or non-surfactant molecule phase diagrams.

Background & Summary

Surfactants are a key ingredient in many formulated products, ranging from pharmaceuticals and cosmetics, to paints and cleaning products^{1,2,3,4,5,6,7,8}. Their usefulness stems from their ability to decrease surface tension at interfaces, which allows, for example, immiscible liquids to be blended into emulsions. The action of surfactants can be attributed to their structure, comprising one or more polar or charged ‘head’ groups attached to one or more non-polar ‘tail’ groups⁹. It is convenient to broadly classify surfactants by the state of charge in aqueous solution: thus, ionic surfactants dissociate completely to form surfactant ions and counterions (within this classification, anionic surfactants are negatively charged and cationic surfactants positively charged), nonionic surfactants remain neutral, and zwitterionic surfactants carry both positive and negative charge. In terms of physical chemistry, the phase behavior of surfactants is of particular interest. It is, for example, a key determinant of the rheology, which is relevant both for formulated products, for instance in the home and personal care sector, and also in materials processing.

Thermodynamically, the phase behavior of an aqueous surfactant solution (e.g. a binary surfactant–water mixture) is summarized in a phase diagram, which gives the phase state as a function of temperature (and/or pressure) and composition. A full discussion of the rich complexity of surfactant phase behavior is not appropriate for this article, so we limit ourselves to sketching out the main features below. For more details the reader may wish to consult Laughlin’s comprehensive monograph on the subject¹, or Abbott¹⁰.

Two prototypical examples of surfactant phase diagrams are shown in Figs. 1a and 2a. At different compositions and conditions the solution exists in different phases or phase states. The phase state depends on how the surfactant self-organizes into assemblies such as micelles, rods, or lamellar sheets. These assemblies may then be further ordered into ‘mesophases’ such as cubic packings of micelles, hexagonal arrangements of rods, stacked lamellar sheets, and so on.

The properties of these phases vary dramatically, with disordered micellar solutions typically being of low viscosity (unless the micelles grow into rod-like or worm-like structures), and mesophases such as cubic micellar packings, etc, often being highly viscous liquids or even soft solids. Usually the transitions between the mesophases are weakly first-order so that there are narrow biphasic gaps (two-phase regions) between them, and they also often ‘melt’ on increasing temperatures into a disordered liquid phase. Below the freezing point of the surfactant, mesophases or surfactant solutions coexist with various solid surfactant phases, which are often stoichiometrically hydrated crystals. Below the freezing point of water itself, ice appears as an additional phase, leaving a liquid region of diminishing extent which vanishes at a classic eutectic point. Conversely at high temperatures, and low concentrations, many nonionic surfactants exhibit what is usually termed a ‘cloud’ region. This is actually a broad two-phase region where the surfactant solution coexists with essentially excess water. This cloud region may ‘collide’ with the mesophase regions, further enriching the phase diagram and giving rise to yet further phase state possibilities such as a ‘sponge’ phase (dilute, disordered, singly-connected lamellar sheets).

Thus we see that an individual surfactant can exhibit a potentially complicated and rich phase diagram, with many features such as the above-listed intermediate mesophases and two-phase regions. While it is perfectly possible to collect this phase data experimentally, it would be highly desirable to be able to predict the phase behavior, especially for novel surfactants which may not yet have been synthesized and purified. This is especially attractive in the current surfactant markets, motivated by a pressing move to rapidly decarbonize the multi-billion dollar surfactant supply chain (moving away from petrochemicals and traditional plant-based feedstocks towards sustainably-sourced raw materials).

While simulation approaches such as molecular dynamics and coarse-grained methods such as dissipative particle dynamics are being continuously developed to capture phase behavior and other properties with improving accuracy^{11,12,13,14}, these methods still possess some shortcomings due to high computational cost and accessible time scales^{15,16}. Machine Learning (ML) approaches may offer a dramatically more cost-effective alternative to this, potentially enabling the rapid prediction of complete phase diagrams for novel surfactants or filling in partially complete phase diagrams, allowing for a small amount of experimental data to be supplemented by ML data. ML has been used to predict phase diagrams^{17,18,19,20,21,22,23} and also other chemical properties based only on relatively simple descriptors such as the SMILES string^{24,25,26}.

All ML approaches rely on the availability of data, with more available data generally resulting in a better outcome²⁷. This is also true for simulation approaches which rely upon experimental observables to fit and validate models. To the best of our knowledge, the most complete (although not readily accessible) surfactant phase behavior dataset suitable for use in ML campaigns was collected by Bell²³. This comprised a dataset for 23 nonionic surfactants covering binary temperature-composition phase diagrams. Bell used this dataset to train ML algorithms and to predict phase diagrams. This was taken further by Thacker and coworkers¹⁷, one of whose findings was, unsurprisingly, that in order to further improve the predictive power of ML algorithms looking to predict the phase behavior of surfactants, a larger, more comprehensive dataset for surfactant phase behavior must be constructed. This need for more phase diagram data was the motivator behind the work presented in this article. It is hoped that the dataset presented here can be expanded by others in future covering an even wider range of phase diagrams from a more diverse set of surfactants, leading to better simulation and ML models.

Our data discovery effort captured binary aqueous composition/temperature phase diagrams for 143 surfactants found in the literature. These span both nonionic and ionic molecules. A semi-automatic workflow (summarized below) was developed to expedite the data extraction process. Unlike in previous work, where a strict categorical assignment of the phase state was made, in the present dataset the phase state is represented probabilistically whereby each composition/temperature point is assigned a probability of being in a given phase state, with single phase regions and two-phase coexistence regions treated on an equal footing. This allows for experimental uncertainties and broadened phase boundaries to be represented more accurately, but is also directly useful for ML approaches.

Figure 3 indicates the distribution of data, making up 99 % of the total collected, both in terms of the frequency of occurrence of individual phase states and in terms of how many phase diagrams contain a given phase state. This shows that there is a wide spread both in how many diagrams contain a given phase state, and in how large a given phase state is in a phase diagram. For example the L₁ phase features as a large area in many phase diagrams, whereas the L_β and I₁ phases are less well represented.

Methods

Previously, Bell discretized the phase behavior manually on a grid of state points and identified the corresponding phase state by visual inspection²³. While this is a possible method of data collection, though likely resulting in some inherent measurement error, it was desirable to automate this process as far as possible to ease creation of a larger dataset with a chosen grid size for each phase diagram to be collected. With this in mind we automated many steps of the procedure. The final workflow from obtaining the literature data to the final database entry is illustrated in Fig. 4 and outlined below. It can be broken down into three main steps: data collection, image processing, and data extraction.

Data Collection

Phase diagrams were obtained from a variety of sources, including papers and books as electronic or scanned physical copies. Table 1 summarizes the collected nonionic diagrams, while Table 2 summarizes the ionic diagrams. The surfactant name and SMILES string were determined from the source. If the surfactant was described as polydisperse the average structure was used for naming purposes and the number of chemical groups rounded to the nearest integer using a standardized notation as defined in the Table caption. For example, the compound R₁₁COO(EO)_12.8CH₃ studied by Fujiwara et al.²⁸ is renamed as C₁₁C(O)E₁₃Me in our notation (see Table 1).

Table 1 Nonionic surfactants in PhDat detailing database key, surfactant structure, experimental method (assigned an index to save space - see Table 3), reference and figure number.

Table 2 Ionic surfactants in PhDat detailing database key, surfactant structure, experimental method in original reference (assigned an index to save space - see Table 3), reference and figure number.

Table 3 Mapping between index given for experimental method in Tables 1 and 2 and the measurement type employed in the original literature.

Initially the selection was sorted by visual inspection into 99 complete diagrams and 44 incomplete diagrams. A diagram was classed as incomplete if its phases were ambiguous in their definition or boundaries, so that it was unclear where the phase transitions lie; otherwise it was classed as complete. Further, for this work only binary (water/surfactant) phase diagrams with numerically labeled temperature and composition axes were retained.

Assignment of Phase State Labels

A total of 118 unique phase states (both one- and two-phase regions) were identified, with the additional symbol U being used as a label for unknown regions in incomplete diagrams. The single phase regions are described in Table 4 and a comprehensive list of all phase states (i.e. both one- and two-phase regions) is given in Table 5. To manage this, a consistent naming scheme was substituted for the original labels given in the diagrams. This often required detailed perusal of the source text in addition to the phase diagram itself, and some previous familiarity with surfactant phase science was invaluable.

Table 4 Adopted phase state labels for single phase regions, across all mapped phase diagrams.

Table 5 List of all 118 identified phase states (excluding U) across all sources, adopting the labeling scheme in Table 4.

Having digitized a wide array of experimentally collected phase diagrams, it became clear that there are large differences between sources in the way phase diagrams are presented. Some of this may well be due to some diagrams dating back several decades, but more modern papers still show differences. These include not just labeling variations, such as a lamellar phase being reported as D, L_α or G, but also differences in how phase boundaries, uncertainty, and disputed, unknown or unidentified phases (in older diagrams) are indicated.

For the future, we recommend that when reporting phase diagrams a clear description of the labels should be provided. Further, each region on the phase diagram should be labeled; describing an unlabeled region in the text only creates extra work when extracting the diagrams. Additionally, it is generally clearer to have diagrams without grid lines, or to draw them in a different color or thickness; the same goes for indicating tie lines in two-phase regions. It is also not always clear whether some regions are simply broad two-phase regions or phase transitions, especially when these are not labeled; as such it may be beneficial to indicate transitions by solid lines and simply change their thickness. It may also be helpful to publish the raw data along with the phase diagram to allow for direct use of this.

Diagram Digitization

Since the phase diagrams were obtained from a variety of sources, it was necessary to design and establish a consistent methodology to extract a digital image of the phase diagram from various media. Screen capturing or figure downloads, where available, were used for electronic phase diagrams, whilst a scan of the phase diagram was used as a starting point for physical papers and books. These images were processed using a custom collection of Python scripts packaged into a user interface named CurveClaw²⁹. The workflow required the diagram to be fully enclosed on all four sides. While most diagrams were already presented as such, in some cases it was necessary to do this manually using image editing software such as Paint. In these cases lines visually parallel to the axes of the plot were added to enclose the diagram. Each image was then loaded and converted to binary format. The largest contour by area was used to identify the graph area, and the contents of this contour were used to extract the phase diagram from the original binary image. In cases where the diagram did not have straight axes but was distorted, as is often the case with scans taken from books, the four corner points were mapped onto a rectangle. As a result a cropped image, containing only the contents enclosed by the outline of the diagram, was retained for further use.

For all phase diagrams sampled, the temperature and composition values were directly extracted from the corresponding original publications, as listed in Tables 1 and 2. Each phase diagram was digitized to preserve the full numerical range reported in the source literature, typically spanning 0–100 °C (min. −55 °C, max. 420 °C) and 0–100 wt% surfactant. Full temperature and composition ranges can be found in the published dataset. No extrapolation or interpolation beyond the published data was performed except as described in this article.

Image Rectifying

To proceed to data extraction, each phase shown on the image needs to be well defined, such that the phase domain is fully enclosed by solid boundaries. To do this, the following rectifications were applied to the image as required. If a phase transition was defined by both a dashed line and a solid line running in parallel close to each other, the dashed line was removed. If a phase transition was only defined by a dashed line, the line was made solid. If a phase transition was defined by two parallel dashed lines (or solid lines), a solid line was drawn in the middle and both original lines removed. If an area was left open, the boundary line was continued at its current slope until it intercepted the edge of the diagram or another boundary line. Some diagrams indicated more gradual phase transitions by adding additional dashed or solid lines either horizontally across the diagram or in parallel to a phase transition; these lines were removed. All labels, arrows, data points and other marks not indicating a phase transition were removed. Below the freezing point of the surfactant, horizontal lines (i.e. isotherms) were added to delineate between two-phase regions with different coexisting phases. Eutectic and peritectic points (both cases where three phases coexist at a single temperature) were also identified and the appropriate horizontal lines drawn in. Examples of how phase diagrams were edited in this way are shown in Figs. 1 and 2. Finally, each extracted diagram was inspected manually to ensure all phase transitions of the original phase diagram were captured correctly throughout the process.

Image of unique phases

Once the image was rectified, the boundary lines were thinned to a minimum, by sequentially removing pixels in order of their adjacency to white pixels without increasing the number of areas present in the image. This maximized the image area that could be sampled effectively (i.e. well defined as a specific phase) and ensured all phase transition lines were of equal width (namely one pixel). This produces a binary image where ‘1’ (i.e. black) represents part of a domain’s boundary line and ‘0’ (i.e. white) the middle of the phase domain. Unique phase domains can next be identified by labeling connected regions with scipy.ndimage.label() in Python, using the cleaned binary image as input; conveniently the number of phase domains n in the image is then simply the maximum pixel value in the output of this function. The value of n (≤118) will vary between phase diagrams, but the numerical value cannot be simply substituted for the actual phase state label; rather a curated, unique, individualised mapping is required for each image to convert the numbered regions to the standardized phase state labels.
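The labeling step can be illustrated with a short pure-Python stand-in for scipy.ndimage.label(). This is only a sketch under stated assumptions: regions are taken to be 4-connected (the default connectivity of that function in two dimensions), the white domain pixels (0) are treated as the features to label so that boundary pixels (1) receive the value 0, and the function name label_domains, the toy image and the region-to-phase mapping are hypothetical and not taken from CurveClaw.

```python
from collections import deque


def label_domains(image):
    """Label 4-connected regions of white (0) pixels; boundary pixels (1) get 0.

    Returns (labels, n), where labels has the same shape as image and n is
    the number of regions found, i.e. the maximum value in labels.
    """
    n_rows, n_cols = len(image), len(image[0])
    labels = [[0] * n_cols for _ in range(n_rows)]
    n = 0
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if image[r0][c0] != 0 or labels[r0][c0] != 0:
                continue  # boundary pixel, or domain pixel already labeled
            n += 1
            labels[r0][c0] = n
            queue = deque([(r0, c0)])
            while queue:  # breadth-first flood fill of one domain
                r, c = queue.popleft()
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= rr < n_rows and 0 <= cc < n_cols
                            and image[rr][cc] == 0 and labels[rr][cc] == 0):
                        labels[rr][cc] = n
                        queue.append((rr, cc))
    return labels, n


# Toy 4x5 image: a one-pixel-wide boundary (1) separates two domains (0).
image = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
labels, n = label_domains(image)

# Hypothetical curated mapping from region number to standardized label;
# in practice this has to be written by hand for each diagram.
region_to_phase = {1: "L1", 2: "L_alpha"}
```

The region numbers depend only on scan order, which is why the hand-curated mapping to phase state labels remains necessary for every image.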

Dataset Generation

To create a dataset from the image, a grid of sample points was extracted from the image using the range of composition and temperature of the phase diagram and specifying the grid resolution. Here a common resolution of 1 °C and 1 wt% was used (although the user may specify any desired grid spacing). Each grid point was then mapped to the equivalent test pixel on the image (rounding to the nearest pixel as required). Next, for each of these test pixels, the distance to all the pixels in a given phase was calculated. Since each pixel is simply a value in an array, the row and column difference between two pixels can be used to obtain the distance they are apart on each axis. Here we treat the 1 °C and 1 wt% steps as being of equal length to obtain the final Euclidean distance between any two points, although different weightings could also be used. The minimum distance, d_i, of the test pixel to each phase i (i = 1, …, n) is used to assign the phase state probability, P_i, of the test pixel according to \(P_i = e^{-d_i/2}\). Note that if a test pixel is in phase i the distance to that phase is zero and as such P_i = 1, while if a test pixel falls onto a boundary pixel it will be equally likely to be in the adjacent phases. Finally, for each test pixel, all probabilities below the threshold of 10⁻³ were set to zero for simplicity and the resulting probabilities were normalized to sum to one. This resulted in the final output matrix, with each row corresponding to a particular temperature and composition and each column corresponding to the probability of being in a particular phase state.
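The probability assignment described above can be sketched in a few lines of Python. This is an illustration rather than the CurveClaw implementation: the function names are hypothetical, distances are assumed to be measured in pixel units on a labeled image in which boundary pixels carry the value 0, and the brute-force distance search is chosen for clarity, not speed.

```python
import math


def grid_to_pixel(value, axis_min, axis_max, n_pixels):
    """Map an axis value to the nearest pixel index along that axis.

    Assumes a linear axis; for the temperature axis the index would usually
    also need flipping, since image rows are counted from the top.
    """
    frac = (value - axis_min) / (axis_max - axis_min)
    return round(frac * (n_pixels - 1))


def min_distances(labels, row, col, n_phases):
    """Minimum Euclidean distance from pixel (row, col) to each phase 1..n."""
    best = [math.inf] * n_phases
    for r, line in enumerate(labels):
        for c, lab in enumerate(line):
            if lab > 0:  # label 0 marks boundary pixels, which are skipped
                d = math.hypot(r - row, c - col)
                if d < best[lab - 1]:
                    best[lab - 1] = d
    return best


def phase_probabilities(distances, threshold=1e-3):
    """Apply P_i = exp(-d_i / 2), zero values below threshold, normalize."""
    raw = [math.exp(-d / 2.0) for d in distances]
    raw = [p if p >= threshold else 0.0 for p in raw]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("test pixel is too far from every phase")
    return [p / total for p in raw]


# Toy labeled image: phase 1 on the left, phase 2 on the right, with a
# one-pixel boundary (0) in the middle column.
labels = [
    [1, 1, 0, 2, 2],
    [1, 1, 0, 2, 2],
]
col = grid_to_pixel(50.0, 0.0, 100.0, 5)  # 50 wt% on a 0-100 wt% axis
on_boundary = phase_probabilities(min_distances(labels, 0, col, 2))
inside = phase_probabilities(min_distances(labels, 0, 0, 2))
```

In this toy case the boundary pixel is one pixel away from both phases and is therefore split equally between them, while the pixel inside phase 1 keeps a reduced but non-zero probability of being in phase 2, three pixels away.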

Automation of Image cleaning

While it is perfectly possible to perform all of the above image processing steps, adding and removing features as necessary, in any image processing software such as Paint, when a large number of images must be processed it is preferable to automate this process as much as possible. In particular, automating the removal of annotations in the form of lines and marks is very helpful. To do this CurveClaw was developed to enable editing of images and selection of desired features/curves. Key steps embedded in CurveClaw are using a convex hull with four corner points to identify the graph area and cropping the image to this, subsequently transforming the shape onto a rectangle to ensure axes are straight. This is followed by the user inputting an integer n. The program analyzes pixel connectivity and sorts the pixels into connected areas, each with an associated size (pixel count); all pixels in all areas but the n largest are then set to white. This effectively discards all small regions, keeping only the n largest. The user can adjust n until only the desired areas remain, and that image is then saved.
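The "keep only the n largest areas" step can be sketched as follows. This is a minimal pure-Python illustration, not the CurveClaw code: the function name keep_largest_areas is hypothetical, black pixels are assumed to be stored as 1 and white as 0, and 4-connectivity is assumed because the connectivity rule used by CurveClaw is not stated here.

```python
from collections import deque


def keep_largest_areas(image, n):
    """Return a copy of image keeping only the n largest black (1) areas."""
    n_rows, n_cols = len(image), len(image[0])
    seen = [[False] * n_cols for _ in range(n_rows)]
    areas = []  # each area is a list of (row, col) pixels
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if image[r0][c0] != 1 or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            area, queue = [], deque([(r0, c0)])
            while queue:  # collect one connected area of black pixels
                r, c = queue.popleft()
                area.append((r, c))
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= rr < n_rows and 0 <= cc < n_cols
                            and image[rr][cc] == 1 and not seen[rr][cc]):
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            areas.append(area)
    areas.sort(key=len, reverse=True)  # largest pixel count first
    cleaned = [[0] * n_cols for _ in range(n_rows)]
    for area in areas[:n]:
        for r, c in area:
            cleaned[r][c] = 1
    return cleaned


# Toy image: one L-shaped curve (8 pixels) plus two isolated specks that
# stand in for data markers or label fragments.
image = [
    [1, 0, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
cleaned = keep_largest_areas(image, 1)
```

With n = 1 only the L-shaped curve survives; raising n would progressively restore the smaller areas, which mirrors the interactive choice of n described above.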

For selecting individual curves in more difficult cases, readily available curve extraction tools such as CurveSnap or WebPlotDigitizer were trialed but found to be insufficient. As such, CurveClaw includes its own curve selection tool. Here the image is displayed for user interaction and points can be selected by the user for a specific curve; the selected points are stored in the order in which they were chosen. The nearest black pixel to each selected point is found and, if it is near the picture border, it is also checked whether there is a connected pixel on the border; these coordinates are then saved. When the curve is saved, the coordinate list is used to construct a new curve in an empty image of equivalent size. If two sequential points are in the same region (determined using the labeled image as before), a minimum path traversing only pixels belonging to that region is found and plotted on the empty image; if the pixels are not in the same region, a straight line is drawn between them instead. The minimum path is found by treating the image as a maze, where white pixels are walls and only black pixels with the correct label can be traversed; the shortest path is then simply the path between the selected pixels with the fewest steps (only up, down, left and right steps are allowed). Sometimes more than one minimum-distance path is found. Owing to the probabilistic nature of the data it is not necessary to find the perfect minimum path, and since only pixels on the original curve can be selected, we always recover some part of the original curve; the error potentially incurred is therefore likely negligible compared to the error in the data collection of the phase diagram. Once all curves have been defined, all the phase regions have effectively been extracted. All curves are overlaid to reconstruct the phase diagram, and the user gives information about the axis sizes and whether log scales are present; this is used to construct the grid of test points and return the probabilistic data as described above. The workflow is summarized in Fig. 4, with blue boxes indicating steps which are readily automated and green boxes representing steps that require more manual input. Not all steps are always necessary.
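The maze-style search for a minimum path can be illustrated with a breadth-first search, which is a standard way of finding a fewest-step path when only up, down, left and right moves are allowed. The sketch below is an assumption about how such a search might look, not the CurveClaw code; the function name shortest_path and the toy labeled image are hypothetical.

```python
from collections import deque


def shortest_path(labels, start, goal):
    """Fewest-step path between two pixels of the same labeled region.

    labels[r][c] is the region number of a black pixel, or 0 for a white
    pixel (a wall). Only up, down, left and right steps are allowed.
    Returns a list of (row, col) pixels including both end points, or None
    if the two pixels are not in the same region.
    """
    region = labels[start[0]][start[1]]
    if region == 0 or labels[goal[0]][goal[1]] != region:
        return None  # the caller would draw a straight line instead
    n_rows, n_cols = len(labels), len(labels[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            break
        r, c = current
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            rr, cc = nxt
            if (0 <= rr < n_rows and 0 <= cc < n_cols
                    and labels[rr][cc] == region and nxt not in parent):
                parent[nxt] = current
                queue.append(nxt)
    if goal not in parent:
        return None
    path, node = [], goal
    while node is not None:  # walk back from goal to start
        path.append(node)
        node = parent[node]
    return path[::-1]


# Toy labeled image containing a single curve (region 1) on a white
# background (0).
labels = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
]
path = shortest_path(labels, (1, 0), (3, 3))
```

When several paths share the minimum number of steps, breadth-first search simply returns the first one it reaches, which is in keeping with the remark above that any one of the minimum paths is acceptable.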

Data Records

The PhDat dataset containing all collected data is available under the CC BY 4.0 license and is hosted on figshare, accessible via the link https://doi.org/10.6084/m9.figshare.29071202³⁰. Phase diagrams were processed using the methods described above and the results (the metadata and phase state probabilities for each phase diagram) were compiled into a JSON file structured as a list of records, indexed by a data record entry number. Each record thus contains data from one unique source, organized as a dictionary comprising: the SMILES string; the state of the diagram (either complete, or incomplete if some areas are unknown); the name of the chemical compound; the source (e.g. the citation reference to the paper) and its figure location in the source (e.g. the figure number or page number); the purity of the chemical (if given); the measurement methodology, stored in method; the type of the compound (nonionic, anionic, cationic, zwitterionic, or mixed if both cation and anion are surfactants); the solvent, which is water in all records; the labels, a list pairing each original phase label with the label it was assigned in the dataset; the keys for the data (header names); and the values as a list for all data keys. The composition is always given as wt% (weight percent) of surfactant, such that 0 wt% is pure solvent and 100 wt% is pure surfactant. Hence reading each column entry of the list of the set of data keys provides complete information on each discretized point of the diagram, e.g. its composition, temperature and the probability value (as a percentage) for each phase state. Each record retains the full temperature and composition grid corresponding to the original literature source. Note that this format allows for the same compound to have multiple records if there is more than one source for the phase diagram, and one should not assume the SMILES strings are unique. To illustrate the record structure we show in Fig. 5 a generic example (record index “81”) for a compound with SMILES string CCCCOCCO and (hypothetical) phase states taken from Table 6, so that for instance at a temperature of 0 °C and a composition of 50 wt% the probability that the state point is in the isotropic liquid L₁ phase is 50%, the probability that it is in the cloud region (L₁ + W) is also 50%, and the probability that it is a lamellar phase (L_α) is 0%. Note that the grid size here does not reflect the grid size used in the actual dataset, so as to better illustrate the changing probability as one moves across the diagram.
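A record of this kind might be read as sketched below. The exact key names and the layout of the values are assumptions made for illustration (the published JSON file should be consulted for the real names): the keys "SMILES", "state", "type", "solvent", "keys" and "values" are hypothetical spellings of the fields listed above, the values are assumed to be stored as one row per state point aligned with the header names, and the second row of values is made up purely to show a second point. Only the first row reproduces the state point quoted in the text.

```python
import json

# Hypothetical miniature of the dataset file, keyed by record index.
raw = """
{
  "81": {
    "SMILES": "CCCCOCCO",
    "state": "complete",
    "type": "nonionic",
    "solvent": "water",
    "keys": ["temperature", "composition", "L1", "L1 + W", "L_alpha"],
    "values": [
      [0, 50, 50, 50, 0],
      [1, 50, 100, 0, 0]
    ]
  }
}
"""
dataset = json.loads(raw)


def record_to_points(record):
    """Turn one record into a list of dictionaries, one per state point."""
    return [dict(zip(record["keys"], row)) for row in record["values"]]


points = record_to_points(dataset["81"])
first = points[0]
# Probabilities are percentages, so the phase columns of a point sum to 100.
total = first["L1"] + first["L1 + W"] + first["L_alpha"]
```

Pairing the header names with each row in this way gives one self-describing dictionary per discretized point, which is convenient for building training tables.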

Table 6 Example output data: each row represents a particular temperature and composition point on the phase diagram for a specific molecule and gives the probability of that point being in a particular phase state.

Technical Validation

In order to verify the automated data extraction a selection of Bell’s previously analyzed phase diagrams were digitized and sampled with the same grid size as in the original study²³. Here only the phase state of that particular point was sampled using a simplified script compared to the one outlined above, which rather than assigning a probability gives each point a categorical assignment e.g. L₁. The phase state assignments were then compared to Bell: only a handful of data points were found to have a different phase state assignment between our automated and Bell’s manual approach. All of these were grid points which fell onto or very near phase boundaries and as such could easily have been assigned as any of the adjacent phases by the manual approach. We note that the probabilistic assignment of phase states that we have adopted for the main dataset circumvents this problem in its entirety, since any point near a boundary would be assigned to all the nearby phases, with a vector of suitably weighted probabilities.

Usage Notes

The data is provided as a JSON file, and once loaded, data for any of the given surfactants can be retrieved using the record index or be filtered by any of the keys such as the SMILES string. A link to an example workflow³¹ for retrieving data from the dataset is provided along with the dataset in the code availability section. A user may for example use a SMILES string to extract all data on that structure as a list of dictionaries, where each dictionary contains the data for one source. Similarly a user may iterate through all entries and extract only entries containing anionic surfactants. The JSON structure allows for many different ways to extract the desired data.
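The two retrieval patterns mentioned above could look as follows. This sketch is independent of the dataExplorer.py tool and rests on assumptions: the loaded dataset is taken to be a dictionary mapping record indices to records, the key names "SMILES", "type" and "source" are hypothetical, and the three toy records (including their source names) are invented for illustration and are not entries of PhDat.

```python
def find_by_smiles(dataset, smiles):
    """Return all records (one per source) for a given SMILES string."""
    return [rec for rec in dataset.values() if rec.get("SMILES") == smiles]


def find_by_type(dataset, surfactant_type):
    """Return {index: record} for all records of one surfactant type."""
    return {idx: rec for idx, rec in dataset.items()
            if rec.get("type") == surfactant_type}


# Toy stand-in for the loaded file. With the real data one would first do
# something like json.load(open(path_to_downloaded_file)), where the path
# is whatever location the figshare download was saved to.
dataset = {
    "0": {"SMILES": "CCCCOCCO", "type": "nonionic", "source": "source A"},
    "1": {"SMILES": "CCCCOCCO", "type": "nonionic", "source": "source B"},
    "2": {"SMILES": "CCCCCCCCCCCCOS(=O)(=O)[O-].[Na+]", "type": "anionic",
          "source": "source C"},
}

same_compound = find_by_smiles(dataset, "CCCCOCCO")
anionic = find_by_type(dataset, "anionic")
```

Because the same compound can appear in several records, the SMILES lookup returns a list rather than a single record.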

Initial analysis of the dataset indicates that there is an imbalance in observed phase states, as shown in Fig. 3, which gives the number of data points and phase diagrams for each phase. Some phase states are dominant in many diagrams (e.g. L₁, L₁ + W), whereas other phase states such as V₁ and N₁ appear only as small regions in a handful of phase diagrams. Table 5 provides a complete list of phase states identified across all sources. The dataset contains normalized data across all phases for a given point on the phase diagram (for each surfactant and point in the phase diagram, the probability vector is normalized). For incomplete diagrams, removing all points assigned a non-zero probability of the U (unknown) phase state can be useful. Further, since not all phase diagrams were measured across the same range of temperatures and compositions, it may be of use to only consider data points in a particular range.
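Both filtering suggestions can be sketched in a single helper. The function name usable_points, the per-point dictionary layout and the key names "temperature", "composition" and "U" are assumptions for illustration, and the three points are invented.

```python
def usable_points(points, t_range=None, c_range=None, unknown_key="U"):
    """Drop points with any probability of being unknown or out of range.

    points is a list of dictionaries, one per state point; t_range and
    c_range are optional (min, max) tuples, inclusive at both ends.
    """
    kept = []
    for p in points:
        if p.get(unknown_key, 0) > 0:
            continue  # some probability of lying in an unknown region
        if t_range is not None and not (t_range[0] <= p["temperature"] <= t_range[1]):
            continue
        if c_range is not None and not (c_range[0] <= p["composition"] <= c_range[1]):
            continue
        kept.append(p)
    return kept


# Invented points from an incomplete diagram.
points = [
    {"temperature": 10, "composition": 20, "L1": 100, "U": 0},
    {"temperature": 10, "composition": 60, "L1": 40, "U": 60},
    {"temperature": 90, "composition": 20, "L1": 100, "U": 0},
]
known_only = usable_points(points)
known_and_cool = usable_points(points, t_range=(0, 50))
```

Note that dropping points near an unknown region also discards some probability mass assigned to known phases, so whether to filter or to keep U as a class of its own is a modeling choice.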

We endeavored to adhere to the FAIR principles³²: each entry in the database has a unique identifier, its index number, and each entry is given with descriptive keys, metadata, and sources. By using a JSON format we aim to make it universally accessible and easy to add to in future. With anyone being able to send us suggestions via https://doi.org/10.6084/m9.figshare.29071202³⁰ (see Code Availability section), we plan to accumulate new data which will be regularly added to the dataset. We also plan to continually update PhDat with new data for different molecules and mixtures of molecules. By presenting data as reported in the original references while minimizing downstream processing, we hope to stay accurate and allow every user to make the processing decisions themselves.

Data availability

The PhDat dataset is available under the CC BY 4.0 license and is hosted on figshare, accessible via the link https://doi.org/10.6084/m9.figshare.29071202³⁰.

Code availability

All Python scripts referred to in the article can be accessed via CurveClaw²⁹, a bespoke program used for the semi-automated extraction of phase diagram data into digital (numerical) form. CurveClaw is available under a BSD 2-clause license from https://github.com/RummelF/CurveClaw.git; a tutorial is provided along with the code. We also provide dataExplorer.py³¹, a Python-based command-line tool for exploring, analyzing, and summarizing phase-state data associated with chemical compounds or surfactants. It enables researchers to search compound databases using SMILES identifiers, extract phase probability data, and generate structured summaries for further analysis. This can be found at https://github.com/ashkurti/DataExplorer and is also available under a BSD 2-clause license.

References

Laughlin, R. G.The Aqueous Phase Behavior of Surfactants (Academic Press, London, UK, 1994).
Karsa, D. R.Industrial Applications of Surfactants IV (Royal Society of Chemistry, Cambridge, UK, 1999).
Hegazy, M., Abdallah, M. & Ahmed, H. Novel cationic gemini surfactants as corrosion inhibitors for carbon steel pipelines. Corrosion Sci. 52, 2897–2904, https://doi.org/10.1016/j.corsci.2010.04.034 (2010).
Article Google Scholar
Xu, C. et al. Experimental investigation of coal dust wetting ability of anionic surfactants with different structures. Process Safety Environ. 121, 69–76, https://doi.org/10.1016/j.psep.2018.10.010 (2019).
Article Google Scholar
Zhao, M. et al. Study on the synergy between silica nanoparticles and surfactants for enhanced oil recovery during spontaneous imbibition. J. Mol. Liq. 261, 373–378, https://doi.org/10.1016/j.molliq.2018.04.034 (2018).
Weiszhár, Z. et al. Complement activation by polyethoxylated pharmaceutical surfactants: Cremophor-EL, Tween-80 and Tween-20. Eur. J. Pharm. Sci. 45, 492–498, https://doi.org/10.1016/j.ejps.2011.09.016 (2012).
Uchegbu, I. F. & Vyas, S. P. Non-ionic surfactant based vesicles (niosomes) in drug delivery. Int. J. Pharm. 172, 33–70, https://doi.org/10.1016/S0378-5173(98)00169-0 (1998).
Lee, S., Lee, J., Yu, H. & Lim, J. Synthesis of environment friendly nonionic surfactants from sugar base and characterization of interfacial properties for detergent application. J. Ind. Eng. Chem. 38, 157–166, https://doi.org/10.1016/j.jiec.2016.04.019 (2016).
Shaban, S. M., Kang, J. & Kim, D.-H. Surfactants: recent advances and their applications. Composit. Commun. 22, 100537, https://doi.org/10.1016/j.coco.2020.100537 (2020).
Abbott, S. Surfactant Science: Principles & Practice (DEStech Publications Incorporated, 2017).
Panoukidou, M., Wand, C. R., Del Regno, A., Anderson, R. L. & Carbone, P. Constructing the phase diagram of sodium laurylethoxysulfate using dissipative particle dynamics. J. Colloid Interf. Sci. 557, 34–44, https://doi.org/10.1016/j.jcis.2019.08.091 (2019).
Guruge, A. G., Warren, D. B., Benameur, H., Pouton, C. W. & Chalmers, D. K. Aqueous phase behavior of the PEO-containing non-ionic surfactant C₁₂E₆: A molecular dynamics simulation study. J. Colloid Interf. Sci. 588, 257–268, https://doi.org/10.1016/j.jcis.2020.12.032 (2021).
Crespo, E. A., Vega, L. F., Pérez-Sánchez, G. & Coutinho, J. A. P. Unveiling the phase behavior of C_iE_j non-ionic surfactants in water through coarse-grained molecular dynamics simulations. Soft Matter 17, 5183–5196, https://doi.org/10.1039/D1SM00362C (2021).
Anderson, R. L. et al. Phase behavior of alkyl ethoxylate surfactants in a dissipative particle dynamics model. J. Phys. Chem. B 127, 1674–1687, https://doi.org/10.1021/acs.jpcb.2c08834 (2023).
Shelley, J. C. & Shelley, M. Y. Computer simulation of surfactant solutions. Curr. Opin. Colloid Interf. Sci. 5, 101–110, https://doi.org/10.1016/S1359-0294(00)00042-X (2000).
Taddese, T., Anderson, R. L., Bray, D. J. & Warren, P. B. Recent advances in particle-based simulation of surfactants. Curr. Opin. Colloid Interf. Sci. 48, 137–148, https://doi.org/10.1016/j.cocis.2020.04.001 (2020).
Thacker, J. C. R., Bray, D. J., Warren, P. B. & Anderson, R. L. Can machine learning predict the phase behavior of surfactants? J. Phys. Chem. B 127, 3711–3727, https://doi.org/10.1021/acs.jpcb.2c08232 (2023).
Agatonovic-Kustrin, S., Morton, D. W. & Singh, R. Hybrid neural networks as tools for predicting the phase behavior of colloidal systems. Colloid. Surface. A 415, 59–67, https://doi.org/10.1016/j.colsurfa.2012.10.005 (2012).
Liu, D., Bai, G. & Gao, C. Phase diagrams classification based on machine learning and phenomenological investigation of physical properties in K_1−xNa_xNbO₃ thin films. J. Appl. Phys. 127, 154101, https://doi.org/10.1063/5.0004167 (2020).
Aghaaminiha, M., Ghanadian, S. A., Ahmadi, E. & Farnoud, A. M. A machine learning approach to estimation of phase diagrams for three-component lipid mixtures. BBA-Biomembranes 1862, 183350, https://doi.org/10.1016/j.bbamem.2020.183350 (2020).
Peacock, C. J. et al. Predicting the mixing behavior of aqueous solutions using a machine learning framework. ACS Appl. Mater. Inter. 13, 11449–11460, https://doi.org/10.1021/acsami.0c21036 (2021).
Kruglov, I. A., Yanilkin, A., Oganov, A. R. & Korotaev, P. Phase diagram of uranium from ab initio calculations and machine learning. Phys. Rev. B 100, 174104, https://doi.org/10.1103/PhysRevB.100.174104 (2019).
Bell, G. Non-ionic surfactant phase diagram prediction by recursive partitioning. Philos. Trans. Roy. Soc. A 374, 20150137, https://doi.org/10.1098/rsta.2015.0137 (2016).
Burger, B. et al. A mobile robotic chemist. Nature 583, 237–241, https://doi.org/10.1038/s41586-020-2442-2 (2020).
Pinheiro, G. A. et al. Machine learning prediction of nine molecular properties based on the SMILES representation of the QM9 quantum-chemistry dataset. J. Phys. Chem. A 124, 9854–9866, https://doi.org/10.1021/acs.jpca.0c05969 (2020).
Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702, https://doi.org/10.1016/j.cell.2020.01.021 (2020).
Shi, Y.-F. et al. Machine learning for chemistry: basics and applications. Engineering 27, 70–83, https://doi.org/10.1016/j.eng.2023.04.013 (2023).
Fujiwara, M., Miyake, M. & Hama, I. Phase behavior of methoxypolyoxyethylene dodecanoate as compared to polyoxyethylene dodecyl ether and polyoxyethylene methyl dodecyl ether. Colloid Polym. Sci. 272, 797–802, https://doi.org/10.1007/BF00652420 (1994).
Rummel, F. CurveClaw, https://github.com/RummelF/CurveClaw Version 0.1.23 (2025).
Rummel, F., Warren, P. B., Bray, D. J., Sumer, Z., Booth, J. & Shkurti, A. PhDat, https://doi.org/10.6084/m9.figshare.29071202.v2 Figshare (2025).
Shkurti, A. dataExplorer.py, https://github.com/ashkurti/DataExplorer Version 0.1 (2025).
Wilkinson, M. D. et al. The FAIR guiding principles for scientific data management and stewardship. Scientific Data 3, 160018, https://doi.org/10.1038/sdata.2016.18 (2016).
Mitchell, D. J., Tiddy, G. J. T., Waring, L., Bostock, T. & McDonald, M. P. Phase behaviour of polyoxyethylene surfactants with water. mesophase structures and partial miscibility (cloud points). J. Chem. Soc., Faraday Trans. 1 79, 975–1000, https://doi.org/10.1039/F19837900975 (1983).
Ravey, J. & Stébé, M. Properties of fluorinated non-ionic surfactant-based systems and comparison with non-fluorinated systems. Colloid. Surface. A 84, 11–31, https://doi.org/10.1016/0927-7757(93)02731-S (1994).
Kratzat, K. & Finkelmann, H. Branched non-ionic oligo-oxyethylene Y-amphiphiles effect of molecular geometry on the micellar shape 1. Liq. Cryst. 13, 691–699, https://doi.org/10.1080/02678299308026341 (1993).
Qiu, H. & Caffrey, M. The phase diagram of the monoolein/water system: metastability and equilibrium aspects. Biomaterials 21, 223–234, https://doi.org/10.1016/S0142-9612(99)00126-X (2000).
Briggs, J. & Caffrey, M. The temperature-composition phase diagram of monomyristolein in water: equilibrium and metastability aspects. Biophys. J. 66, 573–587, https://doi.org/10.1016/s0006-3495(94)80847-1 (1994).
Briggs, J. & Caffrey, M. The temperature-composition phase diagram and mesophase structure characterization of monopentadecenoin in water. Biophys. J. 67, 1594–1602, https://doi.org/10.1016/S0006-3495(94)80632-0 (1994).
Lühmann, B. & Finkelmann, H. A lyotropic nematic phase of lamellar micelles (N_L) obtained by a non-ionic surfactant in aqueous solution. Colloid Polym. Sci. 264, 189–192, https://doi.org/10.1007/BF01414849 (1986).
Stubenrauch, C., Burauer, R. S. S. & Schmidt, C. A new approach to lamellar phases (L_α) in water - non-ionic surfactant systems. Liq. Cryst. 31, 39–53, https://doi.org/10.1080/02678290310001628555 (2004).
Larsson, K. The structure of mesomorphic phases and micelles in aqueous glyceride systems. Z. Phys. Chem. 56, 173–198, https://doi.org/10.1524/zpch.1967.56.3_4.173 (1967).
Qiu, H. & Caffrey, M. Phase behavior of the monoerucin/water system. Chem. Phys. Lipids 100, 55–79, https://doi.org/10.1016/S0009-3084(99)00040-7 (1999).
Qiu, H. & Caffrey, M. Lyotropic and thermotropic phase behavior of hydrated monoacylglycerols: structure characterization of monovaccenin. J. Phys. Chem. B 102, 4819–4829, https://doi.org/10.1021/jp980553k (1998).
Lavergne, A., Zhu, Y., Molinier, V. & Aubry, J.-M. Aqueous phase behavior of isosorbide-based non-ionic surfactants. Colloid. Surface. A 404, 56–62, https://doi.org/10.1016/j.colsurfa.2012.04.007 (2012).
Clunie, J. S., Corkill, J. M., Goodman, J. F., Symons, P. C. & Tate, J. R. Thermodynamics of non-ionic surface-active agent + water systems. Trans. Faraday Soc. 63, 2839–2845, https://doi.org/10.1039/TF9676302839 (1967).
Laughlin, R. G. Solvation and structural requirements of surfactant hydrophilic groups. In Brown, G. H. (ed.) Advances in Liquid Crystals, vol. 3 of Advances in Liquid Crystals, 41–98, https://doi.org/10.1016/B978-0-12-025003-5.50009-X (Elsevier, 1978).
Tiddy, G. Surfactant-water liquid crystal phases. Physics Reports 57, 1–46, https://doi.org/10.1016/0370-1573(80)90041-1 (1980).
Nibu, Y. & Inoue, T. Phase behavior of aqueous mixtures of some polyethylene glycol decyl ethers revealed by DSC and FT-IR measurements. J. Colloid Interf. Sci. 205, 305–315, https://doi.org/10.1006/jcis.1998.5621 (1998).
Kjellander, R. Phase separation of non-ionic surfactant solutions. a treatment of the micellar interaction and form. J. Chem. Soc., Faraday Trans. 2 78, 2025–2042, https://doi.org/10.1039/F29827802025 (1982).
Huang, K. L., Shigeta, K. & Kunieda, H. Phase behavior of polyoxyethylene dodecyl ether-water systems. In Koper, G. J. M., Bedeaux, D., Cavaco, C. & Sager, W. F. C. (eds.) Trends in Colloid and Interface Science XII, 171–174 (Steinkopff, Darmstadt, Germany, 1998).
Nilsson, F., Söderman, O., Hansson, P. & Johansson, I. Physical-chemical properties of c9g1 and c10g1 β-alkylglucosides. phase diagrams and aggregate size/structure. Langmuir 14, 4050–4058, https://doi.org/10.1021/la980261a (1998).
Kratzat, K. & Finkelmann, H. Branched non-ionic oligo-oxyethylene V-amphiphiles effect of molecular geometry on LC-phase behavior 2. Colloid Polym. Sci. 272, 400–408, https://doi.org/10.1007/BF00659451 (1994).
Nilsson, F., Söderman, O. & Johansson, I. Physical-chemical properties of the n-octyl β-d-glucoside/water system. a phase diagram, self-diffusion NMR, and SAXS study. Langmuir 12, 902–908, https://doi.org/10.1021/la950602+ (1996).
Marques, E. F. & Silva, B. F. B. Surfactants, Phase Behavior, 1290–1333 (Springer Berlin Heidelberg, Berlin, Heidelberg, 2013).
McBain, J. W. & Sierichs, W. C. The solubility of sodium and potassium soaps and the phase diagrams of aqueous potassium soaps. Journal of the American Oil Chemists Society 25, 221–225, https://doi.org/10.1007/BF02645899 (1948).
McBain, J. W. & Johnston, S. A. A note on the phase rule diagram for a mixture of sodium palmitate and sodium laurate with water. Journal of the American Chemical Society 63, 875–875, https://doi.org/10.1021/ja01848a510 (1941).
Vold, R. D. The phase rule behavior of concentrated aqueous systems of a typical colloidal electrolyte: Sodium oleate. The Journal of Physical Chemistry 43, 1213–1231, https://doi.org/10.1021/j150396a013 (1939).
Kunieda, H., Masuda, N. & Tsubone, K. Comparison between phase behavior of anionic dimeric (gemini-type) and monomeric surfactants in water and water-oil. Langmuir 16, 6438–6444, https://doi.org/10.1021/la0001068 (2000).
Kocherbitov, V. & Söderman, O. Hydration of dimethyldodecylamine-n-oxide: Enthalpy and entropy driven processes. The Journal of Physical Chemistry B 110, 13649–13655, https://doi.org/10.1021/jp060934v (2006).


Acknowledgements

This work was supported by the Hartree National Centre for Digital Innovation, a collaboration between STFC and IBM.

Author information

Authors and Affiliations

The Hartree Centre, STFC Daresbury Laboratory, Warrington, WA4 4AD, United Kingdom
Felix Rummel, Patrick B. Warren, David J. Bray, Jonathan Booth, Ardita Shkurti & Richard L. Anderson
IBM Research Europe - UK, Warrington, WA4 4AD, United Kingdom
Zeynep Sumer

Authors

Felix Rummel
Patrick B. Warren
David J. Bray
Zeynep Sumer
Jonathan Booth
Ardita Shkurti
Richard L. Anderson

Contributions

R.L.A. and D.J.B. conceived the project and secured funding. F.R., A.S. and R.L.A. gathered literature data. P.B.W. oversaw literature curation. F.R. and P.B.W. conceived the data extraction workflow. F.R., Z.S., J.B. and D.J.B. conceived the database structure. R.L.A. and D.J.B. oversaw project progress. F.R. developed CurveClaw; A.S. developed DataExplorer. All authors developed the manuscript.

Corresponding authors

Correspondence to Felix Rummel or Richard L. Anderson.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Rummel, F., Warren, P.B., Bray, D.J. et al. A dataset for aqueous surfactant phase behavior as a function of temperature and composition. Sci Data 12, 2042 (2025). https://doi.org/10.1038/s41597-025-06306-9


Received: 09 June 2025
Accepted: 11 November 2025
Published: 26 November 2025
Version of record: 30 December 2025
DOI: https://doi.org/10.1038/s41597-025-06306-9