Introduction

Personalized treatment requires the right treatment at the right time for the right patient. However, there are multiple barriers to achieving these goals. Most standard-of-care clinical therapies are derived from clinical trials that aim to identify the treatment that improves the average desired outcome (local control, recurrence-free survival, disease-specific survival, overall survival, etc.) for the total population. Optimizing the average outcome, however, implies that roughly half of the trial population fares better than average while the other half fares worse. There is a dire need to improve therapy for patients who are not average1. To understand how an individual tumor responds to treatment, we must know its critical properties. There are multiple sources of patient-specific data, albeit with different temporal resolutions, including, but not limited to, patient demographics (including age, sex, race, and socio-economic status), medical history, patient-reported outcomes from medical surveys, blood samples (or liquid biopsies), radiology, and tissue biopsies, typically for pathologic evaluation. Tumor dynamics, however, cannot currently be measured with static, pre-treatment routine clinical tests. Estimating them is the focus of the mathematical modeling technique developed herein.

Mathematical oncology, a burgeoning field of cancer research, seeks to develop quantitative methods to help understand, predict, and advance cancer biology and clinical oncology2,3,4,5. The explicit mathematical formulation of assumptions, analytical or numerical solution of the derived equation or systems of equations, and interpretation of the results within specified premises allow different hypotheses to be generated and tested6. For a mathematical model to make predictions, it is crucial to calibrate model parameters, validate model calibration on an independent data set, and test its predictive power on known outcomes7. Historical population data can train model parameter distributions, but the model and its parameters must be tailored to individual patients to help individualize cancer care. Thus, a significant barrier to model development, calibration, and validation is limited longitudinal clinical data to estimate patient-specific tumor dynamics before, during, and after treatment.

Among the most basic metrics of tumor dynamics are the cell proliferation rate and the rate of invasion into adjacent physiological tissue8. These are also critical parameters for mathematical models. One of the earliest and simplest models to simulate tumor growth and spread is the reaction-diffusion (R-D) equation, also known as Fisher’s equation9 or the Kolmogorov-Petrovsky-Piskunov equation10. Gatenby and Gawlinski used the R-D equation to describe the tumor’s spatial spread and temporal development within the normal tissue. The model predicted a hypocellular interstitial gap between the growing tumor and the receding normal tissue, which was later confirmed in experimental studies and clinical tissue samples11.

The R-D equation has become omnipresent in mathematical oncology, often in multi-population model systems, during the past 25 years12,13,14,15,16,17,18,19. Swanson and colleagues demonstrated that different diffusion speeds in gray and white matter in the brain can simulate realistic glioma growth dynamics to help identify the locoregional spread of the tumor that may be below radiology detection thresholds20. In later work, Swanson demonstrated that the glioma R-D growth and invasion rates could be estimated from pre-treatment MRI features to predict survival21 and the efficacy of radiotherapy in individual glioblastoma patients22,23, as well as the effects of hypoxia on growth rates and recurrence location24. Enderling et al. have demonstrated that the R-D equation can be used to simulate the extent of cancer cells’ diffusion into adjacent tissue, which could help evaluate the therapeutic benefits of whole breast irradiation or targeted intraoperative radiotherapy25.

In addition to clinical data, cell diffusivity and cell proliferation rates can be inferred from cell scratch assay data26 and in vivo tumor growth dynamics27. With spatially resolved imaging data, extended R-D models can include anisotropy in different tissue regions28 and explicit consideration for molecularly distinct subpopulations29. Machine learning methods have been demonstrated to increase computational efficiency to enable numerical simulations of the R-D equation on richer data sets30. Integration of machine learning with the R-D equation also enables accurate predictions of variation in cell density of glioblastoma patients using multiparametric MRI31.

While multiple MRI features at multiple time points and rich pre-clinical data have been demonstrated to calibrate the R-D parameters in glioma patients, this approach cannot be used in many other clinical cancers without routine MRI imaging. Here, we hypothesized that routinely collected tissue biopsy data could be used to calibrate the R-D equation’s tumor growth and invasion parameters for individual patients. We demonstrate a method using spatial statistics and spectral analysis that permits an estimate of tumor growth and invasion rates when applied to histological images from breast cancer tissue biopsies.

Results

Different R-D equation parameters yield different power spectral densities (PSD)

We solve the R-D equation with arbitrary dimensionless parameters \(\hat{\gamma }=\{\mathrm{1,2,3}\}\) and \({\widehat{\mathcal{D}}}=\{{1,2,3}\}\) for arbitrary time and space units. For tumor growth dynamics, the time unit may be one day, in which case the growth and diffusion parameters of the R-D equation have units of 1/day and μm²/day, respectively. Intuitively, a larger growth rate parameter, \(\gamma\), yields higher tumor densities, and a larger diffusion rate, \({\mathcal{D}}\), leads to more spatially spread, invasive tumors. Thus, the largest localized tumor burden is simulated with high \(\gamma\) and low \({\mathcal{D}}\) values (Fig. 1a). Analysis of the corresponding power spectral densities (PSD) at any given time (see Methods) reveals that variation in the proliferation rate has a characteristic signature at low wavenumbers (intercept with the y-axis, Fig. 1b). In comparison, the behavior at high wavenumbers is determined by variation in the diffusion coefficient (intercept with the x-axis, Fig. 1c).

Fig. 1: Numerical analysis of the parameter dependency of the reaction-diffusion equation.

a Numerical solution to the R-D equation of cancer cell density \(\rho (\hat{x},\hat{y},\hat{t})\) at an arbitrary time for different unitless proliferation and diffusion rates. b The power spectral density (PSD) for different growth rates, \(\hat{\gamma }\), for the same diffusion rate \({\widehat{\mathcal{D}}}\) = 2. c The power spectral density (PSD) for different diffusion rates, \({\widehat{\mathcal{D}}}\), for the same growth rate \(\hat{\gamma }\) = 2. The mathematical hat symbol above each variable and parameter indicates arbitrary units without specific biological reference.

Biopsy tissue spatial statistics and power spectrum analysis calibrate reaction-diffusion equation parameters

We analyzed the spatial position of each individual cancer cell in three different tumor ROI, \({\mathcal{T}}\), (Fig. 2) for each patient using spatial statistics (Fig. 3). The 2-point correlation function (see Methods) reveals high correlations at small radii \(r=\sqrt{{x}^{2}+{y}^{2}+{z}^{2}}\) around each cancer cell (r < 50 μm), which decay into randomness with increasing distance from each cell (r \(\ge\) 50 μm) (Fig. 3b). The corresponding power spectral densities (see Methods) exhibit the same characteristic “shoulder profile” as the PSD of the continuous R-D equation above (Fig. 3c). The PSD of the continuous R-D equation can be fit to the average PSD of the different \({\mathcal{T}}\) for each patient (Fig. 4). Comparing the tumor growth rates averaged per patient, \(\hat{\gamma }\), with the tumor invasion rates averaged per patient, \({\widehat{\mathcal{D}}}\), shows a highly heterogeneous distribution of mean growth rates skewed towards slower growth and a narrow distribution of mean invasion rates (Fig. 5a). The corresponding probability distribution of \({\widehat{\mathcal{D}}}\)/\(\hat{\gamma }\) for all \({\mathcal{T}}\) ROI in the analysis is shown in Fig. 5b. The high patient-specificity of growth rates has been suggested to have significant implications for radiotherapy response, with the growth rate being highly correlated with radiosensitivity parameters22.
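As an illustration of the fitting step, the sketch below recovers \(\hat{\gamma }\) and \({\widehat{\mathcal{D}}}\) from a synthetic PSD by a grid search on squared log-residuals against the closed-form PSD given in Methods (Eq. 7). The function names, the grid search, the fixed \(\Delta t=1\), the unit normalization, and the synthetic noise level are all assumptions made for this example; this is not the fitting procedure used to produce Fig. 4.

```python
import math
import random


def rd_psd(k, gamma, D, dt=1.0):
    """Closed-form R-D PSD of Eq. (7) with P0 = 1 (assumed normalization)."""
    u = (gamma - k * k * D) * dt
    ratio = 1.0 if abs(u) < 1e-12 else math.expm1(u) / u
    return ratio ** 2


def fit_rd_psd(ks, psd, gammas, Ds, dt=1.0):
    """Return the grid point (gamma, D) minimizing the squared log-residuals."""
    best = None
    for g in gammas:
        for D in Ds:
            sse = sum((math.log(p) - math.log(rd_psd(k, g, D, dt))) ** 2
                      for k, p in zip(ks, psd))
            if best is None or sse < best[0]:
                best = (sse, g, D)
    return best[1], best[2]


# Synthetic "measured" PSD: known parameters plus 2% multiplicative noise.
rng = random.Random(0)
ks = [0.1 * i for i in range(1, 60)]
true_gamma, true_D = 1.5, 2.0
measured = [rd_psd(k, true_gamma, true_D) * math.exp(rng.gauss(0.0, 0.02))
            for k in ks]

# Hypothetical search grid in arbitrary units: 0.5, 0.6, ..., 3.0.
grid = [0.5 + 0.1 * i for i in range(26)]
gamma_hat, D_hat = fit_rd_psd(ks, measured, grid, grid)
print("estimated gamma, D:", gamma_hat, D_hat)
```

Fitting in log-space weights the low-wavenumber plateau and the high-wavenumber tail comparably, which matters because the PSD spans several orders of magnitude. A gradient-based optimizer could replace the grid search once a sensible starting point is known.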

Fig. 2: Data acquisition.

A board-certified pathologist selects 3 × 3 regions of interest (ROI), including tumor (\({\mathcal{T}}\)), stroma distant to the tumor (\({\mathcal{S}}\)), and tumor-stroma interface (\({\mathcal{T}}{/}{\mathcal{S}}\)) from breast cancer biopsy tissues for analysis.

Fig. 3: Spatial statistical analysis of patient biopsy tissues.

a Three different tumor \({\mathcal{T}}\)-ROI for an individual patient (patient 20). b Corresponding 2pCF for each tissue. c Corresponding PSD for each 2pCF.

Fig. 4: R-D equation fits to the PSD of the three \({\mathcal{T}}\)-ROI in Fig. 3c.

The R-D equation provides an excellent fit to the average data PSD.

Fig. 5: Analysis of reaction-diffusion equation parameter distributions for all \({\mathcal{T}}\)-ROI for the 44 patients in the analysis.

a Distribution of average diffusion rates, \({\widehat{\mathcal{D}}}\), and growth rates, \(\hat{\gamma }\). b Histogram of all ratios of patient-averaged invasion and growth rates, \({\widehat{\mathcal{D}}}\)/\(\hat{\gamma }\).

Discussion

Understanding tumor growth and invasion dynamics before clinical intervention is paramount to personalizing patient care and improving outcomes. We developed a novel quantitative approach to infer such dynamic characteristics from a single biopsy at patient diagnosis. Patient tissues provide a single-time snapshot of the complex evolution of the cellular density in different spatial locations inside the resection area. The presented approach relies on the “fair representation” hypothesis that histopathology slides are reasonably representative of the clustering properties of the total tumor population. While locoregional properties may be well captured in individual biopsy tissue samples, the overall tumor may be highly heterogeneous. Therefore, the herein-derived methodology must undergo rigorous prospective testing to quantify confidence and uncertainty in the derived parameter values. Of note, however, these are the only collected clinical tissues, and current clinical staging and treatment planning is done on these samples assuming fair representation. Therefore, applying the derived methodology to these data remains a valid approach.

It is conceivable that the exact location of each cancer cell and cancer cell conglomerate in tissue samples is a manifestation of the tumor-intrinsic proliferation and invasion dynamics. We demonstrated that spatial statistical analyses of biopsy tissues could calibrate tumor growth and invasion parameters in the reaction-diffusion equation for individual patients. While the 2-point correlation function and the corresponding power spectral density from clinical tissues are virtually free of assumptions, the R-D equation was not derived from the data. It was chosen due to its simplicity and prevalence in mathematical oncology, and because its parameters can be interpreted in terms of biological hallmarks of cancer processes. There is no a priori justification for the choice and form of the equation other than its practical analytical tractability, especially in Fourier space. While more physically involved approaches exist, such as Navier-Stokes approaches as well as more complex non-linear mathematical models with numerous variables, numerical investigation of the power spectral density (which easily captures several orders of magnitude in wavelength) would involve multiscale physical parametrization that is not justified by the resolution and quantity of routinely collected data at cancer diagnosis. Thus, the focus on the R-D equation aligns with available clinical data and is poised to advance predictive quantitative oncology frameworks and help guide clinical decision-making. If additional data become available to help calibrate and validate additional model parameters, more complex models may be considered to inform radiosensitivity for individual patients before further therapy.

While prospective validation against clinical outcome data is the gold standard to confirm the methodology presented herein, validation against other data would provide further support for this novel approach. Relative abundance of tumor cell proliferation markers, such as KI67 (refs. 32,33), could validate lower versus higher proliferation rates in different patient tissue samples. Similarly, relative abundance of cell adhesion molecules34 and epithelial to mesenchymal transition (EMT) markers35,36 could validate the invasion rates abstracted in the presented methodology. As the major motivation for the presented work is to abstract model parameters to inform treatment sensitivity and treatment response, in particular to radiation therapy, the abstracted model parameters could be compared against molecular signatures of radiosensitivity, such as RSI37,38,39 or GARD40,41. Additionally, the theoretical concept that we propose herein could be validated in more hypothetical scenarios. The R-D equation could be solved numerically, and the derived population densities would inform discrete cell counts at specific spatial locations. Such in silico spatial cell maps can then serve as input for the proposed methodology, and the abstracted parameters could be validated against the ground-truth parameters of the numerical solution. This will also allow for testing of model robustness against increasing noise levels in the discretization of the continuous R-D equation solution. For further stochasticity, one can simulate tumor growth and invasion with agent-based models42,43,44,45 and input such single-cell simulation results into the proposed methodology.

This study’s analyzed tissues and model-calibrated parameters cannot be correlated with clinical outcome. Patients were either treated with neoadjuvant hypofractionated radiation (followed by surgery) or intraoperative radiotherapy. Inclusion of surgery in the treatment plan prevents correlation of R-D growth and invasion parameters with locoregional control or disease-free survival. It is conceivable that targeted local aggressive therapies (e.g., surgery or radiation therapy) would be most successful for a patient with a high proliferative index and a low diffusivity, as previously demonstrated for glioblastoma46. Seminal work by Rockne and Swanson demonstrated that R-D equation-derived tumor growth rates correlate with patient-specific radiosensitivity parameters and clinical radiation response dynamics22. Vice versa, systemic treatments (e.g., immunotherapy, chemotherapy) would be warranted for patients with higher diffusion rates, indicating long-range tissue invasion and an increased risk of metastasis.

The presented approach represents a fundamental step toward integrating quantitative methods into clinical decision-making to improve treatment responses and outcomes47. A possible prospective clinical validation study of the developed approach is to classify patients as high or low risk for locoregional radiation failure based on the R-D equation parameter ratio, \(\gamma /{\mathcal{D}}\), as demonstrated for glioblastoma radiation response23. Such a predictive biomarker can be combined with other radiation response signatures37,39,40,48 to arrive at multimodal biomarkers to safely escalate or de-escalate radiation therapy for each individual patient. In addition to risk classification, the power of mechanistic modeling includes simulation of radiation response to different radiation doses and dose fractionations13,25,49,50,51,52. With a calibrated R-D equation, it is conceivable to simulate N = 1 clinical trials in a digital twin and predict the most successful radiation protocol on a per-patient basis1,53,54,55.

Our analysis suggests a narrow distribution of diffusion coefficients, \({\mathcal{D}}\), compared to the wider spread of proliferation rates, \(\gamma\). The nearly homogeneous diffusion coefficient across all patients may reflect the selection of early-stage breast cancer patients with localized, non-invasive disease. For more invasive and diffusive diseases, such as glioblastoma, a wider distribution in diffusion coefficients compared to proliferation rates has been demonstrated56. Following the correlation of proliferation rates with radiosensitivity, a wide inter-patient heterogeneity of radiosensitivity has been demonstrated for many cancers, including breast cancer48.

While previous mathematical models and parameter abstraction methodologies require clinical measurements at multiple time points to calibrate tumor dynamics parameters21,57, this is the first solution that requires clinical data from only a single time point. Future steps will need the correlation of R-D equation-derived parameters with clinical outcomes, for which datasets with significant differences in clinical outcomes will be necessary. Herein, we developed the concept based on breast cancer tissue analysis. Future work will include pan-cancer analysis of calibrating tumor growth and invasion parameters with spectral-spatial analysis of cancer biopsy tissues to understand the methodology’s range and limits of applicability.

Methods

Patient cohort

Forty-four early-stage breast cancer patients were included in this analysis (NCT03137693: Preoperative Stereotactic Ablative Body Radiotherapy (SABR) for Early-Stage Breast Cancer; Pro00044616, Advarra: Immunobiology of ER+ and ER- Breast Cancer Tumors). Tissue collection and analysis were approved by the Moffitt Cancer Center Institutional Review Board. The IRB granted a Waiver of Consent per 45 CFR 46.116(d) and a Waiver of HIPAA Authorization per 45 CFR 164.512(i)(2) because the study is retrospective in nature and did not affect the way patients were treated. All data analyzed were collected as part of routine clinical care, with no additional interventions or procedures conducted for the research. A 5 μm thick unstained tumor slide from a formalin-fixed paraffin-embedded (FFPE) tumor block was acquired from patients’ untreated tissue biopsies and analyzed by the Moffitt Cancer Center tissue core and digital imaging laboratory. FFPE tissue samples were immunostained using the AKOYA Biosciences OPAL™ 7-Color Automation IHC kit (Waltham, MA) on the BOND RX autostainer (Leica Biosystems, Vista, CA). The OPAL 7-color kit uses tyramide signal amplification (TSA) conjugated to individual fluorophores to detect various targets within the multiplex assay. Sections were baked at 65 °C for one hour and then transferred to the BOND RX (Leica Biosystems). All subsequent steps (e.g., deparaffinization, antigen retrieval) were performed using an automated OPAL IHC procedure (AKOYA). OPAL staining of each antigen occurred as follows: heat-induced epitope retrieval (HIER) was achieved with Citrate pH 6.0 buffer for 20 min at 95 °C before the slides were blocked with AKOYA blocking buffer for 10 min. Then slides were incubated with primary antibody, CD68 (CST, D4BAC, 1:300, dye 520) at RT for 60 min followed by OPAL HRP polymer and one of the OPAL fluorophores during the final TSA step.
Individual antibody complexes were stripped after each round of antigen detection. This was repeated five more times using the following antibodies: CD8 (DAKO, C8/144B, HIER-EDTA pH 9.0, 1:100, dye 540), CD4 (CM, EP204, HIER-EDTA pH 9.0, 1:100, dye 570), CD3 (Thermofisher, SP7, HIER-EDTA pH 9.0, 1:500, dye 570), FOXP3 (ABCAM, 236 A/E7, HIER-EDTA pH 9.0, 1:500, dye 650), and PCK (DAKO, AE1/AE3, HIER-Citrate pH 6.0, 1:200, dye 690). After the final stripping step, DAPI counterstain was applied to the multiplexed slide, which was then removed from the BOND RX for coverslipping with ProLong Diamond Antifade Mountant (ThermoFisher Scientific). All slides were imaged with the Vectra®3 Automated Quantitative Pathology Imaging System, and the exact (x,y) position of each individual cell was abstracted. Three Regions of Interest (ROI) were selected by a pathologist, including tumor (\({\mathcal{T}}\)), stroma distant to the tumor (\({\mathcal{S}}\)), and tumor and stroma interface (\({\mathcal{T}}{/}{\mathcal{S}}\)) (Fig. 2). Here, we focus only on the tumor ROI, \({\mathcal{T}}\), and the cancer cell staining DAPI + PCK.

Reaction-diffusion equation and 2-point correlation function

The diffusion of a cluster of cancer cells with density \(\rho =\rho \left({\boldsymbol{r}},t\right)\) at position \({\boldsymbol{r}}\) and time \(t\) is described by combining the continuity equation without source and sink terms, \({\partial }_{t}\rho +{\boldsymbol{\nabla }}\cdot {\boldsymbol{\varphi }}=0\) (with \(\partial\) being the partial derivative and \({\boldsymbol{\nabla }}\) the gradient operator), with Fick’s first law of proportionality between the flux \({\boldsymbol{\varphi }}\) and the density gradient \({\boldsymbol{\nabla }}\rho\), \({\boldsymbol{\varphi }}=-{\mathcal{D}}\cdot {\boldsymbol{\nabla }}\rho\). For a generic anisotropic flow this gives \({\partial }_{t}\rho ={\boldsymbol{\nabla }}\cdot \left({\mathcal{D}}\cdot {\boldsymbol{\nabla }}\rho \right)\), where \({\mathcal{D}}\) is the diffusion coefficient matrix. In a 3D system of reference \(S\left({\boldsymbol{O}},{\boldsymbol{r}}=\left\{x,y,z\right\}\right)\), with \({r}^{2}={x}^{2}+{y}^{2}+{z}^{2}\) being the squared radial distance from an arbitrary origin \({\boldsymbol{O}}\), we can write the diffusion equation for the locally homogeneous and isotropic case as \({\partial }_{t}\rho ={\mathcal{D}}{{\boldsymbol{\nabla }}}^{2}\rho ,\) where \({{\boldsymbol{\nabla }}}^{2}\) is the Laplacian operator. Green’s functions offer a natural solution for cell-like point sources, even in the presence of an explicit exponential growth factor \(\gamma\), i.e., for the equation of interest here, \({\partial }_{t}\rho ={\mathcal{D}}{\nabla }^{2}\rho +\gamma \rho\), in the form:

$$\rho \left({\boldsymbol{r}},t\right)=\frac{{\rho }_{0}}{{\left(4\pi {\mathcal{D}}t\right)}^{\frac{3}{2}}}{e}^{-\frac{{\|{\boldsymbol{r}}\|}^{2}}{4{\mathcal{D}}t}\,+\,\gamma t}$$
(1)
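As a consistency check (a worked verification added for illustration, using only the symbols defined above), one can confirm by direct differentiation that Eq. (1) satisfies \({\partial }_{t}\rho ={\mathcal{D}}{\nabla }^{2}\rho +\gamma \rho\) for \(t > 0\). Writing \(r=\|{\boldsymbol{r}}\|\), the time derivative is

$${\partial }_{t}\rho =\rho \left(-\frac{3}{2t}+\frac{{r}^{2}}{4{\mathcal{D}}{t}^{2}}+\gamma \right).$$

Since \({\partial }_{r}\rho =-\frac{r}{2{\mathcal{D}}t}\rho\), the radial Laplacian in 3D gives

$${\mathcal{D}}{\nabla }^{2}\rho =\frac{{\mathcal{D}}}{{r}^{2}}{\partial }_{r}\left({r}^{2}{\partial }_{r}\rho \right)=\rho \left(-\frac{3}{2t}+\frac{{r}^{2}}{4{\mathcal{D}}{t}^{2}}\right),$$

so the two expressions differ exactly by \(\gamma \rho\). Integrating Eq. (1) over \({{\mathbb{R}}}^{3}\) gives \({\rho }_{0}{e}^{\gamma t}\), so \({\rho }_{0}\) plays the role of the initial number of cells at the point source.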

This approach follows exponential tumor growth without carrying capacity constraints. We assume that the cells are spread over a small 3D volume \({d}^{3}{\boldsymbol{r}}\) so that, as long as this volume is asymptotically smaller than the cell migration movement per unit time, i.e., to the first order in time, it holds on a linear scale \(d{\boldsymbol{r}}\prec O\left(\sqrt{{\mathcal{D}}{dt}}\right)\), we can generalize Eq. (1) as \({d}^{3}\rho \left({\boldsymbol{r}},t\right)={\rho }_{0}{d}^{3}{\boldsymbol{r}}{\left(4\pi {\mathcal{D}}t\right)}^{-\frac{3}{2}}{e}^{-\frac{{\|{\boldsymbol{r}}\|}^{2}}{4{\mathcal{D}}t}+\gamma t}.\)

In the case of \(n\) different cells located at positions \({{\boldsymbol{r}}}_{i}\), \({d}^{3}\rho \left({\boldsymbol{r}},t\right)=\sum _{i}{\left(4\pi {\mathcal{D}}t\right)}^{-\frac{3}{2}}{\rho }_{0i}{e}^{-\frac{{\|{\boldsymbol{r}}-{{\boldsymbol{r}}}_{i}\|}^{2}}{4{\mathcal{D}}t}+\gamma t}{d}^{3}{\boldsymbol{r}}\) or, in the passage to the continuous limit, we get the classical heat-equation result that we are going to exploit here for the R-D equation:

$$\rho \left({\boldsymbol{r}},t\right)={\int_{{{\mathbb{R}}}^{3}}}{d}^{3}{{\boldsymbol{r}}}^{{\prime} }{\rho }_{0}\left({{\boldsymbol{r}}}^{{\prime} }\right){\left(4\pi {\mathcal{D}}t\right)}^{-\frac{3}{2}}{e}^{-\frac{{\|{\boldsymbol{r}}-{{\boldsymbol{r}}}^{{\prime} }\|}^{2}}{4{\mathcal{D}}t}+\gamma t}={\rho }_{0}* {f}_{{\rm{RDPS}}}.$$
(2)

The last term in the equation is of interest here. The cell concentration at the generic time \(t\) is the convolution, denoted here by “*”, of the initial concentration with a Gaussian that, borrowing the terminology from astronomy58, we call the R-D point-spread function \({f}_{{\rm{RDPS}}}\). Here, the cell concentration at \({\mathbf{+}}{\mathbf{\infty}}\) is assumed to be null. Furthermore, assuming constant temperature and pressure in the tissue sample, we take \({\mathcal{D}}\) to be constant and treat \({f}_{{\rm{RDPS}}}\) as the function that carries the temporal dependence.

To proceed further, we adopt the “fair representation” hypothesis59, that the sampled tissues are reasonably representative of the total tumor population, to proceed with a mean-field solution for the R-D equation. By performing this Gibbs average60, we assume that the averaging volume is large enough to contain sufficient cells to perform statistical analyses, yet small enough to neglect large-scale gradients. In this case, we obtain from Eq. (2):

$$\left\langle \rho \right\rangle =\left\langle {\rho }_{0}\right\rangle * \left\langle {f}_{{\rm{RDPS}}}\right\rangle ,$$
(3)

where we define the average over the time of the R-D spread function as:

$${f}_{{\rm{RD}}}\equiv \left\langle {f}_{{\rm{RDPS}}}\right\rangle =\frac{1}{\Delta t}{\int_{0}^{\Delta t}}{{dtf}}_{{\rm{RDPS}}},$$
(4)

with \(\Delta t\) sufficiently small to account for the non-equilibrium dynamics of diffusion and proliferation, as required by the average in \({\rho }_{0}\). Finally, by considering the non-normalized autocorrelation of the average cell concentration \(\left\langle \rho \right\rangle\), we obtain

$${c}_{2p}=\left(\left({\rho }_{0}-\left\langle {\rho }_{0}\right\rangle \right)* \left({\rho }_{0}-\left\langle {\rho }_{0}\right\rangle \right)\right)* \left({f}_{{\rm{RD}}}* {f}_{{\rm{RD}}}\right)={\sigma }_{\rho }^{2}\left({f}_{{\rm{RD}}}* {f}_{{\rm{RD}}}\right),$$
(5)

where the initial density is assumed to be uncorrelated, so that its correlation function reduces to its variance \({\sigma }_{\rho }^{2}\). The biological implication is that diffusing cells enhance cell proliferation (a visualization of loss of space and contact inhibition). Therefore, these reactions are stochastically correlated, and Eq. (4) assumes cell diffusion is the only mechanism by which cells become spatially correlated. The autocorrelation of \({f}_{{\rm{RD}}}\) captures this spatial correlation, thus not contradicting Eq. (2), as ergodicity does not imply mixing considering our definition of averages.

Power spectrum of the reaction-diffusion equation

The R-D equation’s analytical solution is unavailable for arbitrary reaction terms; therefore, to connect cancer diffusion with its spatial distribution, it is more practical to work with power spectral densities than with 2-point correlation functions. Applying the time average of Eq. (4) to the Gaussian kernel in Eq. (2), we obtain \({f}_{{\rm{RD}}}\) as

$$\begin{array}{l}{f}_{{\rm{RD}}}=\frac{{e}^{\iota \|{\boldsymbol{x}}\|\sqrt{\frac{\gamma }{{\mathcal{D}}}}}}{8\pi {\mathcal{D}}\|{\boldsymbol{x}}\|}{\rm{erfc}}\left(\frac{\|{\boldsymbol{x}}\|}{2\sqrt{{\mathcal{D}}\varDelta t}}+\iota \sqrt{\gamma \varDelta t}\right)\\\qquad\quad+\,\frac{{e}^{-\iota \|{\boldsymbol{x}}\|\sqrt{\frac{\gamma }{{\mathcal{D}}}}}}{8\pi {\mathcal{D}}\|{\boldsymbol{x}}\|}{\rm{erfc}}\left(\frac{\|{\boldsymbol{x}}\|}{2\sqrt{{\mathcal{D}}\varDelta t}}-\iota \sqrt{\gamma \varDelta t}\right),\end{array}$$
(6)

where \({\rm{erfc}}()\) is the complementary error function and \(\iota\) the complex unit. Finally, considering Eq. (4) and Eq. (5) together with the definition of power spectral density, we obtain

$$P\left(k\right)\equiv {\left|{\int_{{{\mathbb{R}}}^{3}}}{d}^{3}{\boldsymbol{x}}{f}_{{\rm{RD}}}{e}^{\iota {\boldsymbol{k}}{\boldsymbol{.}}{\boldsymbol{x}}}\right|}^{2}={P}_{0}^{2}{\left(\frac{{e}^{\left(\gamma -{\|{\boldsymbol{k}}\|}^{2}{\mathcal{D}}\right)\varDelta t}-1}{\varDelta t\left(\gamma -{\|{\boldsymbol{k}}\|}^{2}{\mathcal{D}}\right)}\right)}^{2},$$
(7)

where for easier comparison with Eq. (13) we normalized with \({P}_{0}\) to the zero-wavelength by computing \({\int_{{{\mathbb{R}}}^{3}}}P\left({\boldsymbol{k}}\right){d}^{3}{\boldsymbol{k}}\).
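The form of Eq. (7) can be understood from one intermediate step: the spatial Fourier transform of the Gaussian kernel with growth factor is \({e}^{\left(\gamma -{\|{\boldsymbol{k}}\|}^{2}{\mathcal{D}}\right)t}\), and its time average over \(\left[0,\Delta t\right]\) as in Eq. (4) gives the bracketed ratio. The following minimal sketch evaluates Eq. (7) numerically and illustrates the two limits discussed in the Results. The function name `rd_psd`, the default \({P}_{0}=1\), and the parameter values are assumptions made for this example.

```python
import math


def rd_psd(k, gamma, D, dt, P0=1.0):
    """Eq. (7): P(k) = P0^2 * ((exp(u) - 1) / u)^2 with u = (gamma - k^2 D) * dt."""
    u = (gamma - k * k * D) * dt
    if abs(u) < 1e-12:
        ratio = 1.0  # limit of (e^u - 1)/u as u -> 0 (removable singularity)
    else:
        ratio = math.expm1(u) / u  # expm1 avoids cancellation for small |u|
    return (P0 * ratio) ** 2


# Low-wavenumber plateau: set by the growth rate, independent of D.
for gamma in (1.0, 2.0, 3.0):
    print("gamma =", gamma, "P(0) =", rd_psd(0.0, gamma, D=2.0, dt=1.0))

# High-wavenumber tail: approaches (k^2 * D * dt)^-2, set by the diffusion rate.
for D in (1.0, 2.0, 3.0):
    print("D =", D, "P(50) =", rd_psd(50.0, gamma=2.0, D=D, dt=1.0))
```

At \(k=0\) the expression reduces to \({P}_{0}^{2}{\left(\left({e}^{\gamma \Delta t}-1\right)/\left(\gamma \Delta t\right)\right)}^{2}\), which depends only on \(\gamma\). For \({k}^{2}{\mathcal{D}}\gg \gamma\) it tends to \({P}_{0}^{2}{\left({k}^{2}{\mathcal{D}}\Delta t\right)}^{-2}\), which depends only on \({\mathcal{D}}\).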

The 2-point correlation function of spatial cell data

We implement the 2-point correlation function (2pCF) as a spatial statistic estimator for discretized cell data61,62,63,64,65. The spatial n-point autocorrelation function (npCF) \({c}_{\textit{np}}\) is defined as the probability of finding n cells at coordinates \({{\boldsymbol{x}}}_{1}\), \({{\boldsymbol{x}}}_{2}\), …, \({{\boldsymbol{x}}}_{n}\) in a suitably defined reference frame \(S\left({\boldsymbol{O}},{\boldsymbol{x}}\right)\) centered in \({\boldsymbol{O}}\), with \({{\boldsymbol{x}}}_{i}={\left\{{x}_{1},{x}_{2},{x}_{3}\right\}}_{i}\) as spatial coordinates for \(i=1,\,2,\,3,\ldots ,n\). We formalize \({c}_{{np}}\) as \({c}_{{np}}=\left\langle \delta \left({{\boldsymbol{x}}}_{1}\right),\delta \left({{\boldsymbol{x}}}_{2}\right),\ldots ,\delta \left({{\boldsymbol{x}}}_{n}\right)\right\rangle\). We note the significant differences between the npCF and co-occurrence45: aside from normalization factors, the npCF has a pixel- and grid-free definition and can capture both the geometry and the statistical nature of the analyzed images without defining gray levels. Because of the connection with the continuous R-D equation, we implement the following definition for the 2pCF:

$${c}_{2{\rm{p}}}\left({\boldsymbol{r}},\Delta t\right)=\frac{\left\langle \left(\rho \left({\boldsymbol{x}}{\boldsymbol{+}}{\boldsymbol{r}},t+{\Delta}t\right)-\left\langle \rho \right\rangle \right)\left(\rho \left({\boldsymbol{x}},t\right)-\left\langle \rho \right\rangle \right)\right\rangle }{{\left\langle \rho \right\rangle }^{2}},$$
(8)

with \(\left\langle \rho \right\rangle =n\) the number of cells per unit volume and \(\Delta t\) a time interval. Because the same tissue section cannot be sampled twice, we cannot consider the time dependence of the autocorrelation. Therefore, Eq. (8) simplifies to:

$$\left\langle \rho \left({\boldsymbol{x}}+{\boldsymbol{r}}\right)\rho \left({\boldsymbol{x}}\right)\right\rangle ={n}^{2}\left(1+{c}_{2{\rm{p}}}\left(r\right)\right),$$
(9)

where \(r=\|{\boldsymbol{r}}\|{\boldsymbol{\equiv }}\|{{\boldsymbol{x}}}_{{\boldsymbol{1}}}{\boldsymbol{-}}{{\boldsymbol{x}}}_{{\boldsymbol{2}}}\|\). Then, if the probability of finding a cell in the volume \({dV}\) is \({dP}={ndV}\), the average number of cells in the finite volume spanned by the tissue ROI is \(\left\langle N\right\rangle ={nV}\), and the joint probability of finding two cells (say cell 1 and cell 2) at a given distance \(r\) is

$${dP}={n}^{2}d{V}_{1}d{V}_{2}\left(1+{c}_{2{\rm{p}}}\left(r\right)\right).$$
(10)

If the cell distribution is a 3D random Poisson point process, then the probabilities of finding cells in \(d{V}_{1}\) and \(d{V}_{2}\) are independent, i.e., \({c}_{2{\rm{p}}}=0\). If the cell positions are correlated, \({c}_{2{\rm{p}}}\, > \,0\), and if the positions are anticorrelated, \({c}_{2{\rm{p}}}\in \left[-{1,0}\right]\).

To account for the 3D overlapping of cells in the spatial analysis of the 2pCF, we tested Eq. (8) with several kernels, boundary conditions, and bandwidths, always finding convergence (but not coincidence) of results.
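To make the estimator concrete, the sketch below computes a binned 2pCF from 2D cell centroids with the pair-counting "natural" estimator, \({c}_{2{\rm{p}}}\left(r\right)\approx DD\left(r\right)/RR\left(r\right)-1\), where \(DD\) and \(RR\) are normalized pair counts in the data and in a uniform random catalog drawn in the same window. This is one common choice and is an assumption here; it is not necessarily the kernel estimator used for the results above. The window size, bin width, and synthetic point patterns are hypothetical.

```python
import math
import random


def pair_histogram(points, bin_width, n_bins):
    """Histogram of unordered pair separations in bins of equal width."""
    counts = [0] * n_bins
    for i, (xi, yi) in enumerate(points):
        for xj, yj in points[i + 1:]:
            b = int(math.hypot(xi - xj, yi - yj) / bin_width)
            if b < n_bins:
                counts[b] += 1
    return counts


def two_point_cf(data, randoms, bin_width, n_bins):
    """Natural estimator c_2p(r) = DD(r)/RR(r) - 1 with normalized pair counts."""
    dd = pair_histogram(data, bin_width, n_bins)
    rr = pair_histogram(randoms, bin_width, n_bins)
    norm_d = len(data) * (len(data) - 1) / 2
    norm_r = len(randoms) * (len(randoms) - 1) / 2
    return [(d / norm_d) / (r / norm_r) - 1.0 if r else float("nan")
            for d, r in zip(dd, rr)]


rng = random.Random(1)
SIDE = 500.0  # hypothetical ROI side length in micrometres


def uniform_points(n):
    """Poisson-like pattern: n points placed uniformly in the square window."""
    return [(rng.uniform(0, SIDE), rng.uniform(0, SIDE)) for _ in range(n)]


def clustered_points(n_clusters, per_cluster, sigma):
    """Gaussian clusters around uniform centres, restricted to the window."""
    pts = []
    for cx, cy in uniform_points(n_clusters):
        made = 0
        while made < per_cluster:
            x, y = rng.gauss(cx, sigma), rng.gauss(cy, sigma)
            if 0 <= x <= SIDE and 0 <= y <= SIDE:  # keep cells inside the ROI
                pts.append((x, y))
                made += 1
    return pts


randoms = uniform_points(600)  # random catalog sharing the window geometry
c2p_poisson = two_point_cf(uniform_points(300), randoms, 10.0, 10)
c2p_clustered = two_point_cf(clustered_points(30, 10, 10.0), randoms, 10.0, 10)
print("Poisson:  ", [round(c, 2) for c in c2p_poisson])
print("Clustered:", [round(c, 2) for c in c2p_clustered])
```

Drawing the random catalog in the same window as the data compensates for edge effects without an explicit boundary correction. For the uniform pattern the estimate scatters around zero, while the clustered pattern shows a strong positive correlation at small radii that decays with distance.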

Power spectrum of spatial cell data

The Fourier transform for the distribution of cells with a density \(\rho \left({\boldsymbol{x}},t\right)={\left(\left\langle \rho \right\rangle V\right)}^{-1}\sum _{i}\delta \left({\boldsymbol{x}}-{{\boldsymbol{x}}}_{i}\left(t\right)\right)\) is

$${\delta }_{k}={\left({nV}\right)}^{-1}\sum _{i}{e}^{\iota {\boldsymbol{k}}{{\cdot }}{{\boldsymbol{x}}}_{{i}}},$$
(11)

with wavenumber \(k=\|{\boldsymbol{k}}\|\), where \(\|\cdot \|={\|\cdot \|}_{2}\) is the Euclidean norm, \({\boldsymbol{k}}=\{{{\boldsymbol{k}}}_{{\boldsymbol{x}}},{{\boldsymbol{k}}}_{{\boldsymbol{y}}},{{\boldsymbol{k}}}_{{\boldsymbol{z}}}\}\), the barycenter of the ith cell is located at \({{\boldsymbol{x}}}_{i}\), and \({e}^{\iota {\boldsymbol{k}}{{\cdot }}{{\boldsymbol{x}}}_{{\boldsymbol{i}}}}\) is periodic in \(V\) (again, \(\iota\) is the imaginary unit and “\(\cdot\)” the inner product). We slice the volume \(V\) (we refer to the slice and its volume with the same symbol \(V\) without loss of generality) into infinitesimal cells with unitary (indicator) function \({{\boldsymbol{1}}}_{{dV}}\) (\({{\boldsymbol{1}}}_{{dV}}=1\) if the cell is in the volume \(V\), or \({{\boldsymbol{1}}}_{{dV}}=0\) if it is not). Therefore, Eq. (11) is equivalent to

$${\delta }_{k}={\left({nV}\right)}^{-1}\mathop{\sum}\limits_{i}{{\bf{1}}}_{{dV}}{e}^{\iota {\boldsymbol{k}}{{\cdot }}{{\boldsymbol{x}}}_{{\boldsymbol{i}}}}.$$
(12)

The averaged two-cell contribution to the spectrum of wavelengths, say cell \(1\) in \(d{V}_{1}\) and cell 2 in \(d{V}_{2}\), reads (note that, because \(\delta \left({\bf{x}}\right)\) is real, the complex conjugate satisfies \({\delta }_{{\bf{k}}}^{* }={\delta }_{-{\bf{k}}}\)):

$$\begin{array}{*{20}{l}}{\left({nV}\right)}^{2}\left\langle {\delta }_{{\boldsymbol{k}}}{\delta }_{-{{\boldsymbol{k}}}^{{{{\prime} }}}}\right\rangle &=&\sum \left\langle {{\bf{1}}}_{1}^{2}\right\rangle {e}^{\iota \left({\boldsymbol{k}}-{{\boldsymbol{k}}}^{{{{\prime} }}}\right)\cdot {{\boldsymbol{x}}}_{1}}+{\sum} \left\langle {{\bf{1}}}_{1}{{\bf{1}}}_{2}\right\rangle {e}^{\iota \left({\boldsymbol{k}}\cdot {{\boldsymbol{x}}}_{1}-{{\boldsymbol{k}}}^{{{{\prime} }}}\cdot {{\boldsymbol{x}}}_{2}\right)}\\&=&n{\displaystyle\int_{{{\mathbb{R}}}^{3}}}d{V}_{1}{e}^{\iota \left({\boldsymbol{k}}-{{\boldsymbol{k}}}^{{{{\prime} }}}\right)\cdot {{\boldsymbol{x}}}_{1}}+{n}^{2}{\int }_{{{\mathbb{R}}}^{3}}d{V}_{1}d{V}_{2}\left(1+{c}_{2{\rm{p}}}\right){e}^{\iota \left({\boldsymbol{k}}\cdot {{\boldsymbol{x}}}_{12}+\left({\boldsymbol{k}}-{{\boldsymbol{k}}}^{{{{\prime}}}}\right)\cdot {{\boldsymbol{x}}}_{2}\right)}, \end{array}$$
(13)

because \(\langle {{\boldsymbol{1}}}_{1}{{\boldsymbol{1}}}_{2}\rangle ={n}^{2}d{V}_{1}d{V}_{2}(1+{c}_{2{\rm{p}}})\) as a result of Eq. (10) (with \({{\boldsymbol{x}}}_{12}={{\boldsymbol{x}}}_{1}-{{\boldsymbol{x}}}_{2}\) the separation vector between cells 1 and 2). Because Fourier components belonging to different \({\boldsymbol{k}}\) are statistically independent, the integrals in the previous equation vanish if \({\boldsymbol{k}}{\;{\ne}\;}{{\boldsymbol{k}}}^{{{{\prime} }}}\), while \({\boldsymbol{k}}={{\boldsymbol{k}}}^{{{{\prime} }}}\) yields the spectrum, or power spectral density (PSD), of the cell distribution:

$$P\left(k\right)\equiv \left\langle {\left|{\delta }_{{\boldsymbol{k}}}\right|}^{2}\right\rangle ={\int_{{{\mathbb{R}}}^{3}}}\frac{{d}^{3}{\boldsymbol{x}}}{V}{c}_{2{\rm{p}}}\left(r\right){e}^{\iota {\boldsymbol{k}}\cdot {\boldsymbol{x}}}+\frac{1}{{nV}},$$
(14)

for \({\boldsymbol{k}}\,{{\ne }}\,{\boldsymbol{0}}\), and \(\langle {|{\delta }_{0}|}^{2}\rangle ={\int_{{{\mathbb{R}}}^{3}}}\frac{{d}^{3}{\boldsymbol{x}}}{V}{c}_{2{\rm{p}}}+\frac{1}{{nV}}\) otherwise. The power spectrum measures the mean number of neighbors, in excess of a random Poisson distribution, within a distance of \({{ \sim }}{k}^{-1}\) from a randomly chosen cell. The extra term \({\left({nV}\right)}^{-1}\) is due to shot noise (a contribution visible as white noise, \({P}\left(k\right)\propto {k}^{0}\)). If the cell distribution is Poisson, then \({\int }_{{{\mathbb{R}}}^{3}}\frac{{d}^{3}{\textbf{x}}}{V}{c}_{2{\rm{p}}}\left(r\right){e}^{\iota {\textbf{k}} \cdot {\textbf{x}}}=0\), but the Fourier modes still have a non-zero variance \(\langle {|{\delta }_{k}|}^{2}\rangle ={({nV})}^{-1}\) due to the discreteness of the cells. This classical result66,67 is often coupled with the filter-design window functions commonly implemented in power spectrum determination to optimize its analytical treatment, which we also tested with consistent results (next section). A Nuttall softening length33 is preferred where necessary.
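The shot-noise term can be checked numerically with a short sketch, given here under stated assumptions rather than as the code used for the analysis. It evaluates Eq. (11) with \({nV}=N\) (the number of cells) by direct summation over the wavevectors \({\boldsymbol{k}}=2\pi {\boldsymbol{m}}/L\) that are periodic in a cubic box of side \(L\). The helper names `lattice_wavevectors`, `fourier_modes`, and `power_spectrum`, and the cubic periodic box, are hypothetical choices.

```python
import numpy as np

def lattice_wavevectors(box, m_max):
    """Nonzero wavevectors k = 2*pi*m/box with integer components |m_i| <= m_max."""
    m = np.arange(-m_max, m_max + 1)
    grid = np.stack(np.meshgrid(m, m, m, indexing="ij"), axis=-1).reshape(-1, 3)
    grid = grid[np.any(grid != 0, axis=1)]  # drop k = 0
    return 2.0 * np.pi * grid / box

def fourier_modes(points, kvecs):
    """delta_k = (1/N) * sum_i exp(i k . x_i), i.e. Eq. (11) with nV = N."""
    phases = points @ kvecs.T  # shape (N, K)
    return np.exp(1j * phases).sum(axis=0) / len(points)

def power_spectrum(points, box, m_max=3):
    """Return |k| and |delta_k|^2 for every nonzero lattice wavevector."""
    kvecs = lattice_wavevectors(box, m_max)
    power = np.abs(fourier_modes(points, kvecs)) ** 2
    return np.linalg.norm(kvecs, axis=1), power

# Usage on a synthetic Poisson sample: the mean power is close to 1/N.
rng = np.random.default_rng(0)
cells = rng.uniform(0.0, 1.0, size=(2000, 3))
k, p = power_spectrum(cells, box=1.0)
print(p.mean() * len(cells))  # ratio to the shot-noise level 1/N
```

For uncorrelated positions the printed ratio should be of order one, which is the white-noise floor \({({nV})}^{-1}\); any excess over this floor at a given \(k\) reflects the Fourier transform of \({c}_{2{\rm{p}}}\).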

Modulation analysis

We can normalize the 2pCF to the unitary sphere in \({{\mathbb{R}}}^{3}\) to test the impact of the most common non-parametric kernel functions \(\kappa\). From the signal-analysis literature, we tested the uniform (default in this work) \(\kappa \left(u\right)=\frac{1}{2}\), triangular \(\kappa \left(u\right)=1-\left|u\right|\), parabolic \(\kappa \left(u\right)=\frac{3}{4}\left(1-{u}^{2}\right)\) with support \(\left|u\right|\le 1\), quadratic (biweight) \(\kappa \left(u\right)=\frac{15}{16}{\left(1-{u}^{2}\right)}^{2}\) over \(\left|u\right|\le 1\), cosine \(\kappa \left(u\right)=\frac{\pi }{4}\cos \left(\frac{\pi }{2}u\right)\) with support \(\left|u\right|\le 1\), Gaussian, SemiCircle, and Triweight \(\kappa \left(u\right)=\frac{35}{32}{(1-{u}^{2})}^{3}\) over \({|u|} \le 1\) kernels. Although many powerful parametric kernels might help the fitting process, we avoid introducing arbitrary parameters. By following the scheme in ref. 65, we can detect a range of interest in the bandwidth, and the impact of the window function on the final result is generally minor (Fig. 6). Finally, the radial distribution function \({c}_{2p}\) is generally computed without uncertainty bars attached. \({c}_{2p}\) is indeed an average over many different measurements of the same tissue tumor ROI, \({\mathscr{T}}\). Therefore, the relatively small standard error of the mean (small because the analysis spans many cells in the same ROI) could be worked out. Some attempts to account for this error with advanced bootstrap simulations can be found in an astronomical context68. Another approach, different from what we have done, works on binned data, thus obtaining error bars as standard errors65.
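The compact-support kernels listed above can be written down and compared in a few lines. The sketch below is illustrative only: it includes just the kernels whose formulas are given in the text (the Gaussian and SemiCircle kernels are omitted because their normalizations are not stated here), and the smoothing step is a generic Nadaraya-Watson kernel-weighted average, which is an assumption and not necessarily the smoothing scheme applied in this work. The names `KERNELS`, `kernel`, and `smooth` are hypothetical.

```python
import numpy as np

# Kernels with support |u| <= 1, as listed in the text.
KERNELS = {
    "uniform": lambda u: np.full_like(u, 0.5),
    "triangular": lambda u: 1.0 - np.abs(u),
    "parabolic": lambda u: 0.75 * (1.0 - u ** 2),
    "biweight": lambda u: 15.0 / 16.0 * (1.0 - u ** 2) ** 2,
    "triweight": lambda u: 35.0 / 32.0 * (1.0 - u ** 2) ** 3,
    "cosine": lambda u: np.pi / 4.0 * np.cos(np.pi / 2.0 * u),
}

def kernel(name, u):
    """Evaluate the named kernel, set to zero outside its support |u| <= 1."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, KERNELS[name](u), 0.0)

def smooth(x, y, x_eval, bandwidth, name="uniform"):
    """Kernel-weighted average of samples (x, y) at the points x_eval.

    Assumption: a Nadaraya-Watson estimator; every x_eval must have at least
    one sample within one bandwidth, otherwise the weights sum to zero.
    """
    w = kernel(name, (x_eval[:, None] - x[None, :]) / bandwidth)
    return (w * y).sum(axis=1) / w.sum(axis=1)

# Usage: smooth a noisy curve with two different windows and compare.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.normal(size=x.size)
x_eval = np.linspace(0.1, 0.9, 5)
for name in ("uniform", "triweight"):
    print(name, smooth(x, y, x_eval, bandwidth=0.05, name=name))
```

Each kernel integrates to one over its support, so changing the window mainly redistributes the weight within a bandwidth. This is one way to see why the choice of window tends to matter less than the choice of bandwidth.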

Fig. 6: Comparison analysis of different kernels on the unnormalized PSD for an individual patient.

Analysis demonstrates no significant difference between different window functions.

Isotropy analysis

Not every image satisfies the hypothesis of slight deviation from isotropy, for several reasons (sampling size, tumor clustering grade, etc.), as for the single ROI shown with thick dots in the bottom left of panel 1 of Fig. 7. The corresponding wind-rose plot is in the central panel. Where necessary, we analyze the original ROI copied several times (9 copies in the left panel) until an isotropic distribution is achieved (corresponding wind-rose plot in the right panel). Note that this crude approach assumes that the 2D isotropic distribution correctly represents the unknown 3D isotropic distribution. Unfortunately, no 3D information is available to inform a 3D diffusion tensor.
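One possible way to quantify such a check is sketched below. How the wind-rose plot and the tolerance are computed in this work is not specified in the text, so every choice here is an assumption: the rose is taken as a histogram of pair orientations in \([0,\pi )\), the anisotropy score is the largest relative deviation of a bin from the mean bin count, and the copies are placed on a regular grid. The functions `tile_roi`, `wind_rose`, and `anisotropy` are hypothetical names.

```python
import numpy as np

def tile_roi(points, width, height, reps):
    """Replicate a 2D ROI on a reps x reps grid of translated copies."""
    shifts = np.array(
        [[i * width, j * height] for i in range(reps) for j in range(reps)],
        dtype=float,
    )
    return (points[None, :, :] + shifts[:, None, :]).reshape(-1, 2)

def wind_rose(points, n_bins=12):
    """Histogram of the orientation (mod pi) of all unique cell pairs."""
    diff = points[:, None, :] - points[None, :, :]
    iu = np.triu_indices(len(points), k=1)
    angles = np.arctan2(diff[..., 1][iu], diff[..., 0][iu]) % np.pi
    counts, _ = np.histogram(angles, bins=n_bins, range=(0.0, np.pi))
    return counts

def anisotropy(counts):
    """Largest relative deviation of a bin from the mean bin count."""
    mean = counts.mean()
    return np.abs(counts - mean).max() / mean

# Usage: an elongated ROI scores far above an (assumed) 20% tolerance.
rng = np.random.default_rng(0)
strip = np.column_stack([rng.uniform(0, 10, 300), rng.uniform(0, 1, 300)])
print(anisotropy(wind_rose(strip)))
```

The same score can be evaluated on `tile_roi(...)` output for increasing `reps` and compared with a chosen tolerance. The pairwise arrays grow quadratically with the number of cells, so large tilings may need subsampling.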

Fig. 7: Replicated cell distribution architecture.

The original ROI (bottom left in the left panel) is shown with darker dots. The corresponding wind-rose plot is in the central panel. The distribution is replicated (3 lighter gray copies, 9 lighter gray copies, etc.) until isotropy is reached at a tolerable level (20% of the overall distribution).