Introduction

Survival models study the time until an event of interest occurs. They are characterized by the inclusion of censored (incomplete) observations, arising either because an individual never experienced the event during follow-up or because the individual's follow-up was truncated during the study. In this context, because it does not assume a specific distribution for the failure times, one of the most referenced models in the literature is Cox's proportional hazards (PH) model. For \({\varvec{x}}_i=(x_{i1},\ldots ,x_{ip})\) a set of p observed covariates (without intercept term), the hazard function for this model is given by

$$\begin{aligned} h(t\, |\, {\varvec{x}}_i) = h_0(t)\exp ({\varvec{x}}_i^\top {\varvec{\beta }}), \end{aligned}$$
(1)

where \({\varvec{\beta }}=(\beta _1,\ldots ,\beta _p)\) denotes the vector of p regression coefficients and \(h_0(\cdot )\) denotes the baseline hazard function. Note that this model has a proportional hazards structure because the hazard ratio for two individuals with profiles \({\varvec{x}}_i\) and \({\varvec{x}}_{i'}\),

$$\begin{aligned} \frac{h(t\, |\, {\varvec{x}}_i)}{h(t\, |\, {\varvec{x}}_{i'})} =\exp \left( ({\varvec{x}}_i-{\varvec{x}}_{i'})^\top {\varvec{\beta }}\right) , \end{aligned}$$

does not depend on t. A way to break the proportional hazards assumption is to use univariate frailty models, although in practice the concept of frailty is more intuitive in the context of data grouped into clusters or data with some type of association (repeated measurements on the same individual, for example). The literature considers many models for the frailty distribution in a univariate context. To name a few: gamma1,2, inverse Gaussian (IG)3,4, Birnbaum-Saunders (BS)5,6, folded normal7, weighted Lindley (WL)8,9, and mixture of IG10, among others, where the restriction that the frailty variable has mean 1 is usually imposed to avoid identifiability problems.

However, when the observations are grouped in clusters of different sizes, a multivariate frailty model framework is required. In addition to the aforementioned restriction, in this case it is required that the derivatives of the Laplace transform have a known form, because the joint density function depends on them. Few distributions in the literature satisfy these conditions; the gamma, the IG, and the recently proposed WL shared frailty model9 do. For this reason, the literature on frailty models in a multivariate context has grown mainly for the bivariate or trivariate case, in which all the clusters have 2 or 3 observations, respectively. In addition to the three distributions mentioned above, we found the generalized exponential discussed in11 and the generalized inverse Gaussian presented in12. The truncated normal (TN) model was mentioned as a possible frailty distribution in7. However, it was used without imposing any mean restriction or reparameterization, applied solely within a copula model, and restricted to the bivariate case. To date, the behavior of the TN model in the frailty context has not been explored for clusters larger than two, let alone for groups with varying sample sizes.

In this paper, we use the TN distribution as the frailty distribution for clustered survival data. To make the model identifiable, we employ, as the frailty distribution, a TN distribution with mean one, obtained through a new parameterization of the TN distribution. The conditional distributions of the frailty among the survivors and of the frailty of individuals dying at time t can be determined explicitly. Furthermore, we propose a closed-form recurrence for the derivatives of the Laplace transform. For parameter estimation, we give a simple EM algorithm, since all the conditional expectations involved in the E-step are obtained in explicit form. Finally, the results of this paper have been implemented in the R statistical software. The manuscript is organized as follows. Section 2 presents a background on frailty models and introduces the TN frailty model with a parameterization such that the mean of the distribution is 1. Section 3 discusses the estimation procedure for the model based on a classical approach. Section 4 presents a simulation study to assess the performance of the proposed estimators in finite samples. In Section 5, we present two real data sets, the first related to the recurrence times of patients with renal problems and the second to fibrosarcoma data. Finally, Section 6 presents the main conclusions of this work.

Background of frailty models

In this section, we introduce the truncated normal distribution and present a background on frailty models. Then, we introduce the novel truncated normal frailty model for the univariate and multivariate cases.

The truncated normal distribution

A random variable Z has a TN distribution on the positive axis if its probability density function (PDF) is given by

$$\begin{aligned} g(z)=\dfrac{\phi \Big (\frac{z-\mu }{\sigma }\Big )}{\sigma \, \Phi \Big (\frac{\mu }{\sigma }\Big )},\quad z> 0, \end{aligned}$$

where \(\phi (\cdot )\) and \(\Phi (\cdot )\) denote the PDF and cumulative distribution function (CDF) of the standard normal distribution, \(-\infty< \mu < \infty\) represents a location parameter and \(\sigma> 0\) a scale parameter. The mean and variance of the TN distribution are given by

$$\begin{aligned} \mathbb {E}(Z)=\mu +\sigma \dfrac{\phi (\mu /\sigma )}{\Phi (\mu /\sigma )}, \quad \text{ and } \quad \text{ Var }(Z)=\sigma ^2\Bigg \{1-\dfrac{\mu \,\phi (\mu /\sigma )}{\sigma \,\Phi (\mu /\sigma )}-\bigg (\dfrac{\phi (\mu /\sigma )}{\Phi (\mu /\sigma )}\bigg )^2\Bigg \}. \end{aligned}$$

Considering the reparameterization \(\nu =\mu /\sigma\) and the restriction \(\sigma =\bigg (\nu +\dfrac{\phi (\nu )}{\Phi (\nu )}\bigg )^{-1}\) (which forces the mean to equal one), the PDF of the model reduces to

$$\begin{aligned} g(z)=\dfrac{\gamma \phi \bigg (\gamma z-\nu \bigg )}{\Phi \big (\nu \big )}, \quad \nu \in \mathbb {R}, z>0, \end{aligned}$$
(2)

with \(\gamma =\gamma (\nu )=\nu +\phi (\nu )/\Phi (\nu )\), and the mean and variance of the model are given by

$$\begin{aligned} \mathbb {E}(Z)=1 \quad \text{ and } \quad \theta =\text{ Var }(Z)=\gamma ^{-2}-\dfrac{\phi (\nu )}{\Phi (\nu )}\gamma ^{-1}, \end{aligned}$$

respectively. From now on, we use the notation TN\((\nu )\) to refer to a random variable with PDF given in Equation (2). To the best of our knowledge, this parameterization has not been proposed previously in the statistical literature. It is not possible to reparameterize the model directly in terms of the frailty variance \(\theta\); however, there is a one-to-one relationship between \(\theta\) and \(\nu\). Thus, this parameterization is very useful because it allows us to compare the TN model directly with other frailty models parameterized in terms of the frailty variance.

Note that, under the restriction \(\text {E}(Z) = 1\), we have \(0 \le \theta =\text{ Var }(Z) \le 1\). In principle, this can be a disadvantage. However, in practice the frailty variance usually satisfies this condition (see the applications in Section 5).
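The mean-one property and the variance formula can be checked by numerical integration of the density in Equation (2). A short Python sketch (the value of \(\nu\) and the quadrature settings are arbitrary illustrative choices):

```python
import math

def phi(x):                         # standard normal PDF
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):                         # standard normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gamma_nu(nu):                   # gamma(nu) = nu + phi(nu)/Phi(nu)
    return nu + phi(nu) / Phi(nu)

def tn_pdf(z, nu):                  # density (2) of the mean-one TN(nu) model
    g = gamma_nu(nu)
    return g * phi(g * z - nu) / Phi(nu)

def simpson(f, a, b, n=4000):       # composite Simpson rule (n even)
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4.0 * sum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    s += 2.0 * sum(f(a + 2 * k * h) for k in range(1, n // 2))
    return s * h / 3.0

nu = 0.7                            # arbitrary illustrative value
g = gamma_nu(nu)
m1 = simpson(lambda z: z * tn_pdf(z, nu), 0.0, 30.0)         # E(Z)
m2 = simpson(lambda z: z * z * tn_pdf(z, nu), 0.0, 30.0)     # E(Z^2)
theta = g ** -2 - (phi(nu) / Phi(nu)) / g   # closed-form frailty variance
print(m1, m2 - m1 ** 2, theta)
```

The first printed value is 1 up to quadrature error, and the numerical variance matches the closed-form \(\theta\).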

Figure 1 shows the PDF and the variance of the TN\((\nu )\) model for different values of \(\nu\). The flexibility of the TN distribution is apparent. Furthermore, the variance of the TN distribution decreases as \(\nu\) increases.

Fig. 1

PDF of the TN model (left) and variance of the TN distribution (right), in terms of \(\nu\).

The Laplace transform for the TN\((\nu )\) model is given by

$$\begin{aligned} \mathcal {L}_g\big (s\big )&=\dfrac{\Phi \big (\kappa \big )}{\Phi \big (\nu \big )}\exp \bigg \{\frac{s}{\gamma }\,\Big (\frac{s}{2\gamma }-\nu \Big ) \bigg \}, \end{aligned}$$
(3)

where \(\kappa =\kappa (s,\nu )=\nu -s/\gamma\). Let \(\mathcal {L}^{(d)}_g\big (s\big )\) be the d-th derivative of the Laplace transform. For \(d=1\) and \(d=2\), these derivatives are given by

$$\begin{aligned} \mathcal {L}^{(1)}_g\big (s\big ) = -\dfrac{\mathcal {L}_g\big (s\big )}{\gamma }\Bigg (\kappa +\frac{\phi (\kappa )}{\Phi (\kappa )}\Bigg ) \quad \text{ and } \quad \mathcal {L}^{(2)}_g \big (s\big ) = \frac{ \mathcal {L}_g \big (s\big )}{\gamma ^2} \Bigg [\kappa \left( \kappa +\frac{\phi (\kappa )}{\Phi (\kappa )}\right) +1 \Bigg ]. \end{aligned}$$
(4)

In13, Corollary 2.1 presents a recurrence relation for the derivatives of order 3 or higher of the moment-generating function (denoted \(M_g(\cdot )\)) of the TN model. Using the property \(M_g(s)=\mathcal {L}_g(-s)\), we can derive the following relation:

$$\begin{aligned} \mathcal {L}_g^{(d)}(s) = \frac{(d-1)}{\gamma ^2}\mathcal {L}_g^{(d-2)}(s) - \frac{\kappa }{\gamma }\mathcal {L}_g^{(d-1)}(s), \quad d = 3,4,\ldots , \end{aligned}$$
(5)

which depends on the two previous derivatives but is simple to implement computationally. Higher-order derivatives of the Laplace transform are provided in the table included in the Supplementary Material file. The results in Equations (4) and (5) are central to the development of our approach to the TN model within the context of frailty models.
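Since \(\mathcal {L}^{(d)}_g(s)=(-1)^d\,\mathbb {E}\big (Z^d e^{-sZ}\big )\), the closed forms (3)-(4) and the recurrence (5) can be cross-checked by direct quadrature. A short Python sketch with arbitrary illustrative values of \(\nu\), s and d:

```python
import math

phi = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def laplace_derivs(s, nu, d):
    """[L(s), L'(s), ..., L^(d)(s)] from the closed forms (3)-(4)
    and the recurrence (5) for the TN(nu) model."""
    g = nu + phi(nu) / Phi(nu)                 # gamma(nu)
    k = nu - s / g                             # kappa(s, nu)
    L0 = Phi(k) / Phi(nu) * math.exp(s / g * (s / (2.0 * g) - nu))
    out = [L0,
           -L0 / g * (k + phi(k) / Phi(k)),
           L0 / g ** 2 * (k * (k + phi(k) / Phi(k)) + 1.0)]
    for j in range(3, d + 1):                  # recurrence (5)
        out.append((j - 1) / g ** 2 * out[j - 2] - k / g * out[j - 1])
    return out[:d + 1]

nu, s, d = 0.5, 0.8, 4                         # arbitrary illustrative values
g = nu + phi(nu) / Phi(nu)
n, upper = 40000, 30.0
h = upper / n
num = 0.0
for i in range(1, n + 1):                      # E[Z^d exp(-s Z)] by quadrature
    z = i * h
    num += (z ** d) * math.exp(-s * z) * g * phi(g * z - nu) / Phi(nu)
num *= h
print(laplace_derivs(s, nu, d)[d], (-1) ** d * num)
```

Both printed values agree, confirming the recurrence up to the chosen order.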

Univariate frailty models

In a univariate context, the extended Cox model with an unobserved source of heterogeneity has a conditional hazard function given by

$$\begin{aligned} h(t\, |\, z_i,\textbf{x}_i) = z_i \, h_0(t)\exp (\textbf{x}_i^\top {\varvec{\beta }}), \end{aligned}$$
(6)

where \(\textbf{x}_i\) denotes a vector of covariates and \(z_i\) is a latent variable representing the unobserved heterogeneity of the i-th observation. For \(z_1, z_2,\ldots ,z_n\), a positive distribution (say one with PDF \(g(\cdot )\)) is assumed, typically with mean 1 to avoid identifiability problems14. Similar to Eq. (1), this implies that the quotient of the conditional hazard functions of two individuals does not depend on t; we remark that in this case it is the conditional (and not the marginal) hazard function that satisfies this property. Also note that the larger \(z_i\) is, the greater the risk associated with that observation. The conditional survival function for the i-th individual, obtained from Equation (6), is given by

$$\begin{aligned} S(t\, |\, z_i,\textbf{x}_i)=\exp \left\{ -\int _0^t h(u\, |\, z_i,\textbf{x}_i)du\right\} =\exp \Big \{-z_i H_0(t)\exp (\textbf{x}_i^\top {\varvec{\beta }})\Big \}, \end{aligned}$$

where \(H_0(t)=\int _{0}^{t}h_0(u)du\) represents the baseline cumulative hazard function. The marginal survival function can be obtained as

$$\begin{aligned} S(t\, |\, \textbf{x}_i)=\int _{0}^{\infty }\exp \Big \{-z_i H_0(t)\exp (\textbf{x}_i^\top {\varvec{\beta }})\Big \}g(z_i)dz_i=\mathcal {L}_g\left( H_0(t)\exp (\textbf{x}_i^\top {\varvec{\beta }})\right) , \end{aligned}$$

where \(\mathcal {L}_g(\cdot )\) corresponds to the Laplace transform of the PDF \(g(\cdot )\). On the other hand, the marginal hazard function is given by

$$\begin{aligned} h(t\, |\, \textbf{x}_i)=-\frac{\partial S(t\, |\, \textbf{x}_i)/\partial t}{S(t\, |\, \textbf{x}_i)}=-\dfrac{h_0(t)\exp (\textbf{x}_i^\top {\varvec{\beta }})\, \mathcal {L}^{(1)}_g\big (H_0(t)\exp (\textbf{x}_i^\top {\varvec{\beta }})\big )}{\mathcal {L}_g\big (H_0(t)\exp (\textbf{x}_i^\top {\varvec{\beta }})\big )}, \end{aligned}$$
(7)

where \(\mathcal {L}^{(d)}_g(\cdot )\), \(d\in \mathbb {Z}^{+}\), denotes the d-th derivative of \(\mathcal {L}_g(\cdot )\). It is clear from Eq. (7) that the PH assumption is not satisfied in this case. In particular, when \(Z_i \sim \text{ TN }(\nu )\), the marginal survival and hazard functions reduce to

$$\begin{aligned} S(t\mid \textbf{x}_i)&=\dfrac{\Phi \big (\nu -\frac{H_0(t)\exp (\textbf{x}_i^\top {\varvec{\beta }})}{\gamma }\big )}{\Phi \big (\nu \big )}\exp \bigg \{\frac{H_0(t)\exp (\textbf{x}_i^\top {\varvec{\beta }})}{\gamma }\,\left( \frac{H_0(t)\exp (\textbf{x}_i^\top {\varvec{\beta }})}{2\gamma }-\nu \right) \bigg \}, \quad \text{ and } \nonumber \\ h(t\mid \textbf{x}_i)&=\frac{h_0(t)}{\gamma }\exp \left( \textbf{x}_i^\top {\varvec{\beta }}\right) \left\{ \nu -\frac{H_0(t)\exp (\textbf{x}_i^\top {\varvec{\beta }})}{\gamma }+\frac{\phi \left( \nu -\frac{H_0(t)\exp (\textbf{x}_i^\top {\varvec{\beta }})}{\gamma }\right) }{\Phi \left( \nu -\frac{H_0(t)\exp (\textbf{x}_i^\top {\varvec{\beta }})}{\gamma }\right) } \right\} . \nonumber \end{aligned}$$
(8)
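The closed-form survival function in (8) and the relation \(h(t\mid \textbf{x}) = -\partial \log S(t\mid \textbf{x})/\partial t\) can be cross-checked numerically; the check also confirms that the marginal hazard is positive. A short Python sketch, assuming a Weibull baseline \(H_0(t)=\lambda t^\rho\) and arbitrary illustrative values of \(\lambda\), \(\rho\), \(\nu\) and \(\textbf{x}^\top {\varvec{\beta }}\):

```python
import math

phi = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

nu = 0.8                                       # arbitrary illustrative value
g = nu + phi(nu) / Phi(nu)                     # gamma(nu)
lam, rho, eta = 0.5, 1.3, math.exp(0.4)        # Weibull baseline and exp(x'beta)

H0 = lambda t: lam * t ** rho                  # cumulative baseline hazard
h0 = lambda t: lam * rho * t ** (rho - 1.0)    # baseline hazard

def S(t):                                      # closed-form marginal survival
    s = H0(t) * eta
    return Phi(nu - s / g) / Phi(nu) * math.exp(s / g * (s / (2.0 * g) - nu))

def h(t):                                      # closed-form marginal hazard
    k = nu - H0(t) * eta / g
    return h0(t) * eta / g * (k + phi(k) / Phi(k))

t, eps = 1.7, 1e-6
h_num = -(math.log(S(t + eps)) - math.log(S(t - eps))) / (2.0 * eps)
print(h(t), h_num)
```

The closed-form value and the central-difference approximation of \(-\partial \log S/\partial t\) agree.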

Finally, for the univariate case, we present two propositions giving the conditional distribution of the frailty given the events \(T>t\) and \(T=t\), respectively.

Proposition 1.1

The conditional distribution of the frailty \(Z\mid T>t\) follows a TN(\(\varepsilon\)) distribution, where \(\varepsilon =\varepsilon (H_0(t),\nu )=\nu -H_0(t)/\gamma\).

Proposition 1.2

The conditional distribution of the frailty \(Z\mid T=t\) follows a modified half-normal (MHN) distribution15, whose density function is given by

$$\begin{aligned} f(z|T=t)=\frac{\gamma ^2 \exp \big (-\frac{\kappa ^2}{2}\big )}{\sqrt{2\pi }\big (\kappa \Phi (\kappa )+\phi (\kappa )\big )} z \exp \Big \{ -\frac{\gamma ^2}{2} z^2 + \gamma \kappa z\Big \}. \end{aligned}$$

Proofs of Propositions 1.1 and 1.2 are provided in the Supplementary Material.
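Proposition 1.1 can also be verified numerically via Bayes' rule: \(f(z\mid T>t)=e^{-zH_0(t)}\,g(z)/\mathcal {L}_g(H_0(t))\), which must coincide with the truncated-normal form with location ratio \(\varepsilon\) (note that the factor \(\gamma =\gamma (\nu )\) is unchanged by the conditioning). A Python sketch with arbitrary illustrative values of \(\nu\) and \(H_0(t)\):

```python
import math

phi = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

nu, H0t = 0.6, 0.9                   # arbitrary nu and value of H_0(t)
g = nu + phi(nu) / Phi(nu)           # gamma(nu), unchanged by the conditioning
eps = nu - H0t / g                   # epsilon(H_0(t), nu)

tn_pdf = lambda z: g * phi(g * z - nu) / Phi(nu)       # frailty density (2)
laplace = lambda s: (Phi(nu - s / g) / Phi(nu)
                     * math.exp(s / g * (s / (2.0 * g) - nu)))

diffs = []
for z in (0.2, 0.7, 1.1, 1.8):
    bayes = math.exp(-z * H0t) * tn_pdf(z) / laplace(H0t)   # f(z | T > t)
    closed = g * phi(g * z - eps) / Phi(eps)                # TN form with ratio eps
    diffs.append(abs(bayes - closed))
print(max(diffs))
```

The two densities coincide pointwise up to floating-point roundoff.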

Multivariate shared frailty model

In a more general context, it is possible to consider that the observations are grouped in m clusters, where the i-th cluster has \(n_i\) observations, for \(i=1,\ldots ,m\). This setting is appropriate when the observations in the same cluster have some kind of dependence: for instance, repeated measurements on the same individual, or members of the same family, among others. The assumption here is that all the observations in the same cluster are conditionally independent given the corresponding frailty term (\(z_i\)). Under this assumption, the conditional hazard and the joint survival function are given by

$$\begin{aligned} h(t_{i1},\ldots , t_{in_i} \mid z_i,\textbf{X}_{i})&=\sum _{j=1}^{n_i} h(t_{ij}\mid z_i, \textbf{x}_{ij})=z_i \sum _{j=1}^{n_i}\exp \left( \textbf{x}_{ij}^\top {\varvec{\beta }}\right) h_0(t_{ij}), \quad \text{ and }\\ S(t_{i1},\ldots , t_{in_i} \mid z_i,\textbf{X}_{i})&=\exp \left( -z_i \sum _{j=1}^{n_i}\exp \left( \textbf{x}_{ij}^\top {\varvec{\beta }}\right) H_0(t_{ij})\right) , \end{aligned}$$

respectively, where \(\textbf{x}_{ij}^\top =(x_{ij1},\ldots ,x_{ijp})\) denotes the vector of p covariates related to the j-th individual in the i-th cluster, \(\textbf{X}_i^\top =(\textbf{x}_{i1}^\top ,\ldots ,\textbf{x}_{in_i}^\top )\) denotes the vector with all the information on the p covariates associated with the \(n_i\) observations in the i-th cluster, and \(z_i\) represents the influence of the i-th cluster on its observations. Integrating over the density of \(z_i\), we obtain that the marginal survival function for \({\varvec{t}}_i=(t_{i1},\ldots ,t_{in_i})\) is given by

$$\begin{aligned} S({\varvec{t}}_i\mid \textbf{X}_i) = \mathcal {L}_g \left( \sum _{j=1}^{n_i}H_0(t_{ij})\exp (\textbf{x}_{ij}^\top {\varvec{\beta }})\right) , \end{aligned}$$

and then, the marginal hazard function is given by

$$\begin{aligned} h({\varvec{t}}_i\mid \textbf{X}_i) = \frac{(-1)^{n_i} \displaystyle \sum _{j=1}^{n_i} h_0(t_{ij})\exp (\textbf{x}_{ij}^\top {\varvec{\beta }}) \, \mathcal {L}_g^{(n_i)}\Big (\displaystyle \sum _{j=1}^{n_i} H_0(t_{ij})\exp (\textbf{x}_{ij}^\top {\varvec{\beta }})\Big )}{\mathcal {L}_g \Big (\sum _{j=1}^{n_i}H_0(t_{ij})\exp (\textbf{x}_{ij}^\top {\varvec{\beta }})\Big )}. \end{aligned}$$
(9)

For the TN model, the expression in Equation (9) can be evaluated using the recursive formula in (5). For the bivariate case (i.e., \(n_i=2\) for all \(i=1,\ldots ,m\)), the marginal hazard function reduces to

$$\begin{aligned} h(t_{i1},t_{i2}\mid \textbf{x}_{i1},\textbf{x}_{i2})=\frac{1}{\gamma ^2}\sum _{j=1}^{2} h_0(t_{ij}) \exp (\textbf{x}_{ij}^\top {\varvec{\beta }}) \left[ \left( \nu -\frac{s_i}{\gamma }\right) \left( \left( \nu -\frac{s_i}{\gamma }\right) +\frac{\phi \left( \nu -\frac{s_i}{\gamma }\right) }{\Phi \left( \nu -\frac{s_i}{\gamma }\right) }\right) +1 \right] , \end{aligned}$$

where \(s_i=H_0(t_{i1})\exp (\textbf{x}_{i1}^\top {\varvec{\beta }})+H_0(t_{i2})\exp (\textbf{x}_{i2}^\top {\varvec{\beta }})\).

Kendall’s tau

Kendall’s \(\tau\) is a measure that quantifies the dependence between observations in the same cluster. This measure does not depend on the unit of measurement of the data, so it is preferable to the variance or the correlation of the data, which have known limitations (non-existence of the second moment, presence of censored observations, different measurement scales; see16, page 153, for details). Using the Laplace transform and its second derivative (Equations (3) and (4), respectively), we can determine the value of \(\tau\) for the TN distribution, which is defined as

$$\begin{aligned} \tau&=4\int _{0}^{\infty } s\mathcal {L}^{(2)}(s)\mathcal {L}(s)ds - 1,\\&=4\int _{0}^{\infty } s \left[ \dfrac{\Phi \big (\kappa \big )}{\gamma \Phi \big (\nu \big )}\right] ^2\exp \bigg \{\frac{s}{\gamma }\,\Big (\frac{s}{\gamma }-2\nu \Big ) \bigg \} \Bigg [\kappa \left( \kappa +\frac{\phi (\kappa )}{\Phi (\kappa )}\right) +1 \Bigg ]ds-1. \end{aligned}$$

The integral is computed numerically since it does not have a closed-form expression. Figure 2 shows the dependence (\(\tau\)) as a function of the frailty variance for different frailty models. Note that \(\tau \in [0,0.33]\) for \(\theta \in (0,1)\) in the TN frailty model. We also note that, for a given frailty variance \(\theta \in (0,0.864)\), the TN frailty model produces a higher degree of dependence \(\tau\) than the GA, IG, and WL frailty models.
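The integrand involves large exponents and far normal tails, so a careful numerical implementation works in log scale. A Python sketch (the integration limit and grid size are pragmatic choices, and the far left normal tail uses a standard asymptotic expansion):

```python
import math

LOG_SQRT_2PI = 0.5 * math.log(2.0 * math.pi)

def log_phi(x):
    return -0.5 * x * x - LOG_SQRT_2PI

def log_Phi(x):
    if x > -20.0:
        return math.log(0.5 * math.erfc(-x / math.sqrt(2.0)))
    # asymptotic expansion of log Phi(x) in the far left tail
    return log_phi(x) - math.log(-x) + math.log1p(-1.0 / x ** 2 + 3.0 / x ** 4)

def kendall_tau_tn(nu, upper=100.0, n=20000):
    """tau = 4 * int_0^inf s L''(s) L(s) ds - 1 for the TN(nu) frailty,
    using the closed forms of L and L'' and Simpson's rule."""
    g = nu + math.exp(log_phi(nu) - log_Phi(nu))    # gamma(nu)
    def f(s):
        k = nu - s / g
        logL = log_Phi(k) - log_Phi(nu) + s / g * (s / (2.0 * g) - nu)
        r_k = math.exp(log_phi(k) - log_Phi(k))     # phi(kappa)/Phi(kappa)
        return s * math.exp(2.0 * logL) / g ** 2 * (k * (k + r_k) + 1.0)
    h = upper / n
    tot = f(0.0) + f(upper)
    tot += 4.0 * sum(f((2 * i - 1) * h) for i in range(1, n // 2 + 1))
    tot += 2.0 * sum(f(2 * i * h) for i in range(1, n // 2))
    return 4.0 * (tot * h / 3.0) - 1.0

tau0, tau2 = kendall_tau_tn(0.0), kendall_tau_tn(2.0)
print(tau0, tau2)
```

Consistent with Figure 2, \(\tau\) decreases as \(\nu\) increases (i.e., as the frailty variance decreases).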

Fig. 2

Comparison of Kendall’s \(\tau\) for the TN, weighted Lindley (WL), gamma (GA) and inverse Gaussian (IG) frailty models.

On the baseline hazard function

The baseline hazard function \(h_0(t)\) is usually modeled with common distributions with positive support, such as the Weibull, gamma, and Gompertz, among others. For the Weibull distribution, we consider the parameterization such that \(h_0(t)=\lambda \rho t^{\rho -1}\) and \(H_0(t)=\lambda t^{\rho }\), \(t, \lambda , \rho>0\), and we write \(T\sim \text{ W }(\lambda ,\rho )\) to refer to this particular parameterization. The Weibull model has been widely used in the literature because it adapts well to diverse biological, physical, chemical, and industrial processes, to name a few. Furthermore, its hazard function can assume monotonic forms (increasing, decreasing, or constant), controlled only by \(\rho\). Another option is the piecewise exponential (PE) distribution, introduced in17 and extended in18 to the case with covariates. This model considers a constant risk within each predefined interval, based on cut points \((a_1,...,a_{L-1})\) such that \(0=a_0<a_1<...<a_{L-1}<a_L = \infty\). This distribution is extremely useful for accommodating critical points where there may be abrupt changes in the baseline hazard function that cannot be captured by non-segmented distributions such as the Weibull. We say that T has a PE distribution with parameter vector \({\varvec{\lambda }}=(\lambda _1,...,\lambda _L)\) and known time partition \({\varvec{a}}=(a_1,...,a_{L-1})\) (we denote \(T\sim PE_a({\varvec{\lambda }})\)), if its survival function is given by

$$\begin{aligned} S(t) = \exp \bigg (-\sum _{l=1}^{L}\lambda _l\nabla _l(t) \bigg ),\,\,\, t>0, \end{aligned}$$

where

$$\begin{aligned} \nabla _l(t) = \left\{ \begin{aligned} 0&,\ \text {if} \ t< a_{l-1},\\ t-a_{l-1}&,\ \text {if} \ a_{l-1} \le t < a_l, \\ a_l-a_{l-1}&,\ \text {if} \ t\ge a_l. \end{aligned} \right. \end{aligned}$$

The hazard function is given by

$$\begin{aligned} h_0(t)=\lambda _\ell , \quad t\in (a_{\ell -1},a_\ell ],\ \ell =1,...,L, \end{aligned}$$

and the cumulative hazard function is given by

$$\begin{aligned} H_0(t)=\sum _{l=1}^{L}\lambda _l\nabla _l(t). \end{aligned}$$

In the literature, when the PE model is used in the context of frailty models, it is typically referred to as a semi-parametric model19. However, in this work, we also consider a non-parametric form for the baseline hazard distribution.
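The PE baseline functions \(h_0\), \(H_0\) and the implied survival function are straightforward to implement. A Python sketch with hypothetical cut points and rates:

```python
import bisect
import math

def pe_hazard(t, a, lam):
    """h0(t) = lam[l] for t in (a_{l-1}, a_l]; a holds the interior
    cut points (a_1, ..., a_{L-1}) and lam the L constant rates."""
    return lam[bisect.bisect_left(a, t)]

def pe_cum_hazard(t, a, lam):
    """H0(t) = sum_l lam[l] * nabla_l(t), with nabla_l as defined above."""
    grid = [0.0] + list(a) + [math.inf]        # a_0 = 0, a_L = infinity
    H = 0.0
    for l in range(len(lam)):
        if t <= grid[l]:
            break
        H += lam[l] * (min(t, grid[l + 1]) - grid[l])
    return H

a = (1.0, 3.0)                                 # hypothetical cut points
lam = (0.5, 0.2, 0.8)                          # hypothetical rates per interval
S = lambda t: math.exp(-pe_cum_hazard(t, a, lam))
print(pe_cum_hazard(2.0, a, lam), S(2.0))
```

For instance, with these values \(H_0(2)=0.5\cdot 1+0.2\cdot 1=0.7\), so \(S(2)=e^{-0.7}\).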

Estimation

In this section, we discuss parameter estimation for the TN frailty model. Let \(Y_{ij}\) and \(C_{ij}\) be the failure and censoring times for the j-th individual in the i-th cluster and \({\textbf {x}}_{ij}\) be a \(p\times 1\) covariate vector (without intercept term), where \(1\le i \le m\) and \(1 \le j \le n_i\). Under a right-censoring scheme, we observe the random variables \(T_{ij}=\min (Y_{ij}, C_{ij})\) and \(\delta _{ij}=I(Y_{ij} \le C_{ij})\), where \(I(A)=1\) if the event A occurs (0 otherwise). We assume the frailty terms \(Z_1,\ldots ,Z_m\) to be a random sample from the TN\((\nu )\) distribution. We consider the following assumptions:

  i) The pairs \((Y_{i1}, C_{i1}),\ldots ,(Y_{in_i}, C_{in_i})\) are conditionally independent given \(Z_i\), and \(Y_{ij}\) and \(C_{ij}\) are mutually independent for \(j=1,\ldots ,n_i\).

  ii) \(C_{i1}, \ldots , C_{in_i}\) are non-informative about \(Z_i\).

Under this setting, the observed likelihood function is given by

$$\begin{aligned} L({\varvec{\beta }}, H_0, \nu )&=\prod _{i=1}^m\int _0^{+\infty } \prod _{j=1}^{n_i} \left[ z_i h_0(t_{ij})\exp \left( {\textbf {x}}_{ij}^\top {\varvec{\beta }}\right) \right] ^{\delta _{ij}} \exp \left( -z_i H_0(t_{ij})\text {e}^{{\textbf {x}}_{ij}^\top {\varvec{\beta }}}\right) \dfrac{\gamma \phi (\gamma z_i-\nu )}{\Phi \big (\nu \big )}dz_i \nonumber \\&=\left( \frac{\gamma \text {e}^{-\nu ^2/2}}{\sqrt{2\pi }\Phi (\nu )}\right) ^m \exp \left( \sum _{i=1}^m \sum _{j=1}^{n_i} \delta _{ij}\textbf{x}_{ij}^\top {\varvec{\beta }}\right) \prod _{i=1}^m \int _0^\infty z_i^{r_i} \exp \left( -b_{\nu } z_i^2+c_{\varvec{\psi }}^{(i)} z_i\right) dz_i \prod _{j=1}^{n_i} h_0(t_{ij})^{\delta _{ij}}. \end{aligned}$$

where \(r_i=\sum _{j=1}^{n_i} \delta _{ij}\) is the number of failures in the i-th cluster, \(b_\nu =\gamma ^2/2\) and \(c_{\varvec{\psi }}^{(i)}=\gamma \nu -\sum _{j=1}^{n_i} H_0(t_{ij})e^{\textbf{x}_{ij}^\top {\varvec{\beta }}}\). The last integral is related to the modified half-normal (MHN) distribution15 and can be written as

$$\begin{aligned} \int _0^\infty z_i^{r_i} \exp \left( -b_\nu z_i^2+c_{\varvec{\psi }}^{(i)} z_i\right) dz_i=\frac{1}{2}b_\nu ^{-(r_i+1)/2}\Psi \left( \frac{r_i+1}{2},\frac{c_{\varvec{\psi }}^{(i)}}{\sqrt{b_\nu }}\right) , \end{aligned}$$

where

$$\begin{aligned} \Psi \left( \frac{\alpha }{2},x\right) =\sum _{k=0}^\infty \frac{\Gamma (\frac{\alpha +k}{2})}{k!}x^k, \end{aligned}$$

is a specific case of the Fox-Wright function. The supplementary material in15 discusses different ways to compute this term. Therefore,

$$\begin{aligned} L({\varvec{\beta }}, H_0, \nu )=b_\nu ^{-(r+m)/2}\left( \frac{\gamma \text {e}^{-\nu ^2/2}}{2 \sqrt{2\pi }\Phi (\nu )}\right) ^m \exp \left( \sum _{i=1}^m \sum _{j=1}^{n_i} \delta _{ij}\textbf{x}_{ij}^\top {\varvec{\beta }}\right) \prod _{i=1}^m \Psi \left( \frac{r_i+1}{2},\frac{c_{\varvec{\psi }}^{(i)}}{\sqrt{b_\nu }}\right) \prod _{j=1}^{n_i} h_0(t_{ij})^{\delta _{ij}}, \end{aligned}$$

with \(r=\sum _{i=1}^m r_i\) the total number of failures in the sample. In a parametric approach, \(H_0(t)\) or \(h_0(t)\) is specified by a set of parameters, say \({\varvec{\lambda }}\), and then the parameter vector reduces to \(({\varvec{\beta }}, {\varvec{\lambda }}, \nu )\). For instance, for the Weibull (WEI) distribution, we use the parameterization \(H_0(t)=\lambda \, t^\rho\) and \(h_0(t)=\lambda \, \rho \, t^{\rho -1}\), where \(t>0\) and \({\varvec{\lambda }}=(\lambda ,\rho )\in \mathbb {R}_+^2\). From a classical approach, the ML estimator can be obtained by maximizing \(\log L({\varvec{\beta }}, {\varvec{\lambda }}, \nu )\) with respect to \({\varvec{\beta }}, {\varvec{\lambda }}\) and \(\nu\). Owing to the flexibility discussed in the previous section, we also consider the PE model. However, it can also be attractive to discuss a non-parametric approach for the baseline distribution. For this, in the next subsection, we consider an estimation procedure based on the EM algorithm.
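The series defining \(\Psi\) converges for every argument, so the likelihood can be evaluated by direct summation. A Python sketch (with arbitrary illustrative values of \(r_i\), \(b_\nu\) and \(c^{(i)}_{\varvec{\psi }}\)) that also cross-checks the integral identity above by quadrature:

```python
import math

def Psi(u, x, kmax=400):
    """Series Psi(u, x) = sum_{k>=0} Gamma(u + k/2) x^k / k!,
    i.e. the Fox-Wright-type function with first argument u = alpha/2."""
    total = math.gamma(u)                      # k = 0 term
    for k in range(1, kmax):
        if x == 0.0:
            break
        sign = -1.0 if (x < 0.0 and k % 2 == 1) else 1.0
        total += sign * math.exp(math.lgamma(u + 0.5 * k)
                                 + k * math.log(abs(x)) - math.lgamma(k + 1.0))
    return total

r_i, b, c = 2, 0.8, -0.3                       # arbitrary illustrative values
n, upper = 40000, 25.0
h = upper / n
integral = h * sum((i * h) ** r_i * math.exp(-b * (i * h) ** 2 + c * (i * h))
                   for i in range(1, n + 1))   # quadrature of the MHN integral
closed = 0.5 * b ** (-(r_i + 1) / 2) * Psi((r_i + 1) / 2, c / math.sqrt(b))
print(integral, closed)
```

The quadrature value and the closed form \(\tfrac{1}{2}b^{-(r_i+1)/2}\Psi \big (\tfrac{r_i+1}{2},c/\sqrt{b}\big )\) agree; log-gamma arithmetic keeps the series stable for moderately large arguments.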

EM algorithm

Given the unobservable nature of the frailty terms, the EM algorithm is a natural tool in this context. Let \({\varvec{t}}_i^\top =(t_{i1},...,t_{in_i})\), \({\varvec{\delta }}_i^\top =(\delta _{i1},...,\delta _{in_i})\) and \({\varvec{x}}_i^\top =({\varvec{x}}_{i1},...,{\varvec{x}}_{in_i})\) denote the observed times, failure indicators and covariates related to the \(n_i\) observations in the i-th cluster, \(i=1,\ldots ,m\). For our particular problem, \(\mathcal {D}_c= ({\varvec{t}}^\top ,{\varvec{\delta }}^\top ,{\varvec{X}}^\top ,{\varvec{Z}}^\top )\) represents the complete data, where \({\varvec{t}}^\top =({\varvec{t}}_1^\top ,...,{\varvec{t}}_m^\top )\), \({\varvec{\delta }}^\top =({\varvec{\delta }}_1^\top ,...,{\varvec{\delta }}_m^\top )\), \({\varvec{X}}^\top =({\varvec{x}}_1^\top ,...,{\varvec{x}}_m^\top )\) and \({\varvec{Z}}^\top =(z_1,...,z_m)\); here \(\mathcal {D}_{o}=({\varvec{t}}^\top ,{\varvec{\delta }}^\top ,{\varvec{X}}^\top )\) is the observed data and \({\varvec{Z}}^\top\) represents the vector of latent variables. Note that the complete likelihood function can be written as \(L({\varvec{\beta }},H_0,\nu ;\mathcal {D}_{c})=L_1({\varvec{\beta }}, H_0;\mathcal {D}_{c})\times L_2(\nu ; {\varvec{Z}})\), where \(L_1({\varvec{\beta }}, H_0;\mathcal {D}_{c})\) \(=\prod _{i=1}^m \prod _{j=1}^{n_i} \left[ z_i h_0(t_{ij})\exp ({\textbf {x}}_{ij}^\top {\varvec{\beta }})\right] ^{\delta _{ij}}\)\(\exp (-z_i H_0(t_{ij})\text {e}^{{\textbf {x}}_{ij}^\top {\varvec{\beta }}})\) and \(L_2(\nu ; {\varvec{Z}})=\prod _{i=1}^m f(z_i;\nu )\).

The complete log-likelihood function is given by \(\ell _c({\varvec{\beta }}, H_0,\nu ;\mathcal {D}_{c})=\ell _{1c}({\varvec{\beta }}, H_0;\mathcal {D}_{c})+\ell _{2c}(\nu ; {\varvec{Z}})\), where except for a constant that does not depend on \({\varvec{\beta }}\), \(H_0\) or \(\nu\), such functions are given by

$$\begin{aligned} \ell _{1c}({\varvec{\beta }},H_0;\mathcal {D}_c) =&\sum _{i=1}^{m} \sum _{j=1}^{n_i}\bigg \{ \delta _{ij}\big [\log h_0(t_{ij})+{\varvec{x}}_{ij}^\top {\varvec{\beta }}\big ] -z_i H_0(t_{ij}) \exp ({\varvec{x}}_{ij}^\top {\varvec{\beta }})\bigg \}, \quad \text{ and }\\ \ell _{2c}(\nu ; {\varvec{Z}}) =&\sum _{i=1}^{m} \Big \{\log \gamma - \log \Phi (\nu ) - \frac{1}{2}\log (2\pi ) - \frac{1}{2}\big (z_i^2\gamma ^2 - 2\gamma \nu z_i+\nu ^2\big ) \Big \}. \end{aligned}$$

Let \({\varvec{\psi }}^{(k)} = \left( {\varvec{\beta }}^{(k)},H_0^{(k)}, \nu ^{(k)}\right)\) be the estimated vector of \({\varvec{\psi }}= ({\varvec{\beta }},H_0, \nu )\) at the k-th iteration and

$$\begin{aligned} Q({\varvec{\psi }}\,|\, {\varvec{\psi }}^{(k)})=\mathbb {E}\left( \ell _c({\varvec{\beta }}, H_0,\nu ;\mathcal {D}_{c})\mid \mathcal {D}_o, {\varvec{\psi }}={\varvec{\psi }}^{(k)}\right) , \end{aligned}$$

i.e., the conditional expectation of \(\ell _c({\varvec{\beta }}, H_0,\nu ;\mathcal {D}_{c})\) given the observed data and \({\varvec{\psi }}^{(k)}\). Note that \(Q({\varvec{\psi }}\mid {\varvec{\psi }}^{(k)})=Q_1(({\varvec{\beta }},H_0)\mid {\varvec{\psi }}^{(k)})+Q_2(\nu \mid {\varvec{\psi }}^{(k)})\), where

$$\begin{aligned} Q_1(({\varvec{\beta }},H_0)\mid {\varvec{\psi }}^{(k)})=&\sum _{i=1}^{m} \sum _{j=1}^{n_i}\bigg \{ \delta _{ij}\big [\log h_0(t_{ij})+{\varvec{x}}_{ij}^\top {\varvec{\beta }}\big ] -\widehat{z}_i^{(k)} H_0(t_{ij}) \exp ({\varvec{x}}_{ij}^\top {\varvec{\beta }})\bigg \}, \quad \text{ and } \end{aligned}$$
(10)
$$\begin{aligned} Q_2(\nu \mid {\varvec{\psi }}^{(k)})=&\sum _{i=1}^{m} \Big \{\log \gamma - \log \Phi (\nu ) - \frac{1}{2}\log (2\pi ) - \frac{1}{2}\big (\widehat{z_i^2}^{(k)}\gamma ^2 - 2\gamma \nu \widehat{z}_i^{(k)}+\nu ^2\big ) \Big \}, \end{aligned}$$
(11)

where \(\widehat{z}_i^{(k)}=\mathbb {E}\big [Z_i\mid \mathcal {D}_o,{\varvec{\psi }}={\varvec{\psi }}^{(k)}\big ]\) and \(\widehat{z_i^2}^{(k)} =\mathbb {E}\big [Z_i^2\mid \mathcal {D}_o,{\varvec{\psi }}={\varvec{\psi }}^{(k)}\big ]\). It is possible to show that

$$\begin{aligned} Z_i\mid {\varvec{t}}_i^\top ,{\varvec{\delta }}_i^\top \sim \text{ MHN }\left( a_i=1+r_i,b_\nu =\frac{\gamma ^2}{2}, c_{\varvec{\psi }}^{(i)}=\gamma \nu - \sum _{j=1}^{n_i} H_0(t_{ij}) \exp ({\varvec{x}}_{ij}^\top {\varvec{\beta }})\right) . \end{aligned}$$
(12)

Refer to the supplementary material file for a proof of this fact. Using this notation, and applying Lemma 2 from Sun et al.15, it follows immediately that

$$\begin{aligned} {\widehat{z}}_i^{(k)}&= \mathbb {E}\left( Z_i\mid \mathcal {D}_o,{\varvec{\psi }}={\varvec{\psi }}^{(k)}\right) = \frac{\Psi \left( \frac{r_i+2}{2};\frac{c_{\psi }^{(i)}}{\sqrt{b_\nu }}\right) }{\sqrt{b_\nu }\,\Psi \left( \frac{r_i+1}{2};\frac{c_{\psi }^{(i)}}{\sqrt{b_\nu }}\right) }, \quad \text{ and } \\ \widehat{z_i^2}^{(k)}&= \mathbb {E}\left( Z_i^{2}\mid \mathcal {D}_o,{\varvec{\psi }}={\varvec{\psi }}^{(k)}\right) = \frac{\Psi \left( \frac{r_i+3}{2};\frac{c_{\psi }^{(i)}}{\sqrt{b_\nu }}\right) }{b_\nu \,\Psi \left( \frac{r_i+1}{2};\frac{c_{\psi }^{(i)}}{\sqrt{b_\nu }}\right) }. \end{aligned}$$
(13)
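These conditional expectations can be verified numerically: for a density proportional to \(z^{\alpha -1}\exp (-bz^2+cz)\), applying the integral identity of the previous section to numerator and denominator gives \(\mathbb {E}(Z^k)=b^{-k/2}\,\Psi \big (\frac{\alpha +k}{2};\frac{c}{\sqrt{b}}\big )/\Psi \big (\frac{\alpha }{2};\frac{c}{\sqrt{b}}\big )\). A Python sketch with \(\alpha =r_i+1\) as in (12) and arbitrary illustrative values:

```python
import math

def Psi(u, x, kmax=400):
    """Psi(u, x) = sum_{k>=0} Gamma(u + k/2) x^k / k!."""
    total = math.gamma(u)
    for k in range(1, kmax):
        if x == 0.0:
            break
        sign = -1.0 if (x < 0.0 and k % 2 == 1) else 1.0
        total += sign * math.exp(math.lgamma(u + 0.5 * k)
                                 + k * math.log(abs(x)) - math.lgamma(k + 1.0))
    return total

def mhn_moment(k, alpha, b, c):
    """k-th moment of the MHN(alpha, b, c) law, whose density is
    proportional to z^(alpha - 1) exp(-b z^2 + c z) on (0, inf)."""
    x = c / math.sqrt(b)
    return b ** (-0.5 * k) * Psi(0.5 * (alpha + k), x) / Psi(0.5 * alpha, x)

r_i, b, c = 3, 0.9, -0.4                       # arbitrary illustrative values
alpha = r_i + 1                                # as in (12)
n, upper = 40000, 25.0
h = upper / n
num = [h * sum((i * h) ** (alpha - 1 + k) * math.exp(-b * (i * h) ** 2 + c * (i * h))
               for i in range(1, n + 1)) for k in range(3)]   # raw integrals
print(num[1] / num[0], mhn_moment(1, alpha, b, c))
print(num[2] / num[0], mhn_moment(2, alpha, b, c))
```

The quadrature-based moments and the \(\Psi\)-ratio expressions agree.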

On the other hand, it is possible to construct a discrete version of the cumulative baseline hazard function by considering \(H_0^D(t)=\sum _{\ell : t_{(\ell )}\le t} h_0(t_{(\ell )})\), where \(t_{(1)},\ldots ,t_{(q)}\) are the ordered distinct failure times and q is the number of distinct observed failure times. Replacing \(H_0(\cdot )\) and \(h_0(\cdot )\) in Equation (10), we obtain

$$\begin{aligned} Q_1(({\varvec{\beta }},H_0)\mid {\varvec{\psi }}^{(k)})=\sum _{\ell =1}^q d_{(\ell )} \log \left[ h_0(t_{(\ell )})\right] +\sum _{i=1}^m \sum _{j=1}^{n_i} \delta _{ij} {\textbf {x}}_{ij}^\top {\varvec{\beta }}-\sum _{\ell =1}^q h_0(t_{(\ell )})\sum _{i,j \in R(t_{(\ell )})} \widehat{z}_i^{(k)} \text {e}^{{\textbf {x}}_{ij}^\top {\varvec{\beta }}}. \end{aligned}$$

Here, \(d_{(\ell )}\) denotes the number of failures at \(t_{(\ell )}\) and \(R(t_{(\ell )})\) the set of individuals at risk at \(t_{(\ell )}\). Replacing the solution for \(h_0(t_{(\ell )})\), i.e., \(\widehat{h}_0(t_{(\ell )})=d_{(\ell )}/\left[ \sum _{i,j \in R(t_{(\ell )})}\exp \left( {\textbf {x}}_{ij}^\top {\varvec{\beta }}+\log \widehat{z}^{(k)}_i\right) \right]\), the expression for \(Q_1\) reduces, up to a constant that does not depend on \({\varvec{\beta }}\), to

$$\begin{aligned} Q_1({\varvec{\beta }}\mid {\varvec{\psi }}^{(k)})&=-\sum _{\ell =1}^q d_{(\ell )}\log \left( \sum _{i,j \in R(t_{(\ell )})} \exp \left( {\textbf {x}}_{ij}^\top {\varvec{\beta }}+\log \widehat{z}_i^{(k)}\right) \right) +\sum _{i=1}^m\sum _{j=1}^{n_i} \delta _{ij} {\textbf {x}}_{ij}^\top {\varvec{\beta }}. \end{aligned}$$

Note that \(Q_1(\cdot )\) has the same form as the partial log-likelihood function of the Cox model, except for the offset term \(\log \widehat{z}_i^{(k)}\). Hence, to update \({\varvec{\beta }}\) in the M-step we can use standard Cox regression routines. Finally, the non-parametric estimator for \(H_0(\cdot )\) at the k-th step of the algorithm is given by

$$\begin{aligned} \widehat{H}^{(k)}_0(t)=\sum _{\ell : t_{(\ell )}\le t} \frac{d_{(\ell )}}{\sum _{i,j \in R(t_{(\ell )})}\exp \left( {\textbf {x}}_{ij}^\top \varvec{\beta }^{(k)}+\log \widehat{z}^{(k)}_i\right) }, \quad t>0. \end{aligned}$$
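A minimal sketch of this estimator in Python (the toy data, frailty estimates \(\widehat{z}_i\) and linear predictors \(\exp ({\textbf {x}}_{ij}^\top {\varvec{\beta }})\) below are hypothetical):

```python
from collections import defaultdict

def breslow_H0(times, events, eta, zhat, cluster):
    """Nonparametric cumulative baseline hazard with frailty offsets:
    a jump d_(l) / sum_{(i,j) in R(t_(l))} zhat_i * exp(x_ij' beta)
    at each distinct failure time t_(l)."""
    d = defaultdict(int)                       # failures per distinct time
    for t, e in zip(times, events):
        if e == 1:
            d[t] += 1
    jumps = []
    for tl in sorted(d):                       # risk set: subjects with t >= t_(l)
        risk = sum(zhat[c] * w for t, w, c in zip(times, eta, cluster) if t >= tl)
        jumps.append((tl, d[tl] / risk))
    return lambda t: sum(j for tl, j in jumps if tl <= t)

# hypothetical toy data: 2 clusters of size 2
times  = [1.0, 2.0, 2.0, 3.5]
events = [1, 1, 0, 1]
eta    = [1.2, 0.8, 1.0, 1.5]                  # exp(x_ij' beta), assumed known here
zhat   = {0: 0.9, 1: 1.1}                      # current E-step frailty estimates
clus   = [0, 0, 1, 1]
H0 = breslow_H0(times, events, eta, zhat, clus)
print(H0(2.0), H0(4.0))
```

The resulting step function is non-decreasing and jumps only at observed failure times, as required.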

In summary, the EM algorithm is given by the following steps.

  • E-step: For \(i=1,...,m\), compute \(\widehat{z}_i^{(k+1)}\) and \(\widehat{z_i^2}^{(k+1)}\) using Equation (13), with \({\varvec{\beta }}^{(k)}\), \(H_0^{(k)}(\cdot )\) and \(\nu ^{(k)}\) the estimated parameters at the k-th iteration.

  • M1-step: Update \({\varvec{\beta }}^{(k+1)}\) and \(H_0^{(k+1)}(\cdot )\) by fitting a Cox regression model with offset \(\log \widehat{z}_i^{(k+1)}\) for the non-parametric case, or by maximizing \(Q_1(({\varvec{\beta }},H_0)\mid {\varvec{\psi }}^{(k)})\) for the parametric (WEI) and semi-parametric (PE) cases.

  • M2-step: Update \(\nu ^{(k+1)}\) by maximizing \(Q_2(\nu \mid {\varvec{\psi }}^{(k)})\) in relation to \(\nu\).

Maximization with respect to \(H_0\) refers to optimizing the parameters of \(H_0(\cdot )\): \(\rho\) and \(\lambda\) for the Weibull baseline distribution, or the vector \({\varvec{\lambda }}\) for the piecewise exponential case. This unified formulation keeps the algorithm general across baseline specifications. The algorithm iterates until a convergence criterion is satisfied; for instance, we consider \(||\widehat{{\varvec{\psi }}}^{(k-1)}-\widehat{{\varvec{\psi }}}^{(k)}||<\epsilon\), where \(\epsilon\) is a predefined value and \(||\cdot ||\) denotes the Euclidean norm. Initial values are derived from the ordinary Cox model, taking \(\nu ^{(0)}=0.5\). On the other hand, following the suggestion of20, we estimate the standard errors of \(\widehat{\varvec{\beta }}\) and \(\widehat{\nu }\) via a profile log-likelihood function: \(\ell ({\varvec{\beta }},\nu )=\log L({\varvec{\beta }}, H_0, \nu )\), replacing \(H_0\) with its estimate \(\widehat{H}_0\). The variance-covariance matrix of \((\widehat{\varvec{\beta }},\widehat{\nu })\) is then estimated by the inverse of

$$\begin{aligned} I(\widehat{\varvec{\beta }}, \widehat{\nu })=-\frac{\partial ^2 \ell ({\varvec{\beta }},\nu )}{\partial ({\varvec{\beta }},\nu ) \partial ^\top ({\varvec{\beta }},\nu )}\Bigg |_{{\varvec{\beta }}=\widehat{\varvec{\beta }}, \nu =\widehat{\nu }}. \end{aligned}$$

Finally, more important than \(\widehat{\nu }\) is \(\widehat{\theta }:= \widehat{\gamma }^{-2}-\dfrac{\phi (\widehat{\nu })}{\Phi (\widehat{\nu })}\widehat{\gamma }^{-1}\) (the estimated frailty variance), because it allows us to compare this term with the variance of other models parameterized directly in terms of the frailty variance. By the delta method, the variance of \(\widehat{\theta }\) is estimated as:

$$\begin{aligned} \widehat{Var (\widehat{\theta })} = \widehat{Var (\widehat{\nu })} \left[ \frac{\phi (\widehat{\nu })}{\Phi (\widehat{\nu })}\widehat{\gamma }^{-2} \left( 1- \frac{\phi (\widehat{\nu })}{\Phi (\widehat{\nu })}\widehat{\gamma } \right) - 2\widehat{\gamma }^{-3} \left( 1- \frac{\phi (\widehat{\nu })}{\Phi (\widehat{\nu })}\widehat{\gamma } \right) + \frac{\phi (\widehat{\nu })}{\Phi (\widehat{\nu })} \right] ^2. \end{aligned}$$
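The expression above is the delta method, \(\widehat{Var}(\widehat{\theta }) \approx \widehat{Var}(\widehat{\nu })\,(d\theta /d\nu )^2\), with the bracketed factor playing the role of \(d\theta /d\nu\). The following Python sketch (ours, not part of the paper's software) checks the bracketed derivative numerically; it assumes the mean-one constraint \(\gamma = \nu + \phi (\nu )/\Phi (\nu )\), which is consistent with the inverse-transform sampling step used later in the simulation study:

```python
from statistics import NormalDist

N = NormalDist()  # standard normal pdf/cdf

def mills(nu):
    """phi(nu) / Phi(nu) for the standard normal."""
    return N.pdf(nu) / N.cdf(nu)

def theta_of(nu):
    """Frailty variance theta = gamma^-2 - (phi/Phi) gamma^-1, under the
    assumed mean-one constraint gamma = nu + phi(nu)/Phi(nu)."""
    g = nu + mills(nu)
    return g ** -2 - mills(nu) / g

def dtheta_dnu(nu):
    """Closed form of d theta / d nu: the bracketed factor in the
    delta-method variance above."""
    g, r = nu + mills(nu), mills(nu)
    return r * g ** -2 * (1 - r * g) - 2 * g ** -3 * (1 - r * g) + r
```

A central finite difference of `theta_of` agrees with `dtheta_dnu` to numerical precision, confirming that the bracket is indeed the derivative of \(\theta\) with respect to \(\nu\).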

Remark 1

Note that the result in Equation (11) is also of interest if a Bayesian approach is applied to the model, because it remains valid when conditioning on the parameters. This facilitates, among other things, the application of MCMC-type methods to simulate from the corresponding conditional distribution of the frailties.

Computational aspects

The extrafrail21 package of R22 includes the computational implementation of the TN frailty model, with the Weibull, exponential and PE distributions, as well as the non-parametric specification, available as baseline models. For instance, the Weibull case can be fitted with

frailty.fit(formula, data, dist = "weibull", dist.frail = "TN")

where, as is usual in survival analysis with random effects in R, the formula is defined as

Surv(time, event) ~ covariates + cluster(id)

A similar syntax can be used to fit the other cases, specifying dist = "exponential", dist = "pe" or dist = "np" in the call above. We highlight that the function performs the estimation even when the clusters have different sizes (i.e., \(n_1, n_2,\ldots ,n_m\) are not necessarily equal).

Simulation study

In this Section, we present a simulation study to assess the performance of the maximum likelihood estimators obtained via the EM algorithm under different censoring percentages.

Parameter recovery

We consider the following three different scenarios:

  • Scenario 1: 19 clusters with 2 observations each and 19 clusters with 4 observations each, totalling 114 observations. (\(n_1=\ldots =n_{19}=2, n_{20}=\ldots =n_{38}=4\) and \(m=38\)).

  • Scenario 2: 38 clusters with 2 observations each and 38 clusters with 4 observations each, totalling 228 observations. (\(n_1=\ldots =n_{38}=2, n_{39}=\ldots =n_{76}=4\) and \(m=76\)).

  • Scenario 3: 19 clusters with 4 observations each and 19 clusters with 8 observations each, totalling 228 observations. (\(n_1=\ldots =n_{19}=4, n_{20}=\ldots =n_{38}=8\) and \(m=38\)).

The idea is to assess whether, for a fixed total amount of data, it is preferable to increase the number of clusters or the number of observations per cluster. We consider as baseline model the PE distribution with \(L=3\) and time partition \({\varvec{a}} = (7/365, 56/365)\). Similar to the real data application, we also consider one dichotomous covariate x, drawn from the Bernoulli distribution with success probability 20/76. We consider three values for \(\theta\), the variance of the frailty terms: 0.20, 0.50 and 0.75. The percentage of censoring was fixed at 10%, 25% and 50%. In all cases, the regression coefficient was fixed at \(\beta =1.8\) and the parameters of the PE distribution were fixed at \({\varvec{\lambda }}=(\lambda _1=0.3,\lambda _2=2.6,\lambda _3=1.9)\). To simulate values from the model, we use the following steps:

  i) Draw \(z_i\sim \text{ TN }(\nu )\), \(i=1,\ldots ,m\), using the inverse transform method, i.e., set \(z_i=\Big (\Phi ^{-1} \big (u_i \Phi (\nu ) + \Phi (-\nu ) \big ) + \nu \Big )\gamma ^{-1}\), where \(u_i\sim \text{ U }(0,1)\) (the standard uniform distribution).

  ii) Draw the failure times from the conditional distribution \(y_{ij}\mid z_i\sim \text{ PE }({\varvec{\lambda }} z_i \exp (\textbf{x}_{ij}^{\top }{\varvec{\beta }}), {\varvec{a}})\).

  iii) Define the censoring times, \(c_{ij}\), as the \(100\times (1-q)\)-th quantile of the same conditional \(\text{ PE }({\varvec{\lambda }} z_i \exp (\textbf{x}_{ij}^{\top }{\varvec{\beta }}), {\varvec{a}})\) distribution, where \(q\) denotes the target censoring proportion.

  iv) Define the observed failure times and failure indicators as \(t_{ij}=\min (y_{ij},c_{ij})\) and \(\delta _{ij}=I(y_{ij}\le c_{ij})\), respectively, for \(i=1,\ldots ,m\), \(j=1,\ldots ,n_i\).
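Steps i)-iv) can be sketched in Python as follows. This is an illustrative sketch of ours (function names and the clipped details are our own choices); it uses the mean-one constraint \(\gamma = \nu + \phi (\nu )/\Phi (\nu )\), the Bernoulli(20/76) covariate, and the quantile-based censoring scheme described above:

```python
import math, random
from statistics import NormalDist

N = NormalDist()  # standard normal cdf / inv_cdf

def rtn_frailty(nu, rng):
    """Step i): inverse-transform draw of the mean-one TN frailty."""
    gamma = nu + N.pdf(nu) / N.cdf(nu)   # mean-one constraint (assumption)
    u = rng.random()
    return (N.inv_cdf(u * N.cdf(nu) + N.cdf(-nu)) + nu) / gamma

def pe_quantile(p, lam, a, mult):
    """p-quantile of a piecewise exponential with rates lam*mult and cut
    points a; used for both the failure (ii) and censoring (iii) times."""
    cuts = [0.0, *a, math.inf]
    target, H = -math.log(1.0 - p), 0.0  # target cumulative hazard
    for k, rate in enumerate(r * mult for r in lam):
        width = cuts[k + 1] - cuts[k]
        if H + rate * width >= target:
            return cuts[k] + (target - H) / rate
        H += rate * width

def simulate(sizes, lam, a, beta, nu, q, seed=1):
    """Steps i)-iv): clustered survival data with TN frailty; q is the
    target censoring proportion."""
    rng, data = random.Random(seed), []
    for i, ni in enumerate(sizes):
        z = rtn_frailty(nu, rng)                          # shared within cluster i
        for _ in range(ni):
            x = rng.random() < 20 / 76                    # Bernoulli covariate
            mult = z * math.exp(beta * x)
            y = pe_quantile(rng.random(), lam, a, mult)   # step ii): failure time
            c = pe_quantile(1.0 - q, lam, a, mult)        # step iii): censoring time
            data.append((i, min(y, c), y <= c, int(x)))   # step iv)
    return data
```

Because the censoring time is the \((1-q)\)-quantile of the same conditional distribution as the failure time, each observation is censored with probability exactly \(q\), so the realized censoring fraction concentrates around the target.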

For each scenario and each combination of censoring percentage and \(\theta\), we drew 1,000 samples and computed the ML estimates. For each parameter, Tables 1 and 2 summarize the average bias (bias), the estimated root mean squared error (RMSE), the mean of the estimated standard errors (SE) and the coverage probabilities (CP) of the asymptotic 95% confidence intervals.
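The reported summaries are the usual Monte Carlo metrics; a minimal sketch, assuming per-replicate estimates and standard errors are available (names are ours):

```python
import math

def mc_summary(estimates, ses, true_value, z=1.96):
    """Bias, RMSE, mean SE and nominal 95% coverage probability computed
    from Monte Carlo replicates of an estimator and its standard error."""
    n = len(estimates)
    bias = sum(e - true_value for e in estimates) / n
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    mean_se = sum(ses) / n
    cp = sum(1 for e, s in zip(estimates, ses)
             if e - z * s <= true_value <= e + z * s) / n
    return bias, rmse, mean_se, cp
```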

Table 1 Estimated bias, RMSE, SE and approximate 95% coverage probabilities for the TN frailty model with baseline distribution PE under different scenarios (censoring 10% and 25%).
Table 2 Estimated bias, RMSE, SE and approximate 95% coverage probabilities for the TN frailty model with baseline distribution PE under different scenarios (censoring 50%).

An increase in the sample size improves the precision and accuracy of the estimates. In particular, scenarios 2 and 3, which have larger sample sizes, exhibit better performance than scenario 1. In general, an increase in heterogeneity (\(\theta\)) and in the censoring percentage tends to raise the bias, standard error, and RMSE, while reducing coverage probability (CP). However, the behavior of the estimator for \(\theta\) improves under higher censoring, showing reduced bias and increased coverage, possibly due to a better identification of the random effect in the presence of censored events. The most affected estimator is \(\lambda _3\), since censored information tends to concentrate within its interval. When comparing Scenarios 2 and 3, the former yields better results. This suggests that for a fixed total sample size, increasing the number of clusters is preferable to increasing the number of observations per cluster. This leads to greater diversity in latent effects, which enhances the estimation of frailty terms.

Applications with real data sets

In this Section, we present two applications to illustrate the performance of the TN frailty model in comparison with traditional models. The first application is related to patients with Chronic Kidney Disease (CKD), while the second application is related to patients with fibrosarcoma.

Kidney data set

CKD is the slow and progressive loss of kidney function over time. The main job of the kidneys is to remove waste and excess water from the body. The disease may remain asymptomatic until the kidneys have almost stopped working, which is why it is usually diagnosed in its final stages. The final stage of CKD is called End-Stage Renal Disease (ESRD). At this stage, the kidneys can no longer sufficiently remove waste and excess fluid from the body, requiring the patient to undergo dialysis (a life-sustaining treatment) or a kidney transplant (US National Library of Medicine). Dialysis has two main modalities: hemodialysis and peritoneal dialysis. Hemodialysis consists of extracting blood from the body and directing it to a machine that eliminates waste and excess fluid; after filtration, the blood is reintroduced into the bloodstream. Peritoneal dialysis, for its part, is a simpler process and can be done on an outpatient basis. A liquid solution is inserted into the peritoneal cavity through a catheter placed in the abdomen; the solution absorbs waste and excess fluid and is later removed through the same catheter.

CKD represents one of the most important non-communicable diseases worldwide23. For many patients, dialysis is the focal point around which their lives revolve, not only because of the time spent travelling to and from the sessions in specialized centres and the time dedicated to the dialysis treatment itself, but also due to the accompanying diet, fluid restrictions and medication load24. Thus, one of the most advantageous options in terms of quality of life is treatment by ambulatory peritoneal dialysis (with a portable machine). The peritoneal catheter is a foreign body that facilitates the appearance of infections and serves as a reservoir for bacteria. Infection can appear in the exit orifice, the tunnel (tunnelled path of the catheter) or the peritoneum (peritonitis). Peritonitis continues to be an important complication of peritoneal dialysis, as it contributes to technique failure, hospitalization, and even death25.

We focus on a real dataset named kidney, available in the R22 package frailtyHL26. For further details, see page 11 of its documentation: https://cran.r-project.org/web/packages/frailtyHL/frailtyHL.pdf. The study collected bivariate times, consisting of the times to first and second recurrence of infection at the catheter insertion point in patients with kidney problems using a portable dialysis machine. The catheter is removed if infection occurs; it can also be removed for other reasons, in which case the observation is censored. Available covariates are sex and type of kidney disease: glomerulonephritis (GN), acute nephritis (AN), polycystic kidney disease (PKD) and others. Previous analysis suggests that only sex is significant in this context27. The study has 38 patients, 10 men and 28 women, each with 2 recurrence times, for a total of 76 observations. Table 3 summarizes these times, and Figure 3 presents the Kaplan-Meier (KM) estimator stratified by recurrence time and by sex.

Fig. 3
figure 3

KM estimator for the kidney dataset considering recurrence time (left panel) and sex (right panel).

Table 3 Summary of the first and second time of recurrence (TR\(_1\) and TR\(_2\)).

For comparison purposes, we also consider the GA, WL and IG frailty models with the WEI and PE baseline distributions. Figure 4 shows the cumulative hazard function for the kidney data. The proposed partition for the PE model was set at 1 and 8 weeks (indicated by the vertical segments in the graph). A change in the slope behavior is evident, as highlighted in the zoomed-in view on the right. This supports the conclusion that the PE model provides a better fit than a non-segmented model for this dataset. Practically, this suggests that the risk of infection at the catheter insertion site is highest during the first week post-insertion and gradually decreases over time. After two months, the risk stabilizes and remains relatively low. This understanding can help healthcare professionals in identifying critical time periods for infection prevention and monitoring patients accordingly.

Table 4 shows the Akaike information criterion (AIC)28 and the Bayesian information criterion (BIC)29 for these models. Both criteria suggest that the PE baseline is more appropriate for these data than the WEI baseline, regardless of the frailty distribution used. Moreover, the TN frailty model provides the best results. Table 5 presents the estimates for all the models considering the PE baseline distribution, including the ordinary PE model (i.e., without frailty).
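For reference, both criteria penalize the maximized log-likelihood by model complexity; a minimal sketch of their computation, with k the number of parameters and n the number of observations:

```python
import math

def aic(loglik, k):
    """Akaike information criterion: smaller is better."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian information criterion: the log(n) penalty per parameter
    exceeds AIC's constant 2 once n >= 8, favoring smaller models."""
    return -2.0 * loglik + k * math.log(n)
```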

Fig. 4
figure 4

Cumulative hazard function for the kidney dataset considering the full time axis (left panel) and a zoom on the first 100 days (right panel).

Table 4 Maximized log-likelihood function (log-Like), AIC and BIC of the TN, GA, WL and IG models for kidney dataset.
Table 5 Estimates, standard errors (in parenthesis) and Kendall’s \(\tau\) for the TN, GA, WL and IG frailty model with baseline PE and the ordinary PE model for kidney dataset.

Note that ignoring the dependence within clusters leads to underestimation of the effect of sex. On the other hand, the estimated Kendall's \(\tau\) is around 0.13 for all the models. However, the frailty variance estimated by the GA, WL and IG models is at least 50% larger than that of the TN frailty model. In practical terms, this means that the TN frailty model estimates a greater effect of sex on the recurrence of infection at the catheter insertion point, and less variability between measurements from the same individual.

Fibrosarcoma data set

Fibrosarcoma is a rare malignant tumor that originates from fibroblasts, the connective tissue cells responsible for the production of collagen and extracellular matrix. This neoplasm exhibits infiltrative growth, a high propensity for local recurrence, and metastatic potential. It can develop in any part of the body, although it is most commonly found in the extremities, trunk, and retroperitoneal region. Clinically, it typically presents as a progressively enlarging mass, initially painless. Diagnosis is based on histopathological findings, where tumor cells are arranged in a characteristic herringbone pattern, and is often supported by immunohistochemical studies to differentiate it from other soft tissue tumors30. The treatment of choice is surgical excision with wide margins, and adjuvant radiotherapy is frequently considered; chemotherapy is generally reserved for advanced or metastatic cases31.

This dataset includes information from 251 patients diagnosed with fibrosarcoma SOE (from the Portuguese “sem outra especificação”, meaning “not otherwise specified”), with diagnosis dates ranging from 2000 to 2022 and follow-up data extending through December 2022. The dataset was obtained from the Oncocenter Foundation of São Paulo, Brazil (Fundação Oncocentro de São Paulo, FOSP), which oversees the Hospital Cancer Registry of the State of São Paulo (http://fosp.saude.sp.gov.br). This neoplasm is coded as 8810/3 Fibrosarcoma, NOS (not otherwise specified), according to the International Classification of Diseases for Oncology (ICD-O32), which is used in cancer registries to classify tumors that lack further histological subtyping at the time of diagnosis.

Cancer-specific death was defined as the event of interest, and time-to-event was measured from the date of diagnosis to the patient’s death (in years: mean\(=5.72\), standard deviation (SD)\(=5.78\), median\(=3.12\), range\(= 0.025-21.86\)). During the follow-up period, a total of 103 events (39%) occurred. As a covariate, we use the type of treatment, with eight possible labels: A - surgery (84 patients, 32.2%), B - Radiotherapy (14 patients, 5.4%), C - Chemotherapy (18 patients, 6.9%), D - Surgery \(+\) Radiotherapy (42 patients, 16.1%), E - Surgery \(+\) Chemotherapy (34 patients, 13.0%), F - Radiotherapy \(+\) Chemotherapy (11 patients, 4.2%), G - Surgery \(+\) Radiotherapy \(+\) Chemotherapy (29 patients, 11.1%) and I - other combination (29 patients, 11.1%). Figure 5 presents the KM estimator for the full cohort and stratified by type of treatment. The clusters considered in this analysis correspond to the 26 clinical areas responsible for treating the patients, which are summarized in Table 6. Note that these clusters are highly unbalanced in terms of sample size. In this analysis, we consider the TN, GA, WL, and IG frailty models, using the Weibull distribution for the baseline hazard. The results are summarized in Table 7. Notably, the TN frailty model provides the lowest AIC among the models considered. Once again, the Kendall’s \(\tau\) values provided by the models are similar. However, the estimated intra-cluster variance (0.226) is lower for the TN model than for the others. Finally, Figure 6 shows the survival functions (SF) for patients treated in neurology and clinical oncology centers, as well as the marginal SF (i.e., the SF for a patient randomly selected from the entire cohort).

Table 6 Number of records per medical specialty (cluster size in parenthesis).
Fig. 5
figure 5

Kaplan-Meier estimator for the fibrosarcoma data with 95% confidence interval (left panel) and stratified by treatment received (right panel).

Table 7 Parameter estimates, standard errors (in parentheses), and Kendall’s \(\tau\) for TN, GA, WL, and IG frailty models assuming a Weibull baseline hazard.
Fig. 6
figure 6

SF for patients treated with chemotherapy (left panel) and surgery + radiotherapy (right panel), for the neurology and clinical oncology specialties, together with the marginal SF.

Concluding remarks

A new survival model with TN frailty was proposed and studied in detail. The model accommodates a rich dependence structure, since it handles both univariate and multivariate data and adapts to clusters of different sizes. For the baseline risk, the Weibull and PE distributions were adopted, as well as a non-parametric approach. For a fixed frailty variance, the TN frailty model provides a greater Kendall's \(\tau\) than the gamma and IG frailty models. We obtained a recursive closed-form expression for the derivatives of the Laplace transform of the TN model. Furthermore, the conditional distributions of the frailty among survivors and of the frailty of individuals dying at time t were determined explicitly. The simulation studies, based on the EM algorithm, show that having more complete (uncensored) information improves the accuracy and precision of the estimates. Scenarios 2 and 3 did not differ greatly in bias, which suggests that the bias depends on the total sample size rather than on the data configuration. Concerning the RMSE and SE, however, Scenario 2 improved on Scenario 3, suggesting that, for a fixed total sample size, a larger number of smaller clusters yields better precision than fewer, larger clusters. We fitted the proposed frailty model to a real dataset on the times to first and second recurrence of infection at the catheter insertion point in patients with kidney problems using a portable dialysis machine, showing the potential of the new frailty model. This application demonstrates the practical relevance of the new regression model. In particular, the GA, WL and IG models overestimate the frailty variance relative to the TN frailty model.