If you don't remember your password, you can reset it by entering your email address and clicking the Reset Password button. You will then receive an email that contains a secure link for resetting your password
If the address matches a valid account an email will be sent to __email__ with instructions for resetting your password
Recurrent event data analysis is most commonly used in biomedical research. However, the researchers dealing with recurrent events in survival analysis have ignored the assumption that the recurrent events are correlated. There are methods available that takes into account dependency between recurrent events. The main objective of this study was to demonstrate the recurrent event models using upper respiratory infection (URI) among Indian infants.
Methods
The frequency of URI among a birth cohort of 210 babies was evaluated monthly with nasopharyngeal swabbing. Data on 11 potential risk factors were noted. The extended Cox models, such as Andersen-Gill counting process (CP), Prentice-Williams-Peterson (PWP-CP), PWP–Gap time model, Marginal Model and Cox frailty model were applied. The better model was assessed based on Loglikehood test statistics.
Results
Of the four models, PWP-CP model had lower log likelihood value. The URI incidence rate was estimated to be 2.27 (95%CI: 1.70–3.03)times significantly higher in the month of July–October and 1.43 (95%CI: 1.19–1.71) times higher in the month of November–February as compared to March–June (p < 0.001). Nasopharyngeal colonization with S. pneumoniae was also another important risk factor (HR = 1.18, 95%CI:1.01–1.39, p = 0.03).
Conclusion
In the current study, PWP-CP model was found to be better model as compared to other models. Also biologically appropriate as subsequent events depend on the first event of URI. Hence, the choice of an appropriate method for analyzing the recurrent event data should not be decided only on statistical basis but also based on the research question.
Cox Proportional Hazard Model has been widely used by most researchers in the recent past due to its versatility and simplicity in nature for interpreting the results. This model is used in recurrent events such as repeated asthma attacks, episodes of upper respiratory infections, repeated myocardial infarctions, recurrent urinary tract infection among the renal transplant patients, etc. are very common in medical research. However, the researchers dealing with recurrent events in survival analysis have ignored the assumption that the recurrent events are correlated.
In such situation either they have used the latest event and the time related to that event as outcome or, they have assumed the recurrent events are independent and analysed data using survival analysis.
If the correlations between the recurrent events are ignored, then the null hypothesis is mostly rejected, because the Cox model does not incorporate within subject correlation. However, methods have been developed that make use of all available data, while accounting for the lack of independence of recurrent events within subjects. The two popular approaches are namely, “Variance-corrected Cox based models” and “Frailty/random effects” models.
Modeling repeated time-to-event health conditions with discontinuous risk intervals: an example of a longitudinal study of functional disability among older persons.
Variance-corrected models were developed to account for correlation by using robust (sandwich) standard errors. However, the theory behind frailty models is that some subjects are intrinsically more or less prone to experience events of interest than others; frailty can be considered as a random covariate in the model that corrects dependence among recurrent event times. Limitations of applying these variance-corrected models and frailty models include their complexity and difficulty in implementation. Generalised Estimating Equation (GEE) analysis is a variance corrected model that requires the specification of a working correlation matrix but are still inefficient as the information on time to event is ignored. Most statistical models have been developed for analysing the recurrent event data which largely focused on binary outcome having discontinuous risk intervals using GEE.
Modeling repeated time-to-event health conditions with discontinuous risk intervals: an example of a longitudinal study of functional disability among older persons.
Secondly, longitudinal data are analyzed using binary logistic regression ignoring the time, which is inappropriate. We intend to formulate a model which will consider the actual time of recurrence of each outcome. Though these concepts have been disseminated in statistical journals this is seldom practised in developing countries. Hence the aim of this paper is to summarize various methods for modelling recurrent event data. We would also show the differences in estimation and interpretation of recurrent event approaches, as well as to sensitize appropriate models, based on research objectives for the longitudinal study.
2. Materials and methods
2.1 Data
This study was conducted in K.V Kuppam rural development block, which belongs to the service area of RUHSA (Rural Unit for Health and Social Affairs) Christian Medical College, Vellore, India between February 2009 to August 2010. After taking an informed consent, a detailed socio-demographic history was obtained. Patient information was obtained from their parents who were interviewed at each visit regarding recurrent colds, allergic symptoms, overcrowding, family size, breast feeding, smoke exposure and day care attendance. At birth and at monthly scheduled visits, nasopharyngeal swabbing was performed with a calcium alginate swab stick. Then, the presence of upper respiratory infection was noted.
The standard Cox proportional hazard model for the survival data specifies the hazard of ith individual as,
λi(t) = λ0(t) exp{βxi}
(1)
Where (t) is an unspecified baseline hazard function and β is the vector of regression coefficients, xi is the vector of covariates of the ith subject.
2.3 Extended Cox model
The extended Cox models were used to model recurrent time-to-event outcomes within a subject comprehensively than the Cox model. The extended Cox models were: 1) the Andersen-Gill counting process (CP), 2) the Prentice-Williams-Peterson (PWP-CP/Total time),3) PWP – Gap time (PWP-GT) model, 4) Marginal (Wei, Lin and Weissfeld)Model and 5) Cox frailty model.
2.4 Andersen Gill model
Andersen Gill model assumes that the correlation between event times for a subject can be explained by the past events. AG model is suitable model when correlations among events for each individual are induced by measured covariates. The counting process style of data input is seen in AG model where each subject is represented as series of observation with recurrence time given as (t0, t1], (t1, t2] … (tm, last follow-up time] where, each recurrent event for the ith subject is assumed to follow a proportional hazard model is given as
λi(t) = λ0(t)exp{βk xi(t)}
(2)
Under this model, the risk of recurrent event for a subject follows the usual Cox PH assumption but the number of recurrence is not taken into account. Every subject risk intervals contribute to the risk set for every event, irrespective of the number of events for each individual (Fig. 1b).
PWP CP model (total time) and PWP Gap time model. PWP CP model is similar to the AG-CP model but stratified by events. The baseline hazards vary from event to event, the hazard function for the kth event for the ith subject with the PH form is written as
λik(t) = λ0(t)exp{βk xi(t)}
(3)
The PWP - GT model describes an intensity process from the occurrence of an immediately preceding event, with the gap time defined as (). Both PWP approaches are conditional models as an individual is not considered in the riskset for the kth event until experiencing the (k−1)th event.
λik(t) = λ0(t-tk-1)exp{βk xi(t)}
(4)
λ0k(t) represents the event-specific baseline hazard for the kth event over time. AG model and both PWP Models are adjusted by estimating the sandwich type estimators and hence they are known as variance corrected models
(Fig. 1c). The hazard function for the kth event for the ith individual is,
λik(t) = λ0(t)exp{βk xi(t)}
(5)
Unlike the AG model, this model allows a separate underlying hazard for each event. When an event is zero, it means that subject is no longer at risk after the last given event.
The term ‘frailty’ means that each subject has his/her own disposition to failure, in additional to any effects that will be quantified using regression. Hazard function (t) for the recurrent time of the kth event in the ith subject (j = 1,2,…ki; i = 1,2,…n) conditional on the frailty Zi, follows the PH form and its given by:
(6)
Where, λ0k(t) is the common baseline hazard function, Xi is a vector of observable covariates and β is a vector of unknown regression coefficients. Frailty Zi is the unobserved (random) common risk factors shared by all subjects in cluster ‘i’ and is assumed to be i.i.d random variable with unit mean and unknown variance θ.
The Frailty effects occur when the observed sources of variation in the observed or unobserved explanatory variables fail to account for the true difference in risk. That is, when there are other important but omitted variables presented, the effect of omitted variable can be captured by frailty.
3. Results
Number of recurrence experienced by infants ranged between zero to ten during the follow-up period. Seventeen infants (8.1%) out of 210 infants did not return to the study area after birth. The upper respiratory infection recurred at least once in 193 subjects and highest recurrence events (9 and 10 times) were observed in 7 patients.
Table 1 shows a summary of follow-up times and number of patients with and without URI event for the consecutive recurrent events. The median follow-up time to the first URI event were 98 days and starts increasing for the higher consecutive recurrent event.
Table 1Summary of time between consecutive URI recurrent Events.
A total of 163 infants (77.6%) had 6–13 visits where as 30 infants (14.3%) made <5 visits. The median number of visits for these 193 infants was 9 visits. In thousand days of life, 845 records from 747 upper respiratory patients were followed–up during the study period and three infants died during the period of the study. The socio-demographic data were obtained at birth and child characteristics were presented in Table 2a and Table 2b. More parents resided in tiled/pucca houses than thatched houses (66.7%), and were labourers/unemployed. The majority of parents (61.9%) had at least high school education (84.8% fathers and 86.7% mothers). Majority of household had (98.1%) < 3 under-five children, 76.9% of the households had more than 4 family members.
Table 2aSocio demographic and baseline characteristics.
Fig. 1a: We have considered 10 children to demonstrate the risk intervals using URI data. Among those 10 children, seven had at least three events. Remaining three of the children was censored at 365 days; Subject 4 and 6 had largest number of events (6 events). Subject 3 had only two event at times 154 and 336 days.
Variance corrected models and Frailty model results are presented in Table 3, the ‘seasonal’ variable was the only consistently significant risk factor for an URI. Cox PH model had lower log-likelihood values. However, these lower likelihood values do not represent ‘good fit’ because it does not consider the subsequent events within each child.
Table 3Risk factors for upper respiratory infection (URI) recurrent event data using Variance-Corrected model and frailty model in the first year of life.
Children were most predisposed to upper respiratory infection in the July–October months and November–February months, which is statistically significant with slight differences in their parameter estimates in all the models. Using AG model, the upper respiratory infection was estimated to be 2.27 (95%CI: 1.70–3.03) significantly higher in July–October months and 1.43 (95%CI: 1.19–1.71) in November–February months as compared to March–June Months (p < 0.001). The PWP gap time model showed HR of 2.22 (95%CI: 1.66–3.00) for the July–October months and 1.37 (95%CI: 1.11–1.69) for November–February months respectively compared to March–June Months, which is statistically significant (p < 0.01). The WLW model for total time up to the 10th URI recurrence since the study entry yielded a HR of 1.58 (95%CI: 1.05–2.37, P value = 0.027) in July–October months and 2.50 (95%CI: 1.87–3.32) in November–February months as compared to March–June Months (p < 0.001). Nasopharyngeal colonization with S. pneumoniae was another important risk factor which was significant in all recurrent event models except PWP Gap time model. (AG model: HR = 1.23, 95% CI = 1.07–1.42, p value = 0.003; PWP total time model: HR = 1.18, 95% CI: 1.01–1.39, p value = 0.03; Marginal model: HR = 1.49, 95% CI: 1.14–1.95, p value = 0.003). Birth weight of an infant <2.5 kgs had a risk of 1.14 (95% CI: 1.04–1.27) times of URI infection as compared to normal birth weight infant (Table 3).
3.2 AG model and frailty model comparison
The parameter estimates obtained from the frailty model with counting process time scale and AG models were almost same without frailty term. In other words, when the frailty model, with a variance almost close to zero (θ = 0) would indicate that the frailty component does not contribute to the model. Based on the AG model, children with nasopharyngeal colonization with S. pneumoniae positive had high risk of recurring URI, which was 23% higher as compared to children without nasopharyngeal colonization by S. pneumoniae.Children were most susceptible to URI in July–October months (HR: 1.43, 95%CI: 1.70–3.03) and November–February months (HR: 2.27, 95%CI: 1.19–1.71) as compared to March–June months, which is statistically significant (p < 0.001). The cumulative hazard plot in Fig. 2a showed that both AG model and frailty model have estimated same cumulative hazard in the study and it clearly showed that if frailty variance is not significant. The frailty variance θ was estimated to be 0 and 0.153 for counting process time scale and gap respectively. Fig. 2b shows that both models have the different estimated cumulative hazard over a time. Recurrent event data structure and how to organize the data for each recurrent event models, R code is given in the Appendix.
Fig. 2Cumulative hazard plot for upper respiratory infection recurrence over a time of follow-up for AG model and frailty models.
Upper respiratory infections are the most common cause of morbidity in the first year of life among the Indian infants. We described the relevant methods, its importance, how to fit and interpret the results for different methods. Cox PH model does not examine the effect of the risk factors on the number of recurrence over the follow up time.
Repeated occurrence of basal cell carcinoma of the skin and multifailure survival analysis: follow-up data from the Nambour Skin Cancer Prevention Trial.
Many researchers continuously used logistic regression, GEE, Poisson and negative binomial approaches for estimating the risk factors for recurrent events.
Modeling repeated time-to-event health conditions with discontinuous risk intervals: an example of a longitudinal study of functional disability among older persons.
Repeated occurrence of basal cell carcinoma of the skin and multifailure survival analysis: follow-up data from the Nambour Skin Cancer Prevention Trial.
However, they considered the total number of events per fixed period of time interval, ignoring the actual time to event concept between repeated occurrences.
Repeated occurrence of basal cell carcinoma of the skin and multifailure survival analysis: follow-up data from the Nambour Skin Cancer Prevention Trial.
Observations from the same child are expected to be correlated and hence Cox PH model is not suitable method to account of the extra variability of the recurrent event data. So, variance corrected models and frailty model were used.
Several methods have been proposed to account for intra-individual correlation that rises from recurrent events setting in survival analysis. The biological reason for the infection/disease is essential when selecting a model for the recurrent events for example if it is possible that after experiencing the first URI infection, the risk to the next infections may increase. If AG model is reasonable to assume that the risk of the repeated infections remains constant, irrespective of the number of previous infections, then the AG model is recommended.
AG Model provides more powerful inference for a covariate effect than the Cox model. A robust sandwich estimate is used to evaluate the standard errors.
The PWP models are reasonable to assume that the child can only be at risk for a given event after he/she experiences the previous event. The PWP model means that the underlying hazard function is assumed to be the same from event to event. Also that, when infections increase with subsequent recurrence, the PWP model may be more appropriate than the AG model.
The WLW model overestimates the covariate effects due to the fact that every child has as many records as the maximum number of the event occurred in our data.
Random effect/Frailty models leads to a person specific interpretation of the estimates which is similar to that of mixed models in longitudinal data in order to account for the dependency between the recurrent event and unobserved heterogeneity among patients properly as this cannot be explained by covariates alone.
Frailty model gives consistent estimates based on the distribution of the number of events and sample size. A small frailty variance implies very low correlation between the event times.
In this study the within subject correlation is very low (0.07). However, this need not necessarily be the case always, when dealing with data. Therefore use of the entire model is based on the concept of the problems.
There is strong evidence in the literature that if frailty is present but ignored, then the covariate effects will be underestimated.
If the common baseline hazard between each event time is not appropriate for repeated event data and also, when a robust variance to any of these models does not adequately account for within-subject correlation, then it has been suggested to apply the frailty model which is also a similar finding from the present study.
The difference between WLW model and frailty models is driven from the fact that, the frailty model is more naturally related to the fundamental performance of recurrences, while the unconditional WLW model does not provide understanding of the interrelationship among recurrences. However, it has been suggested that, if the frailty distribution is correctly defined, then the frailty model is expected to be more efficient than the WLW model.
In summary, the choice of appropriate model for analyzing recurrent event data will be influenced by many factors, such as number of events, relationship among subsequence events, within subject correlation and varying covariates and the sample size. AG model is appropriate, when the assumption of a common underlying hazard over recurrent event observation is reasonable and when only interested in the overall rate of recurrences. When the dependence from the past event is strong and consistent, then the PWP model is appropriate. However, when the distribution of event per subject is small or prediction of time to the next event is of interest, the PWP gap time model is the appropriate method. However, for the present study, each episode of URI within a child is biologically related though the estimated correlation was very small. This was because the risk of infection to the same sero group/sero type was less in subsequent events within a child; thereby the estimated correlation coefficient was close to 0. However, if the researcher was interested to study the measurement of the dependence between recurrent event times within the subject, the frailty model would be appropriate.
5. Conclusion
The present study finding suggests that the choice of an appropriate method for analyzing the recurrent event data should not mainly depend on statistical basis such as model with low likelihood values; rather the selection should also be based on the research question, a thorough clinical knowledge on the events of interest followed by organization of the data. Thus the PWP-CP model fit the data appropriately while the biological process also suggested the same model.
Conflicts of interest
None.
Appendix. Data structures for modelling recurrent event data
Organisation of the dataset is a more complication than the usual discontinuous risk intervals. Each subject is represented by several rows of data dependent on number of events child had, with time organized as intervals that represent (entry time, 1st event], (1st event, 2nd event],……, (kth event, end time]. A key difference for fitting recurrent event models is the creation of appropriate datasets. To show up the important features of data structure we present some information about the datasets that we used in the present paper.
Let's consider the example of first five children details from the Upper Respiratory Infection data are presented below:
A pair of variable (start, stop) is used to define the time interval of the URI. The start time is generally equal to 0 for the 1st URI and equals to the last recurrence time for further URI. The stop time is a recurrent URI time (URI status = 1) or censored time (URI-status = 0). The study ID variable identifies the child's. 1st child study ID = 1 from 226 to 282, 310 and 338 – with start time equal to 0 and stop time equal to follow up time, while child have four rows (study ID = 1 and 5). Child with no censoring in the end of the follow-up but the child five have 3 event and end of the visit he/she became censoring. Child study ID 4 had an event time at 154 days and second event time at 336 days. For five child data corresponding covariates are presented in the following column in the above table. This structure of the data can used to fit AG model and Frailty Model. In the PWP total time model, Gap time model addition information we added as URI counts based on the number of URI occurrence in the study duration, which is going to be used for stratification. The PWP Gap time model and frailty gap time model the time is defined as stop minus start time and the data structure as same.
Marginal approach focuses on total survival time from study entry until the occurrence of a specific (e.g., Kth) event. Suggested when recurrent events are viewed to be of different types also. Each subject is considered to be at risk for all failures that might occur, regardless of no: of events a subject actually experienced. For example in our study, every child to be at risk as the maximum number of recurrent events occurred in the study (k = 10) event if a child has one recurrence. i.e, every child has 10 observations, one in each stratum. In this data the URI event indicator, which is going to be used for stratification. Strata will correspond to the number of URI. Risk set determined from time since study entry. Marginal model is stratified model. The below data structure can be used to fit the marginal models.
Stratification Models: For the below models are stratified Models, the argument strata (URI_Count) identifies stratification variable to obtain their estimates. Estimates are obtained for event-specific effects for each covariates.
Frailty Model: By default gamma distribution is associated to the random effect for the frailty model in R software. However, we can specify the distributions such as gamma and Gaussian. Other way frailty.gamma (Study_ID) and frailty.gaussian (Study_ID) instead of frailty (id,dist = ”gamma”)
Modeling repeated time-to-event health conditions with discontinuous risk intervals: an example of a longitudinal study of functional disability among older persons.
Repeated occurrence of basal cell carcinoma of the skin and multifailure survival analysis: follow-up data from the Nambour Skin Cancer Prevention Trial.