Population-based cancer registries (PBCRs) play a significant role in estimating the burden of cancer. Estimating the completeness of a cancer registry is one of the criteria for assessing the quality of its data, and capture-recapture is one of the methods used to assess that completeness.
This study aimed to estimate the completeness of the lung and bronchial cancer registry in Khuzestan province using the three-source capture-recapture method.
Materials and methods
This study used the three-source capture-recapture method, with data obtained from three sources in Khuzestan: medical records, death certificates and pathology laboratory reports. All new cases of lung and bronchial cancer registered by the three sources in 2011 were enrolled. Cases common to the sources were identified, and the completeness of the lung and bronchial cancer registry was then calculated by the log-linear method using R software.
In total, 426 new cases of lung cancer were reported by the three sources in 2011. The completeness index of the lung cancer registry was estimated at 72% for the three sources combined, and at 40%, 25% and 37.6% for the medical records, pathology reports and death certificates, respectively.
According to the results, once the lung cancer cases have been collected and repeated cases removed, the number registered by the pathology labs, medical records and death certificates together should be increased by 28%–40% to estimate the actual incidence of lung cancer. For each source taken alone, the actual incidence was calculated by adding 60%, 75% and 63% to the values reported by medical records, pathology reports and death certificates, respectively.
The incidence of non-communicable diseases is regarded as one of the main challenges for health and development in the 21st century. Globally, non-communicable diseases caused over 38 million deaths (68% of total mortality) in 2012.
The reasons for this underestimation may be the difficulty of accessing lung lesions and the lack of equipment and facilities for diagnosis and sampling in some regions of the country. Another reason is that other cancers are screened for and are easier to diagnose clinically, which makes the proportion of lung cancer cases abnormally low relative to other cancers. In 2012, the World Health Organization (WHO) highlighted cancer prevention and improvement of the quality of life of cancer patients. To reach this aim, compiling national plans for cancer prevention and control is a health necessity and the most effective factor in reducing the incidence and burden of cancer in each community.
One of the obstacles to compiling such programs is the lack of adequate data on cancer behavior and on its incidence trends over time and across regions, especially in developing countries such as Iran. In recent years, extensive efforts have been made in various countries, including Iran, to design cancer registry systems.
The aim of a population-based cancer registry is to estimate the cancer burden in the covered population in order to monitor its trend and regional variations, and to supply a database for epidemiological research.
The assessment of data quality for a cancer registry has two parts: the estimation of completeness, and the comparability, reliability and timeliness of the data. The registry completeness index is defined as the proportion of incident cases that have been recorded.
The methods used for completeness estimation fall into two groups, qualitative and quantitative. Qualitative methods include date-based data methods, the mortality-to-incidence ratio, the number of sources and their respective reports, and the diagnostic tissue effect method. Quantitative methods include independent case comparison (ascertainment), the capture-recapture method, the death certificates method and the follow-up method.
Capture-recapture studies can be carried out directly or indirectly. In the direct method, a case is captured, marked and later recaptured. In the indirect method, existing data such as registry lists or surveys are used: each case observed in a list is treated as a capture, and the size of the whole population is then estimated using estimators.
In general, capture-recapture methods are used in epidemiology for three purposes: estimating the size of a population, estimating the incidence of a condition or disease from surveys, and evaluating the completeness of data registry systems.
There are four basic assumptions in the capture-recapture method: 1) the population is closed; 2) cases common to two or more lists can be identified; 3) the lists are independent of each other (that is, the presence of a person in one list does not increase or decrease the probability of his or her presence in another list); and 4) the probability that an individual appears in the lists is independent of his or her characteristics.
The current study aimed to estimate the completeness of the lung and bronchial cancer registry in Khuzestan province by comparing three sources of registry data (pathology reports, death certificates and medical records) using the capture-recapture method. Lung cancer was selected for this study because of the difference between the incidence rate of lung cancer in Iran and in the world, and because the incidence of lung and bronchial cancer is higher in Khuzestan than in other provinces (second only to Markazi province).
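To make the role of the overlap between lists concrete, the simplest two-source case can be sketched with Chapman's estimator, N = (n1 + 1)(n2 + 1)/(m + 1) - 1, which relies on the independence assumption above. This is only an illustration: the present study uses three sources and log-linear models, and the counts below are hypothetical.

```python
def chapman_estimate(n1: int, n2: int, m: int) -> float:
    """Chapman's two-source estimate of the total population size.

    n1, n2: number of cases found by source 1 and by source 2.
    m: number of cases found by both sources.
    """
    if m > min(n1, n2):
        raise ValueError("the overlap cannot exceed either source count")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts, not taken from the study.
n1, n2, m = 120, 90, 45
n_hat = chapman_estimate(n1, n2, m)
observed = n1 + n2 - m           # distinct cases seen by at least one source
completeness = observed / n_hat  # share of the estimated total that was registered
```

A larger overlap m for the same n1 and n2 gives a smaller estimated total, which is why positively dependent lists lead to an underestimate of the population.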
2. Materials and Methods
2.1 Area of study
The study was carried out by the three-source capture-recapture method using data from three sources: medical records, death certificates and the pathology reports of the respective labs in Khuzestan province, southwest Iran. All new cases of lung and bronchial cancer registered by the three sources in 2011 were enrolled in the study. Khuzestan is bounded to the west by Iraq and to the south by the Persian Gulf, and has an area of 63,238 km². Ahvaz is the center of the province and is located between 48° and 29′ east of the Greenwich meridian and 31° and 45′ north of the equator.
The location of Khuzestan province in southwest Iran is shown in Fig. 1.
First, the data from each source were entered separately into Excel (Office 2010), and repeated cases within each source were identified and deleted. To find the cases common to the three lists, the patient data from all sources were merged (each source was marked with a distinct color). Four patient characteristics were considered: name, surname, father's name and city of residence. Common cases were identified by sorting the patients on these characteristics with the Sort command and comparing each case with those registered in the other sources.
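The matching step can be sketched programmatically as follows. The study did this by sorting and comparing in Excel; the code is only a minimal illustration that assumes exact matching on the four fields after normalising case and spacing. The field names and patient records are hypothetical.

```python
from collections import defaultdict


def match_key(record: dict) -> tuple:
    """Build a comparison key from the four identifying fields."""
    fields = ("name", "surname", "father_name", "city")
    # Collapse extra whitespace and ignore letter case before comparing.
    return tuple(" ".join(record[f].split()).casefold() for f in fields)


def link_sources(sources: dict) -> dict:
    """Map each distinct patient key to the set of sources that list it."""
    seen = defaultdict(set)
    for source_name, records in sources.items():
        for record in records:
            # Adding to a set also removes repeats within one source.
            seen[match_key(record)].add(source_name)
    return dict(seen)


# Hypothetical records for illustration only.
sources = {
    "medical": [
        {"name": "Ali", "surname": "Karimi", "father_name": "Hassan", "city": "Ahvaz"},
        {"name": "ali ", "surname": "Karimi", "father_name": "Hassan", "city": "Ahvaz"},
        {"name": "Reza", "surname": "Ahmadi", "father_name": "Javad", "city": "Abadan"},
    ],
    "pathology": [
        {"name": "Ali", "surname": "Karimi", "father_name": "Hassan", "city": "Ahvaz"},
    ],
    "death": [
        {"name": "Sara", "surname": "Moradi", "father_name": "Naser", "city": "Dezful"},
    ],
}
linked = link_sources(sources)
```

Exact matching misses spelling variants, so in practice a manual review of near matches, as done in the study, would still be needed.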
This study used a new methodology in which the death data for the study year (2011) were compared with the pathology and medical data not only of the same year but also of previous years (at least five years). Although some of the deaths in 2011 also appear in the medical and pathology data of the same year, in some cases the disease may have been diagnosed in previous years. Death data were therefore compared with the pathology and medical data of previous years, and any persons found there were excluded from the death list of the study year. In addition, given the survival rate of lung cancer, patients registered in the pathology or medical lists in the study year (and therefore absent from that year's death list) may die in later years and be recorded in the death lists of those years. These patients were therefore also compared with the death data of the following years (at least five years), and those found there were added to the death list of the study year. If this methodology is not followed, the size of the population will be overestimated.
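The two corrections to the death list can be expressed as set operations. The sketch below assumes that each patient has already been reduced to a unique key; the function name, argument names and identifiers are hypothetical.

```python
def adjust_death_list(deaths_study_year: set,
                      incident_earlier_years: set,
                      incident_study_year: set,
                      deaths_later_years: set) -> set:
    """Return the study-year death list after both corrections.

    All arguments are sets of patient keys.
    """
    # Exclude deaths whose diagnosis was registered in an earlier year.
    kept = deaths_study_year - incident_earlier_years
    # Add study-year incident cases whose death was recorded in a later year.
    added = incident_study_year & deaths_later_years
    return kept | added


# Hypothetical patient keys for illustration only.
deaths_study_year = {"a", "b", "c"}
incident_earlier_years = {"c", "x"}   # pathology/medical lists of previous years
incident_study_year = {"a", "d", "e"}  # pathology/medical lists of the study year
deaths_later_years = {"d", "y"}        # death lists of following years

adjusted = adjust_death_list(deaths_study_year, incident_earlier_years,
                             incident_study_year, deaths_later_years)
```

Here patient "c" is dropped because the diagnosis predates the study year, and patient "d" is added because the death was recorded after it.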
Afterwards, a contingency table (cross table) with 8 cells (2³ = 8) was drawn. Seven of these cells were observed (the numbers of individuals present in each source alone and in each combination of sources), and one cell (the number of individuals not present in any of the three sources) was unknown. The Venn diagram (Fig. 2) shows the number of cases in each source and how the sources overlap.
Data analysis was performed using log-linear models in Stata 13 and R. Eight different log-linear models were fitted to the seven observed cells of the table to estimate the count in the eighth cell (individuals not registered in any of the three sources). The expected count of the eighth cell was then estimated from each of these models.
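As an illustration of how the 2³ table can be built, the sketch below counts patients for each presence/absence pattern over the three sources. The source order (medical, pathology, death), the function name and the patient data are assumptions made for the example.

```python
from collections import Counter
from itertools import product

SOURCES = ("medical", "pathology", "death")


def capture_table(linked: dict) -> dict:
    """Count patients for every presence/absence pattern over the three sources.

    linked maps a patient key to the set of sources that list the patient.
    Patterns are (medical, pathology, death) tuples, 1 = listed, 0 = not listed.
    """
    counts = Counter(
        tuple(int(s in found_in) for s in SOURCES) for found_in in linked.values()
    )
    table = {pattern: counts.get(pattern, 0) for pattern in product((1, 0), repeat=3)}
    table[(0, 0, 0)] = None  # never observed: this is the cell to be estimated
    return table


# Hypothetical linkage result for illustration only.
linked = {
    "p1": {"medical", "pathology"},
    "p2": {"medical"},
    "p3": {"death"},
    "p4": {"medical", "pathology", "death"},
    "p5": {"medical"},
}
table = capture_table(linked)
observed_total = sum(v for v in table.values() if v is not None)
```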
Usually, the best-fitting model according to AIC is the saturated model, a model with zero degrees of freedom and interactions up to K-1. However, this model overestimates the population size and gives a wider confidence interval. Where the sample size is small, the two other criteria, especially DIC, give better results.
These criteria were used to select the better-fitting model. Finally, according to AIC, which is the criterion most used by researchers, a model including the interaction between the medical and death sources and an independent effect of the pathology source was selected.
In this study, the log-linear method with three sources was used to control the bias caused by dependence between sources. The advantage of this method is that, by including interaction terms between sources, it accounts for positive or negative dependence between the sources in the estimates and can largely overcome the bias caused by violating the assumption of source independence.
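For the particular model with a medical-death interaction and an independent pathology source, the missing cell has a closed-form estimate: the pathology-only count multiplied by the ratio of cases without a pathology report to cases with one, both taken among patients listed by the medical and/or death sources. The sketch below illustrates that formula only. It is not the Stata or R procedure used in the study, and the cell counts are hypothetical.

```python
def missing_cell_hd_p(cells: dict) -> float:
    """Estimate the unobserved cell when pathology is independent of (medical, death).

    cells maps (h, p, d) patterns to counts, where h = medical, p = pathology,
    d = death, 1 = listed and 0 = not listed.
    """
    hd_patterns = [(1, 0), (0, 1), (1, 1)]  # listed by medical and/or death
    p_absent = sum(cells[(h, 0, d)] for h, d in hd_patterns)
    p_present = sum(cells[(h, 1, d)] for h, d in hd_patterns)
    if p_present == 0:
        raise ValueError("no overlap between pathology and the other sources")
    # Same odds of missing pathology among cases unseen by medical and death.
    return cells[(0, 1, 0)] * p_absent / p_present


# Hypothetical cell counts, not the study's data.
cells = {
    (1, 0, 0): 60, (0, 0, 1): 50, (1, 0, 1): 40,
    (0, 1, 0): 30, (1, 1, 0): 25, (0, 1, 1): 15, (1, 1, 1): 10,
}
missing = missing_cell_hd_p(cells)
observed = sum(cells.values())
estimated_total = observed + missing
```

Other models, for example those with two interaction terms, generally need iterative fitting rather than a one-line formula, which is one reason statistical software is used.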
The cases from the three sources were compared and common cases were counted only once, giving a total of 426 new cases of lung cancer registered by the three sources in 2011. The medical, pathology and death certificate sources registered 238, 148 and 222 cases, respectively. Of the 426 cases, 280 were men (65.7%) and 146 were women (34.3%), a male-to-female ratio of 1.9. The largest age group was 65–80 years with 35% of total cases, followed by 45–64 years with 32%.
Table 1 presents the demographic characteristics of patients registered by the three sources of cancer registry.
Table 1. Comparison of demographic data of patients obtained from three sources of lung cancer registry in 2011.
The three-source analysis was performed using log-linear models, and the results are given in Table 2. Among the 8 fitted models, the third model, which includes an interaction between the medical and death sources and an independent effect of the pathology source, had the best fit according to AIC and was selected.
Table 2. Log-linear models fitted to the data obtained from three sources of lung and bronchial cancer registry in 2011.
Columns: estimated number of lung cancer cases registered in none of the sources; estimated total number of lung cancer cases; 95% confidence interval.
h: medical source, p: pathology source, d: death source, Infofit: indicates an error in the model.
According to this model, the number of lung cancer cases registered in none of these sources was estimated to be 163. Consequently, the total number of lung cancer cases in 2011 was estimated to be 589 (with a confidence interval of 529–667).
Accordingly, after deletion of repeated cases, the total completeness index of the lung cancer registry was 72% for the three sources combined, and 40%, 25% and 37.6% for the medical (238 cases), pathology (148 cases) and death certificate (222 cases) sources, respectively. The incidence rate reported by the three sources together, after deletion of repeated cases, was 9.4 per 100,000 in 2011, whereas the incidence rate estimated by the log-linear model was 13 per 100,000. The incidence rates and the completeness of the cancer registry were also calculated separately by gender and age subgroup (Table 3).
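The arithmetic behind these figures can be checked with a few lines. The counts are those reported in this section. Dividing the reported rate by the overall completeness is an assumption made here as a consistency check, not necessarily how the estimated rate was obtained in the study.

```python
observed_total = 426     # distinct cases after removing repeated cases
estimated_missing = 163  # cases estimated to be registered in none of the sources
by_source = {"medical": 238, "pathology": 148, "death": 222}

estimated_total = observed_total + estimated_missing
completeness_all = observed_total / estimated_total
completeness_by_source = {
    name: count / estimated_total for name, count in by_source.items()
}

reported_rate = 9.4  # per 100,000, from the three sources combined
# Assumed check: scale the reported rate by the overall completeness.
corrected_rate = reported_rate / completeness_all
```

The results agree with the reported 72%, 40%, 25% and 37.6% up to rounding, and the scaled rate comes out at about 13 per 100,000.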
Table 3. Estimation of lung cancer cases based on gender and age groups in 2011.
The highest underestimation occurred in the age group older than 80 years.
In this study, the completeness of the lung cancer registry was estimated using the three-source capture-recapture method and log-linear models. The completeness of the lung cancer registry in Khuzestan province was estimated at 72% using data from three sources (medical records, pathology reports and death certificates) after deletion of repeated cases.
This estimate is not consistent with the 2011 study by Khodadost and colleagues, which reported the completeness of the lung cancer registry as 24.22% in Ardabil province in 2006 and 2008 using the capture-recapture method.
The study by Dortaj and colleagues (2011), conducted on data from 2004 to 2006 in Fars province using the three-source method, reported the completeness of the lung cancer registry as 50%, which is closer to the result of the current study.
Sharifian and colleagues (2009), using the two-source method in Shiraz, found that the completeness of the lung cancer registry was 70.4% by the Chapman method and 59.9% by the Chao method, results that are consistent with the present study.
The completeness of the lung cancer registry reported in the present study is lower than that reported in some other countries and close to that reported in others. For example, a study entitled "Completeness of cases registry in the national program of cancer registry in Ireland", published by Brin and colleagues in 2014, reported the completeness of the lung cancer registry as 98.7%.
In a study carried out by Crowsti and colleagues in Tuscany (Italy) in 2001 using the two-source capture-recapture method to estimate the completeness of the Tuscany cancer registry, the completeness of the lung cancer registry was estimated to be 96.6%.
In the study entitled “estimation of non-registered cases in cancer registry program using three-source capture-recapture method”, the completeness of the lung cancer registry was reported as 74% in Virginia, United States, in 2004.
In the present study, the highest completeness of the lung cancer registry (40%) was found for the medical source, whereas in Khodadost's study the death source had the highest completeness, at 55.12%.
The incidence rate reported by the three sources (medical, pathology and death records) after deletion of repeated cases was 9.4 per 100,000 in 2011, whereas the incidence rate estimated by the log-linear model was 13 per 100,000.
According to the 2009 national report, the incidence of lung cancer in Khuzestan province was 10 per 100,000 in men and 3.9 per 100,000 in women. In the present study, the reported incidence was 12.2 per 100,000 in men and 6.5 per 100,000 in women, and the incidence estimated with the log-linear model was 17.3 and 9.4 per 100,000, respectively.
The estimated incidence by age subgroup shows that disease incidence increases with age. In the current study, the highest incidence was in the age group of 65 years and older, which agrees with the study done in Ardabil province in 2004 and 2006. The highest underestimation was observed in the age group of 80 years and older.
The mean age was 64.3 years in men and 62 years in women. The age distribution of disease incidence did not differ significantly between men and women.
6. Strengths and limitations
Despite the considerations mentioned above, there are still deaths that were not registered in the pathology or medical sources, either in the same year or in the past. These cases may belong to the study year or to previous years, which could affect the results of the study. It is therefore suggested that this limitation be considered in future research.
The findings of this study show the value of the capture-recapture method for quantifying the underestimation of lung cancer incidence and for estimating that incidence more accurately.
The results of this study showed that, to estimate the actual incidence of lung cancer in Khuzestan province, 28%–40% should be added to the number of cases registered by the pathology labs, medical records and death certificates together, after deletion of repeated cases. If only one source is considered, 60%, 75% and 63% should be added to the medical records, pathology and death certificate data, respectively. The methodology used in the current study is important: the death data for the study year were compared with the cases registered in the pathology and medical sources in previous years, and the common cases were excluded from the death data. Likewise, the pathology and medical cases of the study year were compared with the death data of the following years, and the common cases were added to the death data. If this methodology is not followed, the result will be an overestimation.
The authors gratefully acknowledge Abadan School of Medical Sciences for financial support and for providing the facilities needed to carry out this research, under project number 92/p/p.A/919.
Declaration of competing interest
There are no conflicts of interest among the authors.
This paper is derived from the thesis of Homayoun Amiri for the postgraduate degree of Infectious Diseases specialist (Grant: 92/p/p.A/919). The authors wish to thank the head and personnel of Shahid Beheshti University of Medical Sciences for supporting this study, and the research affairs deputy of the medical college for approving it. Ethical approval for this research was provided by Shahid Beheshti University of Medical Sciences (IR.SBMU.RAM.REC.1395.91).