Introduction

Several studies have reported estimates of relative risks (RRs) for cancer among both close and distant relatives of cancer cases. The most powerful of these studies have used population-based genealogical resources linked to cancer data, specifically the Utah, Icelandic, and Swedish populations.1,2,3,4,5,6,7,8,9,10,11,12 The Utah resource was created in the 1970s to define familial clustering and identify evidence for a heritable contribution to cancer.1,2,13,14,15,16,17 Studies of high-risk pedigrees identified in Utah have led to the identification of several of the common cancer predisposition genes known to date.18,19,20,21 This survey is unique in that our analysis methods and cancer sites differ from those of a recent analysis of Utah data.3 Differences in population distinguish our study from a similar recent investigation in the Iceland population.4 Compared with the first similar investigation carried out in Utah,2 sample sizes are now significantly larger, we have included several new cancer sites, and we estimate risks for first-, second-, and third-degree relatives. We also estimate risk for cancer of the same site, for cancers of different sites, and coaggregation of cancers within an individual.

This survey of familial clustering and RRs for cancer is the largest and most comprehensive to date, and has been conducted on a large homogeneous population with nearly complete ascertainment and characterization of cancer phenotypes for the genealogical relationships analyzed. Evaluation of clustering of cancers within individuals and among their close and distant relatives allows identification of those cancer sites for which the most evidence exists for a heritable contribution to predisposition, and provides a unique view of interrelationships between cancer sites that may help identify common mechanisms, pathways, and genes.

Materials and Methods

The Utah Population Data Base

The Utah Population Data Base (UPDB) is a computerized database that represents over 6.5 million individuals with records originating from various sources that collectively describe the Utah population. Among the UPDB records are 1.6 million original genealogical records that represent the Utah founding pioneers and their descendants.13 The pioneer genealogies in the UPDB are typically large, spanning 15 generations in some cases.

We have selected a subset of the genealogy data to appropriately match cases and controls with respect to quality and quantity of genealogy data. We included all individuals born before 1972 with records for both parents, all four grandparents, and six of eight great-grandparents. We then included all ancestors of these individuals who met the same criteria and all of their descendants who met the same criteria. This identified approximately 1 million individuals born in Utah after 1850.

The genealogy records in the UPDB have been record-linked to various statewide data resources including the Utah Cancer Registry (UCR), Utah vital records, and statewide collections of electronic medical records. These combined resources have been extensively used to identify familial clustering of cancer and other diseases.16,17,22,23,24,25,26,27,28,29,30 The Utah population contained in the UPDB is genetically representative of Northern Europe and has inbreeding levels similar to those of other parts of the United States.31,32

The UCR

The UCR was initiated in 1966 and includes cancer data for 190,000 individuals diagnosed with cancer in Utah; some cancer diagnosis data extend to 1952. In 1973, the UCR became one of the now 11 registries forming the National Cancer Institute’s Surveillance Epidemiology and End Results (SEER) program. The UCR provides nearly 100% ascertainment of independent primary cancers. Records within the UCR are documented according to the International Classification of Diseases for Oncology, third revision, and contain diagnosis data including primary site, histology, stage, grade, and age, and also include treatment and survival data.33 Over 65,000 of the 190,000 registered individuals with cancer link to an individual record that is part of the restricted subset of cases we have identified as appropriate for this analysis.

We identified individuals with cancer using definitions that included primary site, histology, and behavior of the cancer (see Supplementary Table S1 online).

Statistical analysis

Estimating the RR to relatives of cases is a common approach to investigating the familial clustering of disease and is a well-established approach for analyzing genealogical data in the UPDB.2,29,30 The RR is defined as the ratio of the observed number of cases among the relatives of probands to the expected number based on the population rate of disease.

To estimate the RR, we first assign all individuals with genealogy data to 132 cohorts based on sex, year of birth (5-year cohorts), and place of birth (Utah or not), as these are the characteristics that affect the quality and quantity of genealogy data, record-linking success, and rate of cancer. We determine internal cohort-specific rates of a specific cancer by summing the number of cases in a cohort and dividing by the total number of individuals in the cohort. The expected number of cancers among the relatives of cancer cases of a specific site is then calculated by multiplying the total number of relatives of cases in a cohort (counted without duplication) by the cohort-specific rate of the cancer, and summing over all cohorts. We then count, without duplication, the number of observed cases among the relatives of cases. The ratio of the observed number of cancers to the expected number of cancers is an unbiased estimator of RR. We calculate one-sided probabilities for the alternative hypothesis RR > 1 under the null hypothesis RR = 1, assuming that the number of observed cases follows a Poisson distribution with mean equal to the expected number of cases. To avoid issues related to small sample sizes, we only consider cancer sites with sample sizes of 200 or more cases. For each cancer site, we estimated the RR for all cancer sites in each individual with cancer, and in their first-, second-, and third-degree relatives. Risks for cancers of different sites among relatives were calculated similarly, using the same cancer rates. This study was approved by the University of Utah Institutional Review Board.
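
The estimation steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the record layout (dictionaries with hypothetical `id`, `cohort`, and `has_cancer` fields), the function names, and the toy data are all invented for the example, and the Poisson tail probability is computed directly from the probability mass function.

```python
import math
from collections import Counter


def cohort_rates(population):
    """Cohort-specific rate = cases in the cohort / individuals in the cohort."""
    totals, cases = Counter(), Counter()
    for person in population:
        totals[person["cohort"]] += 1
        cases[person["cohort"]] += person["has_cancer"]
    return {cohort: cases[cohort] / totals[cohort] for cohort in totals}


def relative_risk(relatives, rates):
    """Return (observed, expected, RR) for relatives, each counted once."""
    unique = list({p["id"]: p for p in relatives}.values())  # drop duplicates
    observed = sum(p["has_cancer"] for p in unique)
    per_cohort = Counter(p["cohort"] for p in unique)
    # Expected cases: relatives per cohort times the cohort rate, summed.
    expected = sum(n * rates[cohort] for cohort, n in per_cohort.items())
    return observed, expected, observed / expected


def poisson_upper_tail(observed, expected):
    """One-sided P(X >= observed) for X ~ Poisson(expected), expected > 0."""
    below = sum(
        math.exp(k * math.log(expected) - expected - math.lgamma(k + 1))
        for k in range(observed)
    )
    return max(0.0, 1.0 - below)


# Toy data (hypothetical): two cohorts of ten people each.
population = (
    [{"id": i, "cohort": "A", "has_cancer": int(i < 2)} for i in range(10)]
    + [{"id": 10 + i, "cohort": "B", "has_cancer": int(i < 1)} for i in range(10)]
)
rates = cohort_rates(population)

# Relatives of cases; one person is listed twice to show de-duplication.
relatives = population[:5] + population[10:15] + population[:1]
observed, expected, rr = relative_risk(relatives, rates)
p_value = poisson_upper_tail(observed, expected)
print(f"observed={observed} expected={expected:.2f} RR={rr:.2f} p={p_value:.3f}")
```

The pmf is evaluated in log space so that large counts do not overflow; a statistics library's Poisson survival function could be substituted for `poisson_upper_tail`.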

Results

The most straightforward examination of a genetic contribution for a specific cancer involves the estimation of the RR for that specific cancer among both close and distant relatives of individuals diagnosed with the same cancer.

RR estimates for cancer of the same site among first-, second-, and third-degree relatives of cancer cases for all sites are shown in Table 1. Table 1 includes the cancer site (grouped by system), the number of cancer cases, and, for each degree of relationship, the number of relatives, RR, and the one-sided 95% confidence interval. First-degree RRs were significantly elevated for all cancer sites evaluated, with the exception of a few rare cancers (n < 316).

Table 1 Estimated RR for cancer of the same site among first-, second-, and third-degree relatives of cancer cases by site

Those cancer sites with significantly elevated RRs for third-degree relatives, which indicate strong support for a heritable contribution, include lip, melanoma, breast, female genitals, ovarian, small intestine, colon, non-Hodgkin’s lymphoma, chronic lymphocytic leukemia, thyroid, lung/bronchus, larynx, prostate, and renal.

We also estimated RRs for cancers of different cancer sites among relatives of cases for each site. The numerous RR estimates are shown in Supplementary Table S2 online. We have summarized all of the significant between-cancer results graphically in Figure 1. Figure 1 depicts associations between cancer sites based on the estimated risk of specific cancer sites among the relatives of individuals with cancer of a different site. Figure 1 includes those associations between cancer sites for which the RR was significantly >1.0 (P < 0.05) for first-, second-, and third-degree relatives analyzed separately (combined P < 0.000125). By this definition, 21 of 36 cancer sites analyzed had a significant association with at least one other cancer. Also shown in Figure 1, with darker connecting lines, are the most significant associations for which significant RRs (with P < 0.005) were observed for first-, second-, and third-degree relatives. Viewed with either threshold, prostate cancer was the most interconnected site; a significant excess risk of prostate cancer was observed in the first-, second-, and third-degree relatives of cases for 11 different cancer sites.
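
The combined threshold quoted above equals the product of the three per-degree thresholds. The short sketch below reproduces that arithmetic and the "significant in all three degrees" rule; it assumes the three tests are treated as independent, and the function names and example P values are hypothetical.

```python
def combined_threshold(alpha, n_tests=3):
    """Product of identical per-test thresholds (assumes independent tests)."""
    return alpha ** n_tests


def significant_in_all_degrees(p_values, alpha=0.05):
    """True when every per-degree one-sided P value is below alpha."""
    return all(p < alpha for p in p_values)


print(combined_threshold(0.05))   # per-degree P < 0.05
print(combined_threshold(0.005))  # per-degree P < 0.005 (darker lines)

# Hypothetical P values for first-, second-, and third-degree relatives.
print(significant_in_all_degrees([0.001, 0.02, 0.04]))
print(significant_in_all_degrees([0.001, 0.02, 0.06]))
```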

Figure 1

Associations between cancer sites as measured by significantly increased relative risks (RRs) for first-, second-, and third-degree relatives. Associations with P < 0.05 for each of first-, second-, and third-degree relationships are shown. Associations with significantly elevated RRs (P < 0.005) for all three degrees of relationship are shown in bold connecting lines. Single connecting lines show significant excess risk between cancer sites in the direction of the arrow. Sites with two connecting lines between them are significant in both directions. Loops indicate significance for the same site among relatives of probands.

It is well-recognized that cancer genes are not always cancer site–specific (e.g., increased risk for melanoma as well as pancreatic cancer in CDKN2A carriers). To better understand the genetic contributions to cancer site associations, we also estimated the RR for cancer of different sites among individuals diagnosed with more than one primary cancer. The estimated RRs for all types of cancers considered in all individuals with multiple cancers are shown in Supplementary Table S3 online. Figure 2 graphically illustrates those associations of cancers for which, among all individuals with a cancer of a specific type, we observed a significant excess of another independent cancer of a different site in the same individual. For clarity of presentation, only the most extreme associations are depicted (P < 1 × 10^-6). Some of the associations include those for which environmental causes might be argued; larynx with lung cancer, and lung cancer with lip cancer might be cancer pairings observed in a smoking individual, for example. Other combinations such as prostate and melanoma (observed in both directions) might suggest an as-yet-unidentified gene affecting the risk for both cancers, as has previously been suggested.12

Figure 2

Associations between cancer sites within an individual with multiple primary cancers by cancer site for P values <1 × 10^-6.

Discussion

This analysis of cancer clustering in the Utah population is the largest such comprehensive survey published to date. Our analysis of risks for cancer of the same site in first-degree relatives confirms our earlier findings,2 and the findings of others,3,4 that for most cancer sites considered, significantly elevated RRs are observed among first-degree relatives. Although analyses concerning only first-degree relationships may suggest a shared genetic effect, excess risks could also be due to a shared environmental effect, or to some combination of the two. Therefore, our investigation expanded the analyses to include more distantly related cancer cases, providing a more stringent test for evidence of a heritable contribution to cancers. The validity of this test rests on the expectation that third-degree relatives share environmental exposures at no more than a population level, given that they are unlikely to live in the same house or work in the same occupations, for example. An observed excess risk among third-degree relatives is therefore more plausibly explained by a genetic effect than by shared environment, and it is detected even though third-degree relatives share on average only about 1/8 of their genomes identical by descent. Using the existence of significantly increased risk of cancer at the same site among third-degree relatives as our criterion for evidence, we propose the following cancers as having strong evidence for a heritable contribution to cancer: lip, melanoma, breast, female genitals, ovarian, small intestine, colon, non-Hodgkin’s lymphoma, chronic lymphocytic leukemia, thyroid, lung/bronchus, larynx, prostate, and renal cancers.
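
As a rough guide to how quickly genetic sharing falls off with relationship distance, the sketch below uses the standard rule of thumb that each additional degree of relationship halves the expected proportion of the genome shared identical by descent. It is an illustration only: the function name is hypothetical, and the calculation ignores inbreeding and the variation around the expected value.

```python
from fractions import Fraction


def expected_ibd_sharing(degree):
    """Expected genome fraction shared identical by descent: (1/2) ** degree."""
    if degree < 1:
        raise ValueError("degree of relationship must be 1 or greater")
    return Fraction(1, 2) ** degree


for degree, example in [(1, "parent, sibling"),
                        (2, "grandparent, aunt/uncle"),
                        (3, "first cousin")]:
    share = expected_ibd_sharing(degree)
    print(f"degree {degree} ({example}): {share} = {float(share):.3f}")
```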

For cancer sites for which neither second- nor third-degree RRs were significantly elevated, we refrain from conclusions regarding evidence for a heritable contribution. These cancers include tongue, esophagus, stomach, rectum, multiple myeloma, acute myeloid leukemia, brain, testis, salivary, gallbladder, liver, anus, chronic myeloid leukemia, spinal cord, and bone cancers. It is noteworthy that although some of these cancers have small sample sizes and may simply be underpowered in our analysis, others have more than 1,000 cases available for analysis and still do not show evidence for a genetic contribution to risk.

Another familial investigation of clustering of cancer was conducted in this same Utah resource with a different familial aggregation approach. Kerber and O’Brien3 used a conditional logistic regression method to estimate the proportion of overall cancer risk that is attributable to family history and reported recurrence risks within high-risk pedigrees for various cancer sites.3 The analysis of first-degree relatives provided similar risk estimates, with 23 of 32 cancer sites showing significant excess risk. Concerning more distantly related individuals, the analysis of third-degree relatives reported by Kerber and O’Brien3 revealed 11 significant sites among third-degree relatives with some overlap with our results (specifically, lip, ovarian, colon, prostate, and chronic lymphocytic leukemia), but also with some disparity. For instance, we did not find significantly elevated risk to third-degree relatives for cancers of testis, liver, or gallbladder, nor did their analysis report elevated risk among third-degree relatives for breast, melanoma, non-Hodgkin’s lymphoma, lung, cervix, renal, female genitals, small intestine, thyroid, or larynx cancers. The differences between the two investigations could be attributed to the difference in methodologies or the increase in the size of the resource over time.

An analysis using methods similar to our own was conducted in the computerized population genealogy of Iceland.4 The Iceland study similarly estimated RR for first-, second-, and third-degree relatives, as well as more distant relationships. Although there were some differences in the specific cancer sites selected or in their definitions, a comparison of first-, second-, and third-degree RRs for cancer of the same site showed remarkable similarity. Those sites for which differences were noted include two sites for which the Iceland study showed significant evidence for a genetic contribution when we did not: stomach (for which Iceland has more than twice as many cases as Utah, within a data set of one half as many cancers) and rectal cancer (for which we have previously published evidence for a significant excess in first-, second-, and third-degree relatives using a larger set of cases not so stringently screened for amount of genealogy data available28). For one further site, esophagus, the Iceland study observed suggestive evidence and we did not. Three other cancer sites for which we observed significant evidence for a genetic contribution and the Iceland study did not are larynx, Hodgkin’s disease, and lip (for which we have four times as many cases within a data set that has twice as many total cancers).

Because we recognize that cancer predisposition genes may not be site specific, we also investigated evidence for a genetic contribution to different sites of cancer. We investigated the excess of different cancer sites among relatives of individuals diagnosed with a different cancer site (graphically displayed in Figure 1). Whereas the graphical representation focuses primarily on prostate cancer (with 11 related cancers), a similar graphical representation from the Iceland study (Figure 1 in ref. 4) centers on stomach cancer (with seven related cancers) as well as prostate cancer (with six related cancers). In the Utah data, we observed all of the prostate/other cancer associations shown in Amundadottir et al.4 (as well as the lung cancer and cervix cancer association), but we observed only one of the stomach associations they report (with brain cancer).

These two unique resources seem to have differing power based on differing rates for several cancers. When loosely compared, the Utah data set included twice as many total cancers as the Iceland data set (65,000 vs. 32,000), so assuming a similar age structure it might be expected to have twice as many cancers per site. Those cancers with the biggest discrepancy (Iceland vs. Utah) include prostate (3,380 vs. 13,933), stomach (2,890 vs. 1,219), thyroid (957 vs. 1,242), endometrial (753 vs. 2,617), esophagus (535 vs. 382), and lip (244 vs. 986) cancers.
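
The loose comparison above can be made explicit by dividing each site's Utah-to-Iceland case ratio by the overall ratio of total cancers. The sketch below uses only the counts quoted in the text; the variable and function names are hypothetical, and no adjustment is made for age structure or site definitions.

```python
# Approximate total cancers in each data set, as quoted in the text.
ICELAND_TOTAL, UTAH_TOTAL = 32_000, 65_000

# Case counts per site as (Iceland, Utah), as quoted in the text.
SITE_COUNTS = {
    "prostate": (3380, 13933),
    "stomach": (2890, 1219),
    "thyroid": (957, 1242),
    "endometrial": (753, 2617),
    "esophagus": (535, 382),
    "lip": (244, 986),
}


def utah_to_iceland_ratio(iceland, utah):
    """Number of Utah cases per Iceland case."""
    return utah / iceland


overall = utah_to_iceland_ratio(ICELAND_TOTAL, UTAH_TOTAL)
for site, (iceland, utah) in SITE_COUNTS.items():
    ratio = utah_to_iceland_ratio(iceland, utah)
    # A relative value near 1 means the site tracks the overall 2:1 difference.
    print(f"{site:12s} ratio={ratio:5.2f} relative_to_overall={ratio / overall:5.2f}")
```

Sites with a relative value well above 1 (for example prostate and lip) are comparatively over-represented in Utah, and those well below 1 (for example stomach and esophagus) are comparatively over-represented in Iceland.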

Prostate cancer was observed at significantly elevated rates among relatives of melanoma, breast cancer, non-Hodgkin’s lymphoma, thyroid cancer, and brain cancer cases, and was also the most interconnected cancer site observed in our analysis. Prostate cancer also had the largest sample size analyzed, which increases the statistical power of tests concerning this site. Several of the connections between prostate cancer and other sites were also observed in the Swedish resource, including melanoma, breast, ovary, leukemia (histology not specified), and brain cancers.34,35 Apart from the overwhelming associations of prostate cancer, two main anatomical clusters appear in the association analysis for risk of cancer of the same site in relatives. The first involves gastrointestinal cancer sites, including colon, rectum, stomach, and anus; the second includes lip, lung/bronchus, and larynx. The Swedish resource previously identified a significant association between stomach and colon cancer in a similar analysis involving first-degree relatives.36 Another cluster of interest involves hormone-related sites including prostate, breast, ovary, endometrial, thyroid, and possibly renal and brain cancers. Amundadottir et al.4 reported clustering of hormone-related sites, including pancreatic cancer, which was not detected in our analysis. A significant association between breast and ovarian cancers was also recently detected in the Swedish resource.37 Our analysis appears to confirm the probable association of common (possibly heritable) factors for cancers of hormone-related sites. In this Utah analysis, multiple myeloma was significantly associated with lung, melanoma, and prostate cancers. Interestingly, none of the hematological sites was reported to show significant associations with other cancers in the Iceland study.

Our analysis of cancer coaggregation within individuals with multiple primary cancers shows some evidence for site clustering (colon/rectum/anus, lip/lung/larynx, tongue/pharynx, and female genitals/cervix) that could be due to shared risk factors for these cancers, or cancer predisposition that is not site specific, or some combination of both genetic and environmental causes. The occurrence of multiple primary cancers in the same individual has been extensively documented in the Swedish genealogy. Cancer sites with significantly more primary cancers within the same individual that were identified in this analysis and have also been observed in the Swedish genealogy include bladder cancer associated with renal, cervix, and prostate, and endometrial cancer associated with colon cancer.38,39

We expect that our results are conservative for several reasons. Because we used strict guidelines for the amount of genealogical data to ensure high-quality genealogy structure, we limited the overall sample sizes available for analysis. In addition, we expect our results to be affected by the fact that because some cancers are more common, they are possibly more detectable. That is, we may not be able to detect risk among distant relatives for cancer sites with small sample sizes, and therefore, we cannot assume that no heritable contribution exists for these sites. Furthermore, for sites with small sample sizes, results are likely to be underestimated. Because our analysis relies on near-complete ascertainment for cancer data and outcomes are collected on a population basis, we assume that the results of our study do not suffer from the same sources of bias that are generally present in other risk assessment designs, such as a patient’s ability to recall information, or knowledge of his or her relatives’ health status.

This study used a uniform, consistent source for all diagnoses, and is not limited by bias introduced by study designs involving selected ascertainment of cases or requiring recall for diagnoses. The most significant limitation of this analysis is the very narrow window of view provided to identify individuals diagnosed with cancer (in Utah from 1966 to present). This might limit our ability to identify cases who might be related across different generations (e.g., grandparent/grandchild or avunculars). Although cancer cases may have been censored from our observation in this resource for these reasons, cases are similarly censored for our control analyses, leading to conservative but unbiased estimates of familiality.

We also acknowledge that the RR estimates reflect cancer rates within the UPDB and our results may not specifically generalize to wider populations. Regardless, we anticipate that the reported risk estimates may be of value in a clinical setting in which cancer family history data are available, and that the results will be valuable in guiding future efforts to identify genetic factors contributing to cancers. The results suggest that there is still much to be learned by the study of high-risk cancer pedigrees.

Disclosure

The authors declare no conflict of interest.