Main

Large genome-wide association studies have identified several common genetic variants associated with complex diseases. To date, more than 60 common breast cancer (BC) susceptibility alleles have been identified (Cox et al, 2007; Easton et al, 2007; Stacey et al, 2007, 2010; Ahmed et al, 2009; Thomas et al, 2009; Antoniou et al, 2010; Turnbull et al, 2010; Fletcher et al, 2011; Milne et al, 2011; Ghoussaini et al, 2012; Hein et al, 2012; Michailidou et al, 2013). At the time of the present analysis, 24 common alleles were known to be involved in BC susceptibility. However, recent studies based on genotyping of the iCOGS custom array have since identified 47 additional common BC susceptibility alleles (Couch et al, 2013; Garcia-Closas et al, 2013; Gaudet et al, 2013; Michailidou et al, 2013).

Genome-wide association studies have usually used samples of unrelated cases and unrelated controls to evaluate evidence of associations and obtain relative risk (RR) estimates. Family-based data, where several family members are genotyped, could be an additional resource for assessing such associations and for characterising the risks conferred by genetic susceptibility variants, yet they are underutilised (Galvan et al, 2010). This approach is appealing because common alleles conferring increased disease risk are expected to cluster in families with a family history (FH) of the disease. Furthermore, with pedigree data it is possible to estimate parent-of-origin specific risks, that is, risks that depend on whether a risk allele was inherited from the father or the mother, which is not possible under a population-based study design. Standard case–control analysis methods are not optimal for estimating the risks conferred by single-nucleotide polymorphisms (SNPs) in situations where families are ascertained on the basis of multiple disease cases. Analysing pedigree data using standard analytical methods (e.g., logistic regression) could lead to biased association estimates because these methods do not account for correlations in genotypes between related individuals. In addition, they do not adjust for the fact that families may be ascertained on the basis of multiple affected family members and that SNPs (or other genetic factors) are expected to be correlated with FH of the disease. The retrospective likelihood (RL) approach has been shown to adjust for ascertainment bias when ascertainment of individuals or families is non-random with respect to disease phenotype (Carayol and Bonaïti-Pellié, 2004). This approach involves modelling the likelihood of the observed family genotypes conditional on family disease phenotypes. We developed pedigree RL methods for assessing associations with genetic variants and estimating the associated risks in the context of genetic susceptibility to BC. This approach takes the form of a modified segregation analysis that explicitly accounts for correlations in genotypes between related individuals while adjusting for ascertainment.

At the time of analysis, 24 SNPs had been shown to be associated with BC risk, primarily through large population-based case–control studies (Supplementary Table 1) (Cox et al, 2007; Easton et al, 2007; Stacey et al, 2007, 2010; Ahmed et al, 2009; Thomas et al, 2009; Antoniou et al, 2010; Turnbull et al, 2010; Fletcher et al, 2011; Milne et al, 2011; Ghoussaini et al, 2012; Hein et al, 2012). We applied the pedigree RL approach to estimate SNP associations with BC risk using data from 736 families recruited on the basis of strong FH of BC and a set of unrelated unaffected controls. Our results were contrasted to those obtained from standard analytical methods such as logistic regression.

There has been criticism of the assumption in association studies that maternally and paternally inherited alleles are functionally equivalent (Guilmatre and Sharp, 2012). Three mechanisms to describe parent-of-origin effects (POEs) have been suggested: (i) the influence of the maternal intrauterine environment on fetal development; (ii) expression of genetic variation from the maternally inherited mitochondrial genome; and (iii) epigenetic regulation of gene expression, for example, genomic imprinting (suppression of gene expression that has been passed from one parent’s germline) (Falls et al, 1999; Haghighi and Hodge, 2002; Rampersaud et al, 2008). Classic examples of imprinting are Prader–Willi and Angelman syndromes, which can occur when the same region on chromosome 15 is either maternally or paternally imprinted, respectively (Falls et al, 1999). A previous study found that one of the BC susceptibility variants that we analysed, SNP rs3817198 in the 11p15 region (LSP1 gene), displayed POE with BC risk (Kong et al, 2009). In that POE analysis, the paternally inherited allele showed a significant association (OR=1.17, 95% CI: 1.05–1.30, P=0.0038), whereas the maternally inherited allele did not (OR=0.91, 95% CI: 0.81–1.02, P=0.11). These observations are consistent with reports that the 11p15 region hosts a cluster of imprinted genes, some of which may be related to BC risk (Berteaux et al, 2008). The results presented by Kong et al (2009) indicate a paternal effect of this locus on BC risk. These findings have not yet been replicated. We extended our pedigree RL framework to examine POE by estimating RRs separately for maternally and paternally inherited risk alleles. This is not possible under a standard case–control analytical design. We evaluated these associations for all BC susceptibility alleles investigated.

We further used the available genotype data to compute a combined observed genotype risk score to investigate whether this risk score can discriminate between women with FH of BC and unaffected women.

Materials and methods

Study sample

The Kathleen Cuningham Foundation Consortium for Research into Familial Breast Cancer (kConFab) enrols families with multiple cases of breast and/or ovarian cancer from Australia and New Zealand (Kathleen Cuningham Foundation Consortium for research into Familial Breast Cancer (kConFab), 2012). To date, kConFab has enrolled over 1400 families. The Australian Ovarian Cancer Study (AOCS) has recruited over 1800 ovarian cancer cases and 1000 population-based controls (Australian Ovarian Cancer Study (AOCS), 2012).

Our analyses considered data from 798 kConFab families. Eligibility was restricted to families with at least one family member genotyped for the SNPs of interest. Families were systematically screened for and excluded if found to contain a mutation in BRCA1, BRCA2 or ATM. We excluded families if at least one family member was found to have a mutation in any of the CHEK2, TP53, PTEN, RAD51C, MLH1 or MSH2 genes, but screening of these genes was less systematic. In total, 736 families were eligible for analysis. A total of 897 unaffected population-based controls from AOCS were also included.

Mendelian inconsistencies in genotype transmission from parents to offspring were tested using PedCheck (O’Connell and Weeks, 1998). Detected Mendelian inconsistencies were rectified by first clarifying family relationships. Where this was not possible, we set inconsistent genotypes to missing such that as little genetic data as possible were lost and Mendelian consistency held throughout the remainder of the pedigree.
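The kind of check described above can be illustrated for a single parent–offspring trio. The following is a minimal sketch only, assuming a biallelic SNP coded as risk-allele counts; the function name `mendelian_consistent` is hypothetical, and the sketch does not reproduce the PedCheck algorithm, which works across whole pedigrees.

```python
from itertools import product

def mendelian_consistent(child, mother, father):
    """Return True if a child's biallelic SNP genotype can arise from the parents.

    Genotypes are risk-allele counts (0, 1 or 2); None marks a missing genotype.
    Illustrative trio check only (hypothetical helper, not PedCheck).
    """
    if child is None:
        return True  # nothing to contradict

    def gametes(parent):
        # Alleles a parent can transmit: 0 or 1 copy of the risk allele.
        if parent is None:
            return {0, 1}  # untyped parent: either allele is possible
        return {0: {0}, 1: {0, 1}, 2: {1}}[parent]

    return any(m + f == child
               for m, f in product(gametes(mother), gametes(father)))

# A child homozygous for the risk allele cannot have a parent with no copies.
print(mendelian_consistent(2, 0, 1))
print(mendelian_consistent(1, 0, 2))
```

Under this sketch, a failing trio would be a candidate for the relationship checks or the set-to-missing step described in the text.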

Genotyping

SNPs were genotyped using MALDI-TOF spectrophotometric mass determination of allele-specific primer extension products with the Sequenom MassARRAY platform (Sequenom, Inc., San Diego, CA, USA) and iPLEX Gold technology (Sequenom, Inc.). Primer design was carried out according to Sequenom guidelines using MassARRAY Assay Design software (version 3.0). Multiplex PCR amplification of fragments containing target SNPs was performed using Qiagen HotStart Taq Polymerase (QIAGEN, Hilden, Germany) and a PerkinElmer GeneAmp 2400 thermal cycler (PerkinElmer, Waltham, MA, USA) with 10 ng genomic DNA in 384-well plates. Shrimp Alkaline Phosphatase and allele-specific primer extension reactions were carried out according to the manufacturer’s instructions for iPLEX Gold chemistry. Assay data were analysed using Sequenom TYPER software (version 3.4). Cluster plots were visually inspected and standard quality-control measures were checked against thresholds of 0.01 for the Hardy–Weinberg equilibrium P-value, 95% for plate call rate and 98% for duplicate concordance rate (of 5% duplicated samples).
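As an illustration of the Hardy–Weinberg quality-control check, the sketch below computes a one-degree-of-freedom Pearson chi-square P-value from genotype counts. The choice of the chi-square test and the function name `hwe_chi2_pvalue` are assumptions for illustration; the text does not state which Hardy–Weinberg test was applied.

```python
import math

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-d.f. Pearson chi-square test of Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb are observed genotype counts for a biallelic SNP.
    Hypothetical helper; an exact test may be preferable for rare alleles.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)        # frequency of allele a
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))

# Counts in exact Hardy-Weinberg proportions pass; a heterozygote deficit fails.
print(hwe_chi2_pvalue(250, 500, 250))
print(hwe_chi2_pvalue(400, 200, 400) < 0.01)
```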

Analytical framework

We assumed an underlying genetic model where BC susceptibility is explained by the genetic variant of interest and a residual polygenic component that represents the multiplicative effects of several loci, each of which makes a small contribution to disease risk. The disease incidence, $\lambda_i(t)$, was assumed to depend on the genetic effects through a model of the form:

$$\lambda_i(t) = \lambda_0(t)\exp(\beta g_i + P_i)$$

where $\lambda_0(t)$ is the baseline incidence, $\beta$ is the per-allele log RR, $g_i \in \{0,1,2\}$ is the SNP genotype for individual $i$ and $P_i$ is the polygenic component, assumed to be normally distributed:

$$P_i \sim N(0, \sigma_R^2)$$

where $\sigma_R^2$ is the residual polygenic variance. Because all families found to segregate BRCA1 or BRCA2 mutations, as well as some families with rarer mutations in other susceptibility genes, were excluded, this model is plausible for the families we analysed. We constrained the sum of the variance of the measured locus of interest, $\sigma_K^2$, and the residual polygenic variance, $\sigma_R^2$, such that they agree with external estimates of the total polygenic variance, $\sigma_P^2$ (Antoniou et al, 2002). Hence,

$$\sigma_K^2 + \sigma_R^2 = \sigma_P^2$$

This is in line with a multiplicative assumption between the measured locus and polygenic component. A previous segregation analysis estimated $\sigma_P = 1.29$ (Antoniou et al, 2002). Under the polygenic model, $\sigma_P^2$ determines the coefficient of variation in incidences (Risch, 1990) and the familial RR (FRR) to the monozygotic twin of an affected individual, $\lambda_M$, such that $\sigma_P^2 = \ln(\lambda_M)$. Under the assumed model, it has previously been shown that the variance of the locus of interest, $\sigma_K^2$, is given by $\sigma_K^2 = \ln(\lambda_{M,K})$, where $\lambda_{M,K}$ is the FRR to a monozygotic twin due to the locus on its own (Risch, 1990; Antoniou and Easton, 2003). Therefore, the known component of the polygenic variance was calculated as:

$$\sigma_K^2 = \ln\left(\frac{\sum_{g} \tau_g \exp(2\beta g)}{\left(\sum_{g} \tau_g \exp(\beta g)\right)^2}\right)$$

where $\tau_g$ is the frequency of genotype $g \in \{0,1,2\}$ calculated under the Hardy–Weinberg equilibrium assumption (Antoniou and Easton, 2003). The polygenic component was approximated by the hypergeometric polygenic model (Fernando et al, 1994; Lange, 1997; Antoniou et al, 2001).
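To make the variance decomposition concrete, the sketch below computes the variance attributable to a single measured locus as the log of the monozygotic-twin FRR due to that locus alone, then subtracts it from the total polygenic variance to obtain the residual. The function name `locus_variance` and the risk-allele frequency of 0.38 are hypothetical choices for illustration; only the total polygenic s.d. of 1.29 and the HR of 1.26 are taken from the text.

```python
import math

def locus_variance(beta, risk_allele_freq):
    """Variance on the log-risk scale attributable to one biallelic locus.

    Computed as ln(E[r^2] / E[r]^2) with r = exp(beta * g), i.e. the log of the
    familial RR to a monozygotic twin due to the locus on its own.
    """
    p = risk_allele_freq
    q = 1.0 - p
    tau = {0: q * q, 1: 2 * p * q, 2: p * p}  # Hardy-Weinberg genotype frequencies
    mean_r = sum(t * math.exp(beta * g) for g, t in tau.items())
    mean_r2 = sum(t * math.exp(2 * beta * g) for g, t in tau.items())
    return math.log(mean_r2 / mean_r ** 2)

SIGMA_P = 1.29                 # total polygenic s.d. quoted in the text
beta = math.log(1.26)          # per-allele HR reported for rs2981582 (polygenic model)
freq = 0.38                    # ASSUMED risk-allele frequency, not given in the text

var_known = locus_variance(beta, freq)
var_residual = SIGMA_P ** 2 - var_known   # residual polygenic variance
share = var_known / SIGMA_P ** 2          # fraction of total polygenic variance

print(f"known={var_known:.4f} residual={var_residual:.4f} share={share:.2%}")
```

Because the locus variance is small relative to the total, the residual polygenic variance changes little from SNP to SNP under this constraint.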

We assumed a censoring process such that an individual was followed from birth until the age at first BC diagnosis, age of death, age at last observation or at 80 years of age, whichever occurred first. Individuals censored at 80 years of age were censored as unaffected at this time point. We assumed men were not at risk of developing BC. In the instance of no available censoring age, we censored at 0 years.
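The censoring rule can be sketched as a small function. This is an illustrative reading of the rule above, not the authors' code; the function name `censoring` and its argument names are hypothetical, and treating a diagnosis at exactly 80 years as unaffected is an assumption. Men would simply be assigned zero risk elsewhere in the model.

```python
def censoring(age_bc=None, age_death=None, age_last_obs=None, max_age=80):
    """Return (censoring_age, affected) for one woman.

    Follow-up ends at the earliest of first BC diagnosis, death, last
    observation or max_age. Hypothetical helper for illustration.
    """
    events = [(age, kind) for age, kind in
              ((age_bc, "bc"), (age_death, "death"), (age_last_obs, "last_obs"))
              if age is not None]
    if not events:
        return 0, False                 # no age information: censor at 0 years
    age, kind = min(events, key=lambda e: e[0])
    if age >= max_age:
        return max_age, False           # reached 80: censored as unaffected
    return age, kind == "bc"

print(censoring(age_bc=52, age_last_obs=70))
print(censoring(age_bc=85, age_death=90))
print(censoring())
```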

The BC incidences were constrained over all genetic effects (Antoniou et al, 2001) to agree with the Australian female BC incidences for the 1993–1997 calendar period (International Agency for Research on Cancer (IARC), 2010).

Retrospective likelihood segregation models

Because families were ascertained on the basis of multiple affected family members, we modelled the RL of observing family genotypes conditional on family disease phenotypes. The likelihood was parameterised in terms of the allele frequency and per-allele log RRs (β). To obtain parameter estimates, we maximised the likelihood over the allele frequency and log RR. We also fitted models where no residual polygenic effect was assumed in order to investigate the effect on parameter estimates when no assumptions were made about the residual familial clustering of BC.
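The idea of conditioning genotypes on phenotypes can be shown in the simplest possible case, a single unrelated individual with no polygenic component. The sketch below is a deliberately reduced illustration and not the pedigree likelihood used in the analysis, which sums over the joint genotypes of all family members; all function names are hypothetical, and the baseline cumulative hazard is supplied as an assumed number.

```python
import math

def genotype_freqs(p):
    """Hardy-Weinberg frequencies for 0, 1 and 2 copies of the risk allele."""
    q = 1.0 - p
    return [q * q, 2 * p * q, p * p]

def phenotype_prob(g, affected, cum_hazard, beta):
    """P(phenotype | genotype), up to a factor that does not depend on g.

    cum_hazard is the baseline cumulative incidence to the censoring age.
    The baseline incidence at diagnosis cancels in the conditional likelihood.
    """
    rr = math.exp(beta * g)
    survival = math.exp(-cum_hazard * rr)
    return rr * survival if affected else survival

def retro_prob(g, affected, cum_hazard, beta, p):
    """P(genotype g | phenotype) for one unrelated individual (Bayes' rule)."""
    tau = genotype_freqs(p)
    joint = [tau[k] * phenotype_prob(k, affected, cum_hazard, beta)
             for k in range(3)]
    return joint[g] / sum(joint)

def retro_loglik(individuals, beta, p):
    """Sum of log P(genotype | phenotype) over (g, affected, cum_hazard) tuples."""
    return sum(math.log(retro_prob(g, aff, h, beta, p))
               for g, aff, h in individuals)

# With beta > 0, affected individuals are enriched for risk-allele homozygotes.
print(retro_prob(2, True, 0.05, 0.2, 0.3) > genotype_freqs(0.3)[2])
```

Maximising `retro_loglik` over `beta` and `p` would give estimates for this toy setting; in a pedigree the same conditioning is applied to the joint genotype vector of the family.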

Parent-of-origin effects

The pedigree RL framework was extended to account for POE. Here we simultaneously model the risks associated with a maternally inherited allele and a paternally inherited allele. We denote the maternal log RR as $\beta_m$ and the paternal log RR as $\beta_p$. We denote by $g_i^m$ an indicator variable taking the value 1 if individual $i$ carries a maternally inherited risk allele and 0 otherwise, and by $g_i^p$ the corresponding indicator for a paternally inherited risk allele. Under this model, the disease incidence had the form:

$$\lambda_i(t) = \lambda_0(t)\exp(\beta_m g_i^m + \beta_p g_i^p)$$

We jointly maximised the likelihood over allele frequencies and both the maternal and paternal log RRs to obtain estimates for these parameters.

We evaluated evidence for POE by testing for differences between the maternal log RR and paternal log RR using a likelihood ratio test. For this purpose, the likelihood obtained from the POE model was compared with the likelihood under a single gene model that estimated a single per-allele HR assuming the same effect for maternally and paternally inherited risk alleles.
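The likelihood ratio test described here compares a model with two log RRs against a nested model with one, so the test statistic can be referred to a chi-square distribution with one degree of freedom. A minimal sketch follows; the function name `lrt_pvalue_1df` and the log-likelihood values are hypothetical.

```python
import math

def lrt_pvalue_1df(loglik_full, loglik_reduced):
    """Likelihood ratio statistic and P-value for one extra parameter.

    loglik_full: maximised log-likelihood of the POE model (two log RRs).
    loglik_reduced: maximised log-likelihood of the single-effect model.
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    stat = max(stat, 0.0)   # guard against small negative values from optimiser noise
    # Survival function of a chi-square variable with 1 d.f.
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Hypothetical log-likelihoods, chosen only to demonstrate the call.
stat, p_value = lrt_pvalue_1df(-1000.0, -1001.5)
print(f"LRT statistic={stat:.2f}, P={p_value:.3f}")
```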

As the primary aim of the POE analysis was to test for equality in the paternal and maternal log RRs, the polygenic component was omitted. This was in order to reduce the computational complexity.

Logistic regression analyses

Standard logistic regression analyses were performed for comparison purposes. To account for relatedness within families, we estimated robust s.e. (Huber, 1967; White, 1980, 1982). Two types of analyses were undertaken: (i) unaffected AOCS controls vs all affected kConFab female family members and (ii) unaffected AOCS controls vs one selected affected kConFab female per family (usually the female family member that led to family ascertainment).

Assessing discrimination based on SNP profiles

To evaluate the ability of SNP profiles to discriminate between unaffected women and affected women with FH of BC, we computed an observed risk score (ORS) for each individual. The score, $S_i$, for individual $i$ based on the combined effects of all SNPs was given by:

$$S_i = \sum_{j=1}^{S} \hat{\beta}_j g_{ji}$$

where $S$ is the number of SNPs, $\hat{\beta}_j$ is the published population-based estimate of the per-allele log OR for SNP $j$ (Supplementary Table 1) and $g_{ji} \in \{0,1,2\}$ is the observed genotype for individual $i$ at SNP $j$. The ORS was calculated for a single affected female family member who had been genotyped for all SNPs and all controls. The discriminatory ability of the ORS was evaluated using receiver operating characteristic (ROC) curves by calculating the area under the curve (AUC).
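The risk score and its discriminatory ability can be sketched as follows. The score is a weighted sum of risk-allele counts, and the AUC is computed here through its equivalence with the Mann–Whitney statistic; this is an illustrative alternative to the Stata routines used in the analysis. The function names, log ORs and genotypes below are hypothetical.

```python
def observed_risk_score(genotypes, log_ors):
    """Weighted sum of risk-allele counts, weights being per-allele log ORs."""
    return sum(b * g for b, g in zip(log_ors, genotypes))

def auc(case_scores, control_scores):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Equals the probability that a random case scores higher than a random
    control, with ties counted as one half.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical per-allele log ORs for three SNPs and toy genotype vectors.
log_ors = [0.2, 0.1, 0.3]
cases = [observed_risk_score(g, log_ors) for g in ([2, 1, 1], [1, 2, 2], [1, 0, 1])]
controls = [observed_risk_score(g, log_ors) for g in ([0, 1, 0], [1, 1, 1], [0, 0, 1])]
print(auc(cases, controls))
```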

Statistical software

Logistic regression and ROC analyses were performed using Stata version 11.1 (StataCorp LP, 2009). The segregation and POE models were implemented using pedigree analysis software MENDEL (Lange et al, 1988).

Results

Study population

After quality-control checks, 736 kConFab families with at least one genotyped individual, comprising 45 822 individuals, and 897 unrelated unaffected controls from AOCS were eligible for analyses. Sample characteristics are summarised in Table 1. In brief, 6907 individuals were genotyped for at least one SNP. Of these, 1673 (24.2%) were male and 5234 (75.8%) were female. In total, 1590 (30.4%) affected females and 3644 (69.6%) unaffected females were genotyped. The average number of individuals genotyped in these families was eight.

Table 1 Summary of the kConFab and AOCS study populations

Single SNP association results using logistic regression and segregation analyses

Tables 2 and 3 display logistic regression and segregation analysis results. Figure 1 shows a comparison of log RR estimates under different analytical models.

Table 2 Logistic regression analysis results
Table 3 Single gene and polygenic segregation analysis results
Figure 1

Scatter plots of log RR estimates from published population-based studies (Supplementary Table 1) (all x-axes) vs: (A) logistic regression estimates comparing AOCS controls against all familial cases (Table 2); (B) logistic regression estimates comparing AOCS controls against one selected female case per family (Table 2); (C) single gene segregation model estimates (Table 3); and (D) polygenic segregation model estimates (Table 3). The dashed line is y=x, the line of equality. ICC=intraclass correlation coefficient.

Single gene models

Fourteen SNPs were significantly associated with BC risk at the 5% significance level when data were analysed under a single gene model that does not allow for residual polygenic effects. The most significant association was FGFR2 SNP rs2981582 (HR=1.20, 95% CI: 1.13–1.27, P=6.75 × 10^−10).

Incorporating residual polygenic effects

Thirteen SNPs were significantly associated with BC risk (5% significance level) when data were analysed under the model allowing for residual familial clustering in terms of a polygenic component. All these SNPs were significantly associated when the data were analysed under the single gene model. C6orf97 SNP rs12662670 was the only SNP significantly associated under the single gene model that was not associated with risk under the model that incorporates polygenic background (single gene P=3.64 × 10^−4; polygenic P=0.086). Overall, P-values of association were similar under both pedigree analysis models (Figure 2). As with the single gene model, FGFR2 SNP rs2981582 provided the strongest association with BC risk (HR=1.26, 95% CI: 1.17–1.36, P=9.04 × 10^−10). For SNPs providing evidence of association (P<0.05), the effect size estimates were somewhat larger under the model allowing for polygenic background but the strength of association was generally similar. The estimated HRs under the polygenic model were closer to OR estimates obtained from population-based studies than the estimates under the model that did not allow for polygenic background (Figure 1).

Figure 2

Scatter plot of −log10 P-values from: (i) the polygenic segregation model (Table 3); (ii) the single gene segregation model (Table 3); (iii) logistic regression A: logistic regression estimates comparing AOCS controls against all familial cases (Table 2); and (iv) logistic regression B: logistic regression estimates comparing AOCS controls against one selected female case per family (Table 2). The dashed line represents a P-value of 0.05, the nominal significance level. SNPs are ordered by the P-values of the polygenic segregation analysis model. The segregation models generally yielded smaller P-values, indicating that these models have greater power to detect associations. 19p13 SNPs rs2363956 and rs8170 are not displayed as they are associated with ER-negative BC.

SNPs that were significantly associated with risk accounted for between 0.20 and 1.62% of the total polygenic variance, but most SNPs accounted for <1%. Only two SNPs, rs2981582 in FGFR2 and rs13387042 at 2q35, accounted for >1% of the total polygenic variance.

A comparison of estimates of association from the segregation analyses with those obtained from the naive standard case–control analyses revealed that logistic regression typically overestimated associations. For almost all SNPs, the absolute value of the estimated log OR from the logistic regression comparing AOCS controls against all female cases exceeded that obtained under the segregation models. Moreover, the logistic regression OR estimates lay outside the CIs of the population-based OR estimates more often than the segregation analysis estimates did (Supplementary Figure 1).

Parent-of-origin effects

The POE segregation analyses were performed assuming no residual polygenic background. This is a reasonable assumption as the primary aim was to test for differences in paternal and maternal HRs. Moreover, the pedigree analysis becomes complex because of the implementation of the hypergeometric approximation to the polygenic model. Results for POE analyses are given in Table 4.

Table 4 Segregation analysis results allowing for parent-of-origin effects

Two SNPs showed significant associations with the paternally inherited allele only. Five SNPs yielded significant associations with the maternally inherited allele only. The HR estimate for the paternally inherited allele of SNP rs3817198 in LSP1 was 1.12 (95% CI: 0.99–1.27, P=0.081). Under a one-sided test of the hypothesis HR >1, the P-value was 0.04.

One SNP, rs13387042 at 2q35, showed statistically significant associations for both a paternally inherited (HR=1.20, 95% CI: 1.04–1.37, P=0.0096) and maternally inherited risk allele (HR=1.16, 95% CI: 1.03–1.31, P=0.014). No SNP exhibited significant differences between HR estimates for the maternally and paternally inherited allele (P-value range: 0.07–0.95).

Risk score comparisons

Two SNPs at 19p13 (rs2363956 and rs8170) were excluded when constructing risk scores as they are primarily associated with ER-negative BC risk (Antoniou et al, 2010). The mean (s.d.) ORS was 2.47 (0.40) in 1147 individuals (715 unaffected and 432 affected) genotyped for all 22 SNPs. There was a significant difference in the mean ORS between unaffected (mean ORS (s.d.)=2.40 (0.39)) and affected (2.60 (0.39)) women (P=6.38 × 10^−17). The estimated AUC was 0.642 (95% CI: 0.610–0.675) (Figure 3).

Figure 3

(A) Density plots of the ORS based on 22 SNPs for women with FH of BC (n=432) and controls (n=715). (B) ROC curve for the ability of the ORS based on 22 SNPs to discriminate between cases with FH and controls. The x-axis is 1-specificity (false-positive rate) and the y-axis is the sensitivity (true-positive rate). The dashed line represents an AUC of 0.50, indicating prediction no better than chance alone.

As expected, the distribution of the ORS for unaffected women from the kConFab families, that is women with FH of BC, lies between the risk distributions of the population-based controls and affected women (Supplementary Figure 2).

Discussion

In this article, we developed an analytical framework to estimate associations between SNPs and BC risk within a pedigree setting. This approach provides an efficient method for investigating associations of polymorphisms with disease risk. We extended these methods to estimate parent-of-origin associations by separately estimating HRs for maternally and paternally inherited risk alleles. This is the first time POE have been evaluated for most of the common genetic variants found to be associated with BC risk. Although we demonstrate these methods in the context of evaluating associations with BC risk, the principles are applicable not only to other cancers but also to other complex diseases that exhibit familial aggregation.

We applied these methods to family data from kConFab, a family-based study in which families were recruited through multiple relatives diagnosed with breast and/or ovarian cancer. Analysing such associations using standard analytical methods could yield biased association estimates because families are ascertained non-randomly with respect to disease phenotype and because genetic variants are likely to be correlated with FH of disease. Analysing data within a pedigree RL framework accounts for relatedness and adjusts for ascertainment bias.

Our results demonstrate that standard logistic regression analyses applied in this context generally overestimate the magnitude of disease associations when compared with estimates published by large collaborative studies, and the logistic regression estimates more often lay outside the published CIs. By contrast, estimates from the modified segregation analysis were generally very close to, and within the CIs of, the estimates reported by the population-based studies (Cox et al, 2007; Easton et al, 2007; Stacey et al, 2007, 2010; Ahmed et al, 2009; Thomas et al, 2009; Antoniou et al, 2010; Turnbull et al, 2010; Fletcher et al, 2011; Milne et al, 2011; Ghoussaini et al, 2012; Hein et al, 2012).

In addition, the segregation models generally yielded smaller P-values for association than those obtained through the logistic regression analysis. This suggests that this approach has greater power to detect associations than standard case–control analysis that ignores pedigree structure. Likely explanations are that pedigree analysis methods model exact genetic correlations between relatives and that additional information is extracted from the phenotypes of family members who had not been genotyped. Additional gains in power would be expected from the use of pedigree-based methods in settings where a clear ascertainment process exists, which would involve conditioning on the phenotypes of all family members. Therefore, a family-based approach is a useful and efficient method to investigate the contribution of genetic variants to disease risk.

Our models used external data on population BC incidences and on the magnitude of the assumed polygenic variance in the polygenic model. Sensitivity analyses in which the assumed population incidences were misspecified to be half or double the true population incidences revealed small deviations in the RR estimates (relative bias <3%). Similarly, varying the assumed polygenic variance to be up to 80% of the assumed polygenic variance in our models had a negligible effect on the RR estimates (relative bias <1%). This suggests that the estimates obtained under the methods presented are robust against misspecifications in the external model parameters.

Alternative association methods using pedigree data have been suggested. A case-only pedigree RL approach has been suggested and applied to the analysis of associations with prostate cancer risk (Schaid et al, 2010). However, this differs from our approach in that it does not consider genotype data from unaffected family members. Our approach allows for estimation of allele frequencies and RR parameters simultaneously, whereas Schaid et al used external allele frequency estimates. Unlike Schaid et al, our analyses incorporated the genetic information from all family members, therefore providing more information in the estimation process. The genetic model employed by Schaid et al was similar to our model in allowing for residual correlations between family members using a random baseline risk parameter. Schaid et al found that RRs estimated under the pedigree RL were consistent with ORs estimated by large case–control studies, agreeing with our findings.

After accounting for ascertainment and the residual polygenic variance, the RR estimates for the known common BC susceptibility alleles were similar to those obtained from population-based case–control studies (Cox et al, 2007; Easton et al, 2007; Stacey et al, 2007, 2010; Ahmed et al, 2009; Thomas et al, 2009; Antoniou et al, 2010; Turnbull et al, 2010; Fletcher et al, 2011; Milne et al, 2011; Ghoussaini et al, 2012; Hein et al, 2012). This observation suggests that the polygenic model of inheritance provides a good fit to the observed familial aggregation of BC. First, it implies that the residual genetic susceptibility to BC is unlikely to be due to genes conferring large contributions to the familial risk of the disease of magnitude similar to that of BRCA1 or BRCA2 mutations. Instead, the residual genetic variability is likely to be due to genetic effects that have small contributions to the BC familial risk, that is, either common alleles conferring low risks or rare variants conferring moderate risks. Second, our findings suggest a general model of genetic susceptibility where the joint effects of the common alleles examined in the present study and other, as yet unidentified, BC susceptibility variants are multiplicative. Therefore, we can infer that interactions between the studied common alleles and other residual genetic effects are unlikely.

The pedigree RL was adapted to estimate parent-specific genetic effects for each common allele. This was achieved by separately estimating the risk for a maternally and paternally inherited risk allele. Although other methods have been suggested for evaluating POE, those involve direct genotyping of parents and offspring, and they may not make full use of multigenerational pedigree data or do not adjust adequately for ascertainment (Haghighi and Hodge, 2002; Belonogova et al, 2009; Kong et al, 2009; Feng et al, 2011; He et al, 2011; Li et al, 2011).

Our analyses suggested no significant differences between estimated HRs for maternally and paternally inherited alleles for any of the 24 SNPs. The LSP1 SNP rs3817198 had previously been shown to display POE with BC risk where the paternally inherited allele was associated with increased BC risk (OR=1.17, 95% CI: 1.05–1.30, P=0.0038) (Kong et al, 2009). They also found a decreased BC risk if the risk allele was maternally inherited, but this was not significant (OR=0.91, 95% CI: 0.81–1.02, P=0.11). The magnitude and direction of our estimates for this SNP are comparable to those reported by Kong et al (paternal HR=1.12, 95% CI: 0.99–1.27, P=0.081; maternal HR=0.94, 95% CI: 0.84–1.06, P=0.33). Our analyses did not detect a significant difference between the maternal and paternal effect (P=0.11). This is possibly because of the much greater sample size employed by Kong et al: 34 909 controls and 1803 BC cases, all of whom were genotyped or had imputed genotype data available. Our analyses included 5251 unaffected individuals and 1463 BC cases. It is worth noting that the paternal HR for LSP1 SNP rs3817198 was significant under a one-sided test for the hypothesis that the paternal HR >1 (P=0.04). We meta-analysed our LSP1 SNP RR estimates with those reported by Kong et al (Supplementary Table 2). The meta-analysis yielded a maternal RR=0.93 (95% CI: 0.85–1.01, P=0.066) and a paternal RR=1.15 (95% CI: 1.06–1.24, P=7.8 × 10^−4). These analyses suggest no association with the maternally inherited C allele but provide stronger evidence of association with the paternally inherited C allele. Although no significant differences were observed between the estimates for the paternally and maternally inherited alleles at other loci, we observed associations for several SNPs with either the maternally or paternally inherited alleles. The current approach for evaluating POE could, potentially, be useful in fine-mapping efforts at these loci to determine causal variants.
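The pooling of the two sets of LSP1 estimates can be illustrated with a fixed-effect, inverse-variance meta-analysis on the log scale. This is an assumption made for illustration, as the meta-analysis method is not stated in the text, and the function names are hypothetical. Standard errors are recovered from the rounded 95% CIs quoted above, so the output only approximates the published pooled values.

```python
import math

Z975 = 1.959964  # 97.5th percentile of the standard normal distribution

def log_estimate(rr, lo, hi):
    """Log RR and its s.e., recovered from a point estimate and 95% CI."""
    return math.log(rr), (math.log(hi) - math.log(lo)) / (2 * Z975)

def fixed_effect_meta(studies):
    """Inverse-variance pooled RR, 95% CI limits and two-sided P-value.

    studies is a list of (log_rr, se) pairs.
    """
    weights = [1.0 / se ** 2 for _, se in studies]
    pooled = sum(w * b for w, (b, _) in zip(weights, studies)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    p = math.erfc(abs(pooled / se) / math.sqrt(2.0))
    return (math.exp(pooled), math.exp(pooled - Z975 * se),
            math.exp(pooled + Z975 * se), p)

paternal = fixed_effect_meta([log_estimate(1.17, 1.05, 1.30),   # Kong et al (2009)
                              log_estimate(1.12, 0.99, 1.27)])  # present study
maternal = fixed_effect_meta([log_estimate(0.91, 0.81, 1.02),   # Kong et al (2009)
                              log_estimate(0.94, 0.84, 1.06)])  # present study

for label, (rr, lo, hi, p) in (("paternal", paternal), ("maternal", maternal)):
    print(f"{label}: RR={rr:.2f} (95% CI: {lo:.2f}-{hi:.2f}), P={p:.2g}")
```

Small discrepancies from the published pooled estimates are expected because the inputs are rounded to two decimal places.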

Recent studies have estimated the ROC AUC to investigate the effect of SNPs on discriminating between affected and unaffected women. Wacholder et al (2010) used a modified Gail model to demonstrate an increase in AUC from 0.580 to 0.618 when the effects of the (at the time) 10 known genetic variants associated with BC risk were incorporated into the model. Sawyer et al (2012) have described the largest AUC (0.654, 95% CI: 0.628–0.680) based purely on genetic factors. Their analyses included 22 genetic variants in women with FH of BC in the absence of a known BRCA1 or BRCA2 mutation. We describe a similar AUC when considering the ORS as the sole risk predictor for individuals genotyped for all 22 SNPs. This is consistent with the fact that women with FH of BC are expected to have a higher polygenic load due to familial aggregation of the disease. This suggests that a high polygenic score in combination with a FH of the disease could jointly provide a way to identify those who may be at higher risk of developing the disease, rather than SNPs alone.

In summary, we have presented a novel analytical framework for evaluating associations between common genetic variants and disease risk that harnesses the power and efficiency of family data. Although the methods have been presented in the context of BC susceptibility, the general principles are applicable to other cancers and other complex diseases that have a heritable component. We applied these techniques to data on common susceptibility alleles, although, in principle, the methods could be applied to analyse rare variants conferring moderate cancer risks. We have further demonstrated that combined SNP profiles discriminate BC-affected status more effectively in individuals with FH of the disease than in the general population, taking us closer to the goal of incorporating SNP profiling into clinical practice.