Letter

Common variants on chromosomes 2q35 and 16q12 confer susceptibility to estrogen receptor–positive breast cancer

doi:10.1038/ng2064

Genome-wide association studies have identified several new loci that increase our understanding of the genetics underlying breast cancer risk.

Familial clustering studies indicate that breast cancer risk has a substantial genetic component (refs. 1–3). To identify new breast cancer risk variants, we genotyped approximately 300,000 SNPs in 1,600 Icelandic individuals with breast cancer and 11,563 controls using the Illumina Hap300 platform. We then tested selected SNPs in five replication sample sets. Overall, we studied 4,554 affected individuals and 17,577 controls. Two SNPs consistently associated with breast cancer: approximately 25% of individuals of European descent are homozygous for allele A of rs13387042 on chromosome 2q35 and have an estimated 1.44-fold greater risk than noncarriers, and for allele T of rs3803662 on 16q12, about 7% are homozygous and have a 1.64-fold greater risk. Risk from both alleles was confined to estrogen receptor–positive tumors. At present, no genes have been identified in the linkage disequilibrium block containing rs13387042. rs3803662 is near the 5' end of TNRC9, which encodes a high mobility group chromatin–associated protein whose expression is implicated in breast cancer metastasis to bone (ref. 4).

Mutations in breast cancer susceptibility genes BRCA1 and BRCA2 account for 15%–25% of the familial component of breast cancer risk (refs. 5, 6). Much of the genetic component of risk of breast cancer remains uncharacterized and is thought to arise from combinations of less penetrant variants that, individually, may be quite common (ref. 7). Many searches for less penetrant breast cancer risk variants have been carried out using a candidate gene, case-control association approach. Findings from these studies have often proven difficult to replicate (ref. 8). Recently, common missense variants in two genes, CASP8 and TGFB1, have been shown to be associated with breast cancer risk through well-powered, multicenter analyses (ref. 9). These reports emphasize the importance of large-scale studies with adequate replication when the goal is to identify common variants conferring modest increases in the risk of breast cancer.

To search widely for alleles of common SNPs associating with breast cancer susceptibility, we carried out a genome-wide SNP association study using Illumina HumanHap300 microarray technology. We genotyped 1,600 Icelandic individuals with breast cancer and 11,563 controls; we designated this discovery sample set 'Iceland 1'. After removing SNPs that failed quality control checks, we tested 311,524 SNPs for association with breast cancer. We adjusted the results for relatedness among individuals and for potential population stratification using the method of genomic control (ref. 10; see Methods). A Q-Q plot showing the χ² statistics before and after adjustment is shown in Supplementary Fig. 1 online. We ranked signals by P value and selected SNPs representing the best ten loci for a fast-track follow-up replication strategy. One of the SNPs was in strong linkage disequilibrium (LD) with the known Icelandic 999del5 mutation in the BRCA2 gene and is not discussed further here. We genotyped the remaining SNPs in an independent sample of Icelandic individuals with breast cancer and controls ('Iceland 2') and in two or three independent European case-control sets from Sweden, Spain and The Netherlands. We then tested SNPs that returned nominally significant signals in each replication set in a fifth case-control set comprising European Americans from the US Multiethnic Cohort (see Supplementary Methods online for a description of the case-control sample sets).

Two of the SNPs showed significant signals in all five of these replication sets: the A allele of rs13387042 located on chromosome 2q35 (A-rs13387042) and the T allele of rs3803662 on 16q12 (T-rs3803662) (Tables 1 and 2). In a combined analysis of Iceland 1 and Iceland 2, the variants showed odds ratios (ORs) of approximately 1.2 and adjusted P values of 1.0 × 10⁻⁷ and 6.8 × 10⁻⁸, respectively. Owing to the 'winner's curse' (refs. 11, 12) arising from the selection of the best-performing SNPs from the massive number examined, the effects in Iceland 1 may have been slightly inflated. However, for both rs13387042 and rs3803662, the point estimates of the ORs for Iceland 2 were nominally (although not significantly) greater than in Iceland 1. This suggests that the initial observations in Iceland 1 were not boosted excessively through selection bias. The remaining seven SNPs that were tested in the European samples did not replicate consistently outside Iceland (Supplementary Table 1 online). The SNP associations that did not replicate consistently may have resulted from statistical fluctuations, or they may, like the BRCA2-associated SNP, indicate the presence of underlying mutations that are detectable only in the sample from the Icelandic population. The CASP8 D302H SNP rs1045485, which recently has been shown to confer protection against breast cancer (ref. 9), has an equivalent SNP (r² = 1.0 in the Utah Centre d'Etude du Polymorphisme Humain (CEPH) sample) on the Illumina Hap300 chip: rs17468277. In Iceland 1, this SNP returned an allelic OR of 0.92 and a P value of 0.128, which, although not significant, are consistent with the published report.



We combined the results for A-rs13387042 and T-rs3803662 from the four non-Icelandic replication sample sets using the Mantel-Haenszel model, which takes into account the differing frequencies of the alleles in the contributing populations. The combined ORs for the non-Icelanders were 1.21 (P = 2.5 × 10⁻⁷) for rs13387042 and 1.33 (P = 6.0 × 10⁻¹³) for rs3803662 (Tables 1 and 2). Again, the point estimates of the ORs were nominally greater than the ORs observed in Iceland 1. We did not observe any indication of heterogeneity among the five groups of European descent for the effect of A-rs13387042 (P_het = 0.98). With T-rs3803662, the estimated effect in the Swedish group was notably higher than the estimates for the other four groups, but the test for heterogeneity among the five groups was not significant (P_het = 0.11).

Combining all the Icelandic and non-Icelandic samples gave P values of 1.3 × 10⁻¹³ for rs13387042 and 5.9 × 10⁻¹⁹ for rs3803662. Given that 311,524 SNPs were initially tested, applying Bonferroni correction (ref. 13) gave P_adj values of 4.1 × 10⁻⁸ and 1.8 × 10⁻¹³ for A-rs13387042 and T-rs3803662, respectively, which are highly significant. We concluded that the A-rs13387042 and T-rs3803662 alleles confer increased risks of breast cancer.
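The Bonferroni step described above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the function name `bonferroni` is an assumption, and the inputs are the rounded P values quoted in the text, so the first result may differ from the published 4.1 × 10⁻⁸ in the last digit.

```python
def bonferroni(p_value: float, n_tests: int) -> float:
    """Return the Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Number of SNPs tested in the discovery stage (taken from the text).
N_SNPS = 311_524

# Combined P values as quoted (already rounded) in the text.
p_adj_2q35 = bonferroni(1.3e-13, N_SNPS)   # rs13387042
p_adj_16q12 = bonferroni(5.9e-19, N_SNPS)  # rs3803662

print(f"rs13387042: {p_adj_2q35:.2e}")
print(f"rs3803662:  {p_adj_16q12:.2e}")
```

Both adjusted values remain far below conventional significance thresholds, which is the point the paragraph makes.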

We reviewed medical records of the affected individuals, if available. Using the combined sample sets, we looked for associations between the A-rs13387042 and T-rs3803662 alleles and age at diagnosis, estrogen receptor and progesterone receptor status, grade, stage and histopathological subtype. Neither variant showed an association with age at diagnosis (data not shown). Significant breast cancer risk involving A-rs13387042 and T-rs3803662 was clearly confined to individuals diagnosed with estrogen receptor–positive tumors, and the difference between the ORs for estrogen receptor–positive and estrogen receptor–negative tumors was significant (Table 3). Similarly, there was an apparent trend toward breast cancer risk preferentially among patients diagnosed with progesterone receptor–positive tumors, but the difference between progesterone receptor–positive and progesterone receptor–negative ORs was not significant (Table 3). We note that BRCA1-associated breast cancers tend to be estrogen receptor negative, whereas the frequency of estrogen receptor–positive tumors in BRCA2 carriers more closely approximates that of 'sporadic' patients (refs. 14, 15). This, together with the present findings that A-rs13387042 and T-rs3803662 confer preferential risk for estrogen receptor–positive tumors, supports the notion that estrogen receptor–positive and estrogen receptor–negative tumors may have different genetic backgrounds. The risk alleles were not significantly associated with differences in histopathological subtype, stage or grade, and there was no significant difference in allele frequencies between in situ and invasive tumors (Table 3).


The Iceland 1 and Iceland 2 groups were a prevalence cohort, and some of the members were long-term cancer survivors. To investigate the possibility of a survival bias in allele frequencies, we identified 844 individuals who were diagnosed after 1 January 2000 but no more than 5 years before recruitment (recently diagnosed patients). We also identified 827 patients who were diagnosed before 1 January 1995 and who survived at least 5 years before recruitment (survivor patients). There was no significant difference in frequencies of the A-rs13387042 and T-rs3803662 alleles between recently diagnosed and survivor patients (Supplementary Table 2 online).

The Swedish breast cancer cohort comprised consecutive and familial breast cancer recruitment groups (see Methods). Both groups showed significant association of T-rs3803662 with breast cancer (Supplementary Table 3 online). A-rs13387042 showed significant association with only the consecutive breast cancer group. It is possible that an effect of A-rs13387042 is masked by the presence of high-penetrance predisposition gene variants in the familial breast cancer group. However, the difference in frequencies between the consecutive and familial breast cancer groups was not significant.

We tested breast cancer case-control samples from the US Multiethnic Cohort, comprising individuals of European, Latina, Japanese, African American and Native Hawaiian ancestry, for association with rs13387042 and rs3803662 (Table 4). The frequency of A-rs13387042 varied markedly between ethnicities, from 0.702 in African Americans to 0.119 in Japanese Americans. In Latinas, for both variants, the ORs for breast cancer were marginally significant and were similar to the ORs in European Americans (Table 4). The frequencies in Latinas were intermediate between European and Japanese Americans for both variants. In African Americans, the T-rs3803662 allele was clearly not associated with increased breast cancer risk; indeed, the T allele was significantly protective in the African American sample (OR = 0.77, P = 0.0076). This suggests that the LD relation between the T allele and the putative pathogenic mutation is quite different in African Americans.


We did not observe any interaction between the 2q35 and 16q12 loci; that is, a multiplicative (or log-additive) model provided an adequate fit for the joint risk of the A-rs13387042 and T-rs3803662 alleles. Similarly, neither locus interacted detectably with the BRCA2 999del5 variant (Supplementary Table 4 online). For each locus individually, we evaluated genotype-specific ORs (Table 5). Combining the results for all sample sets, the multiplicative model provided an adequate fit for rs3803662; the risks predicted by the full (genotype-specific) model were not significantly different from those predicted by the multiplicative model (P = 0.89). The A-rs13387042 risk allele tended toward the recessive, as evidenced by a significant deviation from the multiplicative model (P = 0.01). The corresponding population attributable risk (PAR) is about 14% for A-rs13387042 and 13% for T-rs3803662. The joint PAR is approximately 25%, which is substantial from a public health point of view. However, as the relative risks are low, they can explain only a small fraction of the familial clustering of the disease: the sibling risk ratios accounted for by A-rs13387042 and T-rs3803662 are 1.009 and 1.013, respectively. Hence, other important susceptibility variants remain to be identified.


We examined the LD blocks containing each of the new risk SNPs for genes and other features using publicly available databases. LD blocks were considered to extend between flanking recombination hotspots, as defined previously (refs. 16, 17). For the LD block containing rs13387042, we did not find any known genes or human RNAs. We observed a single spliced EST (BF591107) that was originally cloned from colon tumor material. Moving proximally (left) outside the LD block, the nearest known genes to rs13387042 are TNP1 (181 kb proximal), IGFBP5 (345 kb proximal) and IGFBP2 (376 kb proximal). Moving distally outside the LD block, the nearest known gene is TNS1 (761 kb distal). Analysis in the Iceland 1 sample of Illumina Hap300 SNPs in the LD blocks containing these genes did not show any signals that could account for the observed signal at rs13387042.

The q arm of chromosome 16 is frequently lost in breast tumors, and there has long been a suspicion that one or more tumor suppressor genes may be present there (ref. 18). The rs3803662 SNP on 16q12 occurs in the fourth exon of a poorly characterized mRNA, BC029912. The only known gene in the block is the 5' end of TNRC9, a member of the high mobility group family of non-histone chromatin proteins. Increased expression of TNRC9 is predictive of metastasis of breast cancer to bone (ref. 4). Notably, estrogen receptor–positive tumors have a higher propensity to metastasize to bone than to other sites; indeed, estrogen receptor positivity is the strongest known histopathological predictor of bone metastases (refs. 19, 20). The potential relationships between TNRC9, estrogen receptor positivity, bone metastases and rs3803662 remain to be elucidated.

Methods

Selection of affected individuals and controls.

Details of the selection and recruitment of affected individuals and controls are given in Supplementary Methods. Approval for this study was granted by the National Bioethics Committee of Iceland and the Icelandic Data Protection Authority and by the Institutional Review Boards of Zaragoza University Hospital, the Karolinska Institute, the Radboud University Nijmegen Medical Center, the University of Southern California and the University of Hawaii. All subjects provided written informed consent.

Illumina genotyping.

DNA samples were genotyped according to the manufacturer's instructions on Illumina Infinium HumanHap300 SNP bead microarrays containing 317,503 SNPs derived from Phase I of the International HapMap project. This chip provides about 75% genomic coverage in the Utah CEPH (CEU) HapMap samples for common SNPs at r² ≥ 0.8 (ref. 21). Of all the SNPs on the chip, 5,979 were deemed unsuitable because they were monomorphic (that is, the minor allele frequency in the combined case and control set was <0.001), had low (<95%) yield, or showed a very significant distortion from Hardy-Weinberg equilibrium in the controls (P < 1 × 10⁻¹⁰). All of these problematic SNPs were removed from the analysis. Thus, 311,524 SNPs were used in the association analysis. Any chips with an overall call rate below 98% of the SNPs were also excluded from the genome-wide association analysis.
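The three SNP-level exclusion criteria above can be sketched as a simple filter. This is an illustrative sketch only: the text does not say which Hardy-Weinberg test was used, so a Pearson χ² test with 1 degree of freedom is assumed here, and the names `SnpSummary`, `hwe_p_value` and `passes_qc` are hypothetical.

```python
import math
from dataclasses import dataclass

# Thresholds taken from the text.
MAF_MIN = 0.001        # below this a SNP is treated as monomorphic
YIELD_MIN = 0.95       # minimum genotyping yield
HWE_P_MIN = 1e-10      # minimum Hardy-Weinberg P value in controls


@dataclass
class SnpSummary:
    maf: float       # minor allele frequency, cases and controls combined
    yield_: float    # fraction of samples with a genotype call
    ctrl_aa: int     # control genotype counts
    ctrl_ab: int
    ctrl_bb: int


def hwe_p_value(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium (assumed test)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(snp: SnpSummary) -> bool:
    """Apply the three SNP-level filters in the order given in the text."""
    if snp.maf < MAF_MIN or snp.yield_ < YIELD_MIN:
        return False
    return hwe_p_value(snp.ctrl_aa, snp.ctrl_ab, snp.ctrl_bb) >= HWE_P_MIN


# The SNP counts reported in the text are consistent with each other.
remaining = 317_503 - 5_979
print(remaining)
```

The 98% per-chip call-rate filter is a sample-level criterion and is not covered by this sketch.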

SNP genotyping and BRCA2 mutation detection.

Centaurus assays (ref. 22) were designed for rs13387042 and rs3803662 and were validated by genotyping the HapMap CEU sample and comparing the genotypes with published data. The assays gave <1.5% mismatches with HapMap data. Genotyping of the BRCA2 999del5 mutation was carried out using an automated microsatellite-type PCR assay. All calls of the 999del5 mutation identified by the automated systems were confirmed by visual inspection of the primary signal traces. Primer sequences are given in Supplementary Table 5 online. All genotyping was carried out at the deCODE genetics facility.

Statistical methods.

We calculated the OR of a SNP allele assuming the multiplicative model: that is, assuming that the relative risks of the two alleles that a person carries multiply. Allelic frequencies rather than carrier frequencies are presented for the markers. The associated P values were calculated with a standard likelihood ratio χ² statistic as implemented in the NEMO software package (ref. 23). Confidence intervals were calculated assuming that the estimate of the OR has a log-normal distribution. Joint analyses of multiple case-control replication groups were carried out using a Mantel-Haenszel model in which the groups were allowed to have different population frequencies for alleles or genotypes but were assumed to have common relative risks.
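As an illustration of the quantities described above, the sketch below computes an allelic OR with a log-normal confidence interval and a Mantel-Haenszel pooled OR across groups. It is a simplified stand-in, not the NEMO implementation: the standard-error formula is the usual Woolf approximation (an assumption), the function names are hypothetical, and all allele counts are invented for the example.

```python
import math


def allelic_or_ci(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Allelic OR and confidence interval, assuming log(OR) is normally distributed."""
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Woolf standard error of log(OR) from the four allele counts.
    se = math.sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


def mantel_haenszel_or(tables):
    """Pooled OR over groups; each table is (case_risk, case_other, ctrl_risk, ctrl_other).

    Groups may differ in allele frequency but are assumed to share one OR.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


# Hypothetical allele counts for two replication groups (not from the paper).
group_1 = (200, 100, 100, 100)
group_2 = (40, 60, 25, 75)

print(allelic_or_ci(*group_1))
print(mantel_haenszel_or([group_1, group_2]))
```

The two invented groups have different risk-allele frequencies in controls but the same underlying OR, which is the situation the Mantel-Haenszel model is designed for.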

When calculating genotype-specific ORs, we estimated the genotype frequencies in the population assuming Hardy-Weinberg equilibrium. Potential interactions between loci were examined using correlation tests of allele counts and by case-control association of carriers and noncarriers. The PARs were calculated by treating the genotype-specific ORs obtained for the combined groups (populations) as relative risks and by defining the allelic frequencies in controls as simple (arithmetic) averages of the allelic frequencies observed in each population group separately. The joint PAR was calculated as 1 − (1 − PAR₁) × (1 − PAR₂), where PAR₁ and PAR₂ are the individual PARs for each SNP calculated under the full model and assuming no interaction between the SNPs.
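The PAR calculation can be sketched as follows. The joint PAR formula is taken directly from the text; the single-locus formula (one minus the reciprocal of the population-average relative risk) is a standard form that is assumed here, since the text does not spell it out. Function names are hypothetical, and the heterozygote relative risk in the example is a placeholder, not a value from the paper.

```python
def hwe_genotype_freqs(p: float):
    """Genotype frequencies (0, 1, 2 risk alleles) under Hardy-Weinberg equilibrium."""
    q = 1.0 - p
    return q * q, 2.0 * p * q, p * p


def par_from_genotype_rr(p: float, rr_het: float, rr_hom: float) -> float:
    """PAR from genotype-specific relative risks (noncarrier risk = 1); assumed formula."""
    f0, f1, f2 = hwe_genotype_freqs(p)
    mean_rr = f0 * 1.0 + f1 * rr_het + f2 * rr_hom
    return 1.0 - 1.0 / mean_rr


def joint_par(par1: float, par2: float) -> float:
    """Joint PAR for two non-interacting loci, as defined in the text."""
    return 1.0 - (1.0 - par1) * (1.0 - par2)


# Single-locus example: allele frequency 0.5 gives 25% homozygotes, and the
# homozygote risk of 1.44 is quoted in the text; rr_het is a placeholder.
example_par = par_from_genotype_rr(p=0.5, rr_het=1.1, rr_hom=1.44)
print(round(example_par, 3))

# Joint PAR from the individual PARs reported in the text (about 14% and 13%).
print(round(joint_par(0.14, 0.13), 4))
```

With the reported individual PARs, the joint value comes out at roughly 25%, in line with the figure given in the main text.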

Some Icelandic affected individuals and controls are related, both within and between groups, causing the χ² test statistic to have a mean >1 and a median larger than 0.675². We estimated the inflation factor for Iceland 1 using a method of genomic control (ref. 10) by calculating the average of the observed χ² statistics for the genome-wide SNP set, which accounts for relatedness and for potential population stratification. For Iceland 2, which was not typed with a genome-wide set of markers, the inflation factor was estimated by simulating genotypes through the Icelandic genealogy (ref. 24). The estimated inflation factors were 1.105 for Iceland 1 and 1.11 for Iceland 2. The estimated inflation factor for the joint analyses of the Iceland 1 and Iceland 2 sample sets was 1.08, obtained by simulation.
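A minimal sketch of the mean-based genomic control adjustment described above is given below. It assumes 1-degree-of-freedom χ² statistics and divides each statistic by the inflation factor before converting to a P value; the function names and the raw statistic of 30 are hypothetical, while the inflation factor of 1.105 is the Iceland 1 value from the text.

```python
import math


def inflation_factor(chi2_stats):
    """Mean of the observed 1-df chi-square statistics (expected value 1 under the null)."""
    return sum(chi2_stats) / len(chi2_stats)


def chi2_sf_1df(x: float) -> float:
    """P value (survival function) of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def genomic_control_p(chi2: float, lam: float) -> float:
    """P value after dividing the statistic by the inflation factor."""
    return chi2_sf_1df(chi2 / lam)


LAMBDA_ICELAND_1 = 1.105   # inflation factor reported for Iceland 1
raw_chi2 = 30.0            # hypothetical unadjusted statistic for one SNP

print(f"unadjusted P: {chi2_sf_1df(raw_chi2):.2e}")
print(f"adjusted P:   {genomic_control_p(raw_chi2, LAMBDA_ICELAND_1):.2e}")
```

Because the inflation factor exceeds 1, the adjusted P value is always less extreme than the unadjusted one. The median of a 1-df χ² variable under the null is 0.675² (about 0.455), which is why a larger observed median also signals inflation.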

Tests of heterogeneity of pathological subtypes were performed by assuming that allele frequencies were the same in all subtypes under the null hypothesis but that each subtype had a different allele frequency under the alternative. Joint analyses of multiple groups of cases were performed using an extended Mantel-Haenszel model that corresponds to a polytomous logistic regression using the group indicator as a covariate.

All P values are reported as two-sided.

Requests for materials:

Email: simon.stacey@decode.is or kari.stefansson@decode.is

Note: Supplementary information is available on the Nature Genetics website.

Simon N Stacey, Andrei Manolescu, Patrick Sulem, Thorunn Rafnar, Julius Gudmundsson, Sigurjon A Gudjonsson, Gisli Masson, Margret Jakobsdottir, Steinunn Thorlacius, Agnar Helgason, Katja K Aben, Luc J Strobbe, Marjo T Albers-Akkers, Dorine W Swinkels, Brian E Henderson, Laurence N Kolonel, Loic Le Marchand, Esther Millastre, Raquel Andres, Javier Godino, Maria Dolores Garcia-Prats, Eduardo Polo, Alejandro Tres, Magali Mouy, Jona Saemundsdottir, Valgerdur M Backman, Larus Gudmundsson, Kristleifur Kristjansson, Jon T Bergthorsson, Jelena Kostic, Michael L Frigge, Frank Geller, Daniel Gudbjartsson, Helgi Sigurdsson, Thora Jonsdottir, Jon Hrafnkelsson, Jakob Johannsson, Thorarinn Sveinsson, Gardar Myrdal, Hlynur Niels Grimsson, Thorvaldur Jonsson, Susanna von Holst, Barbro Werelius, Sara Margolin, Annika Lindblom, Jose I Mayordomo, Christopher A Haiman, Lambertus A Kiemeney, Oskar Th Johannsson, Jeffrey R Gulcher, Unnur Thorsteinsdottir, Augustine Kong & Kari Stefansson

  1. Amundadottir, L.T. et al. Cancer as a complex phenotype: pattern of cancer distribution within and beyond the nuclear family. PLoS Med. 1, e65 (2004).
  2. Lichtenstein, P. et al. Environmental and heritable factors in the causation of cancer–analyses of cohorts of twins from Sweden, Denmark, and Finland. N. Engl. J. Med. 343, 78–85 (2000).
  3. Cannon-Albright, L.A. et al. Familiality of cancer in Utah. Cancer Res. 54, 2378–2385 (1994).
  4. Smid, M. et al. Genes associated with breast cancer metastatic to bone. J. Clin. Oncol. 24, 2261–2267 (2006).
  5. Easton, D.F. How many more breast cancer predisposition genes are there? Breast Cancer Res. 1, 14–17 (1999).
  6. Balmain, A., Gray, J. & Ponder, B. The genetics and genomics of cancer. Nat. Genet. 33 Suppl., 238–244 (2003).
  7. Pharoah, P.D. et al. Polygenic susceptibility to breast cancer and implications for prevention. Nat. Genet. 31, 33–36 (2002).
  8. Breast Cancer Association Consortium. Commonly studied single-nucleotide polymorphisms and breast cancer: results from the Breast Cancer Association Consortium. J. Natl. Cancer Inst. 98, 1382–1396 (2006).
  9. Cox, A. et al. A common coding variant in CASP8 is associated with breast cancer risk. Nat. Genet. 39, 352–358 (2007).
  10. Devlin, B. & Roeder, K. Genomic control for association studies. Biometrics 55, 997–1004 (1999).
  11. Lohmueller, K.E., Pearce, C.L., Pike, M., Lander, E.S. & Hirschhorn, J.N. Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nat. Genet. 33, 177–182 (2003).
  12. Trikalinos, T.A., Ntzani, E.E., Contopoulos-Ioannidis, D.G. & Ioannidis, J.P. Establishment of genetic associations for complex diseases is independent of early study findings. Eur. J. Hum. Genet. 12, 762–769 (2004).
  13. Skol, A.D., Scott, L.J., Abecasis, G.R. & Boehnke, M. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nat. Genet. 38, 209–213 (2006).
  14. Johannsson, O.T. et al. Tumour biological features of BRCA1-induced breast and ovarian cancer. Eur. J. Cancer 33, 362–371 (1997).
  15. Honrado, E., Benitez, J. & Palacios, J. Histopathology of BRCA1- and BRCA2-associated breast cancer. Crit. Rev. Oncol. Hematol. 59, 27–39 (2006).
  16. McVean, G.A. et al. The fine-scale structure of recombination rate variation in the human genome. Science 304, 581–584 (2004).
  17. Winckler, W. et al. Comparison of fine-scale recombination rates in humans and chimpanzees. Science 308, 107–111 (2005).
  18. Rakha, E.A., Green, A.R., Powe, D.G., Roylance, R. & Ellis, I.O. Chromosome 16 tumor-suppressor genes in breast cancer. Genes Chromosom. Cancer 45, 527–535 (2006).
  19. James, J.J. et al. Bone metastases from breast carcinoma: histopathological - radiological correlations and prognostic features. Br. J. Cancer 89, 660–665 (2003).
  20. Koenders, P.G. et al. Steroid hormone receptor activity of primary human breast cancer and pattern of first metastasis. The Breast Cancer Study Group. Breast Cancer Res. Treat. 18, 27–32 (1991).
  21. Barrett, J.C. & Cardon, L.R. Evaluating coverage of genome-wide association studies. Nat. Genet. 38, 659–662 (2006).
  22. Kutyavin, I.V. et al. A novel endonuclease IV post-PCR genotyping system. Nucleic Acids Res. 34, e128 (2006).
  23. Gretarsdottir, S. et al. The gene encoding phosphodiesterase 4D confers risk of ischemic stroke. Nat. Genet. 35, 131–138 (2003).
  24. Grant, S.F. et al. Variant of transcription factor 7-like 2 (TCF7L2) gene confers risk of type 2 diabetes. Nat. Genet. 38, 320–323 (2006).