Clin Invest Med 1997; 20(1): 41-49.
This article was presented as part of the Canadian Society for Clinical Investigation's theme symposium, Susceptibility to Common Disease, held Sept. 14, 1995, in Montreal.
Reprint requests to: Dr. L. Leigh Field, University of Calgary Health Sciences Centre, 3330 Hospital Dr. NW, Calgary AB T2N 4N1; fax 403 283-4841; email field@acs.ucalgary.ca
Abstract
Insulin-dependent diabetes mellitus (IDDM), also called type 1 or juvenile diabetes, is one of the first genetically complex disorders that researchers have begun to unravel. More than 20 years ago, the HLA region was found to contain a major locus affecting predisposition to IDDM, and a decade ago a locus with a smaller effect was identified in the insulin-gene region. With the advent of numerous microsatellite markers suitable for genome screening, 6 additional loci affecting IDDM susceptibility have been reported since late 1994. This article summarizes the progress made, with particular emphasis on the research of Field and colleagues. Some of the new loci appear to predispose people to IDDM independently of HLA and may be important factors in families with IDDM who have little HLA-linked susceptibility. Other loci may interact to produce susceptibility, and certain combinations may be particularly diabetogenic. Although the actual predisposing genes for IDDM are harder to isolate than those acting in single-locus genetic disorders, the fact that the genes can be identified using a reasonable number of families is very encouraging for future research on other genetically complex disorders.
The genetic basis of IDDM
Insulin-dependent diabetes mellitus (IDDM) results from T cell-mediated autoimmune destruction of the insulin-producing ß-cells of the pancreas. There appear to be two phases of the disease: a preclinical phase involving infiltration of T cells into the islets of the pancreas and development of insulitis (marked by autoantibodies to a variety of islet-cell components), followed by ß-cell destruction and clinically overt diabetes mellitus. Daily administration of exogenous insulin is then required for survival. The islet-specific antigen that induces the autoimmune response is as yet unidentified, but it is thought that this autoimmune response occurs only in people who are genetically susceptible. The prevalence of IDDM is highest among people of European ancestry, whom we will treat as the reference population for the discussions that follow.
Evidence of a genetic basis for IDDM susceptibility comes from studies showing a higher prevalence in close relatives of people with IDDM (about 6% in siblings by age 30, according to a review by Thomson and associates[1]) than in the general population (about 0.4% by age 30[2]). Of course, close relatives also share a similar environment, which could account for the higher frequency of IDDM among relatives of people with IDDM. A more specific demonstration of a genetic basis for IDDM predisposition is that there is a higher frequency of IDDM in monozygotic than in dizygotic twins of people with IDDM.[3] However, the importance of nongenetic factors can be appreciated by considering the concordance rate in monozygotic twins, which has been estimated at 34% by age 30.[4] This corresponds to an average penetrance (the probability of being affected, given a susceptible genotype) of about 50%.[5] Little is known about the nature of these nongenetic factors, but stochastic processes (e.g., in the development of the immune repertoire or in pancreatic ß-cell growth) or environmental agents (e.g., exposure to proteins in cow's milk[6] or to viruses) may be involved.
Methods for detecting predisposing genes
Before describing the current knowledge regarding genes influencing IDDM susceptibility, we review the primary methods used to detect genes that predispose people to complex disorders. These methods include association analysis (either population- or family-based methods) and linkage analysis (either maximum-likelihood "lod" score or affected sibling pair methods).
Association analysis (population-based)
Standard association tests compare marker allele frequencies in unrelated cases and in unrelated controls. Identification of a significant difference between cases and controls suggests that a genetic locus at or near the marker locus influences disease predisposition, the association being the result of linkage, with linkage disequilibrium between the marker and disease loci. Linkage disequilibrium means that combinations of alleles at 2 linked loci are nonrandom -- in this case, chromosomes with the mutant allele at the disease locus carry certain marker alleles more often than others. However, the population-based method can produce a spurious association if the cases and controls are not accurately matched for ethnic background; inaccurate matching can occur if there is unrecognized stratification in the population being sampled. This phenomenon is illustrated by association studies of non-insulin-dependent diabetes mellitus (NIDDM) in Mexican-Americans. Initial studies reported numerous significant associations between NIDDM and various genetic markers. However, NIDDM is more prevalent in native American peoples than in Europeans, and Mexican-Americans with NIDDM (cases) tended to have a higher proportion of native American ancestry than those without NIDDM (controls). Thus, due to the ethnic stratification in the Mexican-American population, differences in marker frequencies between cases and controls actually reflected differences between native American peoples and Europeans, and were not relevant to NIDDM.
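The comparison of marker allele frequencies between cases and controls can be sketched as a Pearson chi-square test on a table of allele counts. The Python sketch below is illustrative only: the function names and the allele counts are hypothetical and do not come from any study cited here, and the p-value helper applies only to the 2-allele case (1 degree of freedom).

```python
import math

def allele_association_chi2(case_counts, control_counts):
    """Pearson chi-square for a 2 x n table of allele counts.

    case_counts and control_counts list the number of chromosomes
    carrying each marker allele, in the same allele order.
    Assumes every allele is observed at least once overall.
    """
    if len(case_counts) != len(control_counts):
        raise ValueError("both groups must list the same alleles")
    row_totals = [sum(case_counts), sum(control_counts)]
    col_totals = [a + b for a, b in zip(case_counts, control_counts)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for row, row_total in zip((case_counts, control_counts), row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            chi2 += (observed - expected) ** 2 / expected
    degrees_of_freedom = len(case_counts) - 1
    return chi2, degrees_of_freedom

def chi2_p_value_1df(chi2):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical counts of two marker alleles on 100 case and 100 control chromosomes.
example_chi2, example_df = allele_association_chi2([60, 40], [40, 60])
example_p = chi2_p_value_1df(example_chi2)
print(example_chi2, example_df, example_p)
```

A significant statistic from such a table cannot by itself distinguish a true association from one produced by ethnically mismatched cases and controls; the table carries no information about stratification.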
Association analysis (family-based)
To avoid a spurious association resulting from the use of an ethnically mismatched control group, Field and associates[7–9] and Thomson and associates[10,11] have developed an association analysis method involving family data, which uses an internal control group. In the affected-family-based controls (AFBAC) method,[12] the 4 parental alleles in each family are categorized into 1 of 2 groups: those transmitted to any child in the family with the disease (T) and those not transmitted to any affected child (N). Alleles of group N are an unbiased sample of genes from the general population.[11] The null hypothesis of no association is tested by comparing allele frequencies in groups N and T, using a 2 × n contingency table (where n is the number of different alleles). If an association is found, differences in maternal versus paternal contributions to susceptibility are tested by comparing transmitted (T group) alleles that originate from mothers with those that originate from fathers. In families with 2 children with the disease, a computer program created for the AFBAC method[11] allows researchers to either: (1) weight the transmissions to affected children so that alleles transmitted to 2 affected children are given twice the weight of alleles transmitted to only 1 affected child (this "multiplex" technique produces unbiased control allele frequencies); or (2) consider each affected child independently, as if he or she were in his or her own single-affected-child family (this "double simplex" technique is more powerful for detecting association, but produces conservative control allele frequencies).
There is a simple test to verify a suspected association; like AFBAC, this test can be used to detect parental sex-specific effects. The frequency of transmission of the "associated" allele to affected children from parents heterozygous for that allele is compared with the expected frequency of 50%.[8,13–15] This association test, involving only parents who are heterozygous for specific marker alleles, is less sensitive to very recent strong admixture effects, which may adversely influence the AFBAC method.[11] However, we prefer the AFBAC test for screening for disease association at microsatellite loci, since it is more powerful and it considers all marker alleles simultaneously.
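The two family-based tests can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the AFBAC computer program referred to above: the family data layout (`parental_alleles`, `transmitted_idx`) and the function names are hypothetical, transmissions are assumed to have been resolved already, and the counting is unweighted (it does not implement the "multiplex" weighting).

```python
from collections import Counter
from math import comb

def afbac_counts(families):
    """Split parental alleles into transmitted (T) and nontransmitted (N) groups.

    Each family is a dict with:
      "parental_alleles": the 4 parental marker alleles
      "transmitted_idx":  positions (0-3) of the alleles transmitted
                          to at least one affected child
    """
    transmitted, not_transmitted = Counter(), Counter()
    for family in families:
        for idx, allele in enumerate(family["parental_alleles"]):
            if idx in family["transmitted_idx"]:
                transmitted[allele] += 1
            else:
                not_transmitted[allele] += 1
    return transmitted, not_transmitted

def transmission_p_value(times_transmitted, informative_transmissions):
    """Exact two-sided binomial test against the expected 50% transmission.

    Counts only transmissions from parents heterozygous for the allele.
    """
    smaller_tail = min(times_transmitted,
                       informative_transmissions - times_transmitted)
    tail_prob = sum(comb(informative_transmissions, i)
                    for i in range(smaller_tail + 1)) / 2 ** informative_transmissions
    return min(1.0, 2 * tail_prob)

# Hypothetical data: two families, alleles labelled A, B and C.
example_families = [
    {"parental_alleles": ["A", "B", "A", "C"], "transmitted_idx": {0, 2}},
    {"parental_alleles": ["A", "C", "B", "B"], "transmitted_idx": {0, 3}},
]
t_counts, n_counts = afbac_counts(example_families)

# Hypothetical: allele transmitted 15 times out of 20 by heterozygous parents.
example_transmission_p = transmission_p_value(15, 20)
print(t_counts, n_counts, example_transmission_p)
```

The T and N counts would then be compared in a 2 × n contingency table, as described above.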
Linkage analysis (lod score method)
Linkage analysis is designed to determine whether there is significant evidence of cosegregation of alleles at a marker locus and of those at a hypothetical disease-susceptibility locus. In the maximum-likelihood lod score method, the likelihood of the observed family data assuming linkage between marker and disease loci (at a certain recombination fraction, theta) is compared with the likelihood assuming no linkage (theta = 0.5, i.e., 50% recombination). The base-10 logarithm of this likelihood ratio (the "odds" for linkage) is the "lod" (log of odds) score. By convention, when analysing linkage between a marker locus and a simple single-locus disorder, a lod score greater than or equal to 3.0 is considered significant evidence of linkage. The theta value at which the lod score is maximal is the best estimate of the true recombination fraction between the disease and marker loci. The lod score method requires the user to specify parameters for the hypothetical disease locus (e.g., mode of inheritance, penetrance and disease allele frequency). Hence, this linkage analysis method is often called "model-dependent."
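For the simplest possible case, the lod score can be written in closed form. The sketch below assumes fully informative, phase-known meioses in which each one can be scored as recombinant or nonrecombinant; this is a textbook simplification and not the pedigree-likelihood calculation used for a complex disorder such as IDDM, where penetrance and allele frequency enter the model. Function names and the example counts are hypothetical.

```python
import math

def lod_score(recombinants, meioses, theta):
    """Lod score at recombination fraction theta for phase-known meioses.

    lod = log10[ theta^r * (1 - theta)^(n - r) / 0.5^n ]
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    nonrecombinants = meioses - recombinants
    likelihood_ratio = (theta ** recombinants
                        * (1 - theta) ** nonrecombinants
                        / 0.5 ** meioses)
    if likelihood_ratio == 0.0:
        # theta = 0 is impossible once a recombinant has been observed
        return float("-inf")
    return math.log10(likelihood_ratio)

def max_lod(recombinants, meioses, steps=50):
    """Search a grid of theta values (0 to 0.5) for the maximum lod score."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    best_theta = max(grid, key=lambda t: lod_score(recombinants, meioses, t))
    return best_theta, lod_score(recombinants, meioses, best_theta)

# Hypothetical example: 1 recombinant observed in 10 informative meioses.
example_theta, example_lod = max_lod(1, 10)
print(example_theta, example_lod)
```

Under these assumptions, 10 meioses with no recombinants give a lod score of 10 × log10(2), or about 3.01 at theta = 0, which shows how much information the conventional threshold of 3.0 demands.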
Linkage analysis (affected sibling pair method)
For linkage analysis of a genetically complex disorder, for which the mode of inheritance is unknown, model-independent methods have been advocated. One such method is to compare the mean proportion of genes at the marker locus that sibling pairs with the disease share, by inheriting the same gene from their mother or father or both, with the shared proportion expected by chance (50% in the case of siblings).[16] If affected sibling pairs share a significantly higher-than-expected proportion of genes at the marker locus, this suggests that the region containing the marker locus also contains a locus influencing disease susceptibility. The observed marker sharing in affected sibling pairs can also be converted to a maximum lod score equivalent (MLS).[17–19]
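One common form of this comparison is a one-sided "means" test of the observed sharing proportion against 50%. The sketch below assumes that the number of marker genes shared identical by descent (0, 1 or 2) is known without ambiguity for every affected pair and that pairs are independent; real data usually require estimating sharing from incompletely informative markers. The function name and the example counts are hypothetical, and this is not claimed to be the exact procedure of the cited reports.

```python
import math

def sib_pair_mean_test(shared_counts):
    """One-sided test of mean marker sharing in affected sibling pairs.

    shared_counts: number of genes (0, 1 or 2) shared identical by
    descent by each pair. Under no linkage these occur with
    probabilities 1/4, 1/2, 1/4, so the per-pair sharing proportion
    has mean 0.5 and variance 1/8.
    """
    n_pairs = len(shared_counts)
    if n_pairs == 0:
        raise ValueError("at least one sibling pair is required")
    mean_sharing = sum(shared_counts) / (2 * n_pairs)
    se_null = math.sqrt(0.125 / n_pairs)
    z_score = (mean_sharing - 0.5) / se_null
    # upper-tail probability of the standard normal distribution
    p_one_sided = 0.5 * math.erfc(z_score / math.sqrt(2))
    return mean_sharing, z_score, p_one_sided

# Hypothetical data: 100 pairs, of which 15 share 0, 50 share 1 and 35 share 2.
example_pairs = [0] * 15 + [1] * 50 + [2] * 35
sharing, sharing_z, sharing_p = sib_pair_mean_test(example_pairs)
print(sharing, sharing_z, sharing_p)
```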
HLA class II region (IDDM1)
For more than 2 decades, we have known that certain HLA genotypes increase the risk of a person having IDDM. Many studies have shown that people of European ancestry with IDDM have higher frequencies of HLA-DR3 and DR4 (variants at the class II HLA-DRB1 locus on chromosome 6p21) than the general population. For example, 96% of IDDM patients in Calgary had at least 1 of these alleles (compared with an expected frequency of 45%) and, even more dramatic, 38% had both alleles (compared with an expected frequency of 3%).[20] This excess of DR3/DR4 heterozygotes, relative to either homozygote, led to the hypothesis that there are at least 2 different HLA-region alleles that affect susceptibility and have synergistic actions: DR3 and DR4, or genes in linkage disequilibrium with them. The DR1/DR4 genotype was also found at increased frequencies among people with IDDM.[7] Among DR4 haplotypes found in people with IDDM, there was an increased frequency of DQw8 at the nearby HLA-DQB1 locus.[21] Todd, Bell and McDevitt[22] and others showed that several HLA haplotypes associated with IDDM (including DR4-DQw8) encode an amino acid other than aspartate at position 57 of the DQB chain, suggesting that DQB1 is the primary susceptibility locus. However, Sheehy and associates[23] showed that DR4 haplotypes encoding both DQw8, and Dw4 or Dw10 T-cell-defined subtypes of DR4, are more diabetogenic than DR4 haplotypes encoding only 1 of these, which suggests that DQB1 and DRB1 together may confer susceptibility. HLA-DQA1 may also be involved;[24] interaction between DQA1 and DQB1 may explain the synergy of DR3- and DR4-associated susceptibility. In addition, the HLA-DP locus, which does not show significant linkage disequilibrium with DR-DQ, may confer susceptibility independently.[25,26] All of these studies were population-based association studies.
Numerous linkage studies have also conclusively shown that a gene in the HLA region influences IDDM susceptibility. Payami and associates[27] presented a compilation of data from the literature showing that 54% of 538 affected sibling pairs shared 2 HLA haplotypes, a highly significant finding when compared with the expected frequency of 25%.
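To give a sense of how far 54% sharing of 2 haplotypes lies from the 25% expected by chance, a normal-approximation test of a single proportion can be sketched as follows. This illustration uses only the 2 percentages and the sample size quoted above; it is not the analysis performed by Payami and associates, and the function name is hypothetical.

```python
import math

def one_sided_proportion_test(observed_prop, expected_prop, n_pairs):
    """Normal-approximation test that a proportion exceeds its expected value."""
    se_null = math.sqrt(expected_prop * (1 - expected_prop) / n_pairs)
    z_score = (observed_prop - expected_prop) / se_null
    # upper-tail probability of the standard normal distribution
    p_one_sided = 0.5 * math.erfc(z_score / math.sqrt(2))
    return z_score, p_one_sided

# 54% of 538 affected sibling pairs sharing 2 HLA haplotypes, versus 25% expected.
hla_z, hla_p = one_sided_proportion_test(0.54, 0.25, 538)
print(hla_z, hla_p)
```

Under this approximation the observed sharing lies more than 15 standard errors above the chance expectation.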
The precise identity of the HLA-region predisposing gene or genes is not yet clear. Different combinations of HLA-region alleles may confer varying degrees of susceptibility and resistance. There is also evidence that the sex of the parent transmitting the HLA-region gene may influence the strength of the susceptibility effect. For example, Field[14] reported that children with IDDM inherited DR4 from a DR4-heterozygous mother less often than from a DR4-heterozygous father (p = 0.001). This finding suggests that DR4-associated susceptibility transmitted through mothers is less diabetogenic than that transmitted through fathers.
Non-HLA-region genes
Theoretical modelling suggests the existence of IDDM-predisposing genes outside the HLA region.[28] This possibility is also suggested by the higher risk of IDDM among monozygotic twins of people with IDDM (34% chance of IDDM) than among HLA-identical siblings of people with IDDM, who share only half of their non-HLA genes (14% chance of IDDM). The total number of IDDM-susceptibility loci is unknown. A genome screen of the nonobese diabetic (NOD) mouse, a model for human IDDM, uncovered linkage evidence for 10 distinct loci;[29] there could be as many or more predisposing loci in humans.
Insulin-gene region (IDDM2)
About a decade ago, studies of unrelated patients with IDDM produced evidence that a variable number tandem repeat (VNTR) marker in the 5´ flanking region of the insulin gene (INS) on chromosome 11p15 was associated with IDDM. The frequency of short repeat "class 1" alleles of the VNTR marker was significantly greater in people with IDDM than in unrelated controls (e.g., 88% versus 67%, according to a study by Bell, Horita and Karam[30]). Paradoxically, analysis of INS-region gene sharing in pairs of affected siblings has revealed no evidence of linkage between IDDM and INS-region genes,[8,31] which initially caused investigators to believe that the association results were erroneous, a consequence of mismatched controls. However, association analysis with the AFBAC family-based method verified the association between IDDM and the INS VNTR marker.[9,10] Indeed, the class 1 transmitted and nontransmitted frequencies were remarkably similar (e.g., 89% versus 71%[9]) to the class 1 frequencies reported among randomly selected people with IDDM and controls.
These results suggest that association analysis may be more sensitive than linkage analysis for detecting the effects of some genes that predispose people to complex disorders. These findings also emphasize the importance of performing careful association studies before considering any region of the genome void of IDDM-susceptibility loci. In other words, "linkage exclusion maps" for complex traits may be counterproductive.
The cause of the INS-region association with IDDM remains unclear. The 5´ VNTR marker itself may be a regulatory element;[32] recent studies have suggested that high-risk class 1 alleles are correlated with increased insulin-gene transcription levels,[33,34] although some data are inconsistent with this hypothesis.[35] The VNTR marker may affect transcription of some other gene, such as the tightly linked insulin-like growth factor 2 (IGF2) locus. Insulin autoantibodies (IAA), found in about 40% of newly diagnosed cases of IDDM,[36] were not associated with the INS-region genotype in a study of 105 people with newly diagnosed IDDM (Field and Palmer: unpublished data, 1990).
Search for new IDDM-susceptibility genes
Since late 1994, rapid progress has been made in identifying new genes that contribute to IDDM predisposition. Reports of at least 8 additional susceptibility loci (IDDM3, IDDM4, IDDM5, IDDM7, IDDM8, IDDM11, IDDM12 and IDDM13) bring to 10 the current total number of IDDM-susceptibility genes assigned unique symbols by the Human Gene Mapping Nomenclature Committee and published with complete experimental details.
Field and associates have been performing a genome screen using 250 families with 2 or more children with IDDM. These families were identified through diabetes clinics across Canada, the British Diabetes Association (BDA) Warren Repository[37] and the Human Biological Data Interchange (HBDI) in Philadelphia.[38] The genetic markers tested for linkage with susceptibility to IDDM were primarily simple sequence repeat polymorphisms such as (CA)n repeats, also called microsatellites.[39] In these families, the expected strong linkage between IDDM and the HLA-DR/DQ loci (IDDM1) was demonstrated; with a disease allele frequency of 0.25 and an intermediate penetrance model, the maximum lod score was 17.4.[40] The study of these families failed to demonstrate significant evidence of linkage between IDDM and IDDM2 with the use of a microsatellite at the tyrosine hydroxylase (TH) locus near the insulin-gene VNTR marker, either by lod score or affected sibling pair linkage analysis.[31] However, Field and Nagatomi detected a significant association of TH with IDDM through AFBAC analysis ("double simplex" analysis, p = 0.013) (unpublished data, 1995). Thus, this set of families again showed that IDDM2 was detectable by association, but not by linkage, with the TH rather than the VNTR marker.
IDDM3 was localized to chromosome 15q26 by linkage to marker D15S107, with a maximum lod score of 2.5 and p value from analysis of affected sibling pairs of 0.001.[40] Its localization has been confirmed in 2 independent datasets, 1 consisting of 31 US families, with a p value from analysis of affected sibling pairs of 0.020,[41] and another consisting of 81 Danish families, with a p value from analysis of affected sibling pairs of 0.010.[42] An interesting feature of IDDM3 is that most of the evidence of linkage was found in families who had little evidence of predisposition through HLA-region genes (i.e., in whom affected sibling pairs did not show increased sharing of HLA genes).[40] These results suggest that the IDDM3 susceptibility locus and the HLA-region predisposition act in a biologically independent manner. Field, Tobias and Magnus[40] did not find a significant association between IDDM and D15S107, indicating that the marker is not close to IDDM3 or that linkage disequilibrium is weak or undetectable.
The localization of IDDM4 near the fibroblast growth factor 3 (FGF3) locus on chromosome 11q13 was reported simultaneously and independently by 3 research groups.[18,40,43] In the dataset of Field, Tobias and Magnus,[40] linkage between IDDM4 and FGF3 (maximum lod score 1.1, affected sibling pairs p = 0.004) was not as strong as that between IDDM3 and D15S107, and there was no evidence of differences in IDDM4 linkage between HLA-defined subsets of families. Further analysis by Field and associates has produced evidence of association between IDDM and D11S480 (p = 0.012). D11S480 is located about 6 cM proximal to FGF3. In the HBDI subset of families, D11S480 demonstrated stronger evidence of linkage than did FGF3. It is possible that IDDM4 is closer to D11S480 than to FGF3. Interestingly, Luo and associates[41] reported that the strongest linkage to IDDM4 was obtained with D11S1337, a marker 5 cM proximal to FGF3.
Localization of IDDM5 near the estrogen receptor (ESR) locus on chromosome 6q25 was reported by Davies and associates.[18] Field and Tobias found weak evidence of linkage to ESR in their 25 Canadian families (maximum lod score 0.15), whereas evidence from all 250 families was clearly positive (maximum lod score 2.0, analysis of affected sibling pairs p = 0.001). No evidence of association with ESR was detected.
IDDM7 on chromosome 2q31-q33 was localized independently by 2 groups, Owerbach and Gabbay,[44] by linkage to the HOXD8 locus (MLS 2.6), and Copeman and associates,[45] by association and linkage with marker D2S152 (MLS 1.3). Antibodies to glutamate decarboxylase (GAD) appear early in the preclinical phase of IDDM, leading some to suggest that GAD could be the initiating target of autoimmunity.[46] GAD1, which encodes GAD67, has been mapped to 2q31[47] and is thus a candidate for IDDM7. Field and Tobias have confirmed linkage to 2q with the use of the marker D2S103 in their families (maximum lod score 1.4, analysis of affected sibling pairs p = 0.001).
Luo and associates[41] localized IDDM8 to 6q25-q27, more than 28 cM distal to IDDM5, by linkage to D6S446 (MLS 2.8) and D6S264 (MLS 2.0). Davies and associates[18] also reported evidence of linkage to marker D6S264. It therefore appears that there may be 2 distinct IDDM-susceptibility loci on the long arm of chromosome 6.
Recently, Field, Tobias and Thomson[19] reported evidence of a new IDDM-susceptibility locus (assigned the symbol IDDM11 by the Human Gene Mapping Nomenclature Committee), for which there is highly significant evidence of linkage to D14S67 on chromosome 14q24.3-q31 from affected sibling pair analysis (p = 0.00001) and by maximum likelihood analysis (lod score 4.0). As with IDDM3, this locus appears to be most important in families with less evidence of HLA involvement. In 99 families in whom affected sibling pairs did not have increased sharing of HLA genes (HLA sharing was 50% or less), the lod score rose to 4.6, whereas in 147 families in whom affected pairs had increased HLA sharing (HLA sharing was more than 50%), the lod score dropped to 0.4.[19] The difference in strength of linkage to D14S67 between the HLA-defined subsets of families was statistically significant (p = 0.009). Field's laboratory has typed markers in all reported IDDM regions except IDDM8; in the set of families studied, IDDM11 demonstrated stronger evidence of linkage than any susceptibility locus other than HLA. No significant association between IDDM and D14S67 was found.
The heterogeneity of IDDM11 linkage between families with and without increased HLA sharing suggests that IDDM11 and HLA act in a biologically independent manner (perhaps through independent biochemical pathways) to affect susceptibility. Thus, families with a strong HLA-region predisposition are less likely to carry strong IDDM11 predisposition (although they may), and families with strong IDDM11 predisposition are less likely to also carry strong HLA-region predisposition (although they may). Other IDDM-susceptibility genes may act together, synergistically increasing the risk of IDDM. For example, Field and associates[48] presented evidence that families with increased sharing at D15S107 (a marker for IDDM3) had a higher frequency of increased sharing at D14S67 (marker for IDDM11) than did families without increased sharing at D15S107. Furthermore, although in the total dataset there was no association between IDDM and D14S67, in the set of families with increased D15S107 sharing, IDDM was significantly associated with D14S67 (p = 0.015). These results suggest that IDDM11 and IDDM3 may interact to produce susceptibility.
Most recent reports suggest that there is more than 1 IDDM-susceptibility locus on the long arm of chromosome 2. Indeed, names have been assigned to 2 more (IDDM12 and IDDM13) in addition to IDDM7. IDDM12 was reported to be linked to (MLS 3.2) and associated with the CTLA-4 (cytotoxic T lymphocyte associated-4) locus at 2q33, about 10 cM distal to IDDM7, by Nistico and associates[49] in Italian and Spanish datasets, but not in US, British or Sardinian datasets. This group of investigators suggested that IDDM7 and IDDM12 are distinct loci, since evidence of association was detected at both regions in the Italian data. Morahan and associates[50] localized IDDM13 in Australian families by linkage to D2S164 at 2q34 (MLS 3.3), but found no significant evidence of linkage to CTLA-4. Interestingly, they reported that most of the IDDM13 linkage evidence was contained in affected sibling pairs who did not share both HLA haplotypes, a phenomenon similar to that which Field and associates[19,40] observed for IDDM3 and IDDM11.
Issues in the genetics of complex disorders
This review raises several issues that are complicating our attempts to dissect the genetics of complex disorders. One of the most problematic and controversial is: What constitutes sufficiently strong evidence to conclude that a susceptibility locus exists? It is now very apparent that there is linkage heterogeneity between datasets for the non-HLA IDDM loci, which may be due to (1) lack of true linkage, (2) sampling variation, causing successful replication to be more difficult than initial detection of linkage, (3) population variation (differing frequencies of various susceptibility loci between different ethnic groups studied) or (4) a combination of (2) and (3). For example, in the case of IDDM3, there is significant evidence of linkage from the combined Canadian, British and US data of Field, Tobias and Magnus,[40] confirmation of linkage by Luo and associates[41] in an independent US dataset and by Zamani and associates[42] in a Danish dataset, but lack of evidence of linkage in Italian families analysed by Luo and associates.[51] This may reflect interpopulation, i.e., ethnic, variation. Similarly, Field, Tobias and Thomson[19] saw no evidence of linkage to IDDM11 in their Canadian families, even though there was strong and significant evidence in British and US families. This finding may reflect sampling variation, or a mixture of sampling and ethnic variation. Either way, it is now clear that large numbers of families with differing genetic backgrounds are needed to determine the causes of observed linkage heterogeneity for any particular IDDM-susceptibility locus. Although guidelines have been proposed,[52,53] the scientific community has not yet reached consensus on which criteria to use to declare a disease locus "real" when we are faced with linkage heterogeneity for loci involved in multigene, complex disorders.
Since we do not know how many loci actually predispose people to IDDM, when do we conclude that a region of the genome can be excluded from further consideration? Given that IDDM2 was detected by association alone and has never been demonstrated through simple linkage analysis, we must be cognizant of the fact that other susceptibility loci may be missed if we rely solely on genome screening through linkage analysis. Furthermore, the ever-present possibility of sampling or ethnic heterogeneity means that exclusion mapping should be approached very cautiously.
A final issue looms on the horizon. How are we going to further localize these susceptibility loci, so that they can be cloned and characterized? Even if a strong candidate locus presents itself in the vicinity of a detected linkage, it may take a long time to establish or eliminate its involvement. For most susceptibility loci, we are not dealing with mutant alleles in people with IDDM and wildtype alleles in unaffected people. Rather, some alleles are more or less predisposing than others. Thus, recombinants cannot be identified within families to narrow the region containing a susceptibility locus. Furthermore, since the effects of these non-HLA loci are relatively weak, and families with multiple affected members usually have only 2 (as opposed to numerous) affected members, it is difficult to tell which families are segregating at which susceptibility loci. Researchers hope that linkage disequilibrium mapping, with the use of association analysis, will at least lead us closer to the culprit loci. However, it is not yet clear whether this approach will enable the disease-susceptibility genes to be physically isolated. Most likely a combination of approaches, including functional analyses at the molecular level, will be required.
The genetic basis of IDDM is beginning to be unravelled. Ten susceptibility loci have been localized, and undoubtedly a few more await discovery. HLA-region susceptibility was the first identified and remains the strongest. Therefore, we may characterize the inheritance of IDDM as due to a major locus, with several independent or interacting minor loci. The challenges of the future will be to determine which combinations of IDDM genes are particularly diabetogenic, to isolate the actual coding sequences involved and to delineate their biological functions and interrelations.
Acknowledgements
We thank Drs. Eleanor Colle, Heather Dean, Monique Gonthier, Robert McArthur, Ed Ryan, Dave Stephure, Laura Stewart and Becky Trussell for assisting in the collection of Canadian families; Elzbieta Swiergala and Jane Nagatomi for laboratory expertise; Tim Magnus, Mark Nelson, Glenys Thomson, and Zhao Zhang for computing contributions; Rose Hodorek for careful secretarial assistance; and all the families who generously participated in this research.
This research was funded by the Medical Research Council of Canada (grant MT-7910) and by the Canadian Genetic Diseases Network of the Network of Centres of Excellence program of the Canadian and Alberta governments. Dr. Field is a Scientist of the Alberta Heritage Foundation for Medical Research.
References