Environmental Health Perspectives (Environ Health Perspect). ISSN 0091-6765 (print), 1552-9924 (electronic). National Institute of Environmental Health Sciences. PMID 17384770; PMCID PMC1817705; doi:10.1289/ehp.8870; ehp0115-000231.

Research

A Statistical Model for Assessing Genetic Susceptibility as a Risk Factor in Multifactorial Diseases: Lessons from Occupational Asthma

Eugene Demchuk (1,2,*), Berran Yucesoy (2,*), Victor J. Johnson (2,*), Michael Andrew (3), Ainsley Weston (2), Dori R. Germolec (4), Christopher T. De Rosa (1), Michael I. Luster (2)

(1) Division of Toxicology and Environmental Medicine, Agency for Toxic Substances and Disease Registry, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
(2) Toxicology and Molecular Biology Branch and (3) Biostatistics and Epidemiology Branch, Health Effects Laboratory Division, National Institute for Occupational Safety and Health, Centers for Disease Control and Prevention, Morgantown, West Virginia, USA
(4) Toxicology Operations Branch, Environmental Toxicology Program, National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services, Research Triangle Park, North Carolina, USA

Address correspondence to B. Yucesoy, Chronic Inflammatory and Immune Disease Team, Toxicology and Molecular Biology Branch, Health Effects Laboratory Division, NIOSH/CDC, 1095 Willowdale Rd., M/S 3014, Morgantown, WV 26505-2888 USA. Telephone: (304) 285-5993. Fax: (304) 285-5708. E-mail: byucesoy@cdc.gov

*These authors provided equal contributions.

The authors declare they have no competing financial interests.

Environ Health Perspect 115(2):231–234 (February 2007). Received 21 November 2005; accepted 13 November 2006; published online 13 November 2006.

Publication of EHP lies in the public domain and is therefore without copyright. All text from EHP may be reprinted freely. Use of materials published in EHP should be acknowledged (for example, “Reproduced with permission from Environmental Health Perspectives”); pertinent reference information should be provided for the article from which the material was reproduced. Articles from EHP, especially the News section, may contain photographs or illustrations copyrighted by other commercial organizations or individuals that may not be used without obtaining prior approval from the holder of the copyright.

Background

Incorporating the influence of genetic variation in the risk assessment process is often considered, but no generalized approach exists. Many common human diseases such as asthma, cancer, and cardiovascular disease are complex in nature, as they are influenced variably by environmental, physiologic, and genetic factors. The genetic components most responsible for differences in individual disease risk are thought to be DNA variants (polymorphisms) that influence the expression or function of mediators involved in the pathological processes.

Objective

The purpose of this study was to estimate the combinatorial contribution of multiple genetic variants to disease risk.

Methods

We used a logistic regression model to help estimate the joint contribution that multiple genetic variants would have on disease risk. This model was developed using data collected from molecular epidemiology studies of allergic asthma that examined variants in 16 susceptibility genes.

Results

Based on the product of single gene variant odds ratios, the risk of developing asthma was assigned to genotype profiles, and the frequency of each profile was estimated for the general population. Our model predicts that multiple disease variants broaden the risk distribution, facilitating the identification of susceptible populations. This model also allows for incorporation of exposure information as an independent variable, which will be important for risk variants associated with specific exposures.

Conclusion

The present model provided an opportunity to estimate the relative change in risk associated with multiple genetic variants. This will facilitate identification of susceptible populations and help provide a framework to model the genetic contribution in probabilistic risk assessment.

Keywords: asthma, genetics, polygenic diseases, risk assessment, susceptibility genes

Common diseases of a chronic inflammatory nature such as asthma, Alzheimer disease, and cardiovascular disease are complex in nature, as they are variably influenced by genetic inheritance as well as environmental, physical, and lifestyle factors. Although genetic variants and their interactions probably define most interindividual variability in common disease susceptibility related to genetics (Moore 2003; Newton-Cheh and Hirschhorn 2005), they generally possess low or incomplete penetrance and consequently show low-risk associations in epidemiologic studies [e.g., odds ratios (ORs) ~ 1.5–2] (Hirschhorn et al. 2002; Lohmueller et al. 2003). Thus, for genetic variants to significantly affect disease severity or incidence, they must act cumulatively. Applying the composite genetic contribution to the risk assessment process would allow for identifying the most genetically susceptible groups in the population. In light of this, a multiplicative gene–gene interaction model was developed to allow for estimating the combinatorial contribution of multiple genetic variants to disease risk. To illustrate the utility of this model, asthma was selected as an example of a common multifactorial disease as the pathological processes have been well established and a number of genetic variants that influence the disease have been identified in association studies. Data were compiled from 14 genetic association studies linking 16 susceptibility variants in inflammatory, immune, and chemical metabolism genes to the risk of developing disease. Our model predicts that a broad heterogeneity exists in the population disease risk defined by genetic variation. The broadened risk profile is amenable, however, to segregating the population by relative risk level, which should allow for identification of the most susceptible populations. 
The current limitations and assumptions of this approach, which include lack of joint distributions, limited information on epistasis and the influence of other potential variables, such as exposure, are discussed.

Materials and Methods

Study design

Population-based genetic association studies deal with relatively small effects against a complex background. Therefore, association studies are often statistically under-powered and poorly standardized. General concerns include a lack of attention to sampling and study design, inconsistent criteria for clinical assessment, population stratification, the use of genetic markers that are only modestly correlated with disease, and publication bias. Considering these concerns, we extracted data from a public database (PubMed 2004) using the terms “asthma,” “polymorphism,” and “gene.” We included studies that followed standard asthma diagnosis criteria (physician-diagnosed asthma), used case–control study design, and described associations with p-values < 0.05 in the analyses to help limit potential false positive associations. The genetic variants we selected were not intended to be an exhaustive list of published variants of candidate genes that have been associated with asthma but rather representative of those in which significant associations have been repeatedly observed, known to cause changes in protein expression, and act through established pathways for allergic response (Blumenthal 2005; Malerba and Pignatti 2005). As reflected in the published literature, most of the variants included in the analyses are associated with increased risk for developing asthma rather than decreased risk. Hence, we included only one variant that is considered protective.

Although published genetic association studies have used a variety of methods for presenting results, we selected disease-associated variant genotypes as opposed to allele frequencies, as the relationship of the latter to disease has not been clearly defined. Most of the genes and chromosomal regions that have been associated with disease are linked to chromosomes 5q, 11q, 12q, and 6p. We stratified candidate genes into three groupings based on their role in the pathogenesis of asthma. The first group (12 variants) included genes related to inflammation and immune cascades known to be involved in allergic asthma, such as the interleukin 4 (IL-4) receptor variant R576. The second group consisted of atopy-associated gene variants contained within the human leukocyte antigen (HLA) class II family. The third grouping consisted of variants associated with chemical metabolism, represented by the N-acetyltransferase (NAT1) polymorphism associated with slow acetylation. The genes and variants used in the analyses are presented in Table 1.

Statistical model

We modeled the single-gene variants listed in Table 1 as binary variables (variant genotype present or absent) and generated polygenotypes from single-gene genotypes using a recursive binomial scheme. Under this scheme all possible combinations of single-gene polymorphisms are considered, and the total number of polygenic profiles is 2^n, where n is the number of genes used in the analysis (16 in the present study). We estimated the frequency of each genotype profile as the product of the epidemiologically derived single-gene frequencies, taking the variant frequency where the variant is present and its complement where it is absent. Susceptibility to disease was expressed in terms of ORs. Polygenic ORs were calculated from single-gene ORs under the assumption of genetic independence (absence of linkage disequilibrium); that is, for each variant, the enrichment or depletion of cases with that variant does not affect the frequency of any other variant. Therefore, single-gene frequencies multiply to estimate the frequency of polygenotypes. The proposed model also assumes that the selected genes are biologically independent; thus, no epistasis at the level of protein function is considered, and we used a logistic regression model without interaction cross-terms. This results in a multiplicative OR for a polygenotype, in which the combinatorial genotype OR is generated simply by multiplying the individual ORs of the variants that are present in a specific genotype profile.
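The enumeration described above can be sketched in a few lines of Python. This is an illustration under the stated independence assumptions, not the authors' implementation: the function name polygenotype_profiles is hypothetical, and the three-variant subset (TNF-α, IL-10, and NAT1, with frequencies and ORs taken from Table 1) was chosen here only to keep the output small.

```python
from itertools import product
from math import prod


def polygenotype_profiles(variants):
    """Enumerate all 2**n presence/absence profiles.

    variants: list of (name, variant_frequency, odds_ratio) tuples.
    Returns a list of (flags, frequency, odds_ratio), where flags is a
    tuple of 0/1 values (1 = variant genotype present).
    """
    profiles = []
    for flags in product((0, 1), repeat=len(variants)):
        # Genetic independence: multiply f where the variant is present
        # and (1 - f) where it is absent.
        freq = prod(f if x else 1.0 - f
                    for x, (_, f, _) in zip(flags, variants))
        # No interaction cross-terms: multiply the ORs of present variants.
        odds_ratio = prod(o for x, (_, _, o) in zip(flags, variants) if x)
        profiles.append((flags, freq, odds_ratio))
    return profiles


# Illustrative subset of Table 1 (an assumption made for brevity).
example = [("TNF-alpha", 0.223, 1.505),
           ("IL-10", 0.289, 0.278),
           ("NAT1", 0.250, 8.625)]

profiles = polygenotype_profiles(example)
for flags, freq, odds_ratio in profiles:
    print(flags, round(freq, 4), round(odds_ratio, 3))
```

Because each gene contributes either f or 1 − f, the profile frequencies sum to 1, and the profile with no variants present has an OR of 1 by construction.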

Results

ORs obtained from 16 genetic variants reported to be associated with allergic asthma were used to estimate the contribution of genetic variation to disease risk. Each genotype was assigned a binary categorical variable representing either the wild-type genotype (O) or the variant (minor) genotype (X) identified in each of the selected studies. Thus, each possible combination can be represented as a 16-dimensional profile where, for instance, {XXXXXXXXXXXXXXXX} denotes a genotype profile that contains only minor variants. We obtained the frequency for each profile from the reported frequencies in each original study (Table 1). Control frequencies from each study were reported to be consistent with those found in the general population with similar ethnicities. Figure 1 summarizes the relationship between the frequency of each of the 65,536 (2^16) potential genotypic profiles and the risk of developing allergic asthma under the described model, and illustrates the concept that susceptibility variants can shift the risk distribution to the right or left depending on whether the variant has an adverse or protective role, respectively. The genotype profiles represented in Figure 1 are enriched with genotypes that increase the risk of asthma, which accounts for the right-sided skew in the scatterplot. The arrow in this diagram indicates the location of the wild-type genotype profile {OOOOOOOOOOOOOOOO} with its associated OR of 1. It is evident that frequency and magnitude of risk are strongly inversely related, such that very high-risk genotypes are exceedingly rare in the population; in fact, the highest-risk polygenotype is so rare that it is unlikely to even exist. The genotypes that have an OR < 1 are due to the inclusion of the protective –627 polymorphism in the interleukin 10 (IL-10) gene (Hang et al. 2003), which reduces the overall risk for developing asthma. The right-sided skew shown in Figure 1 is consistent with current evidence that the vast majority of identified variants have been associated with an adverse rather than protective contribution (Ober and Hoffjan 2006). It is not known whether this predominance is evolutionarily driven or arises because adverse variants are more actively studied and identified than protective ones.
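To make the scale of this enumeration concrete, the sketch below walks through all 2^16 profiles using the frequencies and ORs listed in Table 1 and picks out the referent, all-minor, highest-risk, and lowest-risk profiles. It is a minimal reconstruction under the independence assumptions stated in the Methods; the names TABLE_1 and enumerate_profiles are hypothetical, and gene names are written in ASCII.

```python
from itertools import product

# (gene, variant-genotype frequency, OR) as listed in Table 1.
TABLE_1 = [
    ("TGF-beta", 0.117, 2.456), ("TLR-10", 0.034, 2.237),
    ("TNF-alpha", 0.223, 1.505), ("MCP-1", 0.089, 2.703),
    ("IL-13", 0.019, 7.756), ("CD-14", 0.098, 3.143),
    ("IL-18", 0.109, 1.830), ("IL-10", 0.289, 0.278),
    ("RANTES", 0.219, 2.233), ("IL-4R", 0.018, 8.185),
    ("ACE", 0.160, 4.472), ("FceRI-beta", 0.252, 2.155),
    ("HLA-DQA1", 0.081, 8.774), ("HLA-DQB1", 0.083, 6.794),
    ("HLA-DRB1", 0.026, 24.588), ("NAT1", 0.250, 8.625),
]


def enumerate_profiles(table):
    """Yield (profile string, frequency, OR) for every O/X profile."""
    for flags in product("OX", repeat=len(table)):
        freq, odds_ratio = 1.0, 1.0
        for flag, (_, f, o) in zip(flags, table):
            if flag == "X":          # variant genotype present
                freq *= f
                odds_ratio *= o
            else:                    # wild-type genotype
                freq *= 1.0 - f
        yield "".join(flags), freq, odds_ratio


profiles = list(enumerate_profiles(TABLE_1))
wild_type = profiles[0]    # product() yields the all-"O" profile first
all_minor = profiles[-1]   # and the all-"X" profile last
highest = max(profiles, key=lambda p: p[2])
lowest = min(profiles, key=lambda p: p[2])

print(len(profiles))       # 65536 = 2**16
for label, p in [("referent", wild_type), ("all minor", all_minor),
                 ("highest OR", highest), ("lowest OR", lowest)]:
    print(f"{label}: {p[0]}  frequency={p[1]:.3e}  OR={p[2]:.3f}")
```

Under these inputs the highest-OR profile carries every variant except the protective IL-10 variant, and the lowest-OR profile carries the IL-10 variant alone.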

Examination of a single susceptibility gene can separate the study population into only two risk groups, those with and those without the variant. In contrast, modeling the impact of multiple disease variants associated with immune and inflammatory mediators of allergic asthma (group 1 variants) provides a pseudo-continuous log-normal relative disease risk distribution in the population (Figure 2A). Inclusion of variants associated with atopy (Figure 2B) and acetylation rate (Figure 2C) further shifts the distribution toward higher risk. Equally evident is the impact of combining variants on the standard deviation of disease risk in the population. As we added more disease variants to the model, the risk distribution broadened, allowing better separation of the population into high- and low-risk categories. The frequencies associated with such risk levels will be important in defining susceptible populations that need increased protection with respect to exposure, as well as for risk management.
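The shift and broadening can be checked without enumerating profiles. Under the independence model, ln(OR) of a random individual is a sum of independent terms, each equal to ln(OR_i) with probability f_i and 0 otherwise, so its mean is the sum of f_i ln(OR_i) and its variance is the sum of f_i (1 − f_i) ln(OR_i)^2. The sketch below applies these formulas to the Table 1 values as variant groups are added; the function name log_or_moments and the variable names are hypothetical, and the calculation is an illustration rather than the authors' procedure.

```python
from math import log, sqrt

# (frequency, OR) pairs from Table 1, grouped as in the text.
GROUP_I = [(0.117, 2.456), (0.034, 2.237), (0.223, 1.505), (0.089, 2.703),
           (0.019, 7.756), (0.098, 3.143), (0.109, 1.830), (0.289, 0.278),
           (0.219, 2.233), (0.018, 8.185), (0.160, 4.472), (0.252, 2.155)]
GROUP_II = [(0.081, 8.774), (0.083, 6.794), (0.026, 24.588)]
GROUP_III = [(0.250, 8.625)]


def log_or_moments(variants):
    """Mean and SD of ln(OR) when variants occur independently."""
    mean = sum(f * log(o) for f, o in variants)
    variance = sum(f * (1.0 - f) * log(o) ** 2 for f, o in variants)
    return mean, sqrt(variance)


results = []
for label, variants in [("A: group I", GROUP_I),
                        ("B: groups I + II", GROUP_I + GROUP_II),
                        ("C: groups I + II + III",
                         GROUP_I + GROUP_II + GROUP_III)]:
    mean, sd = log_or_moments(variants)
    results.append((label, mean, sd))
    print(f"{label}: mean ln(OR) = {mean:.3f}, SD ln(OR) = {sd:.3f}")
```

Each added variant contributes a non-negative term to the variance, which is why the distribution can only broaden as variants are added, and a sum of many independent terms on the log scale is what makes the approximately log-normal shape plausible.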

The present model provided an opportunity to quantify the relative change in risk associated with the presence of genetic variants in the general population. This is exemplified in Figure 3, where the dashed gray line represents the risk profile for the most common genotypes modeled from the 12 asthma susceptibility genes (group 1 variants) and the solid blue line shows the risk profile when the NAT1 variant is added. These curves indicate that, in individuals carrying the NAT1 variant, the risk of asthma increases approximately 2-fold or more in 20% of the possible polygenotypes present in a population of workers exposed to diisocyanates. Acetylation rate is thought to affect the metabolism of diisocyanates, which in turn correlates with differences in diisocyanate-induced asthma rates (Wikman et al. 2002). If only those variants common to allergens (first group) are considered, one would estimate that 20% of the population would have at least a 6-fold increase in susceptibility relative to the referent genotype profile. Thus, this model allows for incorporation of exposure information as an independent variable, illustrating why variants such as those involved in atopy or chemical metabolism would need to be included separately in identifying the number of individuals in a population at increased risk.

Discussion

We used a logistic regression model to estimate the joint contribution of multiple genetic variants on the risk of developing allergic asthma. Allergic asthma data sets were used because disease prevalence is relatively high—estimated to be approximately 7.5% (range, 5.2–10.3%) among the U.S. population (Mannino et al. 2002)—and the pathological processes as well as many of the disease mediators have been identified (Barrios et al. 2006). The latter allowed for an additional level of confidence in that the genetic variants selected for modeling are associated with well-established pathological processes. Although data sets from other common polygenic diseases may have sufficed, such as Alzheimer or cardiovascular disease, their pathological processes are less well defined.

Single-genotype ORs provided by genetic association studies are the available input for modeling the polygenotype–disease association. ORs are functions of the logistic regression coefficients. Thus, the logistic regression model, which is commonly used in epidemiology studies, provides a straightforward approach for combining single-genotype ORs to model the combinatorial genotype ORs (Kleinbaum and Klein 2002). However, how accurately this model captures true polygenic susceptibility remains to be determined. Currently, our laboratory in conjunction with a National Institute for Occupational Safety and Health–funded multicenter asthma genotype program (RO1 OH008795-01) centered at the University of Cincinnati is collecting data on multiple variants in a single population to help establish the validity of this model.
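The link between the regression coefficients and the multiplicative rule can be written out explicitly. The following is a sketch in notation assumed here rather than taken from the original text: x_i ∈ {0, 1} indicates whether variant i is present, β_i is its logistic regression coefficient, D denotes disease status, and the OR of a profile is taken relative to the all-wild-type profile.

```latex
\begin{align}
\operatorname{logit} P(D = 1 \mid x_1, \dots, x_n)
  &= \beta_0 + \sum_{i=1}^{n} \beta_i x_i, \\
\mathrm{OR}_i &= e^{\beta_i}, \\
\mathrm{OR}(x_1, \dots, x_n)
  &= \exp\!\Big(\sum_{i=1}^{n} \beta_i x_i\Big)
   = \prod_{i=1}^{n} \mathrm{OR}_i^{\,x_i}.
\end{align}
```

With no cross-terms of the form β_ij x_i x_j, the exponent is a plain sum, so the profile OR factors into the single-variant ORs of the variants that are present.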

A major limitation of using a multiplicative interaction model to derive polygenic risk from single-gene studies is that epistatic relationships are not considered. The model assumes no statistical interaction and therefore does not account for potential biological interactions at the protein level that may modify risk, even though epistasis likely plays a role in determining complex phenotypes such as allergic asthma. However, epistatic relationships can be estimated only from efforts to genotype functional variants in all potential target genes in a single population. This presents a potential problem because the population frequency of polygenotypes is generated from the product of single-gene frequencies, making complex polygenotypes very rare. Therefore, as the number of genes increases, the number of individuals required to estimate polygenic risk increases markedly, necessitating a modeling approach. This is especially true for occupational populations, given the low number of employees exposed to a given occupational allergen and the even lower incidence of disease. It is possible that the effects of epistasis in multifactorial diseases are relatively modest. For example, a recent epidemiologic study of breast cancer demonstrated that only 17% of three-gene combinations showed statistical evidence of epistasis (Aston et al. 2005). Simpler schemes to help define epistasis may involve interactions derived from genomic and proteomic data, which can allow for decoding transcriptional and posttranscriptional interaction networks (Johnson et al. 2004). As more reliable biological and epidemiologic information regarding joint effects and epistasis becomes available, new patterns of interaction can be added to the model, which will allow for more accurate risk estimates.
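The sample-size problem can be illustrated with a rough calculation. Assuming independent variants, the sketch below computes the joint frequency of carrying all of the first k variants in a list and the smallest sample in which at least one such carrier would be expected with a given probability. The function name min_sample_size, the 95% threshold, and the choice of the first four Table 1 frequencies are assumptions made for illustration; estimating an OR for the combination would require far more individuals than merely observing one carrier.

```python
from math import ceil, log, log1p, prod


def min_sample_size(profile_freq, detect_prob=0.95):
    """Smallest N with P(at least one carrier among N) >= detect_prob.

    Solves 1 - (1 - p)**N >= detect_prob for N.
    """
    return ceil(log(1.0 - detect_prob) / log1p(-profile_freq))


# First four variant frequencies from Table 1 (illustrative choice).
freqs = [0.117, 0.034, 0.223, 0.089]

sizes = []
for k in range(1, len(freqs) + 1):
    joint = prod(freqs[:k])   # independence: frequencies multiply
    sizes.append(min_sample_size(joint))
    print(f"{k} variant(s): joint frequency = {joint:.2e}, "
          f"N for 95% chance of one carrier = {sizes[-1]}")
```

Each additional variant multiplies the joint frequency by a number well below 1, so the required sample grows roughly geometrically with the number of genes.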

Genetic independence is another assumption when using this model. Linkage disequilibrium is the deviation from probabilistic independence between alleles at two different loci. This deviation from independence can have different causes, such as a lack of independent segregation or recombination, or any number of other evolutionary forces. Therefore, an association of a certain genetic marker with disease may reflect the etiologic role of the locus of interest but not of the marker itself. Since a multiplicative approach for the joint effects of genotypes between loci was assumed in this model, only the gene variants known not to be in linkage disequilibrium were considered.
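For reference, the standard population-genetics measures of linkage disequilibrium between alleles A and B at two loci can be written as follows. The symbols are introduced here for illustration and do not appear in the original text: p_A and p_B are the allele frequencies and p_AB is the frequency of the haplotype carrying both alleles.

```latex
\begin{align}
D_{AB} &= p_{AB} - p_A \, p_B, \\
r^2 &= \frac{D_{AB}^{2}}{p_A (1 - p_A) \, p_B (1 - p_B)}.
\end{align}
```

When D_AB = 0 the haplotype frequency equals the product of the allele frequencies. The analogous independence at the genotype level is what the model relies on when it multiplies single-gene frequencies to obtain polygenotype frequencies.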

The choice of mode of inheritance (allelic or genotypic) used for analyses can have a marked impact on risk estimates. Most genetic association studies reduce three genotypes to two by using recessive (assuming heterozygotes have no increased risk), co-dominant (a per-allele effect that places heterozygotes halfway between minor and major homozygous genotypes), or dominant genetic models (in which heterozygotes have the same increased risk as minor homozygous genotypes). However, some studies ignore the heterozygotes and compare only minor and major homozygous genotypes. Because the biological function of the variations is rarely known, it is difficult to determine the mode of inheritance. As indicated by Minelli et al. (2005), if the assumption of genetic model is in doubt, then the best approach would be to perform joint pair-wise comparison, that is, genotype associations. Therefore, using the disease-associated variant genotypes identified in the individual studies as opposed to decomposing the population into allele frequencies is an appropriate approach to capture and model the impact of multiple variants. As biological data regarding the inheritance modes of variants become available, a biologically justified strategy for incorporating each susceptibility variant can be applied.
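The three reductions described above correspond to different codings of the same genotype. The sketch below, with the hypothetical function name encode_genotype, maps the number of minor alleles (0, 1, or 2) to the covariate that would enter a regression model under each genetic model; treating the co-dominant model as a per-allele count follows the description in the text.

```python
def encode_genotype(minor_allele_count, model):
    """Map a genotype (0, 1, or 2 minor alleles) to a model covariate."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1, or 2")
    if model == "recessive":
        # Heterozygotes are grouped with major homozygotes.
        return int(minor_allele_count == 2)
    if model == "dominant":
        # Heterozygotes are grouped with minor homozygotes.
        return int(minor_allele_count >= 1)
    if model == "co-dominant":
        # Per-allele effect: heterozygotes lie halfway between homozygotes.
        return minor_allele_count
    raise ValueError(f"unknown genetic model: {model}")


for model in ("recessive", "co-dominant", "dominant"):
    print(model, [encode_genotype(g, model) for g in (0, 1, 2)])
```

The same heterozygous individual is coded 0, 1, or 1 depending on the model, which is why the assumed mode of inheritance can change both the variant frequency and the OR that feed into the polygenic calculation.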

In conclusion, the increased risk for developing a multifactorial disease based upon disease-susceptibility variants with moderate effects was estimated using a logistic regression model assuming multiplicative gene–gene interactions. Although limited by our current lack of knowledge regarding the role of gene–gene and gene–environment interactions in multifactorial common diseases, such a model, without interaction cross-terms, is the first step in the development of a comprehensive polygenic risk model. These types of analysis can provide information on the relative changes in risk associated with genetic variability found inherently in the population and help provide a framework to model the genetic contribution in probabilistic risk assessment. Such information may also provide opportunities for targeting preventative or therapeutic actions to high-risk populations. In a broader context, the polygenic model for genetic susceptibility contributes to the design of a virtual toxicology testing laboratory, which would help to reduce animal testing and adverse human exposures. With rapid advances in the identification of genetic variants in the population, underscored by the Human Genome and HapMap Projects (The International HapMap Consortium 2003; Pennisi 2001), advances in high throughput genotyping methodology and improved understanding of the molecular events involved in disease processes, key susceptibility polygenotypes driving risk for common complex diseases may be identified.

The findings and conclusions in this report are those of the authors and do not necessarily represent the views of NIOSH.

These studies were supported in part by an inter-agency agreement with the NIEHS, Division of Intramural Research (Y1-ES-69277266), and a grant from the CDC Office of Genomics and Disease Prevention (921Z4FP).

References

Aron Y, Desmazes-Dufeu N, Matran R, Polla BS, Dusser D, Lockhart A. 1996. Evidence of a strong, positive association between atopy and the HLA class II alleles DR4 and DR7. Clin Exp Allergy 26(7):821–828. PMID: 8842557.
Aston CE, Ralph DA, Lalo DP, Manjeshwar S, Gramling BA, DeFreese DC. 2005. Oligogenic combinations associated with breast cancer risk in women under 53 years of age. Hum Genet 116(3):208–221. PMID: 15611867.
Barrios RJ, Kheradmand F, Batts L, Corry DB. 2006. Asthma: pathology and pathophysiology. Arch Pathol Lab Med 130(4):447–451. PMID: 16594736.
Blumenthal MN. 2005. The role of genetics in the development of asthma and atopy. Curr Opin Allergy Clin Immunol 5(2):141–145. PMID: 15764904.
Cui T, Wang L, Wu J, Xie J. 2003. The association analysis of FRIβ with allergic asthma in a Chinese population. Chin Med J (Engl) 116(12):1875–1878. PMID: 14687477.
Entrez Gene. 2006. Entrez Gene Home Page. Bethesda, MD: National Center for Biotechnology Information. Available: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=gene [accessed 11 August 2006].
Gao J, Lin Y, Xiao Y, Xu K, Xu W, Zhu Y. 2000. Polymorphism of angiotensin-converting enzyme gene and genetic susceptibility to asthma with familial aggregation. Chin Med Sci J 15(1):24–28. PMID: 12899394.
Hang LW, Hsia TC, Chen WC, Chen HY, Tsai JJ, Tsai FJ. 2003. Interleukin-10 gene -627 allele variants, not interleukin-I beta gene and receptor antagonist gene polymorphisms, are associated with atopic bronchial asthma. J Clin Lab Anal 17(5):168–173. PMID: 12938145.
Higa S, Hirano T, Mayumi M, Hiraoka M, Ohshima Y, Nambu M. 2003. Association between interleukin-18 gene polymorphism 105A/C and asthma. Clin Exp Allergy 33(8):1097–1102. PMID: 12911784.
Hirschhorn JN, Lohmueller K, Byrne E, Hirschhorn K. 2002. A comprehensive review of genetic association studies. Genet Med 4(2):45–61. PMID: 11882781.
Johnson CD, Balagurunathan Y, Tadesse MG, Falahatpisheh MH, Brun M, Walker MK. 2004. Unraveling gene-gene interactions regulated by ligands of the aryl hydrocarbon receptor. Environ Health Perspect 112:403–412. PMID: 15033587.
Kleinbaum DG, Klein M. 2002. Logistic Regression: A Self-Learning Text. New York: Springer-Verlag.
Lazarus R, Raby BA, Lange C, Silverman EK, Kwiatkowski DJ, Vercelli D. 2004. TOLL-like receptor 10 genetic variation is associated with asthma in two independent samples. Am J Respir Crit Care Med 170(6):594–600. PMID: 15201134.
Lohmueller KE, Pearce CL, Pike M, Lander ES, Hirschhorn JN. 2003. Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nat Genet 33(2):177–182. PMID: 12524541.
Malerba G, Pignatti PF. 2005. A review of asthma genetics: gene expression studies and recent candidates. J Appl Genet 46(1):93–104. PMID: 15741670.
Mannino DM, Homa DM, Akinbami LJ, Moorman JE, Gwynn C, Redd SC. 2002. Surveillance for asthma – United States, 1980–1999. MMWR Surveill Summ 51(1):1–13.
Minelli C, Thompson JR, Abrams KR, Thakkinstian A, Attia J. 2005. The choice of a genetic model in the meta-analysis of molecular association studies. Int J Epidemiol 34(6):1319–1328. PMID: 16115824.
Moore JH. 2003. The ubiquitous nature of epistasis in determining susceptibility to common human diseases. Hum Hered 56(1–3):73–82. PMID: 14614241.
Newton-Cheh C, Hirschhorn JN. 2005. Genetic association studies of complex traits: design and analysis issues. Mutat Res 573(1–2):54–69. PMID: 15829237.
Ober C, Hoffjan S. 2006. Asthma genetics 2006: the long and winding road to gene discovery. Genes Immun 7(2):95–100. PMID: 16395390.
Pennisi E. 2001. What’s next for the genome centers? Science 291(5507):1204–1207. PMID: 11233440.
PubMed. 2004. PubMed Home Page. Bethesda, MD: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health. Available: http://www.ncbi.nlm.nih.gov/entrez [accessed 10 December 2004].
Rosa-Rosa L, Zimmermann N, Bernstein JA, Rothenberg ME, Khurana Hershey GK. 1999. The R576 IL-4 receptor alpha allele correlates with asthma severity. J Allergy Clin Immunol 104(5):1008–1014. PMID: 10550746.
Silverman ES, Palmer LJ, Subramaniam V, Hallock A, Mathew S, Vallone J. 2004. Transforming growth factor-beta1 promoter polymorphism C-509T is associated with asthma. Am J Respir Crit Care Med 169(2):214–219. PMID: 14597484.
Szalai C, Kozma GT, Nagy A, Bojszko A, Krikovszky D, Szabo T. 2001. Polymorphism in the gene regulatory region of MCP-1 is associated with asthma susceptibility and severity. J Allergy Clin Immunol 108(3):375–381. PMID: 11544456.
The International HapMap Consortium. 2003. The International HapMap Project. Nature 426(6968):789–796. PMID: 14685227.
van der Pouw Kraan TC, van Veen A, Boeije LC, van Tuyl SA, de Groot ER, Stapel SO. 1999. An IL-13 promoter polymorphism associated with increased risk of allergic asthma. Genes Immun 1(1):61–65. PMID: 11197307.
Wikman H, Piirila P, Rosenberg C, Luukkonen R, Kaaria K, Nordman H. 2002. N-Acetyltransferase genotypes as modifiers of diisocyanate exposure-associated asthma risk. Pharmacogenetics 12(3):227–233. PMID: 11927838.
Witte JS, Palmer LJ, O’Connor RD, Hopkins PJ, Hall JM. 2002. Relation between tumour necrosis factor polymorphism TNFalpha-308 and risk of asthma. Eur J Hum Genet 10(1):82–85. PMID: 11896460.
Woo JG, Assa’ad A, Heizer AB, Bernstein JA, Hershey GK. 2003. The -159 C→T polymorphism of CD14 is associated with nonatopic asthma and food allergy. J Allergy Clin Immunol 112(2):438–444. PMID: 12897754.
Yao TC, Kuo ML, See LC, Chen LC, Yan DC, Ou LS. 2003. The RANTES promoter polymorphism: a genetic risk factor for near-fatal asthma in Chinese children. J Allergy Clin Immunol 111(6):1285–1292. PMID: 12789231.

Figure 1. Frequencies and ORs of genotypes in a control population calculated using the 16 gene variants listed in Table 1. Each point represents a unique genotype combination. The referent genotype profile is identified by the arrow (OR = 1). The genotypic profile composed of all minor variants is identified by the circle.

Figure 2. Distribution of relative disease risk calculated using asthma-associated gene variants grouped by their biological attribution: (A) 12 group I variants only; (B) with three group II variants added to A; (C) with the group III variant added to B.

Figure 3. The low end of the cumulative distribution of ORs calculated using asthma-associated genetic variants (Table 1). The dashed gray line corresponds to group I variants; the solid blue line represents the risk distribution following addition of the group III variant.

Table 1. Genes related to immune/inflammatory processes and environmental/occupational exposures in asthma.

Gene (Entrez Gene ID)(a) | Variation | Frequency | OR (mean) | p-Value | Reference
Group I (immune, inflammatory)
TGF-β (7040) | −509 | 0.117 | 2.456 | 0.0102 | Silverman et al. 2004
TLR-10 (81793) | 2322 | 0.034 | 2.237 | 0.0235 | Lazarus et al. 2004
TNF-α (7124) | −308 | 0.223 | 1.505 | 0.0444 | Witte et al. 2002
MCP-1 (6347) | −2518 | 0.089 | 2.703 | 0.0055 | Szalai et al. 2001
IL-13 (3596) | −1055 | 0.019 | 7.756 | 0.0081 | van der Pouw Kraan et al. 1999
CD-14 (929) | −159 | 0.098 | 3.143 | 0.0355 | Woo et al. 2003
IL-18 (3606) | 105 | 0.109 | 1.830 | 0.0068 | Higa et al. 2003
IL-10 (3586) | −627 | 0.289 | 0.278 | 0.0222 | Hang et al. 2003
RANTES (6352) | −28 | 0.219 | 2.233 | 0.0006 | Yao et al. 2003
IL-4R (3566) | R576 | 0.018 | 8.185 | 0.0429 | Rosa-Rosa et al. 1999
ACE (1636) | Ins/del | 0.160 | 4.472 | 0.0018 | Gao et al. 2000
FcɛRIβ (2206) | E237G | 0.252 | 2.155 | 0.0003 | Cui et al. 2003
Group II (atopy)
HLA-DQA1 (3117) | 0301 | 0.081 | 8.774 | 0.0010 | Aron et al. 1996
HLA-DQB1 (3119) | 0302 | 0.083 | 6.794 | 0.0039 | Aron et al. 1996
HLA-DRB1 (3123) | 4 | 0.026 | 24.588 | 0.0023 | Aron et al. 1996
Group III (metabolism)
NAT1 (9) | Slow/fast | 0.250 | 8.625 | 0.0059 | Wikman et al. 2002

Ins/del, insertion/deletion.

(a) Gene loci and gene identification numbers are from Entrez Gene (2006).