Informative priors can be a useful tool for epidemiologists to handle problems of sparse data in regression modeling. It is sometimes the case that an investigator is studying a population exposed to two agents,

Associations estimated from sparse data tend to be highly imprecise and can be biased.^{1} Informative priors—external knowledge used to inform and stabilize measures of association—may help address these problems. Although the benefits of informative priors have been well documented in other fields, they are rarely utilized in occupational and environmental epidemiology.^{2,3}

We introduce a simple approach using order-constrained priors to incorporate prior knowledge regarding exposure-disease associations from toxicologic research into epidemiologic analyses. This approach informs parameter estimation based on the direction of effect of one parameter relative to another within the same regression model. We illustrate this approach by estimating associations between two exposures (beta radiation from tritium intakes and external exposure to gamma radiation) and leukemia mortality among workers employed at a nuclear facility.

A Bayesian analysis offers a coherent method to incorporate information from prior research when estimating an association in an epidemiologic study. A Bayesian analysis may often be an improvement over one confined to information within a single study.^{4,5} Informative priors for the parameters describing an exposure-disease association are often obtained from prior epidemiologic studies in comparable populations. When there is no previous epidemiologic research on human health effects of exposure to an agent, experimental or toxicologic evidence on whole organisms, tissues, cells, or molecules may be informative.

Experimental studies that use nonhuman animals or cell lines draw strength from their ability to control the exposures of interest, experimental conditions, and assessment of outcomes. However, use of evidence obtained from experimental studies to inform estimation of an association in an epidemiologic study may be complicated by differences in physiologic and pathologic responses across species^{6} and differences in the endpoints under study. For example, in radiation research, molecular and cellular studies often evaluate endpoints such as chromosomal aberrations, DNA strand breaks, or cell death, whereas epidemiologic studies often focus on cancer incidence or mortality.^{7–9} Nonetheless, experimental research may provide useful information for specifying an informative prior for the parameters describing an exposure-disease association.

An order-constrained prior can provide an intuitive way of integrating toxicology results while avoiding the pitfalls of trying to directly apply effect estimates across species or outcomes. By using an order constraint, the researcher imposes a structure on the relationship between the exposure and outcome of interest. For example, if we want to impose a monotonic order constraint on the coefficients for three categories of exposure, β_{1}, β_{2}, and β_{3}, we could specify an order-constrained model such that β_{1} ≤ β_{2} ≤ β_{3}. In a Bayesian setting, which often relies on drawing large numbers of samples from the posterior distribution, we can impose that constraint by ensuring that each sample adheres to the specified ordering.^{10,11}
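As a minimal sketch of what "each sample adheres to the specified ordering" can mean, the Python snippet below draws three coefficients from independent normal distributions and keeps only the draws that satisfy the monotonic ordering. The function name `sample_ordered`, the standard deviation, and the zero mean are illustrative assumptions, not part of the analysis described here.

```python
import random

def sample_ordered(n_draws, n_params=3, sd=1.0, seed=0):
    """Draw coefficient vectors and keep only those in nondecreasing order.

    Hypothetical helper: independent N(0, sd^2) draws are an assumption
    made for illustration only.
    """
    rng = random.Random(seed)
    kept = []
    while len(kept) < n_draws:
        draw = [rng.gauss(0.0, sd) for _ in range(n_params)]
        # Accept the draw only if b1 <= b2 <= ... <= bk
        if all(a <= b for a, b in zip(draw, draw[1:])):
            kept.append(draw)
    return kept

draws = sample_ordered(1000)
```

Every retained draw respects the ordering, so any summary computed from `draws` assigns zero probability to orderings that violate the constraint.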

Order-constrained parameters have a history of use in the dose-response literature^{10–12}; however, their utility need not be limited to scenarios where the researcher is interested in specifying the direction and magnitude of a dose-response relationship for categories of a single exposure. Suppose that the investigator is studying a population exposed to two agents and fits a regression model for the outcome with intercept β_{0}, a coefficient β_{1} for the first agent, and a coefficient β_{2} for the second agent, which is of primary interest.

Suppose that the results from experimental research suggest that the effect of the second agent is at least as large as that of the first. The investigator may incorporate such evidence by specifying an order-constrained prior informed by the rank ordering of the exposure effects on the outcome. The prior for β_{1} may be vague, while the prior for the parameter of primary interest, β_{2}, reflects the order-constrained prior assumption β_{2} ≥ β_{1}.

A major distinction between order constraints and priors typical in Bayesian analysis is that the former assigns a probability of zero to those parameter values that do not adhere to the constraint. Therefore, a researcher who uses this approach should have a high degree of confidence in the evidence that informs such priors. When this is the case, specifying an order-constrained prior may yield substantial gains in estimation of the effect of the agent of primary interest.

Conditions for using an order-constrained prior are encountered in some important and interesting settings in occupational and environmental research. Consider investigations of the health effects of various congeners of polychlorinated biphenyl, where people are exposed to two or more congeners, exposure intensities vary (and are not perfectly correlated), and toxicologic data suggest prior expectations for differences in biologic effects between congener types. Studies of respiratory health effects associated with inhalation of asbestos fibers provide another setting in which these conditions may hold, because people are typically exposed to fibers of various dimensions, and toxicologic data suggest prior expectations for differences in biologic effects as a function of fiber dimension. Although available human evidence may be insufficient to posit a prior for the association between the agent of primary concern and the outcome of interest, toxicologic data may be informative regarding order constraints for two or more parameters.

We present an example that uses an order-constrained prior to estimate the association between tritium exposure and leukemia mortality in a cohort of workers who were also exposed to gamma radiation at the Savannah River Site nuclear facility. The Savannah River Site has been identified as one of the largest US occupational cohorts with potential tritium exposure. Nonetheless, examination of the association with cancer risk is hindered by the fact that tritium doses are received at low levels and occur together with exposure to other occupational hazards.^{13,14} Incorporation of prior knowledge via order-constrained priors was investigated as a method to stabilize risk estimates.

The Savannah River Site is a nuclear facility near Aiken, SC. Activities began in 1951, with the first production reactor going critical in December 1953. E. I. du Pont de Nemours and Company operated the site until March 31, 1989, when Westinghouse Savannah River Company took over.^{15,16} Between 1950 and 1986, 21,204 people were known to have been hired by DuPont to work at the site. We restricted our cohort to those who worked at least 90 days and had no history of employment at another Department of Energy facility.^{13} Additionally, workers were excluded if they were missing information on sex, date of birth, name, Social Security Number, or date of first hire. This left a cohort of 18,883 workers for whom individual, annual dose records have either been computed or estimated in previous research. Workers were followed through 2002 to obtain vital status information. In the current analysis, the outcomes, leukemia and leukemia excluding chronic lymphocytic leukemia, are based on International Classification of Diseases (ICD) codes (ICD-9 codes 204–207; code 204.1 represents chronic lymphocytic leukemia). Analyses excluding chronic lymphocytic leukemia are conducted because of potential differences in latency of chronic lymphocytic leukemia compared with acute and myeloid forms of leukemia.^{17}

The primary type of external penetrating radiation exposure at the Savannah River Site was gamma rays. Neutrons were present in some areas, but constituted a small fraction of collective dose; therefore, we do not attempt to assess independent neutron effects.^{18} Penetrating forms of ionizing radiation were measured with film badges until 1970 and thermoluminescent dosimeters thereafter.^{16} Radiation doses from tritium intakes were estimated from urinalysis. We consider tritium and gamma doses independently in units of Gray (Gy).^{19} Estimated annual whole-body dose values in Sieverts (Sv)—the sum of gamma, tritium, and neutron doses—are available for all employment years; 6% of these records were estimated using a “nearby” method, previously described in Richardson et al.^{16} Annual whole-body dose estimates were utilized to derive annual tritium dose estimates for those person-years with missing information regarding the contribution of tritium to their cumulative annual whole-body dose. In total, there are 56.2 Gy of individual, annual tritium dose records, of which 4.3 Gy (7.7%) were estimated through use of a job-exposure matrix.^{20} Imputed values of tritium dose were validated by combining gamma and tritium doses to recalculate the excess relative rate (RR)/10mSv whole-body radiation obtained by Richardson et al.^{13} Point estimates were the same in the dataset with imputed tritium dose values, with a modest gain in precision.

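Dose-response relationships are estimated using an excess relative rate model of the form:

rate = e^{α_{i}} (1 + β_{1} × gamma dose + β_{2} × tritium dose),

where e^{α_{i}} indexes the baseline rate within stratum i,^{13} and β_{1} and β_{2} are the excess relative rates per unit dose of gamma radiation and tritium, respectively. All relevant confounders were matched in the study design, so no additional covariates are included in this analysis.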
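A linear excess relative rate model is commonly written as a stratum-specific baseline rate multiplied by one plus a linear function of dose. The sketch below evaluates that form for two exposures; the function name, argument names, and the numeric inputs are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def err_rate(alpha_i, beta_gamma, beta_tritium, gamma_dose, tritium_dose):
    """Rate under a linear excess relative rate model with two exposures.

    alpha_i      : log baseline rate in stratum i
    beta_gamma   : excess relative rate per unit gamma dose
    beta_tritium : excess relative rate per unit tritium dose
    Doses must be expressed in the same unit as the coefficients
    (for example, units of 10 mGy).
    """
    relative_rate = 1.0 + beta_gamma * gamma_dose + beta_tritium * tritium_dose
    return math.exp(alpha_i) * relative_rate

# Hypothetical inputs: baseline rate of 1 per 10,000 person-years,
# 2 dose units of gamma and 1 dose unit of tritium.
example_rate = err_rate(math.log(1e-4), 0.05, 0.30, 2.0, 1.0)
```

With these made-up inputs the relative rate is 1 + 0.05 × 2 + 0.30 × 1 = 1.4, so the rate is 1.4 times the baseline.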

In a recent systematic review of animal and cellular studies, Little and Lambert^{21} conclude that a reasonable value for the biological effectiveness of an absorbed dose arising from tritium intake is two to three times that for an absorbed dose from external exposure to gamma radiation. In a more common Bayesian analysis, a researcher might incorporate the estimate of effect from previous work as a prior in the current analysis. Because these results are exclusive to leukemia events in animals and cancer precursors in animal and human cells, researchers may feel uncomfortable specifying a value for the effect (ie, the prior mean) based on these studies. However, although the outcomes of interest varied across studies, the direction of effect of tritium relative to gamma radiation dose is consistent based upon evidence from in vivo and in vitro studies^{21–23}; that is, the biologic effectiveness of tritium is always greater than that of gamma radiation. Therefore, we specify an order-constrained prior that β_{2} ≥ β_{1}.^{10} The order constraint allows us to specify that, although we are uncertain about the specific magnitude of the difference between β_{2} and β_{1}, we are certain that tritium causes more biologic damage than gamma radiation. In a Markov chain Monte Carlo implementation of the analysis, this translates to drawing samples of β_{2} that are never less than β_{1}. The form of the distribution for the parameters is specified as normal but noninformative over admissible values: β_{1} and β_{2} each have a normal prior with variance σ^{2} = 100,000, and the prior for β_{2} is truncated below at β_{1}.
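To illustrate how such a constraint can enter a Markov chain Monte Carlo run, the following is a simplified random-walk Metropolis sketch in Python, not the SAS implementation used for the analysis. It assumes a Poisson likelihood for stratum-level case counts, a prior mean of zero, and a joint prior that is the product of two vague normals multiplied by an indicator for the constraint (with a variance this large, the difference from truncating one prior conditionally on the other is negligible). The data in `toy_data` are invented for illustration.

```python
import math
import random

def log_post(b1, b2, data, prior_var=100_000.0):
    """Log posterior (up to a constant) with the constraint b2 >= b1."""
    if b2 < b1:
        return -math.inf  # zero prior probability outside the constraint
    # Vague normal priors; the zero prior mean is an assumption
    lp = -(b1 * b1 + b2 * b2) / (2.0 * prior_var)
    for cases, person_years, base_rate, gamma_dose, tritium_dose in data:
        rr = 1.0 + b1 * gamma_dose + b2 * tritium_dose
        if rr <= 0.0:
            return -math.inf  # relative rate must stay positive
        mu = person_years * base_rate * rr
        lp += cases * math.log(mu) - mu  # Poisson log likelihood kernel
    return lp

def metropolis(data, n_iter=20000, burn_in=2000, step=0.05, seed=1):
    rng = random.Random(seed)
    b1, b2 = 0.0, 0.1  # start inside the admissible region
    cur = log_post(b1, b2, data)
    draws = []
    for i in range(n_iter):
        p1 = b1 + rng.gauss(0.0, step)
        p2 = b2 + rng.gauss(0.0, step)
        new = log_post(p1, p2, data)
        # 1 - random() lies in (0, 1], so the logarithm is always defined
        if math.log(1.0 - rng.random()) < new - cur:
            b1, b2, cur = p1, p2, new
        if i >= burn_in:
            draws.append((b1, b2))
    return draws

# Invented strata: (cases, person-years, baseline rate, gamma dose, tritium dose)
toy_data = [
    (3, 10000.0, 2e-4, 0.0, 0.0),
    (4, 10000.0, 2e-4, 2.0, 0.5),
    (6, 10000.0, 2e-4, 5.0, 1.0),
    (5, 5000.0, 2e-4, 8.0, 3.0),
]
draws = metropolis(toy_data)
```

Because proposals that violate the constraint have log posterior of negative infinity, they are never accepted, so every retained pair satisfies the ordering.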

All analyses were conducted using the SAS MCMC procedure (PROC MCMC; V 9.2, SAS Institute, Cary, NC). Posterior distributions are summarized by 90% highest posterior density intervals for consistency with the radiation epidemiology literature. All models were run three times as a diagnostic check to ensure that results were not sensitive to the starting values of β_{1} and β_{2}. In addition, to minimize simulation error, these models were run for 5 million iterations with a burn-in of 100,000 iterations, and all sampled values after the burn-in period were retained.
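For readers unfamiliar with highest posterior density intervals, one common sample-based approximation is the shortest interval that contains the desired share of the posterior draws. The sketch below shows that calculation in Python; it is an illustration of the general idea, assumes a unimodal posterior, and is not necessarily the algorithm used by the SAS software.

```python
import math

def hpd_interval(samples, mass=0.90):
    """Shortest interval containing at least `mass` of the sampled values."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of draws the interval must cover
    # Slide a window of k consecutive sorted draws and keep the narrowest one
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]

# Small made-up sample with a long right tail
lower, upper = hpd_interval([0.0, 1.0, 1.5, 2.0, 8.0, 20.0], mass=0.5)
```

For a skewed posterior, this interval can differ noticeably from an equal-tailed interval because it shifts toward the region where the draws are most concentrated.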

With a noninformative prior for β_{1} and no order constraint on β_{2}, the estimates of the excess RR/10mGy due to gamma radiation (β_{1}) for leukemia and leukemia excluding chronic lymphocytic leukemia are 0.053 (90% highest posterior density = −0.025 to 0.142) and 0.176 (0.011 to 0.375), respectively. The estimated excess RR/10mGy due to tritium (β_{2}) for leukemia and leukemia excluding chronic lymphocytic leukemia are 0.141 (−0.323 to 0.649) and −0.281 (−1.136 to 0.548), respectively. The upper row of the Figure shows the posterior distributions of β_{1} and β_{2} when the order constraint is excluded.

When we integrate the order-constrained prior so that β_{2} ≥ β_{1}, retaining a noninformative prior for β_{1}, estimates of the excess RR/10mGy due to gamma radiation for leukemia and leukemia excluding chronic lymphocytic leukemia are 0.034 (−0.031 to 0.110) and 0.082 (−0.017 to 0.206), respectively. The estimated excess RR/10mGy due to tritium for leukemia and leukemia excluding chronic lymphocytic leukemia are 0.298 (0.027 to 0.702) and 0.344 (0.049 to 0.817), respectively. The widths of the highest posterior density intervals indicate the change in precision that results from integration of the order-constrained prior. The posterior correlations between β_{1} and β_{2} when examining leukemia and leukemia excluding chronic lymphocytic leukemia are −0.0118 and 0.1635, respectively. Additionally, the Figure shows the posterior distribution of β_{2} relative to that of β_{1}, the latter of which is clearly better identified in these data.
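The change in precision for the tritium coefficient can be checked directly from the reported interval bounds. The short Python sketch below computes interval widths and the ratio of unconstrained to constrained width; the dictionary layout and function names are illustrative, and the bounds are those reported in the text.

```python
# Reported 90% highest posterior density bounds for the tritium coefficient
tritium_hpd = {
    ("leukemia", "constrained"): (0.027, 0.702),
    ("leukemia", "unconstrained"): (-0.323, 0.649),
    ("non-CLL leukemia", "constrained"): (0.049, 0.817),
    ("non-CLL leukemia", "unconstrained"): (-1.136, 0.548),
}

def width(bounds):
    """Interval width: upper bound minus lower bound."""
    lower, upper = bounds
    return upper - lower

def width_ratio(outcome):
    """Unconstrained width divided by constrained width for one outcome."""
    unconstrained = width(tritium_hpd[(outcome, "unconstrained")])
    constrained = width(tritium_hpd[(outcome, "constrained")])
    return unconstrained / constrained

ratio_all = width_ratio("leukemia")
ratio_non_cll = width_ratio("non-CLL leukemia")
```

The ratios come out at roughly 1.4 for all leukemia and roughly 2.2 for leukemia excluding chronic lymphocytic leukemia. Widths computed this way can differ from tabulated widths in the third decimal place, presumably because the reported bounds are rounded.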

In certain settings, use of an order-constrained prior can help bridge the gap between toxicologic studies and epidemiologic research. The appealing aspect for epidemiologists attempting to apply evidence from animal and cellular models is that this approach incorporates prior information only on the relative magnitude of effect of various exposures, which is exactly the type of information that is often available. We present a case where a highly imprecise parameter (tritium) is informed by a more well-identified parameter (gamma) based on a wealth of

It is important to consider the strength of information that supports the use of an order constraint. In epidemiologic analyses, counterfactuals are unobservable; however, experimental research can come quite close. Use of cell lines and genetically identical animals allows researchers to observe the outcomes that occur under controlled exposure scenarios, while characterizing the physical properties that may explain why agents act as they do. We focus on the effect of radiation doses from intakes of tritium and from external exposure to gamma rays. Tritium is a radioisotope of hydrogen that emits beta radiation via the release of electron energy as it decays from H^{3} to He^{3}.^{24} Unlike external penetrating radiation such as gamma- and x-rays, the low-energy beta radiation emitted by tritium travels only a short distance in tissue. However, beta radiation from tritium is known to have a higher average ionization density than gamma radiation; this leads to greater cellular-level damage per unit dose.^{25–27} Both physical and experimental research provide strong evidence that the carcinogenic effectiveness per unit dose from intakes of tritium is as great as or greater than that for gamma radiation, a result that has been repeated over many years.^{9,26,28–30} Thus, changes in the estimates of parameters should be regarded as an improvement when informed by a constraint based on highly defensible and repeated research.

Current research treats the sum of gamma and tritium dose as a single exposure metric, assuming identical biologic effectiveness. It is possible to reweight tritium based on its relative effectiveness to gamma radiation and sum these values as a single exposure metric; this is common practice in radiation epidemiology. However, this requires assigning a fixed value to the relative effectiveness of tritium to gamma radiation. The order constraint of our analysis is more flexible and allows for the variation present in each exposure measure to characterize its distribution. By not summing doses based on a fixed weighting factor, the investigation also avoids exposure misclassification that would occur if the fixed value is incorrect. This is a concern because experimental research supports tritium’s increased biologic effectiveness relative to gamma radiation, but does not support a fixed value for any one outcome of interest.
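The contrast between a fixed weighting factor and separate coefficients can be made concrete. In the Python sketch below, a single-metric model with weight w applied to tritium dose is algebraically the same as a two-coefficient model in which the tritium coefficient is forced to equal w times the gamma coefficient; the order constraint relaxes that equality to an inequality. Function names and numeric values are hypothetical.

```python
import math

def weighted_dose(gamma_dose, tritium_dose, weight):
    """Single exposure metric: gamma dose plus reweighted tritium dose."""
    return gamma_dose + weight * tritium_dose

def rr_single_metric(beta, gamma_dose, tritium_dose, weight):
    """Relative rate when one coefficient is applied to the weighted sum."""
    return 1.0 + beta * weighted_dose(gamma_dose, tritium_dose, weight)

def rr_two_coefficients(beta_gamma, beta_tritium, gamma_dose, tritium_dose):
    """Relative rate when each exposure has its own coefficient."""
    return 1.0 + beta_gamma * gamma_dose + beta_tritium * tritium_dose

# With weight 2, the single-metric model matches beta_tritium = 2 * beta_gamma
single = rr_single_metric(0.05, 2.0, 1.0, weight=2.0)
separate = rr_two_coefficients(0.05, 0.10, 2.0, 1.0)
```

Here `single` and `separate` agree, which shows that a fixed weight is a special case of the two-coefficient model with the ratio of coefficients pinned to one value.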

When comparing a model without an order constraint prior to a model that integrates the constraint _{2} ≥ _{1}, the precision of all model parameters improves. The parameter with the largest increase in precision from the use of truncation is the estimate of the relationship between tritium and leukemia excluding chronic lymphocytic leukemia. When we truncate estimation of tritium in this model, the estimate shifts from an implausible negative value to a positive value. The observation of a negative relationship between tritium and leukemia risk is counter to both the evidence of its biologic effectiveness relative to gamma radiation and evidence of radiation as a leukemogen.^{31,32} The confidence limit ratio for _{2} in the unconstrained model for leukemia excluding chronic lymphocytic leukemia is more than twice the value in the constrained model, whereas the confidence limit ratio in the unconstrained model for leukemia including chronic lymphocytic leukemia is larger by less than half (^{33}; such a program was not in place for all workers at Savannah River Site over the entire history of the site’s operation.^{18} As a result, records may not accurately represent individuals’ tritium doses, leading to attenuation of its relationship to cancer risk. In fact, misclassification of either exposure will influence the effectiveness of the constraint. Since we implement our constraint using a Bayesian approach, if knowledge regarding misclassification of either exposure is available, this information may be added to the model to further inform the estimation of the exposure that is subject to influence from the constraint.

In our empirical example, the fact that both tritium and gamma radiation are measured in the same units (Sieverts) allows for a direct application of the order constraint to the data as is. However, in the case where the user intends to specify an ordered structure to exposures measured in different units, it may be necessary to recalculate the units for exposures so that their coefficients are measured on a single scale, such as parts per million for particulate matter. Additionally, radiation dose is recorded and examined as a single, continuous term. When exposures are measured in categories of exposure, it is still possible to utilize the order constraint, although more difficult to specify. Additionally, it would be important for the categories of exposure to match one another.

Although we present an example from radiation epidemiology, an order constraint can be used in many scenarios. We mentioned the case of different congeners and inhaled fibers previously^{34,35}; however, any exposures that are the focus of repeated experimental (or epidemiologic) research may benefit from the use of an order constraint, if knowledge external to a particular study is available. In the case that the investigator has prior knowledge that the effect of the second exposure is x units larger than that of the first exposure, the constraint may be readily adapted: for instance, instead of specifying β_{2} ≥ β_{1}, the user may specify β_{2} ≥ β_{1} + x
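A constraint with a fixed offset, in which the second coefficient must exceed the first by at least x units, needs only a small change to the admissibility check. The Python sketch below shows that check applied to a handful of made-up coefficient pairs; the function names and values are illustrative assumptions.

```python
def satisfies_constraint(beta1, beta2, offset=0.0):
    """True when beta2 is at least `offset` units larger than beta1."""
    return beta2 >= beta1 + offset

def filter_draws(draws, offset=0.0):
    """Keep only (beta1, beta2) pairs that satisfy the constraint."""
    return [(b1, b2) for b1, b2 in draws if satisfies_constraint(b1, b2, offset)]

# Made-up coefficient pairs for illustration
example_draws = [(0.0, 0.5), (0.2, 0.1), (0.1, 0.15)]
plain = filter_draws(example_draws)               # beta2 >= beta1
shifted = filter_draws(example_draws, offset=0.1)  # beta2 >= beta1 + 0.1
```

Setting `offset` to zero recovers the simple ordering, so the same check covers both forms of the constraint.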

When utilizing the order constraint for distinct exposures, researchers should ensure that its use is well supported by prior research whose results have been repeated. If the assumptions that are integrated into analyses via the order constraint are faulty, as with any other incorrect assumption, the results can be biased. In general application, when the parameters do not violate the order constraint and both are well informed by the data, the order constraint will have diminishing impact on the parameter estimates. However, in the case that the data are in violation of the constraint, the parameter estimates will be biased (see

In conclusion, order-constrained priors may be a useful tool when the researcher has prior knowledge concerning the direction and magnitude of a parameter estimate of interest relative to another parameter in the same regression model. An appealing aspect of this approach is that the researcher is not required to synthesize evidence from multiple sources to calculate a prior distribution for any parameter in the regression model. Rather, simple knowledge of the direction of effect of one parameter relative to another is sufficient. Thus, implementation of this method is straightforward, and we hope that this approach will be appealing to Bayesians and frequentists alike. The

Supported by Centers for Disease Control and Prevention (grant number 1R03OH009800-01).

Supplemental digital content is available through direct URL citations in the HTML and PDF versions of this article (

Posterior distribution of estimates of risk of leukemia (left column) and leukemia excluding chronic lymphocytic leukemia (right column) associated with gamma (solid line) and tritium (dashed line), excluding (upper row) and including (lower row) an order constraint.

Distribution of All Leukemia Cases by Gamma and Tritium Dose Categories in mGy (n = 84)

Outcome and dose type | 0 | >0–4.9 | 5–9.9 | 10–19.9 | 20–39.9 | 40–79.9 | 80–159.9 | 160–319.9 | ≥320
---|---|---|---|---|---|---|---|---|---
Leukemia: estimated gamma radiation dose | 5 | 27 | 8 | 9 | 10 | 7 | 13 | 4 | 1
Leukemia: estimated tritium dose | 30 | 34 | 9 | 3 | 5 | 3 | 0 | 0 | 0
Leukemia excluding chronic lymphocytic leukemia: estimated gamma radiation dose | 4 | 20 | 6 | 6 | 4 | 6 | 11 | 4 | 1
Leukemia excluding chronic lymphocytic leukemia: estimated tritium dose | 24 | 25 | 4 | 2 | 4 | 3 | 0 | 0 | 0

Parameter Estimates for the Excess Relative Rate/10mGy due to Gamma Radiation (β_{1}) and Tritium (β_{2})

Outcome | Truncation | Gamma Radiation: β_{1} | Gamma Radiation: 90% Highest Posterior Density Interval | Gamma Radiation: Width of Confidence Interval | Tritium: β_{2} | Tritium: 90% Highest Posterior Density Interval | Tritium: Width of Confidence Interval
---|---|---|---|---|---|---|---
Leukemia | β_{2} ≥ β_{1} | 0.034 | (−0.031 to 0.110) | 0.141 | 0.298 | (0.027 to 0.702) | 0.676
Leukemia | None | 0.053 | (−0.025 to 0.142) | 0.168 | 0.141 | (−0.323 to 0.649) | 0.972
Leukemia excluding chronic lymphocytic leukemia | β_{2} ≥ β_{1} | 0.082 | (−0.017 to 0.206) | 0.224 | 0.344 | (0.049 to 0.817) | 0.768
Leukemia excluding chronic lymphocytic leukemia | None | 0.176 | (0.011 to 0.375) | 0.266 | −0.281 | (−1.136 to 0.548) | 1.684

Highest posterior density intervals are the Bayesian complement to the confidence interval.