Environmental Health Perspectives Vol. 74, pp. 211-221, 1987 Estimation of the Time Component in the Movement of Chemicals in Contaminated Groundwater by Stan C. Freni* and Donald L. Phillips* For a proper analysis of the potentially causal relationship between exposure to volatile organic chem- icals (VOCs) in drinking water and health events, it is essential to know T1, the time when exposure started, and C = 1(T), which is the change of the VOC concentration C as a function of time T and the total accumulated exposure (TAE) to VOCs to which an individual was exposed. In the typical situation of incidentally detected pollution of groundwater, no such information is available. This paper describes the development of a method for estimating T1, C = A(T), and TAE as part of an epidemiologic study of the health effects of VOC contamination of an aquifer serving public and private wells. Pooled test results of city wells, tested periodically since 1981, provided the data base for developing a statistical model for estimating C = 1AT). This model was then applied to private wells, for which the data of only one water sample were available, to retrospectively estimate their T1. The best-fitting model was a multiple linear regression equation consisting of the natural logarithm of the VOC concentration as the response variable, with the time of sampling, the distance of the wells from the source (expressed as coordinates), the well depth, and the well capacity as determinants. The TAE was calculated by integrating the area under the time-concentration curve. Introduction Except in prospective cohort studies, epidemiologists usually have the problem of individuals failing to recall exposure data or the problem of poor quality of existing exposure records. This explains, in part, why most stud- ies suffice with dichotomous or ordinal exposure data, or with assuming that exposure was constant over time during the period of residence or employment in an area with contaminated air, water, or soil. In the search for better exposure data, little attention has been given to the time factor in exposure assessment. In risk assess- ment and epidemiologic studies, the time factor can be important in several ways: * The starting date of exposure determines whether a disease antedates exposure and should therefore be excluded from analysis. * The starting date of exposure determines whether the observation time was long enough for a disease to be attributable to the exposure of interest. * The end date of exposure marks the beginning of the exposure-free observation period, in which the dose- response curve is expected to be different from that when exposure was ongoing (1). *Center for Environmental Health, Centers for Disease Control, U.S. Public Health Service, U.S. Department of Health and Human Services, Atlanta, GA 30333. * The dates when exposure started and ended and the changes in exposure level between these two points in time (exposure as a function of time) allow esti- mation of the duration and total amount of exposure. Both are important for a proper dose-response analy- sis and for judging the biological plausibility of an observed statistical relationship between exposure and a health event. * The age at which exposure started and ceased is nec- essary information in estimating the risk of cancer through mathematical models for a less-than-lifetime exposure (2). Clearly, in epidemiology, neglecting the time factor is likely to result in misclassification with regard to ex- posure and eligibility of an observed disease and can lead to decreased statistical power and to erroneous inferences from study results. In the case of contaminated groundwater, once a chemical spill has occurred, a certain period of time (dTl) is needed for a chemical to reach a distant well. The magnitude of dTl is determined by many factors, such as depth of the water table, solid structure, di- rection and velocity of the groundwater flow, the dis- tance of the well from the axis of the groundwater flow and from the spill site, withdrawal rate and amount, soil affinity and other properties of the chemicals, etc. Once the compound reaches a well at a time T1 (the time that a compound reaches the detection limit), a period of time dT2 is needed to reach a sample concentration FRENI AND PHILLIPS C8 at sampling time Ts (Fig. 1). Contrary to a predicted dTl or dT2 as a future event, estimation of T1, dTl, and dT2 is retrospective when one is confronted with a situation of polluted groundwater. Estimating these values is important in tracing the generator of the pol- lution to seek recovery for cleanup costs or material damage, and to determine possible adverse health ef- fects. Retrospective estimation of T1 and C = f(P, which is a mathematical expression of the changes in concentration C as a function of time T, is necessary when the potential health effect of chemicals in water is being studied. In the typical scenario of an inciden- tally detected pollution of residential wells, however, existing information is limited to the results of testing a single water sample; routine monitoring of public water supplies for a wide variety of toxic chemicals was begun only a few years ago. Naturally, prospective (predictive) models might be used retrospectively. Several models have been ad- vanced for mathematically describing the transport of chemicals in groundwater, but all of these require ex- tensive data on hydrogeologic parameters, which were not available for the scenario described in this paper, as they do not exist in virtually all sites with incidentally detected contamination of groundwater. Moreover, as described in "Discussion," such models are inherently incapable in discriminating between individual wells in a crowded well field. This paper describes the devel- opment and use of a method for retrospectively esti- mating T1 and C = f(T) as part of an ongoing epide- miologic study of the potential health effects of human exposure to volatile organic chemicals (VOCs) in drink- ing water. The method is based on the hypothesis that - dTo -I - monitoring data for city wells, if available, can be used for estimating C = f(T) for city wells, which in turn can then be applied to neighboring residential wells for the retrospective estimation of T1, as depicted in Figure 1. Materials and Methods In September 1981, an unscheduled survey led to the discovery of VOCs in the municipal drinking water of Battle Creek, MI. A routine monitoring program for city wells was set up, and neighboring private wells were also tested, but not more than once in the period 1981 to 1983. Contamination of a number of wells was confirmed. There were no historic data on the presence of VOCs in city or private wells. The source was found to be a distributor of industrial solvents located upgra- dient of the affected wells (3). The major cause of the spill appeared to be leaking underground storage tanks. Soil samples taken at the site showed extremely high concentrations, and VOCs were found freely floating on top of the water table. Thus, the contamination of the aquifer was not from a one-time spill but from contin- uous spilling. However, this may not necessarily be true for all VOCs, as not all VOCs were stored at the same time. Figure 2 depicts a map view, constructed from an aerial photograph and from maps supplied by the city engineer. Groundwater flow in the Verona area is in three aqui- fers. From land surface down the aquifers are sand and gravel units within unconsolidated glacial deposits and upper and lower sandstones of the Marshall Formation. Horizontal hydraulic conductivities of the aquifers are about 100, 150, and 550 feet/day for the sand and gravel, C C z f(T) CITY WELL T dTl dTl dT2 private well dT2 city well FIGURE 1. Application of function C = f(T). Dots on the right side represent actual sampling values of the time series of city well 28 (see Fig. 2) for trichloroethane (TCA). The curve represents expected values from a linear regression of concentration C. on time T8 for well 28 and TCA. By shifting the curve to fit sampling point S, time T1 at C = 1 ppb can be calculated for the private well. Time lapses dTo, dTl, and dT2 are determined by the time the spill of the chemical began (Tpill), T1, and the time of sampling (T.). dTl comprises the time lag for the spilled chemical to reach the groundwater and the time for the chemical to travel through the aquifer until it reaches the well at C. = 1 ppb. 212 MOVEMENT OF CHEMICALS IN GROUNDWATER FIGURE 2. Area map depicting X-Y grids, city wells (O); polluted private wells (0) that had no VOCs at the time of sampling, and the source of pollution. Numbered city wells (x) were selected for modeling. upper sandstone, and lower sandstone, respectively. Secondary permeability of the sandstones, due to frac- turing, is the cause of the relatively high hydraulic con- ductivities. Most private well water is pumped from the sand and gravel or the upper sandstone aquifers. Mu- nicipal water is drawn from the upper and lower sand- stones. Although the vicinity of the river west of the wells would lead one to expect groundwater to flow from the point source in a westerly direction, the actual flow has been found to be in a north to northwesterly direc- tion. This can be explained by the heavy water with- drawal in the city well field. The velocity of the flow has been estimated at 1 to 4 feet a day. This hydrogeo- logic information is abstracted from a study by Gran- nemann and Twenter, conducted when the contamina- tion of the city well field was detected (4). Estimation of T1 requires C = f(T), and thus, re- peated testing of water from the same well over time. 213 FRENI AND PHILLIPS Such time series, covering a period from September 1981 to the present, were available for 30 city wells, but not for the 116 private wells tested. Data on well descriptors and water quality were provided by the Michigan Department of Public Health and the De- partment of Natural Resources. The VOCs found most frequently, their abbreviations as used in the text, and the ranges of concentrations are listed in Table 1. The following sections describe the steps in determining C - fiT) for city wells to be applied to individual private wells for retrospectively estimating T1. Selection of Wells Minor remedial changes in the management of the city well field started in 1982, but reached major pro- portions in 1984 when newly constructed city wells north of the well field came into operation. The new well field caused, as intended, a sharp drop in VOC concen- trations in the pre-existing wells, and the flow pattern in the aquifer was drastically altered (5). As declines and leveling of concentrations were the results of the change in the management of the city well field, it was presumed that residential wells continued to be affected by increasing contamination levels at least until the end of 1983. Accordingly, the selection of city wells for de- veloping VOC-specific C = fiT) was limited to wells showing increasing VOC concentrations over one year or more during the interval 1981 to 1983. Not all wells were polluted, and if they were, contamination did not necessarily involve all VOCs. In particular, ethylene dichloride (12-DCA) and 1,1,2-trichloroethylene (TCE) occurred rarely in city wells, although these compounds were common in private wells (Table 1). Of the 30 city wells, 11 were either not polluted, tested positive a few times Qnly, or were contaminated as late as 1984. VOC levels in these 11 wells ranged from 1 to 11 ppb, and in nonie of these wells was more than one or two VOCs present. Of the remaining 19 wells, 8 showed low VOC levels with single sample peaks, or the concentrati9n versus time curve (CT curve) did not show an increasing trend. Eventually, 11 city wells provided a total of 27 time series suitable for modeling. These wells are depicted in Figure 2 (num- bered wells). In total, these 11 wells provided 27 chem- ical- and well-specific concentration-time series (Table 2). If increasing trends were followed by a decline in 1984 (attributable to remedial actions), the descending part of the time series was deleted prior to statistical processing. Thus, no time series extended beyond April 1984. Variables for Developing a Time- Concentration Model A grid pattern with a central X-Y cross-axes system was laid over the map of the study area with a north- south Y-axis and the intersection point of the X and Y axes located just northwest of the spill site (Fig. 2). The X and Y coordinates for each well were expressed as the number of grid intervals from the X and Y inter- section point. The rationale underlying the X-Y coor- dinates was the hypothesis that, although the move- ment of the plume would follow the main direction of the groundwater flow (Y-axis), microscale deviations were likely to occur as a result of diffusion or local differences in the soil structure or flow characteristics. Such deviations are projected as components of X and Y. In addition, the depth of the well was considered important because a downward component of the groundwater flow and a vertical gradient in VOC levels were found in hydrologic studies (5). The data for developing C = f(T) were arranged as a line listing in which each single water sample repre- sented one observation with the following information: an identification number unique for a single water sam- ple; the well number (for city wells) or street address (for private wells); the X and Y coordinates; a number indicating the quadrant of the X-Y grid system; month and year of sampling, from which the variable T8 was calculated as the number of months since January 1, 1970 (an arbitrarily chosen date); the sample concen- tration C, of each chemical in ppb (parts per billion); the well depth D in feet (midpoint of the screened or uncased portion); and the well pump capacity G in gal- lons per minute. Table 1. Contamination levels of volatile organic chemicals (VOCs) in the groundwater of the Verona well field and adjacent neighborhoods. Number of contaminated private wells (n = 90)' Number of contaminated city wells (n = 27)' Sample concentration, ppb Maximum concentration, ppb VOCb 0 1-4 5-24 25-99 100+ 0 1-4 5-24 25-99 100+ 11-DCA 38 17 17 16 2 7 8 7 5 12-DCA 42 16 12 16 4 23 3 1 DCE 56 20 11 3 14 7 6 - CIS 24 14 14 7 31 11 2 9 3 2 PGE 53 11 8 11 7 10 7 3 6 1 TCA 46 23 16 4 1 7 6 7 5 2 TCE 47 8 8 21 6 13 12 2 a In addition to the wells cited, 3 city wells remained free of VOCs during the study period, and 48 private wells were free of VOCs in the one sample taken in the same period. bDCA = 1,1-dichloroethane; 12-DCA = ethylene dichloride or 1,2-dichloroethane; 11-DCE = 1,1-dichloroethylene or vinylidene chloride; CIS = 1,2-cis-dichloroethylene; PCE = perchloroethylene or tetradtloroethylene; TCA = 1,1,1-trichloroethane; TCE = 1,1,2-trichloroethylene. 214 MOVEMENT OF CHEMICALS IN GROUNDWATER Table 2. Well- and VOC-specific slopes (bl) and correlation coefficients (r2) for simple regression equations of C(oncentration) on T(ime).a C= T logC= T Well VOC bi r2 bi a; 13 CIS 1.860 0.66 0.375 0.95 20 CIS 0.758 0.55 0.186 0.67 21 CIS 0.982 0.72 0.148 0.77 22 11-DCA 0.272 0.80 0.118 0.80 CIS 3.018 0.90 0.174 0.65 TCE 0.631 0.73 0.194 0.65 23 CIS 1.965 0.59 0.187 0.49 27 11-DCA 0.839 0.71 0.053 0.76 DCE 0.259 0.46 0.067 0.41 PCE 2.931 0.78 0.094 0.86 TCA 5.013 0.78 0.110 0.87 28 11-DCA 0.606 0.52 0.095 0.70 DCE 0.172 0.37 0.076 0.50 PCE 2.858 0.59 0.152 0.93 TCA 4.880 0.65 0.166 0.95 29 11-DCA 0.337 0.64 0.099 0.57 PCE 1.533 0.88 0.163 0.85 TCA 2.590 0.85 0.177 0.85 33 11-DCA 0.500 0.69 0.095 0.76 CIS 5.539 0.91 0.167 0.94 TCE 1.035 0.91 0.131 0.92 35 12-DCA 0.681 0.51 0.105 0.36 CIS 5.286 0.30 0.032 0.23 38 11-DCA 1.031 0.80 0.200 0.86 DCE 0.261 0.71 0.163 0.74 PCE 1.706 0.88 0.270 0.93 TCA 0.824 0.67 0.150 0.76 a Number of samples per series ranges from 9 to 19. The period per time series varies from 10 to 19 months. For well locations, see Figure 2. Development of the Time-Concentration Model The statistical models tested are based on linear regression of C on T as of a common starting date, arbitrarily set at January 1, 1970. This date is 1.5 years after the solvent company started its operations. This time lapse, composed of dTo (from when the company started its operations to the date of spilling) plus dTl (Fig. 1), was considered a minimum length of time for spilling to occur and for the spilled VOCs to reach the wells at a level above the detection limit. Most city wells demonstrated VOC-positive water samples later than September 1981, implying that in the respective time series, one or more leading samples were free of VOCs (CS = 0). In such cases, all but the last leading zeros were censored. The last leading zero, and zeros occur- ring after at least one positive sample, were converted to an arbitrary Cs = 0.5 ppb for two reasons: it allows the transformation of C into logC (the natural logarithm of C); and 0.5 is closer to the detection limit (between 0.2 and 1 ppb). Univariable regression (the only re- gressor variable being 7) was applied to determine the model that best fits the individual time series. Multi- variable models were employed for data pooled from all selected wells, using stepwise backward elimination procedures with a significance threshold of p = 0.1 (6). The following basic models were tested: C = a + bT log C = a + bT Multivariable Models C = a + bjT + b2X + b3Y + b4D + b5G log C = a + bjT + b2X + b3Y + b4D + b5G In the above models, "a" is the intercept, and "b" is the regression coefficient. In this paper,' b1 (the coefficient of 7) will also be referred to as the slope of the regres- sion. Model 1 will be referred to as a C = T model, and Model 2 as a logC = T model. Application of Models for Estimating T1 (Starting Date of Contamination) Let C8 and T7 denote the concentration and time of a water sample from a well, and let Tl. denote the time in the past when a chemical reached the lower concen- tration Clow of interest. Tl1 can then be estimated as follows. Select the best final model from the stepwise regression for the chemical of interest, say, a C = T model comprising T, X, and Y for: perchloroethylene (PCE) and apply it to the target private well for both C8 and C1. subtract, and solve for T1ow Cs = a + bjT8 + b2X + b3Y -Clow=a+ b1TiOw+ b2X+ b3Y C8 - Cl.0 = biT. - bjTj. Tlow= (Clw + blT, - C,)/b1 The same derivation can be used for a logC = T model to show that: Tjow = (logCj. + b1T. - logC8) / b, (2) Equations 1 and 2 show that all variables but T cancel out because of their constancy over time. It is therefore not necessary to know a, b2, and b3 for the well and compound of interest. Besides, if these values were known for individual residential wells, there would have been no need for developing a model from city wells. As Cl,w represents any concentration for which the as- sociated Tlo,, has to be estimated, and thus also C1W = 1 ppb, T, can be estimated by replacing Clo with 1, and deleting logC10,,, as log (1) = 0. Thus, for estimating Tl, Equation (2) can be condensed into the following simpler equations: T1 = (1 + bT78 - Cg)lb1 for C = T models T, = T. - (logC8Ibl) for logC = T models (3) (4) Evaluation of the Validity of the Models The following procedures were used to select the best model for each VOC: (a) Judge for each single city well and VOC how well univariable C = T and log C = T models fit the data from which they were derived, using 215 \, FRENI AND PHILLIPS the r2 value and the distribution of residuals. (b) Pool the well data and judge how multiple regression models fit the pooled city well data from which they were de- rived, using the model r2 and the distribution of resid- uals. Let an observed Tl1w and C1lw denote the time and concentration of the first positive sample in a time se- ries. Using Equations (1) and (2), project for each sub- sequent sample of the time series an expected Tl0w. The magnitude and the distribution of the differences be- tween predicted and observed T10, are good indicators of the validity of a model for estimating Tl,0. (c) Repeat the previous step by comparing the observed and es- timated Tlow for the private wells that were sampled twice before 1984 (Table 3). (d) Compute T1 for the other private wells if C. is more than 1 ppb. As these wells lack the reference value of a second sample, a relatively narrow (a few years) range of T1 estimates for a group of neighboring wells should serve as an alternative eval- uation criterion for the validity of the model. (e) Repeat steps 2 through 4, now using weighted regression models, in which the data of an individual well are weighted by 1/SE2 of that well (SE is the standard error of the slope of the univariable regression). Results The following paragraphs refer to the results of ap- plying the five evaluation procedures for model fitting, described above. Of the 27 VOC- and well-specific time series, most showed curved scatter plots of C versus T, resulting in 17 series with higher values for logC = T models, 2 series with equal r2 for C = T and logC = T models, and 8 series with higher r2 for the C = T model (Table 2). Figure 1 depicts a typical logC = T fitting scatter plot, representing the time series of city well no. 28 for 1,1,1 trichloroethane. C = T models showed a tendency toward underestimating C at both ends of the curve, especially at the upper end, and toward overestimating in the middle part of the curve. LogC = T models tended to slightly overestimate C at the lower end of the curve. Except for the model r2, the results of unweighted multivariable regression models were again in favor of logC = T models. LogC = T models with a higher r2 were found for TCA, PCE, and 11-DCA (1,1-dichloro- ethane), whereas DCE (1,1-dichloroethylene), CIS (1,2- cis-dichloroethylene), and TCE had a higher r2 associ- ated with C = T models. No multivariable regression model could be developed for 12-DCA, a chemical found only occasionally and at very low levels in city wells (maximum concentration of 10 ppb). In the case of C = T models, the difference between observed and esti- mated Tl0w increased with inclining C8, resulting in a mean difference of + 2.6 months with large variations (SD 24.5). In contrast, logC = T models showed dif- ferences scattered within a narrow range around a mean of zero months (SD 6.5). In 56% of the cases, the esti- mated Tlo, was closer to the observed Tl,,,, if a logC = T model was used, compared with 32% if the basis was a C = T model. No difference between the two models was seen in 12% of the estimates. As another test of accuracy, the models were applied to the two private wells that, coincidentally, were sam- pled twice before 1984. Table 3 shows that in 10 of the 14 pairs of observed and estimated Tl,,, estimates de- rived from the logC = T model were the closest to the observed value. The range of differences between es- timated and observed Tl0w is by far the widest for C = T model estimates. In one case the C = T model yielded a Tl0,, estimate, pointing to a time still to come (1997). Table 4 depicts T1 estimates for private wells derived from unweighted models. C = T models yielded the widest range of estimates (134 years) and many large negative values, that is, estimates far earlier than 1970. In extreme cases, T1 went back in the past many dec- ades to a century, before the VOCs came in commercial production. T1 estimates for neighboring wells differed by many years to decades with no consistent pattern. T1 estimates from logC = T models ranged from 1977 to 1983. Figure 3 is a map view of the location of wells with T1, in time lapses of 1 year, estimated (or observed if the sampling result was 1 ppb) for CIS and 11-DCA using logC = T models. The close correlation of each well's T1 with its topographical location is reflected in the clumping of the earliest T1 estimates in an area closest to the spill site. From here, protrusions with later T1 estimates spread out, pointing mostly in a wes- terly to northwesterly direction, in agreement with the main direction of the groundwater flow. When analyzing pooled data, weighted regression models are conceptually preferred over unweighted models. Indeed, r2 values and the accuracy in estimating Tl0, improved for logC = T models, although not for Table 3. Observed and estimated T10, of private wells sampled twice before 1984. Well 11-DCA 12-DCA DCE CIS PCE TCA TCE 260 Observed Cl,ow-Chigh 36-58 119-284 1-16 1100-674 16-42 0-12 37-46 Observed TlOW-Thi,h 145-155 145-155 145-155 145-155 145-155 145-155 145-155 Estimated T7ow for C = T 116 60 85 306 144 T, = 151a 145 Estimated T1ow for logC = T 150 151 142 158 149 T, = 139a 153 620 Observed Clow-Chigh 13-16 17-24 3-4 549-488 55-77 2-4 60-68 Observed Tlo.-Thigh 145-147 145-147 145-147 145-147 145-147 145-147 145-147 Estimated T1ow for C = T 142 137 142 169 137 146 138 Estimated T1ow for logC = T 145 144 143 148 145 142 146 a Because the first sample of well 260 has a Clow of 0 ppb TCA, a T, was estimated rather than T1ow. The observed reference for this estimate is between' tLow and Thigh- 216 MOVEMENT OF CHEMICALS IN GROUNDWATER Table 4. Predicted T1 for private wells. C = T-based T1 log C = T-based T1 VOC Number of wells Earliest Latest Earliest Latest 11-DCA 52 -95 168 94 163 Aug62 Dec 83 Oct 77 Jul 83 12-DCA 48 - 333 152 87 152 Apr 42 Aug 82 Mar 77 Aug 82 DCE 34 -234 149 88 149 Jul 50 May82 Apr 77 May82 CIS 66 -1322 166 92 155 Nov 1849 Oct 83 Aug 77 Nov 82 PCE 37 43 163 108 163 Jul 73 Jul83 Dec 78 Jul83 TCA 44 102 173 113 173 Jun 78 May84 May 79 May84 TCE 43 -578 158 100 152 Nov 21 Feb 82 Apr 78 Aug82 11-DCA * * * 5 CIs FIGURE 3. Distribution of residential wells west of the source with a T1 (estimated time when well pollution started at 1 ppb) in periods of 1 year for 11-dichloroethane (11-DCA) and cis-dichloroethylene (CIS). For source location, see Fig. 1. T1 estimates are coded 1-6. Shaded areas indicate wells with the same T1 in periods of 1 year, starting April 1977-March 1978 (code 1), and ending April 1982-March 1983 (code 6); e.g., code 2 area comprises wells with T1 April 1978 through March 1979, the code 3 area comprises wells with T1 April 1979 through March 1980, etc. Uncoded wells (0) were not contaminated at time of sampling (mostly in 1982). Contour lines are for illustrative purposes only, although they approximate the actual path of progression of the plume of contamination. C = T models. However, weighted models resulted in larger differences between observed and predicted Tl1u for both city wells and the two private wells listed in Table 3. Apparently, giving more weight in the pooled data to wells with smaller standard errors of the slope does not guarantee better accuracy. We concluded that unweighted logC = T models yielded the best results when retrospectively estimat- ing T1 and C = f(T). The widest confidence range for T1 estimates was for chemicals with the smallest num- ber of time series suitable for modeling, which is a log- ical consequence of using well constants such as distance and depth. Confidence limits around a regression estimate would set limits around Cs and not T1, rendering conventional methods for estimating these limits not applicable. Since 217 FRENI AND PHILLIPS calculation of both Cs and T1 are based on b1, the esti- mated regression coefficient for T, a sort of 95% confi- dence range was established by using b1 plus or minus 1.96 times its standard error. Ignoring 12-DCA, the standard error (SE) of b1 for logC = T models was not more than 6 to 15% of b1, resulting in a confidence range of a few months to a year, with an occasional outlier of 2 years. This is quite satisfactory, in view of the period covered (1970-1984) and the limited data available. The models for 12-DCA were derived from only one well, with a very low level of contamination and a monitoring period of only one year (well no. 35), which resulted in an SE of b1 for 12-DCA that is 50% of b1. This does not invalidate the T1 estimates for 12-DCA. The large SE means that the T1 estimate has a wide confidence range, but it is still the best estimate. C = f(T) can be used for estimating the total accu- mulated exposure (TAE) in the period T1 to T7, indi- cated by the area under the CT curve (Fig. 1). For Clow - 1 ppb and T = T1, Equations 1 and 2 can be written as: C, - 1 = bj(T8 - T1) for C = T models log(C) - log(1) = bj(T8 - T1) for logC = T models Integrating the area under these curves from T, to Ts yields TAE equal to: TAE - (C8 + 1)(T, - T1)/2 ppb-months for C = T models TAE = [exp(b1(T8 - T1)) - l]lb ppb-months for logC = T models Using the confidence range of b, for logC = T models described above, the lower limit of TAE was 11 to 27% lower than the point estimate, with an upper limit (for VOCs other than 12-DCA) of 14 to 61% higher than the point estimate. The widest confidence range was ob- served for DCE and TCE, of which the models were derived from 3 and 2 city wells only. This limited the number of explanatory variables (other than T) allowed in the multivariable regression models to 2 and 1, re- spectively. The upper limit for 12-DCA was high, con- sistent with the large standard error of b1 for this chem- ical. Discussion A great many factors govern groundwater move- ment. Most of these factors cannot be measured di- rectly, but some can readily be inferred from associated measurable features such as porosity, hydraulic con- ductivity and gradient, and the amount of recharge and discharge from the aquifer. These features are used to describe the flow in aquifers and to understand more basic characteristics of the soil or rock matrix and the water that flows through the aquifer. For example, hy- draulic conductivity reflects the friction and surface ten- sion between soil or rock particles and water molecules, which in turn depends on the size, shape, and distri- bution of particles, as well as properties of the water (7). An accurate prediction of groundwater movement re- quires a large number of measurements in the study area. Flow models are often applied to manipulate the large number of variables. Flow models predict the movement of water, not of chemicals, but nevertheless have been used to predict movement of chemicals (8). Other, more complex models are available to simulate the transport of chemicals in a flow system. Some of these complex models have been used successfully (9) or are still in an experimental stage (10,11), and might theoretically be applied to the retrospective estimation of C = fl(T) and T1, but require extensive data on hy- drogeologic parameters. In a typical setting of ground- water contamination found incidentally, such as in the Battle Creek situation, these data simply do not exist. Assuming that current parameters also prevailed in the past decade would certainly lead to erroneous results, as some parameters did change over time, e.g., the chemical gradient and the flow direction. Because of their general concept, these prospective models are applicable to a wide variety of scenarios. They cannot discriminate, however, between individual wells as close to each other as in a well field or in a residential area. Accordingly, estimates of T1 or C = f(T) inevitably refer to relatively large areas or to groups of wells rather than to individual wells. This is not because of a deficiency in the concept of the models, but because today's technology for estimating geohy- drologic parameters is essentially a macroscale tech- nology. Dealing with an individual well amidst a whole field of wells in the same aquifer would require micro- scale parameters. Estimating C = f(T) and T1 for in- dividual wells would require not only hydrogeologic tests for each single well, but also the assessment of chemical-specific aquifer and soil properties still un- known. For instance, Gold and Roberts (10) suggested the presence of immobile water compartments in an ;quifer. If immobile water compartments exist, there is no method to assess their location or size. Neither are the causes of the immobility known, nor whether such compartments may become mobile again. Questions may arise about how aquifer properties re- late to T1 and C = f(T) as expressed in Equations 1 through 4, but our method does not require that these questions be answered. The retrospective C = fiT) method described in this paper is not based on hydro- geologic principles, neither is it to be interpreted as a mathematical hydrological flow model. It is a pragmatic and empirical approach, using statistical tools to eval- uate observed chemical- and well-specific changes in concentration over time. The estimates are based ont and can often be validated by, observed data. Analyzing a concentration-time curve is merely accepting that whatever factors affect the movement of a chemical will eventually be reflected in T1 and in C = fiT) for that compound. The regression coefficient of T reflects the 218 MOVEMENT OF CHEMICALS IN GROUNDWATER speed with which chemicals move through the aquifer after being slowed down, accelerated, or detoured by the various characteristics of the aquifer, wells, and chemical. The time T in C = f(T) ultimately represents all geohydrologic, chemical, and physical factors that influence the movement of chemicals. In the presence of a concentration gradient caused by a chemical spill, the chemical inevitably travels to places lower on the gradient, which yields increasing C8 in wells. It is equally inevitable that transportation is gov- erned by time and characteristics of the wells and the aquifer, some of which may be known. A combination of flow, diffusion, and sorption factors, as described by Cameron and Klute (11), may explain the monotonic nature of C = f(T), but does not explain why neigh- boring wells have different slopes for the same chemical (see Table 2). Acceptance of the existence of immobile compartments and sorption factors in the aquifer, as proposed by Goltz and Roberts (10), would provide a reasonable explanation for differences in the slope of C = f(T). Immobile compartments would force the water flow to circumvent the area, resulting in increased val- ues for dTl and dT2 (see Fig. 1), and thus result in a later T1 and lower slope. Immobile compartments also explain why some wells were hardly contaminated and others not at all, although they were in the main path- way of the plume of contamination. The adsorption/de- sorption ratio is likely to influence not only dTl, but also the shape of the concentration versus time curve. VOCs were stored and distributed at different times and volumes, depending on supply and demand, and not all tanks were corroded and leaked at the same time and at the same rate. This explains why different chem- icals started to show up in the samples of the same well at different times and reached different peak concen- trations. Our approach of developing C = f(T) from city wells selected for showing increasing C on T, however, cir- cumvents these issues, but some essential assumptions are made: the private wells experienced increasing C over T at the time of sampling; city wells selected for an increasing C over T better represent contaminated residential wells than do a hypothetical average of clean and polluted city wells; and private wells, once contam- inated, will behave the same as city wells, that is, show the same dependency of C over T. There are some ar- guments in support of these assumptions. First, Table 3 shows that the two private wells tested more than once did show increased levels in the second (later) sample, with CIS as the only exception. How- ever, the decline of CIS most likely reflects a fluctuation rather than a true decline, as large fluctuations in the CIS levels of city wells were commonly observed. Sec- ond, the assumption that private wells, once contami- nated, behave as city wells is supported by the close agreement between estimated (modeled) and observed concentrations in these private wells. Third, since the withdrawal by private wells is minimal relative to that by city wells, the groundwater flow in the residential area is determined by the activities in the city well field (4,5). Hence, if the city wells showed increasing con- centrations under prevailing flow conditions, it is rea- sonable to assume that private wells in the same period with the same flow conditions were also affected by increasing VOC levels. Fourth, an increasing trend in VOC concentration was also apparent in most of the other contaminated city wells not selected for modeling. Finally, nearly all private wells were tested in 1981 and 1982 (three wells were tested mid-1983), while none of the city wells showed a definitive decline in VOC levels prior to 1984. That not all wells displayed the expected increase in C over time may also be explained by other factors than distance, immobile aquifer compartments, and differ- ences in sorption properties. Pumping a well causes a cone of depression in the aquifer through a pressure gradient, resulting in water from all sides being drawn to the pump. At the side of the chemical spill, the VOC concentration will b- l igher than on the other sides of the cone. The 11 city wells not polluted, or barely so, were either in the vicinity of the river, or in the northern rim of the well field (Fig. 2). In other words, large supplies of untainted water dominated in the cones of depression of these wells. Wells may also be affected by pumping of neighboring wells if they are within the cone of depression of those wells. Still other assumptions are common to all regression techniques. Since our database consisted of time series, autocorrelation may pose a source of errors (12). How- ever, plots of residuals versus time showed random scat- ter around zero over the time gradient, evidence that our models were not complicated by autocorrelation. Homogeneity of slopes of individual wells was assumed when the time series were pooled to derive a multivar- iable model. Testing for heterogeneity of slopes (13) showed that only one well (no. 38, Table 2) appeared to have a significantly different (steeper) slope, and only for PCE and 11-DCA. However, deleting that well from the data set would have resulted in T1 estimates earlier by not more than 0 to 2 months, too small an improve- ment to justify exclusion of the well from the pooled data. We found that logC = T models had a better overall fit to the observed data than do C = T models. LogC = T models proved better in another aspect as well: errors in estimating Tlou} were hardly dependent on the sample concentration C8, an important feature in view of the much wider range of Cs in private wells than in city wells (Table 1). Further, using logC = T models, the range of observed deviations of estimated from ob- served Tl0,,, in city wells of 1 month to 2 years is con- siderably narrower than the range of errors in assessing the date disease was diagnosed. Our study showed dif- ferences in the reported date of disease onset of up to one decade, even for well-known diseases such as dia- betes (14). Evidence of the reasonable degree of accuracy in es- timating T1 may also be construed from the velocity of the groundwater. A velocity of 1 to 4 feet/day or about 1000 feet/year was measured (4). Well no. 38 in the 219 FRENI AND PHILLIPS northern rim of the well field (Fig. 2) had an observed T1 varying from June 1982 to May 1983 for TCA, 11- DCA, and DCE. Given a straight distance of 1 mile from the VOC source, it can be concluded that water from this source would require 5 years to reach the well, and thus, that VOCs, if traveling at the same speed as groundwater, could have entered the aquifer at the source approximately mid-1977 to mid-1978 to reach well no. 38 by the above dates. Based on similar data on observed T1 for various VOCs in well nos. 32 and 33 in the southern rim of the field, VOCs could have en- tered the aquifer between mid-1977 and the end of 1979. Thus, if VOCs started their journey through the aquifer between mid-1977 and end of 1979, they would have reached the closest private wells about 1200 feet north- west of the source sometime in 1978. This closely agrees with the T1 estimates from the logC = T models for the private wells of 1977 to 1978 (see Fig. 3). Admit- tedly, this way of estimating when wells became con- taminated is crude and ignores sorption and other issues causing VOCs to travel slower than the watery medium. Yet, since this way of reasoning is independent of the C = f(T) approach, the results add to the credibility of the outcomes of the logC = T models. In contrast, C = T models resulted in T1 estimates many decades ear- lier than reasonably possible, especially if the sample concentration was over 50 ppb. We have investigated the significance of a great num- ber of modifications of our method for retrospectively estimating C = f(T), many of which are of a statistical nature. Examples of such variations include expanding the number of wells (or even pooling all wells without selection) to provide the data for slope estimation, re- gressing T on C rather than C on T, replacing the X-Y coordinates with the direct distance and the angle of the well in relation to the point source, rotating the X and Y axes 300 counterclockwise, deleting influential outliers in the time series, pooling sampling results weighted by the inverse of the standard error of the slope of the wells, and using regression models with interaction between T and the coordinates (to account for possible secular trends). None of these variations provided better results with regard to the magnitude of the difference between estimated and observed Tl0,, the ultimate parameter of accuracy, although in some instances a significant increase in the model r2 (the square of the multiple correlation coefficient) was ob- tained. The conclusions that can be drawn from these failures are that pooling wells without applying proper selection criteria is doomed to result in a heterogeneous popu- lation of wells rendering any modeling meaningless, and that well characteristics alone are incomplete predictors of chemical movement in an aquifer. One modification, however, did yield improved results, that is, the addi- tion of water withdrawal parameters to the regressor variables in the statistical model. The number of pump- ing hours (H per month and per well, the pump capacity G of a well, and the product of these two variables (HG), which is the monthly amount of water withdrawn from the aquifer, were used as additional regressors. In ad- dition to improved r2 values, the regression slopes in- creased slightly, resulting in earlier T1 estimates by 2 months or less. Since no information on H was available for private wells, this expanded model could not be ap- plied to residential wells. One possible explanation for the only limited success of this modification is that the compound effect of the interaction of cones of depression of neighboring wells, and the relation of these cones to the proximity of the river and the main axis of the groundwater flow, are complex matters requiring a procedure more sophisti- cated than simply adding the parameters to the model. We intend to further investigate the possible use of withdrawal parameters interacting with data on neigh- boring wells and on the proximity to the river. Clearly, our method cannot, and should not, be used to predict whether, when, and to what degree water from a given well may become contaminated with a chemical of interest. On the other hand, our results do show that when no significant alterations in the local management of an aquifer have been made in the past, the results of a current simple monitoring program for comparable wells can be applied to unmonitored wells to yield meaningful information on the estimated time that the plume of contamination reached those unmon- itored wells and on how the chemical concentrations changed over time. Evidence has been provided of a reasonable accuracy on the results of our approach, re- sults that, given the lack of required data, could not have been obtained by any of the existing hydrologic flow models. Our approach to estimating C = f(T) and T1 is not presented as the ultimate solution for retro- spectively estimating the movement of chemicals in groundwater. It is a pragmatic tool for quantifying ex- posure for epidemiologic purposes superior to the con- ventional, but erroneous, approach of assuming a con- stant exposure level throughout the period of residence in the area. We also feel that there is a good possibility that our approach is applicable to other sites with a similar scenario. Finally, we deliberately refrained from reporting ac- tual estimates of the slopes obtained from the pooled well data in Table 2 to avoid their use for other sites. These values, ranging from 0.08 to 0.17 for the various VOCs, are strictly specific to the Battle Creek site. Obviously, slope values should be estimated for each site separately, as they are determined by site-specific well properties, amount of spilled chemicals, the length of the period during which spilling occurred, and con- centration-time series. Conclusions Under the conditions of the study area, well-moni- toring data can be used to estimate the changes in the concentration of a chemical in well water as a function of time, that is, C = f(T). A chemical-specific C = f(T) developed from moni- toring data can be applied to nearby contaminated wells 220 MOVEMENT OF CHEMICALS IN GROUNDWATER 221 lacking such data to retrospectively estimate when the contamination started ( T1). Under the conditions of the study site, the best of the models studied is logC = a + b1T1 + b2X + b3Y+ b4D + b5G. Specificity to the chemical of interest can be achieved through a backward version of stepwise multivariable regression. Inclusion of variables that reflect some characteristic of the cones of depression of individual wells and of the interaction of the cones of neighboring wells is expected to further improve the accuracy. More data and more studies under comparable con- ditions are needed to determine whether a logC = T model is universal for all wells that are not subject to changes in pumping regime and to determine the de- pendency of C = f(T) of a well on the operational status of neighboring wells. This work was supported in part by funds from the Comprehensive Environmental Response, Compensation, and Liability Trust Fund (Superfund) and from the Michigan Department of Public Health. We are indebted to Mrs. Arthur W. Bloomer, Adrian S. Oudbier, and John W. Bloemker of the Michigan Dept. of Public Health, and Mr. Ted R. Havens from the Calhoun County Health Department for their assistance in compiling the necessary data. We thank Norman G. Grannemann, hydrologist, U.S. Geological Survey, for his assistance with regard to the hydrologic aspects of our manuscript. REFERENCES 1. Littlefield, N. A., Farmer, J. H., Gaylor, D. W., and Sheldon, W. G. Effects of dose and time in a long-term, low dose carcin- ogenic study. In: Innovations in Cancer Risk Assessment (J. E. Staffa and M. A. Mehlman, Eds.), Pathotox Publishers, Park Forest South, IL, 1979, pp. 17-34. 2. Brown, C. C., and Chu, K. C. Implication of the multistage theory of carcinogenesis applied to occupational arsenic exposure. JNCI 70: 455-463 (1983). 3. CH2M-Hill. Remedial action master plan, Verona Well Field, Battle Creek, Michigan. Report to the U.S. Environmental Pro- tection Agency, 1983. 4. Grannemann, N. G., and Twenter, F. R. Geohydrology and groundwater flow at Verona Well Field, Battle Creek, Michigan. U.S. Geological Survey Report 85-4056, 1985. 5. CH2M-Hill. Technical Memorandum Phase I: Drilling and soil sampling Verona Well Field, Battle Creek, Michigan. REM/FIT report C11185 to the U.S. Environmental Protection Agency, 1985. 6. SAS. Stepwise procedure. In: User's Guide: Statistics, 5th ed. SAS Institute, Cary, NC, 1985, pp. 763-774. 7. Shervani, J. W. Groundwater development. In: Elements of Water Supply and Wastewater Disposal, 2nd Ed. (G. M. Fair, J. C. Geyer, and D. A. Okun, Eds.), Wiley & Sons, New York, 1971, pp. 112-162. 8. Bond, F. W., Cole, C. R., and Sanning, D. Evaluation of remedial action alternatives-demonstration/application of groundwater modeling technology. Proc. Natl. Conf. on Management of Un- controlled Hazardous Waste Sites, HCMRI, Silver Spring, MD, November 1982, pp. 118-123. 9. Konikow, L. F. Modeling chloride movement in the alluvial aqui- fer at the Rocky Mountain Arsenal, Colorado. U.S. Geological Survey Water Supply paper 2044, 1977. 10. Goltz, M. N., and Roberts, P. V. Interpreting organic solute transport data from a field experiment using physical nonequili- brium models. J. Contam. Hydrol. 1: 77-93 (1986). 11. Cameron, D. R., and Klute, A. Convective-dispersive solute transport with combined equilibrium and kinetic adsorption model. Water Resour. Res. 13: 183-188 (1977). 12. Neter, J., Wasserman, W., and Kutner, M. H. Applied Linear Statistical Models, 2nd Ed. Richard D. Irwin Inc., Homewood, IL, 1985, pp. 444-465. 13. Freund, R. J., and Littell, R. C. Heterogeneity of Slopes. In: SAS for Linear Models. SAS Institute, Cary, NC, 1985, pp. 200- 205. 14. Freni, S. C., and Bloomer, A. W. Draft report on the Battle Creek Health Study. Centers for Disease Control, Atlanta, GA, 1987.