Although annual data are commonly used to model linear trends and changes in trends of disease incidence, monthly data could provide additional resolution for statistical inferences. Because monthly data may exhibit seasonal patterns, we need to consider seasonally adjusted models, which can be theoretically complex and computationally intensive. We propose a combination of methods to reduce the complexity of modeling seasonal data and to provide estimates for a change in trend when the timing and magnitude of the change are unknown. To assess potential changes in trend, we first used autoregressive integrated moving average (ARIMA) models to analyze the residuals and forecast errors, followed by multiple ARIMA intervention models to estimate the timing and magnitude of the change. Because the variable corresponding to time of change is not a statistical parameter, its confidence bounds cannot be estimated by intervention models. To model timing of change and its credible interval, we developed a Bayesian technique. We avoided the need for computationally intensive simulations by deriving a closed form for the posterior distribution of the time of change. Using a combination of ARIMA and Bayesian methods, we estimated the timing and magnitude of change in trend for tuberculosis cases in the United States. Published 2012. This article is a US Government work and is in the public domain in the USA.

The incidence rate of tuberculosis (TB) in the United States declined from 52.6 cases per 100,000 persons in 1953 to 5.8/100,000 in 2000 [

Given the steady decline in TB trend in previous years, we wanted to determine what statistical inference could be drawn about the change in trend and about the timing and magnitude of that change. However, with only two newly observed annual data points outside the prediction bounds of the linear trend, limited statistical inference was possible. Therefore, we used detailed monthly case count data [

Tuberculosis cases reported to the NTSS from January 1, 2000 through December 31, 2010 were included for analysis if TB treatment had started within the same reporting year. The use of the treatment start date and the report date for the selection of the TB cases ensured comparable monthly data throughout the study period, particularly for the last months of 2010 in comparison to earlier years. Cases were aggregated by month of treatment start date. As shown in

Because the monthly time series data were seasonal, linear trend analysis without seasonal adjustment would not have been appropriate, and more sophisticated models were required. To estimate the trend and changes in trend in time series data, autoregressive integrated moving average (ARIMA) models can be explored [

Because it is theoretically complex and computationally intensive to build a dynamic model combining the time series and Bayesian components, we approached the modeling in several stages. First, we used ARIMA models to analyze the residuals and forecast errors. Next, we used several ARIMA intervention models to estimate the timing and magnitude of the drop, specifying for each model that the intervention occurred at a different month in the vicinity of late 2008/early 2009. Finally, Bayesian analysis [

Time series methodology may be used to assess seasonal patterns in the monthly case data. We employed classical seasonal ARIMA models [_{t} is the expected number of cases at time _{t} is the white noise error term at time _{t} = _{t−1}. These models include: autoregressive parameters _{i}, β_{i} (seasonal) with orders _{i}, γ_{i} (seasonal) with orders

To select the best model for monthly TB case counts with associated orders (^{2} was also computed. We selected the best model and used it to forecast the monthly data for 2008 to 2010. Observed data from 2008 were compared with the model forecast for validation. The goal was to establish a model that fit 2000–2007 data and reliably forecast 2008 data. The model was used to predict the monthly case counts for 2009–2010 assuming no change in trend.
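The fit-then-validate workflow described above can be illustrated with a minimal sketch. It is not the ARIMA model used in this paper: as a stand-in, it forecasts the log-transformed series by repeating the value from 12 months earlier plus the average seasonal difference. The function names, the synthetic series, and the 96-month training window are all assumptions made for illustration.

```python
import math

def seasonal_log_diff(counts, period=12):
    """Lag-`period` differences of the log-transformed series."""
    logs = [math.log(c) for c in counts]
    return [logs[t] - logs[t - period] for t in range(period, len(logs))]

def seasonal_drift_forecast(train, horizon, period=12):
    """Stand-in forecast (NOT an ARIMA fit): last year's log level plus mean seasonal difference."""
    diffs = seasonal_log_diff(train, period)
    drift = sum(diffs) / len(diffs)
    logs = [math.log(c) for c in train]
    for _ in range(horizon):
        logs.append(logs[-period] + drift)
    return [math.exp(v) for v in logs[len(train):]]

# Synthetic monthly series (hypothetical): fixed seasonal shape, 4% annual decline.
season = [1.0, 0.9, 1.1, 1.2, 1.15, 1.1, 1.05, 1.0, 0.95, 0.9, 0.85, 0.8]
series = [1000.0 * season[t % 12] * 0.96 ** (t / 12) for t in range(108)]

train, holdout = series[:96], series[96:]          # 8 years to fit, 1 year to validate
forecast = seasonal_drift_forecast(train, horizon=12)
errors = [obs - exp for obs, exp in zip(holdout, forecast)]  # observed minus expected
```

Note that the seasonal difference consumes the first 12 observations, so fitted values and residuals for such a model begin one year after the data start.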

To evaluate the model and make statistical inferences from the monthly data, we examined residuals from the 2001–2007 data and differences of observed and expected cases for 2008 to 2010. We used

Intervention analysis has been used in economics and other disciplines [

Here, _{t} is an ARIMA model, β is the magnitude of the drop, and _{t} is a step function.
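A step intervention of this kind can be sketched in a deliberately simplified form. The sketch below assumes white-noise errors in place of the ARIMA component, so the estimated magnitude of the drop reduces to a difference of means. Because the fit is cheap, the same function can be repeated over several candidate months and the fits compared by AIC. The function names, the synthetic data, and the Gaussian AIC formula (stated up to an additive constant) are illustrative assumptions, not the models fitted in this paper.

```python
import math

def fit_step_model(y, change):
    """Least-squares fit of y_t = mu + beta * S_t + e_t, with S_t = 1 for t >= change, else 0."""
    before, after = y[:change], y[change:]
    mu = sum(before) / len(before)
    beta = sum(after) / len(after) - mu            # estimated magnitude of the step
    rss = (sum((v - mu) ** 2 for v in before)
           + sum((v - mu - beta) ** 2 for v in after))
    n = len(y)
    aic = n * math.log(rss / n) + 2 * 3            # 3 parameters: mu, beta, error variance
    return beta, aic

def scan_change_months(y, candidates):
    """Fit one step model per candidate change index; returns {index: (beta, aic)}."""
    return {c: fit_step_model(y, c) for c in candidates}

# Synthetic data (hypothetical): level 0 for 20 points, then level -2 for 10 points.
y = ([0.5 if t % 2 == 0 else -0.5 for t in range(20)]
     + [-2.5 if t % 2 == 0 else -1.5 for t in range(10)])

fits = scan_change_months(y, range(15, 26))
best = min(fits, key=lambda c: fits[c][1])         # candidate with the lowest AIC
best_beta = fits[best][0]
```

The candidate with the lowest AIC gives a point estimate of the timing, but this comparison alone yields no interval for it.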

In the monthly TB data, however, the time of the drop in cases was unknown. Therefore, we constructed a series of intervention models, one model for each month from July 2007 to June 2010, with the month corresponding to the intervention time,

In the intervention model, confidence bounds for the time of the drop are inestimable because

To estimate the timing of the drop in TB cases and its credible interval, we developed a Bayesian approach. In this approach, the residuals of 2001–2007 and the differences between observed and expected data for 2008–2010 from the first ARIMA model were used for analysis. The monthly data of observed minus expected (_{1}, _{2}, …, _{n}) from 2001 to 2010 were assumed to follow two normal distributions: the first distribution before month

To facilitate calculation, we chose the conjugate prior distribution of _{i} given

We assumed a noninformative uniform prior distribution for

The corresponding joint prior densities, assumed to be independent of each other, are

The joint posterior density is given by

We integrated over the nuisance parameters (_{1}, _{2},

Hence, the posterior probability distribution of

Using this posterior density, the timing of the drop and its credible intervals can then be obtained.
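A closed-form posterior of this type can be sketched as follows. The sketch assumes a normal-inverse-gamma conjugate prior for each segment's mean and variance, and a uniform prior on the change point. The exact parameterization used in this paper is not reproduced here: the prior mean `theta` and prior strength `kappa` follow the symbols in the text, whereas the shape `a`, the scale `b`, their values, and all function names are assumptions for illustration.

```python
import math

def log_marginal(xs, theta, kappa, a, b):
    """Log marginal likelihood of xs under a normal-inverse-gamma prior (assumed form)."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    a_n = a + n / 2.0
    b_n = b + 0.5 * ss + kappa * n * (mean - theta) ** 2 / (2.0 * (kappa + n))
    return (math.lgamma(a_n) - math.lgamma(a)
            + a * math.log(b) - a_n * math.log(b_n)
            + 0.5 * math.log(kappa / (kappa + n))
            - 0.5 * n * math.log(2.0 * math.pi))

def change_point_posterior(xs, prior_before, prior_after):
    """Posterior over m = number of observations before the change (uniform prior on m)."""
    logp = {m: log_marginal(xs[:m], *prior_before) + log_marginal(xs[m:], *prior_after)
            for m in range(1, len(xs))}
    top = max(logp.values())                       # subtract the maximum for numerical stability
    weights = {m: math.exp(v - top) for m, v in logp.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

def credible_interval(posterior, level=0.95):
    """Equal-tailed interval read off the cumulative posterior."""
    tail = (1.0 - level) / 2.0
    cum, lower, upper = 0.0, None, None
    for m in sorted(posterior):
        cum += posterior[m]
        if lower is None and cum >= tail:
            lower = m
        if upper is None and cum >= 1.0 - tail:
            upper = m
    return lower, upper

# Synthetic data (hypothetical): mean 0 for 20 points, then mean -2 for 10 points.
xs = ([0.5 if t % 2 == 0 else -0.5 for t in range(20)]
      + [-2.5 if t % 2 == 0 else -1.5 for t in range(10)])

# Priors as (theta, kappa, a, b); the values of a and b are assumed.
posterior = change_point_posterior(xs, (0.0, 1.0, 0.5, 0.5), (-1.0, 1.0, 0.5, 0.5))
mode = max(posterior, key=posterior.get)
interval = credible_interval(posterior)
```

Because the nuisance parameters are integrated out analytically, the posterior is evaluated once per candidate month and no simulation is needed.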

All computations were carried out using SAS 9.2 (SAS Institute, NC, USA) and R 2.13.1 (

A seasonally adjusted model for the monthly 2000–2010 TB data is shown in _{12} model produced the best fit to the 2000–2007 log transformed data (^{2} = 0.96). Because of the first-order difference of the ARIMA model, the residuals start at 2001 (

To assess the validity of the model forecast, the differences between observed data and expected values from the model for 2008 were tested against the residuals of 2001–2007. No significant difference was found by

When the differences of observed data and expected values for 2009 and 2010 were tested against the residuals of 2001–2007, significant differences were found by

A substantial drop in TB cases in late 2008 and early 2009 was observed (_{12} models with step intervention functions of different magnitudes at each intervention month were compared by computing their corresponding AIC values (

To estimate the credible interval (CI) for the timing of the intervention, a Bayesian technique was developed. The residuals of 2001–2007 and the differences between the observed data and expected values from the ARIMA model in the first stage, normalized by a factor of 0.01 (_{1} = 0, θ_{2} = −1, κ_{i} = 1, _{i} = 0.5, and

In this paper, we propose a combination of methods to model time series data to achieve higher resolution for statistical inferences. The three-stage modeling for the time series data in this paper provides a practical approach for statistical inferences about the details of the change or early detection of a change in the trend of time series data.

Beyond seasonal monthly data, time series data in public health may include other operational data (e.g., engineering or financial data) recorded at higher resolution, such as weekly or daily counts or even smaller units. Higher-resolution time series usually contain more random noise or show different patterns, and are therefore harder to analyze than annual data. Public health data often call for analytical results or preliminary conclusions to be produced quickly, as in the rapid analysis of pandemic influenza trends. In that setting, building a more general theoretical model than the one we present might not be feasible or practical because of the complexity involved.

The usefulness of this approach is reflected in the modeling of the TB time series data. First, the time series data can be assessed at different stages to learn the patterns of the data and to ensure the validity of the modeling at each stage. Second, working in stages reduces the complexity of general modeling and produces interpretable results for each stage. Third, the methods for each stage may stand alone to produce estimates for different data series or for different purposes.

The contributions of the Bayesian analysis include detecting the change in location and its credible interval for data series from another process and for the original data series. Because a closed form posterior distribution was derived, the effects of sampling distributions and values of prior parameters can be analyzed and interpreted in the process of updating the posterior distribution. This Bayesian approach can also be used to estimate the change in magnitude. Our Bayesian interval estimates are similar to results obtained from a Joinpoint analysis used in another related paper [

In summary, the three-stage approach in this paper combining ARIMA, ARIMA intervention, and Bayesian methods provides a practical and effective way for the estimation and interpretation of operational data in public health or other areas. Future work may involve building a single Bayesian model combining the ARIMA and the Bayesian components that we presented.

The authors would like to thank Andrew Hill for his proofreading and suggestions and thank Chad Heilig and Dylan Shepardson for their comments and suggestions. We thank the local and state health department personnel who collected national tuberculosis surveillance data used in these analyses. Routine data in these analyses have been determined to be public health surveillance and not human subjects research requiring oversight by an institutional review board.

The findings and conclusions are those of the authors and do not necessarily represent the views of the Centers for Disease Control and Prevention.

Annual tuberculosis incidence rates in the United States, 2000–2010.

Monthly TB cases in the United States and ARIMA model, 2000–2010.

Differences of observed and expected TB cases from ARIMA model.

AIC values of intervention models by timing of intervention.

Posterior probability distribution for timing of change.