We describe a simple statistical model that allows for a comparison of staging data from the Centers for Disease Control and Prevention’s (CDC’s) National Program of Cancer Registries during 1998–2008. In this program, cancers diagnosed during 1998–2000 were coded according to Summary Stage 1977, those diagnosed during 2001–2003 according to Summary Stage 2000, and those diagnosed during 2004–2008 according to the Collaborative Stage system. These changes in stage coding systems were associated with an abrupt shift in the distribution of extent of disease for colorectal cancer, particularly changes in the proportion of local vs regional stage disease, in some states. The method described here adjusts for the use of different staging systems over time so that temporal trends in the distribution of extent of disease can be evaluated. The method is applied to the proportion of localized stage colorectal cancer, but should be applicable to other cancers.
In the United States, the age-adjusted incidence rate for colorectal cancer (CRC) was 44.4 per 100,000 persons in 2008 for men and women combined.
Summary stage was developed to categorize the extent of disease (EOD)—how far a cancer has spread from its point of origin.
Collaborative stage (CS) was developed to consolidate the principles of 3 different coding systems: the American Joint Committee on Cancer (AJCC) primary tumor, regional lymph nodes, and distant metastasis (TNM) staging; Summary Stage 1977 (SS77); and Summary Stage 2000 (SS2000).
The purpose of this report is to describe a methodology by which the staging data from CDC’s National Program of Cancer Registries (NPCR) can be compared during 1998–2008. This method corrects for changes in coding systems during this period. In addition, we applied the methodology to 5 Surveillance, Epidemiology, and End Results (SEER) state registries to determine whether stage coding changes had a similar effect on NPCR and SEER state registries. We illustrate the method by examining local stage diagnoses for CRC: cecum (International Classification of Diseases for Oncology, 3rd edition
The methodology was applied to data from CDC’s NPCR state registries and the 5 state registries supported by the National Cancer Institute through the Surveillance, Epidemiology, and End Results (SEER) program. Data were used from 43 NPCR and 5 SEER state registries. For each state, we computed the proportion of CRCs diagnosed at local stage each year during the 10-year period from 1998 to 2008. The proportion of CRC diagnosed as local stage was obtained by dividing the number of cancers classified as local by the total number of CRC cases, excluding unstaged cases. Mississippi and South Dakota were excluded from the analysis because these registries did not submit data to CDC for 3 or more years during the period. The District of Columbia was not included because the analysis was restricted to state registries.
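The proportion described above can be illustrated with a minimal Python sketch. The function name and the counts are hypothetical and are not taken from the registries; the only point is that unstaged cases are left out of the denominator.

```python
def local_stage_proportion(local, regional, distant, unstaged=0):
    """Proportion of staged CRC cases diagnosed at local stage.

    `unstaged` is accepted only to make the exclusion explicit;
    it does not enter the denominator.
    """
    staged = local + regional + distant
    if staged == 0:
        raise ValueError("no staged cases for this state-year")
    return local / staged


# Hypothetical counts for one state-year (illustration only).
p = local_stage_proportion(local=400, regional=380, distant=220, unstaged=75)
print(round(p, 3))
```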
We estimated the annual change in local stage CRC for each of the 48 states by regressing the log-odds of local stage (ie, the logarithm of the number of cases with localized disease divided by the number with regional and distant disease) on year of diagnosis. To evaluate the potential effect of changes in CRC staging, we included 2 indicator variables: one taking the value 1 for years 2001–2008 (and 0 otherwise), reflecting potential coding changes introduced by SS2000, and another taking the value 1 for years 2004–2008 (and 0 otherwise), reflecting potential coding changes introduced by CS coding in 2004. The purpose of these indicator variables was to ensure that any trend in the log-odds of localized CRC with year was not affected by the use of a new cancer staging system incompatible with older coding systems. The methodology is described in more detail in the
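As an illustration of the regression just described, the following Python sketch fits the log-odds on year together with the 2 staging indicators. It assumes an ordinary (unweighted) least squares fit, which the text does not specify, and it uses a synthetic, noise-free series built from hypothetical coefficients; the function and variable names are assumptions made for this example.

```python
import numpy as np


def fit_stage_model(years, log_odds):
    """Least squares fit of log-odds on year plus two staging indicators.

    Returns coefficients in the order:
    [intercept, SS2000 shift (2001-2008), CS shift (2004-2008), annual change].
    """
    years = np.asarray(years, dtype=float)
    log_odds = np.asarray(log_odds, dtype=float)
    ss2000 = (years >= 2001).astype(float)  # 1 for 2001-2008, else 0
    cs = (years >= 2004).astype(float)      # 1 for 2004-2008, else 0
    # Centre year at 1998 so the intercept is the 1998 log-odds.
    X = np.column_stack([np.ones_like(years), ss2000, cs, years - 1998])
    coef, *_ = np.linalg.lstsq(X, log_odds, rcond=None)
    return coef


# Synthetic series from known (hypothetical) coefficients, no noise.
years = np.arange(1998, 2009)
true = np.array([-0.40, 0.03, 0.15, 0.01])
log_odds = (true[0]
            + true[1] * (years >= 2001)
            + true[2] * (years >= 2004)
            + true[3] * (years - 1998))

coef = fit_stage_model(years, log_odds)
print(np.round(coef, 3))
```

Because the synthetic series contains no noise, the fit recovers the coefficients used to build it; with real registry data the indicator coefficients would estimate the size of the shifts in 2001 and 2004.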
We also fit a random effects logistic regression model, adding indicator variables for older age (ie, ≥ 50 years at diagnosis), male gender, and black race to the model containing the 2 indicators for staging changes and the ordinal variable for age. The data were abstracted from the NPCR and SEER analytic file by using the program SEER*Stat,
We examined the data to assess how many central registries reported staging data with shifts in local stage that correlated by year with changes in the staging systems. On the basis of our models, 28 of 43 NPCR states displayed an abrupt and statistically significant increase (p < 0.05) in local stage CRC in 2004, which corresponds with the implementation of collaborative stage (data not shown). Six states displayed a decrease in local stage CRC in 2004, but none of these decreases was statistically significant. No SEER state registry (ie, Hawaii, Iowa, Connecticut, New Mexico, Utah) displayed a statistically significant change in the proportion of localized disease in 2004. The unweighted mean increase in the log-odds of localized disease for all 48 states in 2004 from the model is 0.152 (95% CI: 0.117, 0.188) with a range of −0.087 to 0.470.
The changes in the log-odds of localized disease in 2001 from the model associated with the coding changes introduced by SS2000 were considerably smaller. Thirty-six states displayed an increase in localized disease in 2001; 13 of these increases were statistically significant, and all 13 were in NPCR states (data not shown). Twelve states displayed a decrease in localized disease in 2001, but only 2 (Louisiana and New York) were statistically significant. The unweighted mean change in 2001 was 0.033 (95% CI: 0.007, 0.059) with a range of −0.258 to 0.194.
The overall findings for the 48 states for white and black groups are shown in
In addition, we graphed the proportion of local, regional, distant, and unknown stage CRC without adjustment to assess the results of the new staging systems further for whites and blacks, as shown in
After adjusting for coding changes in stage, we observed that among whites, 31 of the 48 states included in the analysis displayed an increase in the proportion of local stage CRC over the period; however, only 4 increases were statistically significant (p < 0.05; California, Connecticut, Texas, and Wisconsin;
To assess stage among blacks, we restricted the analysis to states with 250 or more black CRC cases during the period and adjusted for changes in coding systems. Of the 35 states meeting this criterion, 23 displayed an increase in the proportion of local stage CRC, of which 6 were statistically significant (p < 0.05; California, Iowa, Louisiana, New York, North Carolina, and Virginia;
On the basis of the random effects logistic model described in the methods section, the odds of localized CRC are nearly the same for men and women (OR = 1.04, men compared with women), whereas localized disease is less common among black compared with white groups (OR = 0.85; 95% CI: 0.84, 0.86) and more common among cases aged 50 years or older (OR = 1.42; 95% CI: 1.40, 1.44).
We repeated the analysis after removing all the data from 4 states that did not meet the NPCR quality standards for publication in the United States Cancer Statistics (USCS) for 3 or more years during 1998–2008 (Arkansas, North Carolina, Tennessee, and Virginia) and the data for 1998 from Georgia, New Hampshire, and Maryland that also did not meet USCS quality standards in that year. The exclusion reduced the study size by 9%. However, the overall findings did not change meaningfully. For example, in the reduced dataset, the corresponding OR for a 10-year increase among whites displayed in
As a check on the validity of our methodology, we attempted to duplicate our findings by using the smaller population covered by SEER 17 for the period 2000–2008. Because stage coding changes had little effect on EOD staging in the SEER states in our analysis, we used SS2000 for this analysis, and we did not include an indicator variable for the introduction of CS in 2004. We simply regressed the logarithm of the age-adjusted rates on year. The odds of local stage CRC among whites increased by 3% (OR = 1.03; 95% CI: 1.01, 1.06) during the 8-year period (OR = 1.04; 95% CI: 1.01, 1.08, projected for 10 years). Among blacks, the odds increased by 20% (OR = 1.20; 95% CI: 1.12, 1.29) during the 8-year period (OR = 1.25; 95% CI: 1.15, 1.37, projected for 10 years). The difference in this increase between whites and blacks is statistically significant (p < 0.001). Thus, the findings from SEER 17 agree well with those obtained by using the NPCR and SEER states.
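The 10-year projections quoted above can be approximated from the 8-year odds ratios under the assumption of a constant annual change in the log-odds, in which case the odds ratio is raised to the power 10/8. The sketch below is an illustration under that assumption, not the authors' stated computation; the function name is hypothetical.

```python
def rescale_odds_ratio(odds_ratio, from_years, to_years):
    """Rescale an OR observed over `from_years` to `to_years`,
    assuming a constant annual change in the log-odds."""
    return odds_ratio ** (to_years / from_years)


for label, or_8yr in [("whites", 1.03), ("blacks", 1.20)]:
    print(label, round(rescale_odds_ratio(or_8yr, 8, 10), 2))
```

Because the published 8-year values are rounded to 2 decimal places, a projection computed from them can differ from the reported 10-year value in the last digit.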
When we assessed the proportion of CRC cases diagnosed at a local stage during 1998–2008 in NPCR, a shift in stage was apparent in 2004 in 28 of 43 states, which corresponds with the implementation of CS. The shift in 2001 related to the introduction of SS2000 was smaller, but an increase was still observable in 36 of the 48 states analyzed. We developed a method to adjust for these shifts in order to assess changes in the proportion of CRC diagnosed at local stage over time. If the stage coding changes in 2001 and 2004 were not accounted for in an analysis, the resulting odds ratios for local stage would be incorrect at a national level and for most states.
On the basis of the random effects logistic model, we were able to examine differences in the odds of localized CRC by gender and race. Although the odds of localized disease were similar for men and women, localized disease was less common among black compared with white groups. In addition, differences were noted among states in the rate of change in the proportion of local stage CRC over time. These differences need to be examined more closely to assess the possible reasons. Our focus was on demonstrating the methods for conducting these types of analyses.
We included data from NPCR states in our analyses that did not meet standards for publication in USCS for 3 or more years during 1998–2008 (Arkansas, North Carolina, Tennessee, and Virginia) and the data for 1998 from Georgia, New Hampshire, and Maryland that also did not meet USCS quality standards in that year. However, when we ran the analysis again excluding these data, the results did not change meaningfully. We compared results from this analysis with an analysis of SEER 17 data to validate our methodology for the period 2000–2008. The SEER state registries in our analysis did not demonstrate shifts in staging with the introduction of SS2000 or CS and, consequently, we assumed that the SEER 17 registries also did not require adjustment. The observation that the findings from our adjusted analysis agree well with those of the unadjusted SEER 17 analysis supports the validity of our methodology.
Other investigators have examined differences in cancer staging resulting from the adoption of the CS in 2004. Wu et al reported that staging differences observed during 2003–2004 for 18 cancers could not be explained by linear trends in stage distribution during 2001–2004 or by a decrease in the percentage of unstaged cancers.
The indicator variables in the model both for 2001 and 2004 had little effect for SEER states. The absence of abrupt changes in CRC staging in the SEER program associated with changes in staging systems probably reflects the fact that SEER has a long and extensive history of coding EOD and, hence, the changes did not affect SEER CRC stage distribution. However, the inclusion of the indicator variables in the regression model for the SEER registries would not bias the estimates associated with the annual change in the proportion of local stage CRC in the SEER states. On the other hand, it is critical to include the indicator variables for stage coding changes for NPCR states. Without the correction for staging changes, one would observe that the proportion of local stage CRC had increased considerably during the 10-year period both for whites and blacks and might attribute these large increases in local stage CRC to earlier detection through screening. However, the bulk of the increase in local stage CRC is an apparent artifact introduced by refinements to the stage coding systems. That part of the increase potentially attributable to screening is considerably less as evidenced by the adjusted NPCR data as well as by the unadjusted SEER 17 data.
In summary, we describe a methodology that allows for a comparison of cancer staging in the period 1998–2008 by NPCR state tumor registries and at a national level. The methodology employs standard regression techniques that are widely used and for which computer software is readily available. The method could be applied easily at state registries. Central cancer registries play an important role in the evaluation of the success of screening interventions. Stage at diagnosis is one important measure of the success of cancer screening. Assessing changes in stage at diagnosis over time has been hampered in NPCR by the changes in stage introduced by updated staging systems used during the past 10 years. Health departments and comprehensive cancer coalitions can use this methodology to adjust for these changes and properly evaluate stage at diagnosis within their region or within specific populations. Of course, it is also important to examine all possible reasons for shifts in cancer stage when interpreting results from proportional analyses. In addition, this methodology can be used for assessing rates of disease by stage; however, rates will reflect changes in population risk, as well as changes caused by earlier diagnosis.
The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.
These data were provided by the central cancer registries participating in either the National Program of Cancer Registries (NPCR) (January 2008 data submission) or the Surveillance, Epidemiology, and End Results (SEER) Program (November 2008 submission).
The regression model is the following:
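A form consistent with the description in the methods section is sketched below. The subscripts on β1 and β2 follow the adjustment step described in this appendix; the remaining symbols (p_t, β0, β3, the indicator notation, and the error term) are assumed names chosen for this sketch.

```latex
\log\!\left(\frac{p_t}{1 - p_t}\right)
  = \beta_0
  + \beta_1\, I(t \ge 2001)
  + \beta_2\, I(t \ge 2004)
  + \beta_3\, t
  + \varepsilon_t ,
\qquad t = 1998, \dots, 2008
```

Here p_t is the proportion of staged CRC cases diagnosed at local stage in year t, so that p_t / (1 − p_t) is the number of localized cases divided by the number of regional and distant cases; β1 is the shift associated with SS2000, β2 is the shift associated with CS, and β3 is the annual change in the log-odds.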
To facilitate interpretation of the model results, we adjusted the staging data for all years to be comparable to the CS coding introduced in 2004. We accomplished this task by adding β̂2 to the observed log-odds of local stage CRC for years 2001–2003 and by adding β̂2 + β̂1 to the observed log-odds of CRC for years 1998–2000. We then fit a straight line to the adjusted values, which has the following form:
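Writing the fitted line as adjusted log-odds = α + γ × (year − 1998), where α and γ are symbol names assumed for this illustration, the adjustment and the straight-line fit can be sketched in Python as follows. The function names and the synthetic series are hypothetical, and the sketch takes the estimates β̂1 and β̂2 as given inputs.

```python
import numpy as np


def adjust_to_cs(years, log_odds, beta1_hat, beta2_hat):
    """Shift pre-2004 log-odds onto the CS coding scale."""
    years = np.asarray(years)
    adjusted = np.asarray(log_odds, dtype=float).copy()
    # 2001-2003 already reflect SS2000, so only the CS shift is added.
    adjusted[(years >= 2001) & (years <= 2003)] += beta2_hat
    # 1998-2000 reflect neither change, so both shifts are added.
    adjusted[years <= 2000] += beta1_hat + beta2_hat
    return adjusted


def fit_line(years, values):
    """Straight-line fit; returns (intercept at 1998, annual slope)."""
    x = np.asarray(years, dtype=float) - 1998
    slope, intercept = np.polyfit(x, values, 1)
    return intercept, slope


# Synthetic observed series with known (hypothetical) shifts and trend.
years = np.arange(1998, 2009)
observed = (-0.40 + 0.03 * (years >= 2001) + 0.15 * (years >= 2004)
            + 0.01 * (years - 1998))

adjusted = adjust_to_cs(years, observed, beta1_hat=0.03, beta2_hat=0.15)
alpha, gamma = fit_line(years, adjusted)
print(round(alpha, 3), round(gamma, 3))
```

In this noise-free example the adjusted values fall exactly on a line whose slope equals the annual change used to build the series, which is the quantity the adjustment is meant to isolate.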
a. Log-odds of Observed and Adjusted Local Stage CRC by Year Among Whites
b. Proportion of Local, Regional, Distant, and Unknown Stage CRC by Year Among Whites
a. Log-odds of Observed and Adjusted Local Stage CRC by Year Among Blacks
b. Proportion of Local, Regional, Distant, and Unknown Stage CRC by Year Among Blacks

a. The Annual Change in the Log-odds of Local Stage CRC, P-value, and the 10-Year Change in the Odds of Local Stage CRC with 95% Confidence Limits by State Among Whites

| State | Annual change (log-odds) | P-value | 10-year OR | OR lower 95% limit | OR upper 95% limit | N |
|---|---|---|---|---|---|---|
| Alabama | −0.0481 | <0.001 | 0.62 | 0.48 | 0.80 | 18,889 |
| Alaska | 0.0105 | >.20 | 1.11 | 0.45 | 2.71 | 1,510 |
| Arizona | −0.0043 | >.20 | 0.96 | 0.75 | 1.22 | 21,569 |
| Arkansas | −0.0127 | >.20 | 0.88 | 0.65 | 1.20 | 13,307 |
| California | 0.0147 | 0.004 | 1.16 | 1.05 | 1.28 | 123,164 |
| Colorado | −0.0129 | >.20 | 0.88 | 0.67 | 1.15 | 16,775 |
| Connecticut | 0.0267 | 0.039 | 1.31 | 1.01 | 1.68 | 19,848 |
| Delaware | 0.0120 | >.20 | 1.13 | 0.64 | 2.00 | 3,964 |
| Florida | 0.0046 | >.20 | 1.05 | 0.93 | 1.18 | 91,764 |
| Georgia | 0.0102 | >.20 | 1.11 | 0.90 | 1.37 | 27,294 |
| Hawaii | 0.0218 | >.20 | 1.24 | 0.56 | 2.78 | 1,787 |
| Idaho | 0.0096 | >.20 | 1.10 | 0.69 | 1.75 | 5,701 |
| Illinois | −0.0010 | >.20 | 0.99 | 0.86 | 1.14 | 60,330 |
| Indiana | −0.0021 | >.20 | 0.98 | 0.80 | 1.20 | 31,973 |
| Iowa | 0.0137 | >.20 | 1.15 | 0.89 | 1.47 | 20,362 |
| Kansas | 0.0070 | >.20 | 1.07 | 0.79 | 1.45 | 13,947 |
| Kentucky | −0.0006 | >.20 | 0.99 | 0.79 | 1.25 | 23,718 |
| Louisiana | 0.0204 | 0.128 | 1.23 | 0.94 | 1.59 | 18,170 |
| Maine | −0.0257 | 0.187 | 0.77 | 0.53 | 1.13 | 8,622 |
| Maryland | 0.0040 | >.20 | 1.04 | 0.80 | 1.35 | 18,486 |
| Massachusetts | 0.0110 | >.20 | 1.12 | 0.93 | 1.35 | 36,540 |
| Michigan | 0.0094 | >.20 | 1.10 | 0.92 | 1.31 | 42,374 |
| Minnesota | −0.0097 | >.20 | 0.91 | 0.73 | 1.13 | 24,907 |
| Missouri | −0.0043 | >.20 | 0.96 | 0.78 | 1.18 | 30,206 |
| Montana | 0.0299 | >.20 | 1.35 | 0.81 | 2.25 | 4,807 |
| Nebraska | 0.0075 | >.20 | 1.08 | 0.76 | 1.54 | 10,415 |
| Nevada | −0.0033 | >.20 | 0.97 | 0.66 | 1.42 | 8,522 |
| New Hampshire | 0.0307 | 0.161 | 1.36 | 0.89 | 2.09 | 6,639 |
| New Jersey | −0.0003 | >.20 | 1.00 | 0.84 | 1.18 | 44,883 |
| New Mexico | 0.0040 | >.20 | 1.04 | 0.69 | 1.57 | 7,359 |
| New York | −0.0063 | >.20 | 0.94 | 0.83 | 1.06 | 88,115 |
| North Carolina | 0.0001 | >.20 | 1.00 | 0.83 | 1.21 | 33,307 |
| North Dakota | 0.0265 | >.20 | 1.30 | 0.75 | 2.26 | 4,103 |
| Ohio | 0.0008 | >.20 | 1.01 | 0.87 | 1.17 | 56,275 |
| Oklahoma | −0.0135 | >.20 | 0.87 | 0.66 | 1.16 | 16,321 |
| Oregon | 0.0127 | >.20 | 1.14 | 0.87 | 1.49 | 17,264 |
| Pennsylvania | −0.0079 | >.20 | 0.92 | 0.81 | 1.05 | 79,263 |
| Rhode Island | 0.0263 | >.20 | 1.30 | 0.84 | 2.02 | 6,666 |
| South Carolina | 0.0154 | >.20 | 1.17 | 0.89 | 1.53 | 16,303 |
| Tennessee | 0.0092 | >.20 | 1.10 | 0.87 | 1.38 | 23,367 |
| Texas | 0.0145 | 0.027 | 1.16 | 1.02 | 1.31 | 74,649 |
| Utah | −0.0094 | >.20 | 0.91 | 0.61 | 1.37 | 7,209 |
| Vermont | 0.0372 | >.20 | 1.45 | 0.78 | 2.69 | 3,424 |
| Virginia | 0.004 | >.20 | 1.04 | 0.84 | 1.29 | 26,766 |
| Washington | 0.0123 | >.20 | 1.13 | 0.91 | 1.41 | 26,077 |
| West Virginia | −0.0153 | >.20 | 0.86 | 0.63 | 1.18 | 12,433 |
| Wisconsin | 0.0318 | 0.005 | 1.37 | 1.10 | 1.71 | 27,311 |
| Wyoming | 0.0497 | 0.184 | 1.64 | 0.79 | 3.43 | 2,333 |

b. The Annual Change in the Log-odds of Local Stage CRC, P-value, and the 10-Year Change in the Odds of Local Stage CRC with 95% Confidence Limits by State Among Blacks

| State | Annual change (log-odds) | P-value | 10-year OR | OR lower 95% limit | OR upper 95% limit | N |
|---|---|---|---|---|---|---|
| Alabama | −0.0601 | 0.011 | 0.55 | 0.34 | 0.87 | 5,547 |
| Arizona | 0.0948 | >.20 | 2.58 | 0.57 | 11.60 | 532 |
| Arkansas | 0.0406 | >.20 | 1.50 | 0.65 | 3.45 | 1,923 |
| California | 0.0405 | 0.02 | 1.50 | 1.07 | 2.11 | 10,699 |
| Colorado | −0.0743 | >.20 | 0.48 | 0.11 | 2.00 | 583 |
| Connecticut | 0.0067 | >.20 | 1.07 | 0.42 | 2.71 | 1,397 |
| Delaware | 0.0228 | >.20 | 1.26 | 0.32 | 4.87 | 728 |
| Florida | −0.0009 | >.20 | 0.99 | 0.69 | 1.42 | 9,356 |
| Georgia | 0.0045 | >.20 | 1.05 | 0.74 | 1.48 | 10,001 |
| Illinois | 0.0018 | >.20 | 1.02 | 0.71 | 1.47 | 9,546 |
| Indiana | 0.0135 | >.20 | 1.14 | 0.56 | 2.32 | 2,466 |
| Iowa | 0.4046 | 0.002 | 57.17 | 4.60 | 711.08 | 250 |
| Kansas | 0.0777 | >.20 | 2.17 | 0.58 | 8.16 | 710 |
| Kentucky | 0.0190 | >.20 | 1.21 | 0.51 | 2.89 | 1,659 |
| Louisiana | 0.0649 | 0.003 | 1.91 | 1.25 | 2.94 | 7,031 |
| Maryland | −0.0110 | >.20 | 0.90 | 0.57 | 1.42 | 5,883 |
| Massachusetts | −0.0410 | >.20 | 0.66 | 0.25 | 1.74 | 1,350 |
| Michigan | −0.0307 | >.20 | 0.74 | 0.47 | 1.14 | 6,821 |
| Minnesota | 0.1588 | 0.075 | 4.89 | 0.85 | 28.08 | 440 |
| Missouri | 0.0439 | >.20 | 1.55 | 0.84 | 2.88 | 3,273 |
| Nebraska | −0.0610 | >.20 | 0.54 | 0.06 | 4.85 | 259 |
| Nevada | 0.0387 | >.20 | 1.47 | 0.35 | 6.11 | 608 |
| New Jersey | −0.0050 | >.20 | 0.95 | 0.58 | 1.55 | 5,621 |
| New York | 0.0414 | 0.008 | 1.51 | 1.11 | 2.05 | 13,360 |
| North Carolina | 0.0386 | 0.047 | 1.47 | 1.01 | 2.15 | 8,359 |
| Ohio | 0.0118 | >.20 | 1.13 | 0.71 | 1.79 | 5,977 |
| Oklahoma | 0.0954 | 0.082 | 2.60 | 0.89 | 7.61 | 1,097 |
| Pennsylvania | −0.0076 | >.20 | 0.93 | 0.60 | 1.44 | 6,486 |
| South Carolina | 0.0197 | >.20 | 1.22 | 0.76 | 1.95 | 5,782 |
| Tennessee | 0.0067 | >.20 | 1.07 | 0.60 | 1.90 | 3,681 |
| Texas | −0.0060 | >.20 | 0.94 | 0.67 | 1.33 | 10,732 |
| Virginia | 0.0685 | 0.002 | 1.98 | 1.27 | 3.08 | 6,571 |
| Washington | 0.0647 | >.20 | 1.91 | 0.50 | 7.31 | 681 |
| West Virginia | −0.0464 | >.20 | 0.63 | 0.10 | 4.16 | 364 |
| Wisconsin | −0.0125 | >.20 | 0.88 | 0.31 | 2.49 | 1,110 |
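
The tabulated odds ratios appear consistent with exponentiating 10 times the annual change in the log-odds. The short Python sketch below illustrates that relationship with 2 rows from the table for whites; the function name is hypothetical, and the relationship is inferred from the table headings and values rather than stated explicitly in the text.

```python
import math


def ten_year_odds_ratio(annual_change, years=10):
    """Odds ratio implied by a constant annual change in the log-odds."""
    return math.exp(years * annual_change)


# Annual changes taken from the table for whites.
print(round(ten_year_odds_ratio(-0.0481), 2))  # Alabama
print(round(ten_year_odds_ratio(0.0147), 2))   # California
```

For these 2 rows the computed values round to 0.62 and 1.16, matching the tabulated odds ratios.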