An often overlooked problem in building statistical models is endogeneity, a term from econometric analysis describing the situation in which the value of one independent variable depends on the values of other predictor variables. Because of this endogeneity, significant correlation can exist between the unobserved factors contributing to both the endogenous independent variable and the dependent variable, which results in biased estimators (incorrect regression coefficients) (_{1} is reduced by a factor of (1 − r^{2}_{(1|2,3,…)}), where r_{(1|2,3,…)} is defined as the multiple correlation coefficient for the model X_{1} = f(X_{2}, X_{3}, …), and all X_{i} are independent variables in the larger model (

The results of this study clearly show that the presence of bloody diarrhea is an endogenous variable in the model showing predictors of hemolytic uremic syndrome, in that the diarrhea is shown to be predicted by, and therefore strongly correlated with, several other variables used to predict hemolytic uremic syndrome. Similarly, Shiga toxin 1 and 2 (

This flaw is a particular problem with studies that use a conditional stepwise technique for including or excluding variables. The authors note that they excluded variables from the final model if those variables did not reach significance at an α level of 0.05 in initial models. Given the inefficiencies due to the endogeneity of bloody diarrhea, as well as those that may result from other collinearities, significant predictors were likely excluded from the study, although this cannot be confirmed from the data presented.
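The efficiency loss described above, in which an estimator is penalized by a factor of 1 − r^{2} when its predictor is largely explained by the other predictors, can be illustrated with a minimal sketch. The data below are simulated and continuous, not the study's data, and the function name `tolerance` and all coefficients are assumptions chosen only for illustration.

```python
import numpy as np

def tolerance(X, j):
    """Return 1 - R^2 from regressing column j of X on the remaining columns."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])  # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)    # ordinary least squares fit
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return ss_res / ss_tot                          # equals 1 - R^2

# Simulated predictors (hypothetical): x1 is partly determined by x2 and x3.
rng = np.random.default_rng(0)
n = 500
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.8 * x2 + 0.5 * x3 + rng.normal(scale=0.5, size=n)

X = np.column_stack([x1, x2, x3])
tol = tolerance(X, 0)   # 1 - r^2 for x1 given x2 and x3
vif = 1.0 / tol         # variance inflation factor for the x1 coefficient

print(f"tolerance (1 - r^2) = {tol:.3f}, variance inflation factor = {vif:.2f}")
```

A tolerance close to 0 indicates that the predictor is nearly determined by the others, so its coefficient is estimated imprecisely and may fail a fixed significance threshold in a stepwise procedure even when it is a genuine predictor.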

The problems associated with the endogeneity of bloody diarrhea can be overcome by a number of approaches. For example, the simultaneous equations approach, such as that outlined by Greene (

The underlying problem in the study is the theoretical specification of the model, in which genotypes, strains, and symptoms are mixed, despite reasonable expectations that differences at one level may predict differences at another. For example, the authors’ data demonstrate that all O157 strains contain the _{2}

Model for determining virulence factors for hemolytic uremic syndrome.

stx_{2} gene, age of the patient, and occurrence of bloody diarrhea. The critique relates to the fact that bloody diarrhea and stx_{2} are not independent, since we showed that stx_{2} was strongly associated with progression to HUS (odds ratio [OR] = 18.9) and also weakly associated with development of bloody diarrhea (OR = 2.5) (

A) Representation of the original model (

A second line of critique of our methods apparently develops from the idea that virulence factors determine the serogroup. This idea, however, is a biological misconception. In fact, virulence genes and serogroup are independent at the genetic level, and an important point of our article is that HUS is determined by the virulence gene composition of the strain rather than the serogroup.

Regardless of the status of the bloody diarrhea variable, excluding it from the model does not change the conclusions of the article. A revised model contains only the significant variables age and stx_{2} (

| Determinant | No. of patients | No. (%) with HUS | Original model, OR (95% CI) | New model, OR (95% CI) |
|---|---|---|---|---|
| Negative | 111 | 0 (0.0) | | |
| Positive | 232 | 21 (9.1) | NI | NI |
| stx_{2} | | | | |
| Negative | 159 | 1 (0.6) | 1 | 1 |
| Positive | 184 | 20 (10.9) | 18.9 (2.4–146) | 24.6 (3.2–187) |
| Age | | | | |
| | 178 | 3 (1.7) | 1 | 1 |
| | 165 | 18 (10.9) | 11.4 (3.2–41.3) | 9.7 (2.7–34.1) |
| Bloody diarrhea | | | | |
| No | 218 | 6 (2.8) | | |
| Yes | 125 | 15 (12.0) | 4.5 (1.6–12.7) | EX |
| O157 | | | | |
| No | 262 | 10 (3.8) | | |
| Yes | 81 | 11 (13.6) | NS | NS |

*HUS, hemolytic uremic syndrome; STEC, Shiga toxin–producing
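As a rough check on the table, crude (unadjusted) odds ratios can be computed directly from the patient counts. The sketch below uses the standard 2×2 odds ratio with a Woolf (log-based) 95% confidence interval. It is not the authors' multivariable model, so the values are expected to differ from the adjusted ORs reported above; the function name `crude_odds_ratio` is an assumption introduced for illustration, and the row with OR 18.9 is taken to be the Shiga toxin 2 gene.

```python
import math

def crude_odds_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Unadjusted odds ratio with a Woolf 95% CI from 2x2 counts."""
    a = exposed_cases                      # exposed, with HUS
    b = exposed_total - exposed_cases      # exposed, without HUS
    c = unexposed_cases                    # unexposed, with HUS
    d = unexposed_total - unexposed_cases  # unexposed, without HUS
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, lower, upper

# Counts copied from the table: (cases, total) for exposed, then unexposed.
rows = {
    "Shiga toxin 2 gene": (20, 184, 1, 159),
    "Bloody diarrhea": (15, 125, 6, 218),
    "O157": (11, 81, 10, 262),
}

results = {name: crude_odds_ratio(*counts) for name, counts in rows.items()}
for name, (odds_ratio, lower, upper) in results.items():
    print(f"{name}: crude OR = {odds_ratio:.2f} (95% CI {lower:.1f}-{upper:.1f})")
```

Because these estimates ignore the other determinants, a variable can show a raised crude OR and still be nonsignificant once the model adjusts for correlated predictors, which is the situation the correspondence debates.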