Conceived and designed the experiments: NK. Performed the experiments: MM RP. Analyzed the data: MM. Wrote the manuscript: RP MM. Compiled and organized data: RP. Served as reviewers for the systematic search: RP MM. Provided critical edits to analysis and interpretation of data: MG. Advised on article inclusion, background, analysis and interpretation of data, and contributed to revising the manuscript: TMU. Provided advice on design of work: JB. Provided professional translation for acquisition of Chinese data: YX. Conceptualized the project, defined research questions, collaborated on substantial portions of manuscript writing, and served as an arbitrator for the systematic search: NK.
Human infections with highly pathogenic avian influenza (HPAI) A (H5N1) viruses have occurred in 15 countries, with high mortality to date. Determining risk factors for morbidity and mortality from HPAI H5N1 can inform preventive and therapeutic interventions.
We included all cases of human HPAI H5N1 reported in World Health Organization Global Alert and Response updates and those identified through a systematic search of multiple databases (PubMed, Scopus, and Google Scholar), including articles in all languages. We abstracted predefined clinical and demographic predictors and mortality and used bivariate logistic regression analyses to examine the relationship of each candidate predictor with mortality. We developed and pruned a decision tree using nonparametric Classification and Regression Tree methods to create risk strata for mortality.
We identified 617 human cases of HPAI H5N1 occurring between December 1997 and April 2013. The median age of subjects was 18 years (interquartile range 6–29 years) and 54% were female. HPAI H5N1 case-fatality proportion was 59%. The final decision tree for mortality included age, country, per capita government health expenditure, and delay from symptom onset to hospitalization, with an area under the receiver operating characteristic (ROC) curve of 0.81 (95% CI: 0.76–0.86).
A model defined by four clinical and demographic predictors successfully estimated the probability of mortality from HPAI H5N1 illness. These parameters highlight the importance of early diagnosis and treatment and may enable early, targeted pharmaceutical therapy and supportive care for symptomatic patients with HPAI H5N1 virus infection.
Since 1997, human and poultry outbreaks of highly pathogenic avian influenza (HPAI) A (H5N1) have had devastating health, economic, and social impact in 15 countries in Asia, Africa, and the Middle East
Human cases of HPAI H5N1 virus infection with high mortality continue to be detected sporadically in several countries
We aimed to statistically model individuals at highest risk of mortality from HPAI H5N1 virus infection. We systematically searched for all available data on human infections with HPAI H5N1 viruses to create a database of cases reported since the initial 1997 outbreak in Hong Kong (SAR, China). To model demographic and clinical predictors of mortality in human infection, we developed a decision tree using Classification and Regression Tree (CART) methodology
We used World Health Organization (WHO) Global Alert and Response (GAR) updates and performed systematic searches of three databases (PubMed, Scopus, and Google Scholar) to compile all confirmed and possible human cases of HPAI H5N1 virus infection. We included all articles published between January 1, 1997 and April 19, 2013 with keywords “H5N1,” “human,” and “humans.” We excluded articles that described non-human cases (animal or molecular studies), did not report individual case data, did not include data on laboratory-confirmed HPAI H5N1 cases, or described asymptomatic infections (e.g., seroprevalence studies).
We defined confirmed human H5N1 cases using the World Health Organization guidelines, requiring isolation of HPAI H5N1 virus, a positive result by reverse transcription polymerase chain reaction (RT-PCR) testing of clinical specimens using H5-specific primers and probes, an elevated H5-specific antibody titer of ≥1∶80 (or equivalent using the WHO protocol), or at least a fourfold rise in H5N1 virus neutralization antibody titer in paired sera
We initially created a database of all cases published on the WHO Global Alert and Response (GAR) website, which includes only HPAI H5N1 cases reported from November 2003 to present. Although clinical laboratory data were not provided for each case, these were assumed to satisfy WHO reporting criteria. We then attempted to match all cases identified through literature sources to this database.
Two independent investigators (RP, MM) evaluated each article for inclusion; a third investigator (NK) resolved all disagreements. We included articles in all languages. A professional translator (YX) evaluated the numerous Chinese language articles. For Japanese, Russian, French, and Spanish languages, we verified inclusion with native-language speakers. For all other languages, we used PDF OCR X Community Edition for file conversion into text format (version 1.9.32, Burnaby, British Columbia) and a web-based translator for translation into English
We then extracted the predefined set of variables for each case (
Using our pre-defined
We obtained PCGEH at the average exchange rate (USD) for each country and year through the World Health Organization Global Health Observatory Data Repository
We performed all statistical analyses using R software (Version 3.0.0, Vienna, Austria) and defined statistical significance by an alpha level of 0.05. Our primary analytic goal was to develop a parsimonious decision tree model with optimal predictive ability for mortality following HPAI H5N1 virus infection. We first assessed bivariate associations between each predictor variable and mortality using logistic regression models. For continuous predictors (age, delay from symptom onset to hospitalization, and PCGEH), we visually assessed for the linearity assumption (
All continuous predictor variables had potentially nonlinear relationships with death, so we analyzed them both in continuous and in categorical form. We divided age into four categories similar to those associated with mortality from influenza A (H1N1)pdm09: 0–4 years, >4–18 years, >18–25 years, and >25 years
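The age categories above can be expressed as a small helper. This is a minimal Python sketch, not the authors' R code; the function name `age_category` is hypothetical, and treating each upper bound as inclusive is an assumption read from the ">4–18" notation.

```python
def age_category(age_years):
    """Map age in years to one of the four age categories described above.

    Assumption: upper bounds are inclusive (for example, exactly 18 years
    falls in ">4-18"). Missing ages are passed through as None.
    """
    if age_years is None:
        return None  # keep missing ages missing rather than guessing
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years <= 4:
        return "0-4"
    if age_years <= 18:
        return ">4-18"
    if age_years <= 25:
        return ">18-25"
    return ">25"


if __name__ == "__main__":
    for age in (2, 4.5, 18, 22, 40, None):
        print(age, "->", age_category(age))
```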
As we performed initial analyses, we found that several parameters were missing data. We therefore developed a decision tree using CART methods
CART procedures build a decision tree by selecting locally optimal splits that minimize “impurity” on the outcome measure of the two child nodes. Low impurity on the outcome measure indicates that the classifier performs well at separating observations with one outcome (e.g., death) from observations with another outcome (e.g., survival). For example, if sex is a strong risk factor for mortality, then mortality will be similar within each sex and different between sexes. All possible binary splits are considered for both continuous and categorical variables. The initial split is chosen as the single best classifier on the outcome measure; then, within each child node, the splitting procedure is recursively repeated until no further splits are possible. All observations, including those with missing data, are included in model-building: at each split, the impurity index is simply calculated over only those observations not missing the relevant predictor variable. To avoid over-fitting, the initial large tree is pruned based on a cost-complexity index, which captures the tradeoff between better fit and added complexity due to each additional node in the tree
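To illustrate the splitting criterion, the sketch below computes the decrease in impurity for two candidate binary splits. It is written in Python for illustration and is not the rpart code used in the analysis. The Gini index is assumed as the impurity measure (the text does not name one), the function names are hypothetical, and the counts are taken from the bivariate table of survivors and deaths, so the age cut at 4 years only approximates a split the tree might consider.

```python
def gini(deaths, survivors):
    """Gini impurity of a node with a binary outcome: 2 * p * (1 - p)."""
    n = deaths + survivors
    if n == 0:
        return 0.0
    p = deaths / n
    return 2.0 * p * (1.0 - p)


def split_impurity(left, right):
    """Size-weighted impurity of two child nodes, each given as (deaths, survivors)."""
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    return (n_left / n) * gini(*left) + (n_right / n) * gini(*right)


def impurity_decrease(left, right):
    """Parent impurity minus weighted child impurity; larger means a better split."""
    parent = (left[0] + right[0], left[1] + right[1])
    return gini(*parent) - split_impurity(left, right)


if __name__ == "__main__":
    # (deaths, survivors) among cases with the predictor observed.
    age_split = ((32, 89), (128 + 61 + 136, 64 + 23 + 66))  # 0-4 years vs older
    sex_split = ((213, 110), (147, 134))                    # female vs male
    print("age split decrease:", round(impurity_decrease(*age_split), 4))
    print("sex split decrease:", round(impurity_decrease(*sex_split), 4))
```

With these counts the age split reduces impurity more than the sex split, which is consistent with age being chosen ahead of sex. Each decrease is computed only over cases with the relevant predictor observed, mirroring the handling of missing data described above.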
We assessed model performance using a receiver operating characteristic (ROC) curve and corresponding area under the curve (AUC)
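The AUC can be read as the probability that a randomly chosen fatal case receives a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The Python sketch below computes it directly from that definition. It is an illustration only: the analysis itself used R, the function name `auc` is hypothetical, and the scores in the example are invented toy values, not study data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted risk of death for each case.
    labels: 1 for death, 0 for survival.
    Ties between a death and a survivor count as half a win.
    """
    deaths = [s for s, y in zip(scores, labels) if y == 1]
    survivors = [s for s, y in zip(scores, labels) if y == 0]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for d in deaths:
        for s in survivors:
            if d > s:
                wins += 1.0
            elif d == s:
                wins += 0.5
    return wins / (len(deaths) * len(survivors))


if __name__ == "__main__":
    # Toy predicted risks (hypothetical) and outcomes.
    toy_scores = [0.84, 0.65, 0.65, 0.33, 0.04]
    toy_labels = [1, 1, 0, 0, 0]
    print("AUC:", auc(toy_scores, toy_labels))
```

Because a classification tree assigns the same risk to every case in a terminal node, ties are common, which is why the half-credit rule matters here.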
Our search identified 3,227 potentially relevant articles published since 1997 (
*: Total number of excluded articles is less than the sum of articles excluded by each criterion because most articles failed multiple criteria.
The quality of data reporting on HPAI H5N1 cases was inconsistent. Several variables had missing data; most of these showed little variation among the non-missing values and were excluded from the analysis (
*: Variable was excluded from modeling. Each row represents one of 617 human cases; each column represents a variable abstracted from the literature. The color of each cell indicates whether the corresponding variable was missing (dark green) or observed (light green) for the given case.
Demographic and clinical characteristics of the 617 cases are presented in
| Characteristic | Literature search (n = 617) | Reported WHO HPAI H5N1 cases as of Oct 31, 2013 |
|---|---|---|
| Year | | |
| 1997 | 24 (4%) | |
| 1998 | 4 (0.6%) | |
| 2003 | 7 (1%) | 4 (0.6%) |
| 2004 | 53 (9%) | 46 (7%) |
| 2005 | 75 (12%) | 98 (15%) |
| 2006 | 116 (19%) | 115 (18%) |
| 2007 | 90 (15%) | 88 (14%) |
| 2008 | 45 (7%) | 44 (7%) |
| 2009 | 53 (9%) | 73 (11%) |
| 2010 | 48 (8%) | 48 (7%) |
| 2011 | 62 (10%) | 62 (10%) |
| 2012 | 29 (5%) | 32 (5%) |
| 2013 | 10 (2%) | 34 (5%) |
| Missing | 1 (0.2%) | |
| Country | | |
| Indonesia | 171 (28%) | 194 (30%) |
| Egypt | 169 (27%) | 173 (27%) |
| Vietnam | 96 (16%) | 125 (19%) |
| China | 50 (8%) | 45 (7%) |
| Cambodia | 36 (6%) | 44 (7%) |
| Hong Kong (SAR, China) | 29 (5%) | |
| Thailand | 27 (4%) | 25 (4%) |
| Turkey | 11 (2%) | 12 (2%) |
| Azerbaijan | 9 (1%) | 8 (1%) |
| Bangladesh | 6 (1%) | 7 (1%) |
| Pakistan | 5 (0.8%) | 3 (0.5%) |
| Iraq | 3 (0.5%) | 3 (0.5%) |
| Laos | 2 (0.3%) | 2 (0.3%) |
| Djibouti | 1 (0.2%) | 1 (0.2%) |
| Myanmar | 1 (0.2%) | 1 (0.2%) |
| Nigeria | 1 (0.2%) | 1 (0.2%) |
| PCGEH (USD), median (IQR) | 24.8 (13.7–49.0) | |
| Sex | | |
| Female | 331 (54%) | |
| Male | 283 (46%) | |
| Missing | 3 (0.5%) | |
| Age (years), median (IQR) | 18 (6–29) | |
| Missing | 9 (1%) | |
| Season of illness onset | | |
| Summer | 64 (10%) | |
| Fall | 85 (14%) | |
| Winter | 285 (46%) | |
| Spring | 177 (29%) | |
| Missing | 6 (1%) | |
| Delay from symptom onset to hospitalization (days), median (IQR) | 4 (2–6) | |
| Missing | 242 (39%) | |
| Contact with poultry | | |
| Yes | 356 (58%) | |
| Likely yes | 62 (10%) | |
| No | 20 (3%) | |
| Missing | 179 (29%) | |
| Outcome | | |
| Death | 362 (59%) | 382 (59%) |
| Survival | 245 (40%) | 262 (41%) |
| Missing | 10 (2%) | |
Data are frequency (%) or median (first quartile – third quartile).
Percentages are calculated including missing observations.
PCGEH = per capita government expenditure on health.
In bivariate logistic regression models, risk factors for mortality were longer delay to hospitalization, infection not in Egypt, older age, lower PCGEH, likely contact with poultry, female sex, and illness onset during summer months (
| Variable | Survived (n = 245) | Died (n = 362) | Odds ratio [95% CI] | p value (coefficient) | p value (model) | c-statistic |
|---|---|---|---|---|---|---|
| Delay from symptom onset to hospitalization (days) | 3 (1–5) | 5 (3–6) | 1.31 [1.20, 1.45] | <0.0001 | <0.0001 | 0.70 |
| Country | | | | | <0.0001 | 0.69 |
| Egypt | 108 (64%) | 61 (36%) | Reference | | | |
| Indonesia | 30 (18%) | 141 (82%) | 8.32 [5.08, 13.95] | <0.0001 | | |
| Other | 107 (40%) | 160 (60%) | 2.65 [1.78, 3.96] | <0.0001 | | |
| Age category (years) | | | | | <0.0001 | 0.65 |
| 0–4 | 89 (74%) | 32 (26%) | 0.14 [0.07, 0.25] | <0.0001 | | |
| >4–18 | 64 (33%) | 128 (67%) | 0.75 [0.42, 1.32] | 0.33 | | |
| >18–25 | 23 (27%) | 61 (73%) | Reference | | | |
| >25 | 66 (33%) | 136 (67%) | 0.78 [0.44, 1.35] | 0.38 | | |
| Age (years, continuous) | 9.5 (3–27) | 20 (12–30) | 1.03 [1.02, 1.04] | <0.0001 | <0.0001 | 0.64 |
| PCGEH category (USD) | | | | | <0.0001 | 0.63 |
| <13.7 | 62 (41%) | 89 (59%) | Reference | | | |
| >13.7–24.8 | 31 (20%) | 122 (80%) | 2.74 [1.66, 4.61] | 0.0001 | | |
| >24.8–49.0 | 84 (52%) | 77 (48%) | 0.64 [0.41, 0.998] | 0.05 | | |
| >49.0 | 68 (48%) | 74 (52%) | 0.76 [0.48, 1.20] | 0.24 | | |
| Contact with poultry | | | | | <0.0001 | 0.59 |
| No | 10 (53%) | 9 (47%) | Reference | | | |
| Yes | 174 (49%) | 182 (51%) | 1.16 [0.46, 2.99] | 0.75 | | |
| Likely yes | 8 (13%) | 54 (87%) | 7.50 [2.38, 25.15] | 0.0007 | | |
| PCGEH (USD, continuous) | 35.1 (13.7–49.1) | 23.2 (16.4–38.0) | 0.994 [0.989, 0.998] | 0.007 | 0.004 | 0.59 |
| Sex | | | | | 0.0007 | 0.57 |
| Male | 134 (55%) | 147 (45%) | Reference | | | |
| Female | 110 (41%) | 213 (59%) | 1.77 [1.27, 2.45] | 0.0007 | | |
| Season of illness onset | | | | | 0.02 | 0.57 |
| Summer | 17 (27%) | 47 (73%) | Reference | | | |
| Fall | 37 (44%) | 48 (56%) | 0.47 [0.23, 0.94] | 0.03 | | |
| Winter | 106 (38%) | 174 (62%) | 0.59 [0.32, 1.07] | 0.09 | | |
| Spring | 84 (47%) | 93 (53%) | 0.40 [0.21, 0.74] | 0.004 | | |
Data are presented as medians (IQR) or frequencies (%). Variables are ordered roughly by statistical significance. Row percentages are calculated excluding missing data. P-values were calculated using Wald's z-test for logistic regression coefficients and the likelihood-ratio test for regression models.
PCGEH = per capita government expenditure on health.
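For a single categorical predictor, the odds ratio from a bivariate logistic regression equals the cross-product ratio of the corresponding counts, so the point estimates for country can be checked by hand. The Python sketch below does this using the counts reported for Egypt (reference), Indonesia, and other countries. The function names are hypothetical, and the interval shown is a simple Wald interval on the log scale, which is an assumption; it need not match the published interval exactly if that was computed by a different method.

```python
import math


def odds_ratio(died_group, survived_group, died_ref, survived_ref):
    """Odds of death in a group divided by odds of death in the reference group."""
    return (died_group / survived_group) / (died_ref / survived_ref)


def wald_ci(died_group, survived_group, died_ref, survived_ref, z=1.96):
    """Approximate 95% Wald interval for the odds ratio (log-scale standard error)."""
    log_or = math.log(odds_ratio(died_group, survived_group, died_ref, survived_ref))
    se = math.sqrt(1 / died_group + 1 / survived_group + 1 / died_ref + 1 / survived_ref)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # Counts from the bivariate table: (died, survived).
    egypt = (61, 108)       # reference category
    indonesia = (141, 30)
    other = (160, 107)
    print("Indonesia vs Egypt:", round(odds_ratio(*indonesia, *egypt), 2))
    print("Other vs Egypt:", round(odds_ratio(*other, *egypt), 2))
    low, high = wald_ci(*indonesia, *egypt)
    print("Wald 95% CI, Indonesia vs Egypt:", round(low, 2), round(high, 2))
```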
The decision tree, trained on all 607 cases with observed mortality, evaluated seven candidate predictor variables: age, PCGEH, country group, delay from symptom onset to hospitalization, sex, contact with poultry, and season. The variables are listed here in descending order of variable importance in the decision tree, a measure based on split quality. The first four were used as splitting variables in the final, pruned tree (
Model was trained on all n = 607 cases with observed mortality. The following variables were candidates for inclusion: age, PCGEH, country, delay to hospitalization, sex, season, contact with poultry.
The first node splits on age, with higher mortality in patients at least 4.5 years of age. In the second level of the tree, young children (<4.5 years) in high-PCGEH settings (≥32.65 USD) are predicted to survive, with the lowest mortality (4%) of all groups. Young children in low-PCGEH settings (<32.65 USD) are predicted to die (57% mortality).
Older patients (≥4.5 years) are further partitioned by country. Older cases in Indonesia are predicted to die, with the highest mortality (84%) of all groups. Older cases not in Indonesia are classified by one final split based on delay to hospitalization: cases with a short delay to hospitalization (<2.5 days) are predicted to survive (33% mortality), while cases hospitalized later after illness onset (≥2.5 days) are predicted to die (65% mortality).
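The splits described above can be written as a short function. This Python sketch transcribes the thresholds and terminal-node mortality proportions given in the text; the function name and argument names are hypothetical, and it is not the authors' fitted model object. It also assumes every value needed along a case's path is observed, so it raises an error for a missing value instead of reproducing any missing-data handling of the fitted tree.

```python
def predict_mortality(age_years, pcgeh_usd=None, in_indonesia=None, delay_days=None):
    """Return (predicted outcome, observed node mortality) for one case.

    Thresholds and node mortality proportions are taken from the tree
    description in the text. Values not needed on a case's path may be None.
    """
    def require(value, name):
        if value is None:
            raise ValueError(f"{name} is needed for this branch of the tree")
        return value

    if require(age_years, "age_years") < 4.5:
        # Young children: split on per capita government health expenditure.
        if require(pcgeh_usd, "pcgeh_usd") >= 32.65:
            return ("survive", 0.04)
        return ("die", 0.57)
    # Older patients: split on country, then on delay to hospitalization.
    if require(in_indonesia, "in_indonesia"):
        return ("die", 0.84)
    if require(delay_days, "delay_days") < 2.5:
        return ("survive", 0.33)
    return ("die", 0.65)


if __name__ == "__main__":
    print(predict_mortality(3, pcgeh_usd=40.0))
    print(predict_mortality(20, in_indonesia=False, delay_days=4))
```

Only the variables on a case's own path are consulted, so a young child can be classified without knowing the delay to hospitalization.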
We assessed the decision tree's performance with an ROC curve and corresponding AUC (
ROC curve represents performance of CART model on all cases without missing observations on any model variables (n = 301). Error bars represent bootstrapped 95% confidence intervals for sensitivity-specificity thresholds.
We also performed complete-case logistic regression and multiply-imputed logistic regression including all predictor variables that were candidates for inclusion in the CART model. All models yielded similar results (
We investigated factors associated with increased mortality following HPAI H5N1 virus infection to guide public health messages, resource distribution, and triage of infected individuals. We conducted a systematic search of all available literature describing human cases of HPAI H5N1 virus infection and developed a prognostic decision tree. We find that age, health expenditure, delay from symptom onset to hospitalization, and country are significant predictors of mortality. Additionally, we find that data reporting is incomplete and poorly standardized.
Our finding that HPAI H5N1 mortality is lowest in young children aged 0 to 4.5 years differs from the established pattern for seasonal influenza, in which mortality is high in infants and young children
Not surprisingly, we found that lower national healthcare expenditure is associated with higher mortality from HPAI H5N1. This relationship is common to many diseases at the country level. We were unable to delineate the complete mechanisms responsible for this finding, but healthcare quality may be a mediator. Maternal mortality, generally considered a sensitive indicator of overall quality and accessibility of healthcare, varies widely across the affected countries in our study. For example, maternal mortality per 100,000 live births is 55 in China, 95 in Vietnam, and 440 in Cambodia
We also find that a longer delay from HPAI H5N1 illness onset to hospitalization is associated with higher mortality, a finding previously reported in smaller, geographically restricted datasets
Consistent with WHO cumulative case counts and previously published analyses
We were surprised to find inconsistent case reporting in the literature. While the WHO maintains summary data on worldwide cases of HPAI H5N1, this aggregation provides minimal individual-level demographic and clinical characteristics
Reporting bias may exist; practitioners may consider a diagnosis of HPAI H5N1 only for more severe cases, limiting reports of subclinical or asymptomatic infections in the medical literature. Controversy exists over the extent to which such a bias may inflate HPAI H5N1 mortality estimates. A meta-analysis reported an average seroprevalence of HPAI H5N1 virus antibodies of 1 to 2%, potentially translating into a substantial number of unreported cases worldwide
Potential methodological limitations include variability in surveillance and clinical care and a lack of data on antiviral treatment and on the time from illness onset to the start of antiviral treatment. Additionally, time from onset to hospitalization may not equal time to the start of oseltamivir treatment. We noted a high proportion of unreported variables. Complete-case analysis can cause bias and imprecision in regression coefficient estimates, particularly if data are not missing at random. Nevertheless, we performed three analytic techniques with fundamentally different approaches to handling missing data, and they all yielded similar results. Like all classification trees, our model demonstrates some statistical instability
From a policy standpoint, improved recognition of disease (albeit rare) and early delivery of healthcare, especially antiviral treatment, could result in reduced hospitalization costs, decreased morbidity, and lower mortality from HPAI H5N1 virus infection. To facilitate analyses, the idiosyncratic case reporting process our study detected could be greatly improved by widespread adoption of a standardized data collection form, such as an online database. Currently, the WHO receives case report data from officials at Ministries of Health, which collect case report data from local hospitals. A convenient and efficient mode of data collection may enable improved communication at both reporting junctures.
We have established a predictive classification tree model to estimate human HPAI H5N1 mortality based on readily available clinical and demographic predictors: age, delay from symptom onset to hospitalization, country, and PCGEH. Our resulting publicly accessible online algorithm (
We thank Charlotte Chae for her invaluable assistance in creating graphics. We are also grateful to the many authors whose open-source R packages were indispensable: John Fox and Sanford Weisberg (package “car”); James Honaker, Gary King, and Matthew Blackwell (package “Amelia”); Frank Harrell Jr. (packages “Hmisc” and “rms”); Terry Therneau, Beth Atkinson, and Brian Ripley (package “rpart”); Xavier Robin, Natacha Turck, Alexandre Hainard, Natalia Tiberti, Frédérique Lisacek, Jean-Charles Sanchez, and Markus Müller (package “pROC”); Stef van Buuren and Karin Groothuis-Oudshoorn (package “mice”); and Gregory R. Warnes (package “gplots”).