^{*}

Conceived and designed the experiments: TDQ BMG ERU. Performed the experiments: TDQ BMG. Analyzed the data: TDQ BMG. Contributed reagents/materials/analysis tools: TDQ BMG ERU. Wrote the paper: TDQ BMG ERU.

Detection of multiple human papillomavirus (HPV) types in the genital tract is common. Associations among HPV types may impact HPV vaccination modeling and type replacement. The objectives were to determine the distribution of concurrent HPV type infections in cervicovaginal samples and examine type-specific associations. We analyzed HPV genotyping results from 32,245 cervicovaginal specimens collected from women aged 11 to 83 years in the United States from 2001 through 2011. Statistical power was enhanced by combining 6 separate studies. Expected concurrent infection frequencies from a series of permutation models, each with increasing fidelity to the real data, were compared with the observed data. Statistics were computed based on the distributional properties of the randomized data. Concurrent detection occurred more than expected with 0 or ≥3 HPV types and less than expected with 1 and 2 types. Some women bear a disproportionate burden of the HPV type prevalence. Type associations were observed that exceeded multiple hypothesis corrected significance. Multiple HPV types were detected more frequently than expected by chance and associations among particular HPV types were detected. However vaccine-targeted types were not specifically affected, supporting the expectation that current bivalent/quadrivalent HPV vaccination will not result in type replacement with other high-risk types.

Genital HPV is the most common sexually transmitted infection

The current HPV vaccines target the two HR-HPV types (HPVs 16 and 18) associated with 70% of cervical cancers. If, however, types display positive associations to inflate infection rates, broad HPV vaccination coverage may lead to reduction of HPV types not targeted by the vaccine, i.e. “cross-protection” not based on cross-reaction immunity but as a result of reduced fitness of positively associated types. Alternatively, negative associations among types may lead to type replacement of non-vaccine types as competing types targeted by vaccines are reduced

Associations among multiple HPV types have been examined in prior studies, but the conclusions are contradictory

The aim of the present study is to address overall and type-specific HPV associations by taking advantage of a large laboratory database of HPV results obtained using the same validated HPV typing assay. Aggregating multiple study datasets provides greater statistical power in analyzing potential HPV type combinations. We employed a permutation methodology to test first a complete null model of random type association, and then gradually less naïve models with preserved higher orders of data structure

The dataset includes anonymized HPV typing results from 32,245 cervicovaginal samples from six studies of women aged 11 to 83 years conducted between 2001 through 2011 (

Study | 1 | 2 | 3 | 4 | 5 | 6 |

General | Colposcopy | Colposcopy | General | General | General | |

Clinical Specimen Transport Medium™ | Clinical ThinPrep® | Clinical ThinPrep® | Clinical ThinPrep® | Clinical ThinPrep® | Self-Collected Swab | |

Cervical | Cervical | Cervical | Cervical | Cervical | Cervical-Vaginal |

HPV Types per Person | Number of People | k Strata Totals | |||||

6639 | 503 | 175 | 637 | 6547 | 4015 | 18516 | |

1664 | 598 | 592 | 232 | 1901 | 1384 | 6371 | |

965 | 310 | 531 | 110 | 892 | 768 | 3576 | |

477 | 174 | 373 | 74 | 410 | 378 | 1886 | |

225 | 74 | 205 | 37 | 206 | 199 | 946 | |

85 | 23 | 122 | 15 | 107 | 118 | 470 | |

51 | 12 | 49 | 7 | 40 | 71 | 230 | |

21 | 7 | 44 | 4 | 21 | 41 | 138 | |

11 | 3 | 13 | 1 | 8 | 17 | 53 | |

4 | 0 | 10 | 1 | 6 | 11 | 32 | |

3 | 1 | 3 | 0 | 2 | 8 | 17 | |

1 | 0 | 2 | 0 | 0 | 0 | 3 | |

0 | 1 | 1 | 0 | 0 | 1 | 3 | |

0 | 0 | 1 | 0 | 0 | 1 | 2 | |

1 | 0 | 1 | 0 | 0 | 0 | 2 | |

10147 | 1706 | 2122 | 1118 | 10140 | 7012 | 32245 |

All samples were extracted to yield DNA or total nucleic acids (DNA and RNA), and 0.5–1% of the total sample extract was tested for HPV. HPV typing was performed using the Linear Array HPV Genotyping Test (LA, Roche Diagnostics, Indianapolis, IN) according to the manufacturer's protocol. The LA detects 37 HPV types (6, 11, 16, 18, 26, 31, 33, 35, 39, 40, 42, 45, 51, XR(52), 53, 54, 55, 56, 58, 59, 61, 62, 64, 66, 67, 68, 69, 70, 71, 72, 73, 81, 82, 83, 84, 89, IS39). HPV types 33, 35, and 58 have type-specific probes, but HPV 52's probe (XR) also reacts with HPV 33, 35, and 58. Therefore when both a HPV 33, 35, or 58 probe and the XR probe is positive, the LA result is equivocal as to whether HPV 52 is being detected or the probe is just reacting with HPV 33, 35, or 58. Samples with equivocal results for HPV 52 were tested with a quantitative type-specific assay for HPV 52

In order to test the type specificity of the LA for HPVs 56 and 66, high copy numbers of their L1 genes were amplified from plasmids. The primers were designed to flank the LA target region (

To determine the expected number of concurrent infections, the matrix of observed results was randomized in five different ways, depending on the characteristics of the observed data being controlled, using R (

The observed data (

For the subsequent matrix randomizations, the ‘permatswap’ function created a series of matrix permutation models, each with increasing fidelity to the real data, where some of the higher order data structure is preserved during the randomization ^{7} for the ‘burnin’ parameter as this value maximized Bray-Curtis dissimilarity values, indicating effective matrix randomization

Counting occurrences of type combinations, whether in the observed data or in the randomized matrices for a given model, was done in Perl (Active Perl 5.8; ActiveState, Vancouver, BC). A key feature of the counting, whether in observed or permuted data, was that specific type combinations were counted whether or not additional types were present. The results for the randomized matrices were then matched to type combinations observed in the real data.

Results for the Perl scripts above were then read into a Mathematica program to assess statistical significance (Wolfram Research, Champaign IL). First, the expected counts of any given type combination in the permutation models were fit to a Poisson probability density function (pdf; ^{−4}, 10^{−6}, 10^{−8}, etc. were calculated for both the right (observed > expected) and left (observed < expected) tails of the Poisson distribution. Observed counts were then plotted against expected counts for a given permutation model so that type combinations falling outside the boundary value lines could be easily seen. Besides the hypothetical p-value boundaries, actual p-values and Z-scores were computed for each specific type combination using the Poisson fit mean values from the permutation runs. Because the type combination frequencies observed in the 1,000 randomized matrices precisely fit a known and well characterized distribution, p-values <0.001 can be reliably estimated.

Due to the number of HPV type combinations analyzed, the Benjamini and Hochberg false-discovery rate (fdr) was calculated using ‘p.adjust’ in R to control for spurious results

Further details on calculating the statistics are in

All code (R, Perl, or Mathematica) is available upon request.

Of 32,245 subjects, 13,729 were positive for ≥1 HPV type, and 7,358 were positive for multiple HPV types (

^{8} (3.22×10^{4} people/iteration × 1×10^{4} iterations) are expected to have this many concurrent HPV types detected, whereas the observed data contains a continuous distribution of subjects with up to 14 HPV types detected at the same time. Furthermore, infection with 0 and 3 through 14 HPV types occurred more than expected by the null model. In contrast, infection with 1 and 2 HPV types occurred less than expected.

Dots representing observed number of concurrent infections are plotted against boxplots of expected distribution from 10,000 randomized matrices. Boxplots indicate the minimums and maximums, 25% and 75% quartiles, and medians of randomized matrices. (

Because such differences in general trends between the observed data and null model would bias the analysis of specific HPV type associations, these trends were preserved in subsequent randomizations to create continuously less naïve models. This randomization resulted in the distribution of each of the 10,000 permutation matrices exactly matching the observed results (non-strata model;

The permutation models were used to determine which specific HPV type combinations are truly unexpected and which combinations are artifacts of study factors structuring the dataset.

The graphs depict the non-strata (^{−4} and 10^{−8}. The mean expected counts on the x-axis represents the average number of observations from 1,000 randomized matrices; the y-axis shows the actual observed counts for the type combinations. Single observations of 4 HPV type combinations are excluded.

A Observed/Expected (Z-score) for Models. | ||||||

HPV Types | Species | Observations | Non-Strata | Study Strata | k Strata | Study-k Strata |

α6, α6 | 164 | 2.2 (10.4) | 2.1 (9.9) | 1.8 (8.0) | 1.9 (8.1) | |

α5, α5 | 65 | 2.0 (5.5) | 2.0 (5.6) | 1.8 (4.9) | 1.8 (4.8) | |

α5, α5 | 20 | 3.0 (5.2) | 2.9 (5.0) | 2.7 (4.7) | 2.8 (4.8) | |

α9, α9 | 38 | 2.0 (4.5) | 1.7 (3.2) | 2.2 (5.0) | 2.0 (4.3) | |

α9, α9 | 129 | 1.5 (4.8) | 1.4 (4.0) | 1.5 (5.0) | 1.4 (4.0) | |

α5, α5 | 18 | 2.9 (4.7) | 2.7 (4.4) | 2.2 (3.4) | 2.2 (3.4) | |

α9, α7 | 194 | 1.3 (3.7) | 1.3 (3.3) | |||

α9, α7 | 75 | 1.5 (3.3) | 1.5 (3.3) | |||

α3, α3 | 110 | 1.4 (3.7) | 1.4 (3.2) | 1.4 (3.5) | 1.4 (3.2) | |

α10, α3 | 82 | 1.6 (4.2) | 1.5 (3.6) | 1.5 (3.6) | 1.4 (3.1) | |

α3, α3 | 57 | 1.6 (3.5) | 1.6 (3.5) | 1.5 (3.0) | ||

α3, α3 | 84 | 1.5 (3.8) | 1.5 (4.0) | |||

α15, α3 | 11 | 2.6 (3.2) | 3.2 (4.0) | |||

α3, α3 | 29 | 1.7 (2.8) | 1.9 (3.6) | |||

α3, α3 | 133 | 1.3 (3.4) | 1.4 (3.5) | |||

α3, α7 | 63 | 1.5 (3.4) | ||||

α3, α7 | 54 | 1.6 (3.3) | ||||

α3, α3 | 90 | 1.4 (3.3) | 1.4 (3.0) | |||

α7, α3 | 15 | 2.1 (2.9) | ||||

α15, α3 | 22 | 1.9 (2.9) | 1.8 (2.8) | |||

α9, α3 | 64 | 1.4 (2.8) | ||||

α3, α3 | 35 | 1.6 (2.8) | ||||

α3, α15 | 24 | 1.7 (2.8) | ||||

α9, α7 | 224 | 1.2 (2.7) | ||||

α6, α3 | 137 | 1.4 (3.7) | 1.3 (3.5) | |||

α10, α3 | 99 | 1.4 (3.0) | 1.4 (3.1) | |||

α5, α6 | 161 | 1.3 (3.3) | 1.3 (3.0) | |||

α9, α3 | 67 | 1.5 (3.1) | ||||

α13, α3 | 108 | 1.3 (3.0) | ||||

α10, α7 | 21 | 1.9 (3.0) | ||||

α1, α6 | 78 | 1.4 (2.9) | ||||

α5, α9 | 13 | 2.2 (2.9) | ||||

α1, α11 | 55 | 1.5 (2.8) |

B Observed/Expected (Z-score) for Models | ||||||

HPV Types | Species | Observations | Non-Strata | Study Strata | k Strata | Study-k Strata |

α9, α9 | 109 | 0.7 (−3.1) | 0.7 (−3.9) | |||

α9, α3 | 23 | 0.4 (−4.1) | 0.5 (−3.7) | |||

α9, α3 | 63 | 0.6 (−3.8) | 0.7 (−3.3) | |||

α9, α7 | 40 | 0.6 (−3.3) | 0.6 (−3.3) | |||

α6, α7 | 23 | 0.5 (−3.1) | 0.5 (−3.1) | |||

α6, α3 | 10 | 0.4 (−3.0) | 0.4 (−3.1) | |||

α9, α3 | 175 | 0.7 (−4.0) | 0.8 (−2.9) | |||

α9, α7 | 58 | 0.6 (−3.5) | 0.7 (−2.9) | |||

α7, α3 | 44 | 0.6 (−3.4) | 0.7 (−2.9) | |||

α5, α3 | 14 | 0.5 (−2.9) | 0.5 (−2.9) | |||

α9, α3 | 42 | 0.6 (−2.9) | 0.6 (−2.8) | |||

α9, α5 | 2 | 0.2 (−2.8) | 0.2 (−2.8) | |||

α6, α3 | 45 | 0.7 (−2.8) | ||||

α9, α3 | 113 | 0.7 (−3.2) | ||||

α9, α3 | 190 | 0.8 (−3.0) | ||||

α9, α3 | 14 | 0.5 (−2.9) | ||||

α9, α10 | 72 | 0.7 (−2.9) | ||||

α9, α7 | 21 | 0.5 (−2.9) | ||||

α9, α15 | 19 | 0.5 (−2.9) | ||||

α7, α3 | 83 | 0.7 (−2.8) | ||||

α7, α3 | 33 | 0.6 (−2.7) | ||||

α7, α3 | 84 | 0.7 (−2.7) | ||||

α7, α7 | 26 | 0.6 (−2.7) | ||||

α9, α3 | 157 | 0.8 (−2.7) |

≤0.05 for (

*HR-HPV types are

The data accommodate a more detailed scrutiny of multiple concurrent infections. The significant results for 2, 3, and 4 type combinations are interdependent (

Because HPV 56 and 66 are strongly associated with each other and are closely related genetically, it is important to exclude artifacts of the assay that could account for the association. We tested the specificity of the genotyping assay for these two types using high copy numbers of type-specific templates. PCR amplified HPV DNA only hybridized to the intended probe band without cross hybridization (

Observations from 6 different HPV genotyping studies were combined to address whether concurrent HPV infections vary from random assortment. Observations were compared with both a null and progressively less naïve models as row and column sums, particular study, and concurrent infection burden variables in the dataset were incorporated. The results of our null model are consistent with other reports of multiple infections being detected more than expected

Other groups have noted excess concurrent HPV type infections compared to null models. The null hypotheses included maximum-likelihood based on an assumed Poisson distribution

Some authors have controlled for risk factors of multiple HPV infections while analyzing type-specific interactions. People infected with HPV are more likely to acquire additional HPV types than are uninfected persons

This current analysis takes a different approach to test HPV type associations. The matrix permutation models presented in this manuscript are based on recommendations as discussed by Schwab, et al.

Another limitation of the previous work is the restriction of each analysis to single study datasets. Prior studies have relatively small specimen numbers compared to the number of possible HPV combinations. Often, HPV testing was performed for a small subset of genital HPV types. Even with multiple HPV types assayed, because of small sample size, statistical analysis was limited to a few HPV types. In the largest previous study with 13,961 women, only 1,451 were HPV positive

The approach we used gives greater confidence in interpreting the biologic significance of the identified type associations. Pairs of HPV types detected more frequently than expected were often from the same species

Fewer and less significant negative associations among HPV types were identified. No pairs of HPV types passed the statistical cutoffs for the stringent k strata and study-k strata models, and no negative associations for any model were found for combinations of 3 and 4 types. HPV 16 was frequently included as one of the types of pairs observed less frequently than expected in the non-strata and study strata models. The types observed less frequently than expected with HPV 16 are candidates for type replacement following reduction of HPV 16 by vaccination. The only HR type in this group was HPV 58; most negative associations with HPV 16 were with LR types. A nonavalent HPV vaccine in clinical trials is formulated to target HPV 6, 11, 16, 18, 31, 33, 45, 52 and 58, and thus HPV 58 would be targeted and at reduced risk of replacing HPV 16. Also, the study strata model indicates a possible negative association between HR types 58 and 59. However no negative associations were significant against the least naïve models, reducing the probability of type replacement.

A strength of the implemented method is that distribution functions were tightly fit to the permutated data to accurately calculate p-values. Typically for permutated data, p-values can only be calculated down to the inverse of the number of permutations. If a HPV combination does not occur even once at of 1000 permutations, the p-value would be <0.001. However because a distribution function can be fit to the permutated data, we can calculate p-values below the permutation limit. Thus p-values down to 10^{−8} and smaller can be calculated without needing 10^{8} permutations. This allows accurate significance testing while conserving computer resources.

Limitations exist with the current analysis. The contribution of each study to the analysis is proportional to the sample size of each study. Thus larger studies contribute more to the final results. A weighting factor for each study stratum would adjust this. We suggest that imposing the structure of the observed data in terms of column and row sums, and stratification, controls for the risk factors of multiple HPV types, e.g. risk factors for increasing HPV exposure. However without these variables in the dataset, this proposition cannot be tested directly.

The methodology presented can be applied to other data. Indeed, the current analysis is an expansion of a previous application. This randomization within strata was first used to test species' associations on archipelago islands

We have presented a novel approach to HPV concurrent infection analysis, which has allowed us to obtain greater statistical power to address the question of HPV type association as well as provide new methods to analyze aggregate datasets.

^{8} = 256 counts are expected in the given permutation model, then roughly 340 observations in the real data are needed to satisfy p = 0.0001. If only 16 counts are expected, the needed number observed counts increases (as a ratio relative to expected) to about 40 for the same level of significance. Rarely observed type combinations with very rare expected values in the permutation models (i.e. candidates for over observed combinations, not under observed) were excluded from the analysis of significance. This is because, for example, it is difficult to determine the true significance or impact of a 4 type combination that is seen in the observed data only once, even if it was expected <0.001 times (for a very large observed/expected ratio) in a database of 30,000 specimens.

(TIF)

Click here for additional data file.

(TIF)

Click here for additional data file.

(TIF)

Click here for additional data file.

(DOC)

Click here for additional data file.

(DOC)

Click here for additional data file.

(DOC)

Click here for additional data file.

(DOC)

Click here for additional data file.

(DOC)

Click here for additional data file.

(DOC)

Click here for additional data file.

(TXT)

Click here for additional data file.

Thank you to Juanita Onyekwuluje, Sonya Patel, and Martin Steinau for assisting with the HPV 56 and 66 Linear Array Assay. Thank you to all those who worked on the original studies without which the data would not be available for this analysis.