Resequencing microarrays rapidly identify influenza viruses.
Identification of genetic variations of influenza viruses is essential for epidemic and pandemic outbreak surveillance and determination of vaccine strain selection. In this study, we combined a random amplification strategy with high-density resequencing microarray technology to demonstrate simultaneous detection and sequence-based typing of 25 geographically distributed human influenza virus strains collected in 2004 and 2005. In addition to identification, this method provided primary sequence information, which suggested that distinct lineages of influenza viruses co-circulated during the 2004–2005 season, and simultaneously identified and typed all component strains of the trivalent FluMist intranasal vaccine. The results demonstrate a novel, timely, and unbiased method for the molecular epidemiologic surveillance of influenza viruses.
Influenza viruses are a major cause of respiratory infections in humans and result in substantial illness, death, and economic problems throughout the world. Along with regular seasonal epidemic outbreaks caused by common circulating strains, novel strains emerge sporadically because of reassortment in the segmented influenza RNA genome and have resulted in devastating influenza pandemics (
Automated Sanger/electrophoresis-based sequencing technology has been used as the standard platform for DNA and genome sequencing. Although conventional sequencing produces accurate data, the requirement for knowledge of template sequences and the inability to quickly process multiple targets hinder its practical application in epidemiologic and diagnostic investigations. As an alternative, high-density oligonucleotide resequencing microarrays represent a promising new technology that has been used to rapidly and accurately identify nucleotide sequence variants (
In an attempt to adapt resequencing microarray technology to surveillance and diagnostics, we developed the respiratory pathogen microarray (RPM) version 1 for detection and sequence typing of 20 common respiratory and 6 category A biothreat pathogens known to cause febrile respiratory illness (
Each tiled prototype sequence was selected to have an intermediate level of sequence homology across a group of microbial or viral strains, which allowed for efficient hybridization and unique identification of most or all subtypes of targeted pathogenic species. For each base position of a given prototype sequence, the array contains eight 25mer probes (4 sense and 4 antisense). On each strand, the 4 probes differ only at the central (13th) position, which carries A, C, G, or T; thus 2 of the 8 probes (1 sense and 1 antisense) perfectly match the prototype, while the other 6 represent possible single-base mismatches. The prototype regions targeting influenza viruses were composed of partial sequences from HA genes of influenza A virus subtypes (H1, H3, and H5) and influenza B virus, NA genes of influenza A virus subtypes (N1 and N2) and influenza B virus, and the M genes of influenza A virus (full-length M1 and partial M2) and influenza B virus (
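The probe layout described above can be illustrated in a few lines of Python. This is a minimal sketch that assumes a plain A/C/G/T prototype string; the function names are hypothetical, and the sketch ignores the physical arrangement of probes on the chip.

```python
# Hypothetical sketch of resequencing-array probe tiling: 25mer probes,
# interrogated at the central (13th) base, 4 sense + 4 antisense per position.
PROBE_LEN = 25
CENTER = PROBE_LEN // 2  # 0-based index 12 == 13th base of the 25mer

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def probes_for_position(prototype: str, pos: int) -> dict[str, list[str]]:
    """Return the 4 sense and 4 antisense 25mers interrogating prototype[pos].

    Each sense probe is the prototype window centred on `pos` with the central
    base set to A, C, G, or T; exactly one of them matches the prototype.
    Antisense probes are the reverse complements of the sense probes.
    """
    start = pos - CENTER
    if start < 0 or start + PROBE_LEN > len(prototype):
        raise ValueError("position too close to the end of the prototype")
    window = prototype[start:start + PROBE_LEN]
    sense = [window[:CENTER] + base + window[CENTER + 1:] for base in "ACGT"]
    antisense = [reverse_complement(p) for p in sense]
    return {"sense": sense, "antisense": antisense}
```

Applied to each position of a tiled region, this yields 8 probes per base, of which 2 are perfect matches; positions within 12 bases of either end of the prototype cannot be centred in a full 25mer, which is one reason tiled regions are treated as windows of the full gene.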
| Gene | Prototype | GenBank accession no. | Tiled region | Length (bp) |
|---|---|---|---|---|
| A/HA1 | A/New Caledonia/20/99 (H1N1) | AJ344014 | 110–808 | 699 |
| A/HA3 | A/Denmark/59/03 (H3N2) | AY531939 | 120–913 | 794 |
| A/HA5 | A/Hong Kong/486/97 (H5N1) | AF102671 | 1106–1629 | 524 |
| A/NA1 | A/Chile/1/83 (H1N1) | X15281 | 4–1363 | 1,360 |
| A/NA2 | A/Panama/2007/99 (H3N2) | AJ457937 | 1–1446 | 1,446 |
| A/M | A/NWS/33 (H1N1) | L25814 | 1–923 | 923 |
| B/HA | B/Yamanashi/166/98 | AF100355 | 269–952 | 684 |
| B/NA | B/Yamagata/16/88 | AY1139081 | 1–896 | 896 |
| B/M | B/Yamagata/16/88 | AF100378 | 1–362 | 362 |
The influenza clinical specimens used in this study were collected through the Department of Defense Global Emerging Infections System during the 2004–2005 influenza season. Influenza throat swab specimens were collected in accordance with the case criteria previously described (
Total RNA from cultured samples was amplified with a random reverse transcription-PCR (RT-PCR) protocol as previously described (
Purified DNA amplicons were adjusted to 2 μg in 35 μL of EB buffer, mixed with 15.1 μL of fragmentation cocktail buffer (5 μL NEB buffer 4, 5 μL 10 mmol/L Tris, pH 7.8, and 0.1 μL GeneChip fragmentation reagent [3 U/μL], Affymetrix Inc.), and incubated for 10 min at 37°C and 15 min at 95°C. The fragmented products were then biotin-labeled with 1.5 μL of Biotin-N6-ddATP (PerkinElmer Life and Analytical Sciences, Boston, MA, USA) and 1 μL of terminal transferase (20 U/μL) (New England Biolabs, Beverly, MA, USA) for 45 min at 37°C and 15 min at 95°C. RPM version 1 arrays were prehybridized with 200 μL of prehybridization buffer (10 mmol/L Tris, pH 7.8, and 0.01% Tween 20) for 15 min at 45°C. After the prehybridization step, 167.5 μL of hybridization cocktail master mix (3 mol/L tetramethylammonium chloride, 10 mmol/L Tris, pH 7.8, 0.01% Tween 20, 0.5 mg/mL bovine serum albumin, 0.1 mg/mL herring sperm DNA [Promega, Madison, WI, USA], 50 pmol/L Oligo B2 [Affymetrix Inc.]) and biotin-labeled DNA fragments were heated for 5 min at 95°C, equilibrated for 5 min at 45°C, and added to RPM version 1. All hybridizations were incubated for 16 h at 45°C in the GeneChip hybridization oven 640 at 60 revolutions per minute. The microarrays were then washed and stained with the GeneChip Fluidics Station 450 and scanned with the GeneChip Scanner 3000 according to the GeneChip CustomSeq array protocol.
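The enzyme amounts and DNA concentration implied by these volumes can be worked out directly. The short calculation below uses only values stated in the protocol; the variable names are illustrative.

```python
# Worked arithmetic for the fragmentation and end-labeling set-up.
# All masses, volumes, and unit concentrations are taken from the protocol text.
DNA_UG = 2.0                 # purified amplicon mass, μg
DNA_VOL_UL = 35.0            # volume of EB buffer, μL

FRAG_REAGENT_UL = 0.1        # GeneChip fragmentation reagent, μL
FRAG_REAGENT_U_PER_UL = 3.0  # reagent activity, U/μL

TDT_UL = 1.0                 # terminal transferase, μL
TDT_U_PER_UL = 20.0          # enzyme activity, U/μL

dna_ng_per_ul = DNA_UG * 1000 / DNA_VOL_UL             # input concentration
frag_units = FRAG_REAGENT_UL * FRAG_REAGENT_U_PER_UL   # total fragmentation units
frag_units_per_ug = frag_units / DNA_UG                # units per μg of DNA
tdt_units = TDT_UL * TDT_U_PER_UL                      # total terminal transferase

print(f"input DNA: {dna_ng_per_ul:.1f} ng/uL")
print(f"fragmentation reagent: {frag_units:.2f} U ({frag_units_per_ug:.2f} U/ug DNA)")
print(f"terminal transferase: {tdt_units:.0f} U")
```

In other words, each 2-μg sample (about 57 ng/μL before mixing) is fragmented with 0.3 U of reagent (0.15 U/μg) and labeled with 20 U of terminal transferase.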
The hybridization intensities were analyzed with the GeneChip operating software to generate raw image files (.DAT) and simplified image files (.CEL) with intensities assigned to each of the corresponding probe positions. GeneChip DNA analysis software version 3.0 (GDAS), which implements the ABACUS algorithm (
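The ABACUS base-calling model implemented in GDAS is not reproduced here. As a deliberately simplified stand-in, the sketch below calls each position from the 4 probe intensities on each strand with a hypothetical intensity-ratio cut-off (`min_ratio`), and reports an ambiguous call ("N") when no probe clearly dominates or when the 2 strands disagree. It only illustrates how intensities map to base calls and Ns; it is not the published algorithm.

```python
# Simplified stand-in for probe-level base calling. This is NOT ABACUS;
# the ratio cut-off is a hypothetical value chosen for illustration.
from typing import Sequence

CALL_BASES = "ACGT"


def call_base(intensities: Sequence[float], min_ratio: float = 1.5) -> str:
    """Call one position from 4 probe intensities ordered A, C, G, T.

    Returns the brightest base if it exceeds the runner-up by `min_ratio`,
    otherwise 'N' (ambiguous call).
    """
    if len(intensities) != 4:
        raise ValueError("expected 4 intensities (A, C, G, T)")
    ranked = sorted(range(4), key=lambda i: intensities[i], reverse=True)
    best, second = intensities[ranked[0]], intensities[ranked[1]]
    if second <= 0:
        return CALL_BASES[ranked[0]] if best > 0 else "N"
    return CALL_BASES[ranked[0]] if best / second >= min_ratio else "N"


def call_sequence(
    sense: Sequence[Sequence[float]],
    antisense: Sequence[Sequence[float]],
    min_ratio: float = 1.5,
) -> str:
    """Call each position from both strands; discordant strands give 'N'.

    Assumption: antisense intensities are already re-indexed by the
    sense-strand base each antisense probe reports.
    """
    calls = []
    for s, a in zip(sense, antisense):
        cs, ca = call_base(s, min_ratio), call_base(a, min_ratio)
        calls.append(cs if cs == ca else "N")
    return "".join(calls)
```

Requiring strand agreement trades call rate for accuracy: every discordant or weakly hybridizing position lowers the base call rate but does not introduce a wrong base.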
Automated DNA sequencing was performed as previously described (
DNA sequences generated from RPM version 1 were searched against the Influenza Sequence Database (
To assess the performance of RPM version 1 with a real-world clinical isolate set, we tested 25 cultured strains collected from 4 continents during the 2004–2005 influenza season and previously diagnosed by culture and RT-PCR as influenza. One influenza subtype was identified in each tested sample based on the RPM version 1 hybridization profiles and sequence reads shown in
Hybridization images of the respiratory pathogen microarray (RPM) version 1 prototype regions for 3 influenza virus isolates and trivalent FluMist vaccine. A) A/H1N1, B) A/H3N2, C) influenza B, and D) trivalent FluMist vaccine. In A, B, and C, only the influenza-specific tiled prototype regions of RPM version 1 are shown. Hybridization-positive identifications are shown on the right. In D, the image of the entire RPM version 1 array hybridized with FluMist vaccine is shown. The single influenza prototype region that was hybridization negative is denoted on the right. E) Magnification of a portion of profile B showing an example of the primary sequence data generated by the hybridization of randomly amplified targets to the RPM version 1 HA3 probe set. The primary sequence generated can be read from left to right. HA, hemagglutinin; NA, neuraminidase; IQEX, internal positive hybridization control (Affymetrix); M, matrix.
| Sample name | HA base call rate† (%) | NA base call rate† (%) | M base call rate† (%) | Strain identification from HA | GenBank accession no. | Mismatches with top BLAST hit‡ | Mismatches with prototype§ |
|---|---|---|---|---|---|---|---|
| A/Colorado/360/05 | 84.4 | 72.7 | 63.2 | A/Nepal/1679/2004 (H3N2) | AY945284 | 0 | 8 |
| A/Qater/2039/05 | 88.4 | 74.0 | 68.5 | A/Nepal/1727/2004 (H3N2) | AY945272 | 0 | 8 |
| A/Guam/362/05 | 87.3 | 75.8 | 63.3 | A/Nepal/1679/2004 (H3N2) | AY945264 | 2 | 10 |
| A/Italy/384/05 | 83.3 | 69.6 | 63.5 | A/Nepal/1727/2004 (H3N2) | AY945272 | 2 | 9 |
| A/Turkey/2108/05 | 77.9 | 67.6 | 59.2 | A/Nepal/1664/2004 (H3N2) | AY945265 | 2 | 12 |
| A/Korea/298/05 | 82.7 | 70.5 | 61.7 | A/Nepal/1727/2004 (H3N2) | AY945273 | 4 | 11 |
| A/Japan/1337/05 | 87.5 | 76.7 | 67.4 | A/Malaysia/2256/2004 (H3N2) | ISDN110616 | 4 | 14 |
| A/Japan/1383/05 | 92.1 | 84.9 | 74.8 | A/Malaysia/2256/2004 (H3N2) | ISDN110616 | 4 | 14 |
| A/Ecuador/1968/04 | 87.7 | 75.2 | 58.3 | A/New York/17/2003 (H3N2) | CY001053 | 0 | 4 |
| A/Iraq/34/05 | 84.4 | 72.7 | 65.6 | A/Christchurch/178/2004 (H3N2) | ISDN110530 | 1 | 9 |
| A/Peru/166/05 | 86.9 | 79.0 | 65.9 | A/Macau/103/2004 (H3N2) | ISDN64772 | 6 | 10 |
| A/New York/2782/04 | 82.7 | 68.0 | 63.1 | A/New York/391/2005 (H3N2) | CY002056 | 1 | 9 |
| A/England/400/05 | 88.3 | 55.3 | 61.1 | A/New York/227/2003 (H1N1) | CY002536 | 1 | 10 |
| B/Peru/1324/04 | 75.2 | 83.4 | 89.4 | B/Milano/66/04 | AJ842082 | 1 | 25 |
| B/Peru/1364/04 | 71.1 | 74.5 | 77.5 | B/Milano/66/04 | AJ842082 | 1 | 25 |
| B/Colorado/2597/04 | 81.1 | 84.3 | 85.8 | B/Texas/3/2002 | AY139049 | 4 | 27 |
| B/Japan/1905/05 | 76.2 | 76.5 | 76.6 | B/Texas/3/2002 | AY139049 | 2 | 25 |
| B/Japan/1224/05 | 80.0 | 78.2 | 83.7 | B/Texas/3/2002 | AY139049 | 2 | 25 |
| B/Alaska/1777/05 | 75.0 | 75.9 | 78.1 | B/Texas/3/2002 | AY139049 | 4 | 27 |
| B/England/1716/05 | 80.5 | 81.4 | 85.2 | B/Texas/3/2002 | AY139049 | 2 | 25 |
| B/England/2054/05 | 81.1 | 80.1 | 78.7 | B/Texas/3/2002 | AY139049 | 1 | 24 |
| B/Hawaii/1990/04 | 51.7 | 82.9 | 83.7 | B/Tehran/80/02¶ | AJ784042 | 4 | 68 |
| B/Hawaii/1993/04 | 47.4 | 79.7 | 83.7 | B/Tehran/80/02¶ | AJ784042 | 4 | 69 |
| B/Arizona/148/04 | 42.4 | 78.2 | 82.5 | B/Tehran/80/02¶ | AJ784042 | 6 | 69 |
| B/Arizona/146/04 | 49.1 | 79.1 | 86.7 | B/Tehran/80/02¶ | AJ784042 | 6 | 69 |
*HA, hemagglutinin; NA, neuraminidase; M, matrix. †No. of base calls generated from the RPM version 1 divided by the length of the tiled probe sequence. ‡No. of mismatches between the actual sequence and the sequence of the top BLAST search hit. §No. of mismatches between the actual sequence and the tiled prototype probe sequence. ¶Influenza B group 2 isolates were identified as B/New York/1/2002 strain (accession no. AF532565) by the conventional sequencing method.
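The base call rate in the footnote is a simple ratio, and the tiled lengths in the prototype table follow from their inclusive coordinates. The sketch below assumes that ambiguous positions are reported as "N" and are not counted as calls; function names are illustrative. For example, the 84.4% HA base call rate of A/Colorado/360/05 on the 794-bp A/HA3 region corresponds to roughly 670 called bases.

```python
# Base call rate as defined in the footnote: base calls generated by
# RPM version 1 divided by the length of the tiled probe sequence.
# Assumption: ambiguous positions appear as 'N' and are not counted as calls.


def tiled_length(start: int, end: int) -> int:
    """Length of an inclusive tiled region, e.g. 120-913 for A/HA3."""
    return end - start + 1


def base_call_rate(called_seq: str, tiled_len: int) -> float:
    """Percentage of tiled positions that received an A, C, G, or T call."""
    if len(called_seq) != tiled_len:
        raise ValueError("called sequence must span the whole tiled region")
    n_calls = sum(base in "ACGT" for base in called_seq.upper())
    return 100.0 * n_calls / tiled_len


# 670 called bases over the 794-bp A/HA3 region -> about 84.4%
ha3_len = tiled_length(120, 913)
print(round(base_call_rate("A" * 670 + "N" * (ha3_len - 670), ha3_len), 1))
```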
Microarray resequencing data and conventional sequencing data were searched by using the Influenza Sequence Database with the BLAST algorithm. Results for the highest bit scores were taken as strain identifications and are shown in
Based on sequences of HA genes, which are routinely used for genetic and antigenic characterization, microarray strain identifications of all 13 influenza A isolates agreed with identifications from the conventional sequencing method. Although the top BLAST hits for individual A/H3N2 isolates were sometimes different strain sequences from the Influenza Sequence Database, all were redundant representatives of the same A/Fujian/411/02 lineage identified by conventional sequencing. These results indicate that ambiguous base calls (Ns) did not affect the accuracy of BLAST identification. At most 6 mismatches occurred between the actual sequence of each isolate and the sequence of its top BLAST search hit (
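Why scattered Ns need not change the identification can be illustrated with a simplified stand-in for the BLAST step: choose the reference with the fewest mismatches over a pre-aligned, ungapped region, skipping positions where either sequence is ambiguous. Real BLAST uses local alignment and bit scores, so this is only an illustration under that simplifying assumption; the reference names and sequences below are hypothetical.

```python
# Simplified stand-in for BLAST-based identification (NOT BLAST itself):
# ungapped, pre-aligned comparison in which 'N' positions are ignored.


def mismatches(query: str, reference: str) -> int:
    """Count positions where both sequences have a called base and differ."""
    if len(query) != len(reference):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(
        q != r
        for q, r in zip(query.upper(), reference.upper())
        if q != "N" and r != "N"
    )


def top_hit(query: str, references: dict[str, str]) -> tuple[str, int]:
    """Return (name, mismatch count) of the closest reference."""
    name = min(references, key=lambda k: mismatches(query, references[k]))
    return name, mismatches(query, references[name])


# Hypothetical references that differ at 2 positions.
refs = {"ref_1": "ACGTACGTAC", "ref_2": "ACGTTCGTAA"}
print(top_hit("ACGTNCGTAC", refs))  # the N falls on a discriminating site
```

An N at one discriminating site leaves the remaining sites to decide the match; only when Ns cover every site that separates 2 close references can the top hit shift between them.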
Alignment of the HA peptide sequences translated from RPM version 1–obtained DNA sequences for 12 A/H3N2 isolates (
Alignment of hemagglutinin peptide sequences containing an influenza A/H3N2 prototype and the translated sequences from 12 A/H3N2 isolates generated from respiratory pathogen microarray version 1. A, B, and D, antibody-binding sites A, B, and D; TH, Fujian-like lineage amino acid substitutions (threonine and histidine) within an antibody-binding site. Asterisks indicate conserved amino acids.
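Translating array-derived DNA into peptides for such an alignment requires handling ambiguous calls. A minimal sketch, assuming the sequence is already in frame and using the standard genetic code, maps any codon that contains an N (or other non-ACGT character) to "X"; the example sequences are hypothetical.

```python
# Standard genetic code built from the conventional TCAG ordering.
CODON_BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(CODON_BASES)
    for j, b in enumerate(CODON_BASES)
    for k, c in enumerate(CODON_BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence; ambiguous codons become 'X'.

    Trailing bases that do not form a complete codon are ignored.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


print(translate("ATGNCCAAA"))  # one ambiguous codon in the middle
```

Marking ambiguous codons as X keeps the peptide in register with the prototype, so the alignment columns (and antibody-binding sites) stay at their expected positions.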
The 12 influenza B isolates were classified into 2 distinct subgroups based on BLAST searches of the sequences of 3 genes generated by RPM version 1. The top BLAST hits for the RPM-obtained HA gene sequences identified subgroup 1 isolates as either B/Milano/66/04 or B/Texas/3/2002, both of which are B/Shanghai/361/2002-like strains and belong to the B/Yamagata/16/88 lineage. BLAST queries of conventional sequencing data yielded similar identifications for these isolates. Subgroup 2 isolates were identified as B/Tehran/80/02 by microarray and as B/New York/1/2002 by conventional sequencing. The query results of both methods were similar (the differing identifications are likely attributable to ambiguous base calls), and all subgroup 2 isolates were members of the B/Victoria/2/87 lineage. This lineage is not covered by the 2004–2005 influenza B vaccine (L. Daum, pers. comm.). These results correspond to a Centers for Disease Control and Prevention (Atlanta, GA, USA) influenza activity report documenting that both of the identified influenza B lineages were reported worldwide and that the Yamagata lineage viruses predominated in the 2004–2005 influenza season (
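The subgroup assignment amounts to mapping each isolate's top HA hit to the lineage of that hit. The sketch below uses the top hits listed in the isolate table and the lineage assignments stated in the text; it tallies the subgroups but does not reproduce the BLAST searches themselves.

```python
from collections import Counter

# Top HA BLAST hit for each influenza B isolate, copied from the isolate table.
HA_TOP_HIT = {
    "B/Peru/1324/04": "B/Milano/66/04",
    "B/Peru/1364/04": "B/Milano/66/04",
    "B/Colorado/2597/04": "B/Texas/3/2002",
    "B/Japan/1905/05": "B/Texas/3/2002",
    "B/Japan/1224/05": "B/Texas/3/2002",
    "B/Alaska/1777/05": "B/Texas/3/2002",
    "B/England/1716/05": "B/Texas/3/2002",
    "B/England/2054/05": "B/Texas/3/2002",
    "B/Hawaii/1990/04": "B/Tehran/80/02",
    "B/Hawaii/1993/04": "B/Tehran/80/02",
    "B/Arizona/148/04": "B/Tehran/80/02",
    "B/Arizona/146/04": "B/Tehran/80/02",
}

# Lineage of each top-hit strain, as described in the text.
LINEAGE_OF_HIT = {
    "B/Milano/66/04": "subgroup 1 (B/Yamagata/16/88 lineage)",
    "B/Texas/3/2002": "subgroup 1 (B/Yamagata/16/88 lineage)",
    "B/Tehran/80/02": "subgroup 2 (B/Victoria/2/87 lineage)",
}

subgroup_counts = Counter(LINEAGE_OF_HIT[hit] for hit in HA_TOP_HIT.values())
for subgroup, n in sorted(subgroup_counts.items()):
    print(f"{subgroup}: {n} isolates")
```

Run as written, this reports 8 subgroup 1 and 4 subgroup 2 isolates, consistent with the table.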
RPM version 1 can differentiate a broad number of variants based on a single-tiled "prototype" probe region without relying on predetermined hybridization patterns (
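Reading variants off a single tiled prototype reduces to a position-by-position comparison between the prototype and the array's base calls, which is the layout of the mismatch table that follows (position, tiled nucleotide, observed nucleotide). A minimal sketch, assuming the called sequence spans the whole tiled region and that ambiguous calls are skipped rather than reported; the `offset` parameter (numbering of the first tiled base) is an assumption, since the table does not state its coordinate origin, and the sequences are hypothetical.

```python
# Sketch: list differences between the tiled prototype and the array calls.


def variants_vs_prototype(
    prototype: str, called: str, offset: int = 1
) -> list[tuple[int, str, str]]:
    """Return (position, prototype base, called base) for each called difference.

    `offset` is the number assigned to the first tiled base (1 = 1-based).
    Positions called as 'N' are skipped.
    """
    if len(prototype) != len(called):
        raise ValueError("called sequence must cover the tiled prototype")
    return [
        (i + offset, p, c)
        for i, (p, c) in enumerate(zip(prototype.upper(), called.upper()))
        if c != "N" and c != p
    ]


print(variants_vs_prototype("ACGTACGT", "ACCTANGA"))
```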
| Position† | TN‡ | Mismatched nucleotides*: 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25 | T | C | |||||||||||
| 46 | T | A | |||||||||||
| 61 | A | T | |||||||||||
| 62§ | G | A | |||||||||||
| 88 | G | T | |||||||||||
| 189 | G | A | |||||||||||
| 208 | T | C | |||||||||||
| 233§ | G | ||||||||||||
| 244 | G | A | |||||||||||
| 251§ | G | A | A | ||||||||||
| 262 | C | T | |||||||||||
| 274 | G | A | |||||||||||
| 293§ | A | G | |||||||||||
| 299 | G | A | A | A | A | A | A | A | A | A | A | A | A |
| 313§ | G | ||||||||||||
| 351§ | A | G | G | G | |||||||||
| 352 | A | C | C | C | C | ||||||||
| 385 | C | T | |||||||||||
| 393§ | A | ||||||||||||
| 407 | T | C | |||||||||||
| 429§ | A | G | |||||||||||
| 434§ | A | ||||||||||||
| 446§ | T | C | |||||||||||
| 466 | C | T | |||||||||||
| 469 | C | T | |||||||||||
| 473§ | G | A | |||||||||||
| 478 | T | A | A | ||||||||||
| 479§ | G | T | T | ||||||||||
| 483§ | G | A | A | ||||||||||
| 493 | C | T | |||||||||||
| 511 | A | ||||||||||||
| 559 | C | T | |||||||||||
| 564§ | A | G | |||||||||||
| 584§ | G | A | |||||||||||
| 571 | A | G | G | ||||||||||
| 593§ | G | A | A | A | A | A | A | A | |||||
| 596§ | T | C | C | C | C | C | C | C | C | C | C | C | |
| 602§ | A | C | |||||||||||
| 646 | T | C | |||||||||||
| 652 | T | A | |||||||||||
| 698§ | C | A | |||||||||||
| 734§ | C | T | |||||||||||
*Mismatched nucleotides obtained from comparison between the tiled probe sequence and the conventional sequence. Mismatched nucleotides identified by the RPM version 1 are shown in
Unrooted phylogenetic analysis of the hemagglutinin 1 (HA1) gene of A) 11 influenza A/H3N2 isolates and B) 12 influenza B isolates compared with vaccine and reference strains. All clinical isolates are available from GenBank under accession nos. DQ265706–DQ265730. Asterisks denote the 2005–2006 influenza A/H3N2 and influenza B vaccine strains.
Nearly every isolate had unique base mutations, many of which resulted in amino acid substitutions. These mutations are consistent with the well-established frequency of genetic drift during circulation of influenza viruses and show that the RPM version 1 gene chip is an effective tool for tracking unique genetic changes within influenza strains.
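Whether a given base mutation changes the encoded amino acid can be checked codon by codon. The sketch below assumes an in-frame coding sequence numbered from 1 at the first codon position (the mismatch table does not state a frame offset) and an unambiguous affected codon; the example sequence is hypothetical.

```python
# Standard genetic code built from the conventional TCAG ordering.
CODON_BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(CODON_BASES)
    for j, b in enumerate(CODON_BASES)
    for k, c in enumerate(CODON_BASES)
}


def classify_substitution(cds: str, pos: int, alt: str) -> str:
    """Classify a base change at 1-based position `pos` of an in-frame sequence.

    Returns "synonymous" or a label such as "K->E" for an amino acid change.
    """
    if pos < 1:
        raise ValueError("positions are 1-based")
    cds = cds.upper()
    i = pos - 1
    start = i - i % 3                 # first base of the affected codon
    codon = cds[start:start + 3]
    if len(codon) < 3:
        raise ValueError("position falls in an incomplete trailing codon")
    mutant = codon[: i - start] + alt.upper() + codon[i - start + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    return "synonymous" if ref_aa == alt_aa else f"{ref_aa}->{alt_aa}"


print(classify_substitution("ATGGCCAAA", 6, "T"))  # third codon position
print(classify_substitution("ATGGCCAAA", 7, "G"))  # first codon position
```

Third-codon-position changes are often synonymous, whereas first- and second-position changes usually alter the amino acid, which is why only a subset of the observed base mutations appear as substitutions in the peptide alignment.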
To test the capability of RPM version 1 to detect multiple pathogens with the random amplification protocol, we analyzed total nucleic acid isolated from trivalent FluMist intranasal vaccine.
| Segment | Base call rate† (%) | Strain identification | GenBank accession no. |
|---|---|---|---|
| A/HA1 | 86.8 | A/New Caledonia/20/99 | AJ344014 |
| A/NA1 | 65.6 | A/New Caledonia/20/99 | AJ518092 |
| A/HA3 | 86.5 | A/Wyoming/3/03 | AY531033 |
| A/NA2 | 78.2 | A/Wyoming/3/03 | AY531034 |
| A/M | 75.9 | A/Ann Arbor/6/60 | M23978 |
| B/HA | 77.1 | B/Jilin/20/2003 | ISDN40908 |
| B/NA | 83.5 | B/Yamagata/1246/2003‡ | AB120256 |
| B/M | 78.4 | B/Ann Arbor/1/66 | M20175 |
*HA, hemagglutinin; NA, neuraminidase; M, matrix. †No. of base calls generated from the RPM version 1 divided by the length of the tiled probe sequence. ‡The NA sequence of the B/Jilin/20/2003 strain was not available in the influenza database.
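The FluMist result can be summarized as a presence/absence pattern over the prototype regions. The sketch below uses the base call rates from the table above with a hypothetical positivity cut-off (the text does not state one) and treats regions not listed as negative. Presence alone shows that H1, H3, N1, and N2 are all present but cannot say how they pair; the pairing (H1N1 and H3N2) comes from the strain-level identifications, in which HA1 and NA1 both matched A/New Caledonia/20/99 and HA3 and NA2 both matched A/Wyoming/3/03.

```python
# Base call rates (%) for the FluMist vaccine, copied from the table above.
FLUMIST_RATES = {
    "A/HA1": 86.8, "A/NA1": 65.6, "A/HA3": 86.5, "A/NA2": 78.2,
    "A/M": 75.9, "B/HA": 77.1, "B/NA": 83.5, "B/M": 78.4,
}

# Hypothetical positivity cut-off; the text does not specify one.
MIN_RATE = 20.0

# Influenza A prototype regions that indicate an HA or NA subtype.
SUBTYPE_OF_REGION = {
    "A/HA1": "H1", "A/HA3": "H3", "A/HA5": "H5",
    "A/NA1": "N1", "A/NA2": "N2",
}


def summarize_detection(rates: dict[str, float], min_rate: float = MIN_RATE) -> dict:
    """Summarize types and subtypes with hybridization-positive regions.

    Regions missing from `rates` are treated as negative.
    """
    positive = {region for region, rate in rates.items() if rate >= min_rate}
    return {
        "influenza A": any(r.startswith("A/") for r in positive),
        "influenza B": any(r.startswith("B/") for r in positive),
        "HA subtypes": sorted(SUBTYPE_OF_REGION[r] for r in positive if r.startswith("A/HA")),
        "NA subtypes": sorted(SUBTYPE_OF_REGION[r] for r in positive if r.startswith("A/NA")),
    }


print(summarize_detection(FLUMIST_RATES))
```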
Because of the relative ease of transmission of respiratory pathogens, tremendous pressure exists to develop rapid and sensitive tools to identify them. The surveillance of influenza virus outbreaks requires identification not only on the species level but also on the subtype or strain level. Current molecular methods, such as PCR and multiplex PCR, have dramatically improved detection sensitivities and efficiency compared with culture and serologic methods (
Currently, most microarrays used for microbial detection are spotted arrays that use redundant oligonucleotides as independent probes. For these methods, 2 types of probe targets are usually considered. The first are conserved gene sequences such as 16S rRNA and gyrase (
Since the antigen-encoding HA and NA genes are highly variable between different subtypes, sequences specific for HA1, HA3, HA5, NA1, and NA2 were all tiled on RPM version 1 independently so that influenza A H3N2, H1N1, and H5N1 viruses could be identified and resequenced. Further analysis of the generated sequences showed variations between target and prototype sequences and accurately identified tested isolates at the strain level and as members of recognized circulating variants (
Another powerful feature of RPM version 1 is its broad-spectrum detection capability, allowing simultaneous resequencing of dozens of gene targets from multiple pathogens in 1 assay. This capability, however, is dependent on an equally broad-spectrum amplification method. With 66 diverse gene probes tiled on RPM version 1 covering 20 common respiratory and 6 biothreat pathogens (
Correctly identifying 4 different influenza subtypes and their corresponding genes provided a simultaneous demonstration of 3 features of the resequencing microarray: strain identification through pattern recognition, sequence determination, and broad-spectrum capability. Conventional sequencing can determine DNA sequence and has been routinely used for genetic typing in surveillance investigations (
In conclusion, we have combined a random amplification strategy with a resequencing microarray to efficiently and simultaneously detect, type, and genetically characterize geographically diverse influenza viruses. Application of this and similar methods may aid in a better understanding of the incidence, prevalence, and epidemiology of influenza infections and simultaneously allow more rapid identification of epidemic and pandemic outbreaks.
Support was provided by the Air Force Medical Services (Office of HQ USAF Surgeon General) and the Office of Naval Research.
Dr Wang is a molecular biologist at the Naval Research Laboratory, Washington, DC. His research interests include molecular diagnosis of infectious diseases, genomics, and bioinformatics.