Understanding the evolution of influenza A viruses in humans is important for surveillance and vaccine strain selection. We performed a phylogenetic analysis of 156 complete genomes of human H3N2 influenza A viruses collected between 1999 and 2004 from New York State, United States, and observed multiple co-circulating clades with different population frequencies. Strikingly, phylogenies inferred for individual gene segments revealed that multiple reassortment events had occurred among these clades, such that one clade of H3N2 viruses present at least since 2000 had provided the hemagglutinin gene for all those H3N2 viruses sampled after the 2002–2003 influenza season. This reassortment event was the likely progenitor of the antigenically variant influenza strains that caused the A/Fujian/411/2002-like epidemic of the 2003–2004 influenza season. However, despite sharing the same hemagglutinin, these phylogenetically distinct lineages of viruses continue to co-circulate in the same population. These data, derived from the first large-scale analysis of H3N2 viruses, convincingly demonstrate that multiple lineages can co-circulate, persist, and reassort in epidemiologically significant ways, and underscore the importance of genomic analyses for future influenza surveillance.
Evolution of the flu virus is analyzed via genomic phylogeny; humans are found to provide a reservoir of antigenic variability implicit in flu adaptation and virulence.
Influenza A viruses are negative-strand RNA viruses of the Family Orthomyxoviridae that infect a wide variety of warm-blooded animals, including domestic and wild birds and mammals (e.g., humans, pigs, and horses). The natural reservoir for influenza virus is thought to be wild waterfowl, and genetic material from avian strains episodically emerges in strains infectious to humans. These human viruses continually circulate in yearly epidemics (mainly during the winter months in temperate climates), and antigenically novel strains emerge sporadically as pandemic viruses [
While the risk of pandemic influenza poses a significant public health concern, inter-pandemic or epidemic influenza remains a major cause of morbidity and mortality. The influenza A surface glycoprotein hemagglutinin (HA) is under selective pressure to change in order to evade the host's immune system [
The importance of predicting the emergence of new circulating influenza strains for subsequent annual vaccine development cannot be overstated [
A number of retrospective studies have been performed using partial HA gene sequences to understand, and sometimes predict, the evolution of human H3N2 strains [
Despite the wealth of data on the molecular evolution of influenza viruses, how the entire genome of influenza A virus evolves during epidemic years is unclear, particularly as past sample sizes have been inadequate. While antigenic drift of HA is clearly of vital importance in the survival of an influenza strain, other factors, including HA receptor binding specificity [
To this end the National Institute of Allergy and Infectious Diseases of the National Institutes of Health has funded the Influenza Genome Sequencing Project with several partners [
Three major clusters of sequences were apparent in phylogenetic trees of the 156 complete genomes of H3N2 influenza A viruses sampled from New York State. These corresponded to particular influenza seasons (winter months): (a) 1999–2000, (b) 2001–2002 and 2002–2003 together (although only five members of the latter season are present in these data), and (c) 2003–2004 (
To investigate the evolutionary history of the outlier viruses in more detail we inferred phylogenetic trees for each of the eight individual gene segments (
Two more major phylogenetic displacements suggestive of reassortment involving other segments were similarly identified. First, isolate A/New York/11/2003, which fell within clade A in seven of the gene trees (including HA), clustered with clade B viruses in PB2. Consequently, isolate A/New York/11/2003 represents a reassortment of two segments between clades A and B. Second, isolate A/New York/182/2000, which clustered with the main set of viruses sampled during the 1999–2000 season in most of the gene trees, was very closely related to the divergent A/New York/137/1999 and A/New York/138/1999 isolates in PA and M1, although the high degree of genetic similarity among all viruses in M1 precludes a further analysis of reassortment in this case.
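The comparison described above, in which an isolate's clade membership is checked tree by tree, can be sketched as a simple lookup. The minimal Python sketch below assumes that a clade label has already been read off each segment tree; the function name `discordant_segments`, the segment labels, and the data layout are illustrative assumptions and not part of the analysis reported here. The example entry encodes only what is stated above for A/New York/11/2003 (clade A in seven gene trees, clade B in PB2).

```python
from collections import Counter

# Shorthand labels for the eight gene segments (assumed for this sketch).
SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"]


def discordant_segments(clade_by_segment):
    """Return the segments whose clade label differs from the isolate's majority label.

    clade_by_segment maps each segment label to the clade in which the
    isolate falls in that segment's tree.
    """
    counts = Counter(clade_by_segment[s] for s in SEGMENTS)
    majority_clade, _ = counts.most_common(1)[0]
    return [s for s in SEGMENTS if clade_by_segment[s] != majority_clade]


# A/New York/11/2003 as described in the text: clade A everywhere except PB2.
ny_11_2003 = {segment: "A" for segment in SEGMENTS}
ny_11_2003["PB2"] = "B"

print(discordant_segments(ny_11_2003))
```

A check of this kind only flags candidates; it does not handle ties between clade labels, and an apparent displacement still needs to be supported by the tree topology itself.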
To determine the direction of the reassortment events in HA, we inferred phylogenetic trees of larger datasets comprising the New York State isolates and representatives of the other human and swine H3N2 viruses sampled during the same time period. Because sequences from the core genes have only been sporadically collected, this analysis necessarily focused on HA and NA. As expected from the phylogenetic analysis of the New York State viruses, the distinction between clades A, B, and C was apparent in the NA gene tree (
A very different evolutionary history was revealed in HA. In this case, clade A of the New York State viruses expanded to contain the majority of viruses sampled after 2002 and from a variety of locations (Asia, Australasia, Europe, and North America), as well as a number of Asian viruses from 2002 (
Because both clade A and clade B contain viruses sampled on a near global basis, it is important to determine possible phenotypic differences between them.
Our analysis of whole genomes of H3N2 influenza A viruses sampled during 1999–2004 has identified two key evolutionary patterns. First, although the majority of viruses isolated after 2002 fall into a single phylogenetic group (clade A), multiple, co-circulating viral lineages are present at particular time points. The genetic diversity of influenza A virus is therefore not as restricted as previously suggested, particularly when genes other than that encoding HA are analyzed. This co-circulation of lineages is most apparent with the identification of three clades of H3N2 viruses that appear to infect the same populations until 2002, after which they acquired a common HA gene through reassortment. Second, and more dramatically, these multiple, co-circulating lineages may have complex genealogical histories and interact through reassortment. Indeed, we have documented two reassortment events involving the HA gene of clade B: one in which it was acquired by the clade A viruses and another in which it was independently acquired by those isolates assigned to clade C. Two further reassortment events involving the PB2 and PA genes were also evident from our phylogenetic analysis. Given that we are only able to reliably detect reassortment when it is associated with major changes in tree topology, it is likely that reassortment among closely related lineages is also commonplace in influenza A viruses.
Reassortment between influenza A viruses has been described in both human and animal viruses [
Reassortant viruses were also described following the re-emergence of the H1N1 subtype in 1977, which did not replace the previously circulating H3N2 viruses. In this case, co-circulation of influenza viruses of both subtypes continued, and co-infection with both subtypes was reported [
Most prior phylogenetic studies of human influenza A have suggested that inter-pandemic evolution may be essentially described as a series of successions by variants of the previous season's dominant strain. These successions are largely determined by strong positive selection acting on the abundance of mutational diversity in the HA of the dominant strain. However, we found that at least four reassortment events occurred among human viruses during the period 1999–2004 and that two of these involved a major change in HA. Recently, Barr et al. independently provided phylogenetic evidence of the clade A–clade B reassortment described here in an analysis of predominantly southern hemispheric influenza A H3N2 isolates collected during the same period [
In the 2003–2004 influenza season, a major drift variant emerged in both the northern and southern hemispheres [
Although data are insufficient for precise determination of the timing of these two critical mutations, the available data are most consistent with these changes occurring in a relatively short time period before the reassortment event. The histidine to threonine change at site 155 and the glutamine to histidine change at site 156 are present in all the clade A reassortant isolates as well as in clade B isolates from 2003–2004, thus suggesting that they occurred prior to the reassortment event. No available clade B isolates prior to 2002–2003 have either of these mutations, and we were able to identify only three “intermediate” isolates from 2002–2003 (A/Kwangju/219/2002, A/Kwangju/243/2002, and A/Cheonnam/340/2002) with the replacement at site 155 but not at site 156. Overall, the data presented here, coupled with those recently reported [
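As a hedged illustration of how isolates could be sorted by the two positions discussed above, the sketch below classifies an HA sequence from its residues at sites 155 and 156. It assumes the residues have already been extracted using the same site numbering as the text; the function name `classify_ha` and the category labels are hypothetical, and the residues passed in the example calls are illustrative combinations of the states named above, not sequence data from specific isolates.

```python
# Residue states named in the text, as one-letter amino acid codes.
ANCESTRAL = {155: "H", 156: "Q"}  # histidine at 155, glutamine at 156
DERIVED = {155: "T", 156: "H"}    # threonine at 155, histidine at 156


def classify_ha(res155, res156):
    """Classify an HA sequence by its residues at sites 155 and 156."""
    observed = {155: res155, 156: res156}
    if observed == ANCESTRAL:
        return "ancestral"
    if observed == DERIVED:
        return "both replacements"
    if res155 == DERIVED[155] and res156 == ANCESTRAL[156]:
        return "intermediate (155 only)"
    # Any other combination is not described in the text.
    return "other"


for pair in [("H", "Q"), ("T", "Q"), ("T", "H")]:
    print(pair, classify_ha(*pair))
```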
Several questions remain unanswered by our study. Since the HA donated by clade B led to a major expansion of the reassorted clade A, it is uncertain why clade B did not initially out-compete clade A without reassortment. One possibility is that the HA of clade B had an intrinsically higher fitness than other HAs circulating at the same time but was unable to reach a high frequency in the New York population owing to linkage to mutations located in other segments that reduced the overall fitness of this genotype. According to this hypothesis, it was not until it was placed by reassortment into a more favorable genetic background, in this case the clade A viruses, that its fitness advantage was realized. Since clade B itself appeared to proliferate in other regions, it will be useful to analyze whole-genome sequence from these isolates when they are available.
More generally, it is clear that the genotypic basis of viral fitness has not been entirely elucidated. In particular, it is likely that interactions among viral proteins and between viral proteins and host factors play a key role. In this respect it is notable that of the 48 amino acid differences that distinguish the clade B viruses, nine fall in NP and 14 fall in NA (see
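One way such differences could be tabulated for a single protein is sketched below: a site is counted when every sequence in the focal clade shares a residue that no sequence outside the clade carries. This definition of a distinguishing site, the function name `distinguishing_sites`, and the toy alignments are assumptions made for illustration; the text does not state how the 48 differences were counted.

```python
def distinguishing_sites(focal, others):
    """Return 1-based alignment positions that separate the focal clade from the rest.

    focal and others are lists of aligned, equal-length protein sequences.
    A position qualifies when all focal sequences share one residue and
    that residue is absent from every other sequence at the same position.
    """
    sites = []
    for i in range(len(focal[0])):
        focal_residues = {seq[i] for seq in focal}
        other_residues = {seq[i] for seq in others}
        if len(focal_residues) == 1 and focal_residues.isdisjoint(other_residues):
            sites.append(i + 1)
    return sites


# Toy alignments (not real influenza sequences).
focal_clade = ["MKTAV", "MKTAV"]
other_clades = ["MKSAV", "MRSAV"]

print(distinguishing_sites(focal_clade, other_clades))
```

Running the same function on each protein alignment in turn and summing the list lengths would give a per-protein breakdown of the kind quoted above, under the stated definition.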
In summary, our study clearly demonstrates the utility of whole-genome analyses of influenza A viruses, and further makes clear that additional whole-genome analyses are required to understand fully the evolutionary mechanisms and epidemiological dynamics of this virus. While antigenic variance of HA is still the dominant selective pressure on human influenza A virus evolution, the finding that antigenically novel clades emerge by reassortment among persistent viral lineages rather than via antigenic drift is of major significance for vaccine strain selection.
The influenza virus isolates were collected as part of the diagnostic service provided by the Virus Reference and Surveillance Laboratory at the Wadsworth Center, New York State Department of Health. Viruses were received as part of outbreak investigations, through the reference function of the laboratory, and, since 2001, as part of a sentinel physician influenza surveillance program. Viruses were passaged minimally in primary rhesus monkey kidney cell culture and the RNA extracted from the clarified supernatant. Whole-genome sequence information was derived at the Institute for Genomic Research using methods described elsewhere (E. Ghedin, N. A. Miller, M. Shumway, J. Zaborsky, T. Feldblyum, et al., unpublished data). Use of the diagnostic samples in this study was approved by the New York State Department of Health Institutional Review Board.
Sequence data for 156 complete genomes of influenza A virus (H3N2) sampled from New York State during the period 1999–2004 were downloaded from GenBank (1,248 separate accessions, representing the eight gene segments of 156 individual influenza isolates; GenBank accession numbers available from
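The accession count follows from the segment count: 156 isolates with eight segments each give 156 × 8 = 1,248 records. A minimal sketch of the corresponding completeness check is shown below; the record layout (isolate name, segment label), the segment labels, and the function name `complete_genomes` are assumptions for illustration, and the toy records are not real GenBank entries.

```python
from collections import defaultdict

# Shorthand labels for the eight gene segments (assumed for this sketch).
SEGMENTS = frozenset(["PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"])


def complete_genomes(records):
    """Return the sorted names of isolates for which all eight segments are present.

    records is an iterable of (isolate_name, segment_label) pairs.
    """
    seen = defaultdict(set)
    for isolate, segment in records:
        seen[isolate].add(segment)
    return sorted(name for name, segments in seen.items() if SEGMENTS <= segments)


# Toy records: one complete genome and one isolate with only two segments.
toy_records = [("isolate_1", s) for s in SEGMENTS]
toy_records += [("isolate_2", "HA"), ("isolate_2", "NA")]

print(complete_genomes(toy_records))
print(156 * len(SEGMENTS))  # number of separate accessions expected
```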
Phylogenetic trees were inferred for all of the datasets described above using the maximum likelihood method available in the PAUP* package [
Maximum likelihood phylogenetic trees depicting the evolutionary relationships of the remaining six segments (major coding regions only) from the H3N2 influenza A viruses sampled in New York State during the period 1999–2004 (unique sequences only) and the “background” viruses taken from GenBank or the Los Alamos Influenza Sequence Database. All trees are mid-point rooted for purposes of clarity only, and all horizontal branch lengths are drawn to scale. Bootstrap values are shown for clades A, B, and C. Colors are as in
The authors wish to acknowledge the excellent technical assistance of Sara Griesemer and Matthew Kleabonas. Viruses described in this study collected after 2001 include some isolates collected as part of the Sentinel Physician Influenza Surveillance Program, which is supported by Cooperative Agreement Number U50/CCU223671 from the Centers for Disease Control and Prevention. The work at the Institute for Genomic Research and EG, NM, SLS, and CMF were supported in whole or in part with federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under contract number N01-AI-30071. The contents of this manuscript are solely the responsibility of the authors and do not necessarily represent the official views of the Department of Health and Human Services or the Department of Defense.
CTL, cytotoxic T lymphocyte
HA, hemagglutinin
NA, neuraminidase
The maximum likelihood phylogenetic tree is mid-point rooted for purposes of clarity, and all horizontal branch lengths are drawn to scale. Bootstrap values are shown for key nodes. Isolates assigned to clade A (light blue), clade B (yellow), and clade C (red) are indicated, as are those isolates involved in other reassortment events: A/New York/11/2003 (orange), A/New York/182/2000 (dark blue), and A/New York/137/1999 and A/New York/138/1999 (green).
All maximum likelihood phylogenetic trees are mid-point rooted for purposes of clarity only, and all horizontal branch lengths are drawn to scale. Bootstrap values are shown for clades A, B, and C. Colors are as in
The maximum likelihood phylogenetic tree is mid-point rooted for purposes of clarity only, and all horizontal branch lengths are drawn to scale. Bootstrap values are shown for clades A, B, and C. Colors are as in
The maximum likelihood phylogenetic tree is rooted on a divergent set of human and swine viruses for purposes of clarity only, and all horizontal branch lengths are drawn to scale. Bootstrap values are shown for key nodes. Colors are as in
Citation: Holmes EC, Ghedin E, Miller N, Taylor J, Bao Y, et al. (2005) Whole-genome analysis of human influenza A virus reveals multiple persistent lineages and reassortment among recent H3N2 viruses. PLoS Biol 3(9): e300.